LLMLingua-2 API Documentation

Overview

This API provides a text compression service using the LLMLingua-2 model. It allows users to compress text prompts while preserving specified tokens and maintaining readability.

Base URL

ws://compress.ai-now.space/queue/join

WebSocket Connection

The API uses WebSocket for real-time communication. Connect to the WebSocket endpoint to interact with the API.

Authentication

No authentication is required for this API.

API Methods

Compress Text

Compresses a given text prompt using the LLMLingua-2 model.

Message Flow

  1. Connection Established

    • The server sends a message with msg: "send_hash".
    • Client should respond with a session hash.
  2. Send Data Request

    • The server sends a message with msg: "send_data".
    • Client should send the compression parameters.
  3. Process Completed

    • The server sends a message with msg: "process_completed" and the compressed text.
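The session hash sent in step 1 can be any string that uniquely identifies the client session; the server does not mandate a particular format. A minimal sketch of a generator (the function name makeSessionHash is illustrative, not part of the API):

```javascript
// Build a short random alphanumeric hash to identify this session.
// The server only requires the hash to be unique per client session.
function makeSessionHash(length = 11) {
  const chars = 'abcdefghijklmnopqrstuvwxyz0123456789';
  let hash = '';
  for (let i = 0; i < length; i++) {
    hash += chars[Math.floor(Math.random() * chars.length)];
  }
  return hash;
}
```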

Request Format

{
  "data": [
    ["<original_text>"],
    <compression_rate>,
    ["<force_token1>", "<force_token2>", ...]
  ],
  "session_hash": "<session_hash>",
  "fn_index": 0
}
  • original_text: The text to be compressed.
  • compression_rate: A float between 0.1 and 1.0 giving the fraction of tokens to retain; lower values compress more aggressively.
  • force_tokens: An array of tokens to be preserved verbatim during compression.
  • session_hash: A unique identifier for the session (any unique string).
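Putting the fields together, a small helper can assemble the payload for the send_data step. This is a sketch following the schema above; the helper name buildCompressionRequest is illustrative:

```javascript
// Assemble the JSON payload for the "send_data" step.
// The positional `data` array mirrors the request format above:
// [ [original_text], compression_rate, force_tokens ].
function buildCompressionRequest(originalText, compressionRate, forceTokens, sessionHash) {
  return JSON.stringify({
    data: [[originalText], compressionRate, forceTokens],
    session_hash: sessionHash,
    fn_index: 0
  });
}
```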

Response Format

{
  "msg": "process_completed",
  "output": {
    "data": ["<compressed_text>"]
  }
}
  • compressed_text: The resulting compressed text.
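A small helper can extract the compressed text from an incoming message, returning null for any other message type. The name parseCompressionResult is illustrative; the message shape follows the response format above:

```javascript
// Parse a raw WebSocket message; return the compressed text if this is
// the final "process_completed" event, or null for intermediate messages.
function parseCompressionResult(rawMessage) {
  const data = JSON.parse(rawMessage);
  if (data.msg !== 'process_completed' || !data.output || !Array.isArray(data.output.data)) {
    return null;
  }
  return data.output.data[0];
}
```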

Error Handling

The API may emit error events through the WebSocket connection. Clients should listen for and handle these error events appropriately.

Detailed Functionality

Purpose

The primary function of this API is to compress text using the LLMLingua-2 model. It's designed to reduce the length of a given text while maintaining its core meaning and readability.

Underlying Technology

  • Uses the LLMLingua-2 model: "microsoft/llmlingua-2-xlm-roberta-large-meetingbank"
  • Employs the PromptCompressor class from the llmlingua library
  • Built using Gradio, a Python library for creating web-based interfaces for machine learning models

Main Functionality (compress function)

Input:

  • original_prompt: The text to be compressed
  • compression_rate: A value between 0.1 and 1.0 that determines compression level
  • force_tokens: A list of tokens (e.g., punctuation marks) to preserve
  • chunk_end_tokens: Tokens indicating where text chunks can be split (default: periods and newlines)

Process:

  • Uses PromptCompressor to compress the input text
  • Applies the specified compression rate
  • Preserves the specified force_tokens
  • Uses chunk_end_tokens for appropriate text splitting
  • Avoids dropping consecutive important parts of the text

Output:

  • Returns the compressed version of the input text
  • Prints the runtime of the compression process

API Workflow

  1. Connection Establishment
  2. Session Initialization
  3. Data Submission
  4. Processing (queued)
  5. Result Delivery
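The five steps above can be wrapped into a single promise-returning function. The following is a sketch under the message flow described earlier; the function and variable names are illustrative, and error handling is kept minimal:

```javascript
// One-shot client for the compression workflow: connect, join the queue,
// send data when asked, and resolve with the compressed text.
function compressText(text, compressionRate, forceTokens) {
  return new Promise((resolve, reject) => {
    const sessionHash = Math.random().toString(36).slice(2);
    const socket = new WebSocket('ws://compress.ai-now.space/queue/join');

    socket.addEventListener('message', (event) => {
      const data = JSON.parse(event.data);
      if (data.msg === 'send_hash') {
        // Step 2: identify this session
        socket.send(JSON.stringify({ session_hash: sessionHash, fn_index: 0 }));
      } else if (data.msg === 'send_data') {
        // Step 3: submit the compression parameters
        socket.send(JSON.stringify({
          data: [[text], compressionRate, forceTokens],
          session_hash: sessionHash,
          fn_index: 0
        }));
      } else if (data.msg === 'process_completed') {
        // Step 5: deliver the result and clean up
        socket.close();
        resolve(data.output.data[0]);
      }
    });

    socket.addEventListener('error', (err) => reject(err));
  });
}
```

Usage: compressText('Some long prompt to shrink', 0.5, ['\n', '.']).then(console.log);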

Additional Features

  • Token Counting: Uses the tiktoken library to report token counts with the GPT-4 tokenizer, for reference
  • Customizable Compression: Adjustable compression rate and preservable tokens
  • Queue System: Manages multiple requests for fair processing order

User Interface (Optional)

compress.ai-now.space includes a Gradio interface for direct user interaction with:

  • Input boxes for original and compressed text
  • Sliders and dropdowns for compression parameters
  • Compression trigger button

Scalability

Designed to handle multiple requests through a queue system (maximum queue size: 100).

Example Usage

const socket = new WebSocket('ws://compress.ai-now.space/queue/join');

socket.addEventListener('open', (event) => {
  console.log('WebSocket connection established');
});

socket.addEventListener('message', (event) => {
  const data = JSON.parse(event.data);

  if (data.msg === "send_hash") {
    // Step 1: identify this session with a unique hash
    socket.send(JSON.stringify({
      session_hash: "unique_session_hash",
      fn_index: 0
    }));
  } else if (data.msg === "send_data") {
    // Step 2: submit the text, compression rate, and force tokens
    socket.send(JSON.stringify({
      data: [["Text to compress"], 0.7, ["\n", ".", "!", "?", ","]],
      session_hash: "unique_session_hash",
      fn_index: 0
    }));
  } else if (data.msg === "process_completed") {
    // Step 3: read the compressed text from the result
    console.log("Compressed text:", data.output.data[0]);
  }
});

socket.addEventListener('error', (error) => {
  console.error('WebSocket error:', error);
});

socket.addEventListener('close', (event) => {
  console.log('WebSocket connection closed');
});

Notes

  • The API uses a queue system to manage requests. Clients may need to wait in the queue before their request is processed.
  • The compression rate affects the level of text reduction. A lower rate results in more aggressive compression.
  • Force tokens are preserved in the compressed output, ensuring important elements like newlines and punctuation are retained.
  • This API is suitable for various applications, including chatbots, content summarization tools, and data preprocessing for large language models.
  • The flexibility in compression parameters and real-time processing make it adaptable to a wide range of text processing needs.