LLMLingua-2 API Documentation

Overview

This API provides a text compression service using the LLMLingua-2 model. It allows users to compress text prompts while preserving specified tokens and maintaining readability.

Base URL

ws://compress.ai-now.space/queue/join

WebSocket Connection

The API uses WebSocket for real-time communication. Connect to the WebSocket endpoint to interact with the API.

Authentication

No authentication is required for this API.

API Methods

Compress Text

Compresses a given text prompt using the LLMLingua-2 model.

Message Flow

  1. Connection Established

    • The server sends a message with msg: "send_hash".
    • Client should respond with a session hash.
  2. Send Data Request

    • The server sends a message with msg: "send_data".
    • Client should send the compression parameters.
  3. Process Completed

    • The server sends a message with msg: "process_completed" and the compressed text.
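The session hash sent in step 1 can be any string that uniquely identifies the client session; the server does not mandate a particular format. A minimal sketch of a generator (the function name makeSessionHash is illustrative, not part of the API):

```javascript
// Build a short random alphanumeric hash to identify this session.
// The server only requires the hash to be unique per client session.
function makeSessionHash(length = 11) {
  const chars = 'abcdefghijklmnopqrstuvwxyz0123456789';
  let hash = '';
  for (let i = 0; i < length; i++) {
    hash += chars[Math.floor(Math.random() * chars.length)];
  }
  return hash;
}
```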

Request Format

{
  "data": [
    ["<original_text>"],
    <compression_rate>,
    ["<force_token1>", "<force_token2>", ...]
  ],
  "session_hash": "<session_hash>",
  "fn_index": 0
}
  • original_text: The text to be compressed.
  • compression_rate: A float between 0.1 and 1.0 giving the fraction of tokens to retain; lower values compress more aggressively.
  • force_tokens: An array of tokens to be preserved verbatim during compression.
  • session_hash: A unique identifier for the session (any unique string).
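Putting the fields together, a small helper can assemble the payload for the send_data step. This is a sketch following the schema above; the helper name buildCompressionRequest is illustrative:

```javascript
// Assemble the JSON payload for the "send_data" step.
// The positional `data` array mirrors the request format above:
// [ [original_text], compression_rate, force_tokens ].
function buildCompressionRequest(originalText, compressionRate, forceTokens, sessionHash) {
  return JSON.stringify({
    data: [[originalText], compressionRate, forceTokens],
    session_hash: sessionHash,
    fn_index: 0
  });
}
```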

Response Format

{
  "msg": "process_completed",
  "output": {
    "data": ["<compressed_text>"]
  }
}
  • compressed_text: The resulting compressed text.
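A small helper can extract the compressed text from an incoming message, returning null for any other message type. The name parseCompressionResult is illustrative; the message shape follows the response format above:

```javascript
// Parse a raw WebSocket message; return the compressed text if this is
// the final "process_completed" event, or null for intermediate messages.
function parseCompressionResult(rawMessage) {
  const data = JSON.parse(rawMessage);
  if (data.msg !== 'process_completed' || !data.output || !Array.isArray(data.output.data)) {
    return null;
  }
  return data.output.data[0];
}
```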

Error Handling

The API may emit error events through the WebSocket connection. Clients should listen for and handle these error events appropriately.

Detailed Functionality

Purpose

The primary function of this API is to compress text using the LLMLingua-2 model. It's designed to reduce the length of a given text while maintaining its core meaning and readability.

Underlying Technology

  • Uses the LLMLingua-2 model: "microsoft/llmlingua-2-xlm-roberta-large-meetingbank"
  • Employs the PromptCompressor class from the llmlingua library
  • Built using Gradio, a Python library for creating web-based interfaces for machine learning models

Main Functionality (compress function)

Input:

  • original_prompt: The text to be compressed
  • compression_rate: A value between 0.1 and 1.0 that determines compression level
  • force_tokens: A list of tokens (e.g., punctuation marks) to preserve
  • chunk_end_tokens: Tokens indicating where text chunks can be split (default: periods and newlines)

Process:

  • Uses PromptCompressor to compress the input text
  • Applies the specified compression rate
  • Preserves the specified force_tokens
  • Uses chunk_end_tokens for appropriate text splitting
  • Avoids dropping consecutive important parts of the text

Output:

  • Returns the compressed version of the input text
  • Prints the runtime of the compression process

API Workflow

  1. Connection Establishment
  2. Session Initialization
  3. Data Submission
  4. Processing (queued)
  5. Result Delivery
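The five steps above can be wrapped into a single promise-returning function. The following is a sketch under the message flow described earlier; the function and variable names are illustrative, and error handling is kept minimal:

```javascript
// One-shot client for the compression workflow: connect, join the queue,
// send data when asked, and resolve with the compressed text.
function compressText(text, compressionRate, forceTokens) {
  return new Promise((resolve, reject) => {
    const sessionHash = Math.random().toString(36).slice(2);
    const socket = new WebSocket('ws://compress.ai-now.space/queue/join');

    socket.addEventListener('message', (event) => {
      const data = JSON.parse(event.data);
      if (data.msg === 'send_hash') {
        // Step 2: identify this session
        socket.send(JSON.stringify({ session_hash: sessionHash, fn_index: 0 }));
      } else if (data.msg === 'send_data') {
        // Step 3: submit the compression parameters
        socket.send(JSON.stringify({
          data: [[text], compressionRate, forceTokens],
          session_hash: sessionHash,
          fn_index: 0
        }));
      } else if (data.msg === 'process_completed') {
        // Step 5: deliver the result and clean up
        socket.close();
        resolve(data.output.data[0]);
      }
    });

    socket.addEventListener('error', (err) => reject(err));
  });
}
```

Usage: compressText('Some long prompt to shrink', 0.5, ['\n', '.']).then(console.log);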

Additional Features

  • Token Counting: Uses the tiktoken library to report token counts with the GPT-4 tokenizer, for reference
  • Customizable Compression: Adjustable compression rate and preservable tokens
  • Queue System: Manages multiple requests for fair processing order

User Interface (Optional)

compress.ai-now.space includes a Gradio interface for direct user interaction with:

  • Input boxes for original and compressed text
  • Sliders and dropdowns for compression parameters
  • Compression trigger button

Scalability

Designed to handle multiple requests through a queue system (maximum queue size: 100).

Example Usage

const socket = new WebSocket('ws://compress.ai-now.space/queue/join');

socket.addEventListener('open', (event) => {
  console.log('WebSocket connection established');
});

socket.addEventListener('message', (event) => {
  const data = JSON.parse(event.data);

  if (data.msg === "send_hash") {
    // Step 1: identify this session with a unique hash
    socket.send(JSON.stringify({
      session_hash: "unique_session_hash",
      fn_index: 0
    }));
  } else if (data.msg === "send_data") {
    // Step 2: submit the text, compression rate, and force tokens
    socket.send(JSON.stringify({
      data: [["Text to compress"], 0.7, ["\n", ".", "!", "?", ","]],
      session_hash: "unique_session_hash",
      fn_index: 0
    }));
  } else if (data.msg === "process_completed") {
    // Step 3: read the compressed text from the result
    console.log("Compressed text:", data.output.data[0]);
  }
});

socket.addEventListener('error', (error) => {
  console.error('WebSocket error:', error);
});

socket.addEventListener('close', (event) => {
  console.log('WebSocket connection closed');
});

Notes

  • The API uses a queue system to manage requests. Clients may need to wait in the queue before their request is processed.
  • The compression rate affects the level of text reduction. A lower rate results in more aggressive compression.
  • Force tokens are preserved in the compressed output, ensuring important elements like newlines and punctuation are retained.
  • This API is suitable for various applications, including chatbots, content summarization tools, and data preprocessing for large language models.
  • The flexibility in compression parameters and real-time processing make it adaptable to a wide range of text processing needs.