# LLMLingua-2 API Documentation

## Overview

This API provides a text compression service using the LLMLingua-2 model. It allows users to compress text prompts while preserving specified tokens and maintaining readability.

## Base URL

`ws://compress.ai-now.space/queue/join`

## WebSocket Connection

The API uses WebSocket for real-time communication. Connect to the WebSocket endpoint to interact with the API.

## Authentication

No authentication is required for this API.

## API Methods

### Compress Text

Compresses a given text prompt using the LLMLingua-2 model.

#### Message Flow

1. **Connection Established**
   - The server sends a message with `msg: "send_hash"`.
   - The client responds with its session hash (`session_hash`).

2. **Send Data Request**
   - The server sends a message with `msg: "send_data"`.
   - The client sends the compression parameters described under Request Format.

3. **Process Completed**
   - The server sends a message with `msg: "process_completed"` and the compressed text.
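The three steps above amount to a small message dispatcher. The following is a minimal sketch of that idea as a pure function. `nextClientMessage` and the `ctx` object are hypothetical names introduced for illustration, and the payload shapes are assumed from the Example Usage section rather than taken from a server specification.

```javascript
// Hypothetical helper: decide what the client should send in reply to a server message.
// Returns the object to send, or null when no reply is needed.
function nextClientMessage(serverMessage, ctx) {
  switch (serverMessage.msg) {
    case 'send_hash':
      // Step 1: identify the session.
      return { session_hash: ctx.sessionHash, fn_index: 0 };
    case 'send_data':
      // Step 2: submit the already-built compression request.
      return ctx.request;
    default:
      // Step 3 ("process_completed") and any other message need no reply.
      return null;
  }
}
```

Keeping this logic free of socket calls makes it easy to exercise without a live connection; the caller passes any non-null result to `socket.send(JSON.stringify(...))`.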

#### Request Format

```json
{
  "data": [
    ["<original_text>"],
    <compression_rate>,
    ["<force_token1>", "<force_token2>", ...]
  ],
  "session_hash": "<session_hash>",
  "fn_index": 0
}
```

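- `original_text`: The text to compress, passed as the first element of `data`.
- `compression_rate`: A float between 0.1 and 1.0 giving the desired compression rate, passed as the second element of `data`.
- `force_tokens`: An array of tokens to preserve during compression, passed as the third element of `data`.
- `session_hash`: A unique identifier for the session.
- `fn_index`: The index of the Gradio function to call; the example in this document uses `0`.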
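Validating the parameters before sending can catch mistakes that would otherwise only surface on the server. The sketch below builds the request object with basic checks. `buildCompressRequest` is a hypothetical helper, not part of the API, and the checks reflect only the ranges stated in this document.

```javascript
// Hypothetical helper: build the request payload in the format described above.
function buildCompressRequest(text, compressionRate, forceTokens, sessionHash) {
  if (typeof text !== 'string' || text.length === 0) {
    throw new TypeError('text must be a non-empty string');
  }
  // The documented range for compression_rate is 0.1 to 1.0 (this form also rejects NaN).
  if (typeof compressionRate !== 'number' ||
      !(compressionRate >= 0.1 && compressionRate <= 1.0)) {
    throw new RangeError('compressionRate must be a number between 0.1 and 1.0');
  }
  if (!Array.isArray(forceTokens)) {
    throw new TypeError('forceTokens must be an array of strings');
  }
  return {
    data: [[text], compressionRate, forceTokens],
    session_hash: sessionHash,
    fn_index: 0,
  };
}
```

A call such as `buildCompressRequest("Text to compress", 0.7, [".", ","], "my_hash")` returns an object ready for `JSON.stringify`.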

#### Response Format

```json
{
  "msg": "process_completed",
  "output": {
    "data": ["<compressed_text>"]
  }
}
```

- `compressed_text`: The resulting compressed text.
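Since other messages arrive on the same connection, a client may want to check the shape of a message before reading the result. The sketch below is one way to do that. `extractCompressedText` is a hypothetical helper and assumes the response has exactly the shape shown above.

```javascript
// Hypothetical helper: pull the compressed text out of a raw WebSocket message.
// Returns null for messages that are not "process_completed".
function extractCompressedText(rawMessage) {
  const message = JSON.parse(rawMessage);
  if (message.msg !== 'process_completed') {
    return null;
  }
  const data = message.output && message.output.data;
  if (!Array.isArray(data) || typeof data[0] !== 'string') {
    throw new Error('process_completed message has no compressed text');
  }
  return data[0];
}
```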

## Error Handling

The API may emit error events through the WebSocket connection. Clients should listen for and handle these error events appropriately.
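One possible way to handle these events is to route both `error` and unexpected `close` events to a single callback. The sketch below assumes a standard WebSocket-like object that supports `addEventListener`. `watchForFailures` and `onFailure` are hypothetical names, and no particular server-side error message format is assumed.

```javascript
// Hypothetical helper: report transport-level failures to one callback, at most once.
function watchForFailures(socket, onFailure) {
  let reported = false;
  const report = (error) => {
    if (!reported) {
      reported = true;
      onFailure(error);
    }
  };
  socket.addEventListener('error', () => {
    // WebSocket error events carry no detail, so report a generic failure.
    report(new Error('WebSocket error'));
  });
  socket.addEventListener('close', (event) => {
    // wasClean is false when the connection dropped without a closing handshake.
    if (!event.wasClean) {
      report(new Error(`WebSocket closed unexpectedly (code ${event.code})`));
    }
  });
}
```

The `reported` flag is there because an `error` event is typically followed by a `close` event, which would otherwise trigger the callback twice.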

## Detailed Functionality

### Purpose
The primary function of this API is to compress text using the LLMLingua-2 model. It's designed to reduce the length of a given text while maintaining its core meaning and readability.

### Underlying Technology
- Uses the LLMLingua-2 model: "microsoft/llmlingua-2-xlm-roberta-large-meetingbank"
- Employs the PromptCompressor class from the llmlingua library
- Built using Gradio, a Python library for creating web-based interfaces for machine learning models

### Main Functionality (compress function)

#### Input:
- `original_prompt`: The text to be compressed
- `compression_rate`: A value between 0.1 and 1.0 that determines compression level
- `force_tokens`: A list of tokens (e.g., punctuation marks) to preserve
- `chunk_end_tokens`: Tokens indicating where text chunks can be split (default: periods and newlines)

#### Process:
- Uses PromptCompressor to compress the input text
- Applies the specified compression rate
- Preserves the specified force_tokens
- Uses chunk_end_tokens for appropriate text splitting
- Avoids dropping consecutive important parts of the text

#### Output:
- Returns the compressed version of the input text
- Prints the runtime of the compression process

### API Workflow
1. Connection Establishment
2. Session Initialization
3. Data Submission
4. Processing (queued)
5. Result Delivery

### Additional Features
- Token Counting: Uses tiktoken library for GPT-4 token usage reference
- Customizable Compression: Adjustable compression rate and preservable tokens
- Queue System: Manages multiple requests for fair processing order

### User Interface (Optional)
A Gradio web interface is available at [Compress.ai-now.space](https://Compress.ai-now.space) for direct user interaction. It includes:
- Input boxes for original and compressed text
- Sliders and dropdowns for compression parameters
- Compression trigger button

### Scalability
Designed to handle multiple requests through a queue system (max size: 100)

## Example Usage

```javascript
const socket = new WebSocket('ws://compress.ai-now.space/queue/join');

socket.addEventListener('open', (event) => {
  console.log('WebSocket connection established');
});

socket.addEventListener('message', (event) => {
  const data = JSON.parse(event.data);

  if (data.msg === "send_hash") {
    // Step 1: identify this session to the server.
    socket.send(JSON.stringify({
      session_hash: "unique_session_hash",
      fn_index: 0
    }));
  } else if (data.msg === "send_data") {
    // Step 2: send [text, compression rate, force tokens] for compression.
    socket.send(JSON.stringify({
      data: [["Text to compress"], 0.7, ["\\n", ".", "!", "?", ","]],
      session_hash: "unique_session_hash",
      fn_index: 0
    }));
  } else if (data.msg === "process_completed") {
    // Step 3: the compressed text is the first element of output.data.
    console.log("Compressed text:", data.output.data[0]);
  }
});

socket.addEventListener('error', (error) => {
  console.error('WebSocket error:', error);
});

socket.addEventListener('close', (event) => {
  console.log('WebSocket connection closed');
});
```
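The example above uses the fixed string `"unique_session_hash"`. In practice each connection would likely need its own value. The sketch below generates a random lowercase alphanumeric string for that purpose. `makeSessionHash` is a hypothetical helper, and the length of 11 is an arbitrary choice, not a documented requirement.

```javascript
// Hypothetical helper: generate a random session hash for one connection.
function makeSessionHash(length = 11) {
  const alphabet = 'abcdefghijklmnopqrstuvwxyz0123456789';
  let hash = '';
  for (let i = 0; i < length; i++) {
    // Pick one character uniformly at random from the alphabet.
    hash += alphabet[Math.floor(Math.random() * alphabet.length)];
  }
  return hash;
}
```

The same value should be reused for both the `send_hash` and `send_data` replies within a single connection.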

## Notes

- The API uses a queue system to manage requests. Clients may need to wait in the queue before their request is processed.
- The compression rate affects the level of text reduction. A lower rate results in more aggressive compression.
- Force tokens are preserved in the compressed output, ensuring important elements like newlines and punctuation are retained.
- This API is suitable for various applications, including chatbots, content summarization tools, and data preprocessing for large language models.
- The flexibility in compression parameters and real-time processing make it adaptable to a wide range of text processing needs.
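To make the effect of the compression rate concrete, the sketch below estimates a target output size. It assumes the rate is the approximate fraction of tokens kept, which is consistent with "a lower rate results in more aggressive compression" but is not stated explicitly in this document. `estimateCompressedTokens` is a hypothetical helper, and the actual output length may differ.

```javascript
// Hypothetical helper: rough target size, assuming rate is the fraction of tokens kept.
function estimateCompressedTokens(originalTokens, compressionRate) {
  return Math.round(originalTokens * compressionRate);
}

// Under that assumption, a 1000-token prompt at rate 0.7 targets about 700 tokens,
// while the same prompt at rate 0.3 targets about 300.
console.log(estimateCompressedTokens(1000, 0.7));
console.log(estimateCompressedTokens(1000, 0.3));
```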