# LLMLingua-2 API Documentation
## Overview
This API provides a text compression service using the LLMLingua-2 model. It allows users to compress text prompts while preserving specified tokens and maintaining readability.
## Base URL
`ws://compress.ai-now.space/queue/join`
## WebSocket Connection
The API uses WebSocket for real-time communication. Connect to the WebSocket endpoint to interact with the API.
## Authentication
No authentication is required for this API.
## API Methods
### Compress Text
Compresses a given text prompt using the LLMLingua-2 model.
#### Message Flow
1. **Connection Established**
- The server sends a message with `msg: "send_hash"`.
- Client should respond with a session hash.
2. **Send Data Request**
- The server sends a message with `msg: "send_data"`.
- Client should send the compression parameters.
3. **Process Completed**
- The server sends a message with `msg: "process_completed"` and the compressed text.
#### Request Format
```json
{
  "data": [
    ["<original_text>"],
    <compression_rate>,
    ["<force_token1>", "<force_token2>", ...]
  ],
  "session_hash": "<session_hash>",
  "fn_index": 0
}
```
- `data`: A positional array containing, in order, the original text (wrapped in an array), the compression rate, and the force tokens.
- `original_text`: The text to be compressed.
- `compression_rate`: A float between 0.1 and 1.0 representing the desired compression rate; lower values compress more aggressively.
- `force_tokens`: An array of tokens to be preserved during compression.
- `session_hash`: A unique identifier for the session.
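The request format above can be assembled with a small helper. This is a sketch; the function name and range check are illustrative, not part of the API:

```javascript
// Build the "send_data" payload for a compression request.
// The shape follows the request format documented above:
// data = [[original_text], compression_rate, force_tokens].
function buildCompressRequest(originalText, compressionRate, forceTokens, sessionHash) {
  if (compressionRate < 0.1 || compressionRate > 1.0) {
    throw new RangeError("compression_rate must be between 0.1 and 1.0");
  }
  return {
    data: [[originalText], compressionRate, forceTokens],
    session_hash: sessionHash,
    fn_index: 0
  };
}

// Serialize before sending over the WebSocket.
const payload = buildCompressRequest("Text to compress", 0.7, ["\n", "."], "abc123");
const message = JSON.stringify(payload);
```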
#### Response Format
```json
{
  "msg": "process_completed",
  "output": {
    "data": ["<compressed_text>"]
  }
}
```
- `compressed_text`: The resulting compressed text.
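Given the response format above, a client can extract the result with a small helper (the function name is illustrative):

```javascript
// Pull the compressed text out of a "process_completed" message.
// Returns null for any other message type, so it can be called on
// every incoming WebSocket message.
function extractCompressedText(message) {
  if (message.msg !== "process_completed") return null;
  return message.output.data[0];
}

const raw = '{"msg":"process_completed","output":{"data":["compressed result"]}}';
console.log(extractCompressedText(JSON.parse(raw))); // "compressed result"
```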
## Error Handling
The API may emit error events through the WebSocket connection. Clients should listen for and handle these error events appropriately.
## Detailed Functionality
### Purpose
The primary function of this API is to compress text using the LLMLingua-2 model. It's designed to reduce the length of a given text while maintaining its core meaning and readability.
### Underlying Technology
- Uses the LLMLingua-2 model: "microsoft/llmlingua-2-xlm-roberta-large-meetingbank"
- Employs the PromptCompressor class from the llmlingua library
- Built using Gradio, a Python library for creating web-based interfaces for machine learning models
### Main Functionality (compress function)
#### Input:
- `original_prompt`: The text to be compressed
- `compression_rate`: A value between 0.1 and 1.0 that determines compression level
- `force_tokens`: A list of tokens (e.g., punctuation marks) to preserve
- `chunk_end_tokens`: Tokens indicating where text chunks can be split (default: periods and newlines)
#### Process:
- Uses PromptCompressor to compress the input text
- Applies the specified compression rate
- Preserves the specified force_tokens
- Uses chunk_end_tokens for appropriate text splitting
- Avoids dropping consecutive important parts of the text
#### Output:
- Returns the compressed version of the input text
- Prints the runtime of the compression process
### API Workflow
1. Connection Establishment
2. Session Initialization
3. Data Submission
4. Processing (queued)
5. Result Delivery
### Additional Features
- Token Counting: Uses the tiktoken library to report token counts as a GPT-4 usage reference
- Customizable Compression: Adjustable compression rate and preservable tokens
- Queue System: Manages multiple requests for fair processing order
### User Interface (Optional)
A Gradio interface is available at [Compress.ai-now.space](https://Compress.ai-now.space) for direct user interaction, featuring:
- Input boxes for original and compressed text
- Sliders and dropdowns for compression parameters
- Compression trigger button
### Scalability
Designed to handle multiple concurrent requests through a queue system (maximum queue size: 100).
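Because requests are queued, clients may receive position updates before processing begins. In older Gradio queue protocols these arrive as messages like `{"msg": "estimation", "rank": 3, "queue_size": 100}`; the message and field names below are an assumption about this deployment and may differ:

```javascript
// Sketch of a queue-status handler. The "estimation" message type and
// its "rank"/"queue_size" fields are assumed from older Gradio queue
// protocols, not confirmed for this API.
function describeQueuePosition(message) {
  if (message.msg !== "estimation") return null;
  return `Position ${message.rank + 1} of ${message.queue_size} in queue`;
}
```

A client would call this on every incoming message and log the result when it is non-null.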
## Example Usage
```javascript
const socket = new WebSocket('ws://compress.ai-now.space/queue/join');

socket.addEventListener('open', (event) => {
  console.log('WebSocket connection established');
});

socket.addEventListener('message', (event) => {
  const data = JSON.parse(event.data);
  if (data.msg === "send_hash") {
    socket.send(JSON.stringify({
      session_hash: "unique_session_hash",
      fn_index: 0
    }));
  } else if (data.msg === "send_data") {
    socket.send(JSON.stringify({
      // data = [[original_text], compression_rate, force_tokens]
      data: [["Text to compress"], 0.7, ["\n", ".", "!", "?", ","]],
      session_hash: "unique_session_hash",
      fn_index: 0
    }));
  } else if (data.msg === "process_completed") {
    console.log("Compressed text:", data.output.data[0]);
  }
});

socket.addEventListener('error', (error) => {
  console.error('WebSocket error:', error);
});

socket.addEventListener('close', (event) => {
  console.log('WebSocket connection closed');
});
```
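The example above uses a fixed placeholder for `session_hash`. In practice each session should use a unique value; a short random alphanumeric string is a common choice for Gradio clients (the 11-character length here is an assumption, not a documented requirement):

```javascript
// Generate a random session hash. Any string that is unique per
// session should work; 11 lowercase alphanumeric characters is a
// typical Gradio-client convention, assumed here for illustration.
function makeSessionHash(length = 11) {
  const chars = "abcdefghijklmnopqrstuvwxyz0123456789";
  let hash = "";
  for (let i = 0; i < length; i++) {
    hash += chars[Math.floor(Math.random() * chars.length)];
  }
  return hash;
}
```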
## Notes
- The API uses a queue system to manage requests. Clients may need to wait in the queue before their request is processed.
- The compression rate affects the level of text reduction. A lower rate results in more aggressive compression.
- Force tokens are preserved in the compressed output, ensuring important elements like newlines and punctuation are retained.
- This API is suitable for various applications, including chatbots, content summarization tools, and data preprocessing for large language models.
- The flexibility in compression parameters and real-time processing make it adaptable to a wide range of text processing needs.