Commit c7ce8a
2024-08-19 16:21:57 admin: -/-/dev/null .. compress api.md | |
@@ -0,0 +1,169 @@ | |
+ | # LLMLingua-2 API Documentation |
+ | |
+ | ## Overview |
+ | |
+ | This API provides a text compression service using the LLMLingua-2 model. It allows users to compress text prompts while preserving specified tokens and maintaining readability. |
+ | |
+ | ## Base URL |
+ | |
+ | `ws://compress.ai-now.space/queue/join` |
+ | |
+ | ## WebSocket Connection |
+ | |
+ | The API uses WebSocket for real-time communication. Connect to the WebSocket endpoint to interact with the API. |
+ | |
+ | ## Authentication |
+ | |
+ | No authentication is required for this API. |
+ | |
+ | ## API Methods |
+ | |
+ | ### Compress Text |
+ | |
+ | Compresses a given text prompt using the LLMLingua-2 model. |
+ | |
+ | #### Message Flow |
+ | |
+ | 1. **Connection Established** |
+ | - The server sends a message with `msg: "send_hash"`. |
+ | - The client responds with a session hash and an `fn_index` identifying the function to call. |
+ | |
+ | 2. **Send Data Request** |
+ | - The server sends a message with `msg: "send_data"`. |
+ | - The client sends the compression parameters (see Request Format below). |
+ | |
+ | 3. **Process Completed** |
+ | - The server sends a message with `msg: "process_completed"` and the compressed text. |
+ | |
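The three steps above can be sketched as a pure handler that maps each incoming message to the client's next reply or to the final result. This is an illustrative helper, not part of the API; the `nextReply` name, the `payload` argument, and the handling of other queue messages (such as status updates) are assumptions:

```javascript
// Sketch: map an incoming queue message to the client's next action.
// `sessionHash` and `payload` are supplied by the caller; names are illustrative.
function nextReply(message, sessionHash, payload) {
  switch (message.msg) {
    case "send_hash":
      // Step 1: register the session with the queue.
      return { send: { session_hash: sessionHash, fn_index: 0 } };
    case "send_data":
      // Step 2: submit the compression parameters.
      return { send: { ...payload, session_hash: sessionHash, fn_index: 0 } };
    case "process_completed":
      // Step 3: extract the compressed text from the result.
      return { result: message.output.data[0] };
    default:
      // Ignore any other queue status messages.
      return {};
  }
}
```

Driving the WebSocket then reduces to calling `nextReply` on every parsed message and sending whatever it returns under `send`.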
+ | #### Request Format |
+ | |
+ | ```json |
+ | { |
+ | "data": [ |
+ | ["<original_text>"], |
+ | <compression_rate>, |
+ | ["<force_token1>", "<force_token2>", ...] |
+ | ], |
+ | "session_hash": "<session_hash>", |
+ | "fn_index": 0 |
+ | } |
+ | ``` |
+ | |
+ | - `original_text`: The text to be compressed. |
+ | - `compression_rate`: A float between 0.1 and 1.0 representing the desired compression rate. |
+ | - `force_tokens`: An array of tokens to be preserved during compression. |
+ | - `session_hash`: A unique identifier for the session. |
+ | |
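As a sketch, a small client-side helper can assemble this payload and check the documented parameter ranges before sending. The `buildCompressRequest` name and its error messages are illustrative, not part of the API:

```javascript
// Sketch: build the request payload, validating the documented ranges.
// Helper name and error messages are illustrative, not part of the API.
function buildCompressRequest(originalText, compressionRate, forceTokens, sessionHash) {
  if (typeof originalText !== "string" || originalText.length === 0) {
    throw new Error("original_text must be a non-empty string");
  }
  // Per the docs, compression_rate is a float between 0.1 and 1.0.
  if (!(compressionRate >= 0.1 && compressionRate <= 1.0)) {
    throw new Error("compression_rate must be between 0.1 and 1.0");
  }
  return {
    data: [[originalText], compressionRate, forceTokens],
    session_hash: sessionHash,
    fn_index: 0
  };
}
```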
+ | #### Response Format |
+ | |
+ | ```json |
+ | { |
+ | "msg": "process_completed", |
+ | "output": { |
+ | "data": ["<compressed_text>"] |
+ | } |
+ | } |
+ | ``` |
+ | |
+ | - `compressed_text`: The resulting compressed text. |
+ | |
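A defensive way to read this message is a helper that returns the compressed text only for a `process_completed` message and `null` otherwise. The `extractCompressedText` name is illustrative, not part of the API:

```javascript
// Sketch: pull the compressed text out of a "process_completed" message.
// Returns null for any other message shape; helper name is illustrative.
function extractCompressedText(message) {
  if (!message || message.msg !== "process_completed") return null;
  const output = message.output && message.output.data;
  return Array.isArray(output) ? output[0] : null;
}
```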
+ | ## Error Handling |
+ | |
+ | The API may emit error events through the WebSocket connection. Clients should listen for and handle these error events appropriately. |
+ | |
+ | ## Detailed Functionality |
+ | |
+ | ### Purpose |
+ | The primary function of this API is to compress text using the LLMLingua-2 model. It's designed to reduce the length of a given text while maintaining its core meaning and readability. |
+ | |
+ | ### Underlying Technology |
+ | - Uses the LLMLingua-2 model: "microsoft/llmlingua-2-xlm-roberta-large-meetingbank" |
+ | - Employs the PromptCompressor class from the llmlingua library |
+ | - Built using Gradio, a Python library for creating web-based interfaces for machine learning models |
+ | |
+ | ### Main Functionality (compress function) |
+ | |
+ | #### Input: |
+ | - `original_prompt`: The text to be compressed |
+ | - `compression_rate`: A value between 0.1 and 1.0 that determines compression level |
+ | - `force_tokens`: A list of tokens (e.g., punctuation marks) to preserve |
+ | - `chunk_end_tokens`: Tokens indicating where text chunks can be split (default: periods and newlines) |
+ | |
+ | #### Process: |
+ | - Uses PromptCompressor to compress the input text |
+ | - Applies the specified compression rate |
+ | - Preserves the specified force_tokens |
+ | - Uses chunk_end_tokens for appropriate text splitting |
+ | - Avoids dropping consecutive important parts of the text |
+ | |
+ | #### Output: |
+ | - Returns the compressed version of the input text |
+ | - Prints the runtime of the compression process |
+ | |
+ | ### API Workflow |
+ | 1. Connection Establishment |
+ | 2. Session Initialization |
+ | 3. Data Submission |
+ | 4. Processing (queued) |
+ | 5. Result Delivery |
+ | |
+ | ### Additional Features |
+ | - Token Counting: Uses the tiktoken library to report token counts with GPT-4's tokenizer for reference |
+ | - Customizable Compression: Adjustable compression rate and preservable tokens |
+ | - Queue System: Manages multiple requests for fair processing order |
+ | |
+ | ### User Interface (Optional) |
+ | The Gradio interface at [Compress.ai-now.space](https://Compress.ai-now.space) offers direct user interaction with: |
+ | - Input boxes for original and compressed text |
+ | - Sliders and dropdowns for compression parameters |
+ | - Compression trigger button |
+ | |
+ | ### Scalability |
+ | Designed to handle multiple requests through a queue system (max size: 100) |
+ | |
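The queueing behavior can be illustrated with a minimal bounded FIFO. This is a conceptual sketch only; the server's actual queue is Gradio's built-in one, and the `RequestQueue` class below is hypothetical:

```javascript
// Sketch: a bounded FIFO illustrating the documented queue limit of 100.
// Conceptual only; the real server uses Gradio's built-in queue.
const MAX_QUEUE_SIZE = 100;

class RequestQueue {
  constructor(maxSize = MAX_QUEUE_SIZE) {
    this.maxSize = maxSize;
    this.items = [];
  }
  enqueue(request) {
    if (this.items.length >= this.maxSize) return false; // queue full: reject
    this.items.push(request);
    return true;
  }
  dequeue() {
    return this.items.shift(); // FIFO: the oldest request is processed first
  }
}
```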
+ | ## Example Usage |
+ | |
+ | ```javascript |
+ | // Any sufficiently unique string works as a session hash; generate one per connection. |
+ | const sessionHash = Math.random().toString(36).slice(2); |
+ | const socket = new WebSocket('ws://compress.ai-now.space/queue/join'); |
+ | |
+ | socket.addEventListener('open', () => { |
+ |   console.log('WebSocket connection established'); |
+ | }); |
+ | |
+ | socket.addEventListener('message', (event) => { |
+ |   const data = JSON.parse(event.data); |
+ | |
+ |   if (data.msg === "send_hash") { |
+ |     // Step 1: register this session with the queue. |
+ |     socket.send(JSON.stringify({ |
+ |       session_hash: sessionHash, |
+ |       fn_index: 0 |
+ |     })); |
+ |   } else if (data.msg === "send_data") { |
+ |     // Step 2: submit the text, compression rate, and force tokens. |
+ |     socket.send(JSON.stringify({ |
+ |       data: [["Text to compress"], 0.7, ["\\n", ".", "!", "?", ","]], |
+ |       session_hash: sessionHash, |
+ |       fn_index: 0 |
+ |     })); |
+ |   } else if (data.msg === "process_completed") { |
+ |     // Step 3: read the compressed text from the result payload. |
+ |     console.log("Compressed text:", data.output.data[0]); |
+ |   } |
+ | }); |
+ | |
+ | socket.addEventListener('error', (error) => { |
+ |   console.error('WebSocket error:', error); |
+ | }); |
+ | |
+ | socket.addEventListener('close', () => { |
+ |   console.log('WebSocket connection closed'); |
+ | }); |
+ | ``` |
+ | |
+ | ## Notes |
+ | |
+ | - The API uses a queue system to manage requests. Clients may need to wait in the queue before their request is processed. |
+ | - The compression rate affects the level of text reduction. A lower rate results in more aggressive compression. |
+ | - Force tokens are preserved in the compressed output, ensuring important elements like newlines and punctuation are retained. |
+ | - This API is suitable for various applications, including chatbots, content summarization tools, and data preprocessing for large language models. |
+ | - The flexibility in compression parameters and real-time processing make it adaptable to a wide range of text processing needs. |