Commit c7ce8a

2024-08-19 16:21:57 admin: -/-
/dev/null .. compress api.md
@@ -0,0 +1,169 @@
+ # LLMLingua-2 API Documentation
+
+ ## Overview
+
+ This API provides a text compression service using the LLMLingua-2 model. It allows users to compress text prompts while preserving specified tokens and maintaining readability.
+
+ ## Base URL
+
+ `ws://compress.ai-now.space/queue/join`
+
+ ## WebSocket Connection
+
+ The API uses WebSocket for real-time communication. Connect to the WebSocket endpoint to interact with the API.
+
+ ## Authentication
+
+ No authentication is required for this API.
+
+ ## API Methods
+
+ ### Compress Text
+
+ Compresses a given text prompt using the LLMLingua-2 model.
+
+ #### Message Flow
+
+ 1. **Connection Established**
+    - The server sends a message with `msg: "send_hash"`.
+    - The client responds with its session hash and `fn_index`.
+
+ 2. **Send Data Request**
+    - The server sends a message with `msg: "send_data"`.
+    - The client sends the compression parameters (see Request Format below).
+
+ 3. **Process Completed**
+    - The server sends a message with `msg: "process_completed"` and the compressed text.
+
+ #### Request Format
+
+ ```json
+ {
+   "data": [
+     ["<original_text>"],
+     <compression_rate>,
+     ["<force_token1>", "<force_token2>", ...]
+   ],
+   "session_hash": "<session_hash>",
+   "fn_index": 0
+ }
+ ```
+
+ - `original_text`: The text to be compressed.
+ - `compression_rate`: A float between 0.1 and 1.0 representing the desired compression rate (lower values compress more aggressively; see Notes).
+ - `force_tokens`: An array of tokens to be preserved during compression.
+ - `session_hash`: A unique identifier for the session, chosen by the client.
+ - `fn_index`: Index of the server-side function to invoke; `0` for this API.
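+
+ As a concrete illustration of the fields above, the sketch below builds a request payload in Python; the prompt text, rate, and session hash are placeholder values chosen for the example, not values issued by the server.
+
+ ```python
+ import json
+
+ # Hypothetical payload; mirrors the request format above.
+ payload = {
+     "data": [
+         ["Long prompt text to compress ..."],  # original_text, wrapped in a list
+         0.5,                                   # compression_rate between 0.1 and 1.0
+         ["\\n", ".", "!", "?", ","],           # force_tokens, same values as the example below
+     ],
+     "session_hash": "unique_session_hash",     # any client-chosen identifier
+     "fn_index": 0,
+ }
+ message = json.dumps(payload)  # send this string after receiving msg: "send_data"
+ ```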
+
+ #### Response Format
+
+ ```json
+ {
+   "msg": "process_completed",
+   "output": {
+     "data": ["<compressed_text>"]
+   }
+ }
+ ```
+
+ - `compressed_text`: The resulting compressed text.
+
+ ## Error Handling
+
+ The API may emit error events through the WebSocket connection. Clients should listen for and handle these error events appropriately.
+
+ ## Detailed Functionality
+
+ ### Purpose
+ The primary function of this API is to compress text using the LLMLingua-2 model. It's designed to reduce the length of a given text while maintaining its core meaning and readability.
+
+ ### Underlying Technology
+ - Uses the LLMLingua-2 model: "microsoft/llmlingua-2-xlm-roberta-large-meetingbank"
+ - Employs the PromptCompressor class from the llmlingua library
+ - Built using Gradio, a Python library for creating web-based interfaces for machine learning models
+
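+ The pieces listed above are standard open-source components. As a minimal local sketch (not part of the WebSocket API itself), loading the same model with the llmlingua library looks roughly like this, assuming `pip install llmlingua`:
+
+ ```python
+ # Minimal sketch: load the LLMLingua-2 model named above.
+ # Assumes the llmlingua package is installed; model weights download on first use.
+ from llmlingua import PromptCompressor
+
+ compressor = PromptCompressor(
+     model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
+     use_llmlingua2=True,
+ )
+ ```
+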
+ ### Main Functionality (compress function)
+
+ #### Input:
+ - `original_prompt`: The text to be compressed
+ - `compression_rate`: A value between 0.1 and 1.0 that determines compression level
+ - `force_tokens`: A list of tokens (e.g., punctuation marks) to preserve
+ - `chunk_end_tokens`: Tokens indicating where text chunks can be split (default: periods and newlines)
+
+ #### Process:
+ - Uses PromptCompressor to compress the input text
+ - Applies the specified compression rate
+ - Preserves the specified force_tokens
+ - Uses chunk_end_tokens for appropriate text splitting
+ - Avoids dropping consecutive important parts of the text
+
+ #### Output:
+ - Returns the compressed version of the input text
+ - Prints the runtime of the compression process
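+
+ Putting the inputs, process, and output together, a rough sketch of such a compress function (reusing the `compressor` object from the previous sketch) might look like the following. This is an illustration of the described behavior, not the service's actual source; the chunk-splitting and consecutive-drop options are omitted for brevity.
+
+ ```python
+ import time
+
+ def compress(original_prompt: str, compression_rate: float = 0.5, force_tokens=None):
+     """Sketch of the compression step described above (illustrative only)."""
+     force_tokens = force_tokens or ["\n", ".", "!", "?", ","]
+     start = time.time()
+     result = compressor.compress_prompt(
+         original_prompt,
+         rate=compression_rate,      # target compression rate
+         force_tokens=force_tokens,  # tokens that must survive compression
+     )
+     print(f"Compression runtime: {time.time() - start:.2f}s")
+     return result["compressed_prompt"]
+ ```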
+
+ ### API Workflow
+ 1. Connection Establishment
+ 2. Session Initialization
+ 3. Data Submission
+ 4. Processing (queued)
+ 5. Result Delivery
+
+ ### Additional Features
+ - Token Counting: Uses the tiktoken library to count tokens with GPT-4's encoding, for reference (see the sketch after this list)
+ - Customizable Compression: Adjustable compression rate and preservable tokens
+ - Queue System: Manages multiple requests for fair processing order
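+
+ A small sketch of the token-counting feature, using tiktoken's GPT-4 encoding (the function name is illustrative):
+
+ ```python
+ import tiktoken
+
+ def gpt4_token_count(text: str) -> int:
+     """Count text as GPT-4 tokens, for reference only."""
+     encoding = tiktoken.encoding_for_model("gpt-4")
+     return len(encoding.encode(text))
+ ```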
+
+ ### User Interface (Optional)
+ A Gradio interface is available at [Compress.ai-now.space](https://Compress.ai-now.space) for direct user interaction, with:
+ - Input boxes for original and compressed text
+ - Sliders and dropdowns for compression parameters
+ - Compression trigger button
+
+ ### Scalability
+ Designed to handle multiple requests through a queue system (max size: 100)
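+
+ A rough sketch of how a Gradio app with such a bounded queue is typically wired up; the component layout below is illustrative (based on the UI elements listed above) and reuses the `compress` sketch from earlier, so it is not the service's actual code:
+
+ ```python
+ import gradio as gr
+
+ with gr.Blocks() as demo:
+     original = gr.Textbox(label="Original prompt", lines=10)
+     rate = gr.Slider(0.1, 1.0, value=0.5, label="Compression rate")
+     tokens = gr.Dropdown(["\n", ".", "!", "?", ","], multiselect=True,
+                          value=["\n", ".", "!", "?", ","], label="Force tokens")
+     compressed = gr.Textbox(label="Compressed prompt", lines=10)
+     gr.Button("Compress").click(compress, [original, rate, tokens], compressed)
+
+ demo.queue(max_size=100)  # bounded queue, as described above
+ demo.launch()
+ ```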
+
+ ## Example Usage
+
+ ```javascript
+ const socket = new WebSocket('ws://compress.ai-now.space/queue/join');
+
+ socket.addEventListener('open', (event) => {
+   console.log('WebSocket connection established');
+ });
+
+ socket.addEventListener('message', (event) => {
+   const data = JSON.parse(event.data);
+
+   if (data.msg === "send_hash") {
+     // Step 1: register this client with a session hash
+     socket.send(JSON.stringify({
+       session_hash: "unique_session_hash",
+       fn_index: 0
+     }));
+   } else if (data.msg === "send_data") {
+     // Step 2: submit the text, compression rate, and force tokens
+     socket.send(JSON.stringify({
+       data: [["Text to compress"], 0.7, ["\\n", ".", "!", "?", ","]],
+       session_hash: "unique_session_hash",
+       fn_index: 0
+     }));
+   } else if (data.msg === "process_completed") {
+     // Step 3: read the compressed text from the result
+     console.log("Compressed text:", data.output.data[0]);
+   }
+ });
+
+ socket.addEventListener('error', (error) => {
+   console.error('WebSocket error:', error);
+ });
+
+ socket.addEventListener('close', (event) => {
+   console.log('WebSocket connection closed');
+ });
+ ```
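+
+ An equivalent client sketch in Python, assuming the third-party `websocket-client` package (`pip install websocket-client`); as above, the session hash is an arbitrary client-chosen string:
+
+ ```python
+ import json
+ import uuid
+
+ from websocket import create_connection  # provided by websocket-client
+
+ session_hash = uuid.uuid4().hex  # arbitrary client-chosen identifier
+ ws = create_connection("ws://compress.ai-now.space/queue/join")
+
+ try:
+     while True:
+         msg = json.loads(ws.recv())
+         if msg["msg"] == "send_hash":
+             ws.send(json.dumps({"session_hash": session_hash, "fn_index": 0}))
+         elif msg["msg"] == "send_data":
+             ws.send(json.dumps({
+                 "data": [["Text to compress"], 0.7, ["\\n", ".", "!", "?", ","]],
+                 "session_hash": session_hash,
+                 "fn_index": 0,
+             }))
+         elif msg["msg"] == "process_completed":
+             print("Compressed text:", msg["output"]["data"][0])
+             break
+ finally:
+     ws.close()
+ ```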
+
+ ## Notes
+
+ - The API uses a queue system to manage requests. Clients may need to wait in the queue before their request is processed.
+ - The compression rate affects the level of text reduction. A lower rate results in more aggressive compression.
+ - Force tokens are preserved in the compressed output, ensuring important elements like newlines and punctuation are retained.
+ - This API is suitable for various applications, including chatbots, content summarization tools, and data preprocessing for large language models.
+ - The flexibility in compression parameters and real-time processing make it adaptable to a wide range of text processing needs.