# LLMLingua-2 API Documentation

## Overview

This API provides a text compression service using the LLMLingua-2 model. It allows users to compress text prompts while preserving specified tokens and maintaining readability.

## Base URL

`ws://compress.ai-now.space/queue/join`

## WebSocket Connection

The API uses WebSocket for real-time communication. Connect to the WebSocket endpoint to interact with the API.

## Authentication

No authentication is required for this API.

## API Methods

### Compress Text

Compresses a given text prompt using the LLMLingua-2 model.

#### Message Flow

1. **Connection Established**
   - The server sends a message with `msg: "send_hash"`.
   - Client should respond with a session hash.

2. **Send Data Request**
   - The server sends a message with `msg: "send_data"`.
   - Client should send the compression parameters.

3. **Process Completed**
   - The server sends a message with `msg: "process_completed"` and the compressed text.

#### Request Format

```json
{
  "data": [
    ["<original_text>"],
    <compression_rate>,
    ["<force_token1>", "<force_token2>", ...]
  ],
  "session_hash": "<session_hash>",
  "fn_index": 0
}
```

- `original_text`: The text to be compressed.
- `compression_rate`: A float between 0.1 and 1.0 representing the desired compression rate. It is passed as the second element of `data`, not as a key-value pair.
- `force_tokens`: An array of tokens to be preserved during compression.
- `session_hash`: A unique identifier for the session.
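
The request layout above can be assembled programmatically. The following is a minimal sketch, not part of the service itself: `buildCompressRequest` is a hypothetical helper name, and its range check only mirrors the documented 0.1 to 1.0 bounds rather than any validation the server is known to perform.

```javascript
// Hypothetical helper (not provided by the API): builds the payload that the
// client sends in reply to the server's "send_data" message.
function buildCompressRequest(text, compressionRate, forceTokens, sessionHash) {
  if (typeof text !== "string" || text.length === 0) {
    throw new TypeError("text must be a non-empty string");
  }
  // Mirrors the documented bounds; NaN also fails this comparison.
  if (
    typeof compressionRate !== "number" ||
    !(compressionRate >= 0.1 && compressionRate <= 1.0)
  ) {
    throw new RangeError("compressionRate must be a number between 0.1 and 1.0");
  }
  if (!Array.isArray(forceTokens)) {
    throw new TypeError("forceTokens must be an array of strings");
  }
  return {
    // Positional arguments: [original_text], compression_rate, force_tokens.
    data: [[text], compressionRate, forceTokens],
    session_hash: sessionHash,
    fn_index: 0,
  };
}
```

The result can be passed to `JSON.stringify` and sent over the socket once the server asks for data.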

#### Response Format

```json
{
  "msg": "process_completed",
  "output": {
    "data": ["<compressed_text>"]
  }
}
```

- `compressed_text`: The resulting compressed text.
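
Reading the compressed text out of a raw WebSocket frame could look like the sketch below. `extractCompressedText` is a hypothetical name; the function assumes only the response shape shown above and treats every other message as "not a result yet".

```javascript
// Hypothetical helper: returns the compressed text from a raw WebSocket frame,
// or null when the frame is some other message (for example "send_hash").
function extractCompressedText(rawMessage) {
  const payload = JSON.parse(rawMessage); // throws SyntaxError on invalid JSON
  if (payload === null || payload.msg !== "process_completed") {
    return null;
  }
  const data = payload.output && payload.output.data;
  if (!Array.isArray(data) || typeof data[0] !== "string") {
    throw new Error("process_completed message has no compressed text");
  }
  return data[0];
}
```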

## Error Handling

The API may emit error events through the WebSocket connection. Clients should listen for and handle these error events appropriately.
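
One way to handle these events is to wrap the exchange in a Promise that rejects on an error event, an early close, or a timeout. This is a sketch under assumptions: `waitForCompletion` is a hypothetical helper, the 30-second default is arbitrary, and `socket` can be any `EventTarget` that emits `message`, `error`, and `close` events, such as a browser `WebSocket`.

```javascript
// Hypothetical helper: resolves with the parsed "process_completed" message,
// or rejects on an error event, an early close, invalid JSON, or a timeout.
function waitForCompletion(socket, timeoutMs = 30000) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(
      () => finish(reject, new Error("Timed out waiting for process_completed")),
      timeoutMs
    );

    // Settle exactly once and detach all listeners so nothing leaks.
    function finish(settle, value) {
      clearTimeout(timer);
      socket.removeEventListener("message", onMessage);
      socket.removeEventListener("error", onError);
      socket.removeEventListener("close", onClose);
      settle(value);
    }

    function onMessage(event) {
      let payload;
      try {
        payload = JSON.parse(event.data);
      } catch (err) {
        finish(reject, new Error("Server sent invalid JSON"));
        return;
      }
      if (payload && payload.msg === "process_completed") {
        finish(resolve, payload);
      }
      // Other messages (send_hash, send_data, ...) are handled elsewhere.
    }

    function onError() {
      finish(reject, new Error("WebSocket error"));
    }

    function onClose() {
      finish(reject, new Error("Connection closed before completion"));
    }

    socket.addEventListener("message", onMessage);
    socket.addEventListener("error", onError);
    socket.addEventListener("close", onClose);
  });
}
```

A caller would typically `await waitForCompletion(socket)` inside a `try`/`catch` block and report the failure reason to the user.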

## Detailed Functionality

### Purpose
The primary function of this API is to compress text using the LLMLingua-2 model. It's designed to reduce the length of a given text while maintaining its core meaning and readability.

### Underlying Technology
- Uses the LLMLingua-2 model: "microsoft/llmlingua-2-xlm-roberta-large-meetingbank"
- Employs the PromptCompressor class from the llmlingua library
- Built using Gradio, a Python library for creating web-based interfaces for machine learning models

### Main Functionality (compress function)

#### Input:
- `original_prompt`: The text to be compressed
- `compression_rate`: A value between 0.1 and 1.0 that determines compression level
- `force_tokens`: A list of tokens (e.g., punctuation marks) to preserve
- `chunk_end_tokens`: Tokens indicating where text chunks can be split (default: periods and newlines)

#### Process:
- Uses PromptCompressor to compress the input text
- Applies the specified compression rate
- Preserves the specified force_tokens
- Uses chunk_end_tokens for appropriate text splitting
- Avoids dropping consecutive important parts of the text

#### Output:
- Returns the compressed version of the input text
- Prints the runtime of the compression process

### API Workflow
1. Connection Establishment
2. Session Initialization
3. Data Submission
4. Processing (queued)
5. Result Delivery

### Additional Features
- Token Counting: Uses tiktoken library for GPT-4 token usage reference
- Customizable Compression: Adjustable compression rate and preservable tokens
- Queue System: Manages multiple requests for fair processing order

### User Interface (Optional)
[Compress.ai-now.space](https://Compress.ai-now.space) includes a Gradio interface for direct user interaction with:
- Input boxes for original and compressed text
- Sliders and dropdowns for compression parameters
- Compression trigger button

### Scalability
Designed to handle multiple requests through a queue system (max size: 100)
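
Because requests are queued, a client may want to surface queue status while it waits. The sketch below is speculative: the message names `estimation`, `queue_full`, and `process_starts`, and the fields `rank` and `queue_size`, are assumptions borrowed from Gradio 3.x's queue protocol. This document does not confirm that the service emits them, and `describeQueueMessage` is a hypothetical helper.

```javascript
// ASSUMPTION: "estimation", "queue_full", "process_starts", `rank`, and
// `queue_size` follow Gradio 3.x's queue protocol; verify against the live
// service before relying on them.
function describeQueueMessage(payload) {
  switch (payload.msg) {
    case "estimation":
      // `rank` is assumed to be the zero-based position in the queue.
      return `Waiting: position ${payload.rank + 1} of ${payload.queue_size}`;
    case "queue_full":
      return "Queue is full; retry later";
    case "process_starts":
      return "Processing started";
    default:
      // Not a queue status message.
      return null;
  }
}
```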

## Example Usage

```javascript
const socket = new WebSocket('ws://compress.ai-now.space/queue/join');

socket.addEventListener('open', (event) => {
  console.log('WebSocket connection established');
});

socket.addEventListener('message', (event) => {
  const data = JSON.parse(event.data);

  if (data.msg === "send_hash") {
    // Step 1: identify this session to the server.
    socket.send(JSON.stringify({
      session_hash: "unique_session_hash",
      fn_index: 0
    }));
  } else if (data.msg === "send_data") {
    // Step 2: send the text, the compression rate, and the force tokens.
    socket.send(JSON.stringify({
      data: [["Text to compress"], 0.7, ["\\n", ".", "!", "?", ","]],
      session_hash: "unique_session_hash",
      fn_index: 0
    }));
  } else if (data.msg === "process_completed") {
    // Step 3: read the compressed text from the response.
    console.log("Compressed text:", data.output.data[0]);
  }
});

socket.addEventListener('error', (error) => {
  console.error('WebSocket error:', error);
});

socket.addEventListener('close', (event) => {
  console.log('WebSocket connection closed');
});
```

## Notes

- The API uses a queue system to manage requests. Clients may need to wait in the queue before their request is processed.
- The compression rate affects the level of text reduction. A lower rate results in more aggressive compression.
- Force tokens are preserved in the compressed output, ensuring important elements like newlines and punctuation are retained.
- This API is suitable for various applications, including chatbots, content summarization tools, and data preprocessing for large language models.
- The flexibility in compression parameters and real-time processing make it adaptable to a wide range of text processing needs.
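
To make the effect of the compression rate concrete, the sketch below estimates the output size for a given rate. It assumes the rate acts as the approximate fraction of tokens kept, which is consistent with "a lower rate results in more aggressive compression" but is not stated exactly in this document. `estimateCompressedTokens` is a hypothetical helper, and actual output lengths will vary, in part because force tokens are preserved.

```javascript
// ASSUMPTION: compression_rate is the approximate fraction of tokens kept,
// so the target length is roughly rate times the original length.
function estimateCompressedTokens(originalTokens, compressionRate) {
  return Math.round(originalTokens * compressionRate);
}

// A 1000-token prompt at rate 0.7 targets about 700 tokens;
// at rate 0.3 it targets about 300 tokens.
console.log(estimateCompressedTokens(1000, 0.7));
console.log(estimateCompressedTokens(1000, 0.3));
```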