# LLMLingua-2 API Documentation

## Overview

This API provides a text compression service using the LLMLingua-2 model. It allows users to compress text prompts while preserving specified tokens and maintaining readability.

## Base URL

`ws://compress.ai-now.space/queue/join`

## WebSocket Connection

The API uses WebSocket for real-time communication. Connect to the WebSocket endpoint to interact with the API.

## Authentication

No authentication is required for this API.

## API Methods

### Compress Text

Compresses a given text prompt using the LLMLingua-2 model.

#### Message Flow

1. **Connection Established**
   - The server sends a message with `msg: "send_hash"`.
   - The client responds with a session hash and `fn_index` (see the example after this list).

2. **Send Data Request**
   - The server sends a message with `msg: "send_data"`.
   - The client sends the compression parameters (see Request Format below).

3. **Process Completed**
   - The server sends a message with `msg: "process_completed"` and the compressed text.
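
For example, the client's reply to `send_hash` carries just the session hash and the function index (this mirrors the full client example under Example Usage):

```json
{
  "session_hash": "<session_hash>",
  "fn_index": 0
}
```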

#### Request Format

```json
{
  "data": [
    ["<original_text>"],
    <compression_rate>,
    ["<force_token1>", "<force_token2>", ...]
  ],
  "session_hash": "<session_hash>",
  "fn_index": 0
}
```

The `data` array holds three positional elements: the original text (wrapped in a single-element array), the compression rate, and the list of force tokens.

- `original_text`: The text to be compressed.
- `compression_rate`: A float between 0.1 and 1.0 representing the desired compression rate.
- `force_tokens`: An array of tokens to be preserved during compression.
- `session_hash`: A unique identifier for the session.
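
A filled-in request, using the same values as the Example Usage section (the session hash is illustrative):

```json
{
  "data": [
    ["Text to compress"],
    0.7,
    ["\\n", ".", "!", "?", ","]
  ],
  "session_hash": "unique_session_hash",
  "fn_index": 0
}
```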

#### Response Format

```json
{
  "msg": "process_completed",
  "output": {
    "data": ["<compressed_text>"]
  }
}
```

- `compressed_text`: The resulting compressed text.

## Error Handling

The API may emit error events through the WebSocket connection. Clients should listen for and handle these error events appropriately.

## Detailed Functionality

### Purpose
The primary function of this API is to compress text using the LLMLingua-2 model. It's designed to reduce the length of a given text while maintaining its core meaning and readability.

### Underlying Technology
- Uses the LLMLingua-2 model: "microsoft/llmlingua-2-xlm-roberta-large-meetingbank"
- Employs the PromptCompressor class from the llmlingua library (see the sketch after this list)
- Built using Gradio, a Python library for creating web-based interfaces for machine learning models
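
As a minimal sketch (assuming the standard llmlingua API for LLMLingua-2; the service's exact setup is not published here), the model above is typically loaded like this:

```python
# Sketch: load the LLMLingua-2 compressor named above.
from llmlingua import PromptCompressor

llm_lingua = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,  # selects the LLMLingua-2 token-classification pipeline
)
```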

### Main Functionality (compress function)

#### Input:
- `original_prompt`: The text to be compressed
- `compression_rate`: A value between 0.1 and 1.0 that determines the compression level
- `force_tokens`: A list of tokens (e.g., punctuation marks) to preserve
- `chunk_end_tokens`: Tokens indicating where text chunks can be split (default: periods and newlines)

#### Process:
- Uses PromptCompressor to compress the input text (see the sketch at the end of this section)
- Applies the specified compression rate
- Preserves the specified force_tokens
- Uses chunk_end_tokens for appropriate text splitting
- Avoids dropping consecutive important parts of the text

#### Output:
- Returns the compressed version of the input text
- Prints the runtime of the compression process
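
Putting the inputs, process, and output together, a minimal sketch of such a compress function might look like the following (the `rate` and `force_tokens` arguments follow the public llmlingua API; the `chunk_end_tokens` and `drop_consecutive` keywords are assumptions based on the behaviour described above):

```python
import time

from llmlingua import PromptCompressor

# Assumed setup, using the model named under "Underlying Technology".
llm_lingua = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,
)


def compress(original_prompt, compression_rate, force_tokens, chunk_end_tokens=(".", "\n")):
    """Compress a prompt and print the runtime, as described above."""
    start = time.time()
    result = llm_lingua.compress_prompt(
        original_prompt,
        rate=compression_rate,            # fraction of tokens to keep
        force_tokens=list(force_tokens),  # tokens that must survive compression
        # The two keywords below mirror the description above; their exact
        # names are assumptions, not confirmed API.
        chunk_end_tokens=list(chunk_end_tokens),
        drop_consecutive=True,
    )
    print(f"Compression runtime: {time.time() - start:.2f}s")
    return result["compressed_prompt"]
```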

### API Workflow
1. Connection Establishment
2. Session Initialization
3. Data Submission
4. Processing (queued)
5. Result Delivery

### Additional Features
- Token Counting: Uses the tiktoken library to report GPT-4 token counts for reference (see the sketch after this list)
- Customizable Compression: Adjustable compression rate and preservable tokens
- Queue System: Manages multiple requests for fair processing order
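
A sketch of the token-counting feature (standard tiktoken usage; whether the service counts tokens exactly this way is an assumption):

```python
# Sketch: report how many GPT-4 tokens a prompt uses, for reference.
import tiktoken


def gpt4_token_count(text: str) -> int:
    encoding = tiktoken.encoding_for_model("gpt-4")
    return len(encoding.encode(text))


print(gpt4_token_count("Text to compress"))
```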

### User Interface (Optional)
[Compress.ai-now.space](https://Compress.ai-now.space)

Includes a Gradio interface (sketched after this list) for direct user interaction with:
- Input boxes for original and compressed text
- Sliders and dropdowns for compression parameters
- A button to trigger compression
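
A hedged sketch of what such a Gradio interface might look like (component names, defaults, and layout are illustrative, not taken from the actual app):

```python
import gradio as gr


def compress(original_prompt, rate, force_tokens):
    # Placeholder: the real app calls PromptCompressor as sketched earlier.
    return original_prompt


with gr.Blocks() as demo:
    original = gr.Textbox(label="Original Prompt", lines=10)
    rate = gr.Slider(0.1, 1.0, value=0.5, label="Compression rate")
    force_tokens = gr.Dropdown(
        choices=["\\n", ".", "!", "?", ","],
        value=["\\n", ".", "!", "?", ","],
        multiselect=True,
        label="Force tokens",
    )
    compressed = gr.Textbox(label="Compressed Prompt", lines=10)
    gr.Button("Compress").click(compress, [original, rate, force_tokens], compressed)

# Queue requests for fair ordering; max size matches the Scalability note below.
demo.queue(max_size=100).launch()
```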

### Scalability
Designed to handle multiple requests through a queue system (max size: 100)

## Example Usage

```javascript
const socket = new WebSocket('ws://compress.ai-now.space/queue/join');

socket.addEventListener('open', (event) => {
  console.log('WebSocket connection established');
});

socket.addEventListener('message', (event) => {
  const data = JSON.parse(event.data);

  if (data.msg === "send_hash") {
    socket.send(JSON.stringify({
      session_hash: "unique_session_hash",
      fn_index: 0
    }));
  } else if (data.msg === "send_data") {
    socket.send(JSON.stringify({
      data: [["Text to compress"], 0.7, ["\\n", ".", "!", "?", ","]],
      session_hash: "unique_session_hash",
      fn_index: 0
    }));
  } else if (data.msg === "process_completed") {
    console.log("Compressed text:", data.output.data[0]);
  }
});

socket.addEventListener('error', (error) => {
  console.error('WebSocket error:', error);
});

socket.addEventListener('close', (event) => {
  console.log('WebSocket connection closed');
});
```
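
The same message flow in Python, as a minimal sketch using the third-party `websockets` package (the package choice and the `uuid`-based session hash are assumptions of this sketch, not part of the API):

```python
import asyncio
import json
import uuid

import websockets  # pip install websockets (assumed client library)

URL = "ws://compress.ai-now.space/queue/join"


async def compress_text(text, rate=0.7, force_tokens=("\\n", ".", "!", "?", ",")):
    session_hash = uuid.uuid4().hex[:12]  # any unique identifier will do
    async with websockets.connect(URL) as ws:
        while True:
            msg = json.loads(await ws.recv())
            if msg.get("msg") == "send_hash":
                await ws.send(json.dumps({"session_hash": session_hash, "fn_index": 0}))
            elif msg.get("msg") == "send_data":
                await ws.send(json.dumps({
                    "data": [[text], rate, list(force_tokens)],
                    "session_hash": session_hash,
                    "fn_index": 0,
                }))
            elif msg.get("msg") == "process_completed":
                return msg["output"]["data"][0]
            # Other queue messages (e.g. position updates) are ignored here.


if __name__ == "__main__":
    print(asyncio.run(compress_text("Text to compress")))
```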

## Notes

- The API uses a queue system to manage requests. Clients may need to wait in the queue before their request is processed.
- The compression rate affects the level of text reduction. A lower rate results in more aggressive compression.
- Force tokens are preserved in the compressed output, ensuring important elements like newlines and punctuation are retained.
- This API is suitable for various applications, including chatbots, content summarization tools, and data preprocessing for large language models.
- The flexibility in compression parameters and real-time processing make it adaptable to a wide range of text processing needs.