# LLMLingua-2 API Documentation

## Overview

This API provides a text compression service using the LLMLingua-2 model. It allows users to compress text prompts while preserving specified tokens and maintaining readability.

## Base URL

`ws://compress.ai-now.space/queue/join`

## WebSocket Connection

The API uses WebSocket for real-time communication. Connect to the WebSocket endpoint to interact with the API.

## Authentication

No authentication is required for this API.

## API Methods

### Compress Text

Compresses a given text prompt using the LLMLingua-2 model.

#### Message Flow

1. Connection Established
   - The server sends a message with `msg: "send_hash"`.
   - The client responds with a session hash.

2. Send Data Request
   - The server sends a message with `msg: "send_data"`.
   - The client sends the compression parameters.

3. Process Completed
   - The server sends a message with `msg: "process_completed"` and the compressed text.

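The three steps above can be sketched as one pure function that maps each server message to the client's reply. This is a minimal sketch, not a reference client: `replyFor` and the `session` object are hypothetical names, and the payload shapes are the ones this document gives under Request Format and Example Usage.

```javascript
// Hypothetical helper: decide what the client sends back for one server message.
// Returns an object to JSON-encode and send, or null when no reply is needed.
function replyFor(serverMessage, session) {
  switch (serverMessage.msg) {
    case "send_hash":
      // Step 1: identify the session.
      return { session_hash: session.hash, fn_index: 0 };
    case "send_data":
      // Step 2: submit the compression parameters.
      return {
        data: [[session.text], session.rate, session.forceTokens],
        session_hash: session.hash,
        fn_index: 0,
      };
    default:
      // Step 3 ("process_completed") and any other message need no reply.
      return null;
  }
}

const session = {
  hash: "unique_session_hash",
  text: "Text to compress",
  rate: 0.7,
  forceTokens: ["\\n", ".", "!", "?", ","],
};

console.log(JSON.stringify(replyFor({ msg: "send_hash" }, session)));
// {"session_hash":"unique_session_hash","fn_index":0}
```

In a real client, a non-null return value would be passed to `socket.send(JSON.stringify(...))` inside the `message` listener.
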
#### Request Format

```json
{
  "data": [
    ["<original_text>"],
    <compression_rate>,
    ["<force_token1>", "<force_token2>", ...]
  ],
  "session_hash": "<session_hash>",
  "fn_index": 0
}
```

- `original_text`: The text to be compressed, wrapped in a single-element array.
- `compression_rate`: A float between 0.1 and 1.0 representing the desired compression rate, sent as a bare number (for example `0.7`), not as a key/value pair.
- `force_tokens`: An array of tokens to be preserved during compression.
- `session_hash`: A unique identifier for the session.
- `fn_index`: The index of the Gradio function to call; `0` in this document's examples.

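As an illustration of the format above, the following sketch builds a request and rejects a compression rate outside the documented 0.1 to 1.0 range before anything is sent. `buildCompressRequest` and `newSessionHash` are hypothetical helpers, and the assumption that any per-connection unique string is acceptable as a session hash is not confirmed by this document.

```javascript
// Hypothetical helper: build the "send_data" payload described above.
function buildCompressRequest(text, compressionRate, forceTokens, sessionHash) {
  if (typeof text !== "string" || text.length === 0) {
    throw new TypeError("text must be a non-empty string");
  }
  // The documented range for compression_rate is 0.1 to 1.0.
  if (!Number.isFinite(compressionRate) || compressionRate < 0.1 || compressionRate > 1.0) {
    throw new RangeError("compressionRate must be between 0.1 and 1.0");
  }
  if (!Array.isArray(forceTokens)) {
    throw new TypeError("forceTokens must be an array of strings");
  }
  return {
    data: [[text], compressionRate, forceTokens],
    session_hash: sessionHash,
    fn_index: 0,
  };
}

// Assumption: any string that is unique per connection works as a session hash.
function newSessionHash() {
  return Math.random().toString(36).slice(2);
}

const request = buildCompressRequest("Text to compress", 0.7, [".", ","], newSessionHash());
console.log(JSON.stringify(request.data));
// [["Text to compress"],0.7,[".",","]]
```

Validating on the client avoids spending a queue slot on a request the server would be expected to reject.
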
#### Response Format

```json
{
  "msg": "process_completed",
  "output": {
    "data": ["<compressed_text>"]
  }
}
```

- `compressed_text`: The resulting compressed text.

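A client usually wants to read the compressed text defensively instead of indexing straight into the message. The sketch below parses a raw frame and returns the text only when the frame matches the documented shape; `extractCompressedText` is a hypothetical helper, not part of the service.

```javascript
// Hypothetical helper: return the compressed text from a raw WebSocket frame,
// or null if the frame is not a "process_completed" message in the documented shape.
function extractCompressedText(rawFrame) {
  let message;
  try {
    message = JSON.parse(rawFrame);
  } catch {
    return null; // The frame is not valid JSON.
  }
  if (!message || message.msg !== "process_completed") return null;
  const data = message.output && message.output.data;
  if (!Array.isArray(data) || typeof data[0] !== "string") return null;
  return data[0];
}

const frame = '{"msg":"process_completed","output":{"data":["short text"]}}';
console.log(extractCompressedText(frame)); // short text
```
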
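## Error Handling

The API may emit error events through the WebSocket connection. Clients should register an `error` listener, and typically a `close` listener as in the example under Example Usage, so that a failed request is not mistaken for one that is still waiting in the queue.
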
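One way to act on this is to report a failure at most once when the connection errors or closes before a result arrives. The sketch below is an assumption about reasonable client behavior, not something the service prescribes; `watchForFailure` is a hypothetical helper, and the demonstration uses an `EventTarget` (built into browsers and Node.js 15+) as a stand-in for a real `WebSocket`.

```javascript
// Hypothetical helper: call onFailure at most once if the connection errors
// or closes before a "process_completed" message has arrived.
function watchForFailure(socket, onFailure) {
  let settled = false;
  const fail = (reason) => {
    if (settled) return;
    settled = true;
    onFailure(reason);
  };
  socket.addEventListener("error", () => fail("error"));
  socket.addEventListener("close", () => fail("closed before result"));
  socket.addEventListener("message", (event) => {
    // Once the result is in, a later close is normal and not a failure.
    try {
      if (JSON.parse(event.data).msg === "process_completed") settled = true;
    } catch {
      // Ignore frames that are not JSON.
    }
  });
}

// Demonstration with a stand-in socket.
const fakeSocket = new EventTarget();
const failures = [];
watchForFailure(fakeSocket, (reason) => failures.push(reason));
fakeSocket.dispatchEvent(new Event("error"));
fakeSocket.dispatchEvent(new Event("close"));
console.log(failures.length); // 1 (only the first failure is reported)
```

With a real connection, the same function would be called as `watchForFailure(socket, (reason) => console.error(reason))` right after the socket is created.
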
## Detailed Functionality

### Purpose
The primary function of this API is to compress text using the LLMLingua-2 model. It's designed to reduce the length of a given text while maintaining its core meaning and readability.

### Underlying Technology
- Uses the LLMLingua-2 model `microsoft/llmlingua-2-xlm-roberta-large-meetingbank`
- Employs the `PromptCompressor` class from the `llmlingua` library
- Built using Gradio, a Python library for creating web-based interfaces for machine learning models

### Main Functionality (compress function)

#### Input:
- `original_prompt`: The text to be compressed
- `compression_rate`: A value between 0.1 and 1.0 that determines compression level
- `force_tokens`: A list of tokens (e.g., punctuation marks) to preserve
- `chunk_end_tokens`: Tokens indicating where text chunks can be split (default: periods and newlines)

#### Process:
- Uses `PromptCompressor` to compress the input text
- Applies the specified compression rate
- Preserves the specified `force_tokens`
- Uses `chunk_end_tokens` for appropriate text splitting
- Avoids dropping consecutive important parts of the text

#### Output:
- Returns the compressed version of the input text
- Prints the runtime of the compression process

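To make `compression_rate` concrete: as we understand LLMLingua-2, the rate is the target ratio of compressed to original length, so a rate of 0.7 aims to keep roughly 70% of the tokens. The sketch below only does that arithmetic. `estimateKeptTokens` is a hypothetical helper, and actual output lengths will differ because force tokens are always kept and the model uses its own tokenization.

```javascript
// Hypothetical helper: rough number of tokens left after compression,
// assuming the rate is the fraction of tokens to keep.
function estimateKeptTokens(originalTokens, compressionRate) {
  if (compressionRate < 0.1 || compressionRate > 1.0) {
    throw new RangeError("compressionRate must be between 0.1 and 1.0");
  }
  return Math.round(originalTokens * compressionRate);
}

console.log(estimateKeptTokens(1000, 0.7)); // 700
console.log(estimateKeptTokens(1000, 0.1)); // 100
```
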
### API Workflow
1. Connection Establishment
2. Session Initialization
3. Data Submission
4. Processing (queued)
5. Result Delivery

### Additional Features
- Token Counting: Uses the `tiktoken` library to report GPT-4 token counts for reference
- Customizable Compression: Adjustable compression rate and preservable tokens
- Queue System: Manages multiple requests for fair processing order

### User Interface (Optional)
A Gradio interface is available at [Compress.ai-now.space](https://Compress.ai-now.space) for direct user interaction. It includes:
- Input boxes for original and compressed text
- Sliders and dropdowns for compression parameters
- Compression trigger button

### Scalability
Designed to handle multiple requests through a queue system (max size: 100).

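Because the service is built with Gradio, a client waiting in the queue may receive status messages beyond the three documented in the Message Flow section. Gradio's WebSocket queue protocol (3.x releases) uses `estimation` messages carrying `rank` and `queue_size`, and a `queue_full` message when the queue is at capacity. Whether this deployment emits them is an assumption this document does not confirm, and `describeQueueMessage` is a hypothetical helper.

```javascript
// Hypothetical helper: turn a queue-related server message into a status string.
// The "estimation" and "queue_full" names come from Gradio's queue protocol and
// are assumptions here; they do not appear elsewhere in this document.
function describeQueueMessage(message) {
  switch (message.msg) {
    case "estimation":
      // rank is assumed to be zero-based: 0 means the request is next in line.
      return `waiting: position ${message.rank + 1} of ${message.queue_size}`;
    case "queue_full":
      return "queue is full, retry later";
    case "process_completed":
      return "done";
    default:
      return `unhandled message: ${message.msg}`;
  }
}

console.log(describeQueueMessage({ msg: "estimation", rank: 2, queue_size: 10 }));
// waiting: position 3 of 10
```

Treating unknown message types as "unhandled" instead of as errors keeps the client working if the server sends messages this document does not list.
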
## Example Usage

```javascript
const socket = new WebSocket('ws://compress.ai-now.space/queue/join');

socket.addEventListener('open', (event) => {
  console.log('WebSocket connection established');
});

socket.addEventListener('message', (event) => {
  const data = JSON.parse(event.data);

  if (data.msg === "send_hash") {
    // Step 1: identify the session.
    socket.send(JSON.stringify({
      session_hash: "unique_session_hash",
      fn_index: 0
    }));
  } else if (data.msg === "send_data") {
    // Step 2: send the text, the compression rate, and the tokens to preserve.
    socket.send(JSON.stringify({
      data: [["Text to compress"], 0.7, ["\\n", ".", "!", "?", ","]],
      session_hash: "unique_session_hash",
      fn_index: 0
    }));
  } else if (data.msg === "process_completed") {
    // Step 3: read the compressed text from the response.
    console.log("Compressed text:", data.output.data[0]);
  }
});

socket.addEventListener('error', (error) => {
  console.error('WebSocket error:', error);
});

socket.addEventListener('close', (event) => {
  console.log('WebSocket connection closed');
});
```

## Notes

- The API uses a queue system to manage requests. Clients may need to wait in the queue before their request is processed.
- The compression rate affects the level of text reduction. A lower rate results in more aggressive compression.
- Force tokens are preserved in the compressed output, ensuring important elements like newlines and punctuation are retained.
- This API is suitable for various applications, including chatbots, content summarization tools, and data preprocessing for large language models.
- The flexibility in compression parameters and real-time processing make it adaptable to a wide range of text processing needs.