Cao - Textual Frequency Adaptive Encoding 2.0.1

MAMatt-38•Created May 31, 2026


Instructions

Cao - Textual Frequency Adaptive Encoding v2.0.1 (Yes v2)

TFAE is an algorithm optimized for text compression, based on a frequency-based representation of words (tokenization). Like v1, it compresses text using a token dictionary, but v2 introduces a simplified numerical representation and high-base encoding to significantly improve performance.

This translates to:
- ~20% better compression than v1
- ~9× faster decompression
- only ~30% slower compression

Principle (TLDR)
- The text is split into words (tokens)
- Each word is replaced by a numerical ID based on its frequency (frequent words → small numbers / rare words → large numbers)
- The IDs are concatenated into a numerical sequence
- The sequence is split into blocks (e.g., 16 digits)
- Each block is converted to base 215
- A separator (£) enables reverse decoding
- Text reconstruction via the token dictionary

Benchmark
Note: Time and Compression Ratio values follow the rule “lower is better”.

Encoding - Long Text
Version | Time | Compression Ratio
V1.0 | 128.975 s | 0.59900053
V2.0 | 176.267 s | 0.4869199

Encoding - Short Text
Version | Time | Compression Ratio
V1.0 | 0 s | 0.782643043
V2.0 | 0.511 s | 0.65978256

Decoding - Long Text
Version | Time
V1.0 | 60.645 s
V2.0 | 6.585 s

Decoding - Short Text
Version | Time
V1.0 | 0 s
V2.0 | 0 s
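The steps listed under Principle can be sketched in Python roughly as follows. This is only an illustration of the idea, not the project's actual code, and it rests on several assumptions the description does not spell out: tokens are obtained by splitting on single spaces, each ID is written with a one-digit length prefix so that the concatenated digits can be split apart again, the digit stream is padded with zeros to a multiple of 16, and the 215 base symbols are simply the code points U+0100 onward. The names `encode`, `decode`, `to_base`, and `from_base` are hypothetical.

```python
from collections import Counter

BASE = 215          # base named in the description
BLOCK_DIGITS = 16   # block size given as an example in the description
SEP = "£"           # separator named in the description
# Assumption: the real symbol table is unknown; use 215 code points from U+0100.
ALPHABET = [chr(0x100 + i) for i in range(BASE)]
DIGIT_OF = {ch: i for i, ch in enumerate(ALPHABET)}


def to_base(n):
    """Write a non-negative integer using the 215-symbol alphabet."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, r = divmod(n, BASE)
        out.append(ALPHABET[r])
    return "".join(reversed(out))


def from_base(s):
    """Inverse of to_base."""
    n = 0
    for ch in s:
        n = n * BASE + DIGIT_OF[ch]
    return n


def encode(text):
    """Return (payload, vocab). The vocab list plays the role of the token dictionary."""
    tokens = text.split(" ")  # assumption: split on single spaces (lossless with join)
    # Most frequent token first, so it receives the smallest ID (IDs start at 1).
    vocab = [tok for tok, _ in Counter(tokens).most_common()]
    ids = {tok: i + 1 for i, tok in enumerate(vocab)}
    # Assumption: prefix each ID with its digit count (works for IDs up to 9 digits).
    digits = "".join(f"{len(str(ids[t]))}{ids[t]}" for t in tokens)
    # Pad with zeros so every block has exactly BLOCK_DIGITS digits.
    digits += "0" * (-len(digits) % BLOCK_DIGITS)
    blocks = [digits[i:i + BLOCK_DIGITS] for i in range(0, len(digits), BLOCK_DIGITS)]
    return SEP.join(to_base(int(b)) for b in blocks), vocab


def decode(payload, vocab):
    """Rebuild the text from the payload and the token dictionary."""
    # zfill restores leading zeros that were lost when a block became an integer.
    digits = "".join(str(from_base(b)).zfill(BLOCK_DIGITS) for b in payload.split(SEP))
    tokens, pos = [], 0
    # A length prefix of 0 can only be padding, so it marks the end of the data.
    while pos < len(digits) and digits[pos] != "0":
        width = int(digits[pos])
        tokens.append(vocab[int(digits[pos + 1:pos + 1 + width]) - 1])
        pos += 1 + width
    return " ".join(tokens)
```

For example, `payload, vocab = encode("the cat and the dog")` followed by `decode(payload, vocab)` should return the original string. Note that the dictionary (`vocab`) has to be stored or transmitted alongside the payload, and this sketch makes no attempt to reproduce the compression ratios or timings reported in the benchmark.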

Project Details

Project ID: 1326941629

Created: May 31, 2026

Last Modified: June 4, 2026

Shared: June 3, 2026

Comments: Allowed