Show HN: Chonkie – A Fast, Lightweight Text Chunking Library for RAG
12 by bhavnicksm | 1 comment on Hacker News.
I built Chonkie because I was tired of rewriting chunking code for RAG applications. Existing libraries were either too bloated (80MB+) or too basic, with no middle ground.

Core features:
- 21MB default install vs 80-171MB alternatives
- 33x faster token chunking than popular alternatives
- Supports multiple chunking strategies: token, word, sentence, and semantic
- Works with all major tokenizers (transformers, tokenizers, tiktoken)
- Zero external dependencies for basic functionality

Technical optimizations:
- Uses tiktoken with multi-threading for faster tokenization
- Implements aggressive caching and precomputation
- Running mean pooling for efficient semantic chunking
- Modular dependency system (install only what you need)

Benchmarks and code: https://ift.tt/sn1FdWw

Looking for feedback on the architecture and performance optimizations. What other chunking strategies would be useful for RAG applications?
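
For readers unfamiliar with token chunking, here is a minimal sketch of the basic idea: a fixed-size sliding window with overlap over a list of tokens. This is not Chonkie's implementation or API. The function name `chunk_tokens`, its parameters, and the whitespace split standing in for a real tokenizer are all assumptions made for illustration.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int, overlap: int = 0) -> List[List[str]]:
    """Split a token list into windows of at most chunk_size tokens.

    Consecutive windows share `overlap` tokens. Hypothetical helper,
    not part of any library.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the window has reached the end of the input,
        # so we do not emit a trailing chunk made only of overlap.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Whitespace split is a stand-in (assumption) for a real tokenizer.
tokens = "a b c d e f g".split()
chunks = chunk_tokens(tokens, chunk_size=3, overlap=1)
```

In a real pipeline the tokens would come from the tokenizer of the embedding or generation model, so that `chunk_size` matches the model's own token count.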