Decision Workspace
segtok vs charabia vs text-splitter
Side-by-side comparison of Rust crates
segtok (v0.1.5, growing, health score 45)
Sentence segmentation and word tokenization tools
charabia (v0.9.9, growing, health score 62)
A simple library to detect the language, tokenize the text and normalize the tokens
text-splitter (v0.29.3, growing, health score 59)
Split text into semantic chunks, up to a desired chunk size. Supports calculating length by characters and tokens, and is callable from Rust and Python.
Core Metrics
| Metric | segtok | charabia | text-splitter |
|---|---|---|---|
| Health Score | 45 | 62 | 59 |
| Total Downloads | 275.0K | 761.2K | 1.1M |
| 30d Downloads | 183.2K | 83.1K | 113.6K |
| Dependents | 10 | 140 | 654 |
| Releases | 6 | 31 | 60 |
| Last Updated | 402d ago | 123d ago | 87d ago |
| Age | 1y 2m | 3y 11m | 2y 10m |
Health Breakdown
| Category | segtok | charabia | text-splitter |
|---|---|---|---|
| Maintenance | 8 | 16 | 14 |
| Quality | 13 | 13 | 13 |
| Community | 8 | 13 | 13 |
| Popularity | 6 | 7 | 7 |
| Documentation | 10 | 13 | 12 |
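The five category scores listed for each crate add up to its overall health score (45, 62, and 59). The sketch below checks that arithmetic. It assumes the total is a plain unweighted sum, which matches the figures on this page but is not confirmed as the scoring method; the `HealthBreakdown` struct and its field names are illustrative only.

```rust
/// Category scores for one crate, copied from the health breakdown.
/// This struct is a hypothetical helper, not part of any listed crate.
#[derive(Debug, Clone, Copy)]
struct HealthBreakdown {
    maintenance: u32,
    quality: u32,
    community: u32,
    popularity: u32,
    documentation: u32,
}

impl HealthBreakdown {
    /// Assumption: the overall score is the unweighted sum of the five categories.
    fn total(&self) -> u32 {
        self.maintenance + self.quality + self.community + self.popularity + self.documentation
    }
}

const SEGTOK: HealthBreakdown = HealthBreakdown {
    maintenance: 8,
    quality: 13,
    community: 8,
    popularity: 6,
    documentation: 10,
};

const CHARABIA: HealthBreakdown = HealthBreakdown {
    maintenance: 16,
    quality: 13,
    community: 13,
    popularity: 7,
    documentation: 13,
};

const TEXT_SPLITTER: HealthBreakdown = HealthBreakdown {
    maintenance: 14,
    quality: 13,
    community: 13,
    popularity: 7,
    documentation: 12,
};

fn main() {
    // Print each crate's summed score next to its name.
    for (name, breakdown) in [
        ("segtok", SEGTOK),
        ("charabia", CHARABIA),
        ("text-splitter", TEXT_SPLITTER),
    ] {
        println!("{name}: {}", breakdown.total());
    }
}
```

Seen this way, the gap between segtok and the other two comes almost entirely from Maintenance and Community, since Quality is identical across all three.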
Technical Details
| Property | segtok | charabia | text-splitter |
|---|---|---|---|
| Version | 0.1.5 | 0.9.9 | 0.29.3 |
| Stable (≥1.0) | ✗ No | ✗ No | ✗ No |
| License | MIT | MIT | MIT |
| Dependencies | 8 | 18 | 21 |
| Crate Size | 36KB | 1.1MB | 59KB |
| Features | 0 | 20 | 4 |
| Yanked % | 0.0% | 3.2% | 1.7% |
| Edition | 2021 | 2021 | 2021 |
| MSRV | — | — | 1.83.0 |
| Owners | 1 | 2 | 1 |
Quick Verdict
- charabia leads with a health score of 62/100, but none of the three crates scores above 80.
- text-splitter has the most total downloads (1.1M), suggesting wider adoption, although segtok has the most downloads over the last 30 days (183.2K).
- text-splitter is depended on by 654 crates, the strongest sign of ecosystem trust among the three.
- ⚠ segtok has not been updated in over a year (last update 402 days ago).
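The verdict bullets can be reproduced from the Core Metrics table with a few simple rules. The following is a minimal sketch of one way to do that, not the page's actual logic: the `CrateStats` struct, the function names, the 80-point cutoff, and the 365-day staleness limit are all assumptions inferred from the wording of the verdict, and the download totals are the rounded figures shown in the table.

```rust
/// One row of comparison data. Hypothetical type for illustration only.
struct CrateStats {
    name: &'static str,
    health: u32,
    total_downloads: u64,
    dependents: u32,
    days_since_update: u32,
}

/// Assumed cutoffs, inferred from "above 80" and "over a year" in the verdict.
const STRONG_SCORE: u32 = 80;
const STALE_AFTER_DAYS: u32 = 365;

/// Figures copied from the Core Metrics table (download totals are rounded).
fn sample() -> [CrateStats; 3] {
    [
        CrateStats { name: "segtok", health: 45, total_downloads: 275_000, dependents: 10, days_since_update: 402 },
        CrateStats { name: "charabia", health: 62, total_downloads: 761_200, dependents: 140, days_since_update: 123 },
        CrateStats { name: "text-splitter", health: 59, total_downloads: 1_100_000, dependents: 654, days_since_update: 87 },
    ]
}

/// Crate with the highest health score, if any.
fn health_leader(crates: &[CrateStats]) -> Option<&CrateStats> {
    crates.iter().max_by_key(|c| c.health)
}

/// Crate with the highest total download count, if any.
fn most_downloaded(crates: &[CrateStats]) -> Option<&CrateStats> {
    crates.iter().max_by_key(|c| c.total_downloads)
}

/// Crate with the most dependents, if any.
fn most_depended_on(crates: &[CrateStats]) -> Option<&CrateStats> {
    crates.iter().max_by_key(|c| c.dependents)
}

/// Names of crates whose last update is older than the staleness limit.
fn stale_crates(crates: &[CrateStats]) -> Vec<&'static str> {
    crates
        .iter()
        .filter(|c| c.days_since_update > STALE_AFTER_DAYS)
        .map(|c| c.name)
        .collect()
}

fn main() {
    let crates = sample();

    if let Some(leader) = health_leader(&crates) {
        let any_strong = crates.iter().any(|c| c.health > STRONG_SCORE);
        println!("{} leads on health ({}); any above {}: {}", leader.name, leader.health, STRONG_SCORE, any_strong);
    }
    if let Some(c) = most_downloaded(&crates) {
        println!("{} has the most total downloads", c.name);
    }
    if let Some(c) = most_depended_on(&crates) {
        println!("{} has the most dependents ({})", c.name, c.dependents);
    }
    for name in stale_crates(&crates) {
        println!("warning: {name} has not been updated in over a year");
    }
}
```

Keeping each rule as its own small function makes it easy to change a cutoff or swap total downloads for 30-day downloads without touching the rest.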