Decision Workspace
language-tokenizer vs charabia vs unobtanium-segmenter
Side-by-side comparison of Rust crates
language-tokenizer
Health score 53, experimental, v0.3.0
Text tokenizer for linguistic purposes, such as text matching. Supports more than 40 languages, including English, French, Russian, Japanese, Thai etc.
charabia
Health score 60, growing, v0.9.9
A simple library to detect the language, tokenize the text and normalize the tokens
unobtanium-segmenter
Health score 48, experimental, v0.5.2
A text segmentation toolbox for search applications inspired by charabia and tantivy.
Core Metrics
| Metric | language-tokenizer | charabia | unobtanium-segmenter |
|---|---|---|---|
| Health Score | 53 | 60 | 48 |
| Total Downloads | 4.9K | 914.9K | 2.4K |
| 30d Downloads | 4.1K | 85.6K | 0 |
| Dependents | 2 | 145 | 3 |
| Releases | 3 | 31 | 9 |
| Last Updated | 45d ago | 182d ago | 92d ago |
| Age | 4m | 4y | 11m |
Health Breakdown
| Category | language-tokenizer | charabia | unobtanium-segmenter |
|---|---|---|---|
| Maintenance | 16 | 14 | 14 |
| Quality | 16 | 13 | 13 |
| Community | 4 | 13 | 7 |
| Popularity | 4 | 7 | 4 |
| Documentation | 13 | 13 | 10 |
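For each crate, the five breakdown figures add up to the Health Score shown in Core Metrics (53, 60, and 48). The page does not state how the score is computed, so the plain sum below is an assumption inferred from the numbers. The `Breakdown` struct and `breakdowns` function are hypothetical names used only for illustration.

```rust
/// Subscores for one crate, in the order listed under Health Breakdown.
struct Breakdown {
    name: &'static str,
    maintenance: u32,
    quality: u32,
    community: u32,
    popularity: u32,
    documentation: u32,
}

impl Breakdown {
    /// Assumption: the overall score is the unweighted sum of the subscores.
    fn total(&self) -> u32 {
        self.maintenance
            + self.quality
            + self.community
            + self.popularity
            + self.documentation
    }
}

/// Figures copied from the Health Breakdown section.
fn breakdowns() -> [Breakdown; 3] {
    [
        Breakdown {
            name: "language-tokenizer",
            maintenance: 16,
            quality: 16,
            community: 4,
            popularity: 4,
            documentation: 13,
        },
        Breakdown {
            name: "charabia",
            maintenance: 14,
            quality: 13,
            community: 13,
            popularity: 7,
            documentation: 13,
        },
        Breakdown {
            name: "unobtanium-segmenter",
            maintenance: 14,
            quality: 13,
            community: 7,
            popularity: 4,
            documentation: 10,
        },
    ]
}

fn main() {
    // Print each crate's summed score for comparison with Core Metrics.
    for b in breakdowns() {
        println!("{}: {}", b.name, b.total());
    }
}
```

Read this way, charabia's lead comes almost entirely from Community and Popularity. language-tokenizer scores higher on Maintenance and Quality.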
Technical Details
| Property | language-tokenizer | charabia | unobtanium-segmenter |
|---|---|---|---|
| Version | 0.3.0 | 0.9.9 | 0.5.2 |
| Stable (≥1.0) | ✗ No | ✗ No | ✗ No |
| License | non-standard | MIT | LGPL-3.0-only |
| Dependencies | 12 | 18 | 7 |
| Crate Size | 27KB | 1.1MB | 47KB |
| Features | 11 | 20 | 0 |
| Yanked % | 0.0% | 3.2% | 0.0% |
| Edition | 2024 | 2021 | 2024 |
| MSRV | 1.88.0 | — | — |
| Owners | 1 | 2 | 1 |
Quick Verdict
- charabia leads with a health score of 60/100, but none of the three crates scores above 80.
- charabia has 145 dependent crates, far more than language-tokenizer (2) or unobtanium-segmenter (3), which suggests the strongest ecosystem trust.