Decision Workspace
language-tokenizer vs charabia vs unobtanium-segmenter
Side-by-side comparison of Rust crates
language-tokenizer (experimental, v0.1.0), health score 45
Text tokenizer for linguistic purposes, such as text matching. Supports more than 40 languages, including English, French, Russian, Japanese, and Thai.
charabia (growing, v0.9.9), health score 62
A simple library to detect the language, tokenize the text and normalize the tokens
unobtanium-segmenter (experimental, v0.5.2), health score 51
A text segmentation toolbox for search applications inspired by charabia and tantivy.
Core Metrics
| Metric | language-tokenizer | charabia | unobtanium-segmenter |
|---|---|---|---|
| Health Score | 45 | 62 | 51 |
| Total Downloads | 447 | 761.2K | 2.3K |
| 30d Downloads | 137 | 83.1K | 68 |
| Dependents | 2 | 140 | 3 |
| Releases | 1 | 31 | 9 |
| Last Updated | 79d ago | 123d ago | 33d ago |
| Age | 2m | 3y 11m | 9m |
Health Breakdown
| Category | language-tokenizer | charabia | unobtanium-segmenter |
|---|---|---|---|
| Maintenance | 9 | 16 | 17 |
| Quality | 16 | 13 | 13 |
| Community | 4 | 13 | 7 |
| Popularity | 3 | 7 | 4 |
| Documentation | 13 | 13 | 10 |
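The five category scores in the breakdown add up to each crate's listed health score (for example, 9 + 16 + 4 + 3 + 13 = 45 for language-tokenizer). The page does not state how the score is computed, so the plain-sum rule below is an assumption inferred from the numbers, and the `Breakdown` struct and `health_score` function are hypothetical names used only for this check.

```rust
/// Category scores as listed in the Health Breakdown.
struct Breakdown {
    maintenance: u32,
    quality: u32,
    community: u32,
    popularity: u32,
    documentation: u32,
}

/// Assumption: the overall health score is the plain sum of the five
/// category scores. The page does not document the formula.
fn health_score(b: &Breakdown) -> u32 {
    b.maintenance + b.quality + b.community + b.popularity + b.documentation
}

fn main() {
    // (crate name, breakdown from the page, health score listed on the page)
    let crates = [
        (
            "language-tokenizer",
            Breakdown { maintenance: 9, quality: 16, community: 4, popularity: 3, documentation: 13 },
            45,
        ),
        (
            "charabia",
            Breakdown { maintenance: 16, quality: 13, community: 13, popularity: 7, documentation: 13 },
            62,
        ),
        (
            "unobtanium-segmenter",
            Breakdown { maintenance: 17, quality: 13, community: 7, popularity: 4, documentation: 10 },
            51,
        ),
    ];

    for (name, breakdown, listed) in &crates {
        let total = health_score(breakdown);
        println!("{name}: computed {total}, listed {listed}");
        // The computed sum should match the score shown in Core Metrics.
        assert_eq!(total, *listed);
    }
}
```

Read this way, charabia's lead comes mostly from Community (13 vs. 4 and 7) and Popularity, while language-tokenizer has the highest Quality score (16) despite the lowest total.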
Technical Details
| Detail | language-tokenizer | charabia | unobtanium-segmenter |
|---|---|---|---|
| Version | 0.1.0 | 0.9.9 | 0.5.2 |
| Stable (≥1.0) | ✗ No | ✗ No | ✗ No |
| License | non-standard | MIT | LGPL-3.0-only |
| Dependencies | 12 | 18 | 7 |
| Crate Size | 27KB | 1.1MB | 47KB |
| Features | 11 | 20 | 0 |
| Yanked % | 0.0% | 3.2% | 0.0% |
| Edition | 2021 | 2021 | 2024 |
| MSRV | 1.83.0 | — | — |
| Owners | 1 | 2 | 1 |
Quick Verdict
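- charabia leads with a health score of 62/100, but none of the three options scores above 80.
- charabia is depended on by 140 crates, compared with 2 for language-tokenizer and 3 for unobtanium-segmenter, the strongest sign of ecosystem trust among the three.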