rustio rustio.net

tokenizer vs unicode-segmentation vs text-splitter

Side-by-side comparison of Rust crates

Core Metrics

Metric            tokenizer    unicode-segmentation    text-splitter
Health Score      39           69                      59
Total Downloads   3.9K         335.5M                  1.1M
30d Downloads     48           23.8M                   113.6K
Dependents        0            12.0K                   654
Releases          2            26                      60
Last Updated      2126d ago    1d ago                  87d ago
Age               5y 10m       10y 11m                 2y 10m

Health Breakdown

Category          tokenizer    unicode-segmentation    text-splitter
Maintenance       3            17                      14
Quality           17           17                      13
Community         6            16                      13
Popularity        4            8                       7
Documentation     9            11                      12
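
In the figures above, the five category scores for each crate add up to its Health Score under Core Metrics (for example, 3 + 17 + 6 + 4 + 9 = 39 for tokenizer). The Rust sketch below checks that arithmetic. The `Breakdown` type and its `total` method are hypothetical names, and treating the score as a plain sum is an inference from these numbers, not a documented scoring formula.

```rust
/// Per-category health scores for one crate, as listed in the breakdown.
/// `Breakdown` is a hypothetical type used only for this illustration.
#[derive(Debug, Clone, Copy)]
pub struct Breakdown {
    pub maintenance: u32,
    pub quality: u32,
    pub community: u32,
    pub popularity: u32,
    pub documentation: u32,
}

impl Breakdown {
    /// Assumption: the overall score is the plain sum of the five categories.
    pub fn total(&self) -> u32 {
        self.maintenance
            + self.quality
            + self.community
            + self.popularity
            + self.documentation
    }
}

/// The three breakdowns from this page, in column order.
pub fn page_breakdowns() -> [(&'static str, Breakdown); 3] {
    [
        (
            "tokenizer",
            Breakdown { maintenance: 3, quality: 17, community: 6, popularity: 4, documentation: 9 },
        ),
        (
            "unicode-segmentation",
            Breakdown { maintenance: 17, quality: 17, community: 16, popularity: 8, documentation: 11 },
        ),
        (
            "text-splitter",
            Breakdown { maintenance: 14, quality: 13, community: 13, popularity: 7, documentation: 12 },
        ),
    ]
}
```

Calling `total()` on each entry of `page_breakdowns()` gives a value that can be compared against the Health Score row.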

Technical Details

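Detail            tokenizer       unicode-segmentation    text-splitter
Version           0.1.2           1.13.2                  0.29.3
Stable (≥1.0)     ✗ No            ✓ Yes                   ✗ No
License           BSD-3-Clause    MIT OR Apache-2.0       MIT
Dependencies      2               3                       21
Crate Size        17KB            112KB                   59KB
Features          3               1                       4
Yanked %          0.0%            7.7%                    1.7%
Edition           2018            2018                    2021
Owners            1               6                       1

MSRV: 1.85.0, 1.83.0 (listed for only two of the three crates; the crate without a declared MSRV is not identified)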

Quick Verdict

  • unicode-segmentation leads with a health score of 69/100, but none of the three crates scores above 80.
  • unicode-segmentation is depended on by 12.0K crates, the strongest ecosystem trust of the three.
  • ⚠ tokenizer was last updated 2126 days ago (roughly 5 years 10 months).
  • tokenizer and text-splitter are pre-1.0, so their APIs may change.
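
The verdict bullets read like simple rules applied to the metrics on this page. A minimal Rust sketch of such rules follows. The `CrateStats` type, the function names, and the exact thresholds (80 for a high score, 365 days for staleness) are assumptions drawn from the wording of the bullets, not the site's actual logic.

```rust
/// Hypothetical record holding the metrics the verdict bullets refer to.
pub struct CrateStats {
    pub name: &'static str,
    pub health: u32,            // Health Score out of 100
    pub days_since_update: u32, // from the "Last Updated" row
    pub major_version: u32,     // first component of the Version row
}

/// Values copied from the tables on this page.
pub fn page_stats() -> Vec<CrateStats> {
    vec![
        CrateStats { name: "tokenizer", health: 39, days_since_update: 2126, major_version: 0 },
        CrateStats { name: "unicode-segmentation", health: 69, days_since_update: 1, major_version: 1 },
        CrateStats { name: "text-splitter", health: 59, days_since_update: 87, major_version: 0 },
    ]
}

/// The crate with the highest health score, if any.
pub fn leader(stats: &[CrateStats]) -> Option<&CrateStats> {
    stats.iter().max_by_key(|s| s.health)
}

/// True if at least one crate scores strictly above `threshold`.
pub fn any_above(stats: &[CrateStats], threshold: u32) -> bool {
    stats.iter().any(|s| s.health > threshold)
}

/// Names of crates not updated in over a year (assumed to mean more than 365 days).
pub fn stale(stats: &[CrateStats]) -> Vec<&'static str> {
    stats
        .iter()
        .filter(|s| s.days_since_update > 365)
        .map(|s| s.name)
        .collect()
}

/// Names of crates whose major version is 0, i.e. pre-1.0.
pub fn pre_1_0(stats: &[CrateStats]) -> Vec<&'static str> {
    stats
        .iter()
        .filter(|s| s.major_version == 0)
        .map(|s| s.name)
        .collect()
}
```

Applied to `page_stats()`, these helpers give the same conclusions as the bullets: unicode-segmentation leads, no crate exceeds 80, only tokenizer is stale, and tokenizer and text-splitter are pre-1.0.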