bpetok vs tokenizers vs text-splitter
Side-by-side comparison of Rust crates
- bpetok v0.1.2 (growing, health score 36): A simple CLI for tokenizing text input using Byte Pair Encoding (BPE).
- tokenizers v0.22.2 (growing, health score 60): Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.
- text-splitter v0.29.3 (growing, health score 59): Split text into semantic chunks, up to a desired chunk size. Supports calculating length by characters and tokens, and is callable from Rust and Python.
Core Metrics
| Metric | bpetok | tokenizers | text-splitter |
|---|---|---|---|
| Health Score | 36 | 60 | 59 |
| Total Downloads | 3.1K | 12.7M | 1.1M |
| 30d Downloads | 5 | 1.9M | 110.8K |
| Dependents | 0 | 3.6K | 654 |
| Releases | 3 | 39 | 60 |
| Last Updated | 548d ago | 115d ago | 87d ago |
| Age | 1y 6m | 6y 7m | 2y 10m |
Health Breakdown
| Category | bpetok | tokenizers | text-splitter |
|---|---|---|---|
| Maintenance | 6 | 12 | 14 |
| Quality | 15 | 12 | 13 |
| Community | 6 | 16 | 13 |
| Popularity | 4 | 8 | 7 |
| Documentation | 5 | 12 | 12 |
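
The five sub-scores above appear to add up to each crate's Health Score in Core Metrics. The page does not state how the score is computed, so treating it as a plain sum is an assumption; the short Python sketch below only checks that assumption against the numbers shown here. The names `breakdown`, `reported`, and `total_score` are illustrative and not part of any crate or site API.

```python
# Sub-scores copied from the Health Breakdown section.
breakdown = {
    "bpetok": {"Maintenance": 6, "Quality": 15, "Community": 6,
               "Popularity": 4, "Documentation": 5},
    "tokenizers": {"Maintenance": 12, "Quality": 12, "Community": 16,
                   "Popularity": 8, "Documentation": 12},
    "text-splitter": {"Maintenance": 14, "Quality": 13, "Community": 13,
                      "Popularity": 7, "Documentation": 12},
}

# Health Scores copied from the Core Metrics table.
reported = {"bpetok": 36, "tokenizers": 60, "text-splitter": 59}


def total_score(parts):
    """Assumed scoring rule: the overall score is the plain sum of the parts."""
    return sum(parts.values())


totals = {name: total_score(parts) for name, parts in breakdown.items()}

for name, total in totals.items():
    status = "matches" if total == reported[name] else "differs from"
    print(f"{name}: computed {total}, {status} reported {reported[name]}")
```

For the values on this page the computed totals agree with the reported scores for all three crates, which is consistent with a simple additive score but does not prove the site uses one.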
Technical Details
| Detail | bpetok | tokenizers | text-splitter |
|---|---|---|---|
| Version | 0.1.2 | 0.22.2 | 0.29.3 |
| Stable (≥1.0) | ✗ No | ✗ No | ✗ No |
| License | non-standard | Apache-2.0 | MIT |
| Dependencies | 3 | 33 | 21 |
| Crate Size | 11KB | 186KB | 59KB |
| Features | 0 | 6 | 4 |
| Yanked % | 0.0% | 2.6% | 1.7% |
| Edition | 2021 | 2018 | 2021 |
| MSRV | — | — | 1.83.0 |
| Owners | 1 | 4 | 1 |
Quick Verdict
- tokenizers leads with a health score of 60/100, but none of the three crates scores above 80.
- tokenizers is depended on by 3.6K crates, the strongest ecosystem trust signal of the three.
- ⚠ bpetok has not been updated in over a year (last update 548 days ago).