tokenizers
v0.22.2 (Growing)
Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.
Quick Verdict
- Caution: Pre-1.0, so the API may have breaking changes
- Strength: Massive adoption (3.6K crates depend on it)
- Caution: Heavy dependency tree (33 direct deps)
- Strength: Permissive license (Apache-2.0)
Security
Deep Insights
2.1M downloads in the last 30 days (69.8K/day), up 57% from the previous period.
3.6K crates depend on tokenizers. Adoption at this scale suggests the code is well exercised in practice and likely to stay maintained.
Despite being 6+ years old, tokenizers hasn't reached 1.0 yet, so breaking API changes between versions remain possible.
33 direct dependencies. Consider the impact on compile times and supply chain complexity.
Notable dependents include text-splitter, fastembed, toktrie_hf_tokenizers, xgrammar, lancedb. When high-quality crates choose tokenizers, it's a strong quality signal.
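The download figures above can be sanity-checked with a little arithmetic. The sketch below assumes the rounded 2.1M total and the 57% growth rate are the only inputs available; the function names are illustrative and not part of any crate. Because 2.1M is itself rounded, the computed daily average (about 70K) differs slightly from the reported 69.8K/day, and the implied previous-period total comes out at roughly 1.34M.

```rust
/// Average downloads per day over a window of `days` days.
fn daily_average(total: f64, days: f64) -> f64 {
    total / days
}

/// Previous-period downloads implied by a growth rate (0.57 means +57%).
fn previous_period(current: f64, growth: f64) -> f64 {
    current / (1.0 + growth)
}

fn main() {
    // Figures quoted in the text; 2.1M is a rounded value.
    let current = 2_100_000.0;
    let per_day = daily_average(current, 30.0);
    let previous = previous_period(current, 0.57);
    println!("about {:.0} downloads/day", per_day);
    println!("about {:.0} downloads in the previous period", previous);
}
```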
Health Breakdown
Recency, release consistency, active ratio
Yanked ratio, deps, size, maturity, features
Reverse deps, ownership, ecosystem
Downloads, momentum, growth trend
Docs, repo, license, metadata
Download Trend
Top Dependents
Most downloaded crates that depend on tokenizers
Version Adoption
Release Timeline
Feature Flags
default = ["progressbar", "onig", "esaxx_fast"]
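Since the dependency count is flagged as a concern, one option is to turn off the default features and re-enable only what a project needs. The fragment below is a sketch, not a recommended configuration: the version comes from the page header, and keeping only `onig` is an assumption about what a minimal build requires (the crate appears to need a regex backend, but check its own Cargo.toml before relying on this).

```toml
[dependencies]
# Default features (progressbar, onig, esaxx_fast) are disabled here and
# only "onig" is re-enabled. Whether this combination builds and suits a
# given use case is an assumption to verify against the crate's manifest.
tokenizers = { version = "0.22.2", default-features = false, features = ["onig"] }
```

Dropping `progressbar` and `esaxx_fast` should trim some optional dependencies, at the cost of whatever functionality or speed those features provide.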