tokenizers

v0.23.1 Growing

Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.

Apache-2.0 Edition 2018

#tokenizer #nlp #wordpiece #huggingface #bpe


Quick Verdict

✓ Actively maintained (updated 77d ago)
! Pre-1.0: API may have breaking changes
✓ Massive adoption (7.9K crates depend on it)
! Heavy dependency tree (33 direct deps)
✓ Permissive license (Apache-2.0)


Downloads

22.3M

Dependents

7.9K

Releases

Size

196KB

Deep Insights

📊

Steady growth

3.3M downloads in the last 30 days (109.7K/day), up 9% from the previous period.

🔗

Widely adopted

7.9K crates depend on tokenizers. Strong ecosystem adoption means battle-tested code and long-term stability.

🔬

Pre-1.0 for over a year

Despite being 6+ years old, tokenizers hasn't reached 1.0 yet. Expect potential API changes between versions.
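
For a pre-1.0 crate, Cargo's default (caret) requirement treats the minor version as the compatibility boundary, so a dependency declared as "0.23.1" would accept later 0.23.x releases but not 0.24.0. The following is a minimal sketch of that rule using only the standard library. The helper names `parse` and `caret_compatible` are illustrative, not part of Cargo or tokenizers, and the candidate versions 0.23.9 and 0.24.0 are hypothetical.

```rust
/// Parse "MAJOR.MINOR.PATCH" into a tuple; returns None on malformed input.
/// (Hypothetical helper: pre-release and build metadata are not handled.)
fn parse(v: &str) -> Option<(u64, u64, u64)> {
    let mut parts = v.split('.').map(|p| p.parse::<u64>().ok());
    let major = parts.next()??;
    let minor = parts.next()??;
    let patch = parts.next()??;
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some((major, minor, patch))
}

/// Sketch of Cargo's caret rule: a candidate is compatible if it is not older
/// than the requirement and shares its leftmost non-zero component.
fn caret_compatible(req: &str, candidate: &str) -> Option<bool> {
    let r = parse(req)?;
    let c = parse(candidate)?;
    if c < r {
        return Some(false);
    }
    let same_series = match r {
        (0, 0, _) => c == r,                          // ^0.0.z allows only 0.0.z
        (0, minor, _) => c.0 == 0 && c.1 == minor,    // ^0.y.z allows 0.y.*
        (major, _, _) => c.0 == major,                // ^x.y.z allows x.*.*
    };
    Some(same_series)
}

fn main() {
    // 0.23.9 and 0.24.0 are hypothetical versions used for illustration.
    for candidate in ["0.23.1", "0.23.9", "0.24.0", "0.22.2"] {
        println!(
            "0.23.1 accepts {candidate}: {:?}",
            caret_compatible("0.23.1", candidate)
        );
    }
}
```

In practice this means each 0.x minor bump of tokenizers is an explicit opt-in in Cargo.toml, which is one likely reason several 0.21.x and 0.22.x versions remain in use side by side.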

📦

Heavy dependency tree

33 direct dependencies. Consider the impact on compile times and supply chain complexity.

🌟

Used by top crates

Notable dependents include candle-core, fastembed, text-splitter, llm-tokenizer, xgrammar. When high-quality crates choose tokenizers, it's a strong quality signal.

Health Breakdown

Maintenance 14/25

Recency, release consistency, active ratio

Quality 12/25

Yanked ratio, deps, size, maturity, features

Community 16/20

Reverse deps, ownership, ecosystem

Popularity 8/15

Downloads, momentum, growth trend

Documentation 12/15

Docs, repo, license, metadata

Download Trend

Daily downloads · last 90 days

88K/day avg · -22%

Top Dependents

Most downloaded crates that depend on tokenizers

toktrie_hf_tokenizers

Version Adoption

v0.22.2: 39%

v0.21.4: 38%

v0.21.2: 10%

v0.21.1

v0.22.1

Release Timeline

10 releases since 2024


Feature Flags

default = ["progressbar", "onig", "esaxx_fast"]

http · esaxx_fast* · rustls-tls · progressbar* · unstable_wasm

