Health score: 53/100
language-tokenizer
v0.3.0 Experimental
Text tokenizer for linguistic purposes, such as text matching. Supports more than 40 languages, including English, French, Russian, Japanese, and Thai.
non-standard Edition 2024 MSRV 1.88.0
#text #language #tokenizer #segmenter #linguistic
Quick Verdict
- Actively maintained (updated 45 days ago)
- Pre-1.0: the API may have breaking changes
Downloads: 4.9K
Dependents: 2
Releases: 3
Size: 27KB
Deep Insights
Strong growth momentum
4.1K downloads in the last 30 days (136/day), up 887% from the previous period.
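The momentum figures above are simple arithmetic, and a short sketch makes the relationship between them explicit. This is a minimal illustration in Rust, assuming the per-day figure is the 30-day total divided by 30 and truncated, and that "up 887%" means relative change against the previous 30-day total. The page does not state its exact formula, and all function names here are hypothetical.

```rust
/// Average downloads per day over a window, truncated to a whole number.
/// Assumption: the page truncates rather than rounds.
fn daily_average(total_downloads: u64, days: u64) -> u64 {
    total_downloads / days
}

/// Percentage change from `previous` to `current` (e.g. 887.0 for "up 887%").
fn growth_percent(previous: f64, current: f64) -> f64 {
    (current - previous) / previous * 100.0
}

/// Previous-period total implied by the current total and a growth percentage.
fn implied_previous(current: f64, growth_pct: f64) -> f64 {
    current / (1.0 + growth_pct / 100.0)
}
```

With the rounded figures shown, `daily_average(4100, 30)` gives 136, and `implied_previous(4100.0, 887.0)` gives roughly 415 downloads for the previous 30-day period.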
Compact crate
At 26KB, language-tokenizer is lightweight. Small crate size correlates with focused, well-scoped functionality.
Health Breakdown
Maintenance 16/25
Recency, release consistency, active ratio
Quality 16/25
Yanked ratio, deps, size, maturity, features
Community 4/20
Reverse deps, ownership, ecosystem
Popularity 4/15
Downloads, momentum, growth trend
Documentation 13/15
Docs, repo, license, metadata
Download Trend
Daily downloads · last 90 days
51/day avg, +485%
Version Adoption
v0.1.0: 98%
v0.3.0: 2%
v0.2.0: 0%
Release Timeline
3 releases since 2026
Feature Flags
full, serde, snowball, chinese-icu, japanese-icu, korean-lindera, chinese-lindera, southeast-asian, japanese-ipadic-lindera, japanese-unidic-lindera, japanese-ipadic-neologd-lindera
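Feature flags are enabled through the standard Cargo `features` key. The fragment below is a sketch of a `Cargo.toml` entry. The version comes from this listing, while the choice of `serde` and `snowball` is purely illustrative. What each flag enables, and which flags are on by default, is not stated here and should be checked in the crate's own documentation.

```toml
[dependencies]
# Version from the listing above; the selected features are an illustrative
# assumption, not a recommendation from the crate's authors.
language-tokenizer = { version = "0.3", features = ["serde", "snowball"] }
```

Because the crate is pre-1.0, a `0.3` requirement only matches 0.3.x releases, which limits exposure to the breaking changes noted in the verdict.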
Dependencies
12 direct dependencies
Dependents
2 crates depend on language-tokenizer