language-tokenizer

v0.3.0 Experimental

Text tokenizer for linguistic purposes, such as text matching. Supports more than 40 languages, including English, French, Russian, Japanese, and Thai.

non-standard · Edition 2024 · MSRV 1.88.0

#text #language #tokenizer #segmenter #linguistic

Quick Verdict

Pre-1.0: the API may have breaking changes.

Downloads

27.5K

Dependents

Releases

Size

27KB

Deep Insights

Strong growth momentum

15.5K downloads in the last 30 days (518/day), up 46% from the previous period.

Compact crate

At roughly 27KB, language-tokenizer is lightweight. A small crate size often goes with focused, well-scoped functionality.

Health Breakdown

Maintenance 14/25

Recency, release consistency, active ratio

Quality 16/25

Yanked ratio, deps, size, maturity, features

Community 4/20

Reverse deps, ownership, ecosystem

Popularity 5/15

Downloads, momentum, growth trend

Documentation 13/15

Docs, repo, license, metadata

Download Trend

Daily downloads · last 90 days

297/day avg · +145%

Top Dependents

Most downloaded crates that depend on language-tokenizer

misaki-rs

21.9K downloads · 40

Version Adoption

v0.1.0

99%

v0.3.0

v0.2.0

Release Timeline

3 releases since 2026

Feature Flags

full, serde, snowball, chinese-icu, japanese-icu, korean-lindera, chinese-lindera, southeast-asian, japanese-ipadic-lindera, japanese-unidic-lindera, japanese-ipadic-neologd-lindera

Maintainers

savannstm

6 crates
