Merge pull request #379 from missinglink/patch-1

minor typo
This commit is contained in:
Al B
2025-02-08 12:01:43 -05:00
committed by GitHub

View File

@@ -519,7 +519,7 @@ language (IX => 9) which occur in the names of many monarchs, popes, etc.
- **Fast, accurate tokenization/lexing**: clocked at > 1M tokens / sec,
implements the TR-29 spec for UTF8 word segmentation, tokenizes East Asian
languages chracter by character instead of on whitespace.
languages character by character instead of on whitespace.
- **UTF8 normalization**: optionally decompose UTF8 to NFD normalization form,
strips accent marks e.g. à => a and/or applies Latin-ASCII transliteration.