| Author | Commit | Message | Date |
|---|---|---|---|
| Al | 58661c9f27 | [languages] adding replace_hyphens and split_alpha_from_numeric in language classifier input normalization | 2017-04-02 23:32:24 -04:00 |
| Al | 0dfd8d6439 | [language_classification] Adding script feature for any non-Latin script. Even if the script doesn't directly identify the language, it can act as a modified intercept (all Han script addresses will share the Han feature, even if we haven't seen one of the > 80k Han characters) | 2016-01-17 21:37:45 -05:00 |
| Al | b13462f8ef | [language_classifier] Features for address languages classification, quadgrams for most languages, unigrams for ideographic characters, script for single-script languages like Thai, Hebrew, etc. | 2016-01-09 03:42:57 -05:00 |
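
The three commit messages above describe a feature scheme: normalize the input (replace hyphens, split letters from digits), then emit character quadgrams for most text, unigrams for ideographic characters, and a script feature for any non-Latin script. A minimal Python sketch of that idea follows. It is not the repository's implementation: the function names, the feature-key prefixes, the lowercasing step, the `_` boundary padding, and the trick of reading the script from the first word of a character's Unicode name are all assumptions made for illustration.

```python
import re
import unicodedata
from collections import Counter

# Assumption: treat anything whose Unicode name starts with "CJK" as ideographic.
IDEOGRAPHIC = {"CJK"}


def normalize(text):
    """Hypothetical stand-in for the replace_hyphens and
    split_alpha_from_numeric options named in the commit message."""
    text = text.lower().replace("-", " ")  # lowercasing is an added assumption
    # Insert a space at every letter/digit boundary, in either direction.
    text = re.sub(r"(?<=[^\W\d_])(?=\d)|(?<=\d)(?=[^\W\d_])", " ", text)
    return " ".join(text.split())


def script_of(ch):
    """Rough script guess: first word of the Unicode character name.
    Python's standard library has no script property, so this is a heuristic."""
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else None


def features(text, n=4):
    feats = Counter()
    for token in normalize(text).split():
        scripts = {script_of(ch) for ch in token} - {None}
        if not scripts:
            continue  # purely numeric or punctuation token
        # Script feature for any non-Latin script, shared by all tokens in it.
        for script in scripts - {"LATIN"}:
            feats["script=" + script] += 1
        if scripts & IDEOGRAPHIC:
            # Unigrams for ideographic characters.
            for ch in token:
                if script_of(ch) in IDEOGRAPHIC:
                    feats["uni=" + ch] += 1
        else:
            # Character n-grams (quadgrams by default) with boundary padding.
            padded = "_" + token + "_"
            for i in range(max(1, len(padded) - n + 1)):
                feats["gram=" + padded[i:i + n]] += 1
    return feats
```

Calling `features("Rue-de-la-Paix 5bis")` yields only `gram=` keys, while `features("東京都1丁目")` yields `uni=` keys plus a shared `script=CJK` key, which is the "modified intercept" effect the second commit message describes: an unseen Han character still contributes the script feature.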