Al | 66a71ab70d | [normalize] Need to do a Latin-ASCII transliteration even if the string is entirely ASCII since it may contain HTML escapes | 2015-08-11 23:36:08 -04:00
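The commit above notes that an "is it ASCII?" check is not enough to skip transliteration: a pure-ASCII string can carry HTML entities such as `&eacute;` that decode to non-ASCII characters. The sketch below shows this with Python's standard `html` and `unicodedata` modules. It is not libpostal's C code. `needs_transliteration` and `naive_ascii_fold` are hypothetical helpers, and NFD plus dropping combining marks is only a rough stand-in for a real Latin-ASCII transliterator.

```python
import html
import unicodedata

def needs_transliteration(s: str) -> bool:
    """Hypothetical helper: True if decoding HTML entities yields non-ASCII text."""
    return not html.unescape(s).isascii()

def naive_ascii_fold(s: str) -> str:
    """Rough stand-in for Latin-ASCII transliteration (assumption, not libpostal's):
    decode HTML entities, decompose with NFD, then drop combining marks."""
    decoded = html.unescape(s)
    decomposed = unicodedata.normalize("NFD", decoded)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

raw = "Caf&eacute; Rouge"
print(raw.isascii())               # True: the raw string is pure ASCII
print(needs_transliteration(raw))  # True: it decodes to "Café Rouge"
print(naive_ascii_fold(raw))       # Cafe Rouge
```

A fast path that returns early on ASCII-only input would leave `Caf&eacute;` untouched, which is the case this commit guards against.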
Al | 4bc6adf669 | [normalize] Adding the original script as an alternative in transliteration mode as well | 2015-08-10 17:48:48 -04:00
Al | 0f77ca1213 | [normalize] Adding a char_array version of normalize token | 2015-08-10 16:11:34 -04:00
Al | 46141a6c36 | [normalize] Adding an option when normalizing tokens to split tokens of the form [\w]+[\.\-]?[\d]+ for cases like I35, CR123, R-66, RN.7, etc. where the alpha component is an expansion | 2015-08-02 14:34:36 -06:00
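The split described in the commit above can be sketched with a regular expression. The sketch is written in Python for illustration and is not libpostal's implementation; `split_alpha_numeric` is a hypothetical name. One deliberate deviation: the commit writes the alpha part as `[\w]+`, but in Python `\w` also matches digits, so a greedy `[\w]+` would backtrack and split `I35` as `I3` + `5`. This sketch therefore restricts that part to letters and drops the optional `.`/`-` separator.

```python
import re
from typing import List

# Letters, then an optional "." or "-", then digits.
# [^\W\d_] means "word character that is neither a digit nor an underscore", i.e. a letter.
ALPHA_NUMERIC = re.compile(r"([^\W\d_]+)[.\-]?(\d+)")

def split_alpha_numeric(token: str) -> List[str]:
    """Hypothetical helper: split "CR123" -> ["CR", "123"]; other tokens are returned unchanged."""
    m = ALPHA_NUMERIC.fullmatch(token)
    if m is None:
        return [token]
    return [m.group(1), m.group(2)]

for t in ["I35", "CR123", "R-66", "RN.7", "Main"]:
    print(t, split_alpha_numeric(t))
```

Splitting lets the alpha part be looked up on its own as a possible expansion (for example, `CR` might expand to "County Road"; that mapping is illustrative, not taken from the source).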
Al | 551904d202 | [normalize] cstring_array instead of string_tree for token-based normalization | 2015-07-28 19:09:50 -04:00
Al | 053b987d58 | [normalize] adding an option for string trimming in normalize | 2015-07-27 01:59:14 -04:00
Al | a38b924c5d | [fix] add_token_alternatives | 2015-07-21 17:26:59 -04:00
Al | 6ff91fef6b | [normalization] adding a normalize_string_latin method | 2015-07-05 23:38:01 -04:00
Al | a08d59c277 | [fix] NFD normalization should be the default in normalize.c, not NFKD, as NFKD does some unwanted things like converting superscripts and the Latin-ASCII transliterator does a better, more thorough job while staying faithful to the original string | 2015-07-05 15:28:07 -04:00
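The NFD vs. NFKD difference cited in the commit above can be seen with Python's standard `unicodedata` module. This illustrates the Unicode normalization forms themselves, not libpostal's normalize.c. NFD applies only canonical decompositions, while NFKD also applies compatibility mappings, which is what flattens a superscript `²` into a plain `2`.

```python
import unicodedata

s = "Caf\u00e9, 5 m\u00b2"  # "Café, 5 m²": precomposed e-acute and superscript two

nfd = unicodedata.normalize("NFD", s)
nfkd = unicodedata.normalize("NFKD", s)

# Both forms split é into "e" + U+0301 COMBINING ACUTE ACCENT.
print("e\u0301" in nfd, "e\u0301" in nfkd)  # True True
# NFD has no canonical decomposition for ², so it survives unchanged.
print(nfd.endswith("m\u00b2"))              # True
# NFKD applies the compatibility mapping ² -> "2", losing the superscript.
print(nfkd.endswith("m2"))                  # True
```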
Al | 6cfbab9969 | [normalization] string normalization module for tokens and full strings | 2015-07-01 14:52:28 -04:00