Author | Commit | Message | Date
Al | 46141a6c36 | [normalize] Adding an option when normalizing tokens to split tokens of the form [\w]+[\.\-]?[\d]+ for cases like I35, CR123, R-66, RN.7, etc. where the alpha component is an expansion | 2015-08-02 14:34:36 -06:00
Al | 551904d202 | [normalize] cstring_array instead of string_tree for token-based normalization | 2015-07-28 19:09:50 -04:00
Al | 053b987d58 | [normalize] adding an option for string trimming in normalize | 2015-07-27 01:59:14 -04:00
Al | a38b924c5d | [fix] add_token_alternatives | 2015-07-21 17:26:59 -04:00
Al | 6ff91fef6b | [normalization] adding a normalize_string_latin method | 2015-07-05 23:38:01 -04:00
Al | a08d59c277 | [fix] NFD normalization should be the default in normalize.c, not NFKD, as NFKD does some unwanted things like converting superscripts and the Latin-ASCII transliterator does a better, more thorough job while staying faithful to the original string | 2015-07-05 15:28:07 -04:00
Al | 6cfbab9969 | [normalization] string normalization module for tokens and full strings | 2015-07-01 14:52:28 -04:00
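Commit 46141a6c36 describes splitting tokens such as I35, CR123, R-66 and RN.7 into an alpha part and a numeric part. The following is a minimal Python sketch of that splitting rule, not the project's C implementation; the helper name `split_alpha_numeric` is hypothetical. It assumes the alpha part is ASCII letters only. The commit message writes it as `[\w]+`, but `\w` also matches digits, so a literal greedy match would split "I35" as "I3" + "5".

```python
import re

# Assumption: letters, then an optional '.' or '-', then digits.
# [A-Za-z]+ is used instead of the commit message's [\w]+ because \w also
# matches digits and '_', which would make "I35" split as "I3" + "5".
ALPHA_NUMERIC = re.compile(r"([A-Za-z]+)[.\-]?([0-9]+)")


def split_alpha_numeric(token):
    """Hypothetical helper: return (alpha, digits) for tokens shaped like
    "I35", "CR123", "R-66" or "RN.7", dropping the optional separator.
    Return None when the whole token does not have that shape."""
    match = ALPHA_NUMERIC.fullmatch(token)
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    for token in ["I35", "CR123", "R-66", "RN.7", "Main", "35"]:
        print(token, "->", split_alpha_numeric(token))
```

Tokens that are all letters ("Main") or all digits ("35") are left alone, so only mixed tokens become candidates for expanding the alpha component (for example, treating "CR" as a possible abbreviation).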