Al | f6c30778bf | [normalize] New token normalization option for replacing digits with 'D' for masking numbers e.g. when learning patterns (so 1234 and 5678 both normalize to DDDD). Shouldn't be used by libpostal API, just by the feature extractors in the machine learning models. Also adding better possessive handling. | 2015-09-23 19:41:01 -04:00 |
Al | 0f77ca1213 | [normalize] Adding a char_array version of normalize token | 2015-08-10 16:11:34 -04:00 |
Al | 9b69d1f67a | [fix] Removing C++ checks from all but the main API functions | 2015-08-07 17:15:39 -04:00 |
Al | 359a1efb03 | [fix] Adding stdint.h include to most of the header files for portability | 2015-08-07 02:43:44 -04:00 |
Al | 46141a6c36 | [normalize] Adding an option when normalizing tokens to split tokens of the form [\w]+[\.\-]?[\d]+ for cases like I35, CR123, R-66, RN.7, etc. where the alpha component is an expansion | 2015-08-02 14:34:36 -06:00 |
Al | 551904d202 | [normalize] cstring_array instead of string_tree for token-based normalization | 2015-07-28 19:09:50 -04:00 |
Al | 053b987d58 | [normalize] adding an option for string trimming in normalize | 2015-07-27 01:59:14 -04:00 |
Al | ee96dab93c | [fix] unnecessary headers | 2015-07-25 13:49:42 -04:00 |
Al | 5239c365d0 | [docs] Adding some documentation for normalize.h options | 2015-07-24 15:23:25 -04:00 |
Al | a38b924c5d | [fix] add_token_alternatives | 2015-07-21 17:26:59 -04:00 |
Al | 6ff91fef6b | [normalization] adding a normalize_string_latin method | 2015-07-05 23:38:01 -04:00 |
Al | 6cfbab9969 | [normalization] string normalization module for tokens and full strings | 2015-07-01 14:52:28 -04:00 |
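
Two of the commit messages above describe token-level behavior that is easier to see with a small example: digit masking (f6c30778bf) and splitting alpha/numeric tokens (46141a6c36). The following is only a rough Python sketch of those two ideas, not the C code from the repository. The function names `mask_digits` and `split_alpha_numeric` are hypothetical, and the splitting regex is an assumption: it tightens the `[\w]+` from the commit message to letters only, so that the alpha prefix and the digits can be separated cleanly.

```python
import re

# Hypothetical helper: replace every decimal digit with 'D', as the
# f6c30778bf message describes for masking numbers in ML features.
def mask_digits(token: str) -> str:
    return re.sub(r"\d", "D", token)

# Assumed pattern: letters, an optional '.' or '-', then digits.
# The commit message writes [\w]+[\.\-]?[\d]+; letters-only is used here
# (an assumption) because \w would also match the digits themselves.
_ALPHA_NUMERIC = re.compile(r"([^\W\d_]+)[.\-]?(\d+)")

# Hypothetical helper: split tokens such as I35 or R-66 into their
# alpha and numeric parts; return the token unchanged if it does not fit.
def split_alpha_numeric(token: str) -> list[str]:
    match = _ALPHA_NUMERIC.fullmatch(token)
    if match is None:
        return [token]
    return [match.group(1), match.group(2)]


if __name__ == "__main__":
    for tok in ["1234", "5678"]:
        print(tok, mask_digits(tok))
    for tok in ["I35", "CR123", "R-66", "RN.7", "Main"]:
        print(tok, split_alpha_numeric(tok))
```

Under these assumptions, 1234 and 5678 mask to the same string, and a token like R-66 yields a separate alpha part that a later step could expand (for example to "Route"); that expansion step is not shown here.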