17 Commits

Author | SHA1 | Message | Date
Al | 6c39c663ff | [normalize] Adding NORMALIZE_STRING_COMPOSE for NFC unicode normalization | 2016-07-21 17:04:57 -04:00
Al | 6dad58c696 | [fix][ci skip] last remaining instance of vignt in libpostal | 2016-03-29 12:51:19 -04:00
Al | afd5844f21 | [normalize] Permuting transliterators only once on the entire string rather than at each script break (so # permutations is bounded and can't get huge). Fixing some spacing issues. Adding method to check for an alpha+numeric token in normalization. | 2016-02-08 01:16:47 -05:00
Al | ff75c5cc50 | [normalize] Adding normalize_string_languages method which can use additional transliterators | 2015-12-31 03:50:36 -05:00
Al | 40918812e2 | [normalize] Adding hyphen elimination as a string option (changes tokenization) | 2015-10-27 13:32:47 -04:00
Al | f6c30778bf | [normalize] New token normalization option for replacing digits with 'D' for masking numbers e.g. when learning patterns (so 1234 and 5678 both normalize to DDDD). Shouldn't be used by libpostal API, just by the feature extractors in the machine learning models. Also adding better possessive handling. | 2015-09-23 19:41:01 -04:00
Al | 0f77ca1213 | [normalize] Adding a char_array version of normalize token | 2015-08-10 16:11:34 -04:00
Al | 9b69d1f67a | [fix] Removing C++ checks from all but the main API functions | 2015-08-07 17:15:39 -04:00
Al | 359a1efb03 | [fix] Adding stdint.h include to most of the header files for portability | 2015-08-07 02:43:44 -04:00
Al | 46141a6c36 | [normalize] Adding an option when normalizing tokens to split tokens of the form [\w]+[\.\-]?[\d]+ for cases like I35, CR123, R-66, RN.7, etc. where the alpha component is an expansion | 2015-08-02 14:34:36 -06:00
Al | 551904d202 | [normalize] cstring_array instead of string_tree for token-based normalization | 2015-07-28 19:09:50 -04:00
Al | 053b987d58 | [normalize] adding an option for string trimming in normalize | 2015-07-27 01:59:14 -04:00
Al | ee96dab93c | [fix] unnecessary headers | 2015-07-25 13:49:42 -04:00
Al | 5239c365d0 | [docs] Adding some documentation for normalize.h options | 2015-07-24 15:23:25 -04:00
Al | a38b924c5d | [fix] add_token_alternatives | 2015-07-21 17:26:59 -04:00
Al | 6ff91fef6b | [normalization] adding a normalize_string_latin method | 2015-07-05 23:38:01 -04:00
Al | 6cfbab9969 | [normalization] string normalization module for tokens and full strings | 2015-07-01 14:52:28 -04:00
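
Three of the behaviours named in the messages above (digit masking in f6c30778bf, alpha+numeric token splitting in 46141a6c36, and NFC composition in 6c39c663ff) can be illustrated with a rough Python sketch. This is not libpostal's C implementation: the function names are hypothetical, and the split pattern restricts the leading part to letters, which is an assumption narrower than the `[\w]+` quoted in the commit message.

```python
import re
import unicodedata


def mask_digits(token: str) -> str:
    """Replace every digit with 'D' (hypothetical helper, per f6c30778bf)."""
    return re.sub(r"\d", "D", token)


def split_alpha_numeric(token: str) -> list[str]:
    """Split tokens such as I35, CR123, R-66, RN.7 into alpha and numeric parts.

    Assumption: the alpha part is letters only; an optional '.' or '-'
    separator is dropped. Tokens that do not match are returned unchanged.
    """
    match = re.fullmatch(r"([^\W\d_]+)[.\-]?(\d+)", token)
    if match is None:
        return [token]
    return [match.group(1), match.group(2)]


def compose_nfc(text: str) -> str:
    """Apply Unicode NFC (canonical composition) to a string."""
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    print(mask_digits("1234"), mask_digits("5678"))
    for tok in ("I35", "CR123", "R-66", "RN.7", "Main"):
        print(tok, split_alpha_numeric(tok))
    print(compose_nfc("e\u0301") == "\u00e9")
```

Masking makes tokens that differ only in their digits identical, which is why the commit message limits it to feature extraction rather than the public API.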