libpostal

Author	SHA1	Message	Date
Al	66a71ab70d	[normalize] Need to do a Latin-ASCII transliteration even if the string is entirely ASCII since it may contain HTML escapes	2015-08-11 23:36:08 -04:00
Al	4bc6adf669	[normalize] Adding the original script as an alternative in transliteration mode as well	2015-08-10 17:48:48 -04:00
Al	0f77ca1213	[normalize] Adding a char_array version of normalize token	2015-08-10 16:11:34 -04:00
Al	46141a6c36	[normalize] Adding an option when normalizing tokens to split tokens of the form [\w]+[\.\-]?[\d]+ for cases like I35, CR123, R-66, RN.7, etc. where the alpha component is an expansion	2015-08-02 14:34:36 -06:00
Al	551904d202	[normalize] cstring_array instead of string_tree for token-based normalization	2015-07-28 19:09:50 -04:00
Al	053b987d58	[normalize] adding an option for string trimming in normalize	2015-07-27 01:59:14 -04:00
Al	a38b924c5d	[fix] add_token_alternatives	2015-07-21 17:26:59 -04:00
Al	6ff91fef6b	[normalization] adding a normalize_string_latin method	2015-07-05 23:38:01 -04:00
Al	a08d59c277	[fix] NFD normalization should be the default in normalize.c, not NFKD: NFKD does some unwanted things like converting superscripts, and the Latin-ASCII transliterator does a better, more thorough job while staying faithful to the original string	2015-07-05 15:28:07 -04:00
Al	6cfbab9969	[normalization] string normalization module for tokens and full strings	2015-07-01 14:52:28 -04:00
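
Two of the commit messages above describe behavior that is easier to see in a small example: the NFD vs. NFKD distinction (a08d59c277) and the splitting of tokens such as I35, CR123, R-66 and RN.7 (46141a6c36). The following is a minimal Python sketch, not libpostal's C implementation. The function names `compare_forms` and `split_alpha_numeric` are hypothetical, and the pattern restricts the alpha component to ASCII letters, which is an assumption narrower than the `[\w]+[\.\-]?[\d]+` form quoted in the commit message.

```python
import re
import unicodedata

# Assumption: letters only for the alpha part, so that "I35" splits
# as "I" + "35" rather than letting \w swallow leading digits.
ALPHA_NUMERIC = re.compile(r"^([A-Za-z]+)[.\-]?(\d+)$")


def compare_forms(text):
    """Return the (NFD, NFKD) normalizations of text for comparison."""
    return (
        unicodedata.normalize("NFD", text),
        unicodedata.normalize("NFKD", text),
    )


def split_alpha_numeric(token):
    """Split tokens like I35, CR123, R-66, RN.7 into alpha and numeric parts.

    Tokens that do not match the pattern are returned unchanged in a
    one-element list.
    """
    match = ALPHA_NUMERIC.match(token)
    if match is None:
        return [token]
    return [match.group(1), match.group(2)]


if __name__ == "__main__":
    nfd, nfkd = compare_forms("x\u00b2")  # "x" followed by SUPERSCRIPT TWO
    print(nfd == "x\u00b2")   # NFD keeps the superscript
    print(nfkd == "x2")       # NFKD folds it to a plain digit
    for tok in ["I35", "CR123", "R-66", "RN.7", "Main"]:
        print(tok, split_alpha_numeric(tok))
```

Splitting off the alpha component lets it be looked up on its own as a possible abbreviation, which is the motivation given in the commit message ("where the alpha component is an expansion").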