libpostal

Author	SHA1	Message	Date
Al	122a81b610	[languages] non-default languages can still be labeled from > 1 char abbreviations if there's no evidence of other languages in the string. Adding Python version of get_string_script from the C lib	2015-08-23 02:26:06 -04:00
Al	a419dad630	[languages] Adding canonical back in to language disambiguation (for prefixes/suffixes too), using non-canonicals/abbreviations in non-default languages if there are no other abbreviations found, adding in stopwords dictionaries	2015-08-23 00:43:37 -04:00
Al	a7d9cc1782	[fix] No longer using abbreviations for default languages, can be stopwords, etc.	2015-08-22 23:34:15 -04:00
Al	723058886a	[languages] Disambiguation uses language defaults, unicode normalized canonicals are treated as canonicals	2015-08-22 23:18:09 -04:00
Al	6231e17f2b	[languages] Disambiguation in language labeling better handles default languages and only uses canonical forms for non-default languages	2015-08-22 20:26:39 -04:00
Al	3902715258	[osm] Some countries like Lebanon in OSM will list the same address under two languages (French/English), which creates an unreasonable task for a linear classifier, so running disambiguation in those cases	2015-08-22 14:11:49 -04:00
Al	c5a9c392d4	[languages] Refactoring street_types_gazetteer a bit so dictionaries are configurable	2015-08-21 09:23:05 -04:00
Al	baa60aab65	[fix] language disambiguation module	2015-08-21 08:03:20 -04:00
Al	ca6d802a43	[languages] Moving language id methods into a separate package	2015-08-21 08:00:56 -04:00


59 Commits