libpostal

Author	SHA1	Message	Date
Al	c770468d03	[expansion] Regenerated address_expansion_data.c	2016-01-27 03:17:59 -05:00
Al	36f52d9707	[fix] Removing feature printing	2016-01-26 15:34:56 -05:00
Al	239f8adec6	[docs] README updates now that the Python repo is separate	2016-01-26 02:40:07 -05:00
Al	cffc7e1034	[rm] Removing Python bindings from this project, moving to https://github.com/openvenues/pypostal	2016-01-26 02:17:23 -05:00
Al	5077462754	[fix] temporary files for language classifier training	2016-01-26 01:42:21 -05:00
Al	426edccbf8	[language_classification] Simple accuracy-based test program for language classifier.	2016-01-26 01:29:56 -05:00
Al	9abbf42bf4	[language_classifier] Command-line client for language classification	2016-01-26 01:20:59 -05:00
Al	314b65e192	[build] Adding shuffle.c to language_classifier_train	2016-01-26 01:18:35 -05:00
Al	ababb8f2d0	[fix] sign comparison in regularized gradient computation for logistic regression	2016-01-26 01:16:16 -05:00
Al	ae2b839f17	[build] Adding language classifier train/test/cli programs to the build	2016-01-26 00:09:07 -05:00
Al	299998d8b5	[languages] Making Basque the only default in the Basque region.	2016-01-24 19:35:03 -05:00
Al	b4dcb83e10	[fix] sets of potential languages in case phrase matches multiple dictionaries	2016-01-24 17:57:12 -05:00
Al	b713d102d1	[languages] using whole phrase len, not first token, in disambiguation. Using single unambiguous observed default language or unambiguous observed language	2016-01-24 17:43:14 -05:00
Al	b3e730d83f	[languages] If there's a single default language, assume ambiguous abbreviations are the default	2016-01-24 17:15:02 -05:00
Al	fffaeecfc6	[languages] Only count regional defaults when returning languages	2016-01-24 16:35:14 -05:00
Al	b735c79326	[languages] Adding Spanish in as a secondary default in Spain to supplement regional language defaults so we're more careful in disambiguation	2016-01-24 16:34:23 -05:00
Al	f8a0463aa0	[languages] Language disambiguation treats the national languages as non-default	2016-01-24 15:10:04 -05:00
Al	87aff60a7e	[dictionaries] Gulch	2016-01-24 03:23:40 -05:00
Al	f04360732c	[languages] Single character cannot be sufficient to disambiguate with multiple languages (Avenue A for example)	2016-01-24 03:17:21 -05:00
Al	cb914ae85b	[dictionaries] Adding a few terms to English dictionaries for automated disambiguation in the US/Canada	2016-01-24 03:15:10 -05:00
Al	00ce71223f	[osm] Using the default probabilities for abbreviations in ways training data	2016-01-24 00:53:41 -05:00
Al	bab7a0f961	[osm] splitting streets (way names) on semicolons	2016-01-24 00:42:25 -05:00
Al	3485738c2b	[fix] regional languages in French Canada	2016-01-24 00:20:34 -05:00
Al	7646adfc0f	[osm] Adding abbreviated street names in addition to the originals	2016-01-23 23:23:58 -05:00
Al	67130383ce	[fix] converting semicolons to commas in OSM house numbers and picking one at random	2016-01-23 23:16:19 -05:00
Al	1bb797f783	[fix] spacing in phrases	2016-01-23 21:59:49 -05:00
Al	3a8c3dfcf6	[fix] spacing in phrases at end of string	2016-01-23 21:51:40 -05:00
Al	78450bfad9	[fix] Spaces in abbreviation	2016-01-23 21:36:20 -05:00
Al	308ceb5a5f	[fix] convert UTF8 slices back to unicode before using with the Python trie	2016-01-23 20:20:23 -05:00
Al	5eb6bb309b	[fix] Only adding whitespace back into tokenized strings during abbreviation if it existed in the original string	2016-01-23 20:09:45 -05:00
Al	d61207e95a	[fix] var name	2016-01-23 18:01:02 -05:00
Al	e44cba1d06	[fix] geonames db not required in OSM training data	2016-01-23 17:59:55 -05:00
Al	4f03711e60	[osm] Adding abbreviated training examples to ways language training data	2016-01-23 14:10:47 -05:00
Al	c9fb4ee69d	[osm/formatting] Dropping state more often than not, except in the US and Canada where those fields are more commonly used	2016-01-22 17:58:24 -05:00
Al	ea9bb3f2d5	[fix] Abbreviation probabilities should only apply once, not once per dictionary. Also fixing issues where some of the abbreviations were doubled	2016-01-22 15:48:21 -05:00
Al	f9f6558e06	[fix] simple whitespace field splits for the limited format training data (used for language classification)	2016-01-22 04:34:42 -05:00
Al	cd1db7b288	[fix] Making sure rare components are dropped first, adding state and country back in	2016-01-22 04:17:19 -05:00
Al	adc3a00264	[fix] var name	2016-01-22 04:10:16 -05:00
Al	261beffa36	[fix] Actually better to remove country and state from rare components and let them use the standard dropout probabilities	2016-01-22 04:00:45 -05:00
Al	a6cc3d0114	[fix] Adding state to the more frequently dropped components	2016-01-22 03:56:38 -05:00
Al	bca3dae004	[fix] state full name probabilities for limited vs. full formatted OSM training sets	2016-01-22 03:54:20 -05:00
Al	d1cf253092	[osm/formatting] Higher probability of dropout for rare components like counties, etc.	2016-01-22 03:39:35 -05:00
Al	9dd965a6fa	[fix] removing gazetteer configuration from disambiguation module	2016-01-22 03:18:18 -05:00
Al	b22646ee30	[mv] Moving gazetteers into their own module	2016-01-22 03:15:56 -05:00
Al	5a68e7aeef	[fix] import	2016-01-22 03:00:43 -05:00
Al	6ac72576bc	[osm/formatting] Randomly abbreviating street names and venue names using all the available libpostal dictionaries. Refactoring OSM formatting into separate methods which can be individually tested. Adding override for special phrases like UK	2016-01-22 02:56:39 -05:00
Al	f4995d4f0f	[languages] Adding several different types of dictionaries for name expansion/abbreviation in OSM	2016-01-22 00:51:32 -05:00
Al	89aa039692	[dictionaries] Adding some Italian month abbreviations	2016-01-21 15:12:46 -05:00
Al	26cbb1eb8d	[languages] Fixing multiple expansions in the same dictionary for Python trie, adding length for prefixes/suffixes	2016-01-21 04:29:14 -05:00
Al	0269d92e3d	[languages] Adding canonical string and dictionary type to Python trie, modifying disambiguate_languages accordingly, and adding lists of alternate forms	2016-01-21 02:30:59 -05:00

1387 Commits