libpostal

Author	SHA1	Message	Date
Al	d318db15f1	[tests] Numeric expression parsing tests	2016-01-28 16:36:20 -05:00
Al	aa4272ed9e	[tests] Transliteration tests	2016-01-28 16:36:09 -05:00
Al	f963e175e4	[tests] Expansion tests with and without language classifier	2016-01-28 16:35:32 -05:00
Al	fed599ac39	[version] bumping version to 0.3 for consistency	2016-01-28 16:34:41 -05:00
Al	87899050b2	[tests] Using greatest (https://github.com/silentbicycle/greatest) for automated testing	2016-01-28 16:31:32 -05:00
Al	0bad3adf07	[docs] Removing the coming soon label from language classification, cleaning up the README a bit	2016-01-27 14:44:48 -05:00
Al	95a7978131	[build] Adding relevant language_classifier sources to build	2016-01-27 03:34:35 -05:00
Al	93ed2bf15b	[api] Making language optional in libpostal cli	2016-01-27 03:32:29 -05:00
Al	789db8f582	[build] Adding language classifier to data file download script. As the current file is rather large, added multipart downloads from S3 to speed things up	2016-01-27 03:31:45 -05:00
Al	42d169feee	[api] Libpostal expand API will now detect language automatically using a high accuracy language classifier trained on OSM streets/addresses/toponyms. Hooray batch geocoding!	2016-01-27 03:23:51 -05:00
Al	71c51f2e45	[language_classification] Making directory optional on language_classifier client/test program	2016-01-27 03:18:53 -05:00
Al	c770468d03	[expansion] Regenerated address_expansion_data.c	2016-01-27 03:17:59 -05:00
Al	36f52d9707	[fix] Removing feature printing	2016-01-26 15:34:56 -05:00
Al	239f8adec6	[docs] README updates now that the Python repo is separate	2016-01-26 02:40:07 -05:00
Al	cffc7e1034	[rm] Removing Python bindings from this project, moving to https://github.com/openvenues/pypostal	2016-01-26 02:17:23 -05:00
Al	5077462754	[fix] temporary files for language classifier training	2016-01-26 01:42:21 -05:00
Al	426edccbf8	[language_classification] Simple accuracy-based test program for language classifier.	2016-01-26 01:29:56 -05:00
Al	9abbf42bf4	[language_classifier] Command-line client for language classification	2016-01-26 01:20:59 -05:00
Al	314b65e192	[build] Adding shuffle.c to language_classifier_train	2016-01-26 01:18:35 -05:00
Al	ababb8f2d0	[fix] sign comparison in regularized gradient computation for logistic regression	2016-01-26 01:16:16 -05:00
Al	ae2b839f17	[build] Adding language classifier train/test/cli programs to the build	2016-01-26 00:09:07 -05:00
Al	299998d8b5	[languages] Making Basque the only default in the Basque region.	2016-01-24 19:35:03 -05:00
Al	b4dcb83e10	[fix] sets of potential languages in case phrase matches multiple dictionaries	2016-01-24 17:57:12 -05:00
Al	b713d102d1	[languages] using whole phrase len, not first token, in disambiguation. Using single unambiguous observed default language or unambiguous observed language	2016-01-24 17:43:14 -05:00
Al	b3e730d83f	[languages] If there's a single default language, assume ambiguous abbreviations are the default	2016-01-24 17:15:02 -05:00
Al	fffaeecfc6	[languages] Only count regional defaults when returning languages	2016-01-24 16:35:14 -05:00
Al	b735c79326	[languages] Adding Spanish in as a secondary default in Spain to supplement regional language defaults so we're more careful in disambiguation	2016-01-24 16:34:23 -05:00
Al	f8a0463aa0	[languages] Language disambiguation treats the national languages as non-default	2016-01-24 15:10:04 -05:00
Al	87aff60a7e	[dictionaries] Gulch	2016-01-24 03:23:40 -05:00
Al	f04360732c	[languages] Single character cannot be sufficient to disambiguate with multiple languages (Avenue A for example)	2016-01-24 03:17:21 -05:00
Al	cb914ae85b	[dictionaries] Adding a few terms to English dictionaries for automated disambiguation in the US/Canada	2016-01-24 03:15:10 -05:00
Al	00ce71223f	[osm] Using the default probabilities for abbreviations in ways training data	2016-01-24 00:53:41 -05:00
Al	bab7a0f961	[osm] splitting streets (way names) on semicolons	2016-01-24 00:42:25 -05:00
Al	3485738c2b	[fix] regional languages in French Canada	2016-01-24 00:20:34 -05:00
Al	7646adfc0f	[osm] Adding abbreviated street names in addition to the originals	2016-01-23 23:23:58 -05:00
Al	67130383ce	[fix] converting semicolons to commas in OSM house numbers and picking one at random	2016-01-23 23:16:19 -05:00
Al	1bb797f783	[fix] spacing in phrases	2016-01-23 21:59:49 -05:00
Al	3a8c3dfcf6	[fix] spacing in phrases at end of string	2016-01-23 21:51:40 -05:00
Al	78450bfad9	[fix] Spaces in abbreviation	2016-01-23 21:36:20 -05:00
Al	308ceb5a5f	[fix] convert UTF8 slices back to unicode before using with the Python trie	2016-01-23 20:20:23 -05:00
Al	5eb6bb309b	[fix] Only adding whitespace back into tokenized strings during abbreviation if it existed in the original string	2016-01-23 20:09:45 -05:00
Al	d61207e95a	[fix] var name	2016-01-23 18:01:02 -05:00
Al	e44cba1d06	[fix] geonames db not required in OSM training data	2016-01-23 17:59:55 -05:00
Al	4f03711e60	[osm] Adding abbreviated training examples to ways language training data	2016-01-23 14:10:47 -05:00
Al	c9fb4ee69d	[osm/formatting] Dropping state more often than not, except in the US and Canada where those fields are more commonly used	2016-01-22 17:58:24 -05:00
Al	ea9bb3f2d5	[fix] Abbreviation probabilities should only apply once, not once per dictionary. Also fixing issues where some of the abbreviations were doubled	2016-01-22 15:48:21 -05:00
Al	f9f6558e06	[fix] simple whitespace field splits for the limited format training data (used for language classification)	2016-01-22 04:34:42 -05:00
Al	cd1db7b288	[fix] Making sure rare components are dropped first, adding state and country back in	2016-01-22 04:17:19 -05:00
Al	adc3a00264	[fix] var name	2016-01-22 04:10:16 -05:00
Al	261beffa36	[fix] Actually better to remove country and state from rare components and let them use the standard dropout probabilities	2016-01-22 04:00:45 -05:00

1398 Commits