libpostal

Author	SHA1	Message	Date
Al	20567bf9a3	[polygons] Adding full quattroshapes-backed reverse geocoder to add to OSM training data	2015-10-12 15:37:21 -05:00
Al	1b2642fe58	[polygons] Adding ability to specify include properties by filename	2015-10-12 15:36:24 -05:00
Al	151161cab3	[fix] Raising error in geonames output if a country cannot be localized	2015-10-07 03:45:56 -04:00
Al	1917816b80	[countries] Not relying on pycountry alpha 2 codes for localized country names as it doesn't contain Kosovo which was causing problems	2015-10-07 03:44:49 -04:00
Al	cfa57c96a3	[fix] untagged formatted addresses	2015-10-04 02:02:59 -04:00
Al	5d2a24872a	[osm] Adding dependencies so single street names are not valid without at least one of {house, number, suburb, city, postcode}	2015-10-03 15:22:26 -04:00
Al	77be2fe433	[osm] Adjusting priors for country code expansion	2015-10-03 15:13:16 -04:00
Al	0b98a26426	[fix] keeping name tag in address components	2015-10-03 15:10:14 -04:00
Al	0f9ad259dc	[osm] Doing initial formatting after replacing country/state	2015-10-03 14:40:38 -04:00
Al	71233c9c02	[fix] import, initialization	2015-10-03 14:37:08 -04:00
Al	85b17d9b27	[fix] file encoding	2015-10-03 14:34:29 -04:00
Al	1948aa87ea	[fix] typo	2015-10-03 14:33:45 -04:00
Al	22efce7337	[osm/parsing] Randomly replacing country codes with local and foreign language expansions as well as randomly expanding state abbreviations to make parser more robust to different input	2015-10-03 14:31:51 -04:00
Al	8920812055	[expansion] Adding state abbreviations for US, Canada and Australia for expansion while generating OSM training data	2015-10-03 14:25:30 -04:00
Al	7eb18f3538	[languages] Function to sample a random language from a discrete distribution (e.g. languages on the Internet, languages in a country, etc.)	2015-10-03 13:20:23 -04:00
Al	db71b65412	[fix] checking validity of component combination	2015-10-02 20:28:45 -04:00
Al	a2fd6e25f8	[fix] import	2015-10-02 20:25:48 -04:00
Al	49abb70b59	[fix] dictionary	2015-10-02 20:24:21 -04:00
Al	521f33d892	[fix] bitset for address components, only looking at valid component keys	2015-10-02 20:21:59 -04:00
Al	528285f735	[fix] only OSM tagged addresses need extra logic	2015-10-02 20:18:30 -04:00
Al	83aecb9f2c	[osm/parsing] Making tagged training data for address parser more robust to the types of partial input we see in geocoding by randomly eliminating components subject to some constraints (e.g. house number cannot be used without a street name)	2015-10-02 19:54:28 -04:00
Al	c790a2b87f	[fix] spoken/official	2015-10-02 19:50:11 -04:00
Al	db3364be30	[geonames] Using official country languages in GeoNames	2015-10-01 02:21:14 -04:00
Al	7dfbcce9ec	[languages] options for get_country_languages	2015-09-30 04:09:07 -04:00
Al	86e9166ae8	[doc] documentation for country_names module, fixing variable name	2015-09-30 03:08:04 -04:00
Al	42e77cb570	[countries] Making country official names align better with OSM/Wikipedia, plugging holes	2015-09-30 01:03:03 -04:00
Al	40cf247655	[formatting] Constants for field names, a few options in format_address	2015-09-29 23:03:37 -04:00
Al	22e8178a97	[countries] Adding module for getting official country names in every language from CLDR + a dictionary of local language names	2015-09-29 21:10:38 -04:00
Al	daad1a1313	[geonames] Removing alternate names from geonames data set which are digits-only (most are not legitimate)	2015-09-28 17:46:53 -04:00
Al	f29f2f091b	[fix] PEBCAK	2015-09-27 22:49:27 -04:00
Al	93b3110a49	[fix] only commas and hyphens need to be eliminated at the end of phrases in untagged address formatting	2015-09-27 19:25:34 -04:00
Al	d3bfaf6b43	[osm/formatting] Fixing formatting tagged addresses with comma separated fields	2015-09-27 03:19:23 -04:00
Al	d512201e2c	[fix] removing space from tokens in address formatting	2015-09-27 02:18:34 -04:00
Al	5b829cd5a7	[fix] blank values containing punctuation in formatting	2015-09-26 21:49:28 -04:00
Al	dac0440be8	[fix] rsplit	2015-09-26 21:07:54 -04:00
Al	ae93552455	[osm/formatting] Moving back to openvenues repo pending resolution of the Turkish address issue	2015-09-26 03:56:52 -04:00
Al	0c792a2cc3	[osm/formatting] Changing the way the formatter eliminates inter-component separators, changing repo back to OpenCageData after pull request merge	2015-09-26 03:21:26 -04:00
Al	5417b4e602	[unicode] Downloading latest UnicodeData.txt instead of using builtin Python module (out of date) e.g. for getting unicode codepoint categories	2015-09-25 23:59:38 -04:00
Al	8fe791a14a	[fix] ensure_dir in file downloads	2015-09-25 17:05:22 -04:00
Al	646b9f7248	[osm/formatting] Continuing to use openvenues formatter for the India fix	2015-09-25 13:36:24 -04:00
Al	9901dd2aac	[fix] Switching address formatter back to OpenCageData repo	2015-09-24 18:42:17 -04:00
Al	3ce1669c30	[fix] import	2015-09-24 01:25:00 -04:00
Al	c85ce0b11d	[osm/formatting] Tagging separators as well in tagged output of the address formatter	2015-09-24 01:22:49 -04:00
Al	abfb1d4a60	[transliteration] Wide char support in transliteration data generator	2015-09-23 03:56:12 -04:00
Al	7e057b0fb8	[utils] basic functions for wide char support for narrow Python builds (unichr, ord, unicode iteration)	2015-09-23 00:42:54 -04:00
Al	8562c7a5cb	[unicode] Adding wide char support for language disambiguation (comes up in venue names), despite the likelihood of running on a narrow Python build. Rolling back common script chars at a script break, so in the case of e.g. Cyrillic name (Latin name), the segmentation is done at the space before the paren.	2015-09-23 00:37:59 -04:00
Al	13bcc35523	[unicode] Allowing wide chars in unicode properties	2015-09-23 00:34:07 -04:00
Al	b4593b6f88	[unicode/tokenization] Using new character classes including wide chars in scanner	2015-09-23 00:33:14 -04:00
Al	a76831df7a	[unicode] Wide version of word breaks	2015-09-22 18:55:33 -04:00
Al	25917cfb17	[fix] scripts	2015-09-22 15:15:30 -04:00

280 Commits