Commit Graph

144 Commits

Author SHA1 Message Date
Al 8abbb273b2 [osm] adding the excellent ftfy (https://github.com/LuminosoInsight/python-ftfy) to fix Mojibake, etc. in address components 2016-12-26 21:18:14 -05:00
Al 151287856d [openaddresses] fixing regexes for house number validation 2016-12-23 01:18:46 -05:00
Al 043dafc12a [openaddresses] add osm_neighborhood_overrides_city option for some countries that list what-we-otherwise-think-are-suburbs as the city 2016-12-22 17:50:21 -05:00
Al 7d195ca331 [fix] not allowing postal codes to pass validation if they are simply float zero 2016-12-22 02:59:54 -05:00
Al cc4098fb05 [openaddresses] abbreviate states as well in OpenAddresses when full version is specified 2016-12-20 17:24:12 -05:00
Al 9e44fcb2bb [addresses] abbreviating neighborhoods/city_districts 2016-12-20 03:01:34 -05:00
Al 56ca37d1f3 [fix] openaddresses config reading 2016-12-19 02:18:24 -05:00
Al 86a8315b9d [openaddresses] adding new config option to OA config for aliasing fields based on a regex 2016-12-18 01:50:58 -05:00
Al 3c6ed7489c [openaddresses] adding regex replacement to remove "*" from any field 2016-12-16 17:09:41 -05:00
Al ba96f68b62 [fix] openaddresses formatter 2016-12-16 14:22:15 -05:00
Al da3240d5f6 [openaddresses] making field maps in OpenAddresses config a dictionary rather than a list to make inheritance easier 2016-12-16 06:54:36 -05:00
Al 83aab5a46a [openaddresses] adding option to map values for a particular field 2016-12-16 06:44:19 -05:00
Al 5098599ed6 [addresses] remove Quattroshapes/GeoNames cities as they may have problematic names, and in any case we have point-based cities from OSM now 2016-12-10 02:08:40 -05:00
Al e92963de50 [openaddresses] adding new counties from OpenAddresses, strip commas option for thousands separators 2016-12-09 01:57:21 -05:00
Al 3ff472c8cf [openaddresses] fixing house numbers with multiple consecutive hyphens 2016-12-06 22:50:14 -05:00
Al da36b71829 [addresses] adding new places index in OSM and OpenAddresses training data 2016-12-05 18:36:17 -05:00
Al cdbc102821 [boundaries] in addition to population, check if a city has an unambiguous Wikipedia 2016-11-25 13:36:49 -08:00
Al 87634a36e1 [openaddresses] for cases where city populations are not known (i.e. not getting boundaries from OSM, most of the sources in OpenAddresses), place-only records should have at least two identifying components. Helps when city names, etc. are highly ambiguous and need to be qualified 2016-11-25 00:56:38 -08:00
Al e07c74f077 [fix] config 2016-11-24 03:57:52 -05:00
Al 46b7043dc7 [fix] typo 2016-11-24 03:50:11 -05:00
Al fcf4717335 [openaddresses] adding city_replacements handling to OA formatter 2016-11-23 20:16:48 -05:00
Al 5cabd9b4f7 [fix] country languages in OpenAddresses 2016-10-24 17:35:39 -04:00
Al 35d3d8cc73 [openaddresses] countries are known a priori, so if the boundaries don't quite line up with OSM, use the country from the path 2016-10-23 19:50:54 -04:00
Al 1658c425c5 [fix] clear country cache only at each new country, not each file 2016-10-23 00:57:52 -04:00
Al 7199ff17e0 [fix] truncate postcodes that are longer than specified length 2016-10-23 00:52:24 -04:00
Al 889e914dfc [openaddresses] clear all polygon caches 2016-10-23 00:11:54 -04:00
Al 63edd53fb3 [openaddresses] adding clear_cache method to clear the LRU cache for point-in-polygon indices and using it in OpenAddresses import since it heavily reuses polygons and only for the current file 2016-10-22 20:28:59 -04:00
Al 2a355b2cf8 [openaddresses] adding address only 10% of the time in OpenAddresses 2016-10-20 23:57:30 -04:00
Al d965ea9371 [openaddresses] adding hyphenation/dehyphenation to the OpenAddresses formatter 2016-10-20 20:55:17 -04:00
Al ecd71ee10d [fix] var name 2016-10-06 15:36:51 -04:00
Al 6b0186782d [openaddresses] doing country-specific cleanups in OpenAddresses 2016-10-05 17:07:29 -04:00
Al 432f9dd42e [fix] format of candidate_languages in the new OSM rtree 2016-10-05 03:12:07 -04:00
Al faf418decb [languages] using country_and_languages method in OSM, neighborhoods and OpenAddresses 2016-10-05 02:49:55 -04:00
Al ad6ddd1ede [fix] var names 2016-10-04 14:35:45 -04:00
Al 373708b595 [openaddresses] replace name affixes (remove things like "city of"), prune duplicate names, remove numeric boundary names, cleanup boundary names, and add house number + postcode phrases where appropriate 2016-09-22 00:57:11 -04:00
Al d667039397 [openaddresses] for configs with add_osm_boundaries=true, skip adding boundary fields from the OA file altogether when they're specified 2016-09-16 01:55:36 -04:00
Al 95cf6ad0fa [fix] default again 2016-09-16 01:11:59 -04:00
Al d5a5104de9 [fix] default 2016-09-16 01:10:19 -04:00
Al 32ad1d7bd0 [fix] var name 2016-09-16 01:07:10 -04:00
Al b618d1eaf2 [fix] var name 2016-09-16 01:02:47 -04:00
Al 9b250a9393 [openaddresses] adding zero-padding option for postcodes and using in Puerto Rico 2016-09-15 11:22:55 -04:00
Al e8408d39fd [fix] unzip_file checks status code 2016-09-12 16:42:02 -04:00
Al 551cce8cb1 [fix] making a separate gazetteer for toponym abbreviations 2016-09-10 01:08:58 -04:00
Al bcde9e2fe7 [fix] toponym abbreviations after country name, may want to use it 2016-09-10 00:49:31 -04:00
Al bbc5131cb6 [fix] toponym abbreviations 2016-09-10 00:48:31 -04:00
Al 19a044f7f3 [fix] imports 2016-09-10 00:09:11 -04:00
Al ae02b0769d [openaddresses] abbreviating boundary components for OpenAddresses 2016-09-10 00:04:11 -04:00
Al 5d26ab41e7 [openaddresses] removing OpenAddresses hacks now that upstream changes are merged 2016-09-09 09:40:45 -04:00
Al 4c6bcda3b2 [fix] config 2016-09-08 15:21:19 -04:00
Al d1e3c6a24a [openaddresses] adding Italy countrywide to a pre_release_downloads set so it can be used in libpostal without having been merged yet 2016-09-08 15:16:35 -04:00