Commit Graph

1542 Commits

Author SHA1 Message Date
Al be6c01f5fd [fix] csv 2016-08-31 17:45:04 -04:00
Al d3da513375 [fix] import 2016-08-31 17:44:16 -04:00
Al 4ed362d5f8 [openaddresses] adding script option to download all completed OA files instead of just what's in the config 2016-08-31 17:43:07 -04:00
Al e4e35d0593 [osm] adding no_global_overrides option for boundary configs 2016-08-30 12:44:24 -04:00
Al e98cf67f0e [openaddresses] also allowing house numbers like "37/A" 2016-08-29 22:56:36 -04:00
Al 78a210c409 [openaddresses] replacing backticks with apostrophe, comes up in several countries 2016-08-29 21:42:10 -04:00
Al 3f5b3dcb1d [openaddresses] Allowing slashes in house numbers in OpenAddresses 2016-08-29 21:26:33 -04:00
Al ebb34bcc2f [openaddresses] config option to skip rows missing specific fields 2016-08-29 19:19:32 -04:00
Al 9b9036243c [fix] overwrite on unzip, logging 2016-08-29 00:40:11 -04:00
Al 5b5af04a44 [fix] redundant line 2016-08-29 00:37:17 -04:00
Al 6284ec39db [fix] name 2016-08-29 00:36:45 -04:00
Al 75ece5f5e9 [fix] import 2016-08-29 00:36:22 -04:00
Al f5b2b6327e [openaddresses] Using a download script to download the individual OA files of interest rather than the collected file with expansions applied 2016-08-29 00:34:39 -04:00
Al 4d36e2553a [utils] Using curl with redirects and retries for download_file 2016-08-29 00:32:29 -04:00
Al a0cf6ff225 [openaddresses] Allowing house numbers like "11 C" 2016-08-28 19:11:41 -04:00
Al ac403bbe49 [openaddresses] Adding sin numero validator (sem numero in this case) for Portuguese 2016-08-28 18:39:19 -04:00
Al 27c5c8536a [openaddresses] adding debug argument to OpenAddresses training data 2016-08-28 17:58:41 -04:00
Al 6740e5a1c6 [fix] var name 2016-08-28 17:55:10 -04:00
Al 7ea47126ba [fix] logging 2016-08-28 15:54:55 -04:00
Al a58194ca2e [fix] add_admin_boundaries and adding cleaned up house number 2016-08-28 15:15:57 -04:00
Al bae04eb543 [fix] int 2016-08-28 14:11:25 -04:00
Al de0a7bfe4f [fix] /or/and/ 2016-08-28 14:09:30 -04:00
Al 51590825ee [fix] do component dropout anyway 2016-08-28 14:07:49 -04:00
Al 44e59e8daf [fix] return the original for already abbreviated tokens 2016-08-28 14:05:58 -04:00
Al f69e63e311 [openaddresses] Place component dropout. Obtain population from OSM components when we have them but otherwise assume it's actually 0 (not unknown), that way the more conservative probabilities will be used i.e. state names will be included more often rather than unqualified cities 2016-08-28 13:59:28 -04:00
Al dea5fbbf2e [logging] printing off filenames in constructing OpenAddresses training data 2016-08-28 12:11:53 -04:00
Al 3cf3e401db [fix] abbreviation recasing 2016-08-28 12:04:36 -04:00
Al 3da80b0706 [fix] typo 2016-08-28 11:55:40 -04:00
Al aa62b8e8b4 [fix] indentation 2016-08-28 11:48:27 -04:00
Al b8b1ac1261 [openaddresses] Handling validation after cleanup, adding per-field regex replacements 2016-08-28 11:47:30 -04:00
Al 3ae7a15960 [openaddresses] Adding a few special cases for Spanish. Rewrite simple numeric street names to include the oft-omitted Calle (e.g. 27 => Calle 27), which is uniformly omitted in the Spanish-language data in OpenAddresses while still being valid for grid-based cities like Mérida. Humans and signs usually add Calle for numeric streets while it may be omitted for named streets 2016-08-27 15:03:23 -04:00
Al 15f9817933 [openaddresses] Replacing number sign in house number 2016-08-27 02:42:06 -04:00
Al 01ac1371b5 [openaddresses] Cleaning up house numbers as well, which can sometimes be stored as floats 2016-08-27 01:50:05 -04:00
Al 4ed394cc1c [openaddresses] Omitting fields with the value "unknown" 2016-08-27 00:46:21 -04:00
Al 6723fff9b4 [fix] unit phrases 2016-08-27 00:23:51 -04:00
Al d29e4f3b2e [openaddresses] Adding optional hyphen between unit number 2016-08-26 23:46:19 -04:00
Al 8c6a4c763c [openaddresses] Increasing limit to 3 characters for unit abbreviations in case anything clashes (not a huge issue if a few units are tacked on, but this seems more common in OpenAddresses than OSM) 2016-08-26 23:43:53 -04:00
Al 12d429b63d [openaddresses] Simple regex-based method to strip unit phrases tacked onto the end of a street 2016-08-26 22:39:13 -04:00
Al 318ad2a0c4 [openaddresses] Removing <Null> tag from values in OpenAddresses, seeing it in Colorado county files 2016-08-26 21:42:00 -04:00
Al 0f9e8ee95d [openaddresses] Better handling of float postcodes 2016-08-26 20:16:04 -04:00
Al 56329439af [openaddresses] some postcodes in OpenAddresses are stored as floats, convert to int and then to string if that's the case 2016-08-26 19:12:48 -04:00
Al 2b9d58dcbe [openaddresses] Ignoring fields with null-like values as well (there appear to be no valid places named Null or None...yet) 2016-08-26 15:49:36 -04:00
Al 2654683af4 [openaddresses] Adding quick-and-dirty regex-based exclusion list for fields containing various patterns in OpenAddresses, to be used sparingly 2016-08-26 15:35:51 -04:00
Al 4e9f9e8957 [openaddresses] Replace multiple spaces with single space 2016-08-26 12:45:49 -04:00
Al 9e89147c83 [openaddresses] removing spaces in numeric ranges in OpenAddresses, sometimes see things like '12 -23' 2016-08-26 12:30:15 -04:00
Al 3b2c86d240 [fix] strip values in OpenAddresses components 2016-08-26 10:24:34 -04:00
Al b2f8180d19 [openaddresses] Ignore any fields in OpenAddresses which have N/A as a value 2016-08-25 23:58:38 -04:00
Al c23a7a4030 [openaddresses] Ditto for numeric boundary names 2016-08-25 22:58:52 -04:00
Al 34b01e203d [openaddresses] Don't allow single-letter boundary names as they're probably just typos 2016-08-25 22:58:26 -04:00
Al 859868aea2 [openaddresses] Adding option to strip non-digits from postcode, addresses with a postcode and no house_number+street may still be useful, keeping them around as place queries to help with postcode contexts 2016-08-25 16:36:18 -04:00
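
Several of the [openaddresses] commits listed above describe small field-cleanup rules (null-like values, float-typed postcodes and house numbers, collapsed whitespace, spaces in numeric ranges, the "Calle" prefix for numeric Spanish street names). The following is a minimal Python sketch of how such rules might look, based only on the commit messages. The function names, the NULL_LIKE set, and the regexes are hypothetical and are not taken from the repository's code.

```python
import re

# Assumption: values the commit messages describe as null-like or to be omitted.
NULL_LIKE = {"", "null", "<null>", "none", "n/a", "unknown"}


def _float_to_int(value):
    """Turn a float such as 90210.0 into the int 90210; leave other values alone."""
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value


def clean_value(value):
    """Return a cleaned string, or None if the field should be omitted."""
    if value is None:
        return None
    # Backticks appear in place of apostrophes in some sources.
    text = str(value).replace("`", "'")
    # Collapse runs of whitespace to one space and strip both ends.
    text = re.sub(r"\s+", " ", text).strip()
    if text.lower() in NULL_LIKE:
        return None
    return text


def clean_postcode(value):
    """Postcodes stored as floats are converted to int, then to string."""
    return clean_value(_float_to_int(value))


def clean_house_number(value):
    """Handle float house numbers, number signs, and spaces in numeric ranges."""
    text = clean_value(_float_to_int(value))
    if text is None:
        return None
    # Drop the number sign, e.g. "# 5" -> "5".
    text = text.replace("#", "").strip()
    # Remove spaces around the hyphen in numeric ranges, e.g. "12 -23" -> "12-23".
    text = re.sub(r"(\d)\s*-\s*(\d)", r"\1-\2", text)
    return text or None


def add_calle(street):
    """Prefix purely numeric street names with "Calle" (e.g. 27 => Calle 27)."""
    if re.fullmatch(r"\d+", street):
        return "Calle " + street
    return street
```

For instance, under these assumptions clean_house_number("12 -23") yields "12-23" and add_calle("27") yields "Calle 27", while a street that already has a name, such as "Calle 60", is returned unchanged.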