1520 Commits

Author SHA1 Message Date
Al 6284ec39db [fix] name 2016-08-29 00:36:45 -04:00
Al 75ece5f5e9 [fix] import 2016-08-29 00:36:22 -04:00
Al f5b2b6327e [openaddresses] Using a download script to download the individual OA files of interest rather than the collected file with expansions applied 2016-08-29 00:34:39 -04:00
Al 4d36e2553a [utils] Using curl with redirects and retries for download_file 2016-08-29 00:32:29 -04:00
Al a0cf6ff225 [openaddresses] Allowing house numbers like "11 C" 2016-08-28 19:11:41 -04:00
Al ac403bbe49 [openaddresses] Adding sin numero validator (sem numero in this case) for Portuguese 2016-08-28 18:39:19 -04:00
Al 27c5c8536a [openaddresses] adding debug argument to OpenAddresses training data 2016-08-28 17:58:41 -04:00
Al 6740e5a1c6 [fix] var name 2016-08-28 17:55:10 -04:00
Al 7ea47126ba [fix] logging 2016-08-28 15:54:55 -04:00
Al a58194ca2e [fix] add_admin_boundaries and adding cleaned up house number 2016-08-28 15:15:57 -04:00
Al bae04eb543 [fix] int 2016-08-28 14:11:25 -04:00
Al de0a7bfe4f [fix] /or/and/ 2016-08-28 14:09:30 -04:00
Al 51590825ee [fix] do component dropout anyway 2016-08-28 14:07:49 -04:00
Al 44e59e8daf [fix] return the original for already abbreviated tokens 2016-08-28 14:05:58 -04:00
Al f69e63e311 [openaddresses] Place component dropout. Obtain population from OSM components when we have them but otherwise assume it's actually 0 (not unknown), that way the more conservative probabilities will be used i.e. state names will be included more often rather than unqualified cities 2016-08-28 13:59:28 -04:00
Al dea5fbbf2e [logging] printing off filenames in constructing OpenAddresses training data 2016-08-28 12:11:53 -04:00
Al 3cf3e401db [fix] abbreviation recasing 2016-08-28 12:04:36 -04:00
Al 3da80b0706 [fix] typo 2016-08-28 11:55:40 -04:00
Al aa62b8e8b4 [fix] indentation 2016-08-28 11:48:27 -04:00
Al b8b1ac1261 [openaddresses] Handling validation after cleanup, adding per-field regex replacements 2016-08-28 11:47:30 -04:00
Al 3ae7a15960 [openaddresses] Adding a few special cases for Spanish. Rewrite simple numeric street names to include the oft-omitted Calle (e.g. 27 => Calle 27), which is uniformly omitted in the Spanish-language data in OpenAddresses while still being valid for grid-based cities like Mérida. Humans and signs usually add Calle for numeric streets while it may be omitted for named streets 2016-08-27 15:03:23 -04:00
Al 15f9817933 [openaddresses] Replacing number sign in house number 2016-08-27 02:42:06 -04:00
Al 01ac1371b5 [openaddresses] Cleaning up house numbers as well, which can sometimes be stored as floats 2016-08-27 01:50:05 -04:00
Al 4ed394cc1c [openaddresses] Omitting fields with the value "unknown" 2016-08-27 00:46:21 -04:00
Al 6723fff9b4 [fix] unit phrases 2016-08-27 00:23:51 -04:00
Al d29e4f3b2e [openaddresses] Adding optional hyphen between unit number 2016-08-26 23:46:19 -04:00
Al 8c6a4c763c [openaddresses] Increasing limit to 3 characters for unit abbreviations in case anything clashes (not a huge issue if a few units are tacked on, but this seems more common in OpenAddresses than OSM) 2016-08-26 23:43:53 -04:00
Al 12d429b63d [openaddresses] Simple regex-based method to strip unit phrases tacked onto the end of a street 2016-08-26 22:39:13 -04:00
Al 318ad2a0c4 [openaddresses] Removing <Null> tag from values in OpenAddresses, seeing it in Colorado county files 2016-08-26 21:42:00 -04:00
Al 0f9e8ee95d [openaddresses] Better handling of float postcodes 2016-08-26 20:16:04 -04:00
Al 56329439af [openaddresses] some postcodes in OpenAddresses are stored as floats, convert to int and then to string if that's the case 2016-08-26 19:12:48 -04:00
Al 2b9d58dcbe [openaddresses] Ignoring fields with null-like values as well (there appear to be no valid places named Null or None...yet) 2016-08-26 15:49:36 -04:00
Al 2654683af4 [openaddresses] Adding quick-and-dirty regex-based exclusion list for fields containing various patterns in OpenAddresses, to be used sparingly 2016-08-26 15:35:51 -04:00
Al 4e9f9e8957 [openaddresses] Replace multiple spaces with single space 2016-08-26 12:45:49 -04:00
Al 9e89147c83 [openaddresses] removing spaces in numeric ranges in OpenAddresses, sometimes see things like '12 -23' 2016-08-26 12:30:15 -04:00
Al 3b2c86d240 [fix] strip values in OpenAddresses components 2016-08-26 10:24:34 -04:00
Al b2f8180d19 [openaddresses] Ignore any fields in OpenAddresses which have N/A as a value 2016-08-25 23:58:38 -04:00
Al c23a7a4030 [openaddresses] Ditto for numeric boundary names 2016-08-25 22:58:52 -04:00
Al 34b01e203d [openaddresses] Don't allow single-letter boundary names as they're probably just typos 2016-08-25 22:58:26 -04:00
Al 859868aea2 [openaddresses] Adding option to strip non-digits from postcode, addresses with a postcode and no house_number+street may still be useful, keeping them around as place queries to help with postcode contexts 2016-08-25 16:36:18 -04:00
Al da619e3cf4 [osm] Adding border_type=city to override tags 2016-08-25 15:21:33 -04:00
Al dd0ca5e008 [addresses] Adding admin_center properties to place components in add_admin_boundaries (only overriding for specified areas where the boundary may otherwise not have all the properties) 2016-08-25 01:20:06 -04:00
Al 2e7f8f1ae7 [abbreviations] Adding toponyms gazetteer for probabilistically abbreviating things like Mount=>Mt, Saint=>St, Fort=>Ft in place names 2016-08-24 18:52:00 -04:00
Al dfa5c8e0a6 [abbreviations] Adding ability to abbreviate within hyphenated phrases e.g. Sint-Maarten => St.-Maarten 2016-08-24 18:50:24 -04:00
Al a6dad74a2b [openaddresses] cleaning comma-delimited boundary components in OpenAddresses data sets 2016-08-24 15:06:04 -04:00
Al d250f58293 [openaddresses] Also skipping addresses where street == unit 2016-08-24 14:10:41 -04:00
Al 7c3ad708d8 [openaddresses] Ensuring integer house numbers are > 0, street is not simply a numeric token (usually a copy of the house number) and that street != house number generally 2016-08-24 13:46:56 -04:00
Al b7c600e496 [openaddresses] adding numeric_postcodes_only and add_osm_neighborhoods options 2016-08-23 02:11:21 -04:00
Al ed0b49884e [openaddresses] Changes to OA config utilizing some of the new cleanup options. Adding language to brussels-fr and brussels-nl, adding New York and New Jersey statewide with the understanding that OSM components will be added in NJ and postcodes will be stripped of letters in NY 2016-08-23 00:38:43 -04:00
Al 8ec288d8f8 [openaddresses] Adding ability to specify language of a particular OpenAddresses CSV a priori. Unless otherwise specified, non-numeric unit fields will be discarded and phrases will be added randomly for numeric unit fields. 2016-08-23 00:29:09 -04:00
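
Several of the [openaddresses] commits listed above describe small field cleanup rules: dropping null-like values, collapsing repeated spaces, turning float-encoded postcodes and house numbers into integer strings, tightening numeric ranges such as '12 -23', and prefixing bare numeric Spanish street names with Calle. The sketch below is only an illustration of those rules under stated assumptions. It is not the repository's code, and every name in it (`NULL_LIKE`, `clean_value`, `clean_number`, `add_calle`) is hypothetical, as is the exact set of null-like strings.

```python
import re

# Assumed set of null-like markers, taken from the commit messages
# ("unknown", "N/A", "<Null>", "Null", "None"); the real list may differ.
NULL_LIKE = {"", "null", "<null>", "none", "n/a", "unknown"}


def clean_value(value):
    """Strip a field, collapse runs of whitespace, drop null-like values."""
    if value is None:
        return None
    value = re.sub(r"\s+", " ", str(value)).strip()
    if value.lower() in NULL_LIKE:
        return None
    return value


def clean_number(value):
    """Clean a postcode or house number field (hypothetical helper)."""
    value = clean_value(value)
    if value is None:
        return None
    # Values stored as floats, e.g. 90210.0, become the integer string.
    match = re.fullmatch(r"(\d+)\.0+", value)
    if match:
        return match.group(1)
    # Remove spaces around the hyphen in numeric ranges: '12 -23' -> '12-23'.
    return re.sub(r"(\d)\s*-\s*(\d)", r"\1-\2", value)


def add_calle(street):
    """Prefix a purely numeric Spanish street name with 'Calle'."""
    if re.fullmatch(r"\d+", street):
        return "Calle " + street
    return street
```

For example, `clean_number(90210.0)` yields `"90210"`, `clean_value("N/A")` yields `None`, and `add_calle("27")` yields `"Calle 27"`, while a house number like "11 C" passes through unchanged.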