1666 Commits

Author SHA1 Message Date
Al 3cf3e401db [fix] abbreviation recasing 2016-08-28 12:04:36 -04:00
Al 3da80b0706 [fix] typo 2016-08-28 11:55:40 -04:00
Al aa62b8e8b4 [fix] indentation 2016-08-28 11:48:27 -04:00
Al b8b1ac1261 [openaddresses] Handling validation after cleanup, adding per-field regex replacements 2016-08-28 11:47:30 -04:00
Al 3ae7a15960 [openaddresses] Adding a few special cases for Spanish. Rewrite simple numeric street names to include the oft-omitted Calle (e.g. 27 => Calle 27), which is uniformly omitted in the Spanish-language data in OpenAddresses while still being valid for grid-based cities like Mérida. Humans and signs usually add Calle for numeric streets while it may be omitted for named streets 2016-08-27 15:03:23 -04:00
Al 15f9817933 [openaddresses] Replacing number sign in house number 2016-08-27 02:42:06 -04:00
Al 01ac1371b5 [openaddresses] Cleaning up house numbers as well, which can sometimes be stored as floats 2016-08-27 01:50:05 -04:00
Al 4ed394cc1c [openaddresses] Omitting fields with the value "unknown" 2016-08-27 00:46:21 -04:00
Al 6723fff9b4 [fix] unit phrases 2016-08-27 00:23:51 -04:00
Al d29e4f3b2e [openaddresses] Adding optional hyphen between unit number 2016-08-26 23:46:19 -04:00
Al 8c6a4c763c [openaddresses] Increasing limit to 3 characters for unit abbreviations in case anything clashes (not a huge issue if a few units are tacked on, but this seems more common in OpenAddresses than OSM) 2016-08-26 23:43:53 -04:00
Al 12d429b63d [openaddresses] Simple regex-based method to strip unit phrases tacked onto the end of a street 2016-08-26 22:39:13 -04:00
Al 318ad2a0c4 [openaddresses] Removing <Null> tag from values in OpenAddresses, seeing it in Colorado county files 2016-08-26 21:42:00 -04:00
Al 0f9e8ee95d [openaddresses] Better handling of float postcodes 2016-08-26 20:16:04 -04:00
Al 56329439af [openaddresses] some postcodes in OpenAddresses are stored as floats, convert to int and then to string if that's the case 2016-08-26 19:12:48 -04:00
Al 2b9d58dcbe [openaddresses] Ignoring fields with null-like values as well (there appear to be no valid places named Null or None...yet) 2016-08-26 15:49:36 -04:00
Al 2654683af4 [openaddresses] Adding quick-and-dirty regex-based exclusion list for fields containing various patterns in OpenAddresses, to be used sparingly 2016-08-26 15:35:51 -04:00
Al 4e9f9e8957 [openaddresses] Replace multiple spaces with single space 2016-08-26 12:45:49 -04:00
Al 9e89147c83 [openaddresses] removing spaces in numeric ranges in OpenAddresses, sometimes see things like '12 -23' 2016-08-26 12:30:15 -04:00
Al 3b2c86d240 [fix] strip values in OpenAddresses components 2016-08-26 10:24:34 -04:00
Al b2f8180d19 [openaddresses] Ignore any fields in OpenAddresses which have N/A as a value 2016-08-25 23:58:38 -04:00
Al c23a7a4030 [openaddresses] Ditto for numeric boundary names 2016-08-25 22:58:52 -04:00
Al 34b01e203d [openaddresses] Don't allow single-letter boundary names as they're probably just typos 2016-08-25 22:58:26 -04:00
Al 859868aea2 [openaddresses] Adding option to strip non-digits from postcode, addresses with a postcode and no house_number+street may still be useful, keeping them around as place queries to help with postcode contexts 2016-08-25 16:36:18 -04:00
Al da619e3cf4 [osm] Adding border_type=city to override tags 2016-08-25 15:21:33 -04:00
Al dd0ca5e008 [addresses] Adding admin_center properties to place components in add_admin_boundaries (only overriding for specified areas where the boundary may otherwise not have all the properties) 2016-08-25 01:20:06 -04:00
Al 2e7f8f1ae7 [abbreviations] Adding toponyms gazetteer for probabilistically abbreviating things like Mount=>Mt, Saint=>St, Fort=>Ft in place names 2016-08-24 18:52:00 -04:00
Al dfa5c8e0a6 [abbreviations] Adding ability to abbreviate within hyphenated phrases e.g. Sint-Maarten => St.-Maarten 2016-08-24 18:50:24 -04:00
Al a6dad74a2b [openaddresses] cleaning comma-delimited boundary components in OpenAddresses data sets 2016-08-24 15:06:04 -04:00
Al d250f58293 [openaddresses] Also skipping addresses where street == unit 2016-08-24 14:10:41 -04:00
Al 7c3ad708d8 [openaddresses] Ensuring integer house numbers are > 0, street is not simply a numeric token (usually a copy of the house number) and that street != house number generally 2016-08-24 13:46:56 -04:00
Al b7c600e496 [openaddresses] adding numeric_postcodes_only and add_osm_neighborhoods options 2016-08-23 02:11:21 -04:00
Al ed0b49884e [openaddresses] Changes to OA config utilizing some of the new cleanup options. Adding language to brussels-fr and brussels-nl, adding New York and New Jersey statewide with the understanding that OSM components will be added in NJ and postcodes will be stripped of letters in NY 2016-08-23 00:38:43 -04:00
Al 8ec288d8f8 [openaddresses] Adding ability to specify language of a particular OpenAddresses CSV a priori. Unless otherwise specified, non-numeric unit fields will be discarded and phrases will be added randomly for numeric unit fields. 2016-08-23 00:29:09 -04:00
Al 99f71b718f [openaddresses] New command-line arguments to OpenAddresses training data script 2016-08-22 22:12:47 -04:00
Al 23be122d2e [openaddresses] Adding ability to use OSM boundaries for OpenAddresses (not turned on by default), cleaning up street names, requiring at least house number and street, validating house number to provide some assurance that it's not a badly-formatted NULL value, adding ability to strip letters from postcode for data sets like New York's statewide where there are some codes attached. 2016-08-22 22:09:00 -04:00
Al 8b57a7acf2 [osm] abbreviate toponyms (qualifiers) with some probability so we get those versions in the model's phrase dictionaries 2016-08-22 20:55:35 -04:00
Al d281e71d2c [fix] removing metro station index as a dependency for AddressComponents 2016-08-22 15:52:27 -04:00
Al 79c9694e2d [names] Allowing for similarity-only normalization in name affixes 2016-08-22 03:47:08 -04:00
Al cb4408fea8 [transliteration] Adding language-specific transliterators for handling umlauts in German + special transliterations in the Nordic languages. It may still result in some wrong transliterations if the language classifier is wrong, but generally it's accurate enough that its predictions can be relied upon. Also adding a Latin-ASCII-Simple transform which only does the punctuation portion of Latin-ASCII so it won't change anything substantial about the input string. 2016-08-20 18:17:46 -04:00
Al 85ae5d4a05 [fix] name 2016-08-19 23:38:33 -04:00
Al 7951044d74 [intersections] Abbreviating street names that are not base names with random probabilities 2016-08-19 23:27:29 -04:00
Al 42808c62e3 [fix] dictionary access 2016-08-19 16:02:36 -04:00
Al 41f715d6ee [intersections] Better handling of default languages in intersection queries 2016-08-19 15:59:58 -04:00
Al a7118b40a7 [intersections] Allowing tags like name_1, etc. to make it into road name permutations for intersections 2016-08-19 13:12:02 -04:00
Al 0b2d3d965f [fix] using lat/lon from the node properties in intersections data 2016-08-19 12:23:08 -04:00
Al 294316c721 [intersections] no need to store lat/lon in intersections 2016-08-19 01:58:53 -04:00
Al 9a6ec41ce6 [points] Adding __iter__ and __len__ to point index 2016-08-19 01:01:05 -04:00
Al f43abe0846 [fix] making cleaned_name a classmethod 2016-08-18 19:55:52 -04:00
Al defc7ffacc [fix] arg name again 2016-08-18 18:22:06 -04:00