4053 Commits

Author SHA1 Message Date
Al
5118bbc6b9 [places] lower dropout probabilities for country field 2016-11-25 00:44:52 -08:00
Al
5c3ccc3bc6 [places] better handling of population exceptions in places config 2016-11-25 00:38:49 -08:00
Al
4e10dc47e1 [places] adding a few more exceptions to places config, making state/country required for smaller cities 2016-11-24 23:42:16 -08:00
Al
5a8ea5c3b9 [openaddresses] fix for Arapahoe County, CO. Had CO listed as the city 2016-11-24 04:18:46 -05:00
Al
89cacbdb0e [openaddresses] El Paso County, CO 2016-11-24 04:03:21 -05:00
Al
e07c74f077 [fix] config 2016-11-24 03:57:52 -05:00
Al
f72b576f39 [fix] indentation 2016-11-24 03:55:37 -05:00
Al
46b7043dc7 [fix] typo 2016-11-24 03:50:11 -05:00
Al
da882a4195 [names] adding "District Municipality of" to ignorable prefixes 2016-11-24 03:48:54 -05:00
Al
1e1f00670b [openaddresses] fixing some cities that I thought were counties 2016-11-24 03:45:52 -05:00
Al
fcf4717335 [openaddresses] adding city_replacements handling to OA formatter 2016-11-23 20:16:48 -05:00
Al
1ccca9086f [openaddresses] add city_replacements for all files using OSM boundaries (replace with known county or city) 2016-11-23 16:14:25 -05:00
Al
3dc2a922fb [addresses/languages] if there's only one default language and we don't have a road name or a unicode script to disambiguate, assume the default (e.g. English in the US unless there's a Spanish/French road name). Can affect things like state abbreviations 2016-11-22 18:27:54 -05:00
Al
3c5e2afeed [boundaries] exception for Cardiff, Wales 2016-11-22 13:16:21 -05:00
Al
49054932ad [boundaries] adding city replacements for South Africa 2016-11-22 12:05:14 -05:00
Al
ee6edbbd91 [countries] take first encountered country code instead of reversing the components (for cases like Puerto Rico, Hong Kong, etc.) 2016-11-22 11:55:41 -05:00
Al
ee8c070fd5 [osm] override admin_level with other components in config if present 2016-11-22 11:22:26 -05:00
Al
e8a3d256f9 [boundaries] adding more exceptions for some of the UK's unitary authorities that are basically equivalent to city boundaries (smaller towns within the boundary can still override) 2016-11-22 10:47:09 -05:00
Al
aa1f4fdd20 [places] adding section called city_replacements to places config, for countries where something like the state_district/county, suburb or city_district should stand in for the city when one cannot be reverse geocoded (unincorporated county addresses, etc.) 2016-11-22 09:51:04 -05:00
Al
480796f46f [osm] trying representative_point() on the unfixed polygons to capture some cases where the geometry still needs to be fixed before it's valid 2016-11-22 01:28:02 -05:00
Al
bf1928d8c0 [boundaries] making admin_level=6 city for Mexico as municipalities are the main type of boundary found in OSM 2016-11-22 01:03:54 -05:00
Al
f5ac51ab9f [boundaries] fixing config structure for a few countries 2016-11-21 19:06:52 -05:00
Al
9ae30b598b [openaddresses] updating the Canada province-wide data sets to the new format 2016-11-21 18:44:19 -05:00
Al
ff086a6bb9 [boundaries] exception for Calgary, CA 2016-11-21 18:08:46 -05:00
Al
7298c895c8 [utils] adding a chunked shuffle as the concatenated file sizes may get larger than memory 2016-11-21 14:04:34 -05:00
Al
eff0443fcf [openaddresses] city_of_flint, not flint 2016-11-20 17:24:23 -05:00
Al
1d05c98cc4 [openaddresses] add Bucks County, PA 2016-11-20 13:02:10 -05:00
Al
a596d03309 [fix] return values 2016-11-19 12:45:39 -05:00
Al
1ef3d073db [dictionaries] adding green to place names 2016-11-19 04:24:25 -05:00
Al
e15036fcce [fix] if there are street types that are not venue words and not vice versa, then call the venue invalid as a standalone term 2016-11-19 04:11:33 -05:00
Al
8e905fd17d [fix] if no venue names are passed in to formatted_addresses_with_venue_names, remove any existing venue name from the components as well 2016-11-19 03:46:16 -05:00
Al
e6fe576ec7 [fix] var 2016-11-19 03:15:23 -05:00
Al
1f50481cad [fix] args 2016-11-19 03:14:06 -05:00
Al
4d14f80f0c [osm] using the new gazetteer methods to do more thorough checks on single house names (if there are no other components than the standalone venue name, make sure it contains venue words like {library, bar}, etc. and not street type words like {road, street}, etc. so we don't get training examples that are simply "Abbey/house Road/house" with no house number or street name). If the venue name equals the street name or house number, drop it. Same if the venue name equals one of the admin components and no house number or street is present. If the venue name is numeric, require both a house number and a street name. 2016-11-19 03:12:24 -05:00
Al
5140db536a [phrases] additions to venue names dictionaries and a more restrictive version of street types dictionaries 2016-11-19 02:58:27 -05:00
Al
71be0fdfbc [fix] sets 2016-11-19 02:30:40 -05:00
Al
b6f7b5b577 [fix] name 2016-11-19 01:38:15 -05:00
Al
de9bf29af0 [addresses] allowing osm_components argument to AddressComponents.expanded 2016-11-19 01:38:02 -05:00
Al
1df1b60a9f [phrases] adding extract_phrases method to gazetteers, which returns a set of gazetteer phrases found in a given string 2016-11-18 23:35:44 -05:00
Al
8ef8d88186 [fix] don't short-circuit OSM address formatting unless there are no components and no venue names 2016-11-18 23:31:24 -05:00
Al
25ceeed6ef [fix] check before pop 2016-11-18 18:36:35 -05:00
Al
7a89c6e9ce [osm] removing dependencies for house/venue name (purely numeric names taken care of in osm formatter) 2016-11-18 18:32:44 -05:00
Al
ca89a6ca2e [fix] args 2016-11-18 18:09:48 -05:00
Al
72305975eb [openaddresses] adding Nelson Mandela Bay as a pre-release download 2016-11-18 18:00:42 -05:00
Al
6e73d46097 [fix] typo 2016-11-18 00:50:18 -05:00
Al
4e30a23313 [addresses] Adding toponym abbreviation to the input admin components as well as those obtained through reverse geocoding. Also was doing two random tests before abbreviating toponyms, reducing their frequency in the training data, now correctly using a single test. 2016-11-17 19:53:09 -05:00
Al
a9fdfee2ac [polygons] adding optional test_point for complex polygons with an admin_center, and including admin_center lat/lon as part of the properties 2016-11-17 19:36:32 -05:00
Al
c2ccec70ad [polygons] adding lat/lon props to admin centers 2016-11-17 19:21:31 -05:00
Al
71d535e845 [polygons] using try/except in polygons 2016-11-17 17:38:54 -05:00
Al
d701bb1320 [polygons] only applying the new fix-on-read solution in the OSM admin/subdivision indices 2016-11-17 00:33:06 -05:00