Commit Graph

178 Commits

Author SHA1 Message Date
Al
1d288954d7 [osm] Fixing an issue in the training data with house numbers in OSM (seen mostly in Uruguay) where a comma separated list of house numbers is entered. 2015-12-10 18:46:28 -05:00
Al
779298360c [osm] In cases with more than one official language and where the address language can be determined, use it for looking up language-specific OSM polygons 2015-12-09 01:00:59 -05:00
Al
aeb72d7d26 [osm] Randomly select up to n components for state_district OSM boundaries. For all other fields select one name at random 2015-12-09 00:20:20 -05:00
Al
69a469d9d3 [osm] Choosing a language at random in countries with multilingual addresses for the parser training data so we get some monolingual examples 2015-12-08 20:38:32 -05:00
Al
f8a3081d0f [fix] city name in OSM formatting 2015-12-07 02:33:12 -05:00
Al
b25a738000 [osm] Doing more deduping in the OSM training data to avoid confusing the parser when city, state, district all have the same name 2015-12-06 16:14:02 -05:00
Al
5fcb6d2c30 [fix] typo 2015-12-05 16:23:58 -05:00
Al
3a7ba0288f [fix] .get 2015-12-05 16:13:15 -05:00
Al
c92a6de477 [fix] name 2015-12-05 15:49:50 -05:00
Al
2a4210f93f [osm] Stripping standard city prefixes/suffixes e.g. Township of 2015-12-05 15:42:22 -05:00
Al
f41158b8b3 [osm] Avoid using the alternate name (e.g. Brooklyn instead of Kings County) when it is the same as city 2015-12-05 14:21:07 -05:00
Al
7c26317903 [fix] osm components 2015-12-03 19:30:15 -05:00
Al
42a8890652 [osm] Only removing local language city if there are prior components from OSM 2015-12-03 19:11:03 -05:00
Al
5af95ee613 [osm] Adding GeoNames abbreviated city names in a small percentage of cases to get variations like NYC, BK, SF, etc. in the training data 2015-12-03 18:00:05 -05:00
Al
218361f43f [osm] Removing multilinestring boundaries from OSM polygon index (often partial boundaries e.g. France-Germany) 2015-12-03 00:51:09 -05:00
Al
8484d4fffd [fix] venue names should be removed probabilistically in the training data, giving neighborhoods a slightly better chance of being included 2015-11-30 23:28:12 -05:00
Al
6ef40c1769 [fix] dupe checking 2015-11-30 18:43:11 -05:00
Al
af170de019 [fix] Smaller probabilities on adding neighborhoods and admin polygons, eliminating duplicates on the row level 2015-11-30 18:35:31 -05:00
Al
621fd79002 [fix] var 2015-11-30 18:20:26 -05:00
Al
b430fb7657 [osm/formatting] Adding pick random name logic to neighborhoods as well, getting rid of drop probabilities as they're covered elsewhere, adding several forms of venue names to the training data 2015-11-30 18:10:18 -05:00
Al
839a12b212 [osm/formatting] Changing drop probabilities and doing it in random order 2015-11-30 15:27:35 -05:00
Al
89677d94a3 [parsing] Initial commit of the address parser, training/testing, feature function, I/O 2015-11-30 14:48:13 -05:00
Al
9a8ba14887 [osm/formatting] Adding per-field drop probabilities to OSM training data to make some fields more likely to be dropped, although it might create more training data 2015-11-30 11:10:12 -05:00
Al
15d9e00121 [osm/formatting] Adding in more ISO alpha-3 codes for countries in the training data 2015-11-28 14:08:07 -05:00
Al
66778737ff [fix] non-local language states 2015-11-28 13:48:59 -05:00
Al
69ba631dc9 [docs] updating params in OSM training data docs 2015-11-28 01:09:14 -05:00
Al
3cd1fee89d [fix] KeyError 2015-11-27 14:40:11 -05:00
Al
a77bc03977 [fix] language 2015-11-27 14:24:32 -05:00
Al
38d4e2d67a [fix] cities 2015-11-27 14:05:53 -05:00
Al
3cf98770e3 [fix] var name 2015-11-27 13:54:38 -05:00
Al
2e0f35b13a [fix] key checks for Quattroshapes cities, removing city in non-local language case 2015-11-27 13:45:51 -05:00
Al
105ba313c5 [fix] var name 2015-11-27 12:00:11 -05:00
Al
3eea355352 [fix] argument order 2015-11-27 11:47:39 -05:00
Al
51f6a82727 [fix] import again 2015-11-27 11:38:40 -05:00
Al
644eeb74c6 [fix] import 2015-11-27 11:17:53 -05:00
Al
2830986073 [osm/formatting] Adding in cities from Quattroshapes/GeoNames in the case of non-local languages or in general with a small random probability 2015-11-27 11:09:12 -05:00
Al
a50c971732 [polygons/osm] Omitting last node in every way of a connected component since that node is equal to the start node of its neighbor 2015-11-25 17:09:19 -05:00
Al
3217fa39cd [fix] add country randomly in the formatted language training data in cases where country is not present 2015-11-25 14:54:41 -05:00
Al
5781813cbd [fix] For countries like Denmark, removing country with a smaller probability 2015-11-25 00:39:52 -05:00
Al
e4b8349d98 [fix] sparsity of country tags should be enough for language address training data 2015-11-25 00:32:01 -05:00
Al
824c779107 [fix] Cutting down training repeatedly on country names 2015-11-24 23:22:57 -05:00
Al
88529d28e2 [fix] country formatting in language address training data 2015-11-24 23:20:31 -05:00
Al
cd74fcda3c [fix] not requiring minimal keys in format language data 2015-11-24 23:13:28 -05:00
Al
e560e53308 [fix] formatter 2015-11-24 22:27:57 -05:00
Al
8c422a6e61 [osm] Adding new localized country names in language training data for formatted addresses 2015-11-24 21:49:10 -05:00
Al
e40ca0bb89 [fix] Removing house numbers from formatted address language training data, using a simple whitespace splitter 2015-11-24 21:15:22 -05:00
Al
ef9c5c2ca1 [fix] args 2015-11-24 11:02:35 -05:00
Al
e75c1ce860 [fix] limited addresses 2015-11-24 11:01:22 -05:00
Al
94039f98ad [fix] argument validation in OSM training data script 2015-11-24 10:59:16 -05:00
Al
6d20d7348f [osm] Using OSM namespaced tags from polygons in the case of non-local languages 2015-11-23 14:42:30 -05:00