Al
|
bca3dae004
|
[fix] state full name probabilities for limited vs. full formatted OSM training sets
|
2016-01-22 03:54:20 -05:00 |
|
Al
|
d1cf253092
|
[osm/formatting] Higher probability of dropout for rare components like counties, etc.
|
2016-01-22 03:39:35 -05:00 |
|
Al
|
b22646ee30
|
[mv] Moving gazetteers into their own module
|
2016-01-22 03:15:56 -05:00 |
|
Al
|
6ac72576bc
|
[osm/formatting] Randomly abbreviating street names and venue names using all the available libpostal dictionaries. Refactoring OSM formatting into separate methods which can be individually tested. Adding override for special phrases like UK
|
2016-01-22 02:56:39 -05:00 |
|
Al
|
3262d2ccd3
|
[fix] arg count
|
2016-01-19 03:16:14 -05:00 |
|
Al
|
19a5541a85
|
[polygons/osm] append polygon nodes by vertices that connect to each other
|
2016-01-16 21:20:49 -05:00 |
|
Al
|
1d288954d7
|
[osm] Fixing an issue in the training data with house numbers in OSM (seen mostly in Uruguay) where a comma separated list of house numbers is entered.
|
2015-12-10 18:46:28 -05:00 |
|
Al
|
779298360c
|
[osm] In cases with more than one official language and where the address language can be determined, use it for looking up language-specific OSM polygons
|
2015-12-09 01:00:59 -05:00 |
|
Al
|
aeb72d7d26
|
[osm] Randomly select up to n components for state_district OSM boundaries. For all other fields select one name at random
|
2015-12-09 00:20:20 -05:00 |
|
Al
|
69a469d9d3
|
[osm] Choosing a language at random in countries with multilingual addresses for the parser training data so we get some monolingual examples
|
2015-12-08 20:38:32 -05:00 |
|
Al
|
f8a3081d0f
|
[fix] city name in OSM formatting
|
2015-12-07 02:33:12 -05:00 |
|
Al
|
b25a738000
|
[osm] Doing more deduping in the OSM training data to avoid confusing the parser when city, state, district all have the same name
|
2015-12-06 16:14:02 -05:00 |
|
Al
|
5fcb6d2c30
|
[fix] typo
|
2015-12-05 16:23:58 -05:00 |
|
Al
|
3a7ba0288f
|
[fix] .get
|
2015-12-05 16:13:15 -05:00 |
|
Al
|
c92a6de477
|
[fix] name
|
2015-12-05 15:49:50 -05:00 |
|
Al
|
2a4210f93f
|
[osm] Stripping standard city prefixes/suffixes e.g. Township of
|
2015-12-05 15:42:22 -05:00 |
|
Al
|
f41158b8b3
|
[osm] Avoid using the alternate name (e.g. Brooklyn instead of Kings County) when it is the same as city
|
2015-12-05 14:21:07 -05:00 |
|
Al
|
7c26317903
|
[fix] osm components
|
2015-12-03 19:30:15 -05:00 |
|
Al
|
42a8890652
|
[osm] Only removing local language city if there are prior components from OSM
|
2015-12-03 19:11:03 -05:00 |
|
Al
|
5af95ee613
|
[osm] Adding GeoNames abbreviated city names in a small percentage of cases to get variations like NYC, BK, SF, etc. in the training data
|
2015-12-03 18:00:05 -05:00 |
|
Al
|
218361f43f
|
[osm] Removing multilinestring boundaries from OSM polygon index (often partial boundaries e.g. France-Germany)
|
2015-12-03 00:51:09 -05:00 |
|
Al
|
8484d4fffd
|
[fix] venue names should be removed probabilistically in the training data, giving neighborhoods a slightly better chance of being included
|
2015-11-30 23:28:12 -05:00 |
|
Al
|
6ef40c1769
|
[fix] dupe checking
|
2015-11-30 18:43:11 -05:00 |
|
Al
|
af170de019
|
[fix] Smaller probabilities on adding neighborhoods and admin polygons, eliminating duplicates on the row level
|
2015-11-30 18:35:31 -05:00 |
|
Al
|
621fd79002
|
[fix] var
|
2015-11-30 18:20:26 -05:00 |
|
Al
|
b430fb7657
|
[osm/formatting] Adding pick random name logic to neighborhoods as well, getting rid of drop probabilities as they're covered elsewhere, adding several forms of venue names to the training data
|
2015-11-30 18:10:18 -05:00 |
|
Al
|
839a12b212
|
[osm/formatting] Changing drop probabilities and doing it in random order
|
2015-11-30 15:27:35 -05:00 |
|
Al
|
89677d94a3
|
[parsing] Initial commit of the address parser, training/testing, feature function, I/O
|
2015-11-30 14:48:13 -05:00 |
|
Al
|
9a8ba14887
|
[osm/formatting] Adding per-field drop probabilities to OSM training data to make some fields more likely to be dropped, although it might create more training data
|
2015-11-30 11:10:12 -05:00 |
|
Al
|
15d9e00121
|
[osm/formatting] Adding in more ISO alpha-3 codes for countries in the training data
|
2015-11-28 14:08:07 -05:00 |
|
Al
|
66778737ff
|
[fix] non-local language states
|
2015-11-28 13:48:59 -05:00 |
|
Al
|
69ba631dc9
|
[docs] updating params in OSM training data docs
|
2015-11-28 01:09:14 -05:00 |
|
Al
|
3cd1fee89d
|
[fix] KeyError
|
2015-11-27 14:40:11 -05:00 |
|
Al
|
a77bc03977
|
[fix] language
|
2015-11-27 14:24:32 -05:00 |
|
Al
|
38d4e2d67a
|
[fix] cities
|
2015-11-27 14:05:53 -05:00 |
|
Al
|
3cf98770e3
|
[fix] var name
|
2015-11-27 13:54:38 -05:00 |
|
Al
|
2e0f35b13a
|
[fix] key checks for Quattroshapes cities, removing city in non-local language case
|
2015-11-27 13:45:51 -05:00 |
|
Al
|
105ba313c5
|
[fix] var name
|
2015-11-27 12:00:11 -05:00 |
|
Al
|
3eea355352
|
[fix] argument order
|
2015-11-27 11:47:39 -05:00 |
|
Al
|
51f6a82727
|
[fix] import again
|
2015-11-27 11:38:40 -05:00 |
|
Al
|
644eeb74c6
|
[fix] import
|
2015-11-27 11:17:53 -05:00 |
|
Al
|
2830986073
|
[osm/formatting] Adding in cities from Quattroshapes/GeoNames in the case of non-local languages or in general with a small random probability
|
2015-11-27 11:09:12 -05:00 |
|
Al
|
a50c971732
|
[polygons/osm] Omitting last node in every way of a connected component since that node is equal to the start node of its neighbor
|
2015-11-25 17:09:19 -05:00 |
|
Al
|
3217fa39cd
|
[fix] add country randomly in the formatted language training data in cases where country is not present
|
2015-11-25 14:54:41 -05:00 |
|
Al
|
5781813cbd
|
[fix] For countries like Denmark, removing country with a smaller probability
|
2015-11-25 00:39:52 -05:00 |
|
Al
|
e4b8349d98
|
[fix] sparsity of country tags should be enough for language address training data
|
2015-11-25 00:32:01 -05:00 |
|
Al
|
824c779107
|
[fix] Cutting down training repeatedly on country names
|
2015-11-24 23:22:57 -05:00 |
|
Al
|
88529d28e2
|
[fix] country formatting in language address training data
|
2015-11-24 23:20:31 -05:00 |
|
Al
|
cd74fcda3c
|
[fix] not requiring minimal keys in format language data
|
2015-11-24 23:13:28 -05:00 |
|
Al
|
e560e53308
|
[fix] formatter
|
2015-11-24 22:27:57 -05:00 |
|