199 Commits

Author SHA1 Message Date
Al
bb12d0940e [fix] options/docs in osm address training 2016-12-10 13:45:37 -05:00
Al
5098599ed6 [addresses] remove Quattroshapes/GeoNames cities as they may have problematic names, and in any case we have point-based cities from OSM now 2016-12-10 02:08:40 -05:00
Al
da36b71829 [addresses] adding new places index in OSM and OpenAddresses training data 2016-12-05 18:36:17 -05:00
Al
7b3a59878c [fix] bracket 2016-10-05 14:27:24 -04:00
Al
432f9dd42e [fix] format of candidate_languages in the new OSM rtree 2016-10-05 03:12:07 -04:00
Al
faf418decb [languages] using country_and_languages method in OSM, neighborhoods and OpenAddresses 2016-10-05 02:49:55 -04:00
Al
d281e71d2c [fix] removing metro station index as a dependency for AddressComponents 2016-08-22 15:52:27 -04:00
Al
145af9331e [osm] build OSM training data for intersections using the JSON output from intersections.py rather than having to compute it each time 2016-08-17 18:11:55 -04:00
Al
e35649f09d [fix] import 2016-08-06 20:01:38 -04:00
Al
0edfbe0d61 [osm] Adding metro stations index to training data options 2016-08-06 19:52:21 -04:00
Al
ffece04855 [osm] Place training data from OSM script 2016-07-25 02:45:16 -04:00
Al
73b2aec25e [fix] input file 2016-07-21 17:04:57 -04:00
Al
51831e2111 [fix] add ways db dir 2016-07-21 17:04:57 -04:00
Al
0a912766e4 [fix] logging for intersections data 2016-07-21 17:04:57 -04:00
Al
8aada7086f [intersections] intersections training data 2016-07-21 17:04:57 -04:00
Al
11d1acc3bc [parser] Sample chain store alternate names from the cross-language dictionary 2016-07-21 17:04:57 -04:00
Al
5ea570835e [fix] args again 2016-07-21 17:04:57 -04:00
Al
7c41d84d8f [fix] args 2016-07-21 17:04:57 -04:00
Al
2e4ba6e6cc [subdivisions/buildings] Adding subdivisions and buildings rtree to training data for getting building height, zone 2016-07-21 17:04:57 -04:00
Al
91db1ec371 [fix] removing unnecessary vars 2016-07-21 17:04:57 -04:00
Al
bce7004ed7 [fix] import 2016-07-21 17:04:57 -04:00
Al
e57783ff5f [fix] constructor 2016-07-21 17:04:57 -04:00
Al
677a86224e [fix] cli arg name 2016-07-21 17:04:57 -04:00
Al
d04a026528 [fix] no need to init language, etc. in new script 2016-07-21 17:04:57 -04:00
Al
611002ea7a [fix] cleaning up imports 2016-07-21 17:04:57 -04:00
Al
a96e5760a9 [osm] Same great training script, only shorter 2016-07-21 17:04:57 -04:00
Al
00ce71223f [osm] Using the default probabilities for abbreviations in ways training data 2016-01-24 00:53:41 -05:00
Al
bab7a0f961 [osm] splitting streets (way names) on semicolons 2016-01-24 00:42:25 -05:00
Al
7646adfc0f [osm] Adding abbreviated street names in addition to the originals 2016-01-23 23:23:58 -05:00
Al
67130383ce [fix] converting semicolons to commas in OSM house numbers and picking one at random 2016-01-23 23:16:19 -05:00
Al
1bb797f783 [fix] spacing in phrases 2016-01-23 21:59:49 -05:00
Al
3a8c3dfcf6 [fix] spacing in phrases at end of string 2016-01-23 21:51:40 -05:00
Al
78450bfad9 [fix] Spaces in abbreviation 2016-01-23 21:36:20 -05:00
Al
308ceb5a5f [fix] convert UTF8 slices back to unicode before using with the Python trie 2016-01-23 20:20:23 -05:00
Al
5eb6bb309b [fix] Only adding whitespace back into tokenized strings during abbreviation if it existed in the original string 2016-01-23 20:09:45 -05:00
Al
d61207e95a [fix] var name 2016-01-23 18:01:02 -05:00
Al
e44cba1d06 [fix] geonames db not required in OSM training data 2016-01-23 17:59:55 -05:00
Al
4f03711e60 [osm] Adding abbreviated training examples to ways language training data 2016-01-23 14:10:47 -05:00
Al
c9fb4ee69d [osm/formatting] Dropping state more often than not, except in the US and Canada where those fields are more commonly used 2016-01-22 17:58:24 -05:00
Al
ea9bb3f2d5 [fix] Abbreviation probabilities should only apply once, not once per dictionary. Also fixing issues where some of the abbreviations were doubled 2016-01-22 15:48:21 -05:00
Al
f9f6558e06 [fix] simple whitespace field splits for the limited format training data (used for language classification) 2016-01-22 04:34:42 -05:00
Al
cd1db7b288 [fix] Making sure rare components are dropped first, adding state and country back in 2016-01-22 04:17:19 -05:00
Al
adc3a00264 [fix] var name 2016-01-22 04:10:16 -05:00
Al
261beffa36 [fix] Actually better to remove country and state from rare components and let them use the standard dropout probabilities 2016-01-22 04:00:45 -05:00
Al
a6cc3d0114 [fix] Adding state to the more frequently dropped components 2016-01-22 03:56:38 -05:00
Al
bca3dae004 [fix] state full name probabilities for limited vs. full formatted OSM training sets 2016-01-22 03:54:20 -05:00
Al
d1cf253092 [osm/formatting] Higher probability of dropout for rare components like counties, etc. 2016-01-22 03:39:35 -05:00
Al
b22646ee30 [mv] Moving gazetteers into their own module 2016-01-22 03:15:56 -05:00
Al
6ac72576bc [osm/formatting] Randomly abbreviating street names and venue names using all the available libpostal dictionaries. Refactoring OSM formatting into separate methods which can be individually tested. Adding override for special phrases like UK 2016-01-22 02:56:39 -05:00
Al
1d288954d7 [osm] Fixing an issue in the training data with house numbers in OSM (seen mostly in Uruguay) where a comma-separated list of house numbers is entered. 2015-12-10 18:46:28 -05:00