Commit Graph

202 Commits

Author SHA1 Message Date
Al
be6f48f109 [fix] that didn't work, set log level to CRITICAL 2017-02-15 14:06:57 -05:00
Al
26bf617a06 [fix] prevent Shapely from logging to console 2017-02-15 14:00:51 -05:00
Al
934f6247c6 [osm] options to build the streets-only training data 2017-01-16 15:26:04 -05:00
Al
bb12d0940e [fix] options/docs in osm address training 2016-12-10 13:45:37 -05:00
Al
5098599ed6 [addresses] remove Quattroshapes/GeoNames cities as they may have problematic names, and in any case we have point-based cities from OSM now 2016-12-10 02:08:40 -05:00
Al
da36b71829 [addresses] adding new places index in OSM and OpenAddresses training data 2016-12-05 18:36:17 -05:00
Al
7b3a59878c [fix] bracket 2016-10-05 14:27:24 -04:00
Al
432f9dd42e [fix] format of candidate_languages in the new OSM rtree 2016-10-05 03:12:07 -04:00
Al
faf418decb [languages] using country_and_languages method in OSM, neighborhoods and OpenAddresses 2016-10-05 02:49:55 -04:00
Al
d281e71d2c [fix] removing metro station index as a dependency for AddressComponents 2016-08-22 15:52:27 -04:00
Al
145af9331e [osm] build OSM training data for intersections using the JSON output from intersections.py rather than having to compute each time 2016-08-17 18:11:55 -04:00
Al
e35649f09d [fix] import 2016-08-06 20:01:38 -04:00
Al
0edfbe0d61 [osm] Adding metro stations index to training data options 2016-08-06 19:52:21 -04:00
Al
ffece04855 [osm] Place training data from OSM script 2016-07-25 02:45:16 -04:00
Al
73b2aec25e [fix] input file 2016-07-21 17:04:57 -04:00
Al
51831e2111 [fix] add ways db dir 2016-07-21 17:04:57 -04:00
Al
0a912766e4 [fix] logging for intersections data 2016-07-21 17:04:57 -04:00
Al
8aada7086f [intersections] intersections training data 2016-07-21 17:04:57 -04:00
Al
11d1acc3bc [parser] Sample chain store alternate names from the cross-language dictionary 2016-07-21 17:04:57 -04:00
Al
5ea570835e [fix] args again 2016-07-21 17:04:57 -04:00
Al
7c41d84d8f [fix] args 2016-07-21 17:04:57 -04:00
Al
2e4ba6e6cc [subdivisions/buildings] Adding subdivisions and buildings rtree to training data for getting building height, zone 2016-07-21 17:04:57 -04:00
Al
91db1ec371 [fix] removing unnecessary vars 2016-07-21 17:04:57 -04:00
Al
bce7004ed7 [fix] import 2016-07-21 17:04:57 -04:00
Al
e57783ff5f [fix] constructor 2016-07-21 17:04:57 -04:00
Al
677a86224e [fix] cli arg name 2016-07-21 17:04:57 -04:00
Al
d04a026528 [fix] no need to init language, etc. in new script 2016-07-21 17:04:57 -04:00
Al
611002ea7a [fix] cleaning up imports 2016-07-21 17:04:57 -04:00
Al
a96e5760a9 [osm] Same great training script, only shorter 2016-07-21 17:04:57 -04:00
Al
00ce71223f [osm] Using the default probabilities for abbreviations in ways training data 2016-01-24 00:53:41 -05:00
Al
bab7a0f961 [osm] splitting streets (way names) on semicolons 2016-01-24 00:42:25 -05:00
Al
7646adfc0f [osm] Adding abbreviated street names in addition to the originals 2016-01-23 23:23:58 -05:00
Al
67130383ce [fix] converting semicolons to commas in OSM house numbers and picking one at random 2016-01-23 23:16:19 -05:00
Al
1bb797f783 [fix] spacing in phrases 2016-01-23 21:59:49 -05:00
Al
3a8c3dfcf6 [fix] spacing in phrases at end of string 2016-01-23 21:51:40 -05:00
Al
78450bfad9 [fix] Spaces in abbreviation 2016-01-23 21:36:20 -05:00
Al
308ceb5a5f [fix] convert UTF8 slices back to unicode before using with the Python trie 2016-01-23 20:20:23 -05:00
Al
5eb6bb309b [fix] Only adding whitespace back into tokenized strings during abbreviation if it existed in the original string 2016-01-23 20:09:45 -05:00
Al
d61207e95a [fix] var name 2016-01-23 18:01:02 -05:00
Al
e44cba1d06 [fix] geonames db not required in OSM training data 2016-01-23 17:59:55 -05:00
Al
4f03711e60 [osm] Adding abbreviated training examples to ways language training data 2016-01-23 14:10:47 -05:00
Al
c9fb4ee69d [osm/formatting] Dropping state more often than not, except in the US and Canada where those fields are more commonly used 2016-01-22 17:58:24 -05:00
Al
ea9bb3f2d5 [fix] Abbreviation probabilities should only apply once, not once per dictionary. Also fixing issues where some of the abbreviations were doubled 2016-01-22 15:48:21 -05:00
Al
f9f6558e06 [fix] simple whitespace field splits for the limited format training data (used for language classification) 2016-01-22 04:34:42 -05:00
Al
cd1db7b288 [fix] Making sure rare components are dropped first, adding state and country back in 2016-01-22 04:17:19 -05:00
Al
adc3a00264 [fix] var name 2016-01-22 04:10:16 -05:00
Al
261beffa36 [fix] Actually better to remove country and state from rare components and let them use the standard dropout probabilities 2016-01-22 04:00:45 -05:00
Al
a6cc3d0114 [fix] Adding state to the more frequently dropped components 2016-01-22 03:56:38 -05:00
Al
bca3dae004 [fix] state full name probabilities for limited vs. full formatted OSM training sets 2016-01-22 03:54:20 -05:00
Al
d1cf253092 [osm/formatting] Higher probability of dropout for rare components like counties, etc. 2016-01-22 03:39:35 -05:00