Al
|
934f6247c6
|
[osm] options to build the streets-only training data
|
2017-01-16 15:26:04 -05:00 |
|
Al
|
bb12d0940e
|
[fix] options/docs in osm address training
|
2016-12-10 13:45:37 -05:00 |
|
Al
|
5098599ed6
|
[addresses] remove Quattroshapes/GeoNames cities as they may have problematic names, and in any case we have point-based cities from OSM now
|
2016-12-10 02:08:40 -05:00 |
|
Al
|
da36b71829
|
[addresses] adding new places index in OSM and OpenAddresses training data
|
2016-12-05 18:36:17 -05:00 |
|
Al
|
7b3a59878c
|
[fix] bracket
|
2016-10-05 14:27:24 -04:00 |
|
Al
|
432f9dd42e
|
[fix] format of candidate_languages in the new OSM rtree
|
2016-10-05 03:12:07 -04:00 |
|
Al
|
faf418decb
|
[languages] using country_and_languages method in OSM, neighborhoods and OpenAddresses
|
2016-10-05 02:49:55 -04:00 |
|
Al
|
d281e71d2c
|
[fix] removing metro station index as a dependency for AddressComponents
|
2016-08-22 15:52:27 -04:00 |
|
Al
|
145af9331e
|
[osm] build OSM training data for intersections using the JSON output from intersections.py rather than having to compute each time
|
2016-08-17 18:11:55 -04:00 |
|
Al
|
e35649f09d
|
[fix] import
|
2016-08-06 20:01:38 -04:00 |
|
Al
|
0edfbe0d61
|
[osm] Adding metro stations index to training data options
|
2016-08-06 19:52:21 -04:00 |
|
Al
|
ffece04855
|
[osm] Place training data from OSM script
|
2016-07-25 02:45:16 -04:00 |
|
Al
|
73b2aec25e
|
[fix] input file
|
2016-07-21 17:04:57 -04:00 |
|
Al
|
51831e2111
|
[fix] add ways db dir
|
2016-07-21 17:04:57 -04:00 |
|
Al
|
0a912766e4
|
[fix] logging for intersections data
|
2016-07-21 17:04:57 -04:00 |
|
Al
|
8aada7086f
|
[intersections] intersections training data
|
2016-07-21 17:04:57 -04:00 |
|
Al
|
11d1acc3bc
|
[parser] Sample chain store alternate names from the cross-language dictionary
|
2016-07-21 17:04:57 -04:00 |
|
Al
|
5ea570835e
|
[fix] args again
|
2016-07-21 17:04:57 -04:00 |
|
Al
|
7c41d84d8f
|
[fix] args
|
2016-07-21 17:04:57 -04:00 |
|
Al
|
2e4ba6e6cc
|
[subdivisions/buildings] Adding subdivisions and buildings rtree to training data for getting building height, zone
|
2016-07-21 17:04:57 -04:00 |
|
Al
|
91db1ec371
|
[fix] removing unnecessary vars
|
2016-07-21 17:04:57 -04:00 |
|
Al
|
bce7004ed7
|
[fix] import
|
2016-07-21 17:04:57 -04:00 |
|
Al
|
e57783ff5f
|
[fix] constructor
|
2016-07-21 17:04:57 -04:00 |
|
Al
|
677a86224e
|
[fix] cli arg name
|
2016-07-21 17:04:57 -04:00 |
|
Al
|
d04a026528
|
[fix] no need to init language, etc. in new script
|
2016-07-21 17:04:57 -04:00 |
|
Al
|
611002ea7a
|
[fix] cleaning up imports
|
2016-07-21 17:04:57 -04:00 |
|
Al
|
a96e5760a9
|
[osm] Same great training script, only shorter
|
2016-07-21 17:04:57 -04:00 |
|
Al
|
00ce71223f
|
[osm] Using the default probabilities for abbreviations in ways training data
|
2016-01-24 00:53:41 -05:00 |
|
Al
|
bab7a0f961
|
[osm] splitting streets (way names) on semicolons
|
2016-01-24 00:42:25 -05:00 |
|
Al
|
7646adfc0f
|
[osm] Adding abbreviated street names in addition to the originals
|
2016-01-23 23:23:58 -05:00 |
|
Al
|
67130383ce
|
[fix] converting semicolons to commas in OSM house numbers and picking one at random
|
2016-01-23 23:16:19 -05:00 |
|
Al
|
1bb797f783
|
[fix] spacing in phrases
|
2016-01-23 21:59:49 -05:00 |
|
Al
|
3a8c3dfcf6
|
[fix] spacing in phrases at end of string
|
2016-01-23 21:51:40 -05:00 |
|
Al
|
78450bfad9
|
[fix] Spaces in abbreviation
|
2016-01-23 21:36:20 -05:00 |
|
Al
|
308ceb5a5f
|
[fix] convert UTF8 slices back to unicode before using with the Python trie
|
2016-01-23 20:20:23 -05:00 |
|
Al
|
5eb6bb309b
|
[fix] Only adding whitespace back into tokenized strings during abbreviation if it existed in the original string
|
2016-01-23 20:09:45 -05:00 |
|
Al
|
d61207e95a
|
[fix] var name
|
2016-01-23 18:01:02 -05:00 |
|
Al
|
e44cba1d06
|
[fix] geonames db not required in OSM training data
|
2016-01-23 17:59:55 -05:00 |
|
Al
|
4f03711e60
|
[osm] Adding abbreviated training examples to ways language training data
|
2016-01-23 14:10:47 -05:00 |
|
Al
|
c9fb4ee69d
|
[osm/formatting] Dropping state more often than not, except in the US and Canada where those fields are more commonly used
|
2016-01-22 17:58:24 -05:00 |
|
Al
|
ea9bb3f2d5
|
[fix] Abbreviation probabilities should only apply once, not once per dictionary. Also fixing issues where some of the abbreviations were doubled
|
2016-01-22 15:48:21 -05:00 |
|
Al
|
f9f6558e06
|
[fix] simple whitespace field splits for the limited format training data (used for language classification)
|
2016-01-22 04:34:42 -05:00 |
|
Al
|
cd1db7b288
|
[fix] Making sure rare components are dropped first, adding state and country back in
|
2016-01-22 04:17:19 -05:00 |
|
Al
|
adc3a00264
|
[fix] var name
|
2016-01-22 04:10:16 -05:00 |
|
Al
|
261beffa36
|
[fix] Actually better to remove country and state from rare components and let them use the standard dropout probabilities
|
2016-01-22 04:00:45 -05:00 |
|
Al
|
a6cc3d0114
|
[fix] Adding state to the more frequently dropped components
|
2016-01-22 03:56:38 -05:00 |
|
Al
|
bca3dae004
|
[fix] state full name probabilities for limited vs. full formatted OSM training sets
|
2016-01-22 03:54:20 -05:00 |
|
Al
|
d1cf253092
|
[osm/formatting] Higher probability of dropout for rare components like counties, etc.
|
2016-01-22 03:39:35 -05:00 |
|
Al
|
b22646ee30
|
[mv] Moving gazetteers into their own module
|
2016-01-22 03:15:56 -05:00 |
|
Al
|
6ac72576bc
|
[osm/formatting] Randomly abbreviating street names and venue names using all the available libpostal dictionaries. Refactoring OSM formatting into separate methods which can be individually tested. Adding override for special phrases like UK
|
2016-01-22 02:56:39 -05:00 |
|