4306 Commits

Author SHA1 Message Date
Al
e5b84205bc [osm] Use int_name tag and add English boundary names even if only a raw name is available for the original place node 2016-07-25 00:13:21 -04:00
Al
b50cb0cdf9 [osm] add random variations of the containing components' names in building place training data. For places with small or unknown populations, use the default names of the containing components 2016-07-25 00:04:44 -04:00
Al
dbc5957fa6 [fix] reverting, random state abbreviations should be fine 2016-07-24 23:47:30 -04:00
Al
cf84b5727e [osm] always_use_full_names=True for encompassing boundaries on place queries 2016-07-24 23:21:14 -04:00
Al
0fa372f2c0 [fix] tags.get as nodes may not have type/id 2016-07-24 23:04:09 -04:00
Al
273f5ecf58 [fix] language defaults 2016-07-24 23:02:39 -04:00
Al
43e6f2433a [fix] use ISO3166-1:alpha2 2016-07-24 23:00:59 -04:00
Al
53906c4833 [fix] parens 2016-07-24 22:57:58 -04:00
Al
38b76701d8 [osm] Falling back on OSM country/languages if the point doesn't match the Quattroshapes geometry 2016-07-24 22:56:53 -04:00
Al
4b26962793 [osm] Don't return language from node_place_tags as the list of tags contains the various languages already 2016-07-24 22:17:42 -04:00
Al
87a47a825e [fix] var reference before assignment 2016-07-24 22:00:07 -04:00
Al
696448981c [fix] var name 2016-07-24 21:58:56 -04:00
Al
bfb89adaab [osm] use containing ids in component mapping 2016-07-24 21:57:04 -04:00
Al
2a9185874a [fix] component index 2016-07-24 21:55:02 -04:00
Al
1158076154 [fix] default language suffix is '' 2016-07-24 21:34:59 -04:00
Al
60d4fd3102 [fix] another import 2016-07-24 21:31:52 -04:00
Al
648c016b05 [fix] import and return values 2016-07-24 21:30:53 -04:00
Al
09b77b52a6 [osm] Adding place training set. Every place, even nodes, in OSM will get population / 10000 + 1 simple place queries like city + state included in the training set, even if there are no OSM addresses for that city. Where postcodes are available, they'll also be added to the training examples 2016-07-24 20:09:56 -04:00
Al
39c193d52d [osm] Fixing parse_osm_number_range. Only treat it as a range if the number on the right is greater, make letter range parsing optional 2016-07-24 19:49:20 -04:00
Al
4151ce7919 [osm] Adding rail stations to venues data set if they have a street address and a Wikipedia 2016-07-24 14:13:38 -04:00
Al
75d9c31395 [text] Adding NORMALIZE_STRING_COMPOSE constant in pynormalize.c 2016-07-24 03:37:43 -04:00
Al
7b3f4e9175 [text] Adding utils.py for is_numeric/is_numeric_strict 2016-07-24 03:37:11 -04:00
Al Barrentine
65c4688f89 Merge pull request #97 from uberbaud/multipart_edgecase
Don't call `download_multipart` for 1 chunk
2016-07-24 00:03:51 -04:00
Travis
3f0eff228e [auto][ci skip] Adding data files from Travis build #145 2016-07-23 22:28:32 +00:00
Al
bedfd34363 [fix] small change to dictionary so generated file rebuilds 2016-07-23 18:18:36 -04:00
Al
e8beca0971 [fix] ReEscape backslash when escaping dictionary files 2016-07-23 18:16:44 -04:00
Tom Davis
2991ffd193 Don't call download_multipart for 1 chunk
Previously, where a file was larger than `$LARGE_FILE_SIZE` but smaller
than `$CHUNK_SIZE*2`, `download_multipart` would be called but would
only download one (1) chunk that was the whole file.

This fix keeps the same download performance as before but optimizes
processing chunks out.
2016-07-23 16:41:04 -04:00
Al
a620cae6e0 [fix] var 2016-07-23 15:45:07 -04:00
Al
487d589531 [fix] remove var 2016-07-23 15:17:47 -04:00
Al
bfc75912bc [fix] Only skip Quattroshapes matching if place=neighborhood 2016-07-23 15:15:23 -04:00
Al
26225ee8bb [osm] Removing rail stations from venues and making them a separate data set for reverse geocoding, fixing building!=yes query, should not include records with no building tag at all 2016-07-23 03:57:05 -04:00
Al
83f39a3dc5 [fix] removing print 2016-07-23 03:26:02 -04:00
Al
2a634797ec [fix] make sure values are hashable in mapping OSM components 2016-07-23 03:04:23 -04:00
Al
31db378303 [fix] var name 2016-07-23 02:32:30 -04:00
Al
53f6053ec6 [fix] var names in osm component mapping 2016-07-23 02:05:41 -04:00
Al
ba507fded0 [fix] OSM component mapping in neighborhoods index 2016-07-23 01:44:43 -04:00
Al Barrentine
e3eaa9efaf Merge pull request #93 from uberbaud/no_seq
Remove call to `seq` which may not exist
2016-07-23 01:16:57 -04:00
Al
a3e11974e6 [fix] import 2016-07-23 01:07:43 -04:00
Tom Davis
24e0314e71 Remove call to seq which may not exist 2016-07-23 01:03:15 -04:00
Al
d18362056f [fix] typo 2016-07-23 00:32:48 -04:00
Al
ae3ee39709 [fix] Using containing polygons from OSM to determine component type in neighborhoods index 2016-07-22 19:16:44 -04:00
Al
9bf065f8a5 [fix] var 2016-07-22 19:06:12 -04:00
Al
69a491d057 [fix] /house_number/house_numbers/ 2016-07-22 18:59:04 -04:00
Al
9681d4dc8e [merge] 2016-07-22 18:55:55 -04:00
Al
c8e426a94d [osm] If sub-building tags are specified in OSM tags (e.g. addr:floor), only include them if the values are numeric 2016-07-22 18:47:31 -04:00
Al
226dd55a97 [osm] Adding Romaji probability to Japanese config for block/house number phrases 2016-07-22 17:01:15 -04:00
Al
9bece91bd5 [osm] When choosing a namespaced language, alias all namespaced tags, not just the addr:* tags 2016-07-22 14:56:07 -04:00
Al
9a6279d73b [fix] normalize building component tags, not regular tags 2016-07-22 14:54:18 -04:00
Al
b1b797171c [osm] Combining addr:block_number and addr:housenumber in Japan (randomly adds phrases for the 番号/bango system) 2016-07-22 14:52:16 -04:00
Al
06541f5911 [osm] Adding country_region tag to address formatter 2016-07-21 23:38:37 -04:00