Commit Graph

3531 Commits

Author SHA1 Message Date
Al
813f29f299 [osm] Removing the call to normalize_place_names in place data formatting as we should be able to trust the places more than the addresses 2016-08-02 16:29:34 -04:00
Al
0ab3b13b75 [osm] Remove hanging commas, slashes, etc. Implementing a stricter rule for user-specified tags (not reverse geocoded) so that if they contain an unknown phrase followed by an unknown boundary phrase, we delete that tag and fall back to the reverse geocoded components. Moving CLDR country tagging to later in the process since those are known correct names. 2016-08-02 16:25:45 -04:00
Al
97a2436ad7 [tokenization] Adding two more sets to token_types for punctuation and non-alphanumerics 2016-08-02 16:24:01 -04:00
Al
c40ad99ec7 [osm] removing postcode phrase from place training data and adding CLDR countries only after all the other normalizations 2016-08-02 14:52:12 -04:00
Al
5117fb21d3 [fix] access 2016-08-02 03:20:42 -04:00
Al
bd780d3424 [fix] typo 2016-08-02 03:19:22 -04:00
Al
c74d883344 [fix] unindent 2016-08-02 03:17:42 -04:00
Al
f29d043544 [places] Using all of the ideas that apply to places from address formatting for the places-only data set 2016-08-02 03:16:08 -04:00
Al
4ab60cd4fc [osm] Remove boundary names with trailing commas 2016-08-02 03:13:05 -04:00
Al
12466b12dc [osm] Removing boundary names (not including postal codes) which are simply digits 2016-08-02 02:17:25 -04:00
Al
a1f0c1a3c9 [fix] import 2016-08-02 01:50:17 -04:00
Al
818bd50105 [fix] unit phrase should return None if there's no config available for a particular zone type (again enforcing the idea that venues typically don't have sub-building information) 2016-08-01 18:29:32 -04:00
Al
e11c723f8b [fix] var rename 2016-08-01 17:50:00 -04:00
Al
79ce922432 [osm] Fixing sub-building components so generated numbers are not added to the address components unless cls.phrase returns non-None 2016-08-01 17:44:23 -04:00
Al
4c8b662648 [fix] block numbers 2016-08-01 14:36:28 -04:00
Al
1fb8185b75 [osm/boundaries] Allowing OSM entities to map to NULL 2016-08-01 00:52:58 -04:00
Al
fa003ca430 [fix] indentation in boundaries configs 2016-08-01 00:52:10 -04:00
Al
2faffc81e7 [fix] import 2016-08-01 00:06:47 -04:00
Al
5edc60299c [fix] Bulgarian category probabilities 2016-07-31 22:50:48 -04:00
Al
973ac42a97 [test] Checking probability distributions as part of the address config tests 2016-07-31 22:29:21 -04:00
Al
3ead069b1b [fix] Romanian staircase probability 2016-07-31 22:28:31 -04:00
Al
3505af4bc1 [fix] don't add phrases for non-numeric existing components 2016-07-31 22:14:37 -04:00
Al
d3e50fc894 [fix] NULL-phrase first ordering 2016-07-31 22:10:25 -04:00
Al
afbb79b81d [osm/parser] Making a much lower probability of generating sub-building components for named venues (usually on the ground floor, etc.) 2016-07-31 20:40:44 -04:00
Al
b727078be5 [fix] use alphanumeric in generated component configs by default 2016-07-31 20:39:22 -04:00
Al
2e92c6fcc8 [fix] Probabilities for Ukrainian house numbers 2016-07-31 20:01:42 -04:00
Al
0f3c4276b4 [fix] args 2016-07-31 19:53:39 -04:00
Al
0827caf578 [fix] sample=true 2016-07-31 19:51:03 -04:00
Al
3871869d4b [osm] Check that OSM venue names contain at least one word-like token 2016-07-31 19:50:45 -04:00
Al
ce17b50064 [fix] canonical probability 2016-07-31 19:16:46 -04:00
Al
0bdcae252f [fix] building tag updates 2016-07-31 18:43:55 -04:00
Al
3a19506121 [fix] containing ids 2016-07-31 18:30:58 -04:00
Al
d04a627e92 [fix] KeyError 2016-07-31 18:29:29 -04:00
Al
92b8566930 [places] Increase probability of state and decrease probability of county for smaller cities/towns 2016-07-31 03:26:34 -04:00
Al
3f450054f9 [fix] numeric conditions in place config 2016-07-31 03:15:43 -04:00
Al
99333d58ca [fix] conditions in place config 2016-07-31 03:09:51 -04:00
Al
cec4914233 [openaddresses] In some OpenAddresses data sets, the house number is just a copy of the street name, so eliminate non-numeric house numbers to be safe 2016-07-31 01:12:04 -04:00
Al
f8e9d39e12 [places] Implementing population-based place components in both place and address component expansion 2016-07-30 19:15:03 -04:00
Al
bb91a5b0f0 [places] For the US, add state_district (county) with higher probability for towns with higher populations. Helps with cases that would be difficult to get right otherwise like Brooklyn, Cattaraugus County, NY (http://www.openstreetmap.org/node/158644800) 2016-07-30 18:57:28 -04:00
Al
ebaef4d671 [places] Implementation of population-based exceptions for adding OSM boundary components 2016-07-30 18:52:55 -04:00
Al
20aad99a38 [parser] enum just lists boundary types 2016-07-30 17:07:23 -04:00
Al
965bac1833 [trie] Making methods to construct string phrases from phrase matches available through trie_search.h 2016-07-30 17:06:20 -04:00
Al
469332ffc4 [osm/polygons] Reducing cache_size to 250k now that the polygons are larger 2016-07-30 16:44:59 -04:00
Al
5bfc29d3f6 [osm/places] Using num_references / 2 for non-default languages and min_references / 2 for alternate name tags 2016-07-30 12:46:54 -04:00
Al
3d20bd13c3 [osm] Add population to reverse geocoder properties 2016-07-30 12:25:39 -04:00
Al
a45ff88f5f [osm/polygons] Don't simplify OSM polygons, might have memory 2016-07-29 12:53:13 -04:00
Al
f8c8d05997 [fix] same thing for the exception countries 2016-07-29 12:47:08 -04:00
Al
045eab8e58 [osm] Making ISO codes lower probability for reverse geocoded country as well 2016-07-29 12:30:32 -04:00
Al
09b16d954f [osm] Use much lower probability of ISO country codes 2016-07-29 11:41:39 -04:00
Al
9dc52ea3c4 [osm] Add more English + non-local language names for places in OSM 2016-07-29 10:31:26 -04:00