4658 Commits

Author SHA1 Message Date
Al
7bfc52b540 [osm] add postcode phrases when there's no validation/component-stripping 2017-02-11 01:54:55 -05:00
Al
c9ade4a7da [openaddresses] adding postal code country phrases in OpenAddresses as well 2017-02-11 01:48:54 -05:00
Al
f6bea5ebe5 [fix] always validate in comma-separated postcodes 2017-02-11 01:42:19 -05:00
Al
3d9b512cda [fix] pop 2017-02-11 01:28:56 -05:00
Al
e5a98d16d8 [fix] args 2017-02-11 01:15:07 -05:00
Al
01c4c8ec82 [fix] scope 2017-02-11 01:07:20 -05:00
Al
29ba58d68a [fix] var 2017-02-11 01:01:29 -05:00
Al
f0dfd7850c [fix] ignore punctuation in strip_components 2017-02-11 01:00:37 -05:00
Al
f07d93df2c [fix] omitted line 2017-02-11 00:57:47 -05:00
Al
ffc12ec5ab [osm] add new method in OSM formatting to extract one or more expanded postal codes from an addr:postcode tag, using the new country-specific rules 2017-02-11 00:53:52 -05:00
Al
bbcb6444c8 [addresses] add strip_components method which simply removes the names of OSM components from a string (for e.g. postal codes) 2017-02-11 00:07:55 -05:00
Al
4e1d7d9373 [osm] use new postal codes module in OSM formatting 2017-02-10 23:56:23 -05:00
Al
9022fb9149 [places] use country.lower() 2017-02-10 23:54:43 -05:00
Al
a0d674274a [neighborhoods] immutable data structures when loading from JSON 2017-02-10 23:54:24 -05:00
Al
293587bae9 [addresses] adding new config for postal codes around the world. Allows appending the ISO alpha-2 country code to the beginning of the postcode as in e.g. SI-1000 (only used if the postcode begins with a digit). This system was used for postal codes in continental Europe as a recommendation from the CEPT. Now 7 member states still use it, so in those countries add the country-code with higher probability. The config also contains the license plate codes for countries where e.g. L-1234 might be used instead of LU-1234. Allows configuring in which countries postcodes should be validated using Google's per-country validation regexes (and the ability to override with a custom regex), and in which countries other admin component names should be stripped. 2017-02-10 23:53:50 -05:00
Al
109aa76718 [boundaries] mapping Manhattan node ID to city_district so it gets labeled as such in the neighborhoods index 2017-02-10 14:43:53 -05:00
Al
b570855b78 [parser] adding postcode context features and associated data structures to the parser. Masking digits, which should hopefully help with generalization. Creating positive/negative features for postcode with and without context support. Note: even with known postcodes in known contexts, only use the masked digits to avoid creating too many features that are redundant with the index. 2017-02-10 03:41:14 -05:00
Al
9a93e95938 [api] removing geodb from setup functions 2017-02-10 01:02:52 -05:00
Al
ff245d74f8 [parser] building an index of postal codes and their valid admin contexts (city, state, country, etc.) during training e.g. "11216" => ["brooklyn", "ny"]. Postal code phrases like CP in Spanish are removed when constructing the index. 2017-02-10 00:50:48 -05:00
Al
40d1b26e12 [openaddresses] Henderson County, KY 2017-02-09 16:23:43 -05:00
Al
d18c68918d [openaddresses] Kern County, CA 2017-02-09 15:23:16 -05:00
Al
598f15cad8 [openaddresses] city of Vilnius, Lithuania 2017-02-09 15:19:34 -05:00
Al
fa3405fe4d [openaddresses] Scott County, KY 2017-02-09 15:17:20 -05:00
Al
f00625029b [openaddresses] Ajax, ON 2017-02-09 15:16:39 -05:00
Al
ce5826928b [openaddresses] add Saskatoon, SK 2017-02-09 15:11:56 -05:00
Al
1aacb5bccc Merge branch 'master' into parser-data 2017-02-09 15:09:28 -05:00
Al
ea168279bd [fix] free json-encoded string in parser client output 2017-02-09 14:34:15 -05:00
Al
38c6c26146 [fix] freeing normalized string in address_parser_parse 2017-02-09 14:33:13 -05:00
Al
8aa3749cfb [utils] some convenience functions for generic hashtables (incr, get, etc) 2017-02-08 19:01:13 -05:00
Al
a6844c8ec1 [parser] structural changes for postal codes index 2017-02-08 18:52:45 -05:00
Al
7a360f4211 [osm] addr:postcode can be all over the place in OSM. Start with postcodes containing commas or semicolons. If addr:postcode (on address of building) contains either, iterate over the values and pick the first one that matches a postcode validation regex for that country 2017-02-08 16:13:29 -05:00
Al
97ccbef807 [openaddresses] adding Lincoln County, WY 2017-02-08 15:54:11 -05:00
Al
30fba16141 [openaddresses] adding Wuppertal, Germany, Marietta GA, Salem OR, and Atlanta 2017-02-08 15:41:17 -05:00
Al
6e4f641743 [phrases] adding token_phrase_memberships to trie_search for reuse 2017-02-08 01:59:39 -05:00
Al
ae35da8d17 [fix] uninitialized var 2017-02-08 01:58:53 -05:00
Al
3a95af104b [openaddresses] remove add_osm_boundaries from City of Anaheim 2017-02-07 14:37:15 -05:00
Al
dbf7242ea0 [fix] /cls/self/ 2017-02-04 19:12:49 -05:00
Al
35effe4b0b [openaddresses] adding state of Thüringen, Germany 2017-02-04 18:00:13 -05:00
Al
af06270896 [openaddresses] adding ignore regexes for US counties where we use the unit, using non_numeric_units in every case 2017-02-04 15:48:00 -05:00
Al
c600f05f06 [openaddresses] adding Czech Republic to the street not required set 2017-02-04 15:30:46 -05:00
Al
0169448a4d [addresses] adding Central European city district regexes (e.g. Praha 1, Budapest IV, etc.) to country-specific cleanup 2017-02-03 20:54:23 -05:00
Al
1b6263a6e7 [openaddresses] add postcode field to NY statewide 2017-02-02 15:00:34 -05:00
Al
990ce176aa [openaddresses] add language and modern city name to Dnipro, Ukraine 2017-02-02 14:02:59 -05:00
Al
c95d5db290 [openaddresses] ignore US postcodes that are 1-4 digits, usually typos. Reformat where needed 2017-02-02 14:02:06 -05:00
Al
85f03184d5 [openaddresses] moving postcode fixes before validation. Adding regex for validating Russian house numbers in the Ukraine 2017-02-02 11:21:00 -05:00
Al
f6e9c5f709 [openaddresses] adding postcode length=5 (so leading zeros get captured if the field was an integer) for Germany, France, Italy, and Mexico. Adding validation to Volgograd oblast 2017-02-02 11:16:38 -05:00
Al
4fbd99d2c8 [openaddresses] add Tacoma, WA 2017-02-01 20:30:21 -05:00
Al
d4d3407f2c [openaddresses] adding Vernon BC, Churchill County NV, and some of the new Georgia sources 2017-01-31 16:01:39 -05:00
Al
12146b6eeb [openaddresses] adding Nacka, Sweden 2017-01-30 02:41:30 -05:00
Al
0380f565d2 [parser] shorter first word feature 2017-01-29 22:10:28 -05:00
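
Note on 7a360f4211: the message describes iterating over the values of a comma- or semicolon-separated addr:postcode tag and keeping the first one that passes the country's postcode validation regex. A minimal sketch of that idea follows. It is not the code from the commit. The function name first_valid_postcode, the POSTCODE_REGEXES table, and both patterns are hypothetical, simplified stand-ins (293587bae9 mentions Google's per-country validation regexes, which are not reproduced here). Unlike the commit as described, the sketch also validates single-valued tags.

```python
import re

# Hypothetical, simplified patterns for illustration only; the real
# per-country validation regexes are not reproduced here.
POSTCODE_REGEXES = {
    "us": re.compile(r"\d{5}(?:-\d{4})?"),
    "de": re.compile(r"\d{5}"),
}

# addr:postcode values are split on commas or semicolons.
SEPARATORS = re.compile(r"[,;]")


def first_valid_postcode(value, country):
    """Return the first candidate that fully matches the country's
    pattern, or None if nothing matches or the country is unknown."""
    pattern = POSTCODE_REGEXES.get(country.lower())
    if pattern is None:
        return None
    for candidate in SEPARATORS.split(value):
        candidate = candidate.strip()
        # fullmatch rejects candidates with leading or trailing extras.
        if candidate and pattern.fullmatch(candidate):
            return candidate
    return None
```

For example, first_valid_postcode("Brooklyn; 11216", "US") skips the place name and keeps "11216". Returning None for an unknown country is a choice made for this sketch. 7bfc52b540 suggests the project handles countries without validation differently.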