Al
|
3d9b512cda
|
[fix] pop
|
2017-02-11 01:28:56 -05:00 |
|
Al
|
e5a98d16d8
|
[fix] args
|
2017-02-11 01:15:07 -05:00 |
|
Al
|
01c4c8ec82
|
[fix] scope
|
2017-02-11 01:07:20 -05:00 |
|
Al
|
29ba58d68a
|
[fix] var
|
2017-02-11 01:01:29 -05:00 |
|
Al
|
f0dfd7850c
|
[fix] ignore punctuation in strip_components
|
2017-02-11 01:00:37 -05:00 |
|
Al
|
f07d93df2c
|
[fix] omitted line
|
2017-02-11 00:57:47 -05:00 |
|
Al
|
ffc12ec5ab
|
[osm] add a new method in OSM formatting to extract one or more expanded postal codes from an addr:postcode tag, using the new country-specific rules
|
2017-02-11 00:53:52 -05:00 |
|
Al
|
bbcb6444c8
|
[addresses] add strip_components method which simply removes the names of OSM components from a string (e.g. postal codes)
|
2017-02-11 00:07:55 -05:00 |
|
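A minimal Python sketch of what a strip_components-style helper could look like. It assumes the helper receives the string plus a list of component names already known for the address (e.g. the city or state). The signature is hypothetical and is not the actual libpostal code. Matching is case-insensitive on whole words. Punctuation left dangling afterwards is dropped, in the spirit of the later "ignore punctuation in strip_components" fix.

```python
import re


def strip_components(value, component_names):
    """Hypothetical sketch: remove whole-word occurrences of known component
    names (case-insensitive) from value, then drop leftover punctuation."""
    result = value
    # Longest names first so a multi-word name like "New York" is removed whole.
    for name in sorted(component_names, key=len, reverse=True):
        pattern = r'(?<!\w)' + re.escape(name) + r'(?!\w)'
        result = re.sub(pattern, ' ', result, flags=re.IGNORECASE)
    # Keep only tokens that still contain a word character, trimming stray separators.
    tokens = [t for t in result.split() if re.search(r'\w', t)]
    return ' '.join(t.strip(',;:.') for t in tokens)
```

For example, `strip_components("10001 New York, NY", ["NY", "New York"])` leaves just the postcode.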
Al
|
4e1d7d9373
|
[osm] use new postal codes module in OSM formatting
|
2017-02-10 23:56:23 -05:00 |
|
Al
|
9022fb9149
|
[places] use country.lower()
|
2017-02-10 23:54:43 -05:00 |
|
Al
|
a0d674274a
|
[neighborhoods] use immutable data structures when loading from JSON
|
2017-02-10 23:54:24 -05:00 |
|
Al
|
293587bae9
|
[addresses] adding new config for postal codes around the world. Allows prepending the ISO alpha-2 country code to the postcode, as in e.g. SI-1000 (only used if the postcode begins with a digit). This system was recommended by the CEPT for postal codes in continental Europe. Seven member states still use it, so in those countries the country code is added with higher probability. The config also contains the license plate codes for countries where e.g. L-1234 might be used instead of LU-1234. It also configures in which countries postcodes should be validated using Google's per-country validation regexes (with the option to override them with a custom regex), and in which countries other admin component names should be stripped.
|
2017-02-10 23:53:50 -05:00 |
|
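A rough Python sketch of the prefixing behaviour described above. Only the two examples in the message (SI-1000, and L-1234 instead of LU-1234) come from the source. The config layout, the probabilities, and the 50/50 choice between the ISO code and the plate code are hypothetical placeholders, not the real config values.

```python
import random

# Hypothetical config: country -> (probability of adding a prefix, license plate code).
POSTCODE_PREFIX_CONFIG = {
    'si': (0.5, None),
    'lu': (0.5, 'L'),
}
DEFAULT_PREFIX_PROB = 0.05  # assumed lower probability for all other countries


def prefixed_postcode(postcode, country, use_plate_code=False):
    """Return e.g. 'SI-1000'; postcodes not starting with a digit are unchanged."""
    if not postcode or not postcode[0].isdigit():
        return postcode
    prefix = country.upper()
    plate = POSTCODE_PREFIX_CONFIG.get(country.lower(), (DEFAULT_PREFIX_PROB, None))[1]
    if use_plate_code and plate:
        prefix = plate
    return '{}-{}'.format(prefix, postcode)


def sample_postcode(postcode, country, rng=random):
    """Randomly decide whether to add a prefix, as a training-data generator might."""
    prob, plate = POSTCODE_PREFIX_CONFIG.get(country.lower(), (DEFAULT_PREFIX_PROB, None))
    if rng.random() >= prob:
        return postcode
    # Assumption: when a plate code exists, use it half of the time.
    use_plate = plate is not None and rng.random() < 0.5
    return prefixed_postcode(postcode, country, use_plate_code=use_plate)
```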
Al
|
109aa76718
|
[boundaries] mapping Manhattan node ID to city_district so it gets labeled as such in the neighborhoods index
|
2017-02-10 14:43:53 -05:00 |
|
Al
|
b570855b78
|
[parser] adding postcode context features and associated data structures to the parser. Masking digits, which should hopefully help with generalization. Creating positive/negative features for postcodes with and without supporting context. Note: even for known postcodes in known contexts, only the masked digits are used, to avoid creating too many features that are redundant with the index.
|
2017-02-10 03:41:14 -05:00 |
|
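An illustrative Python sketch of masked postcode context features. It assumes the index maps a lowercased postcode to the set of admin names seen with it, and it uses "D" as the digit mask. The feature strings `postcode+context` / `postcode-context` are hypothetical names, not the parser's real feature names. As the note says, the emitted feature carries only the masked shape, never the raw postcode.

```python
import re


def mask_digits(token):
    """Replace every digit with 'D' so e.g. '11216' and '10001' share one shape."""
    return re.sub(r'\d', 'D', token)


def postcode_features(token, context, postcode_index):
    """Emit a positive or negative context feature for a known postcode.

    postcode_index: dict of lowercased postcode -> set of lowercased admin names
    (a hypothetical structure). Unknown postcodes produce no features here.
    """
    known_contexts = postcode_index.get(token.lower())
    if known_contexts is None:
        return []
    masked = mask_digits(token)
    if any(name.lower() in known_contexts for name in context):
        return ['postcode+context:' + masked]
    return ['postcode-context:' + masked]
```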
Al
|
9a93e95938
|
[api] removing geodb from setup functions
|
2017-02-10 01:02:52 -05:00 |
|
Al
|
ff245d74f8
|
[parser] building an index of postal codes and their valid admin contexts (city, state, country, etc.) during training e.g. "11216" => ["brooklyn", "ny"]. Postal code phrases like CP in Spanish are removed when constructing the index.
|
2017-02-10 00:50:48 -05:00 |
|
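A small Python sketch of building the postcode index during training. It assumes each training example is a dict keyed by libpostal-style component labels. Only "CP" is named in the message, so the phrase list below (including "c.p.") is an illustrative assumption.

```python
from collections import defaultdict

# Hypothetical set of postal-code phrases to drop (the message cites Spanish "CP").
POSTCODE_PHRASES = {'cp', 'c.p.'}
ADMIN_COMPONENTS = ('city_district', 'city', 'state', 'country')


def normalize_postcode(postcode):
    """Lowercase and remove postal-code phrases such as 'CP'."""
    tokens = [t for t in postcode.lower().split() if t not in POSTCODE_PHRASES]
    return ' '.join(tokens)


def build_postcode_index(training_addresses):
    """Map each normalized postcode to the set of admin names seen with it."""
    index = defaultdict(set)
    for address in training_addresses:
        key = normalize_postcode(address.get('postcode') or '')
        if not key:
            continue
        for component in ADMIN_COMPONENTS:
            value = address.get(component)
            if value:
                index[key].add(value.lower())
    return index
```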
Al
|
40d1b26e12
|
[openaddresses] Henderson County, KY
|
2017-02-09 16:23:43 -05:00 |
|
Al
|
d18c68918d
|
[openaddresses] Kern County, CA
|
2017-02-09 15:23:16 -05:00 |
|
Al
|
598f15cad8
|
[openaddresses] city of Vilnius, Lithuania
|
2017-02-09 15:19:34 -05:00 |
|
Al
|
fa3405fe4d
|
[openaddresses] Scott County, KY
|
2017-02-09 15:17:20 -05:00 |
|
Al
|
f00625029b
|
[openaddresses] Ajax, ON
|
2017-02-09 15:16:39 -05:00 |
|
Al
|
ce5826928b
|
[openaddresses] add Saskatoon, SK
|
2017-02-09 15:11:56 -05:00 |
|
Al
|
1aacb5bccc
|
Merge branch 'master' into parser-data
|
2017-02-09 15:09:28 -05:00 |
|
Al
|
ea168279bd
|
[fix] free json-encoded string in parser client output
|
2017-02-09 14:34:15 -05:00 |
|
Al
|
38c6c26146
|
[fix] freeing normalized string in address_parser_parse
|
2017-02-09 14:33:13 -05:00 |
|
Al
|
8aa3749cfb
|
[utils] some convenience functions for generic hashtables (incr, get, etc.)
|
2017-02-08 19:01:13 -05:00 |
|
Al
|
a6844c8ec1
|
[parser] structural changes for postal codes index
|
2017-02-08 18:52:45 -05:00 |
|
Al
|
7a360f4211
|
[osm] addr:postcode can be all over the place in OSM. Start with postcodes containing commas or semicolons: if addr:postcode (on a building's address) contains either, iterate over the values and pick the first one that matches the postcode validation regex for that country.
|
2017-02-08 16:13:29 -05:00 |
|
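A Python sketch of the splitting rule above. The two regexes are simplified stand-ins for the per-country validation patterns, and the fallback of accepting the first value when a country has no regex is an assumption.

```python
import re

# Simplified, hypothetical validation patterns for illustration only.
POSTCODE_REGEXES = {
    'us': re.compile(r'^\d{5}(?:-\d{4})?$'),
    'de': re.compile(r'^\d{5}$'),
}


def pick_postcode(raw, country):
    """If raw contains commas or semicolons, return the first value that
    validates for the country (None if none do); otherwise return it as-is."""
    if not re.search(r'[,;]', raw):
        return raw.strip()
    pattern = POSTCODE_REGEXES.get(country.lower())
    for value in re.split(r'[,;]', raw):
        value = value.strip()
        # Assumption: without a regex for the country, take the first non-empty value.
        if value and (pattern is None or pattern.match(value)):
            return value
    return None
```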
Al
|
97ccbef807
|
[openaddresses] adding Lincoln County, WY
|
2017-02-08 15:54:11 -05:00 |
|
Al
|
30fba16141
|
[openaddresses] adding Wuppertal, Germany; Marietta, GA; Salem, OR; and Atlanta
|
2017-02-08 15:41:17 -05:00 |
|
Al
|
6e4f641743
|
[phrases] adding token_phrase_memberships to trie_search for reuse
|
2017-02-08 01:59:39 -05:00 |
|
Al
|
ae35da8d17
|
[fix] uninitialized var
|
2017-02-08 01:58:53 -05:00 |
|
Al
|
3a95af104b
|
[openaddresses] remove add_osm_boundaries from City of Anaheim
|
2017-02-07 14:37:15 -05:00 |
|
Al
|
dbf7242ea0
|
[fix] /cls/self/
|
2017-02-04 19:12:49 -05:00 |
|
Al
|
35effe4b0b
|
[openaddresses] adding state of Thüringen, Germany
|
2017-02-04 18:00:13 -05:00 |
|
Al
|
af06270896
|
[openaddresses] adding ignore regexes for US counties where we use the unit, using non_numeric_units in every case
|
2017-02-04 15:48:00 -05:00 |
|
Al
|
c600f05f06
|
[openaddresses] adding Czech Republic to the street not required set
|
2017-02-04 15:30:46 -05:00 |
|
Al
|
0169448a4d
|
[addresses] adding Central European city district regexes (e.g. Praha 1, Budapest IV, etc.) to country-specific cleanup
|
2017-02-03 20:54:23 -05:00 |
|
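One plausible shape for such a cleanup regex, sketched in Python. It assumes the goal is to separate the numbered district from the city name. Only Praha and Budapest come from the message; the pattern and the split behaviour are assumptions about what "cleanup" means here.

```python
import re

# Hypothetical pattern: city name followed by an Arabic or Roman-numeral district.
CITY_DISTRICT_RE = re.compile(
    r'^(?P<city>Praha|Budapest)\s+(?P<district>\d{1,2}|[IVXLC]+)\.?$',
    re.IGNORECASE,
)


def split_city_district(name):
    """Split e.g. 'Praha 1' into ('Praha', '1'); otherwise return (name, None)."""
    match = CITY_DISTRICT_RE.match(name.strip())
    if match is None:
        return name, None
    return match.group('city'), match.group('district')
```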
Al
|
1b6263a6e7
|
[openaddresses] add postcode field to NY statewide
|
2017-02-02 15:00:34 -05:00 |
|
Al
|
990ce176aa
|
[openaddresses] add language and modern city name to Dnipro, Ukraine
|
2017-02-02 14:02:59 -05:00 |
|
Al
|
c95d5db290
|
[openaddresses] ignore US postcodes that are 1-4 digits (usually typos). Reformat where needed.
|
2017-02-02 14:02:06 -05:00 |
|
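A Python sketch of the US rule. The 1-4 digit rejection follows the message. Reformatting a bare 9-digit value as ZIP+4 is only an assumed example of "reformat where needed".

```python
import re


def clean_us_postcode(text):
    """Return a 5-digit or ZIP+4 postcode, or None for likely typos."""
    text = text.strip()
    if re.fullmatch(r'\d{1,4}', text):
        return None  # 1-4 digits: usually a typo, so ignore it
    if re.fullmatch(r'\d{9}', text):
        return text[:5] + '-' + text[5:]  # assumption: bare ZIP+4 gets a hyphen
    if re.fullmatch(r'\d{5}(?:-\d{4})?', text):
        return text
    return None
```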
Al
|
85f03184d5
|
[openaddresses] moving postcode fixes before validation. Adding regex for validating Russian house numbers in Ukraine
|
2017-02-02 11:21:00 -05:00 |
|
Al
|
f6e9c5f709
|
[openaddresses] adding postcode length=5 (so leading zeros are restored if the field was an integer) for Germany, France, Italy, and Mexico. Adding validation to Volgograd Oblast
|
2017-02-02 11:16:38 -05:00 |
|
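A Python sketch of the length=5 rule. The four countries come from the message. The function name and the treatment of non-numeric values are assumptions.

```python
# Countries with a fixed 5-digit postcode length, per the commit message.
POSTCODE_LENGTHS = {'de': 5, 'fr': 5, 'it': 5, 'mx': 5}


def pad_postcode(value, country):
    """Restore leading zeros lost when a postcode column was read as an integer."""
    length = POSTCODE_LENGTHS.get(country.lower())
    text = str(value).strip()
    if length and text.isdigit() and len(text) < length:
        return text.zfill(length)
    return text
```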
Al
|
4fbd99d2c8
|
[openaddresses] add Tacoma, WA
|
2017-02-01 20:30:21 -05:00 |
|
Al
|
d4d3407f2c
|
[openaddresses] adding Vernon BC, Churchill County NV, and some of the new Georgia sources
|
2017-01-31 16:01:39 -05:00 |
|
Al
|
12146b6eeb
|
[openaddresses] adding Nacka, Sweden
|
2017-01-30 02:41:30 -05:00 |
|
Al
|
0380f565d2
|
[parser] shorter first word feature
|
2017-01-29 22:10:28 -05:00 |
|
Al
|
1fbdd964b3
|
[openaddresses] add languages for China and Russia data sets so the validators kick in
|
2017-01-28 02:15:00 -05:00 |
|
Al
|
12bc18f74b
|
[openaddresses] fix Chinese house number validation
|
2017-01-28 02:03:19 -05:00 |
|
Al
|
2b349ef8a8
|
[fix] never mind, the Spanish-language street names needed to be handled before validation (simple numeric names like "8" need to be prefixed with "Calle" or they'll fail validation)
|
2017-01-28 01:08:10 -05:00 |
|
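A Python sketch of why the ordering matters. The validator here is an assumed stand-in that requires at least one letter, which a bare "8" fails until it becomes "Calle 8".

```python
import re


def spanish_street_name(name):
    """Prefix purely numeric street names with 'Calle'."""
    name = name.strip()
    if re.fullmatch(r'\d+', name):
        return 'Calle ' + name
    return name


def is_valid_street(name):
    # Assumed validator: a street name must contain at least one letter.
    return re.search(r'[^\W\d_]', name) is not None
```

Applying `spanish_street_name` first and `is_valid_street` second keeps "8" as "Calle 8". Validating first would discard it.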