0196fe8736[utils] fixing key_type in hash_get, adding int64_double map
Al
2017-02-15 22:20:36 -05:00
be6f48f109[fix] that didn't work, set log level to CRITICAL
Al
2017-02-15 14:06:57 -05:00
26bf617a06[fix] prevent Shapely from logging to console
Al
2017-02-15 14:00:51 -05:00
a0b508caf6[transliteration] adding no-args option for transliteration_rules script
Al
2017-02-15 13:22:33 -05:00
8abfa766fd[fix] paren
Al
2017-02-15 02:26:18 -05:00
06003dfbb0[fix] lower probability of name:prefix
Al
2017-02-14 18:57:31 -05:00
92b34f6af4[fix] var name
Al
2017-02-14 18:53:53 -05:00
ca79342636[fix] config
Al
2017-02-14 18:50:51 -05:00
8eafc5730b[parser] adding long-context features which help classify the first token in the string by finding the relative positions of a) the first numeric token and b) the first street-level phrase like "Ave" or "Calle"
Al
2017-02-14 18:42:51 -05:00
08976c772e[neighborhoods] base parser config changes for new prefix/first_match options
Al
2017-02-14 18:19:15 -05:00
64673c2875[neighborhoods] add neighborhoods that are not the top match occasionally
Al
2017-02-14 18:17:40 -05:00
b99e31ca17[neighborhoods] add name:prefix in admin boundaries and neighborhoods (used often in e.g. Germany), use alternative/language keys as well
Al
2017-02-14 18:07:13 -05:00
614479aee1[neighborhoods] don't add point with same name as existing OSM polygon
Al
2017-02-14 16:40:08 -05:00
854ac853a9[fix] OSM neighborhoods index check
Al
2017-02-14 03:53:22 -05:00
949c10ab22[fix] remove print
Al
2017-02-14 01:53:54 -05:00
738bd7b525[neighborhoods] logging, moving OSM/CTH before Quattroshapes for easier testing
Al
2017-02-14 01:52:59 -05:00
67f69ce6ce[fix] move
Al
2017-02-14 01:51:23 -05:00
6d580f4c87[osm] neighborhood polygon reader
Al
2017-02-14 01:50:04 -05:00
6c68d446a0[neighborhoods] adding ClickThatHood config to whitelist/specify what kind of polygon is specified in each file. Adding OSM neighborhoods (ways/relations where place=neighbourhood to reduce ambiguity) as the highest priority, followed by CTH/OSM, CTH, Quattro/OSM, Quattro
Al
2017-02-14 01:46:23 -05:00
2003e08623[osm] creating an OSM neighborhood boundaries data set for place=neighbourhood polygons only (place=suburb, etc. can be ambiguous)
Al
2017-02-13 17:41:33 -05:00
eff9280224[boundaries] Amsterdam and Rotterdam listed as admin_level=10 in OSM, making exceptions
Al
2017-02-13 16:06:18 -05:00
2f4bcaeec2[parser] address_parser_test memory cleanup, add print-errors option to print individual parser errors on held-out data
Al
2017-02-12 15:58:50 -05:00
b1e178b7b2[fix] is_numeric_token includes IDEOGRAPHIC_NUMBER
Al
2017-02-12 15:11:56 -05:00
b2978f49ba[openaddresses] adding Newaygo County, MI and Scioto County, OH
Al
2017-02-12 00:37:15 -05:00
e569956944[osm] remove postcode field if more than one is found
Al
2017-02-11 03:52:46 -05:00
9af4b1bd42[openaddresses] fixing street requirement
Al
2017-02-11 03:29:09 -05:00
2dff6c8839[fix] call
Al
2017-02-11 02:16:55 -05:00
081f023d60[fix] name
Al
2017-02-11 02:10:59 -05:00
6705ebaffd[fix] import
Al
2017-02-11 02:09:31 -05:00
7bfc52b540[osm] add postcode phrases when there's no validation/component-stripping
Al
2017-02-11 01:54:55 -05:00
c9ade4a7da[openaddresses] adding postal code country phrases in OpenAddresses as well
Al
2017-02-11 01:48:54 -05:00
f6bea5ebe5[fix] always validate in comma-separated postcodes
Al
2017-02-11 01:42:19 -05:00
f0dfd7850c[fix] ignore punctuation in strip_components
Al
2017-02-11 01:00:37 -05:00
f07d93df2c[fix] omitted line
Al
2017-02-11 00:57:47 -05:00
ffc12ec5ab[osm] add new method in OSM formatting to extract one or more expanded postal codes from an addr:postcode tag, using the new country-specific rules
Al
2017-02-11 00:53:52 -05:00
bbcb6444c8[addresses] add strip_components method which simply removes the names of OSM components from a string (for e.g. postal codes)
Al
2017-02-10 23:57:20 -05:00
4e1d7d9373[osm] use new postal codes module in OSM formatting
Al
2017-02-10 23:56:23 -05:00
9022fb9149[places] use country.lower()
Al
2017-02-10 23:54:43 -05:00
a0d674274a[neighborhoods] immutable data structures when loading from JSON
Al
2017-02-10 23:54:24 -05:00
293587bae9[addresses] adding new config for postal codes around the world. Allows appending the ISO alpha-2 country code to the beginning of the postcode as in e.g. SI-1000 (only used if the postcode begins with a digit). This system was used for postal codes in continental Europe as a recommendation from the CEPT. Now 7 member states still use it, so in those countries add the country-code with higher probability. The config also contains the license plate codes for countries where e.g. L-1234 might be used instead of LU-1234. Allows configuring in which countries postcodes should be validated using Google's per-country validation regexes (and the ability to override with a custom regex), and in which countries other admin component names should be stripped.
Al
2017-02-10 18:38:32 -05:00
109aa76718[boundaries] mapping Manhattan node ID to city_district so it gets labeled as such in the neighborhoods index
Al
2017-02-10 14:43:46 -05:00
c3d50386c1Merge pull request #161 from pasupulaphani/master
Al Barrentine
2017-02-10 11:15:37 -05:00
b570855b78[parser] adding postcode context features and associated data structures to the parser. Masking digits, which should hopefully help with generalization. Creating positive/negative features for postcode with and without context support. Note: even with known postcodes in known contexts, only use the masked digits to avoid creating too many features that are redundant with the index.
Al
2017-02-10 03:40:43 -05:00
9a93e95938[api] removing geodb from setup functions
Al
2017-02-10 01:02:52 -05:00
ff245d74f8[parser] building an index of postal codes and their valid admin contexts (city, state, country, etc.) during training e.g. "11216" => ["brooklyn", "ny"]. Postal code phrases like CP in Spanish are removed when constructing the index.
Al
2017-02-10 00:50:48 -05:00
40d1b26e12[openaddresses] Henderson County, KY
Al
2017-02-09 16:23:43 -05:00
d18c68918d[openaddresses] Kern County, CA
Al
2017-02-09 15:23:16 -05:00
598f15cad8[openaddresses] city of Vilnius, Lithuania
Al
2017-02-09 15:19:34 -05:00
fa3405fe4d[openaddresses] Scott County, KY
Al
2017-02-09 15:17:20 -05:00
f00625029b[openaddresses] Ajax, ON
Al
2017-02-09 15:16:39 -05:00
ce5826928b[openaddresses] add Saskatoon, SK
Al
2017-02-09 15:11:53 -05:00
1aacb5bcccMerge branch 'master' into parser-data
Al
2017-02-09 15:09:28 -05:00
ea168279bd[fix] free json-encoded string in parser client output
Al
2017-02-09 14:34:15 -05:00
38c6c26146[fix] freeing normalized string in address_parser_parse
Al
2017-02-09 14:33:13 -05:00
8aa3749cfb[utils] some convenience functions for generic hashtables (incr, get, etc)
Al
2017-02-08 19:01:13 -05:00
a6844c8ec1[parser] structural changes for postal codes index
Al
2017-02-08 18:52:45 -05:00
7a360f4211[osm] addr:postcode can be all over the place in OSM. Start with postcodes containing commas or semicolons. If addr:postcode (on address of building) contains either, iterate over the values and pick the first one that matches a postcode validation regex for that country
Al
2017-02-08 16:13:25 -05:00
97ccbef807[openaddresses] adding Lincoln County, WY
Al
2017-02-08 15:54:11 -05:00
30fba16141[openaddresses] adding Wuppertal, Germany, Marietta GA, Salem OR, and Atlanta
Al
2017-02-08 15:20:46 -05:00
6e4f641743[phrases] adding token_phrase_memberships to trie_search for reuse
Al
2017-02-08 01:59:39 -05:00
ae35da8d17[fix] uninitialized var
Al
2017-02-08 01:58:53 -05:00
3a95af104b[openaddresses] remove add_osm_boundaries from City of Anaheim
Al
2017-02-07 14:37:15 -05:00
dbf7242ea0[fix] /cls/self/
Al
2017-02-04 19:12:49 -05:00
35effe4b0b[openaddresses] adding state of Thüringen, Germany
Al
2017-02-04 18:00:13 -05:00
af06270896[openaddresses] adding ignore regexes for US counties where we use the unit, using non_numeric_units in every case
Al
2017-02-04 15:48:00 -05:00
c600f05f06[openaddresses] adding Czech Republic to the street not required set
Al
2017-02-04 15:30:46 -05:00
0169448a4d[addresses] adding Central European city district regexes (e.g. Praha 1, Budapest IV, etc.) to country-specific cleanup
Al
2017-02-03 20:54:23 -05:00
1b6263a6e7[openaddresses] add postcode field to NY statewide
Al
2017-02-02 15:00:34 -05:00
990ce176aa[openaddresses] add language and modern city name to Dnipro, Ukraine
Al
2017-02-02 14:02:59 -05:00
c95d5db290[openaddresses] ignore US postcodes that are 1-4 digits, usually typos. Reformat where needed
Al
2017-02-02 14:02:03 -05:00
85f03184d5[openaddresses] moving postcode fixes before validation. Adding regex for validating Russian house numbers in the Ukraine
Al
2017-02-02 11:21:00 -05:00
f6e9c5f709[openaddresses] adding postcode length=5 (so leading zeros get captured if the field was an integer) for Germany, France, Italy, and Mexico. Adding validation to Volgograd oblast
Al
2017-02-02 11:16:38 -05:00
4fbd99d2c8[openaddresses] add Tacoma, WA
Al
2017-02-01 20:30:21 -05:00
d4d3407f2c[openaddresses] adding Vernon BC, Churchill County NV, and some of the new Georgia sources
Al
2017-01-31 14:47:11 -05:00
12146b6eeb[openaddresses] adding Nacka, Sweden
Al
2017-01-30 02:41:30 -05:00
0380f565d2[parser] shorter first word feature
Al
2017-01-29 22:10:28 -05:00
1fbdd964b3[openaddresses] add languages for China and Russia data sets so the validators kick in
Al
2017-01-28 02:15:00 -05:00
12bc18f74b[openaddresses] fix Chinese house number validation
Al
2017-01-28 02:03:19 -05:00
2b349ef8a8[fix] nevermind, needed to do the Spanish-language street names before validation (simple numeric names like "8" needs to be prefixed with "Calle" or they'll fail validation)
Al
2017-01-28 01:08:05 -05:00
dcacbece8f[openaddresses] adding city_district for Wuhan, China
Al
2017-01-28 01:03:11 -05:00
2953759321[openaddresses] formatting Chinese house number (with annex adding a second number potentially) and adding Spanish street names after the language is known by reverse geocoding
Al
2017-01-28 01:01:26 -05:00
c9417436f7[openaddresses] allowing a single character boundary name in ideographic languages
Al
2017-01-27 23:38:03 -05:00
c798f4a83b[places] always include suburb in Japan as it functions as the street
Al
2017-01-27 21:22:12 -05:00
72881ad315[fix] conditional + var name
Al
2017-01-27 19:20:41 -05:00
987609ee8e[fix] var name
Al
2017-01-27 18:46:58 -05:00
cd1875d077[fix] import
Al
2017-01-27 18:35:43 -05:00
01d6d47b08[osm] removing addr:place mapping to road as it's usually a village in post-Soviet states, etc. Can handle it down the road
Al
2017-01-27 13:54:05 -05:00
11345bf2bf[osm] using new constants in OSM formatting as well
Al
2017-01-27 13:53:00 -05:00
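
Commit 7a360f4211 above describes splitting an OSM addr:postcode value on commas or semicolons and keeping the first piece that matches the country's postcode validation regex. The following is a minimal sketch of that idea, not the project's actual code: the function name `first_valid_postcode` and the `POSTCODE_PATTERNS` table are hypothetical, and the two regexes are simplified stand-ins for real per-country validation rules.

```python
import re

# Hypothetical, simplified per-country patterns (assumptions for illustration only).
POSTCODE_PATTERNS = {
    "us": re.compile(r"\d{5}(?:-\d{4})?"),
    "de": re.compile(r"\d{5}"),
}


def first_valid_postcode(value, country):
    """Return the first comma/semicolon-separated piece of `value` that
    fully matches the country's pattern, or None if nothing matches."""
    pattern = POSTCODE_PATTERNS.get(country.lower())
    if pattern is None:
        # No validation rule known for this country in this sketch.
        return None
    for candidate in re.split(r"[,;]", value):
        candidate = candidate.strip()
        if candidate and pattern.fullmatch(candidate):
            return candidate
    return None


if __name__ == "__main__":
    print(first_valid_postcode("Brooklyn; 11216, 11217", "US"))
```

Returning None when no piece validates leaves the decision of whether to drop the field to the caller, which is one possible design choice among several.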