Commit Graph

  • 0196fe8736 [utils] fixing key_type in hash_get, adding int64_double map Al 2017-02-15 22:20:36 -05:00
  • be6f48f109 [fix] that didn't work, set log level to CRITICAL Al 2017-02-15 14:06:57 -05:00
  • 26bf617a06 [fix] prevent Shapely from logging to console Al 2017-02-15 14:00:51 -05:00
  • a0b508caf6 [transliteration] adding no-args option for transliteration_rules script Al 2017-02-15 13:22:33 -05:00
  • 8abfa766fd [fix] paren Al 2017-02-15 02:26:18 -05:00
  • 06003dfbb0 [fix] lower probability of name:prefix Al 2017-02-14 18:57:31 -05:00
  • 92b34f6af4 [fix] var name Al 2017-02-14 18:53:53 -05:00
  • ca79342636 [fix] config Al 2017-02-14 18:50:51 -05:00
  • 8eafc5730b [parser] adding long-context features which help classify the first token in the string by finding the relative positions of a) the first numeric token and b) the first street-level phrase like "Ave" or "Calle" Al 2017-02-14 18:42:51 -05:00
  • 08976c772e [neighborhoods] base parser config changes for new prefix/first_match options Al 2017-02-14 18:19:15 -05:00
  • 64673c2875 [neighborhoods] add neighborhoods that are not the top match occasionally Al 2017-02-14 18:17:40 -05:00
  • b99e31ca17 [neighborhoods] add name:prefix in admin boundaries and neighborhoods (used often in e.g. Germany), use alternative/language keys as well Al 2017-02-14 18:07:13 -05:00
  • 614479aee1 [neighborhoods] don't add point with same name as existing OSM polygon Al 2017-02-14 16:40:08 -05:00
  • 854ac853a9 [fix] OSM neighborhoods index check Al 2017-02-14 03:53:22 -05:00
  • a416a314fa [fix] var Al 2017-02-14 03:38:53 -05:00
  • 56f68e4399 [phrases] fixing trie suffix search Al 2017-02-14 03:36:29 -05:00
  • 5bbc0e15d7 [fix] poly.context Al 2017-02-14 03:09:02 -05:00
  • 1ee0f1fe0d [osm] mapping admin_level=11 to suburb in Germany, admin_level=10 to suburb in Berlin Al 2017-02-14 02:34:47 -05:00
  • b9c24867d7 [fix] set component to suburb on OSM neighborhoods index Al 2017-02-14 02:19:53 -05:00
  • 072a838fde [neighborhoods] components are now pre-calculated by CTH index Al 2017-02-14 02:04:07 -05:00
  • c91a0bdb91 [fix] rm Al 2017-02-14 01:56:24 -05:00
  • 949c10ab22 [fix] remove print Al 2017-02-14 01:53:54 -05:00
  • 738bd7b525 [neighborhoods] logging, moving OSM/CTH before Quattroshapes for easier testing Al 2017-02-14 01:52:59 -05:00
  • 67f69ce6ce [fix] move Al 2017-02-14 01:51:23 -05:00
  • 6d580f4c87 [osm] neighborhood polygon reader Al 2017-02-14 01:50:04 -05:00
  • 6c68d446a0 [neighborhoods] adding ClickThatHood config to whitelist/specify what kind of polygon is specified in each file. Adding OSM neighborhoods (ways/relations where place=neighbourhood to reduce ambiguity) as the highest priority, followed by CTH/OSM, CTH, Quattro/OSM, Quattro Al 2017-02-14 01:46:23 -05:00
  • 2003e08623 [osm] creating an OSM neighborhood boundaries data set for place=neighbourhood polygons only (place=suburb, etc. can be ambiguous) Al 2017-02-13 17:41:33 -05:00
  • eff9280224 [boundaries] Amsterdam and Rotterdam listed as admin_level=10 in OSM, making exceptions Al 2017-02-13 16:06:18 -05:00
  • 2f4bcaeec2 [parser] address_parser_test memory cleanup, add print-errors option to print individual parser errors on held-out data Al 2017-02-12 15:58:50 -05:00
  • b1e178b7b2 [fix] is_numeric_token includes IDEOGRAPHIC_NUMBER Al 2017-02-12 15:11:56 -05:00
  • b2978f49ba [openaddresses] adding Newaygo County, MI and Scioto County, OH Al 2017-02-12 00:37:15 -05:00
  • e569956944 [osm] remove postcode field if more than one is found Al 2017-02-11 03:52:46 -05:00
  • 9af4b1bd42 [openaddresses] fixing street requirement Al 2017-02-11 03:29:09 -05:00
  • 2dff6c8839 [fix] call Al 2017-02-11 02:16:55 -05:00
  • 081f023d60 [fix] name Al 2017-02-11 02:10:59 -05:00
  • 6705ebaffd [fix] import Al 2017-02-11 02:09:31 -05:00
  • 7bfc52b540 [osm] add postcode phrases when there's no validation/component-stripping Al 2017-02-11 01:54:55 -05:00
  • c9ade4a7da [openaddresses] adding postal code country phrases in OpenAddresses as well Al 2017-02-11 01:48:54 -05:00
  • f6bea5ebe5 [fix] always validate in comma-separated postcodes Al 2017-02-11 01:42:19 -05:00
  • 3d9b512cda [fix] pop Al 2017-02-11 01:28:56 -05:00
  • e5a98d16d8 [fix] args Al 2017-02-11 01:15:07 -05:00
  • 01c4c8ec82 [fix] scope Al 2017-02-11 01:07:20 -05:00
  • 29ba58d68a [fix] var Al 2017-02-11 01:01:29 -05:00
  • f0dfd7850c [fix] ignore punctuation in strip_components Al 2017-02-11 01:00:37 -05:00
  • f07d93df2c [fix] omitted line Al 2017-02-11 00:57:47 -05:00
  • ffc12ec5ab [osm] add new method in OSM formatting to extract one or more expanded postal codes from an addr:postcode tag, using the new country-specific rules Al 2017-02-11 00:53:52 -05:00
  • bbcb6444c8 [addresses] add strip_components method which simply removes the names of OSM components from a string (for e.g. postal codes) Al 2017-02-10 23:57:20 -05:00
  • 4e1d7d9373 [osm] use new postal codes module in OSM formatting Al 2017-02-10 23:56:23 -05:00
  • 9022fb9149 [places] use country.lower() Al 2017-02-10 23:54:43 -05:00
  • a0d674274a [neighborhoods] immutable data structures when loading from JSON Al 2017-02-10 23:54:24 -05:00
  • 293587bae9 [addresses] adding new config for postal codes around the world. Allows appending the ISO alpha-2 country code to the beginning of the postcode as in e.g. SI-1000 (only used if the postcode begins with a digit). This system was used for postal codes in continental Europe as a recommendation from the CEPT. Now 7 member states still use it, so in those countries add the country-code with higher probability. The config also contains the license plate codes for countries where e.g. L-1234 might be used instead of LU-1234. Allows configuring in which countries postcodes should be validated using Google's per-country validation regexes (and the ability to override with a custom regex), and in which countries other admin component names should be stripped. Al 2017-02-10 18:38:32 -05:00
  • 109aa76718 [boundaries] mapping Manhattan node ID to city_district so it gets labeled as such in the neighborhoods index Al 2017-02-10 14:43:46 -05:00
  • c3d50386c1 Merge pull request #161 from pasupulaphani/master Al Barrentine 2017-02-10 11:15:37 -05:00
  • b570855b78 [parser] adding postcode context features and associated data structures to the parser. Masking digits, which should hopefully help with generalization. Creating positive/negative features for postcode with and without context support. Note: even with known postcodes in known contexts, only use the masked digits to avoid creating too many features that are redundant with the index. Al 2017-02-10 03:40:43 -05:00
  • 9a93e95938 [api] removing geodb from setup functions Al 2017-02-10 01:02:52 -05:00
  • ff245d74f8 [parser] building an index of postal codes and their valid admin contexts (city, state, country, etc.) during training e.g. "11216" => ["brooklyn", "ny"]. Postal code phrases like CP in Spanish are removed when constructing the index. Al 2017-02-10 00:50:48 -05:00
  • 40d1b26e12 [openaddresses] Henderson County, KY Al 2017-02-09 16:23:43 -05:00
  • d18c68918d [openaddresses] Kern County, CA Al 2017-02-09 15:23:16 -05:00
  • 598f15cad8 [openaddresses] city of Vilnius, Lithuania Al 2017-02-09 15:19:34 -05:00
  • fa3405fe4d [openaddresses] Scott County, KY Al 2017-02-09 15:17:20 -05:00
  • f00625029b [openaddresses] Ajax, ON Al 2017-02-09 15:16:39 -05:00
  • ce5826928b [openaddresses] add Saskatoon, SK Al 2017-02-09 15:11:53 -05:00
  • 1aacb5bccc Merge branch 'master' into parser-data Al 2017-02-09 15:09:28 -05:00
  • ea168279bd [fix] free json-encoded string in parser client output Al 2017-02-09 14:34:15 -05:00
  • 38c6c26146 [fix] freeing normalized string in address_parser_parse Al 2017-02-09 14:33:13 -05:00
  • 621f37c836 Update ReadME to add ZeroMQ bindings docker Phaninder Pasupula 2017-02-09 14:24:56 +00:00
  • 8aa3749cfb [utils] some convenience functions for generic hashtables (incr, get, etc) Al 2017-02-08 19:01:13 -05:00
  • a6844c8ec1 [parser] structural changes for postal codes index Al 2017-02-08 18:52:45 -05:00
  • 7a360f4211 [osm] addr:postcode can be all over the place in OSM. Start with postcodes containing commas or semicolons. If addr:postcode (on address of building) contains either, iterate over the values and pick the first one that matches a postcode validation regex for that country Al 2017-02-08 16:13:25 -05:00
  • 97ccbef807 [openaddresses] adding Lincoln County, WY Al 2017-02-08 15:54:11 -05:00
  • 30fba16141 [openaddresses] adding Wuppertal, Germany, Marietta GA, Salem OR, and Atlanta Al 2017-02-08 15:20:46 -05:00
  • 6e4f641743 [phrases] adding token_phrase_memberships to trie_search for reuse Al 2017-02-08 01:59:39 -05:00
  • ae35da8d17 [fix] uninitialized var Al 2017-02-08 01:58:53 -05:00
  • 3a95af104b [openaddresses] remove add_osm_boundaries from City of Anaheim Al 2017-02-07 14:37:15 -05:00
  • dbf7242ea0 [fix] /cls/self/ Al 2017-02-04 19:12:49 -05:00
  • 35effe4b0b [openaddresses] adding state of Thüringen, Germany Al 2017-02-04 18:00:13 -05:00
  • af06270896 [openaddresses] adding ignore regexes for US counties where we use the unit, using non_numeric_units in every case Al 2017-02-04 15:48:00 -05:00
  • c600f05f06 [openaddresses] adding Czech Republic to the street not required set Al 2017-02-04 15:30:46 -05:00
  • 0169448a4d [addresses] adding Central European city district regexes (e.g. Praha 1, Budapest IV, etc.) to country-specific cleanup Al 2017-02-03 20:54:23 -05:00
  • 1b6263a6e7 [openaddresses] add postcode field to NY statewide Al 2017-02-02 15:00:34 -05:00
  • 990ce176aa [openaddresses] add language and modern city name to Dnipro, Ukraine Al 2017-02-02 14:02:59 -05:00
  • c95d5db290 [openaddresses] ignore US postcodes that are 1-4 digits, usually typos. Reformat where needed Al 2017-02-02 14:02:03 -05:00
  • 85f03184d5 [openaddresses] moving postcode fixes before validation. Adding regex for validating Russian house numbers in the Ukraine Al 2017-02-02 11:21:00 -05:00
  • f6e9c5f709 [openaddresses] adding postcode length=5 (so leading zeros get captured if the field was an integer) for Germany, France, Italy, and Mexico. Adding validation to Volgograd oblast Al 2017-02-02 11:16:38 -05:00
  • 4fbd99d2c8 [openaddresses] add Tacoma, WA Al 2017-02-01 20:30:21 -05:00
  • d4d3407f2c [openaddresses] adding Vernon BC, Churchill County NV, and some of the new Georgia sources Al 2017-01-31 14:47:11 -05:00
  • 12146b6eeb [openaddresses] adding Nacka, Sweden Al 2017-01-30 02:41:30 -05:00
  • 0380f565d2 [parser] shorter first word feature Al 2017-01-29 22:10:28 -05:00
  • 1fbdd964b3 [openaddresses] add languages for China and Russia data sets so the validators kick in Al 2017-01-28 02:15:00 -05:00
  • 12bc18f74b [openaddresses] fix Chinese house number validation Al 2017-01-28 02:03:19 -05:00
  • 2b349ef8a8 [fix] nevermind, needed to do the Spanish-language street names before validation (simple numeric names like "8" needs to be prefixed with "Calle" or they'll fail validation) Al 2017-01-28 01:08:05 -05:00
  • dcacbece8f [openaddresses] adding city_district for Wuhan, China Al 2017-01-28 01:03:11 -05:00
  • 2953759321 [openaddresses] formatting Chinese house number (with annex adding a second number potentially) and adding Spanish street names after the language is known by reverse geocoding Al 2017-01-28 01:01:26 -05:00
  • c9417436f7 [openaddresses] allowing a single character boundary name in ideographic languages Al 2017-01-27 23:38:03 -05:00
  • c798f4a83b [places] always include suburb in Japan as it functions as the street Al 2017-01-27 21:22:12 -05:00
  • 72881ad315 [fix] conditional + var name Al 2017-01-27 19:20:41 -05:00
  • 987609ee8e [fix] var name Al 2017-01-27 18:46:58 -05:00
  • cd1875d077 [fix] import Al 2017-01-27 18:35:43 -05:00
  • 01d6d47b08 [osm] removing addr:place mapping to road as it's usually a village in post-Soviet states, etc. Can handle it down the road Al 2017-01-27 13:54:05 -05:00
  • 11345bf2bf [osm] using new constants in OSM formatting as well Al 2017-01-27 13:53:00 -05:00
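
Commit 7a360f4211 above describes handling addr:postcode values that contain commas or semicolons by iterating over the pieces and keeping the first one that passes the country's postcode validation regex. A minimal sketch of that idea follows. It is not the project's code: the function name first_valid_postcode and the POSTCODE_PATTERNS table are hypothetical, and the two patterns are simplified stand-ins for the per-country validation regexes the commit messages mention.

```python
import re

# Hypothetical, simplified validation patterns keyed by lowercase country code.
# These are illustrative stand-ins, not the regexes used by the project.
POSTCODE_PATTERNS = {
    "us": re.compile(r"\d{5}(?:-\d{4})?"),
    "de": re.compile(r"\d{5}"),
}

# addr:postcode values are split on commas and semicolons.
SEPARATORS = re.compile(r"[,;]")


def first_valid_postcode(value, country):
    """Return the first comma/semicolon-separated piece of `value` that
    fully matches the country's pattern, or None if nothing matches."""
    pattern = POSTCODE_PATTERNS.get(country.lower())
    if pattern is None:
        # No validation rule known for this country in the sketch.
        return None
    for candidate in SEPARATORS.split(value):
        candidate = candidate.strip()
        if candidate and pattern.fullmatch(candidate):
            return candidate
    return None


if __name__ == "__main__":
    print(first_valid_postcode("Brooklyn; 11216", "US"))
    print(first_valid_postcode("1234, 10115", "de"))
```

A value without any separator is treated as a single candidate, so the same function also covers the ordinary one-postcode case. Returning None when no rule or no match exists leaves it to the caller to decide whether to drop the field, as commit e569956944 does when more than one postcode is found.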