Commit Graph

4690 Commits

Author SHA1 Message Date
Al 8abfa766fd [fix] paren 2017-02-15 02:26:18 -05:00
Al 06003dfbb0 [fix] lower probability of name:prefix 2017-02-14 18:57:31 -05:00
Al 92b34f6af4 [fix] var name 2017-02-14 18:53:53 -05:00
Al ca79342636 [fix] config 2017-02-14 18:50:51 -05:00
Al 8eafc5730b [parser] adding long-context features which help classify the first token in the string by finding the relative positions of a) the first numeric token and b) the first street-level phrase like "Ave" or "Calle" 2017-02-14 18:42:51 -05:00
Al 08976c772e [neighborhoods] base parser config changes for new prefix/first_match options 2017-02-14 18:19:15 -05:00
Al 64673c2875 [neighborhoods] add neighborhoods that are not the top match occasionally 2017-02-14 18:17:48 -05:00
Al b99e31ca17 [neighborhoods] add name:prefix in admin boundaries and neighborhoods (used often in e.g. Germany), use alternative/language keys as well 2017-02-14 18:07:13 -05:00
Al 614479aee1 [neighborhoods] don't add point with same name as existing OSM polygon 2017-02-14 16:40:08 -05:00
Al 854ac853a9 [fix] OSM neighborhoods index check 2017-02-14 03:53:22 -05:00
Al a416a314fa [fix] var 2017-02-14 03:38:53 -05:00
Al 56f68e4399 [phrases] fixing trie suffix search 2017-02-14 03:36:29 -05:00
Al 5bbc0e15d7 [fix] poly.context 2017-02-14 03:09:02 -05:00
Al 1ee0f1fe0d [osm] mapping admin_level=11 to suburb in Germany, admin_level=10 to suburb in Berlin 2017-02-14 02:34:47 -05:00
Al b9c24867d7 [fix] set component to suburb on OSM neighborhoods index 2017-02-14 02:19:53 -05:00
Al 072a838fde [neighborhoods] components are now pre-calculated by CTH index 2017-02-14 02:04:07 -05:00
Al c91a0bdb91 [fix] rm 2017-02-14 01:56:24 -05:00
Al 949c10ab22 [fix] remove print 2017-02-14 01:53:54 -05:00
Al 738bd7b525 [neighborhoods] logging, moving OSM/CTH before Quattroshapes for easier testing 2017-02-14 01:52:59 -05:00
Al 67f69ce6ce [fix] move 2017-02-14 01:51:23 -05:00
Al 6d580f4c87 [osm] neighborhood polygon reader 2017-02-14 01:50:04 -05:00
Al 6c68d446a0 [neighborhoods] adding ClickThatHood config to whitelist/specify what kind of polygon is specified in each file. Adding OSM neighborhoods (ways/relations where place=neighbourhood to reduce ambiguity) as the highest priority, followed by CTH/OSM, CTH, Quattro/OSM, Quattro 2017-02-14 01:48:43 -05:00
Al 2003e08623 [osm] creating an OSM neighborhood boundaries data set for place=neighbourhood polygons only (place=suburb, etc. can be ambiguous) 2017-02-13 20:45:54 -05:00
Al eff9280224 [boundaries] Amsterdam and Rotterdam listed as admin_level=10 in OSM, making exceptions 2017-02-13 16:06:18 -05:00
Al 2f4bcaeec2 [parser] address_parser_test memory cleanup, add print-errors option to print individual parser errors on held-out data 2017-02-12 16:05:11 -05:00
Al b1e178b7b2 [fix] is_numeric_token includes IDEOGRAPHIC_NUMBER 2017-02-12 15:11:56 -05:00
Al b2978f49ba [openaddresses] adding Newaygo County, MI and Scioto County, OH 2017-02-12 00:37:15 -05:00
Al e569956944 [osm] remove postcode field if more than one is found 2017-02-11 03:52:46 -05:00
Al 9af4b1bd42 [openaddresses] fixing street requirement 2017-02-11 03:29:09 -05:00
Al 2dff6c8839 [fix] call 2017-02-11 02:16:55 -05:00
Al 081f023d60 [fix] name 2017-02-11 02:10:59 -05:00
Al 6705ebaffd [fix] import 2017-02-11 02:09:31 -05:00
Al 7bfc52b540 [osm] add postcode phrases when there's no validation/component-stripping 2017-02-11 01:54:55 -05:00
Al c9ade4a7da [openaddresses] adding postal code country phrases in OpenAddresses as well 2017-02-11 01:48:54 -05:00
Al f6bea5ebe5 [fix] always validate in comma-separated postcodes 2017-02-11 01:42:19 -05:00
Al 3d9b512cda [fix] pop 2017-02-11 01:28:56 -05:00
Al e5a98d16d8 [fix] args 2017-02-11 01:15:07 -05:00
Al 01c4c8ec82 [fix] scope 2017-02-11 01:07:20 -05:00
Al 29ba58d68a [fix] var 2017-02-11 01:01:29 -05:00
Al f0dfd7850c [fix] ignore punctuation in strip_components 2017-02-11 01:00:37 -05:00
Al f07d93df2c [fix] omitted line 2017-02-11 00:57:47 -05:00
Al ffc12ec5ab [osm] add new method in OSM formatting to extract one or more expanded postal codes from an addr:postcode tag, using the new country-specific rules 2017-02-11 00:53:52 -05:00
Al bbcb6444c8 [addresses] add strip_components method which simply removes the names of OSM components from a string (for e.g. postal codes) 2017-02-11 00:07:55 -05:00
Al 4e1d7d9373 [osm] use new postal codes module in OSM formatting 2017-02-10 23:56:23 -05:00
Al 9022fb9149 [places] use country.lower() 2017-02-10 23:54:43 -05:00
Al a0d674274a [neighborhoods] immutable data structures when loading from JSON 2017-02-10 23:54:24 -05:00
Al 293587bae9 [addresses] adding new config for postal codes around the world. Allows appending the ISO alpha-2 country code to the beginning of the postcode as in e.g. SI-1000 (only used if the postcode begins with a digit). This system was used for postal codes in continental Europe as a recommendation from the CEPT. Now 7 member states still use it, so in those countries add the country-code with higher probability. The config also contains the license plate codes for countries where e.g. L-1234 might be used instead of LU-1234. Allows configuring in which countries postcodes should be validated using Google's per-country validation regexes (and the ability to override with a custom regex), and in which countries other admin component names should be stripped. 2017-02-10 23:53:50 -05:00
Al 109aa76718 [boundaries] mapping Manhattan node ID to city_district so it gets labeled as such in the neighborhoods index 2017-02-10 14:43:53 -05:00
Al b570855b78 [parser] adding postcode context features and associated data structures to the parser. Masking digits, which should hopefully help with generalization. Creating positive/negative features for postcode with and without context support. Note: even with known postcodes in known contexts, only use the masked digits to avoid creating too many features that are redundant with the index. 2017-02-10 03:41:14 -05:00
Al 9a93e95938 [api] removing geodb from setup functions 2017-02-10 01:02:52 -05:00
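
Note on commits 293587bae9 and b570855b78: the two postcode ideas described in those messages (prefixing a country code only when the postcode begins with a digit, and masking digits before building features) can be illustrated with a minimal Python sketch. The function names `add_country_prefix` and `mask_digits`, the `"D"` placeholder, and the hyphen separator are assumptions made for illustration; they are not taken from the repository, and the actual implementation may differ.

```python
def add_country_prefix(postcode, country_code):
    """Prepend a country code (e.g. "SI") to a postcode, as in "SI-1000".

    Hypothetical helper: the prefix is only added when the postcode
    begins with a digit, mirroring the rule stated in the commit message.
    """
    if postcode and postcode[0].isdigit():
        return "{}-{}".format(country_code.upper(), postcode)
    return postcode


def mask_digits(token, placeholder="D"):
    """Replace every digit character with a placeholder character.

    Hypothetical helper: the placeholder "D" is an assumption. Masking
    collapses all postcodes of the same shape into a single feature value.
    """
    return "".join(placeholder if ch.isdigit() else ch for ch in token)


if __name__ == "__main__":
    print(add_country_prefix("1000", "si"))
    print(mask_digits("SI-1000"))
```

Masking in this way means the feature records only the shape of a postcode, not its exact digits, which matches the stated goal of avoiding features that are redundant with the index.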