Commit Graph

  • d137b31be3 [fix] YAML config Al 2016-08-26 22:48:26 -04:00
  • 12d429b63d [openaddresses] Simple regex-based method to strip unit phrases tacked onto the end of a street Al 2016-08-26 22:39:13 -04:00
  • 318ad2a0c4 [openaddresses] Removing <Null> tag from values in OpenAddresses, seeing it in Colorado county files Al 2016-08-26 21:41:30 -04:00
  • cfd4537bb5 [openaddresses] adding Nebraska (37th state in the union) Al 2016-08-26 20:59:38 -04:00
  • 0f9e8ee95d [openaddresses] Better handling of float postcodes Al 2016-08-26 20:10:30 -04:00
  • 0618e506c4 [openaddresses] adding Nevada (36th state in the union) Al 2016-08-26 19:59:03 -04:00
  • e6f4fd656a [openaddresses] Adding West Virginia (35th state in the union) Al 2016-08-26 19:52:25 -04:00
  • 19eceb0074 [openaddresses] Adding Kansas (34th state in the union) Al 2016-08-26 19:48:30 -04:00
  • 219149988f [openaddresses] adding Oregon (33rd state in the union) Al 2016-08-26 19:25:59 -04:00
  • 56329439af [openaddresses] some postcodes in OpenAddresses are stored as floats, convert to int and then to string if that's the case Al 2016-08-26 19:12:48 -04:00
  • 155d46f46c [openaddresses] Adding Minnesota (32nd state in the union) Al 2016-08-26 19:04:34 -04:00
  • e771e9b2d5 [openaddresses] Adding Wisconsin (30th state in the union) Al 2016-08-26 18:37:48 -04:00
  • 0016cdbd7f [openaddresses] Adding Iowa (29th state in the union) Al 2016-08-26 18:13:50 -04:00
  • 0ec9593e6c [openaddresses] Adding Texas (28th state in the union, however reluctantly) Al 2016-08-26 17:42:24 -04:00
  • 79b4e0be90 [dictionaries] service road abbreviations Al 2016-08-26 16:46:02 -04:00
  • d696c792ae [openaddresses] Adding Florida (27th state in the union) Al 2016-08-26 16:35:04 -04:00
  • 2b9d58dcbe [openaddresses] Ignoring fields with null-like values as well (there appear to be no valid places named Null or None...yet) Al 2016-08-26 15:48:32 -04:00
  • 2654683af4 [openaddresses] Adding quick-and-dirty regex-based exclusion list for fields containing various patterns in OpenAddresses, to be used sparingly Al 2016-08-26 15:34:30 -04:00
  • 7bcddeff44 [openaddresses] Adding Michigan (26th state in the union) Al 2016-08-26 13:42:13 -04:00
  • 755a65aa14 [openaddresses] Adding Arkansas (25th state in the union) Al 2016-08-26 13:36:25 -04:00
  • d97bb9cd4c [openaddresses] Adding Missouri (24th state in the union) Al 2016-08-26 13:36:09 -04:00
  • d4e76eac0b [openaddresses] Adding Alabama (22nd state in the Union) Al 2016-08-26 13:12:39 -04:00
  • aa26277136 [openaddresses] Adding Illinois (21st state in the union) Al 2016-08-26 13:08:06 -04:00
  • 4e9f9e8957 [openaddresses] Replace multiple spaces with single space Al 2016-08-26 12:45:49 -04:00
  • 9e89147c83 [openaddresses] removing spaces in numeric ranges in OpenAddresses, sometimes see things like '12 -23' Al 2016-08-26 12:30:15 -04:00
  • a11abf2787 [openaddresses] Adding Mississippi (20th state in the union) Al 2016-08-26 10:58:16 -04:00
  • ebeb7f816a [openaddresses] Adding Indiana (19th state in the union) Al 2016-08-26 10:48:05 -04:00
  • 3b2c86d240 [fix] strip values in OpenAddresses components Al 2016-08-26 10:24:34 -04:00
  • 472580320d [dictionaries] English synonyms update Al 2016-08-26 10:19:51 -04:00
  • b2f8180d19 [openaddresses] Ignore any fields in OpenAddresses which have N/A as a value Al 2016-08-25 23:58:38 -04:00
  • 01afbf80ef [data] Each curl process will retry the chunk up to 3 times Al 2016-08-25 23:18:39 -04:00
  • c23a7a4030 [openaddresses] Ditto for numeric boundary names Al 2016-08-25 22:58:52 -04:00
  • 34b01e203d [openaddresses] Don't allow single-letter boundary names as they're probably just typos Al 2016-08-25 22:58:26 -04:00
  • 3a8dee523d [openaddresses] Adding Louisiana (18th state in the union) Al 2016-08-25 22:50:18 -04:00
  • 9aea4451ff [openaddresses] Adding Ohio (17th state in the union) Al 2016-08-25 20:29:26 -04:00
  • 0b19f27d8d [openaddresses] Adding Tennessee (16th state in the union) Al 2016-08-25 18:55:54 -04:00
  • 59a840ab37 [openaddresses] Adding Kentucky (15th state in the union) Al 2016-08-25 18:38:53 -04:00
  • dc6e483067 [openaddresses] Adding DC (not a state, but added after the original 13 colonies) Al 2016-08-25 18:10:02 -04:00
  • e251fc42fa [openaddresses] Adding North Carolina (12th state in the union) Al 2016-08-25 18:08:19 -04:00
  • 2009b4c992 [openaddresses] Adding Virginia (10th state in the union) Al 2016-08-25 16:37:39 -04:00
  • 859868aea2 [openaddresses] Adding option to strip non-digits from postcode; addresses with a postcode but no house_number+street may still be useful, so keeping them around as place queries to help with postcode contexts Al 2016-08-25 16:36:18 -04:00
  • da619e3cf4 [osm] Adding border_type=city to override tags Al 2016-08-25 15:21:33 -04:00
  • 93b377c8a7 [openaddresses] Fixes for California, have to remove Orange County because it's all being stuffed into the street field Al 2016-08-25 14:39:45 -04:00
  • dd0ca5e008 [addresses] Adding admin_center properties to place components in add_admin_boundaries (only overriding for specified areas where the boundary may otherwise not have all the properties) Al 2016-08-25 01:19:32 -04:00
  • b75419d6e8 [boundaries] Luxembourg quarters = city_district Al 2016-08-24 23:37:44 -04:00
  • 2e7f8f1ae7 [abbreviations] Adding toponyms gazetteer for probabilistically abbreviating things like Mount=>Mt, Saint=>St, Fort=>Ft in place names Al 2016-08-24 18:52:00 -04:00
  • dfa5c8e0a6 [abbreviations] Adding ability to abbreviate within hyphenated phrases e.g. Sint-Maarten => St.-Maarten Al 2016-08-24 17:32:28 -04:00
  • a6dad74a2b [openaddresses] cleaning comma-delimited boundary components in OpenAddresses data sets Al 2016-08-24 15:06:04 -04:00
  • 14bc224f25 [openaddresses] Adding OSM neighborhoods across the US wherever we have them. That index is relatively small and cheap enough to query for every point, whereas the general R-tree should be used only when necessary Al 2016-08-24 14:58:07 -04:00
  • 4552aa380c [openaddresses] Adding South Carolina Al 2016-08-24 14:47:07 -04:00
  • 84bb12657b [dictionaries] Adding a variety of abbreviations/misspellings for street, road, drive, and place Al 2016-08-24 14:19:40 -04:00
  • 709cecd300 [dictionaries] Adding some MLK synonyms after looking at the Georgia data Al 2016-08-24 14:18:44 -04:00
  • d250f58293 [openaddresses] Also skipping addresses where street == unit Al 2016-08-24 14:10:41 -04:00
  • f66fb4a172 [openaddresses] Adding Maryland Al 2016-08-24 13:54:07 -04:00
  • f9ec02c8e0 [openaddresses] Adding Georgia. There's a lot of weirdness in there so whitelisting files. Files that weren't added were omitted deliberately Al 2016-08-24 13:52:35 -04:00
  • 7c3ad708d8 [openaddresses] Ensuring integer house numbers are > 0, street is not simply a numeric token (usually a copy of the house number) and that street != house number generally Al 2016-08-24 13:46:56 -04:00
  • ad625a46a4 [openaddresses] Adding Delaware and Pennsylvania. Going with the "older states in the union will have funkier addresses" strategy. Al 2016-08-23 22:22:29 -04:00
  • f36ca6a788 [dictionaries] Adding Asturian language and dictionaries for the Asturias region of Spain. Realized some of the default street names/addresses in Oviedo, etc. are actually Asturian rather than Spanish Al 2016-08-23 21:52:22 -04:00
  • ff06462981 [dictionaries] Adding oberste etage, unterste etage and parkdeck to German dictionaries. Generating as part of the sub-building info for the address parser Al 2016-08-23 21:49:44 -04:00
  • de1255af00 [auto][ci skip] Adding data files from Travis build #161 Travis 2016-08-23 22:48:20 +00:00
  • f03df6aab8 Merge pull request #108 from petacat/patch-5 Al Barrentine 2016-08-23 18:38:08 -04:00
  • f19c9852aa [auto][ci skip] Adding data files from Travis build #160 Travis 2016-08-23 22:24:19 +00:00
  • d797d6c863 [auto][ci skip] Adding data files from Travis build #159 Travis 2016-08-23 22:14:07 +00:00
  • d1991848a3 Merge pull request #106 from petacat/patch-3 Al Barrentine 2016-08-23 18:09:47 -04:00
  • 964b440380 Merge pull request #104 from petacat/patch-1 Al Barrentine 2016-08-23 17:49:36 -04:00
  • a787c25cdf Update toponyms.txt Thomas Rosen 2016-08-23 23:09:32 +02:00
  • 7e258f2d87 Update place_names.txt Thomas Rosen 2016-08-23 23:03:31 +02:00
  • bd109dc9ca Update directionals.txt Thomas Rosen 2016-08-23 22:56:56 +02:00
  • e746cbab75 [openaddresses] Adding New England states (postcodes beginning with 0). Al 2016-08-23 02:51:20 -04:00
  • 9866614f63 [openaddresses] Using new config implementation, using neighborhoods/boroughs in NYC Al 2016-08-23 02:14:29 -04:00
  • b7c600e496 [openaddresses] adding numeric_postcodes_only and add_osm_neighborhoods options Al 2016-08-23 02:11:21 -04:00
  • ed0b49884e [openaddresses] Changes to OA config utilizing some of the new cleanup options. Adding language to brussels-fr and brussels-nl, adding New York and New Jersey statewide with the understanding that OSM components will be added in NJ and postcodes will be stripped of letters in NY Al 2016-08-23 00:38:43 -04:00
  • 8ec288d8f8 [openaddresses] Adding ability to specify language of a particular OpenAddresses CSV a priori. Unless otherwise specified, non-numeric unit fields will be discarded and phrases will be added randomly for numeric unit fields. Al 2016-08-23 00:29:05 -04:00
  • 99f71b718f [openaddresses] New command-line arguments to OpenAddresses training data script Al 2016-08-22 22:12:47 -04:00
  • 23be122d2e [openaddresses] Adding ability to use OSM boundaries for OpenAddresses (not turned on by default), cleaning up street names, requiring at least house number and street, validating house number to provide some assurance that it's not a badly-formatted NULL value, adding ability to strip letters from postcode for data sets like New York's statewide where there are some codes attached. Al 2016-08-22 22:07:34 -04:00
  • 8b57a7acf2 [osm] abbreviate toponyms (qualifiers) with some probability so we get those versions in the model's phrase dictionaries Al 2016-08-22 20:29:29 -04:00
  • d281e71d2c [fix] removing metro station index as a dependency for AddressComponents Al 2016-08-22 15:52:22 -04:00
  • 3fef3e56d5 [boundaries] converting Mexico City boroughs to city_district Al 2016-08-22 03:51:01 -04:00
  • 79c9694e2d [names] Allowing for similarity-only normalization in name affixes Al 2016-08-22 03:47:03 -04:00
  • 72b5f6b55a [dictionaries] German dictionary updates Al 2016-08-22 00:11:10 -04:00
  • 58851a9088 [normalization] Adding NORMALIZE_STRING_SIMPLE_LATIN_ASCII option so parser can normalize punctuation and HTML entities, etc. without touching the alphanumeric parts of the original input Al 2016-08-21 19:45:32 -04:00
  • 8b9702b43d [error handling] Checking that resize succeeded in transliterate.c Al 2016-08-21 19:43:09 -04:00
  • 2644fed18f [transliteration] Adding LATIN_ASCII_SIMPLE constant to transliterate.h Al 2016-08-21 19:42:10 -04:00
  • 4375bdea3b [transliteration] strduping transliterator name while building table Al 2016-08-21 19:41:34 -04:00
  • bde8776bc2 [transliteration] Regenerating transliteration data files Al 2016-08-21 19:41:11 -04:00
  • cb4408fea8 [transliteration] Adding language-specific transliterators for handling umlauts in German + special transliterations in the Nordic languages. It may still result in some wrong transliterations if the language classifier is wrong, but generally it's accurate enough that its predictions can be relied upon. Also adding a Latin-ASCII-Simple transform which only does the punctuation portion of Latin-ASCII so it won't change anything substantial about the input string. Al 2016-08-20 18:17:35 -04:00
  • 85ae5d4a05 [fix] name Al 2016-08-19 23:38:33 -04:00
  • 7951044d74 [intersections] Abbreviating street names that are not base names with random probabilities Al 2016-08-19 23:27:29 -04:00
  • 42808c62e3 [fix] dictionary access Al 2016-08-19 16:02:36 -04:00
  • 41f715d6ee [intersections] Better handling of default languages in intersection queries Al 2016-08-19 15:59:54 -04:00
  • a7118b40a7 [intersections] Allowing tags like name_1, etc. to make it into road name permutations for intersections Al 2016-08-19 13:12:02 -04:00
  • 0b2d3d965f [fix] using lat/lon from the node properties in intersections data Al 2016-08-19 12:23:08 -04:00
  • 294316c721 [intersections] no need to store lat/lon in intersections Al 2016-08-19 01:58:53 -04:00
  • 9a6ec41ce6 [points] Adding __iter__ and __len__ to point index Al 2016-08-19 01:01:05 -04:00
  • f43abe0846 [fix] making cleaned_name a classmethod Al 2016-08-18 19:55:52 -04:00
  • defc7ffacc [fix] arg name again Al 2016-08-18 18:22:06 -04:00
  • 4a28225df6 [fix] name Al 2016-08-18 18:20:55 -04:00
  • 86b921c629 [intersections] Adding the intersection's properties for intersections in case we want to do anything with named intersections in Japan/Korea Al 2016-08-18 17:14:23 -04:00
  • 87ee5f47f9 [fix] check for None in binary_search Al 2016-08-18 15:12:23 -04:00
  • 1675bba3f0 [intersections] highway=crossing also valid Al 2016-08-18 03:00:23 -04:00