Commit Graph

  • 7edb983566 [openaddresses] adding D.C. with periods as the state for the DC data set Al 2016-12-09 19:58:57 -05:00
  • c7b1818695 [fix] imports Al 2016-12-09 19:53:17 -05:00
  • 973466bb13 [states] adding multiple state abbreviations for states that can have periods in the name like D.C., D.F. in Mexico and Brasil, etc. Al 2016-12-09 19:48:59 -05:00
  • d575caba8a [data] using UTC for libpostal data files on the Mac version of the download script as well Al 2016-12-09 19:43:05 -05:00
  • c3f3896b48 [fix] update test for date function in data download script Al 2016-12-09 19:29:00 -05:00
  • 675552d254 [addresses] using normalized tokens when stripping off compound place names for things like D.C. Al 2016-12-09 17:50:08 -05:00
  • c0a468d7e8 [normalization] adding a normalize_token function and some token options for deleting periods Al 2016-12-09 17:46:26 -05:00
  • 318773ffe7 [parser] header changes for the data set struct Al 2016-12-09 13:37:45 -05:00
  • 69ca4a85ce [openaddresses] adding units to Olympia training data Al 2016-12-09 03:45:15 -05:00
  • 8f30987bdf [fix] checking if building is a rail station Al 2016-12-09 02:57:47 -05:00
  • e92963de50 [openaddresses] adding new counties from OpenAddresses, strip commas option for thousands separators Al 2016-12-09 01:57:21 -05:00
  • b60b7c9009 [geoplanet] adding an index of state_districts, states, etc. that contain a city with an identical name. Alias to the city if it's the only contained place, otherwise don't allow the admin name without the city. Al 2016-12-08 04:01:02 -05:00
  • 640f70c05d [geoplanet] all_places table, specified dirs Al 2016-12-08 02:50:08 -05:00
  • f9945103ba [addresses] if suburb/city_district is already listed, and we're finding the closest city by point rather than by boundary, use the closest actual city, not something smaller like a village/hamlet Al 2016-12-08 02:39:27 -05:00
  • 28d9ef12c0 [geoplanet] fixing geoplanet aliases insert warning Al 2016-12-08 02:31:10 -05:00
  • 763c86dcd4 [geoplanet] add County to the names of US counties outside of Louisiana and Alaska, add Parish in Louisiana Al 2016-12-08 02:30:37 -05:00
  • 7d0c402a31 [openaddresses] adding Douglas County and Paulding County in GA. Jackson County and Rankin County in MS Al 2016-12-08 01:53:25 -05:00
  • c2c2822936 [openaddresses] adding today's changes from OpenAddresses Al 2016-12-07 17:00:07 -05:00
  • 55c2f18896 [dictionaries] adding US highway and US route expansions Al 2016-12-07 14:39:27 -05:00
  • 42861aa38c [names] adding New Zealand to places that normalize City as a suffix (not Australia though as it has some cities that actually do end in City) Al 2016-12-07 06:19:08 -05:00
  • 7436d9693a [names] adding new name_affixes call to replace both prefixes/suffixes in one call, using in GeoPlanet training and the generic AddressComponents normalizations Al 2016-12-07 05:49:16 -05:00
  • 9386a999f6 [names] adding country-specific affixes and only normalizing the word City as a suffix in UK/Ireland Al 2016-12-07 05:37:25 -05:00
  • a9209fae37 [openaddresses] adding Kenton County, KY Al 2016-12-06 23:04:21 -05:00
  • b69914ff18 [openaddresses] adding Kansas City, MO Al 2016-12-06 22:56:31 -05:00
  • 3ff472c8cf [openaddresses] fixing house numbers with multiple consecutive hyphens Al 2016-12-06 22:50:14 -05:00
  • ae527ef5b1 [fix] indentation Al 2016-12-06 19:03:13 -05:00
  • 78615bf29c [places] higher probability of state_district for non-city Ireland Al 2016-12-06 18:15:35 -05:00
  • fddf21d1c1 [boundaries] moving Ireland counties back to state_district, regions to state (as they're typically used as admin1 in ISO, etc.) Al 2016-12-06 17:05:22 -05:00
  • aae8a8acf0 [boundaries] adding a few more common prefixes (looks like in Ireland it's common enough to remove the County prefix) Al 2016-12-06 17:04:09 -05:00
  • fadf0ca66b [openaddresses] filename for Ward County, ND Al 2016-12-06 15:55:33 -05:00
  • 29590be406 [openaddresses] adding Kalmar, Sweden and Fribourg, Switzerland Al 2016-12-06 15:51:10 -05:00
  • e13787a6f6 [fix] var name again Al 2016-12-05 18:49:23 -05:00
  • e1c6eff5e2 [fix] var Al 2016-12-05 18:46:49 -05:00
  • da36b71829 [addresses] adding new places index in OSM and OpenAddresses training data Al 2016-12-05 18:34:09 -05:00
  • 628fecea59 [addresses] adding point-based city/equivalent reverse geocoding for places that don't have as many defined polygons in OSM Al 2016-12-05 18:30:27 -05:00
  • 8509fe3ac0 [dictionaries] English dictionary fix Al 2016-12-05 18:24:27 -05:00
  • f87f0df717 [places] adding generic place index for reverse geocoding to points Al 2016-12-05 02:01:46 -05:00
  • e32c232c67 [localities] /planet-neighborhoods/planet-localities/ Al 2016-12-04 23:05:11 -05:00
  • cca80b046c [abbreviation] fixing abbreviations within hyphenated phrases, particularly for prefix/suffix matches Al 2016-12-03 17:55:11 -05:00
  • 22c4e99ea0 [parser] As part of reading/tokenizing the address parser data set, several copies of the same training example will be generated. Al 2016-12-02 13:09:03 -05:00
  • adab232674 [osm] don't include rail stations with no venue phrases (if there's a railway station at Foo, only include it if it's named "Foo Station", not just plain "Foo") Al 2016-12-01 02:03:38 -05:00
  • 7bfee0c1d3 [openaddresses] adding some of the new/fixed counties from upstream OA Al 2016-12-01 01:42:16 -05:00
  • 87d2463c9f [openaddresses] adding Texas statewide Al 2016-11-30 21:42:04 -08:00
  • 4b35da629f [numex] regenerated numex data file Al 2016-11-30 15:58:55 -08:00
  • 4677874610 [parser] stripping postal codes of phrases like CP (in Spanish) before adding them to the gazetteers, whether it's concatenated or a separate token. Adding a command-line argument for the number of iterations Al 2016-11-30 15:58:03 -08:00
  • 0e29cdd9fd [parser] fixing some uninitialized value issues during parser training Al 2016-11-30 15:42:09 -08:00
  • f5a6bd0f36 [fix] sparse_matrix_new_from_matrix uses new matrix types Al 2016-11-30 10:15:12 -08:00
  • b639fa5127 [utils] string_replace also creates a copy Al 2016-11-30 10:09:33 -08:00
  • 5a7e73e2a1 [openaddresses] adding Plumas County, CA and York County, PA Al 2016-11-29 11:37:28 -08:00
  • 1b18fd0b36 [openaddresses] adding Tehama County, CA Al 2016-11-28 16:25:23 -08:00
  • 89f6611c4e [strings] string_trim makes a copy rather than modifying the pointer Al 2016-11-28 15:06:07 -08:00
  • d922d9a60a [expansion] regenerated address_expansion_data.c Al 2016-11-28 10:47:15 -08:00
  • f67ebe8711 [openaddresses] Tulare County, CA Al 2016-11-28 10:02:10 -08:00
  • f78281456a [fix] header definition Al 2016-11-27 01:00:25 -08:00
  • eea11beb6a [expansion] using easier-to-access data structure for address dictionaries Al 2016-11-27 00:56:20 -08:00
  • 7de2aa21cd [boundaries] increasing state probability for Venezuela and India Al 2016-11-26 16:20:08 -08:00
  • df803ab53d [openaddresses] Ward County, ND Al 2016-11-26 12:09:17 -08:00
  • ef243fbb18 [fix] var name Al 2016-11-25 13:41:07 -08:00
  • cdbc102821 [boundaries] in addition to population, check if a city has an unambiguous Wikipedia Al 2016-11-25 13:29:59 -08:00
  • 78c1a40708 [boundaries] more UK exceptions (strategy here is not to confuse the parser with districts that share names with cities. Only affects addresses with no city already specified and would be similar to listing a postal city) Al 2016-11-25 02:52:38 -08:00
  • 08b420cd1f [boundaries] a few more UK exceptions Al 2016-11-25 02:19:40 -08:00
  • 06fcb73a08 [boundaries] two more exceptions in Wales and Scotland Al 2016-11-25 02:16:07 -08:00
  • f8fc59e384 [boundaries] a few more UK exceptions for non-metropolitan districts which can basically be regarded as cities Al 2016-11-25 01:44:14 -08:00
  • eda4358a01 [fix] exception for Eastleigh, UK Al 2016-11-25 01:28:05 -08:00
  • f171d849ca [boundaries] exception for Chichester, UK Al 2016-11-25 01:17:10 -08:00
  • 87634a36e1 [openaddresses] for cases where city populations are not known (i.e. not getting boundaries from OSM, most of the sources in OpenAddresses), place-only records should have at least two identifying components. Helps when city names, etc. are highly ambiguous and need to be qualified Al 2016-11-25 00:56:38 -08:00
  • 5118bbc6b9 [places] lower dropout probabilities for country field Al 2016-11-25 00:44:52 -08:00
  • 5c3ccc3bc6 [places] better handling of population exceptions in places config Al 2016-11-25 00:37:57 -08:00
  • 4e10dc47e1 [places] adding a few more exceptions to places config, making state/country required for smaller cities Al 2016-11-24 23:41:20 -08:00
  • 5a8ea5c3b9 [openaddresses] fix for Arapahoe County, CO. Had CO listed as the city Al 2016-11-24 04:18:46 -05:00
  • 89cacbdb0e [openaddresses] El Paso County, CO Al 2016-11-24 04:03:21 -05:00
  • e07c74f077 [fix] config Al 2016-11-24 03:57:52 -05:00
  • f72b576f39 [fix] indentation Al 2016-11-24 03:55:37 -05:00
  • 46b7043dc7 [fix] typo Al 2016-11-24 03:50:11 -05:00
  • da882a4195 [names] adding "District Municipality of" to ignorable prefixes Al 2016-11-24 03:48:54 -05:00
  • 1e1f00670b [openaddresses] fixing some cities that I thought were counties Al 2016-11-24 03:45:52 -05:00
  • fcf4717335 [openaddresses] adding city_replacements handling to OA formatter Al 2016-11-23 20:15:47 -05:00
  • 1ccca9086f [openaddresses] add city_replacements for all files using OSM boundaries (replace with known county or city) Al 2016-11-23 16:14:25 -05:00
  • 3dc2a922fb [addresses/languages] if there's only one default language and we don't have a road name or a unicode script to disambiguate, assume the default (e.g. English in the US unless there's a Spanish/French road name). Can affect things like state abbreviations Al 2016-11-22 18:27:23 -05:00
  • 3c5e2afeed [boundaries] exception for Cardiff, Wales Al 2016-11-22 13:16:21 -05:00
  • 49054932ad [boundaries] adding city replacements for South Africa Al 2016-11-22 12:05:14 -05:00
  • ee6edbbd91 [countries] take first encountered country code instead of reversing the components (for cases like Puerto Rico, Hong Kong, etc.) Al 2016-11-22 11:55:41 -05:00
  • ee8c070fd5 [osm] override admin_level with other components in config if present Al 2016-11-22 11:22:26 -05:00
  • e8a3d256f9 [boundaries] adding more exceptions for some of the UK's unitary authorities that are basically equivalent to city boundaries (smaller towns within the boundary can still override) Al 2016-11-22 10:46:58 -05:00
  • aa1f4fdd20 [places] adding section called city_replacements to places config, for countries where something like the state_district/county, suburb or city_district should stand in for the city when one cannot be reverse geocoded (unincorporated county addresses, etc.) Al 2016-11-22 09:51:04 -05:00
  • 480796f46f [osm] trying representative_point() on the unfixed polygons to capture some cases where the geometry still needs to be fixed before it's valid Al 2016-11-22 01:28:02 -05:00
  • bf1928d8c0 [boundaries] making admin_level=6 city for Mexico as municipalities are the main type of boundary found in OSM Al 2016-11-22 01:03:50 -05:00
  • f5ac51ab9f [boundaries] fixing config structure for a few countries Al 2016-11-21 19:06:52 -05:00
  • 9ae30b598b [openaddresses] updating the Canada province-wide data sets to the new format Al 2016-11-21 18:44:19 -05:00
  • ff086a6bb9 [boundaries] exception for Calgary, CA Al 2016-11-21 18:08:18 -05:00
  • 7298c895c8 [utils] adding a chunked shuffle as the concatenated file sizes may get larger than memory Al 2016-11-21 14:04:34 -05:00
  • eff0443fcf [openaddresses] city_of_flint, not flint Al 2016-11-20 17:24:23 -05:00
  • 1d05c98cc4 [openaddresses] add Bucks County, PA Al 2016-11-20 13:02:10 -05:00
  • a596d03309 [fix] return values Al 2016-11-19 12:45:39 -05:00
  • 1ef3d073db [dictionaries] adding green to place names Al 2016-11-19 04:24:25 -05:00
  • e15036fcce [fix] if there are street types that are not venue words and not vice versa, then call the venue invalid as a standalone term Al 2016-11-19 04:11:33 -05:00
  • 8e905fd17d [fix] if no venue names are passed in to formatted_addresses_with_venue_names, remove any existing venue name from the components as well Al 2016-11-19 03:46:16 -05:00
  • e6fe576ec7 [fix] var Al 2016-11-19 03:15:23 -05:00
  • 1f50481cad [fix] args Al 2016-11-19 03:14:06 -05:00
  • 4d14f80f0c [osm] using the new gazetteer methods to do more thorough checks on single house names (if there are no other components than the standalone venue name, make sure it contains venue words like {library, bar}, etc. and not street type words like {road, street}, etc. so we don't get training examples that are simply "Abbey/house Road/house" with no house number or street name). If the venue name equals the street name or house number, drop it. Same if the venue name equals one of the admin components and no house number or street is present. If the venue name is numeric, require both a house number and a street name. Al 2016-11-19 03:12:24 -05:00
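
The venue-name checks described in commit 4d14f80f0c can be sketched roughly as follows. This is only an illustration of the rules as the commit message states them, not the project's actual code: the function name keep_venue_name, the word lists, and the component keys are all hypothetical, and the real implementation presumably draws its venue and street-type words from the gazetteer dictionaries rather than from hard-coded sets.

```python
# Hypothetical word lists for illustration; the real project is assumed
# to look these up in its gazetteers instead.
VENUE_WORDS = {"library", "bar", "museum", "station"}
STREET_WORDS = {"road", "street", "avenue", "lane"}
# Hypothetical keys for admin-level components.
ADMIN_KEYS = ("suburb", "city", "state", "country")


def keep_venue_name(venue, components):
    """Return True if the venue name should stay in a training example.

    `components` is a dict of the other address fields (strings),
    e.g. {"house_number": "10", "road": "Main Street", "city": "Leeds"}.
    """
    name = venue.strip().lower()
    if not name:
        return False

    house_number = components.get("house_number", "").strip().lower()
    road = components.get("road", "").strip().lower()

    # Drop a venue name that just repeats the street name or house number.
    if name in (road, house_number):
        return False

    # A numeric venue name requires both a house number and a street name.
    if name.isdigit():
        return bool(house_number and road)

    # Drop a venue name that equals an admin component when there is
    # no house number or street to anchor it.
    admin_values = {components.get(k, "").strip().lower() for k in ADMIN_KEYS}
    if not house_number and not road and name in admin_values:
        return False

    # Standalone venue (no other components at all): it must contain a
    # venue word and must not contain a street-type word.
    if not any(value.strip() for value in components.values()):
        tokens = set(name.split())
        return bool(tokens & VENUE_WORDS) and not (tokens & STREET_WORDS)

    return True
```

Under these assumptions, a lone "Abbey Road" is rejected because it contains a street-type word and no venue word, while a lone "City Library" is kept. The checks run from most specific to most general so that a duplicate or numeric name is ruled out before the standalone test applies.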