Commit Graph

  • 0f3c4276b4 [fix] args Al 2016-07-31 19:53:35 -04:00
  • 0827caf578 [fix] sample=true Al 2016-07-31 19:51:03 -04:00
  • 3871869d4b [osm] Check that OSM venue names contain at least one word-like token Al 2016-07-31 19:50:45 -04:00
  • ce17b50064 [fix] canonical probability Al 2016-07-31 19:16:46 -04:00
  • 0bdcae252f [fix] building tag updates Al 2016-07-31 18:43:55 -04:00
  • 3a19506121 [fix] containing ids Al 2016-07-31 18:30:46 -04:00
  • d04a627e92 [fix] KeyError Al 2016-07-31 18:29:29 -04:00
  • 92b8566930 [places] Increase probability of state and decrease probability of county for smaller cities/towns Al 2016-07-31 03:26:34 -04:00
  • 3f450054f9 [fix] numeric conditions in place config Al 2016-07-31 03:15:43 -04:00
  • 99333d58ca [fix] conditions in place config Al 2016-07-31 03:09:51 -04:00
  • cec4914233 [openaddresses] In some OpenAddresses data sets, the house number is just a copy of the street name, so eliminate non-numeric house numbers to be safe Al 2016-07-31 01:12:04 -04:00
  • f8e9d39e12 [places] Implementing population-based place components in both place and address component expansion Al 2016-07-30 19:14:59 -04:00
  • bb91a5b0f0 [places] For the US, add state_district (county) with higher probability for towns with higher populations. Helps with cases that would be difficult to get right otherwise like Brooklyn, Cattaraugus County, NY (http://www.openstreetmap.org/node/158644800) Al 2016-07-30 18:57:28 -04:00
  • ebaef4d671 [places] Implementation of population-based exceptions for adding OSM boundary components Al 2016-07-30 18:52:55 -04:00
  • 20aad99a38 [parser] enum just lists boundary types Al 2016-07-30 17:07:23 -04:00
  • 965bac1833 [trie] Making methods to construct string phrases from phrase matches available through trie_search.h Al 2016-07-30 17:06:20 -04:00
  • 469332ffc4 [osm/polygons] Reducing cache_size to 250k now that the polygons are larger Al 2016-07-30 16:44:59 -04:00
  • 5bfc29d3f6 [osm/places] Using num_references / 2 for non-default languages and min_references / 2 for alternate name tags Al 2016-07-30 12:46:54 -04:00
  • 3d20bd13c3 [osm] Add population to reverse geocoder properties Al 2016-07-30 12:25:35 -04:00
  • a45ff88f5f [osm/polygons] Don't simplify OSM polygons, might have memory Al 2016-07-29 12:53:13 -04:00
  • f8c8d05997 [fix] same thing for the exception countries Al 2016-07-29 12:47:08 -04:00
  • 045eab8e58 [osm] Making ISO codes lower probability for reverse geocoded country as well Al 2016-07-29 12:30:32 -04:00
  • 09b16d954f [osm] Use much lower probability of ISO country codes Al 2016-07-29 11:41:39 -04:00
  • 9dc52ea3c4 [osm] Add more English + non-local language names for places in OSM Al 2016-07-29 10:31:26 -04:00
  • ed0b867c13 [osm] For formatting places from the polygon index, use centroid if representative_point fails Al 2016-07-29 07:13:41 -04:00
  • f38bb151e2 [fix] var name Al 2016-07-28 23:53:55 -04:00
  • 08f39d6b80 [parser] Adding address_parser_rewind to make multiple passes through the file when compiling the phrase tries Al 2016-07-28 17:13:49 -04:00
  • 1b09b7f2e5 [fix] Adding country_region to address_parser_train Al 2016-07-28 16:18:32 -04:00
  • 21bcbd8381 [fix] restoring CLDR probability Al 2016-07-28 15:21:44 -04:00
  • c6af5cc071 [parser] Adding country_region label to parser as a boundary component Al 2016-07-28 15:19:48 -04:00
  • 854e6d901f [osm] Add CLDR country before dropout Al 2016-07-28 14:41:14 -04:00
  • bebb33fe64 [osm] Include CLDR country even if the place didn't match simplified OSM polygons Al 2016-07-28 14:11:31 -04:00
  • ea1226082e [fix] wrong instance Al 2016-07-28 02:56:17 -04:00
  • fc118acd90 [fix] language None for ambiguous case Al 2016-07-28 02:48:45 -04:00
  • db51cc91c2 [fix] property Al 2016-07-28 02:41:26 -04:00
  • 543048bc26 [osm] use CLDR country names with random probability Al 2016-07-28 02:37:12 -04:00
  • 095c808cea [places] increasing country probabilities, state probabilities in Mexico and Brasil Al 2016-07-28 02:26:51 -04:00
  • d276611b9c [fix] poly.context Al 2016-07-28 01:46:12 -04:00
  • 88353b75e0 [fix] more helpful error message if there are errors with the formatting config Al 2016-07-27 19:14:30 -04:00
  • 21033537a2 [fix] US insertion config Al 2016-07-27 19:13:00 -04:00
  • 3e3950b37a Merge pull request #98 from uberbaud/posix_sh Al Barrentine 2016-07-27 18:44:11 -04:00
  • 18c8e90eb3 Use xargs to start workers as soon as possible Tom Davis 2016-07-27 17:46:44 -04:00
  • a4a74aec7f [osm] Updating formatting config for all the languages/countries currently implemented Al 2016-07-27 17:45:18 -04:00
  • f8d185aaff [osm/formatting] Tag commas in a given labeled component with the SEP tag so e.g. concatenated districts can be counted as separate phrases Al 2016-07-27 16:13:57 -04:00
  • 750037330e [boundaries] Updated boundaries for Slovakia to capture city districts, etc. Al 2016-07-27 14:07:36 -04:00
  • 4cc49b7ca4 [fix] typo Al 2016-07-27 12:48:35 -04:00
  • 9e61b9409f [osm] For components at or below the city level that are the admin_center of their smallest containing boundary with the same name, use the boundary's component name instead of the point's Al 2016-07-27 12:46:40 -04:00
  • d9b70d3404 [fix] mapping the nodes for NYC boroughs to city_district Al 2016-07-27 12:22:50 -04:00
  • ad4da98bd7 [fix] lowercase language code Al 2016-07-27 11:51:17 -04:00
  • 3f4c18ddb6 [fix] None case for names Al 2016-07-27 01:16:05 -04:00
  • 4e14926169 [osm] choosing random name for semicolons and first name for commas in OSM name components Al 2016-07-27 01:06:14 -04:00
  • 862c1b677e [fix] minimum of 5 references for unknown populations Al 2016-07-27 00:31:31 -04:00
  • 985ea79e02 [fix] cap the number of population-based references Al 2016-07-26 22:38:36 -04:00
  • 9a95c4c82f [fix] typo Al 2016-07-26 21:04:10 -04:00
  • 51f9d06a85 [fix] for commas in OSM place names, pick the first Al 2016-07-26 21:00:28 -04:00
  • da7a5e46c7 [osm] Zero fill number ranges like 01234-01240 Al 2016-07-26 20:53:39 -04:00
  • a89d7f71d7 [fix] if component name can't be mapped, return None Al 2016-07-26 20:34:31 -04:00
  • 274f31b37e [osm] map place=district to state_district Al 2016-07-26 20:30:47 -04:00
  • 11abf6cb22 Use posix sh for systems without bash Tom Davis 2016-07-26 20:17:18 -04:00
  • 53cbb52cb2 [languages] Adding Tibetan language to regional languages for the Tibet region Al 2016-07-26 19:07:34 -04:00
  • 614300d423 [fix] typo Al 2016-07-26 18:37:48 -04:00
  • bdba0a4200 [osm] In the case of semicolon delimited names, choose one at random Al 2016-07-26 18:20:56 -04:00
  • 0c1b12b65c [fix] Use local language with script e.g. ja_rm in place training data Al 2016-07-26 18:00:38 -04:00
  • 72c3723b43 [osm] Validate postcode with a regex for the given country code before sending on to parser_osm_number_range (some postcodes can also look like ranges e.g. 83-101 so validate for the given country) Al 2016-07-26 17:45:23 -04:00
  • 1ef57ee7d2 [i18n/postcodes] Fetching postcode regexes from the data source used by Google's libaddressinput, caches requests for the length of the running program (e.g. generating parser data, so the regexes will get updated over time). Al 2016-07-26 17:42:29 -04:00
  • 50b5eb7ea4 [fix] make place_tags iterable in the null case Al 2016-07-26 03:16:26 -04:00
  • 5f0a3bce9c [fix] None tuple length if no matches can be found Al 2016-07-26 02:58:21 -04:00
  • 5448d9bff2 [fix] using UNKNOWN_LANGUAGE instead of None so it can be treated as a string downstream Al 2016-07-26 02:55:04 -04:00
  • 8b24072566 [fix] reference before assignment Al 2016-07-26 02:52:58 -04:00
  • 6c3128edee [fix] adding country_region to places config Al 2016-07-26 02:51:05 -04:00
  • 890f691d7d [fix] import Al 2016-07-26 02:47:03 -04:00
  • eff884986e [osm] Place component dropout in place training data Al 2016-07-26 02:41:41 -04:00
  • 5a9e5ef8dd [fix] iteration Al 2016-07-26 02:33:31 -04:00
  • 7b25d1edfb [fix] config updates for contained_by overrides in OSM admin components Al 2016-07-25 17:10:15 -04:00
  • eae7a6a78c [osm/boundaries] extend admin overrides in the UK to Greater London which includes London and the City of London Al 2016-07-25 16:56:29 -04:00
  • 38e67f5013 [boundaries] More fun with mapping UK admin boundaries. Non-metropolitan counties and non-metropolitan districts map to state_district. admin_level=6 maps to state district except for London where it's the city minus City of London. admin_level=8 (e.g. Manchester) maps to city except in London where it maps to city_district. admin_level=10 is suburb unless designation=civil_parish, in which case it's treated as a city boundary (individual towns/villages may be city or suburb depending on their place tag). Just complicated enough to be valid UK law :-). Al 2016-07-25 16:02:00 -04:00
  • 6a8209dc98 [places] Adding country_region to places config, increasing importance of county in England outside of London, increasing importance of city globally Al 2016-07-25 15:09:37 -04:00
  • 4b67cf79f4 [boundaries/osm] Mapping regions of England to state Al 2016-07-25 15:02:22 -04:00
  • 4e58a7c12e [test] Adding test for intersection phrases and fixing a test failure for the Czech config Al 2016-07-25 03:19:52 -04:00
  • ffece04855 [osm] Place training data from OSM script Al 2016-07-25 02:45:16 -04:00
  • 4d94495d45 [osm] place training data comes from both admin nodes and the polygons in the OSM index (using representative_point) Al 2016-07-25 02:39:53 -04:00
  • 024d47a8a5 [osm] Adding admin_center handling to OSM address components Al 2016-07-25 02:14:51 -04:00
  • 1058b17a61 [osm] Moving admin_center overrides to OSM parser config Al 2016-07-25 01:58:33 -04:00
  • c9aa0bc913 [boundaries/osm] Use name:en most of the time for New Zealand and occasionally name Al 2016-07-25 01:53:43 -04:00
  • 776145cf8e [osm] Adding new option to control whether we drop non-city OSM boundary names that have the same name as the enclosed city Al 2016-07-25 01:24:13 -04:00
  • 1ccea09a92 [osm] Don't call components.normalize_place_names in OSM address formatting, only add place components population / 10000 + 1 times for the name tag itself, not loc_name, int_name, etc. Al 2016-07-25 01:16:27 -04:00
  • 3957aea430 [fix] add postal_code alias Al 2016-07-25 00:48:55 -04:00
  • ee795211bc [polygons] Include designation in OSM admin properties (for UK) Al 2016-07-25 00:27:27 -04:00
  • f0dea9cba1 [fix] No random_key for non-local languages Al 2016-07-25 00:16:22 -04:00
  • b31d71bbc1 [fix] parens Al 2016-07-25 00:14:36 -04:00
  • e5b84205bc [osm] Use int_name tag and add English boundary names even if only a raw name is available for the original place node Al 2016-07-25 00:12:53 -04:00
  • b50cb0cdf9 [osm] add random variations of the containing components' names in building place training data. For places with small or unknown populations, use the default names of the containing components Al 2016-07-25 00:04:44 -04:00
  • dbc5957fa6 [fix] reverting, random state abbreviations should be fine Al 2016-07-24 23:47:06 -04:00
  • cf84b5727e [osm] always_use_full_names=True for encompassing boundaries on place queries Al 2016-07-24 23:21:14 -04:00
  • 0fa372f2c0 [fix] tags.get as nodes may not have type/id Al 2016-07-24 23:04:09 -04:00
  • 273f5ecf58 [fix] language defaults Al 2016-07-24 23:02:39 -04:00
  • 43e6f2433a [fix] use ISO3166-1:alpha2 Al 2016-07-24 23:00:59 -04:00
  • 53906c4833 [fix] parens Al 2016-07-24 22:57:58 -04:00
  • 38b76701d8 [osm] Falling back on OSM country/languages if the point doesn't match the Quattroshapes geometry Al 2016-07-24 22:56:53 -04:00
  • 4b26962793 [osm] Don't return language from node_place_tags as the list of tags contains the various languages already Al 2016-07-24 22:17:42 -04:00