Commit Graph

3442 Commits

Author SHA1 Message Date
Al
ebaef4d671 [places] Implementation of population-based exceptions for adding OSM boundary components 2016-07-30 18:52:55 -04:00
Al
20aad99a38 [parser] enum just lists boundary types 2016-07-30 17:07:23 -04:00
Al
965bac1833 [trie] Making methods to construct string phrases from phrase matches available through trie_search.h 2016-07-30 17:06:20 -04:00
Al
469332ffc4 [osm/polygons] Reducing cache_size to 250k now that the polygons are larger 2016-07-30 16:44:59 -04:00
Al
5bfc29d3f6 [osm/places] Using num_references / 2 for non-default languages and min_references / 2 for alternate name tags 2016-07-30 12:46:54 -04:00
Al
3d20bd13c3 [osm] Add population to reverse geocoder properties 2016-07-30 12:25:39 -04:00
Al
a45ff88f5f [osm/polygons] Don't simplify OSM polygons, might have memory 2016-07-29 12:53:13 -04:00
Al
f8c8d05997 [fix] same thing for the exception countries 2016-07-29 12:47:08 -04:00
Al
045eab8e58 [osm] Making ISO codes lower probability for reverse geocoded country as well 2016-07-29 12:30:32 -04:00
Al
09b16d954f [osm] Use much lower probability of ISO country codes 2016-07-29 11:41:39 -04:00
Al
9dc52ea3c4 [osm] Add more English + non-local language names for places in OSM 2016-07-29 10:31:26 -04:00
Al
ed0b867c13 [osm] For formatting places from the polygon index, use centroid if representative_point fails 2016-07-29 07:13:41 -04:00
Al
f38bb151e2 [fix] var name 2016-07-28 23:53:55 -04:00
Al
08f39d6b80 [parser] Adding address_parser_rewind to make multiple passes through the file when compiling the phrase tries 2016-07-28 17:13:58 -04:00
Al
1b09b7f2e5 [fix] Adding country_region to address_parser_train 2016-07-28 16:18:32 -04:00
Al
21bcbd8381 [fix] restoring CLDR probability 2016-07-28 15:21:44 -04:00
Al
c6af5cc071 [parser] Adding country_region label to parser as a boundary component 2016-07-28 15:19:48 -04:00
Al
854e6d901f [osm] Add CLDR country before dropout 2016-07-28 14:41:14 -04:00
Al
bebb33fe64 [osm] Include CLDR country even if the place didn't match simplified OSM polygons 2016-07-28 14:11:31 -04:00
Al
ea1226082e [fix] wrong instance 2016-07-28 02:56:17 -04:00
Al
fc118acd90 [fix] language None for ambiguous case 2016-07-28 02:48:45 -04:00
Al
db51cc91c2 [fix] property 2016-07-28 02:41:26 -04:00
Al
543048bc26 [osm] use CLDR country names with random probability 2016-07-28 02:37:12 -04:00
Al
095c808cea [places] increasing country probabilities, state probabilities in Mexico and Brasil 2016-07-28 02:26:51 -04:00
Al
d276611b9c [fix] poly.context 2016-07-28 01:46:12 -04:00
Al
88353b75e0 [fix] more helpful error message if there are errors with the formatting config 2016-07-27 19:14:30 -04:00
Al
21033537a2 [fix] US insertion config 2016-07-27 19:13:59 -04:00
Al
a4a74aec7f [osm] Updating formatting config for all the languages/countries currently implemented 2016-07-27 17:45:18 -04:00
Al
f8d185aaff [osm/formatting] Tag commas in a given labeled component with the SEP tag so e.g. concatenated districts can be counted as separate phrases 2016-07-27 16:13:57 -04:00
Al
750037330e [boundaries] Updated boundaries for Slovakia to capture city districts, etc. 2016-07-27 14:07:36 -04:00
Al
4cc49b7ca4 [fix] typo 2016-07-27 12:48:35 -04:00
Al
9e61b9409f [osm] For components at or below the city level that are the admin_center of their smallest containing boundary with the same name, use the boundary's component name instead of the point's 2016-07-27 12:46:43 -04:00
Al
d9b70d3404 [fix] mapping the nodes for NYC boroughs to city_district 2016-07-27 12:22:50 -04:00
Al
ad4da98bd7 [fix] lowercase language code 2016-07-27 11:51:17 -04:00
Al
3f4c18ddb6 [fix] None case for names 2016-07-27 01:16:05 -04:00
Al
4e14926169 [osm] choosing random name for semicolons and first name for commas in OSM name components 2016-07-27 01:06:14 -04:00
Al
862c1b677e [fix] minimum of 5 references for unknown populations 2016-07-27 00:31:31 -04:00
Al
985ea79e02 [fix] cap the number of population-based references 2016-07-26 22:38:41 -04:00
Al
9a95c4c82f [fix] typo 2016-07-26 21:04:10 -04:00
Al
51f9d06a85 [fix] for commas in OSM place names, pick the first 2016-07-26 21:00:28 -04:00
Al
da7a5e46c7 [osm] Zero fill number ranges like 01234-01240 2016-07-26 20:53:39 -04:00
Al
a89d7f71d7 [fix] if component name can't be mapped, return None 2016-07-26 20:34:31 -04:00
Al
274f31b37e [osm] map place=district to state_district 2016-07-26 20:30:47 -04:00
Al
53cbb52cb2 [languages] Adding Tibetan language to regional languages for the Tibet region 2016-07-26 19:07:37 -04:00
Al
614300d423 [fix] typo 2016-07-26 18:37:48 -04:00
Al
bdba0a4200 [osm] In the case of semicolon delimited names, choose one at random 2016-07-26 18:20:56 -04:00
Al
0c1b12b65c [fix] Use local language with script e.g. ja_rm in place training data 2016-07-26 18:00:38 -04:00
Al
72c3723b43 [osm] Validate postcode with a regex for the given country code before sending on to parser_osm_number_range (some postcodes can also look like ranges e.g. 83-101 so validate for the given country) 2016-07-26 17:45:23 -04:00
Al
1ef57ee7d2 [i18n/postcodes] Fetching postcode regexes from the data source used by Google's libaddressinput, caches requests for the length of the running program (e.g. generating parser data, so the regexes will get updated over time). 2016-07-26 17:42:50 -04:00
Al
50b5eb7ea4 [fix] make place_tags iterable in the null case 2016-07-26 03:16:26 -04:00