Commit Graph

  • aeddcb5606 [openaddresses] OpenAddresses config specifying a few files Al 2016-05-31 01:40:21 -04:00
  • fb80a345c5 [openaddresses] Fetch script for OpenAddresses Al 2016-05-31 01:39:04 -04:00
  • 4360f8b698 [addresses] Making address_language a classmethod Al 2016-05-31 01:20:05 -04:00
  • 4e66f81d1b [intersections] Only requiring a tag to share at least two ways Al 2016-05-30 23:10:04 -04:00
  • 6f34559f2d [intersections] Adding intersections to config Al 2016-05-30 23:08:00 -04:00
  • ce55fb2a34 [fix] name Al 2016-05-30 23:06:45 -04:00
  • 27d2a05a27 [fix] input file Al 2016-05-30 22:12:56 -04:00
  • eff7ad9163 [fix] args Al 2016-05-30 22:12:39 -04:00
  • b1c81f9405 [fix] add ways db dir Al 2016-05-30 22:07:01 -04:00
  • 01fd66f4eb [fix] name Al 2016-05-30 22:01:17 -04:00
  • 2dcdc741a2 [fix] logging for intersections data Al 2016-05-30 22:00:28 -04:00
  • ebd8101b57 [fix] import Al 2016-05-30 22:00:14 -04:00
  • fcf55720dc [fix] import Al 2016-05-30 21:58:12 -04:00
  • 3825c36523 [intersections] intersections training data Al 2016-05-30 21:50:45 -04:00
  • 8cb0c5ee8b [intersections] Adding places to intersection template, intersection phrase generator Al 2016-05-30 21:07:14 -04:00
  • 006d15dbac [fix] import Al 2016-05-30 14:53:55 -04:00
  • 5c92185e71 [tokenization] Reverting commit for tokenizing initial/final apostrophes as part of words as it may be more effective to handle during post-processing Al 2016-05-30 11:59:37 -04:00
  • b23f07b679 [parser] Using new geonames designations in parser features Al 2016-05-29 01:40:45 -04:00
  • bbddfe25bf [parser] Using NFC normalization for parser as well, @ sign not defined as separator since it may also be used in intersections Al 2016-05-29 01:37:38 -04:00
  • 1ac077914b [geodb] Adding separate bitset for geonames place types and using NFC normalization instead of NFD (requires retraining) Al 2016-05-29 01:36:00 -04:00
  • 1d1ada1bc1 [normalize] Adding NORMALIZE_STRING_COMPOSE for NFC unicode normalization Al 2016-05-28 19:25:12 -04:00
  • 1fd57fdda3 [tokenization] Adding ability to tokenize 's Gravenhage Al 2016-05-28 19:24:19 -04:00
  • 514aaf7377 [fix] warnings/size_t in libpostal.c Al 2016-05-28 19:19:31 -04:00
  • c0e8578b9c [gazetteers] Adding new gazetteer types/address components Al 2016-05-28 19:19:18 -04:00
  • acd97a0081 [dictionaries] Adding letra to Spanish numbered unit dictionaries Al 2016-05-28 19:15:02 -04:00
  • bac86be6a3 [dictionaries] Adding new dictionary types to generator script Al 2016-05-28 17:16:43 -04:00
  • cff23c77ab [boundaries] Adding Bucharest sectors as city_district Al 2016-05-27 20:22:56 -04:00
  • 5e0e22a666 [dictionaries] More dictionary refactoring Al 2016-05-27 19:40:20 -04:00
  • 5590c89a5e [addresses] Allowing null_phrase_probability for alpha, and alpha+digits instead of just for ordinals (mostly for Spain) Al 2016-05-27 13:40:38 -04:00
  • bdd6d99f56 [addresses] Adding increasing null_phrase_probability for plain numerics in Spain so things like 2o B make it into the training data Al 2016-05-27 13:37:43 -04:00
  • cc453cfbbd [places] setting probability of including island to 0.5 for Hawaii, 0.8 seems too high given all the Honolulu, HI addresses (not often seen as Honolulu, Oahu, HI) Al 2016-05-27 11:32:52 -04:00
  • f69d9e2e1c [dictionaries] Italian CAP abbreviations Al 2016-05-27 11:31:16 -04:00
  • fc96cf145f [dictionaries] Russian place names Al 2016-05-27 11:28:50 -04:00
  • ec0df1410b [dictionaries] Adding more fleshed out Greek dictionaries from a recent Nominatim NameFinder wiki update Al 2016-05-27 11:28:23 -04:00
  • dccbdc4ccc [dictionaries] Refactoring existing unit_types/level_types dictionaries to use the new more granular dictionary structure Al 2016-05-27 11:27:34 -04:00
  • 572759885f [parser] Sample chain store alternate names from the cross-language dictionary Al 2016-05-26 12:09:10 -04:00
  • 5daa64faef [parser] Fixing config keys so OSM streets/venues get abbreviated. Selecting namespaced address fields in cases like Brussels or Hong Kong where everything is bilingual. Adding the ability to pass a known language into address component expansion Al 2016-05-26 12:05:46 -04:00
  • 206a471732 [fix] loading transliteration module in address_parser_test.c as well Al 2016-05-25 19:54:01 -04:00
  • 34f5d833a2 [fix] ON needs to be quoted in YAML, uppercase Yukon abbreviation Al 2016-05-25 19:12:15 -04:00
  • f59150b047 [fix] cstring_array_split calls Al 2016-05-25 17:58:30 -04:00
  • 5065917f41 [fix] brace Al 2016-05-25 17:52:00 -04:00
  • 679d3efcdc [parser] Ignore multiple spaces in parser input post-normalization. If normalizing the string creates several distinct tokens (namely in vulgar fractions, e.g. ½ => 1/2), add all the sub-tokens with the same label as the parent Al 2016-05-25 17:50:29 -04:00
  • 370744ccfd [utils] Adding cstring_array_split_ignore_consecutive Al 2016-05-25 17:07:20 -04:00
  • 5c7d24c71b [fix] calls and NULL checks Al 2016-05-25 15:50:53 -04:00
  • 349df20720 [fix] tokenized_string_t should copy its source string Al 2016-05-25 15:47:57 -04:00
  • 00784a897d [fix] Need to load transliteration module for Latin-ASCII normalization Al 2016-05-25 15:25:34 -04:00
  • bf50d27b0e [places] Adding Town of to English prefixes Al 2016-05-25 11:23:31 -04:00
  • 5a88294dbc [parser] lower full-name probability for states Al 2016-05-25 00:47:36 -04:00
  • 5377a831ab [fix] use simple language code if language_script cannot be found Al 2016-05-24 19:49:08 -04:00
  • a4064ecd02 [fix] global formatter config Al 2016-05-24 19:44:40 -04:00
  • 3661a1e5eb [fix] config key name Al 2016-05-24 19:39:12 -04:00
  • 26bbd2916b [fix] neighborhood reverse geocoder using the new OSM definitions module which keeps track of whatever the data fetching script defines as being a valid {neighborhood, admin boundary, etc.} Al 2016-05-24 19:27:22 -04:00
  • 1a66fc3396 [boundaries] lines sharing a point are added to the polygon head-to-tail, reversing the node order as needed, produces accurate OSM polygons for reverse geocoding lookups Al 2016-05-24 19:24:37 -04:00
  • 206cd56cd2 [fix] moving language code replacements out of address components Al 2016-05-24 16:55:46 -04:00
  • c4aebeebc3 [boundaries] admin_level=8 is city_district in Japan Al 2016-05-24 16:53:42 -04:00
  • bdb6bb03e3 [formatting] Moving language country overrides to formatter config so actual language is retained Al 2016-05-24 16:52:08 -04:00
  • 97582e9c64 [fix] place=municipality Al 2016-05-24 15:35:33 -04:00
  • 6af06d904a [fix] OSM neighborhood ids Al 2016-05-24 15:13:07 -04:00
  • c4eab01176 [fix] Adding basic Han numeral replacement to neighborhood deduping Al 2016-05-24 14:55:54 -04:00
  • a5a24fb3b9 [fix] component bitsets Al 2016-05-24 13:07:32 -04:00
  • cf2bbcb4e0 [fix] language format changes only apply to local languages Al 2016-05-24 12:59:32 -04:00
  • bb2da53311 [formatting] Increase probability of postcode before city Al 2016-05-24 12:21:04 -04:00
  • aedb249ad7 [languages] Use English formats for Romanized CJK Al 2016-05-24 12:13:58 -04:00
  • 7186cf13de [fix] floor samples Al 2016-05-24 11:16:57 -04:00
  • eb83ae91cb [fix] Don't remove chome from Japanese, as the neighborhoods are usually just plain numbers Al 2016-05-23 18:17:04 -04:00
  • 028b7a460e [fix] args Al 2016-05-23 17:42:34 -04:00
  • 48a41eaceb [fix] US/Canada probabilities for industrial/commercial Al 2016-05-23 16:22:27 -04:00
  • f2f98043ab [boundaries] Adding CP and civil parish to English place suffixes Al 2016-05-23 15:47:57 -04:00
  • 32e017a3ab [osm] Venue name depends on one of {house_number, road, suburb, city_district, city, postcode} Al 2016-05-23 15:46:59 -04:00
  • 5f78d4f3a0 [fix] Spanish office probabilities Al 2016-05-23 15:35:55 -04:00
  • 698804b230 [fix] floors Al 2016-05-23 15:18:10 -04:00
  • b8e43fa7f8 [fix] args again Al 2016-05-23 15:01:58 -04:00
  • d6c11dde0f [fix] args Al 2016-05-23 14:59:22 -04:00
  • 1e2ffd9847 [subdivisions/buildings] Adding subdivisions and buildings rtree to training data for getting building height, zone Al 2016-05-23 14:51:44 -04:00
  • dbc41a931b [subdivisions] Adding zone types Al 2016-05-23 14:45:55 -04:00
  • edff5b9730 [fix] removing unnecessary vars Al 2016-05-23 13:04:25 -04:00
  • b0f49db9be [fix] all_names returns a list not a set Al 2016-05-23 13:04:00 -04:00
  • f20cff3b2a [osm] venue names Al 2016-05-23 12:51:28 -04:00
  • 85b3532333 [fix] language disambiguation Al 2016-05-23 11:54:36 -04:00
  • 9f95bdd4d0 [fix] set Al 2016-05-23 11:44:49 -04:00
  • bd341417a3 [languages] Adding script-only disambiguation Al 2016-05-23 11:17:59 -04:00
  • e6157915af [fix] parent streets Al 2016-05-23 10:22:25 -04:00
  • 8b87d224c9 [parser/osm] Adding address sans name for venues probabilistically Al 2016-05-23 05:28:37 -04:00
  • 5d590acbe0 [fix] place components Al 2016-05-23 05:21:00 -04:00
  • c27a1ca450 [fix] dependencies Al 2016-05-23 05:12:12 -04:00
  • 007deafc73 [fix] drop invalid components Al 2016-05-23 05:09:21 -04:00
  • c86451b66e [fix] check for None in chain store query formatting Al 2016-05-23 04:31:41 -04:00
  • e16bf93f2e [fix] args Al 2016-05-23 04:09:43 -04:00
  • 5aae4a22bb [fix] import Al 2016-05-23 04:07:21 -04:00
  • 49a930eb7a [fix] field name Al 2016-05-23 04:03:13 -04:00
  • 93cdbb1b73 [fix] filenames Al 2016-05-22 15:10:20 -04:00
  • 8dd747296b [fix] import Al 2016-05-22 15:04:33 -04:00
  • c31b2ecba7 [fix] constructor Al 2016-05-22 12:52:54 -04:00
  • 9ba386f594 [fix] cli arg name Al 2016-05-22 12:50:59 -04:00
  • aa7328cdee [fix] no need to init language, etc. in new script Al 2016-05-22 12:30:43 -04:00
  • 69e814de4b [fix] coding=utf8 Al 2016-05-22 12:28:42 -04:00
  • fdc7c782aa [fix] cleaning up imports Al 2016-05-22 12:27:50 -04:00
  • e7b1022371 [osm] Same great training script, only shorter Al 2016-05-22 12:22:37 -04:00
  • 49312e163f [parser/osm] OSM address formatter using the new component expansion Al 2016-05-22 12:21:50 -04:00
  • cb78598131 [fix] return value in Chain.query Al 2016-05-22 12:07:43 -04:00
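
Note on the NFC commits (1d1ada1bc1, 1ac077914b, bbddfe25bf): these switch Unicode normalization from NFD to NFC. As a rough illustration of the difference, and not the project's own C implementation, the sketch below uses Python's standard unicodedata module. The helper name normalize_forms and the sample string are assumptions made up for this example.

```python
import unicodedata


def normalize_forms(text):
    """Return the (NFC, NFD) forms of text. Hypothetical helper for illustration."""
    return unicodedata.normalize("NFC", text), unicodedata.normalize("NFD", text)


# Decomposed input: "e" followed by U+0301 COMBINING ACUTE ACCENT
sample = "Cafe\u0301"
nfc, nfd = normalize_forms(sample)

# NFC composes the pair into the single code point U+00E9;
# NFD keeps the base letter and the combining mark separate.
print(len(nfc), len(nfd))
print(nfc == nfd)
```

Because the two forms are different code point sequences, strings stored under one form will not compare equal to the same text in the other form, which is consistent with the "requires retraining" remark in 1ac077914b.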