Commit Graph

  • f137d68e12 [intersections] only junction=yes and highway=traffic_signals count as intersections, should eliminate points that are simply joining two segments of the same road Al 2016-08-18 02:53:49 -04:00
  • 93586c2592 [fix] aliasing all_languages Al 2016-08-18 02:24:59 -04:00
  • 688f103e80 [fix] languages Al 2016-08-18 02:24:34 -04:00
  • e3ac3200b3 [fix] disambiguating languages using one of the default street names in intersections data Al 2016-08-18 02:05:13 -04:00
  • 328398813a [fix] itertools.combinations Al 2016-08-18 01:26:48 -04:00
  • 737cbf4457 [fix] reference before assignment Al 2016-08-18 01:24:30 -04:00
  • b41ba7374b [intersections] intersections training data, using a Cartesian product of all names in the same language, including something like tiger:name_base Al 2016-08-18 01:19:02 -04:00
  • 701bcb1d79 [intersections] Using name cleanup on intersections, including tiger:name_base which sometimes has semicolon delimiters as well Al 2016-08-17 18:47:07 -04:00
  • 7b314324ca [osm/addresses] Factoring out semicolon/comma-delimited name cleanup into its own method Al 2016-08-17 18:45:33 -04:00
  • 145af9331e [osm] build OSM training data for intersections using the JSON output from intersections.py rather than having to compute it each time Al 2016-08-17 18:11:55 -04:00
  • a3ae1eb330 [intersections] Adding a read classmethod to intersections to read the intermediate JSON file Al 2016-08-17 15:29:59 -04:00
  • 96c753e8c6 [fix] adding logging on new intersections script Al 2016-08-16 23:55:18 -04:00
  • 5b172ad2d7 [intersections] Caching intersection creation in an intermediate script to save time diagnosing issues downstream Al 2016-08-16 23:52:58 -04:00
  • 330edc2c93 [utils] cstring_array_get_phrase requires a char_array to be passed in so it doesn't have to do any memory allocation Al 2016-08-16 13:11:45 -04:00
  • 92e66fd60c [utils] string_next_hyphen_index Al 2016-08-16 12:49:48 -04:00
  • 7ff0cb2704 [fix] name and a few things for intersections data Al 2016-08-15 21:26:54 -04:00
  • 7ab6af4335 [fix] bounds Al 2016-08-15 12:01:22 -04:00
  • 060d3a1f86 [fix] var name Al 2016-08-15 11:18:00 -04:00
  • 29fc198aba [osm] giving parse_osm_number_range a parameter for max range and setting it to 1000 for postal codes e.g. for major cities that may have several hundred postal codes Al 2016-08-15 10:34:24 -04:00
  • 637baad629 [osm] Adding at least min_references entries for every selected postcode Al 2016-08-15 10:30:28 -04:00
  • aa6b9cd858 [fix] var name for place tags coming from the admin rtree Al 2016-08-15 10:25:19 -04:00
  • 5cff7b85bd [geonames] Adding basic GeoNames admin mappings for all countries we have postal codes lists for so some form of training data can be created for postcodes not listed in OSM Al 2016-08-15 01:09:17 -04:00
  • 7f4e636fc5 [fix] accidentally had Vietnam country code switched with Virgin Islands Al 2016-08-14 18:31:43 -04:00
  • 8a5da5f860 [boundaries/osm] Reverting admin_level=10 back to city_district for India so it'll match the current training data, can revisit later Al 2016-08-13 22:51:42 -04:00
  • bc8acb196c [osm] Pulling valid postal codes out into a method Al 2016-08-13 01:49:26 -04:00
  • 55895369b8 [boundaries] Using state again for UK countries (England, Scotland, Wales, Northern Ireland). country_region was created mostly for non-administrative regions of a country (usually admin_level=3 in OSM). The UK is a bit more complicated in that there are multiple non-sovereign countries, but it's probably not worth creating a different tag and different set of parameters just to have a distinct name for 1st level admin in the UK Al 2016-08-11 23:47:31 -04:00
  • d51a6693ac [fix] reverting commit that was lumped in with geonames script Al 2016-08-11 21:49:29 -04:00
  • 74d042e3c7 [boundaries] For India, making admin_level 10 map to suburb rather than city_district Al 2016-08-11 21:47:10 -04:00
  • 29081a0699 [fix] adding English template insertions for the UK regardless of language Al 2016-08-11 21:32:54 -04:00
  • 22123b80ba [fix] refactoring geonames script a bit Al 2016-08-11 21:31:39 -04:00
  • 48755ec218 [boundaries] Adding regex replacements for boundary names such as Lyon 2e Arrondissement where putting Lyon is the OSM convention but we might sometimes want just 2e Arrondissement to appear in the training data next to Lyon Al 2016-08-11 13:09:08 -04:00
  • 757a7ee15f [docs][ci skip] Moving parser examples up so they come before normalization Al 2016-08-10 01:16:07 -04:00
  • 7ff8e1a5cb [docs][ci skip] Moving OpenCollective folks to the top of the README Al 2016-08-10 01:14:45 -04:00
  • a277096c96 Merge pull request #72 from piamancini/patch-1 Al Barrentine 2016-08-09 23:05:45 -04:00
  • 10a41309b8 [addresses] Increasing Romaji probability to 0.4 Al 2016-08-06 21:27:32 -04:00
  • b993e9a163 [fix] add Japanese-language variant if metro station is added Al 2016-08-06 21:17:14 -04:00
  • 39bd562d04 [addresses] only set language if we needed it for Japanese house_numbers Al 2016-08-06 21:06:01 -04:00
  • cdd5a96346 [addresses] metro station can also be used for plain venues without a house number so we get more in the training set Al 2016-08-06 20:52:25 -04:00
  • 5ec752e887 [fix] order of ops Al 2016-08-06 20:43:13 -04:00
  • e68fee7c68 [fix] null check Al 2016-08-06 20:39:28 -04:00
  • 3e34012e69 [fix] if the language is given already, use it as a suffix rather than choosing at random Al 2016-08-06 20:36:56 -04:00
  • 606c464db6 [fix] house number phrases Al 2016-08-06 20:11:32 -04:00
  • e35649f09d [fix] import Al 2016-08-06 20:01:38 -04:00
  • 0e7cb2b06c [fix] var name II Al 2016-08-06 20:00:35 -04:00
  • 8d88820d30 [fix] var name Al 2016-08-06 19:59:53 -04:00
  • 374c46ada5 [fix] metro station properties Al 2016-08-06 19:56:13 -04:00
  • 0edfbe0d61 [osm] Adding metro stations index to training data options Al 2016-08-06 19:52:21 -04:00
  • 195278cfea [osm] Reverse geocoding to metro station only for addresses in Japan Al 2016-08-06 19:37:29 -04:00
  • 6ef54bcc6f [addresses] Adding metro stations to AddressComponents expansion Al 2016-08-06 19:36:57 -04:00
  • da2985a4ae [places] Metro station dropout probabilities Al 2016-08-06 19:34:56 -04:00
  • 6ce882cb55 [addresses] Metro station component dependencies (road or house_number) Al 2016-08-06 19:34:39 -04:00
  • 668aa20996 [addresses] Metro station phrases for Japanese Romaji Al 2016-08-06 19:34:07 -04:00
  • 9cbbca5e47 [addresses] Metro station phrase for Japanese Al 2016-08-06 19:33:42 -04:00
  • d59ab82701 [metro stations] Adding metro station phrase generator Al 2016-08-06 19:33:21 -04:00
  • 1e27ad1124 [metro stations] Adding metro station component to address formatter Al 2016-08-06 19:13:20 -04:00
  • 5cff119d25 [fix] command line arg Al 2016-08-06 18:36:27 -04:00
  • 406666362c [fix] command-line index creation Al 2016-08-06 18:36:01 -04:00
  • 7ddd553129 [fix] metro stations reverse geocoder Al 2016-08-06 18:30:54 -04:00
  • 5e44f6954b [metro stations] Adding metro stations reverse geocoder Al 2016-08-06 18:24:14 -04:00
  • 954bb08a8d [points] Fixes to point index Al 2016-08-06 18:23:30 -04:00
  • 964728a02d [fix] block phrases for Japanese and namespaced language handling in case Romaji is chosen before normalization Al 2016-08-06 14:50:39 -04:00
  • 684550ea7d [fix] only add house_number phrase to numeric inputs Al 2016-08-06 14:49:28 -04:00
  • 8b5d44e173 [fix] Japanese house numbers aren't without dependencies, just have different ones (road or suburb or city_district) Al 2016-08-06 03:38:44 -04:00
  • 2c024ce9f4 [addresses] special case for Japan, house_number does not depend on street name Al 2016-08-06 02:38:58 -04:00
  • 445e8082c8 [addresses] Adding per-country overrides for address component dependencies Al 2016-08-06 02:36:25 -04:00
  • 3137ef5c6a [build] configure/Makefile changes to use SIMD exp and BLAS when available Al 2016-08-06 00:43:24 -04:00
  • 59e28c6c2a [math] double_array definition in collections.h to use new vectorized exp Al 2016-08-06 00:40:34 -04:00
  • 46cd725c13 [math] Generic dense matrix implementation using BLAS calls for matrix-matrix multiplication if available Al 2016-08-06 00:40:01 -04:00
  • d4a792f33c [math] Adding fast SIMD exponent using the Remez algorithm for vectorized exp Al 2016-08-06 00:31:16 -04:00
  • 161f18575d [utils] Adding realloc checks to vector implementation Al 2016-08-05 23:02:52 -04:00
  • 14c35b35c6 [fix] probabilities in Romanian address config Al 2016-08-04 17:53:10 -04:00
  • 13718355cc [test] Test zones in address configs Al 2016-08-04 17:52:15 -04:00
  • eb4c957b4c [test] Adding tests for known number of floors as it touches different parts of the address configs Al 2016-08-03 17:40:48 -04:00
  • f33882b7bc [fix] Swedish config for top floor phrase Al 2016-08-03 11:54:09 -04:00
  • 813f29f299 [osm] Removing the call to normalize_place_names in place data formatting as we should be able to trust the places more than the addresses Al 2016-08-02 16:29:34 -04:00
  • 0ab3b13b75 [osm] Remove hanging commas, slashes, etc. Implementing a stricter rule for user-specified tags (not reverse geocoded) so that if they contain an unknown phrase followed by an unknown boundary phrase, we delete that tag and fall back to the reverse geocoded components. Moving CLDR country tagging to later in the process since those are known correct names. Al 2016-08-02 16:25:39 -04:00
  • 97a2436ad7 [tokenization] Adding two more sets to token_types for punctuation and non-alphanumerics Al 2016-08-02 16:24:01 -04:00
  • c40ad99ec7 [osm] removing postcode phrase from place training data and adding CLDR countries only after all the other normalizations Al 2016-08-02 14:52:12 -04:00
  • 5117fb21d3 [fix] access Al 2016-08-02 03:20:42 -04:00
  • bd780d3424 [fix] typo Al 2016-08-02 03:19:22 -04:00
  • c74d883344 [fix] unindent Al 2016-08-02 03:17:42 -04:00
  • f29d043544 [places] Using all of the ideas that apply to places from address formatting for the places-only data set Al 2016-08-02 03:16:08 -04:00
  • 4ab60cd4fc [osm] Remove boundary names with trailing commas Al 2016-08-02 03:13:05 -04:00
  • 12466b12dc [osm] Removing boundary names (not including postal codes) which are simply digits Al 2016-08-02 02:17:25 -04:00
  • a1f0c1a3c9 [fix] import Al 2016-08-02 01:50:17 -04:00
  • 818bd50105 [fix] unit phrase should return None if there's no config available for a particular zone type (again enforcing the idea that venues typically don't have sub-building information) Al 2016-08-01 18:29:32 -04:00
  • e11c723f8b [fix] var rename Al 2016-08-01 17:50:00 -04:00
  • 79ce922432 [osm] Fixing sub-building components so generated numbers are not added to the address components unless cls.phrase returns non-None Al 2016-08-01 17:44:23 -04:00
  • 4c8b662648 [fix] block numbers Al 2016-08-01 14:36:28 -04:00
  • 1fb8185b75 [osm/boundaries] Allowing OSM entities to map to NULL Al 2016-08-01 00:52:58 -04:00
  • fa003ca430 [fix] indentation in boundaries configs Al 2016-08-01 00:52:10 -04:00
  • 2faffc81e7 [fix] import Al 2016-08-01 00:06:47 -04:00
  • 5edc60299c [fix] Bulgarian category probabilities Al 2016-07-31 22:50:48 -04:00
  • 973ac42a97 [test] Checking probability distributions as part of the address config tests Al 2016-07-31 22:29:21 -04:00
  • 3ead069b1b [fix] Romanian staircase probability Al 2016-07-31 22:28:31 -04:00
  • 3505af4bc1 [fix] don't add phrases for non-numeric existing components Al 2016-07-31 22:14:02 -04:00
  • d3e50fc894 [fix] NULL-phrase first ordering Al 2016-07-31 22:10:25 -04:00
  • afbb79b81d [osm/parser] Making a much lower probability of generating sub-building components for named venues (usually on the ground floor, etc.) Al 2016-07-31 20:40:44 -04:00
  • b727078be5 [fix] use alphanumeric in generated component configs by default Al 2016-07-31 20:39:15 -04:00
  • 2e92c6fcc8 [fix] Probabilities for Ukrainian house numbers Al 2016-07-31 20:01:42 -04:00