f137d68e12[intersections] only junction=yes and highway=traffic_signals count as intersections, which should eliminate points that simply join two segments of the same road
Al
2016-08-18 02:53:49 -04:00
93586c2592[fix] aliasing all_languages
Al
2016-08-18 02:24:59 -04:00
688f103e80[fix] languages
Al
2016-08-18 02:24:34 -04:00
e3ac3200b3[fix] disambiguating languages using one of the default street names in intersections data
Al
2016-08-18 02:05:13 -04:00
328398813a[fix] itertools.combinations
Al
2016-08-18 01:26:48 -04:00
737cbf4457[fix] reference before assignment
Al
2016-08-18 01:24:30 -04:00
b41ba7374b[intersections] intersections training data, using a Cartesian product of all names in the same language, including something like tiger:name_base
Al
2016-08-18 01:19:02 -04:00
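
A rough illustration of the Cartesian-product idea described in b41ba7374b. The function name `intersection_name_pairs` and the dict-of-lists input shape are assumptions made for this sketch, not the structure used by the actual intersections script:

```python
from itertools import product


def intersection_name_pairs(names_a, names_b):
    """Pair every name of road A with every name of road B, per language.

    names_a / names_b: hypothetical dicts mapping a language code to the
    list of names known for that road in that language.
    """
    pairs = []
    # Only languages present on both roads can produce a same-language pair.
    for lang in sorted(set(names_a) & set(names_b)):
        for a, b in product(names_a[lang], names_b[lang]):
            pairs.append((lang, a, b))
    return pairs
```

With two English names on one road and one on the other, this yields two English pairs, and a language that appears on only one road contributes nothing.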
701bcb1d79[intersections] Using name cleanup on intersections, including tiger:name_base which sometimes has semicolon delimiters as well
Al
2016-08-17 18:47:07 -04:00
7b314324ca[osm/addresses] Factoring out semicolon/comma-delimited name cleanup into its own method
Al
2016-08-17 18:45:33 -04:00
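
A minimal sketch of what a semicolon/comma-delimited name cleanup such as the one in 7b314324ca might look like. `split_delimited_names` is a hypothetical helper, and the real method may keep only some of the names or apply further normalization:

```python
import re

# One delimiter (semicolon or comma) plus any surrounding whitespace.
DELIMITER_RE = re.compile(r'\s*[;,]\s*')


def split_delimited_names(value):
    """Split a tag value like 'Main St; High St' into separate names."""
    return [name for name in DELIMITER_RE.split(value.strip()) if name]
```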
145af9331e[osm] build OSM training data for intersections using the JSON output from intersections.py rather than having to compute it each time
Al
2016-08-17 18:11:55 -04:00
a3ae1eb330[intersections] Adding a read classmethod to intersections to read the intermediate JSON file
Al
2016-08-17 15:29:59 -04:00
96c753e8c6[fix] adding logging on new intersections script
Al
2016-08-16 23:55:18 -04:00
5b172ad2d7[intersections] Caching intersection creation in an intermediate script to save time diagnosing issues downstream
Al
2016-08-16 23:52:58 -04:00
330edc2c93[utils] cstring_array_get_phrase requires a char_array to be passed in so it doesn't have to do any memory allocation
Al
2016-08-16 13:11:45 -04:00
92e66fd60c[utils] string_next_hyphen_index
Al
2016-08-16 12:49:48 -04:00
7ff0cb2704[fix] name and a few things for intersections data
Al
2016-08-15 21:26:54 -04:00
7ab6af4335[fix] bounds
Al
2016-08-15 12:01:22 -04:00
060d3a1f86[fix] var name
Al
2016-08-15 11:18:00 -04:00
29fc198aba[osm] giving parse_osm_number_range a parameter for max range and setting it to 1000 for postal codes e.g. for major cities that may have several hundred postal codes
Al
2016-08-15 10:34:24 -04:00
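
To make the max-range parameter from 29fc198aba concrete, here is a sketch under stated assumptions. `expand_number_range` is a hypothetical stand-in, not the real `parse_osm_number_range`, whose signature and behavior are not shown in this log:

```python
def expand_number_range(value, max_range=100):
    """Expand 'lo-hi' into a list of number strings, capped by max_range.

    Returns [] when the range is malformed, reversed, or wider than
    max_range. Note: zero padding is not preserved in this sketch.
    """
    start, sep, end = value.partition('-')
    if not sep:
        # Not a range at all: return the single value unchanged.
        return [value.strip()]
    try:
        lo, hi = int(start), int(end)
    except ValueError:
        return []
    if hi < lo or hi - lo > max_range:
        return []
    return [str(n) for n in range(lo, hi + 1)]
```

With a cap of 100, a postal code range spanning a thousand values would be rejected. Raising the cap to 1000 lets it through, which is the situation the commit describes for major cities.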
637baad629[osm] Adding at least min_references entries for every selected postcode
Al
2016-08-15 10:30:28 -04:00
aa6b9cd858[fix] var name for place tags coming from the admin rtree
Al
2016-08-15 10:25:19 -04:00
5cff7b85bd[geonames] Adding basic GeoNames admin mappings for all countries we have postal codes lists for so some form of training data can be created for postcodes not listed in OSM
Al
2016-08-15 01:09:17 -04:00
7f4e636fc5[fix] accidentally had Vietnam country code switched with Virgin Islands
Al
2016-08-14 18:31:43 -04:00
8a5da5f860[boundaries/osm] Reverting admin_level=10 back to city_district for India so it'll match the current training data, can revisit later
Al
2016-08-13 22:51:42 -04:00
bc8acb196c[osm] Pulling valid postal codes out into a method
Al
2016-08-13 01:49:26 -04:00
55895369b8[boundaries] Using state again for UK countries (England, Scotland, Wales, Northern Ireland). country_region was created mostly for non-administrative regions of a country (usually admin_level=3 in OSM). The UK is a bit more complicated in that there are multiple non-sovereign countries, but it's probably not worth creating a different tag and different set of parameters just to have a distinct name for 1st level admin in the UK
Al
2016-08-11 23:47:31 -04:00
d51a6693ac[fix] reverting commit that was lumped in with geonames script
Al
2016-08-11 21:49:29 -04:00
74d042e3c7[boundaries] For India, making admin_level 10 map to suburb rather than city_district
Al
2016-08-11 21:47:10 -04:00
29081a0699[fix] adding English template insertions for the UK regardless of language
Al
2016-08-11 21:32:54 -04:00
22123b80ba[fix] refactoring geonames script a bit
Al
2016-08-11 21:31:39 -04:00
48755ec218[boundaries] Adding regex replacements for boundary names such as Lyon 2e Arrondissement where putting Lyon is the OSM convention but we might sometimes want just 2e Arrondissement to appear in the training data next to Lyon
Al
2016-08-11 13:09:08 -04:00
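
The boundary-name replacement in 48755ec218 could be approximated as follows. This is only a sketch: `strip_city_prefix` and its pattern are assumptions, and the actual replacements are presumably defined in the boundary configs rather than in code like this:

```python
import re


def strip_city_prefix(name, city):
    """Drop a leading city name when it is followed by a numbered district.

    'Lyon 2e Arrondissement' with city='Lyon' becomes '2e Arrondissement'.
    Names that do not start with the city plus a digit are left unchanged.
    """
    pattern = re.compile(r'^' + re.escape(city) + r'\s+(?=\d)', re.IGNORECASE)
    return pattern.sub('', name)
```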
757a7ee15f[docs][ci skip] Moving parser examples up so they come before normalization
Al
2016-08-10 01:16:07 -04:00
7ff8e1a5cb[docs][ci skip] Moving OpenCollective folks to the top of the README
Al
2016-08-10 01:14:45 -04:00
a277096c96Merge pull request #72 from piamancini/patch-1
Al Barrentine
2016-08-09 23:05:45 -04:00
10a41309b8[addresses] Increasing Romaji probability to 0.4
Al
2016-08-06 21:27:32 -04:00
b993e9a163[fix] add Japanese-language variant if metro station is added
Al
2016-08-06 21:17:14 -04:00
39bd562d04[addresses] only set language if we needed it for Japanese house_numbers
Al
2016-08-06 21:06:01 -04:00
cdd5a96346[addresses] metro station can also be used for plain venues without a house number so we get more in the training set
Al
2016-08-06 20:52:25 -04:00
5ec752e887[fix] order of ops
Al
2016-08-06 20:43:13 -04:00
e68fee7c68[fix] null check
Al
2016-08-06 20:39:28 -04:00
3e34012e69[fix] if the language is given already, use it as a suffix rather than choosing at random
Al
2016-08-06 20:36:56 -04:00
606c464db6[fix] house number phrases
Al
2016-08-06 20:11:32 -04:00
e35649f09d[fix] import
Al
2016-08-06 20:01:38 -04:00
0e7cb2b06c[fix] var name II
Al
2016-08-06 20:00:35 -04:00
8d88820d30[fix] var name
Al
2016-08-06 19:59:53 -04:00
374c46ada5[fix] metro station properties
Al
2016-08-06 19:56:13 -04:00
0edfbe0d61[osm] Adding metro stations index to training data options
Al
2016-08-06 19:52:21 -04:00
195278cfea[osm] Reverse geocoding to metro station only for addresses in Japan
Al
2016-08-06 19:37:29 -04:00
6ef54bcc6f[addresses] Adding metro stations to AddressComponents expansion
Al
2016-08-06 19:36:57 -04:00
da2985a4ae[places] Metro station dropout probabilities
Al
2016-08-06 19:34:56 -04:00
6ce882cb55[addresses] Metro station component dependencies (road or house_number)
Al
2016-08-06 19:34:39 -04:00
668aa20996[addresses] Metro station phrases for Japanese Romaji
Al
2016-08-06 19:34:07 -04:00
9cbbca5e47[addresses] Metro station phrase for Japanese
Al
2016-08-06 19:33:42 -04:00
d59ab82701[metro stations] Adding metro station phrase generator
Al
2016-08-06 19:33:21 -04:00
1e27ad1124[metro stations] Adding metro station component to address formatter
Al
2016-08-06 19:13:20 -04:00
5cff119d25[fix] command line arg
Al
2016-08-06 18:36:27 -04:00
406666362c[fix] command-line index creation
Al
2016-08-06 18:36:01 -04:00
7ddd553129[fix] metro stations reverse geocoder
Al
2016-08-06 18:30:54 -04:00
5e44f6954b[metro stations] Adding metro stations reverse geocoder
Al
2016-08-06 18:24:14 -04:00
954bb08a8d[points] Fixes to point index
Al
2016-08-06 18:23:30 -04:00
964728a02d[fix] block phrases for Japanese and namespaced language handling in case Romaji is chosen before normalization
Al
2016-08-06 14:50:39 -04:00
684550ea7d[fix] only add house_number phrase to numeric inputs
Al
2016-08-06 14:49:28 -04:00
8b5d44e173[fix] Japanese house numbers aren't without dependencies, just have different ones (road or suburb or city_district)
Al
2016-08-06 03:38:44 -04:00
2c024ce9f4[addresses] special case for Japan, house_number does not depend on street name
Al
2016-08-06 02:38:58 -04:00
445e8082c8[addresses] Adding per-country overrides for address component dependencies
Al
2016-08-06 02:36:25 -04:00
3137ef5c6a[build] configure/Makefile changes to use SIMD exp and BLAS when available
Al
2016-08-06 00:43:24 -04:00
59e28c6c2a[math] double_array definition in collections.h to use new vectorized exp
Al
2016-08-06 00:40:34 -04:00
46cd725c13[math] Generic dense matrix implementation using BLAS calls for matrix-matrix multiplication if available
Al
2016-08-06 00:40:01 -04:00
d4a792f33c[math] Adding fast SIMD exponent using the Remez algorithm for vectorized exp
Al
2016-08-06 00:31:16 -04:00
161f18575d[utils] Adding realloc checks to vector implementation
Al
2016-08-05 23:02:52 -04:00
14c35b35c6[fix] probabilities in Romanian address config
Al
2016-08-04 17:53:10 -04:00
13718355cc[test] Test zones in address configs
Al
2016-08-04 17:52:15 -04:00
eb4c957b4c[test] Adding tests for known number of floors as it touches different parts of the address configs
Al
2016-08-03 17:40:48 -04:00
f33882b7bc[fix] Swedish config for top floor phrase
Al
2016-08-03 11:54:09 -04:00
813f29f299[osm] Removing the call to normalize_place_names in place data formatting as we should be able to trust the places more than the addresses
Al
2016-08-02 16:29:34 -04:00
0ab3b13b75[osm] Remove hanging commas, slashes, etc. Implementing a stricter rule for user-specified tags (not reverse geocoded) so that if they contain an unknown phrase followed by an unknown boundary phrase, we delete that tag and fall back to the reverse geocoded components. Moving CLDR country tagging to later in the process since those are known correct names.
Al
2016-08-02 16:25:39 -04:00
97a2436ad7[tokenization] Adding two more sets to token_types for punctuation and non-alphanumerics
Al
2016-08-02 16:24:01 -04:00
c40ad99ec7[osm] removing postcode phrase from place training data and adding CLDR countries only after all the other normalizations
Al
2016-08-02 14:52:12 -04:00
5117fb21d3[fix] access
Al
2016-08-02 03:20:42 -04:00
bd780d3424[fix] typo
Al
2016-08-02 03:19:22 -04:00
c74d883344[fix] unindent
Al
2016-08-02 03:17:42 -04:00
f29d043544[places] Using all of the ideas that apply to places from address formatting for the places-only data set
Al
2016-08-02 03:16:08 -04:00
4ab60cd4fc[osm] Remove boundary names with trailing commas
Al
2016-08-02 03:13:05 -04:00
12466b12dc[osm] Removing boundary names (not including postal codes) which are simply digits
Al
2016-08-02 02:17:25 -04:00
a1f0c1a3c9[fix] import
Al
2016-08-02 01:50:17 -04:00
818bd50105[fix] unit phrase should return None if there's no config available for a particular zone type (again enforcing the idea that venues typically don't have sub-building information)
Al
2016-08-01 18:29:32 -04:00
e11c723f8b[fix] var rename
Al
2016-08-01 17:50:00 -04:00
79ce922432[osm] Fixing sub-building components so generated numbers are not added to the address components unless cls.phrase returns non-None
Al
2016-08-01 17:44:23 -04:00
4c8b662648[fix] block numbers
Al
2016-08-01 14:36:28 -04:00
1fb8185b75[osm/boundaries] Allowing OSM entities to map to NULL
Al
2016-08-01 00:52:58 -04:00
fa003ca430[fix] indentation in boundaries configs
Al
2016-08-01 00:52:10 -04:00
2faffc81e7[fix] import
Al
2016-08-01 00:06:47 -04:00
5edc60299c[fix] Bulgarian category probabilities
Al
2016-07-31 22:50:48 -04:00
973ac42a97[test] Checking probability distributions as part of the address config tests
Al
2016-07-31 22:29:21 -04:00
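
A check like the one added in 973ac42a97 might be sketched as below. The helper name `is_valid_distribution` and the flat dict input are assumptions, since the real address configs are nested and the actual test code is not shown here:

```python
import math


def is_valid_distribution(probabilities, tolerance=1e-9):
    """Return True if every value is in [0, 1] and the values sum to 1.

    probabilities: hypothetical dict mapping an option name to its
    probability, e.g. the alternatives for one config key.
    """
    values = list(probabilities.values())
    in_range = all(0.0 <= p <= 1.0 for p in values)
    # isclose absorbs floating-point rounding in the sum.
    return in_range and math.isclose(sum(values), 1.0, abs_tol=tolerance)
```

A test of this kind would catch configs whose probabilities do not sum to 1, which is one plausible reading of the Romanian, Bulgarian, and Ukrainian probability fixes around this point in the log.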
3ead069b1b[fix] Romanian staircase probability
Al
2016-07-31 22:28:31 -04:00
3505af4bc1[fix] don't add phrases for non-numeric existing components
Al
2016-07-31 22:14:02 -04:00
d3e50fc894[fix] NULL-phrase first ordering
Al
2016-07-31 22:10:25 -04:00
afbb79b81d[osm/parser] Making a much lower probability of generating sub-building components for named venues (usually on the ground floor, etc.)
Al
2016-07-31 20:40:44 -04:00
b727078be5[fix] use alphanumeric in generated component configs by default
Al
2016-07-31 20:39:15 -04:00
2e92c6fcc8[fix] Probabilities for Ukrainian house numbers
Al
2016-07-31 20:01:42 -04:00