3572 Commits

Author SHA1 Message Date
Al
58851a9088 [normalization] Adding NORMALIZE_STRING_SIMPLE_LATIN_ASCII option so parser can normalize punctuation and HTML entities, etc. without touching the alphanumeric parts of the original input 2016-08-21 19:45:32 -04:00
Al
8b9702b43d [error handling] Checking that resize succeeded in transliterate.c 2016-08-21 19:43:09 -04:00
Al
2644fed18f [transliteration] Adding LATIN_ASCII_SIMPLE constant to transliterate.h 2016-08-21 19:42:10 -04:00
Al
4375bdea3b [transliteration] strduping transliterator name while building table 2016-08-21 19:41:34 -04:00
Al
bde8776bc2 [transliteration] Regenerating transliteration data files 2016-08-21 19:41:11 -04:00
Al
cb4408fea8 [transliteration] Adding language-specific transliterators for handling umlauts in German + special transliterations in the Nordic languages. It may still result in some wrong transliterations if the language classifier is wrong, but generally it's accurate enough that its predictions can be relied upon. Also adding a Latin-ASCII-Simple transform which only does the punctuation portion of Latin-ASCII so it won't change anything substantial about the input string. 2016-08-20 18:17:46 -04:00
Al
85ae5d4a05 [fix] name 2016-08-19 23:38:33 -04:00
Al
7951044d74 [intersections] Abbreviating street names that are not base names with random probabilities 2016-08-19 23:27:29 -04:00
Al
42808c62e3 [fix] dictionary access 2016-08-19 16:02:36 -04:00
Al
41f715d6ee [intersections] Better handling of default languages in intersection queries 2016-08-19 15:59:58 -04:00
Al
a7118b40a7 [intersections] Allowing tags like name_1, etc. to make it into road name permutations for intersections 2016-08-19 13:12:02 -04:00
Al
0b2d3d965f [fix] using lat/lon from the node properties in intersections data 2016-08-19 12:23:08 -04:00
Al
294316c721 [intersections] no need to store lat/lon in intersections 2016-08-19 01:58:53 -04:00
Al
9a6ec41ce6 [points] Adding __iter__ and __len__ to point index 2016-08-19 01:01:05 -04:00
Al
f43abe0846 [fix] making cleaned_name a classmethod 2016-08-18 19:55:52 -04:00
Al
defc7ffacc [fix] arg name again 2016-08-18 18:22:06 -04:00
Al
4a28225df6 [fix] name 2016-08-18 18:20:55 -04:00
Al
86b921c629 [intersections] Adding the intersection's properties for intersections in case we want to do anything with named intersections in Japan/Korea 2016-08-18 17:14:23 -04:00
Al
87ee5f47f9 [fix] check for None in binary_search 2016-08-18 15:12:23 -04:00
Al
1675bba3f0 [intersections] highway=crossing also valid 2016-08-18 03:00:23 -04:00
Al
f137d68e12 [intersections] only junction=yes and highway=traffic_signals count as intersections, which should eliminate points that are simply joining two segments of the same road 2016-08-18 02:53:49 -04:00
Al
93586c2592 [fix] aliasing all_languages 2016-08-18 02:24:59 -04:00
Al
688f103e80 [fix] languages 2016-08-18 02:24:34 -04:00
Al
e3ac3200b3 [fix] disambiguating languages using one of the default street names in intersections data 2016-08-18 02:05:13 -04:00
Al
328398813a [fix] itertools.combinations 2016-08-18 01:26:48 -04:00
Al
737cbf4457 [fix] reference before assignment 2016-08-18 01:24:30 -04:00
Al
b41ba7374b [intersections] intersections training data, using a Cartesian product of all names in the same language, including something like tiger:name_base 2016-08-18 01:19:14 -04:00
Al
701bcb1d79 [intersections] Using name cleanup on intersections, including tiger:name_base which sometimes has semicolon delimiters as well 2016-08-17 18:47:07 -04:00
Al
7b314324ca [osm/addresses] Factoring out semicolon/comma-delimited name cleanup into its own method 2016-08-17 18:45:33 -04:00
Al
145af9331e [osm] build OSM training data for intersections using the JSON output from intersections.py rather than having to compute it each time 2016-08-17 18:11:55 -04:00
Al
a3ae1eb330 [intersections] Adding a read classmethod to intersections to read the intermediate JSON file 2016-08-17 15:29:59 -04:00
Al
96c753e8c6 [fix] adding logging on new intersections script 2016-08-16 23:55:22 -04:00
Al
5b172ad2d7 [intersections] Caching intersection creation in an intermediate script to save time diagnosing issues downstream 2016-08-16 23:52:58 -04:00
Al
330edc2c93 [utils] cstring_array_get_phrase requires a char_array to be passed in so it doesn't have to do any memory allocation 2016-08-16 13:11:45 -04:00
Al
92e66fd60c [utils] string_next_hyphen_index 2016-08-16 12:49:52 -04:00
Al
7ff0cb2704 [fix] name and a few things for intersections data 2016-08-15 21:26:54 -04:00
Al
7ab6af4335 [fix] bounds 2016-08-15 12:01:22 -04:00
Al
060d3a1f86 [fix] var name 2016-08-15 11:18:00 -04:00
Al
29fc198aba [osm] giving parse_osm_number_range a parameter for max range and setting it to 1000 for postal codes e.g. for major cities that may have several hundred postal codes 2016-08-15 10:34:24 -04:00
Al
637baad629 [osm] Adding at least min_references entries for every selected postcode 2016-08-15 10:30:28 -04:00
Al
aa6b9cd858 [fix] var name for place tags coming from the admin rtree 2016-08-15 10:25:19 -04:00
Al
5cff7b85bd [geonames] Adding basic GeoNames admin mappings for all countries we have postal codes lists for so some form of training data can be created for postcodes not listed in OSM 2016-08-15 01:09:17 -04:00
Al
7f4e636fc5 [fix] accidentally had Vietnam country code switched with Virgin Islands 2016-08-14 18:43:24 -04:00
Al
8a5da5f860 [boundaries/osm] Reverting admin_level=10 back to city_district for India so it'll match the current training data, can revisit later 2016-08-13 22:51:42 -04:00
Al
bc8acb196c [osm] Pulling valid postal codes out into a method 2016-08-13 01:49:26 -04:00
Al
55895369b8 [boundaries] Using state again for UK countries (England, Scotland, Wales, Northern Ireland). country_region was created mostly for non-administrative regions of a country (usually admin_level=3 in OSM). The UK is a bit more complicated in that there are multiple non-sovereign countries, but it's probably not worth creating a different tag and different set of parameters just to have a distinct name for 1st level admin in the UK 2016-08-11 23:47:31 -04:00
Al
d51a6693ac [fix] reverting commit that was lumped in with geonames script 2016-08-11 21:49:29 -04:00
Al
74d042e3c7 [boundaries] For India, making admin_level 10 map to suburb rather than city_district 2016-08-11 21:47:10 -04:00
Al
29081a0699 [fix] adding English template insertions for the UK regardless of language 2016-08-11 21:32:54 -04:00
Al
22123b80ba [fix] refactoring geonames script a bit 2016-08-11 21:31:39 -04:00
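
Commits 7b314324ca, 701bcb1d79 and b41ba7374b above describe cleaning up semicolon-delimited road names and then taking a Cartesian product of the names of two intersecting roads. The following is a minimal sketch of that idea, not the repository's actual code: the function names `split_names` and `intersection_name_pairs` are hypothetical, the tag dictionaries are made-up sample data, and the per-language grouping and comma handling mentioned in the commit messages are left out for brevity.

```python
from itertools import product


def split_names(value):
    """Split a semicolon-delimited tag value into stripped, non-empty names.

    Hypothetical helper; only semicolons are handled in this sketch.
    """
    return [part.strip() for part in value.split(';') if part.strip()]


def intersection_name_pairs(road_a_tags, road_b_tags):
    """Return every (name_a, name_b) pairing for two intersecting roads.

    Each argument is a dict of name-like tags (e.g. name, name_1,
    tiger:name_base) for one road. Language grouping is omitted here.
    """
    names_a = [n for v in road_a_tags.values() for n in split_names(v)]
    names_b = [n for v in road_b_tags.values() for n in split_names(v)]
    # Cartesian product: each name of road A paired with each name of road B
    return list(product(names_a, names_b))


# Made-up sample data for illustration
pairs = intersection_name_pairs(
    {'name': 'Main Street', 'tiger:name_base': 'Main;Maine'},
    {'name': 'Broadway'},
)
for a, b in pairs:
    print(a, '&', b)
```

A real pipeline would presumably also deduplicate names and restrict pairings to names in the same language, as the commit message for b41ba7374b indicates.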