Commit Graph

649 Commits

Author SHA1 Message Date
Al
b5be1e8df5 [fix] var name 2015-08-18 03:56:23 -04:00
Al
e84f932042 [fix] language polys 2015-08-18 03:51:30 -04:00
Al
bada7fd13b [polygons] Changes to languages polygons to support new regional language handling 2015-08-18 03:27:11 -04:00
Al
d97c725bbc [languages] Allowing specification of multiple regional languages 2015-08-18 03:18:52 -04:00
Al
b8fbbb1917 [languages] Removing the Belarusian override as Russian appears to be used often in street signs and there are generally good name:ru/name:be tags 2015-08-17 04:20:39 -04:00
Al
453aa7c633 [dictionaries] Adding French as equally likely language for Guernsey, which will effectively exclude it from the language training data (doesn't matter since there's already enough English/French addresses). 2015-08-17 02:04:29 -04:00
Al
89071ea21a [osm] Omitting country in limited address data set (often abbreviated, doesn't convey language as well) 2015-08-15 03:25:45 -04:00
Al
c505260912 [fix] var name 2015-08-15 02:47:31 -04:00
Al
548ce79b99 [fix] street addresses by language 2015-08-15 02:44:04 -04:00
Al
74a751ce0a [osm] Adding a new OSM training data option for writing out full formatted addresses without place names 2015-08-15 02:39:49 -04:00
Al
133ce9e5b1 [languages] Bonaire admin1 as well as country code 2015-08-14 21:42:13 -04:00
Al
05b8f555d5 [fix] language polygon index 2015-08-14 21:22:15 -04:00
Al
0e92abd53e [osm] Adding building tag to venues training set construction 2015-08-14 21:07:07 -04:00
Al
191c0e3ce5 [languages] Changing Bonaire's default road sign language to Papiamento to help distinguish from Dutch 2015-08-14 21:06:16 -04:00
Al
cad1f95bbb [osm] Making minimal_only the default in formatted addresses, expanding list of acceptable combinations of address fields 2015-08-14 10:21:17 -04:00
Al
1e936ac9dc [fix] road+house_number as minimal keys for formatting addresses 2015-08-14 04:09:51 -04:00
Al
83bbd67c9c [fix] param 2015-08-14 00:57:17 -04:00
Al
e993ddcb51 [fix] splitter 2015-08-14 00:54:06 -04:00
Al
dc2766ae5d [fix] __init__ 2015-08-14 00:49:06 -04:00
Al
62c67aa970 [osm] Using pipe splitter for address components 2015-08-14 00:45:49 -04:00
Al
2bd763be03 [osm] Prefer amenity tag, skip if the building tag is simply building=yes 2015-08-13 21:16:34 -04:00
Al
c844d0484a [fix] carriage returns 2015-08-13 21:07:12 -04:00
Al
ef14aa2b7e [osm] Replacing escape chars at write time as there's no quoting, adding building key to venue training data 2015-08-13 19:30:44 -04:00
Al
9125f07af0 [polygons] Separating out simplify polygon into a method in RTree index 2015-08-13 18:43:35 -04:00
Al
46f2c68a69 [osm] Using tsv_no_quote writers in all OSM training data files 2015-08-13 18:40:41 -04:00
Al
9464670174 [scripts] Regenerating unicode_scripts_data file 2015-08-13 18:27:23 -04:00
Al
88d63c85d2 [utils] no-quote CSV dialect 2015-08-13 18:26:51 -04:00
Al
03febc7e20 [scripts] Better script code aliasing 2015-08-13 18:25:55 -04:00
Al
b54ff95ecc [mv] csv_utils 2015-08-13 18:19:54 -04:00
Al
66a71ab70d [normalize] Need to do a Latin-ASCII transliteration even if the string is entirely ASCII since it may contain HTML escapes 2015-08-11 23:36:08 -04:00
Al
87b275fcab [transliteration] Regenerating transliteration data file 2015-08-11 23:11:17 -04:00
Al
cf70615850 [transliteration] Doing HTML escapes first in Latin-ASCII transliteration as they may need to be resolved further in subsequent steps 2015-08-11 23:10:55 -04:00
Al
9712e0fa87 [fix] phrase start in transliteration 2015-08-11 23:09:49 -04:00
Al
562a7c243d [phrases] Fixing tail searches in trie_get_prefix* 2015-08-11 23:08:21 -04:00
Al
51addec5f2 [fix] check for local CLDR in unicode properties 2015-08-11 20:23:48 -04:00
Al
882e4c2ab8 [fix] ensure CLDR dir 2015-08-11 20:04:42 -04:00
Al
48566bf097 [fix] cldr languages dir 2015-08-11 20:04:25 -04:00
Al
e98a822661 [build] Order-only dependencies for downloading data files, rm-ing the tarball when done extracting 2015-08-11 12:59:37 -04:00
Al
0028c2bc53 [build] Fixing tarball uploading 2015-08-11 03:18:35 -04:00
Al
f21b767696 [build] Adding tarball back to pkgdata 2015-08-10 18:44:40 -04:00
Al
c29cf5ac9a [api] Better handling of strings with multiple scripts and strings that use more than one transliterator. Reducing complexity/allocations 2015-08-10 17:51:41 -04:00
Al
4bc6adf669 [normalize] Adding the original script as an alternative in transliteration mode as well 2015-08-10 17:48:48 -04:00
Al
a13e5117b5 [utils] string_tree_num_strings method 2015-08-10 17:46:37 -04:00
Al
219947722d [cli] delete_word_hyphens as a default option 2015-08-10 16:19:54 -04:00
Al
78a80dd86e [api] Add separable or inseparable non-canonical string affixes (e.g. foobg. => fooburg, foostrasse => foostraße|foo straße, l'ensemble => l' ensemble, etc.) in expand_address 2015-08-10 16:19:03 -04:00
Al
de5d6945b5 [expansion] Adding search_address_dictionaries_prefix/suffix for concatenated prefixes/suffixes e.g. in Germanic languages. Adding a flag to the address_expansion struct and trie value to denote separability, adding prefix/suffix keys during dictionary creation 2015-08-10 16:15:01 -04:00
Al
0f77ca1213 [normalize] Adding a char_array version of normalize token 2015-08-10 16:11:34 -04:00
Al
064b6b5898 [utils] char_array_append_reversed for adding reversed strings without a malloc 2015-08-10 16:10:05 -04:00
Al
dab181a4d7 [fix] Only the exact TRIE_PREFIX_CHAR/TRIE_SUFFIX_CHAR characters are disallowed as keys 2015-08-10 16:09:10 -04:00
Al
e511eede74 [phrases] Prefix/suffix trie search using the new characters, fixing length of matched prefixes/suffixes and exiting early on falling off the trie 2015-08-10 16:02:38 -04:00