132 Commits

Author SHA1 Message Date
Al 2bd763be03 [osm] Prefer amenity tag, skip if the building tag is simply building=yes 2015-08-13 21:16:34 -04:00
Al c844d0484a [fix] carriage returns 2015-08-13 21:07:12 -04:00
Al ef14aa2b7e [osm] Replacing escape chars at write time as there's no quoting, adding building key to venue training data 2015-08-13 19:30:44 -04:00
Al 9125f07af0 [polygons] Separating out simplify polygon into a method in RTree index 2015-08-13 18:43:35 -04:00
Al 46f2c68a69 [osm] Using tsv_no_quote writers in all OSM training data files 2015-08-13 18:40:41 -04:00
Al 88d63c85d2 [utils] no-quote CSV dialect 2015-08-13 18:26:51 -04:00
Al 03febc7e20 [scripts] Better script code aliasing 2015-08-13 18:25:55 -04:00
Al b54ff95ecc [mv] csv_utils 2015-08-13 18:19:54 -04:00
Al cf70615850 [transliteration] Doing HTML escapes first in Latin-ASCII transliteration as they may need to be resolved further in subsequent steps 2015-08-11 23:10:55 -04:00
Al 51addec5f2 [fix] check for local CLDR in unicode properties 2015-08-11 20:23:48 -04:00
Al 882e4c2ab8 [fix] ensure CLDR dir 2015-08-11 20:04:42 -04:00
Al 48566bf097 [fix] cldr languages dir 2015-08-11 20:04:25 -04:00
Al dd391eabe5 [numex] Separating rules from keys for Linux gcc compilation 2015-08-09 01:00:57 -04:00
Al a5ce1f12dd [fix] stdint header in address expansion rule generation script 2015-08-08 23:28:11 -04:00
Al 1d39916aaa [fix] Fixing warnings in unicode script data 2015-08-02 21:30:54 -06:00
Al cdb9afddd3 [fix] address training data carriage returns 2015-07-25 00:35:27 -04:00
Al 87566bb6a5 [numex] Adding validation checks for numex JSON 2015-07-24 15:22:07 -04:00
Al b27af13f8a [expansion] Adding an array of dictionaries to each (phrase, canonical) pair 2015-07-22 20:24:14 -04:00
Al 64a63fdf51 [mv] Moving all repo data files to a resources dir, data is only for runtime files 2015-07-21 18:11:36 -04:00
Al 7f67ed7dc0 [fix] less ambiguous variable name in the generated expansions data file 2015-07-20 02:58:26 -04:00
Al 5cba747a93 [fix] variable name 2015-07-17 03:06:09 -04:00
Al 5e7bb54a5c [polygons] only add language polygons if there's one default language 2015-07-17 02:19:55 -04:00
Al d5ac816066 [fix] import 2015-07-16 13:33:50 -04:00
Al 8899be6eef [osm] choosing the first default language for OSM training data, fixing way/relation offsets 2015-07-16 13:32:16 -04:00
Al b9103a39fa [expansion] Moving filename=>dictionary type mapping to the Python generation script and validating there 2015-07-16 03:51:11 -04:00
Al f181c04e7a [expansion] expansion rule structs and Python script to generate rules from dictionaries tree. Note that a canonical_index of -1 indicates that a given phrase is the canonical (saves space) 2015-07-16 02:49:53 -04:00
Al 076c07e21f [fix] Add minor languages to the language set 2015-07-16 00:58:58 -04:00
Al 1fe3c9b79b [polygons] Adding a return_all version of point_in_poly e.g. for regions like Navarra where we want to add a non-default Basque dictionary but still retain Spanish as the default from the national polygon 2015-07-15 14:34:20 -04:00
Al d57f9df7ed [fix] regexes 2015-07-14 14:04:32 -04:00
Al d494963dcd [fix] lat/lon conversion in address formatting 2015-07-14 13:34:22 -04:00
Al a0f2ff1e2a [fix] adding encoding declaration 2015-07-13 21:09:18 -04:00
Al d15737b319 [osm] Validating lat/lon in OSM training data 2015-07-13 21:08:08 -04:00
Al 0c18a57c4e [fix] planet url no longer needed 2015-07-13 14:27:26 -04:00
Al e8348dde0e [osm] removing all the fetch/convert arguments from training data generator 2015-07-13 14:24:54 -04:00
Al 5e9e08f6b1 [fix] making fetch script executable 2015-07-13 14:19:24 -04:00
Al 465bcd46aa [fix] input file in OSM training data generator 2015-07-13 14:18:24 -04:00
Al 961606ac12 [fix] removing intermediate file in OSM fetch 2015-07-13 14:17:57 -04:00
Al 59bf23ae67 [osm] Planet admin bounds filter 2015-07-13 04:08:55 -04:00
Al 7c988fa717 [fix] imports 2015-07-13 01:50:42 -04:00
Al e603bad9f3 [fix] adding admin_level to the allowed properties list for language polygons 2015-07-13 01:49:54 -04:00
Al fcff210d77 [rtree] Language polygon index returns polygons from most specific admin level to least specific 2015-07-13 00:58:47 -04:00
Al ec1e820268 [parsing] Changing to OpenCageData repo 2015-07-09 13:44:14 -04:00
Al e64b6c3398 [geonames] NULL language and official language canonical should have the same sort value 2015-07-08 17:03:51 -04:00
Al 4a2be72350 [geonames] Adding language priorities for sorting (official language names, canonical names, abbreviations, historical) 2015-07-08 16:42:42 -04:00
Al 95a6845a85 [i18n] Adding regional languages as valid country languages 2015-07-08 14:54:00 -04:00
Al ef1ecb97f7 [geonames] Adding geonames_id for countries in places/postal codes. For postal codes, sorting desc by country population (10013 is a postal code in Italy but will default to US with no other information) 2015-07-08 13:30:57 -04:00
Al 6cc677ac0b [geonames] Adding defaults to schema and another index on country code 2015-07-08 13:16:01 -04:00
Al 0c5e741bb6 [geonames] Adding LC_ALL environment variable for utf8 sorting 2015-07-06 00:39:23 -04:00
Al acd5d07d17 [geonames] Storing NFD normalized names and sorting case-insensitive in order to group everything with the same normalized name together 2015-07-05 15:56:46 -04:00
Al f825dcb939 [fix] Fixing admin table DDL 2015-07-03 05:54:41 -04:00