Commit Graph

146 Commits

Author SHA1 Message Date
Al
961606ac12 [fix] removing intermediate file in OSM fetch 2015-07-13 14:17:57 -04:00
Al
59bf23ae67 [osm] Planet admin bounds filter 2015-07-13 04:08:55 -04:00
Al
7c988fa717 [fix] imports 2015-07-13 01:50:42 -04:00
Al
e603bad9f3 [fix] adding admin_level to the allowed properties list for language polygons 2015-07-13 01:49:54 -04:00
Al
fcff210d77 [rtree] Language polygon index returns polygons from most specific admin level to least specific 2015-07-13 00:58:47 -04:00
Al
ec1e820268 [parsing] Changing to OpenCageData repo 2015-07-09 13:44:14 -04:00
Al
e64b6c3398 [geonames] NULL language and official language canonical should have the same sort value 2015-07-08 17:03:51 -04:00
Al
4a2be72350 [geonames] Adding language priorities for sorting (official language names, canonical names, abbreviations, historical) 2015-07-08 16:42:42 -04:00
Al
95a6845a85 [i18n] Adding regional languages as valid country languages 2015-07-08 14:54:00 -04:00
Al
ef1ecb97f7 [geonames] Adding geonames_id for countries in places/postal codes. For postal codes, sorting desc by country population (10013 is a postal code in Italy but will default to US with no other information) 2015-07-08 13:30:57 -04:00
Al
6cc677ac0b [geonames] Adding defaults to schema and another index on country code 2015-07-08 13:16:01 -04:00
Al
0c5e741bb6 [geonames] Adding LC_ALL environment variable for utf8 sorting 2015-07-06 00:39:23 -04:00
Al
acd5d07d17 [geonames] Storing NFD normalized names and sorting case-insensitive in order to group everything with the same normalized name together 2015-07-05 15:56:46 -04:00
Al
f825dcb939 [geonames] Fixing admin table DDL 2015-07-03 05:54:41 -04:00
Al
86b23ecca3 [fix] field name 2015-07-02 15:59:11 -04:00
Al
071d6bb392 [geodisambig] Adding presence of a Wikipedia link to the GeoNames output (an unqualified entry for the name in Wikipedia usually indicates a primary meaning). Ranking ambiguous entries for each term so that the top entry should be selected if no further information is available 2015-06-30 18:00:07 -04:00
Al
a580ed0b1b [transliteration] Adding numeric HTML escapes e.g. '&#38;' 2015-06-29 15:02:34 -04:00
Al
8fb6a28e9c [fix] using empty string instead of NULL for script languages so we can use fixed length arrays 2015-06-23 15:20:09 -05:00
Al
b21c3a3a2f [transliteration] using different struct in script data header file 2015-06-22 22:06:16 -05:00
Al
c2b4744f55 [transliteration] Using a data file instead of a header for transliteration scripts 2015-06-21 05:37:56 -05:00
Al
b2e201f297 [fix] trailing comma 2015-06-20 15:14:41 -05:00
Al
d4087be40c [geonames] Pre-escaping tabs, no quoting in geonames/postal code TSVs 2015-06-20 11:54:47 -05:00
Al
ab1fb3669f [geonames] Only take alternative names that are != to the canonical name, sort by name, population desc, geonames_id 2015-06-19 15:47:50 -05:00
Al
84b9a6ff33 [transliteration] Adding Hangul-Latin and Jamo-Latin back into the mix with a restricted filter. Reversing all previous contexts by character group 2015-06-17 23:42:31 -04:00
Al
f04fad0e93 [i18n] Generating Hangul syllable classes 2015-06-16 12:50:48 -04:00
Al
cb2035867b [fix] osm geodata imports 2015-06-15 18:36:01 -04:00
Al
d2d25ead6f [utils] Adding unicode_csv module 2015-06-15 18:06:54 -04:00
Al
ccb64f7ac2 [polygons] Adding address_normalizer polygons package 2015-06-15 17:55:27 -04:00
Al
22fa81b33f [fix] __init__.py 2015-06-15 17:54:27 -04:00
Al
41dbd97bf2 [geodisambig] quattroshapes download can use default or specified location, unzips files 2015-06-15 17:54:08 -04:00
Al
037d4575ae [geodisambig] Modifying GeoNames TSV again. Using files again and sorting 2015-06-15 17:51:09 -04:00
Al
67bd9f1a31 [i18n] Adding languages.py 2015-06-15 17:48:47 -04:00
Al
073fe43698 [geodisambig] Adding quattroshapes download script 2015-06-15 17:46:11 -04:00
Al
73f37fe66b [fix] Moving default Geonames DB path to a shared module 2015-06-15 12:53:00 -04:00
Al
7a4fa7d443 [geodisambig] Canonical country names from CLDR, adding alpha-2 and alpha-3 surface forms, writing results to stdout or a file for streaming 2015-06-15 01:58:43 -04:00
Al
43e023077c [fix] Changing logging to stderr for the Geonames scripts 2015-06-14 15:38:57 -04:00
Al
fc735bb5c3 [numex] Adding a whole words only option on numex languages e.g. for Latin so we don't match an initial D with 500 2015-06-12 16:09:45 -04:00
Al
2d098fdab6 [numex] Adding ordinal_indicator rule type for CJK ordinals 2015-06-04 11:24:13 -04:00
Al
4c49f63caf [numex] Adding categories to numex for plurals, etc. Ordinal indicators support multiple variants (primer in Spanish can be written as 1er or 1r for instance) and longer suffixes e.g. for tracking 1=>1st but 11=>11th 2015-06-04 03:09:39 -04:00
Al
b2fe9d4db0 [transliteration] Adding uppercase umlauts and Scandinavian a-ring 2015-06-03 22:55:45 -04:00
Al
2ea21dfffb [fix] constants 2015-06-02 13:44:25 -04:00
Al
208366af98 [fix] removing stopwords index 2015-06-02 12:43:48 -04:00
Al
9d0d83bc14 [numex] adding stopword rules with the regular numex rules 2015-06-02 12:37:22 -04:00
Al
4ad978f22c [numex] Using the new representation for generated data 2015-06-02 12:28:07 -04:00
Al
2dc870b3da [numex] Python script to generate numex data 2015-06-02 10:15:02 -04:00
Al
6b3d434c31 [fix] removing unnecessary definition 2015-06-01 17:13:57 -04:00
Al
9c935c9cc7 [fix] Base data dir path 2015-06-01 17:13:06 -04:00
Al
6ac4ff6021 [transliteration] Adding reverse/bidirectional transforms e.g. for Katakana-Latin 2015-05-31 02:07:36 -04:00
Al
9547c93a38 [fix] InterIndic-Latin is an internal transliterator, but needed for most of the Indic languages. Also fixing the string lengths for HTML entity replacements 2015-05-29 19:47:49 -04:00
Al
a278cfd12c [transliteration] Using revisit strings instead of keeping a backtrack count so we don't have to later map logical characters to the actual string, removing any duplicate keys in the table builder so that if any rules happen to overlap within a step, the first will take precedence 2015-05-29 16:54:05 -04:00
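
Note on acd5d07d17: the message describes storing NFD-normalized names and sorting case-insensitively so that entries with the same normalized name end up together. The snippet below is only a minimal sketch of that idea using the Python standard library. The helper names (sort_key, group_names) and the sample names are hypothetical, and the repository itself may well do the sorting elsewhere (for example with a locale-aware external sort, as 0c5e741bb6 suggests).

```python
import unicodedata
from itertools import groupby


def sort_key(name):
    """Case-insensitive key over the NFD form of a name (illustrative helper)."""
    return unicodedata.normalize("NFD", name).casefold()


def group_names(names):
    """Return lists of names that share the same normalized, case-folded form."""
    # Store the NFD form, as the commit message describes.
    stored = [unicodedata.normalize("NFD", n) for n in names]
    # Sorting on the case-folded key places equal names next to each other.
    stored.sort(key=sort_key)
    # groupby only merges adjacent items, which is why the sort comes first.
    return [list(group) for _, group in groupby(stored, key=sort_key)]


# Sample input mixing precomposed (U+00FC) and decomposed (u + U+0308) forms.
names = ["Z\u00fcrich", "Aachen", "ZU\u0308RICH", "z\u00fcrich", "aachen"]
groups = group_names(names)
```

Here the precomposed and decomposed spellings fall into the same group because both are reduced to the NFD form before the case-insensitive comparison.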