22 Commits

Author SHA1 Message Date
Al
22123b80ba [fix] refactoring geonames script a bit 2016-08-11 21:31:39 -04:00
Al
151161cab3 [fix] Raising error in geonames output if a country cannot be localized 2015-10-07 03:45:56 -04:00
Al
c790a2b87f [fix] spoken/official 2015-10-02 19:50:11 -04:00
Al
db3364be30 [geonames] Using official country languages in GeoNames 2015-10-01 02:21:14 -04:00
Al
daad1a1313 [geonames] Removing alternate names from geonames data set which are digits-only (most are not legitimate) 2015-09-28 17:46:53 -04:00
Al
f6e521e3f3 [geonames] Adding covering index to geonames DB 2015-08-22 13:54:25 -04:00
Al
88d63c85d2 [utils] no-quote CSV dialect 2015-08-13 18:26:51 -04:00
Al
e64b6c3398 [geonames] NULL language and official language canonical should have the same sort value 2015-07-08 17:03:51 -04:00
Al
4a2be72350 [geonames] Adding language priorities for sorting (official language names, canonical names, abbreviations, historical) 2015-07-08 16:42:42 -04:00
Al
ef1ecb97f7 [geonames] Adding geonames_id for countries in places/postal codes. For postal codes, sorting desc by country population (10013 is a postal code in Italy but will default to US with no other information) 2015-07-08 13:30:57 -04:00
Al
0c5e741bb6 [geonames] Adding LC_ALL environment variable for utf8 sorting 2015-07-06 00:39:23 -04:00
Al
acd5d07d17 [geonames] Storing NFD normalized names and sorting case-insensitive in order to group everything with the same normalized name together 2015-07-05 15:56:46 -04:00
Al
86b23ecca3 [fix] field name 2015-07-02 15:59:11 -04:00
Al
071d6bb392 [geodisambig] Adding presence of a Wikipedia link to the GeoNames output (an unqualified entry for the name in Wikipedia usually indicates a primary meaning). Ranking ambiguous entries for each term so that the top entry should be selected if no further information is available 2015-06-30 18:00:07 -04:00
Al
b2e201f297 [fix] trailing comma 2015-06-20 15:14:41 -05:00
Al
d4087be40c [geonames] Pre-escaping tabs, no quoting in geonames/postal code TSVs 2015-06-20 11:54:47 -05:00
Al
ab1fb3669f [geonames] Only take alternative names that are != to the canonical name, sort by name, population desc, geonames_id 2015-06-19 15:47:50 -05:00
Al
037d4575ae [geodisambig] Modifying GeoNames TSV again. Using files again and sorting 2015-06-15 17:51:09 -04:00
Al
7a4fa7d443 [geodisambig] Canonical country names from CLDR, adding alpha-2 and alpha-3 surface forms, writing results to stdout or a file for streaming 2015-06-15 01:58:43 -04:00
Al
26c2823208 [fix] comma 2015-03-14 18:58:18 -04:00
Al
3e20b4f600 [fix] Capturing GeoNames canonical and alternate names with a UNION ALL query, creating C headers with the field orderings for parsing the TSV file downstream 2015-03-14 18:02:14 -04:00
Al
284af74ba4 [geodisambig] Python scripts to prep GeoNames records for trie insertion 2015-03-13 11:56:48 -04:00
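
Several of the commits above describe one output convention: names stored NFD-normalized, rows sorted case-insensitively by name, then by population descending, then by geonames_id, and written as a TSV with tabs pre-escaped and no quoting. The following is a rough sketch of that idea and is not taken from the repository. The row layout (name, population, geonames_id), the function names, and the in-memory Python sort are all assumptions made for illustration; the LC_ALL commit suggests the real scripts may sort in a different way.

```python
import csv
import io
import unicodedata


def normalize_name(name):
    # NFD-normalize so that equivalent Unicode spellings compare equal.
    return unicodedata.normalize("NFD", name)


def sort_key(row):
    # Hypothetical row layout: (name, population, geonames_id).
    name, population, geonames_id = row
    # Case-insensitive name, then population descending, then id ascending.
    return (normalize_name(name).casefold(), -population, geonames_id)


def escape_tabs(value):
    # Pre-escape literal tabs so that no field needs quoting.
    return str(value).replace("\t", "\\t")


def write_tsv(rows, out):
    # QUOTE_NONE with no quotechar: fields are written exactly as given.
    writer = csv.writer(
        out,
        delimiter="\t",
        quoting=csv.QUOTE_NONE,
        quotechar=None,
        lineterminator="\n",
    )
    for name, population, geonames_id in sorted(rows, key=sort_key):
        writer.writerow(
            [escape_tabs(normalize_name(name)), population, geonames_id]
        )


if __name__ == "__main__":
    # Made-up sample rows, not real GeoNames records.
    sample = [
        ("paris", 25000, 2),
        ("Paris", 2000000, 1),
        ("Z\u00fcrich", 400000, 3),
    ]
    buffer = io.StringIO()
    write_tsv(sample, buffer)
    print(buffer.getvalue(), end="")
```

Sorting on the case-folded normalized name keeps every spelling of the same name adjacent, so a downstream reader can take the first row in each group as the default candidate when no other information is available.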