28 Commits

Author SHA1 Message Date
Al
d6d5eab989 [geonames] Adding ability to lookup GeoNames alternate names (may obtain IDs from Quattroshapes). Not great for local-language primary names (OSM remains the best) but decent for extracting foreign toponyms 2015-11-25 17:07:14 -05:00
Al
151161cab3 [fix] Raising error in geonames output if a country cannot be localized 2015-10-07 03:45:56 -04:00
Al
c790a2b87f [fix] spoken/official 2015-10-02 19:50:11 -04:00
Al
db3364be30 [geonames] Using official country languages in GeoNames 2015-10-01 02:21:14 -04:00
Al
daad1a1313 [geonames] Removing alternate names from geonames data set which are digits-only (most are not legitimate) 2015-09-28 17:46:53 -04:00
Al
f6e521e3f3 [geonames] Adding covering index to geonames DB 2015-08-22 13:54:25 -04:00
Al
88d63c85d2 [utils] no-quote CSV dialect 2015-08-13 18:26:51 -04:00
Al
e64b6c3398 [geonames] NULL language and official language canonical should have the same sort value 2015-07-08 17:03:51 -04:00
Al
4a2be72350 [geonames] Adding language priorities for sorting (official language names, canonical names, abbreviations, historical) 2015-07-08 16:42:42 -04:00
Al
ef1ecb97f7 [geonames] Adding geonames_id for countries in places/postal codes. For postal codes, sorting desc by country population (10013 is a postal code in Italy but will default to US with no other information) 2015-07-08 13:30:57 -04:00
Al
6cc677ac0b [geonames] Adding defaults to schema and another index on country code 2015-07-08 13:16:01 -04:00
Al
0c5e741bb6 [geonames] Adding LC_ALL environment variable for utf8 sorting 2015-07-06 00:39:23 -04:00
Al
acd5d07d17 [geonames] Storing NFD normalized names and sorting case-insensitive in order to group everything with the same normalized name together 2015-07-05 15:56:46 -04:00
Al
f825dcb939 [geonames] Fixing admin table DDL 2015-07-03 05:54:41 -04:00
Al
86b23ecca3 [fix] field name 2015-07-02 15:59:11 -04:00
Al
071d6bb392 [geodisambig] Adding presence of a Wikipedia link to the GeoNames output (an unqualified entry for the name in Wikipedia usually indicates a primary meaning). Ranking ambiguous entries for each term so that the top entry should be selected if no further information is available 2015-06-30 18:00:07 -04:00
Al
b2e201f297 [fix] trailing comma 2015-06-20 15:14:41 -05:00
Al
d4087be40c [geonames] Pre-escaping tabs, no quoting in geonames/postal code TSVs 2015-06-20 11:54:47 -05:00
Al
ab1fb3669f [geonames] Only take alternative names that differ from the canonical name, sort by name, population desc, geonames_id 2015-06-19 15:47:50 -05:00
Al
037d4575ae [geodisambig] Modifying GeoNames TSV again. Using files again and sorting 2015-06-15 17:51:09 -04:00
Al
73f37fe66b [fix] Moving default GeoNames DB path to a shared module 2015-06-15 12:53:00 -04:00
Al
7a4fa7d443 [geodisambig] Canonical country names from CLDR, adding alpha-2 and alpha-3 surface forms, writing results to stdout or a file for streaming 2015-06-15 01:58:43 -04:00
Al
43e023077c [fix] Changing logging to stderr for the GeoNames scripts 2015-06-14 15:38:57 -04:00
Al
d1267145f7 [fix] args to wget 2015-04-13 19:02:50 -04:00
Al
d50d7d182e [fix] geonames import script for admin 1 codes 2015-04-12 12:16:08 -04:00
Al
26c2823208 [fix] comma 2015-03-14 18:58:18 -04:00
Al
3e20b4f600 [fix] Capturing GeoNames canonical and alternate names with a UNION ALL query, creating C headers with the field orderings for parsing the TSV file downstream 2015-03-14 18:02:14 -04:00
Al
284af74ba4 [geodisambig] Python scripts to prep GeoNames records for trie insertion 2015-03-13 11:56:48 -04:00