Al
|
a0f2ff1e2a
|
[fix] adding encoding declaration
|
2015-07-13 21:09:18 -04:00 |
|
Al
|
d15737b319
|
[osm] Validating lat/lon in OSM training data
|
2015-07-13 21:08:08 -04:00 |
|
Al
|
0c18a57c4e
|
[fix] planet url no longer needed
|
2015-07-13 14:27:26 -04:00 |
|
Al
|
e8348dde0e
|
[osm] removing all the fetch/convert arguments from training data generator
|
2015-07-13 14:24:54 -04:00 |
|
Al
|
5e9e08f6b1
|
[fix] making fetch script executable
|
2015-07-13 14:19:24 -04:00 |
|
Al
|
465bcd46aa
|
[fix] input file in OSM training data generator
|
2015-07-13 14:18:24 -04:00 |
|
Al
|
961606ac12
|
[fix] removing intermediate file in OSM fetch
|
2015-07-13 14:17:57 -04:00 |
|
Al
|
59bf23ae67
|
[osm] Planet admin bounds filter
|
2015-07-13 04:08:55 -04:00 |
|
Al
|
7c988fa717
|
[fix] imports
|
2015-07-13 01:50:42 -04:00 |
|
Al
|
e603bad9f3
|
[fix] adding admin_level to the allowed properties list for language polygons
|
2015-07-13 01:49:54 -04:00 |
|
Al
|
fcff210d77
|
[rtree] Language polygon index returns polygons from most specific admin level to least specific
|
2015-07-13 00:58:47 -04:00 |
|
Al
|
ec1e820268
|
[parsing] Changing to OpenCageData repo
|
2015-07-09 13:44:14 -04:00 |
|
Al
|
e64b6c3398
|
[geonames] NULL language and official language canonical should have the same sort value
|
2015-07-08 17:03:51 -04:00 |
|
Al
|
4a2be72350
|
[geonames] Adding language priorities for sorting (official language names, canonical names, abbreviations, historical)
|
2015-07-08 16:42:42 -04:00 |
|
Al
|
95a6845a85
|
[i18n] Adding regional languages as valid country languages
|
2015-07-08 14:54:00 -04:00 |
|
Al
|
ef1ecb97f7
|
[geonames] Adding geonames_id for countries in places/postal codes. For postal codes, sorting desc by country population (10013 is a postal code in Italy but will default to US with no other information)
|
2015-07-08 13:30:57 -04:00 |
|
Al
|
6cc677ac0b
|
[geonames] Adding defaults to schema and another index on country code
|
2015-07-08 13:16:01 -04:00 |
|
Al
|
0c5e741bb6
|
[geonames] Adding LC_ALL environment variable for utf8 sorting
|
2015-07-06 00:39:23 -04:00 |
|
Al
|
acd5d07d17
|
[geonames] Storing NFD normalized names and sorting case-insensitive in order to group everything with the same normalized name together
|
2015-07-05 15:56:46 -04:00 |
|
Al
|
f825dcb939
|
[geonames] Fixing admin table DDL
|
2015-07-03 05:54:41 -04:00 |
|
Al
|
86b23ecca3
|
[fix] field name
|
2015-07-02 15:59:11 -04:00 |
|
Al
|
071d6bb392
|
[geodisambig] Adding presence of a Wikipedia link to the GeoNames output (an unqualified entry for the name in Wikipedia usually indicates a primary meaning). Ranking ambiguous entries for each term so that the top entry should be selected if no further information is available
|
2015-06-30 18:00:07 -04:00 |
|
Al
|
a580ed0b1b
|
[transliteration] Adding numeric HTML escapes e.g. '&'
|
2015-06-29 15:02:34 -04:00 |
|
Al
|
8fb6a28e9c
|
[fix] using empty string instead of NULL for script languages so we can use fixed length arrays
|
2015-06-23 15:20:09 -05:00 |
|
Al
|
b21c3a3a2f
|
[transliteration] using different struct in script data header file
|
2015-06-22 22:06:16 -05:00 |
|
Al
|
c2b4744f55
|
[transliteration] Using a data file instead of a header for transliteration scripts
|
2015-06-21 05:37:56 -05:00 |
|
Al
|
b2e201f297
|
[fix] trailing comma
|
2015-06-20 15:14:41 -05:00 |
|
Al
|
d4087be40c
|
[geonames] Pre-escaping tabs, no quoting in geonames/postal code TSVs
|
2015-06-20 11:54:47 -05:00 |
|
Al
|
ab1fb3669f
|
[geonames] Only take alternative names that are != to the canonical name, sort by name, population desc, geonames_id
|
2015-06-19 15:47:50 -05:00 |
|
Al
|
84b9a6ff33
|
[transliteration] Adding Hangul-Latin and Jamo-Latin back into the mix with a restricted filter. Reversing all previous contexts by character group
|
2015-06-17 23:42:31 -04:00 |
|
Al
|
f04fad0e93
|
[i18n] Generating Hangul syllable classes
|
2015-06-16 12:50:48 -04:00 |
|
Al
|
cb2035867b
|
[fix] osm geodata imports
|
2015-06-15 18:36:01 -04:00 |
|
Al
|
d2d25ead6f
|
[utils] Adding unicode_csv module
|
2015-06-15 18:06:54 -04:00 |
|
Al
|
ccb64f7ac2
|
[polygons] Adding address_normalizer polygons package
|
2015-06-15 17:55:27 -04:00 |
|
Al
|
22fa81b33f
|
[fix] __init__.py
|
2015-06-15 17:54:27 -04:00 |
|
Al
|
41dbd97bf2
|
[geodisambig] quattroshapes download can use default or specified location, unzips files
|
2015-06-15 17:54:08 -04:00 |
|
Al
|
037d4575ae
|
[geodisambig] Modifying GeoNames TSV again. Using files again and sorting
|
2015-06-15 17:51:09 -04:00 |
|
Al
|
67bd9f1a31
|
[i18n] Adding languages.py
|
2015-06-15 17:48:47 -04:00 |
|
Al
|
073fe43698
|
[geodisambig] Adding quattroshapes download script
|
2015-06-15 17:46:11 -04:00 |
|
Al
|
73f37fe66b
|
[fix] Moving default Geonames DB path to a shared module
|
2015-06-15 12:53:00 -04:00 |
|
Al
|
7a4fa7d443
|
[geodisambig] Canonical country names from CLDR, adding alpha-2 and alpha-3 surface forms, writing results to stdout or a file for streaming
|
2015-06-15 01:58:43 -04:00 |
|
Al
|
43e023077c
|
[fix] Changing logging to stderr for the Geonames scripts
|
2015-06-14 15:38:57 -04:00 |
|
Al
|
fc735bb5c3
|
[numex] Adding a whole words only option on numex languages e.g. for Latin so we don't match an initial D with 500
|
2015-06-12 16:09:45 -04:00 |
|
Al
|
2d098fdab6
|
[numex] Adding ordinal_indicator rule type for CJK ordinals
|
2015-06-04 11:24:13 -04:00 |
|
Al
|
4c49f63caf
|
[numex] Adding categories to numex for plurals, etc. Ordinal indicators support multiple variants (primer in Spanish can be written as 1er or 1r for instance) and longer suffixes e.g. for tracking 1=>1st but 11=>11th
|
2015-06-04 03:09:39 -04:00 |
|
Al
|
b2fe9d4db0
|
[transliteration] Adding uppercase umlauts and Scandinavian a-ring
|
2015-06-03 22:55:45 -04:00 |
|
Al
|
2ea21dfffb
|
[fix] constants
|
2015-06-02 13:44:25 -04:00 |
|
Al
|
208366af98
|
[fix] removing stopwords index
|
2015-06-02 12:43:48 -04:00 |
|
Al
|
9d0d83bc14
|
[numex] adding stopword rules with the regular numex rules
|
2015-06-02 12:37:22 -04:00 |
|
Al
|
4ad978f22c
|
[numex] Using the new representation for generated data
|
2015-06-02 12:28:07 -04:00 |
|