Al | 961606ac12 | [fix] removing intermediate file in OSM fetch | 2015-07-13 14:17:57 -04:00
Al | 59bf23ae67 | [osm] Planet admin bounds filter | 2015-07-13 04:08:55 -04:00
Al | 7c988fa717 | [fix] imports | 2015-07-13 01:50:42 -04:00
Al | e603bad9f3 | [fix] adding admin_level to the allowed properties list for language polygons | 2015-07-13 01:49:54 -04:00
Al | fcff210d77 | [rtree] Language polygon index returns polygons from most specific admin level to least specific | 2015-07-13 00:58:47 -04:00
Al | ec1e820268 | [parsing] Changing to OpenCageData repo | 2015-07-09 13:44:14 -04:00
Al | e64b6c3398 | [geonames] NULL language and official language canonical should have the same sort value | 2015-07-08 17:03:51 -04:00
Al | 4a2be72350 | [geonames] Adding language priorities for sorting (official language names, canonical names, abbreviations, historical) | 2015-07-08 16:42:42 -04:00
Al | 95a6845a85 | [i18n] Adding regional languages as valid country languages | 2015-07-08 14:54:00 -04:00
Al | ef1ecb97f7 | [geonames] Adding geonames_id for countries in places/postal codes. For postal codes, sorting desc by country population (10013 is a postal code in Italy but will default to US with no other information) | 2015-07-08 13:30:57 -04:00
Al | 6cc677ac0b | [geonames] Adding defaults to schema and another index on country code | 2015-07-08 13:16:01 -04:00
Al | 0c5e741bb6 | [geonames] Adding LC_ALL environment variable for utf8 sorting | 2015-07-06 00:39:23 -04:00
Al | acd5d07d17 | [geonames] Storing NFD normalized names and sorting case-insensitive in order to group everything with the same normalized name together | 2015-07-05 15:56:46 -04:00
Al | f825dcb939 | [geonames] Fixing admin table DDL | 2015-07-03 05:54:41 -04:00
Al | 86b23ecca3 | [fix] field name | 2015-07-02 15:59:11 -04:00
Al | 071d6bb392 | [geodisambig] Adding presence of a Wikipedia link to the GeoNames output (an unqualified entry for the name in Wikipedia usually indicates a primary meaning). Ranking ambiguous entries for each term so that the top entry should be selected if no further information is available | 2015-06-30 18:00:07 -04:00
Al | a580ed0b1b | [transliteration] Adding numeric HTML escapes e.g. '&' | 2015-06-29 15:02:34 -04:00
Al | 8fb6a28e9c | [fix] using empty string instead of NULL for script languages so we can use fixed length arrays | 2015-06-23 15:20:09 -05:00
Al | b21c3a3a2f | [transliteration] using different struct in script data header file | 2015-06-22 22:06:16 -05:00
Al | c2b4744f55 | [transliteration] Using a data file instead of a header for transliteration scripts | 2015-06-21 05:37:56 -05:00
Al | b2e201f297 | [fix] trailing comma | 2015-06-20 15:14:41 -05:00
Al | d4087be40c | [geonames] Pre-escaping tabs, no quoting in geonames/postal code TSVs | 2015-06-20 11:54:47 -05:00
Al | ab1fb3669f | [geonames] Only take alternative names that are != to the canonical name, sort by name, population desc, geonames_id | 2015-06-19 15:47:50 -05:00
Al | 84b9a6ff33 | [transliteration] Adding Hangul-Latin and Jamo-Latin back into the mix with a restricted filter. Reversing all previous contexts by character group | 2015-06-17 23:42:31 -04:00
Al | f04fad0e93 | [i18n] Generating Hangul syllable classes | 2015-06-16 12:50:48 -04:00
Al | cb2035867b | [fix] osm geodata imports | 2015-06-15 18:36:01 -04:00
Al | d2d25ead6f | [utils] Adding unicode_csv module | 2015-06-15 18:06:54 -04:00
Al | ccb64f7ac2 | [polygons] Adding address_normalizer polygons package | 2015-06-15 17:55:27 -04:00
Al | 22fa81b33f | [fix] __init__.py | 2015-06-15 17:54:27 -04:00
Al | 41dbd97bf2 | [geodisambig] quattroshapes download can use default or specified location, unzips files | 2015-06-15 17:54:08 -04:00
Al | 037d4575ae | [geodisambig] Modifying GeoNames TSV again. Using files again and sorting | 2015-06-15 17:51:09 -04:00
Al | 67bd9f1a31 | [i18n] Adding languages.py | 2015-06-15 17:48:47 -04:00
Al | 073fe43698 | [geodisambig] Adding quattroshapes download script | 2015-06-15 17:46:11 -04:00
Al | 73f37fe66b | [fix] Moving default Geonames DB path to a shared module | 2015-06-15 12:53:00 -04:00
Al | 7a4fa7d443 | [geodisambig] Canonical country names from CLDR, adding alpha-2 and alpha-3 surface forms, writing results to stdout or a file for streaming | 2015-06-15 01:58:43 -04:00
Al | 43e023077c | [fix] Changing logging to stderr for the Geonames scripts | 2015-06-14 15:38:57 -04:00
Al | fc735bb5c3 | [numex] Adding a whole words only option on numex languages e.g. for Latin so we don't match an initial D with 500 | 2015-06-12 16:09:45 -04:00
Al | 2d098fdab6 | [numex] Adding ordinal_indicator rule type for CJK ordinals | 2015-06-04 11:24:13 -04:00
Al | 4c49f63caf | [numex] Adding categories to numex for plurals, etc. Ordinal indicators support multiple variants (primer in Spanish can be written as 1er or 1r for instance) and longer suffixes e.g. for tracking 1=>1st but 11=>11th | 2015-06-04 03:09:39 -04:00
Al | b2fe9d4db0 | [transliteration] Adding uppercase umlauts and Scandinavian a-ring | 2015-06-03 22:55:45 -04:00
Al | 2ea21dfffb | [fix] constants | 2015-06-02 13:44:25 -04:00
Al | 208366af98 | [fix] removing stopwords index | 2015-06-02 12:43:48 -04:00
Al | 9d0d83bc14 | [numex] adding stopword rules with the regular numex rules | 2015-06-02 12:37:22 -04:00
Al | 4ad978f22c | [numex] Using the new representation for generated data | 2015-06-02 12:28:07 -04:00
Al | 2dc870b3da | [numex] Python script to generate numex data | 2015-06-02 10:15:02 -04:00
Al | 6b3d434c31 | [fix] removing unnecessary definition | 2015-06-01 17:13:57 -04:00
Al | 9c935c9cc7 | [fix] Base data dir path | 2015-06-01 17:13:06 -04:00
Al | 6ac4ff6021 | [transliteration] Adding reverse/bidirectional transforms e.g. for Katakana-Latin | 2015-05-31 02:07:36 -04:00
Al | 9547c93a38 | [fix] InterIndic-Latin is an internal transliterator, but needed for most of the Indic languages. Also fixing the string lengths for HTML entity replacements | 2015-05-29 19:47:49 -04:00
Al | a278cfd12c | [transliteration] Using revisit strings instead of keeping a backtrack count so we don't have to later map logical characters to the actual string, removing any duplicate keys in the table builder so that if any rules happen to overlap within a step, the first will take precedence | 2015-05-29 16:54:05 -04:00