| Author | Commit | Message | Date |
|---|---|---|---|
| Al | 6081df0cd1 | [osm] adding admin1 ids to the OSM country rtree | 2016-10-04 23:12:15 -04:00 |
| Al | cb4408fea8 | [transliteration] Adding language-specific transliterators for handling umlauts in German + special transliterations in the Nordic languages. It may still result in some wrong transliterations if the language classifier is wrong, but generally it's accurate enough that its predictions can be relied upon. Also adding a Latin-ASCII-Simple transform which only does the punctuation portion of Latin-ASCII so it won't change anything substantial about the input string. | 2016-08-20 18:17:46 -04:00 |
| Al | 93586c2592 | [fix] aliasing all_languages | 2016-08-18 02:24:59 -04:00 |
| Al | 1ef57ee7d2 | [i18n/postcodes] Fetching postcode regexes from the data source used by Google's libaddressinput, caches requests for the length of the running program (e.g. generating parser data, so the regexes will get updated over time). | 2016-07-26 17:42:50 -04:00 |
| Al | cdf8829942 | [fix] no longer requiring argv for unicode_properties script | 2016-07-21 17:04:57 -04:00 |
| Al | 6703da8fc3 | [fix] languages and disambiguation do initialization by default | 2016-07-21 17:04:57 -04:00 |
| Al | c506649252 | [fix] languages_intialized | 2016-07-21 17:04:57 -04:00 |
| Al | 5e2d9f371e | [numex] Moving numex script to a different subpackage, adding function for creating ordinals | 2016-07-21 17:04:57 -04:00 |
| Al | 1bc92d6995 | [fix] output path in numex.py | 2016-03-29 11:25:36 -04:00 |
| Al | 2a2d1738a3 | [fix] path for running numex.py | 2016-03-29 11:15:24 -04:00 |
| Al | da62ff309e | [transliteration] Fixing Malayalam script | 2016-01-17 22:15:56 -05:00 |
| Al | 8030b235e6 | [languages] Changing the definition in script languages so only languages that appear on street signs will be used | 2016-01-17 22:03:41 -05:00 |
| Al | e9e05bb929 | [transliteration] Distinguishing between variables with numbers and backreferences in transliteration rules | 2015-12-23 13:07:44 -05:00 |
| Al | e55ff54be1 | [fix] Adding Korean-Latin-BGN to excluded transliterators | 2015-12-21 16:24:50 -05:00 |
| Al | 682c316775 | [transliteration] Removing Korean-Latin-BGN, not a great transliterator and AFAICT, ICU doesn't use it either | 2015-12-21 12:45:45 -05:00 |
| Al | ccf509edb1 | [fix] update to control characters for generating the transliteration rules | 2015-12-20 15:40:38 -05:00 |
| Al | b2a944830a | [transliteration] Making sure the Python script to generate transliteration data works on the new CLDR format | 2015-12-19 00:34:30 -05:00 |
| Al | 7f5cf89e84 | [transliteration] Not escaping right side transliteration rules | 2015-10-27 12:24:38 -04:00 |
| Al | 7dfbcce9ec | [languages] options for get_country_languages | 2015-09-30 04:09:07 -04:00 |
| Al | 5417b4e602 | [unicode] Downloading latest UnicodeData.txt instead of using builtin Python module (out of date) e.g. for getting unicode codepoint categories | 2015-09-25 23:59:38 -04:00 |
| Al | abfb1d4a60 | [transliteration] Wide char support in transliteration data generator | 2015-09-23 03:56:12 -04:00 |
| Al | 13bcc35523 | [unicode] Allowing wide chars in unicode properties | 2015-09-23 00:34:07 -04:00 |
| Al | b4593b6f88 | [unicode/tokenization] Using new character classes including wide chars in scanner | 2015-09-23 00:33:14 -04:00 |
| Al | a76831df7a | [unicode] Wide version of word breaks | 2015-09-22 18:55:33 -04:00 |
| Al | a916668f28 | [i18n] Local file for ISO 15924 | 2015-09-01 23:58:36 -04:00 |
| Al | b8e4c19146 | [mv] Moving the get regional/country languages logic out of language polygons | 2015-08-23 14:25:33 -04:00 |
| Al | 122a81b610 | [languages] non-default languages can still be labeled from > 1 char abbreviations if there's no evidence of other languages in the string. Adding Python version of get_string_script from the C lib | 2015-08-23 02:26:06 -04:00 |
| Al | 0701bb6f08 | [fix] import | 2015-08-22 23:19:43 -04:00 |
| Al | d97c725bbc | [languages] Allowing specification of multiple regional languages | 2015-08-18 03:18:52 -04:00 |
| Al | 03febc7e20 | [scripts] Better script code aliasing | 2015-08-13 18:25:55 -04:00 |
| Al | b54ff95ecc | [mv] csv_utils | 2015-08-13 18:19:54 -04:00 |
| Al | cf70615850 | [transliteration] Doing HTML escapes first in Latin-ASCII transliteration as they may need to be resolved further in subsequent steps | 2015-08-11 23:10:55 -04:00 |
| Al | 51addec5f2 | [fix] check for local CLDR in unicode properties | 2015-08-11 20:23:48 -04:00 |
| Al | 882e4c2ab8 | [fix] ensure CLDR dir | 2015-08-11 20:04:42 -04:00 |
| Al | 48566bf097 | [fix] cldr languages dir | 2015-08-11 20:04:25 -04:00 |
| Al | dd391eabe5 | [numex] Separating rules from keys for Linux gcc compilation | 2015-08-09 01:00:57 -04:00 |
| Al | 1d39916aaa | [fix] Fixing warnings in unicode script data | 2015-08-02 21:30:54 -06:00 |
| Al | 87566bb6a5 | [numex] Adding validation checks for numex JSON | 2015-07-24 15:22:07 -04:00 |
| Al | 64a63fdf51 | [mv] Moving all repo data files to a resources dir, data is only for runtime files | 2015-07-21 18:11:36 -04:00 |
| Al | 076c07e21f | [fix] Add minor languages to the language set | 2015-07-16 00:58:58 -04:00 |
| Al | 95a6845a85 | [i18n] Adding regional languages as valid country languages | 2015-07-08 14:54:00 -04:00 |
| Al | a580ed0b1b | [transliteration] Adding numeric HTML escapes e.g. '&' | 2015-06-29 15:02:34 -04:00 |
| Al | 8fb6a28e9c | [fix] using empty string instead of NULL for script languages so we can use fixed length arrays | 2015-06-23 15:20:09 -05:00 |
| Al | b21c3a3a2f | [transliteration] using different struct in script data header file | 2015-06-22 22:06:16 -05:00 |
| Al | c2b4744f55 | [transliteration] Using a data file instead of a header for transliteration scripts | 2015-06-21 05:37:56 -05:00 |
| Al | 84b9a6ff33 | [transliteration] Adding Hangul-Latin and Jamo-Latin back into the mix with a restricted filter. Reversing all previous contexts by character group | 2015-06-17 23:42:31 -04:00 |
| Al | f04fad0e93 | [i18n] Generating Hangul syllable classes | 2015-06-16 12:50:48 -04:00 |
| Al | 67bd9f1a31 | [i18n] Adding languages.py | 2015-06-15 17:48:47 -04:00 |
| Al | fc735bb5c3 | [numex] Adding a whole words only option on numex languages e.g. for Latin so we don't match an initial D with 500 | 2015-06-12 16:09:45 -04:00 |
| Al | 2d098fdab6 | [numex] Adding ordinal_indicator rule type for CJK ordinals | 2015-06-04 11:24:13 -04:00 |