94 Commits

Author SHA1 Message Date
Al cdf8829942 [fix] no longer requiring argv for unicode_properties script 2016-07-21 17:04:57 -04:00
Al 6703da8fc3 [fix] languages and disambiguation do initialization by default 2016-07-21 17:04:57 -04:00
Al c506649252 [fix] languages_intialized 2016-07-21 17:04:57 -04:00
Al 5e2d9f371e [numex] Moving numex script to a different subpackage, adding function for creating ordinals 2016-07-21 17:04:57 -04:00
Al 1bc92d6995 [fix] output path in numex.py 2016-03-29 11:25:36 -04:00
Al 2a2d1738a3 [fix] path for running numex.py 2016-03-29 11:15:24 -04:00
Al da62ff309e [transliteration] Fixing Malayalam script 2016-01-17 22:15:56 -05:00
Al 8030b235e6 [languages] Changing the definition in script languages so only languages that appear on street signs will be used 2016-01-17 22:03:41 -05:00
Al e9e05bb929 [transliteration] Distinguishing between variables with numbers and backreferences in transliteration rules 2015-12-23 13:07:44 -05:00
Al e55ff54be1 [fix] Adding Korean-Latin-BGN to excluded transliterators 2015-12-21 16:24:50 -05:00
Al 682c316775 [transliteration] Removing Korean-Latin-BGN, not a great transliterator and AFAICT, ICU doesn't use it either 2015-12-21 12:45:45 -05:00
Al ccf509edb1 [fix] update to control characters for generating the transliteration rules 2015-12-20 15:40:38 -05:00
Al b2a944830a [transliteration] Making sure the Python script to generate transliteration data works on the new CLDR format 2015-12-19 00:34:30 -05:00
Al 7f5cf89e84 [transliteration] Not escaping right side transliteration rules 2015-10-27 12:24:38 -04:00
Al 7dfbcce9ec [languages] options for get_country_languages 2015-09-30 04:09:07 -04:00
Al 5417b4e602 [unicode] Downloading latest UnicodeData.txt instead of using builtin Python module (out of date) e.g. for getting unicode codepoint categories 2015-09-25 23:59:38 -04:00
Al abfb1d4a60 [transliteration] Wide char support in transliteration data generator 2015-09-23 03:56:12 -04:00
Al 13bcc35523 [unicode] Allowing wide chars in unicode properties 2015-09-23 00:34:07 -04:00
Al b4593b6f88 [unicode/tokenization] Using new character classes including wide chars in scanner 2015-09-23 00:33:14 -04:00
Al a76831df7a [unicode] Wide version of word breaks 2015-09-22 18:55:33 -04:00
Al a916668f28 [i18n] Local file for ISO 15924 2015-09-01 23:58:36 -04:00
Al b8e4c19146 [mv] Moving the get regional/country languages logic out of language polygons 2015-08-23 14:25:33 -04:00
Al 122a81b610 [languages] non-default languages can still be labeled from > 1 char abbreviations if there's no evidence of other languages in the string. Adding Python version of get_string_script from the C lib 2015-08-23 02:26:06 -04:00
Al 0701bb6f08 [fix] import 2015-08-22 23:19:43 -04:00
Al d97c725bbc [languages] Allowing specification of multiple regional languages 2015-08-18 03:18:52 -04:00
Al 03febc7e20 [scripts] Better script code aliasing 2015-08-13 18:25:55 -04:00
Al b54ff95ecc [mv] csv_utils 2015-08-13 18:19:54 -04:00
Al cf70615850 [transliteration] Doing HTML escapes first in Latin-ASCII transliteration as they may need to be resolved further in subsequent steps 2015-08-11 23:10:55 -04:00
Al 51addec5f2 [fix] check for local CLDR in unicode properties 2015-08-11 20:23:48 -04:00
Al 882e4c2ab8 [fix] ensure CLDR dir 2015-08-11 20:04:42 -04:00
Al 48566bf097 [fix] cldr languages dir 2015-08-11 20:04:25 -04:00
Al dd391eabe5 [numex] Separating rules from keys for Linux gcc compilation 2015-08-09 01:00:57 -04:00
Al 1d39916aaa [fix] Fixing warnings in unicode script data 2015-08-02 21:30:54 -06:00
Al 87566bb6a5 [numex] Adding validation checks for numex JSON 2015-07-24 15:22:07 -04:00
Al 64a63fdf51 [mv] Moving all repo data files to a resources dir, data is only for runtime files 2015-07-21 18:11:36 -04:00
Al 076c07e21f [fix] Add minor languages to the language set 2015-07-16 00:58:58 -04:00
Al 95a6845a85 [i18n] Adding regional languages as valid country languages 2015-07-08 14:54:00 -04:00
Al a580ed0b1b [transliteration] Adding numeric HTML escapes e.g. '&' 2015-06-29 15:02:34 -04:00
Al 8fb6a28e9c [fix] using empty string instead of NULL for script languages so we can use fixed length arrays 2015-06-23 15:20:09 -05:00
Al b21c3a3a2f [transliteration] using different struct in script data header file 2015-06-22 22:06:16 -05:00
Al c2b4744f55 [transliteration] Using a data file instead of a header for transliteration scripts 2015-06-21 05:37:56 -05:00
Al 84b9a6ff33 [transliteration] Adding Hangul-Latin and Jamo-Latin back into the mix with a restricted filter. Reversing all previous contexts by character group 2015-06-17 23:42:31 -04:00
Al f04fad0e93 [i18n] Generating Hangul syllable classes 2015-06-16 12:50:48 -04:00
Al 67bd9f1a31 [i18n] Adding languages.py 2015-06-15 17:48:47 -04:00
Al fc735bb5c3 [numex] Adding a whole words only option on numex languages e.g. for Latin so we don't match an initial D with 500 2015-06-12 16:09:45 -04:00
Al 2d098fdab6 [numex] Adding ordinal_indicator rule type for CJK ordinals 2015-06-04 11:24:13 -04:00
Al 4c49f63caf [numex] Adding categories to numex for plurals, etc. Ordinal indicators support multiple variants (primer in Spanish can be written as 1er or 1r for instance) and longer suffixes e.g. for tracking 1=>1st but 11=>11th 2015-06-04 03:09:39 -04:00
Al b2fe9d4db0 [transliteration] Adding uppercase umlauts and Scandinavian a-ring 2015-06-03 22:55:45 -04:00
Al 2ea21dfffb [fix] constants 2015-06-02 13:44:25 -04:00
Al 208366af98 [fix] removing stopwords index 2015-06-02 12:43:48 -04:00