Al
|
abfb1d4a60
|
[transliteration] Wide char support in transliteration data generator
|
2015-09-23 03:56:12 -04:00 |
|
Al
|
13bcc35523
|
[unicode] Allowing wide chars in unicode properties
|
2015-09-23 00:34:07 -04:00 |
|
Al
|
b4593b6f88
|
[unicode/tokenization] Using new character classes including wide chars in scanner
|
2015-09-23 00:33:14 -04:00 |
|
Al
|
a76831df7a
|
[unicode] Wide version of word breaks
|
2015-09-22 18:55:33 -04:00 |
|
Al
|
a916668f28
|
[i18n] Local file for ISO 15924
|
2015-09-01 23:58:36 -04:00 |
|
Al
|
b8e4c19146
|
[mv] Moving the get regional/country languages logic out of language polygons
|
2015-08-23 14:25:33 -04:00 |
|
Al
|
122a81b610
|
[languages] non-default languages can still be labeled from > 1 char abbreviations if there's no evidence of other languages in the string. Adding Python version of get_string_script from the C lib
|
2015-08-23 02:26:06 -04:00 |
|
Al
|
0701bb6f08
|
[fix] import
|
2015-08-22 23:19:43 -04:00 |
|
Al
|
d97c725bbc
|
[languages] Allowing specification of multiple regional languages
|
2015-08-18 03:18:52 -04:00 |
|
Al
|
03febc7e20
|
[scripts] Better script code aliasing
|
2015-08-13 18:25:55 -04:00 |
|
Al
|
b54ff95ecc
|
[mv] csv_utils
|
2015-08-13 18:19:54 -04:00 |
|
Al
|
cf70615850
|
[transliteration] Doing HTML escapes first in Latin-ASCII transliteration as they may need to be resolved further in subsequent steps
|
2015-08-11 23:10:55 -04:00 |
|
Al
|
51addec5f2
|
[fix] check for local CLDR in unicode properties
|
2015-08-11 20:23:48 -04:00 |
|
Al
|
882e4c2ab8
|
[fix] ensure CLDR dir
|
2015-08-11 20:04:42 -04:00 |
|
Al
|
48566bf097
|
[fix] cldr languages dir
|
2015-08-11 20:04:25 -04:00 |
|
Al
|
dd391eabe5
|
[numex] Separating rules from keys for Linux gcc compilation
|
2015-08-09 01:00:57 -04:00 |
|
Al
|
1d39916aaa
|
[fix] Fixing warnings in unicode script data
|
2015-08-02 21:30:54 -06:00 |
|
Al
|
87566bb6a5
|
[numex] Adding validation checks for numex JSON
|
2015-07-24 15:22:07 -04:00 |
|
Al
|
64a63fdf51
|
[mv] Moving all repo data files to a resources dir; data is only for runtime files
|
2015-07-21 18:11:36 -04:00 |
|
Al
|
076c07e21f
|
[fix] Add minor languages to the language set
|
2015-07-16 00:58:58 -04:00 |
|
Al
|
95a6845a85
|
[i18n] Adding regional languages as valid country languages
|
2015-07-08 14:54:00 -04:00 |
|
Al
|
a580ed0b1b
|
[transliteration] Adding numeric HTML escapes e.g. '&#38;' for '&'
|
2015-06-29 15:02:34 -04:00 |
|
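Numeric HTML escapes come in decimal (`&#38;`) and hexadecimal (`&#x26;`) forms, and both name a code point directly. The following is a minimal regex-based sketch of the replacement. `replace_numeric_entities` is a hypothetical helper, not the generator's actual code.

```python
import re

# Decimal (&#38;) or hexadecimal (&#x26;) numeric character references
NUMERIC_ENTITY = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")

def replace_numeric_entities(s: str) -> str:
    def repl(m: re.Match) -> str:
        hex_digits, dec_digits = m.group(1), m.group(2)
        codepoint = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Leave out-of-range references untouched instead of raising
        if codepoint > 0x10FFFF:
            return m.group(0)
        return chr(codepoint)
    return NUMERIC_ENTITY.sub(repl, s)

print(replace_numeric_entities("Fish &#38; Chips, caf&#xE9;"))  # Fish & Chips, café
```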
Al
|
8fb6a28e9c
|
[fix] using empty string instead of NULL for script languages so we can use fixed length arrays
|
2015-06-23 15:20:09 -05:00 |
|
Al
|
b21c3a3a2f
|
[transliteration] using different struct in script data header file
|
2015-06-22 22:06:16 -05:00 |
|
Al
|
c2b4744f55
|
[transliteration] Using a data file instead of a header for transliteration scripts
|
2015-06-21 05:37:56 -05:00 |
|
Al
|
84b9a6ff33
|
[transliteration] Adding Hangul-Latin and Jamo-Latin back into the mix with a restricted filter. Reversing all previous contexts by character group
|
2015-06-17 23:42:31 -04:00 |
|
Al
|
f04fad0e93
|
[i18n] Generating Hangul syllable classes
|
2015-06-16 12:50:48 -04:00 |
|
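Precomposed Hangul syllables (U+AC00 to U+D7A3) are laid out algorithmically. The Unicode Standard defines them as S = SBase + (L·VCount + V)·TCount + T. This means a syllable's class (LV without a final consonant, or LVT with one) and its jamo decomposition can be computed rather than looked up. The following is a sketch of that arithmetic. It assumes LV/LVT is the kind of class being generated, and the function names are hypothetical.

```python
import unicodedata

# Constants from the Unicode Standard's Hangul syllable algorithm
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables

def is_hangul_syllable(ch):
    return S_BASE <= ord(ch) < S_BASE + S_COUNT

def syllable_type(ch):
    """'LV' if the syllable has no trailing consonant, otherwise 'LVT'."""
    return "LV" if (ord(ch) - S_BASE) % T_COUNT == 0 else "LVT"

def decompose(ch):
    """Split a precomposed syllable into its conjoining jamo."""
    s_index = ord(ch) - S_BASE
    l = L_BASE + s_index // N_COUNT
    v = V_BASE + (s_index % N_COUNT) // T_COUNT
    t = T_BASE + s_index % T_COUNT
    return chr(l) + chr(v) + (chr(t) if t != T_BASE else "")

# Sanity check against Python's own canonical decomposition
assert all(decompose(chr(cp)) == unicodedata.normalize("NFD", chr(cp))
           for cp in range(S_BASE, S_BASE + S_COUNT))
```

As a worked example, 한 (U+D55C) has index 10588 = 18·588 + 4. That yields ᄒ (U+1112), ᅡ (U+1161) and ᆫ (U+11AB), an LVT syllable.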
Al
|
67bd9f1a31
|
[i18n] Adding languages.py
|
2015-06-15 17:48:47 -04:00 |
|
Al
|
fc735bb5c3
|
[numex] Adding a whole-words-only option for numex languages, e.g. for Latin, so we don't match an initial D as 500
|
2015-06-12 16:09:45 -04:00 |
|
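A whole-words-only rule means a Roman numeral is recognized only when the entire token is a well-formed numeral. Under this rule, the leading D of "Dover" is never read as 500. The following is a minimal sketch of that check. The strict pattern assumes numerals up to 3999, and `parse_roman_whole_word` is a hypothetical helper, not the numex implementation.

```python
import re

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
# Well-formed numerals from 1 to 3999 (also matches "", handled below)
ROMAN_RE = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")

def parse_roman_whole_word(token):
    """Return the numeral's value only if the whole token is a Roman numeral, else None."""
    upper = token.upper()
    if not upper or not ROMAN_RE.fullmatch(upper):
        return None
    total = 0
    for i, ch in enumerate(upper):
        value = ROMAN_VALUES[ch]
        # Subtractive notation: a smaller value before a larger one is subtracted
        if i + 1 < len(upper) and value < ROMAN_VALUES[upper[i + 1]]:
            total -= value
        else:
            total += value
    return total

print([parse_roman_whole_word(t) for t in "Dover Street XIV".split()])  # [None, None, 14]
```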
Al
|
2d098fdab6
|
[numex] Adding ordinal_indicator rule type for CJK ordinals
|
2015-06-04 11:24:13 -04:00 |
|
Al
|
4c49f63caf
|
[numex] Adding categories to numex for plurals, etc. Ordinal indicators support multiple variants (primer in Spanish can be written as 1er or 1r for instance) and longer suffixes e.g. for tracking 1=>1st but 11=>11th
|
2015-06-04 03:09:39 -04:00 |
|
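Longer suffixes are what make 11 => 11th work while 1 => 1st. If rules are keyed on the trailing digits and the longest key is tried first, "11" wins over "1". The following sketch covers English only and is built on that assumption. The rule table and function names are hypothetical and do not reflect the numex JSON format.

```python
# Hypothetical rule table: trailing digits -> ordinal indicator
ENGLISH_ORDINAL_RULES = {
    "11": "th", "12": "th", "13": "th",
    "1": "st", "2": "nd", "3": "rd",
}
DEFAULT_INDICATOR = "th"

def ordinal_indicator(n, rules=ENGLISH_ORDINAL_RULES, default=DEFAULT_INDICATOR):
    digits = str(n)
    # Try the longest suffixes first so "11" takes precedence over "1"
    for suffix in sorted(rules, key=len, reverse=True):
        if digits.endswith(suffix):
            return rules[suffix]
    return default

def ordinalize(n):
    return f"{n}{ordinal_indicator(n)}"

print([ordinalize(n) for n in (1, 2, 11, 21, 112, 103)])
# ['1st', '2nd', '11th', '21st', '112th', '103rd']
```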
Al
|
b2fe9d4db0
|
[transliteration] Adding uppercase umlauts and Scandinavian a-ring
|
2015-06-03 22:55:45 -04:00 |
|
Al
|
2ea21dfffb
|
[fix] constants
|
2015-06-02 13:44:25 -04:00 |
|
Al
|
208366af98
|
[fix] removing stopwords index
|
2015-06-02 12:43:48 -04:00 |
|
Al
|
9d0d83bc14
|
[numex] adding stopword rules with the regular numex rules
|
2015-06-02 12:37:22 -04:00 |
|
Al
|
4ad978f22c
|
[numex] Using the new representation for generated data
|
2015-06-02 12:28:07 -04:00 |
|
Al
|
2dc870b3da
|
[numex] Python script to generate numex data
|
2015-06-02 10:15:02 -04:00 |
|
Al
|
6b3d434c31
|
[fix] removing unnecessary definition
|
2015-06-01 17:13:57 -04:00 |
|
Al
|
9c935c9cc7
|
[fix] Base data dir path
|
2015-06-01 17:13:06 -04:00 |
|
Al
|
6ac4ff6021
|
[transliteration] Adding reverse/bidirectional transforms e.g. for Katakana-Latin
|
2015-05-31 02:07:36 -04:00 |
|
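In ICU-style transform rules, `a > b` applies only in the forward direction, `a < b` applies only in reverse (b becomes a), and `a <> b` applies both ways. A bidirectional rule therefore contributes one entry to each direction's table. The following is a simplified sketch. It assumes plain-text rules with no quoting, escapes or context, and `split_rule` and `build_tables` are hypothetical names.

```python
def split_rule(rule):
    """Return (forward, reverse) pairs for a simplified 'lhs OP rhs;' rule."""
    body = rule.strip().rstrip(";")
    # Check "<>" before ">" and "<", since it contains both
    for op, fwd, rev in (("<>", True, True), (">", True, False), ("<", False, True)):
        if op in body:
            lhs, rhs = (part.strip() for part in body.split(op, 1))
            forward = (lhs, rhs) if fwd else None
            reverse = (rhs, lhs) if rev else None
            return forward, reverse
    raise ValueError(f"no direction operator in rule: {rule!r}")

def build_tables(rules):
    forward, reverse = {}, {}
    for rule in rules:
        f, r = split_rule(rule)
        if f:
            forward.setdefault(*f)
        if r:
            reverse.setdefault(*r)
    return forward, reverse

fwd, rev = build_tables(["カ <> ka;", "キ <> ki;", "ヵ > ka;"])
print(fwd)  # {'カ': 'ka', 'キ': 'ki', 'ヵ': 'ka'}
print(rev)  # {'ka': 'カ', 'ki': 'キ'}
```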
Al
|
9547c93a38
|
[fix] InterIndic-Latin is an internal transliterator, but needed for most of the Indic languages. Also fixing the string lengths for HTML entity replacements
|
2015-05-29 19:47:49 -04:00 |
|
Al
|
a278cfd12c
|
[transliteration] Using revisit strings instead of keeping a backtrack count so we don't have to later map logical characters to the actual string, removing any duplicate keys in the table builder so that if any rules happen to overlap within a step, the first will take precedence
|
2015-05-29 16:54:05 -04:00 |
|
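The "first rule takes precedence" behaviour amounts to keeping only the first replacement seen for each key while building a step's table. The following sketch shows this together with a greedy longest-match pass over the text. It covers only the duplicate-key policy, not the revisit-string mechanism, and `build_step_table` and `apply_step` are hypothetical names.

```python
def build_step_table(rules):
    """rules: (key, replacement) pairs in file order; the first one for a key wins."""
    table = {}
    for key, replacement in rules:
        table.setdefault(key, replacement)  # later duplicates are ignored
    return table

def apply_step(text, table):
    """Replace the longest matching key at each position, left to right."""
    max_len = max(map(len, table), default=0)
    out, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in table:
                out.append(table[chunk])
                i += length
                break
        else:
            # No rule matched: copy the character through unchanged
            out.append(text[i])
            i += 1
    return "".join(out)

table = build_step_table([("ae", "e"), ("ae", "x"), ("a", "b")])
print(table)                     # {'ae': 'e', 'a': 'b'}
print(apply_step("aea", table))  # eb
```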
Al
|
a9d5b91ac0
|
[transliteration] Not counting repeat character in group capture
|
2015-05-28 19:36:25 -04:00 |
|
Al
|
c00ecf6ea8
|
[fix] minimizing c* into (c|'')+, using empty transition instead of zero-length string
|
2015-05-22 18:11:54 -04:00 |
|
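The rewrite is safe because `c*` and `(c|'')+` accept exactly the same strings. The empty alternative is what an empty (epsilon) transition encodes in the automaton. The following is a quick spot-check with Python's `re` module. It only compares the patterns on a handful of sample strings and is not a proof of equivalence. `same_language` is a hypothetical helper.

```python
import re

def same_language(pattern_a, pattern_b, samples):
    """True if both patterns accept or reject every sample string identically."""
    return all(
        bool(re.fullmatch(pattern_a, s)) == bool(re.fullmatch(pattern_b, s))
        for s in samples
    )

samples = ["", "c", "cc", "ccc", "a", "ca", "cac"]
print(same_language(r"c*", r"(?:c|)+", samples))  # True
print(same_language(r"c*", r"c+", samples))       # False: c+ rejects ""
```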
Al
|
b2d15b29cf
|
[fix] greek_latin_ungegn => greek-latin-ungegn
|
2015-05-22 09:52:08 -04:00 |
|
Al
|
d65f7747f0
|
[transliteration] Adding html escapes as the first step in the Latin-ASCII transformation
|
2015-05-20 14:44:55 -04:00 |
|
Al
|
4694371cdc
|
[fix] unicode escaping the German transliterations
|
2015-05-18 13:55:57 -04:00 |
|
Al
|
e25f039ee4
|
[transliteration] Escaped single quotes in rules + ignoring rules with codepoints > \uffff
|
2015-05-17 18:31:35 -04:00 |
|
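Ignoring rules with code points above \uffff means decoding each rule's escapes and dropping the rule if any character falls outside the Basic Multilingual Plane. The following is a minimal sketch. It assumes escapes appear as `\uXXXX` or `\UXXXXXXXX`, and `unescape_rule` and `keep_rule` are hypothetical names.

```python
import re

# \uXXXX (BMP) or \UXXXXXXXX (any code point) escapes in rule text
ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})|\\U([0-9a-fA-F]{8})")

def unescape_rule(rule_text):
    return ESCAPE.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), rule_text)

def keep_rule(rule_text):
    """Keep only rules whose characters all fit in U+0000..U+FFFF."""
    return all(ord(ch) <= 0xFFFF for ch in unescape_rule(rule_text))

rules = [r"\u00e9 > e;", r"\U0001D400 > A;", "ß > ss;"]
print([r for r in rules if keep_rule(r)])  # ['\\u00e9 > e;', 'ß > ss;']
```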
Al
|
d72348d47e
|
[transliteration] Using a restricted set of diacritical marks relevant to Greek; variants stand in for transliterator dependencies, e.g. use Katakana-Latin-BGN if Katakana-Latin cannot be found
|
2015-05-17 17:42:37 -04:00 |
|
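Letting variants stand in for dependencies works as a lookup with a fallback: use the named transliterator if it exists, otherwise try its suffixed variants in order. The following is a minimal sketch. The suffix list is an assumption, with only "BGN" taken from the example above, and `resolve_transliterator` is a hypothetical name.

```python
def resolve_transliterator(name, available, variant_suffixes=("BGN",)):
    """Return name if available, else the first available 'name-SUFFIX' variant, else None."""
    if name in available:
        return name
    for suffix in variant_suffixes:
        candidate = f"{name}-{suffix}"
        if candidate in available:
            return candidate
    return None

available = {"Katakana-Latin-BGN", "Greek-Latin"}
print(resolve_transliterator("Katakana-Latin", available))  # Katakana-Latin-BGN
print(resolve_transliterator("Hebrew-Latin", available))    # None
```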
Al
|
30db201e8a
|
[fix] NUM_CHARS => NUM_CODEPOINTS
|
2015-05-17 13:53:19 -04:00 |
|