Commit Graph

180 Commits

Author SHA1 Message Date
Al
080f382065 [numex] Removing concatenated property from language struct as all numeric spellouts might be concatenated 2015-06-01 17:12:07 -04:00
Al
a20b768237 [numex] Russian numex rules (a start at least, might need a native speaker to review the RBNF transform in CLDR) 2015-06-01 17:08:57 -04:00
Al
05ffbffb23 [numex] Latin numex rules i.e. Roman numerals, used for most languages 2015-06-01 17:08:04 -04:00
Al
028bb5a1aa [numex] German numex rules 2015-06-01 17:07:35 -04:00
Al
9bd75cee23 [numex] Romance language numex rules (Spanish, French, Italian, Portuguese) 2015-06-01 17:07:23 -04:00
Al
99aed992da [numex] English numex rules 2015-06-01 17:06:53 -04:00
Al
920e15bd4d [numex] Adding numex setup/IO methods 2015-06-01 15:43:23 -04:00
Al
c0347a3431 [numex] numex header and structs 2015-06-01 15:41:34 -04:00
Al
b74fa0da99 [config] Adding config header 2015-06-01 15:40:59 -04:00
Al
93172bd16d [transliteration] New transliterator_scripts file 2015-05-31 02:09:28 -04:00
Al
0575984144 [transliteration] New data file 2015-05-31 02:08:26 -04:00
Al
6ac4ff6021 [transliteration] Adding reverse/bidirectional transforms e.g. for Katakana-Latin 2015-05-31 02:07:36 -04:00
Al
664d5e90db [fix] Removing the stub comment and a few more random comments 2015-05-29 20:10:44 -04:00
Al
06318a6fab [fix] logging code 2015-05-29 20:08:49 -04:00
Al
55568e9ffa [fix] Removing commented out section 2015-05-29 20:01:17 -04:00
Al
583cadd44f [transliteration] transliterate implementation from trie (need to build/save the tables first) 2015-05-29 19:59:45 -04:00
Al
6239c2fcfc [transliteration] regenerated data file including InterIndic-Latin dependency 2015-05-29 19:48:19 -04:00
Al
9547c93a38 [fix] InterIndic-Latin is an internal transliterator, but needed for most of the Indic languages. Also fixing the string lengths for HTML entity replacements 2015-05-29 19:47:49 -04:00
Al
8b56d63fde [fix] only count non-set chars in parse_groups 2015-05-29 19:42:05 -04:00
Al
a278cfd12c [transliteration] Using revisit strings instead of keeping a backtrack count so we don't have to later map logical characters to the actual string, removing any duplicate keys in the table builder so that if any rules happen to overlap within a step, the first will take precedence 2015-05-29 16:54:05 -04:00
Al
a9d5b91ac0 [transliteration] Not counting repeat character in group capture 2015-05-28 19:36:25 -04:00
Al
0177fd4b13 [fix] trie_search using proper length in utf8proc_iterate 2015-05-27 16:08:19 -04:00
Al
ad8e92182c [phrases] trie I/O using the uint APIs, fixes to trie_get_prefix_result_from_index 2015-05-27 16:06:35 -04:00
Al
897c29ccb8 [fix] transliterate.h 2015-05-27 16:04:18 -04:00
Al
17f88c3adc [utils] using unsigned ints in file_utils, adding doubles 2015-05-27 16:03:36 -04:00
Al
8ac8f83b7f [utils] changing signature of utf8proc_iterate_reversed so it takes the same arguments as utf8proc_iterate for function pointer purposes 2015-05-25 15:35:28 -04:00
Al
26ff3292d2 [fix] new script name, prefix result 2015-05-23 21:41:11 -04:00
Al
31cc2bb5d1 [fix] merging repeat codepoints in trie builder 2015-05-22 22:45:23 -04:00
Al
c00ecf6ea8 [fix] minimizing c* into (c|'')+, using empty transition instead of zero-length string 2015-05-22 18:11:54 -04:00
Al
b2d15b29cf [fix] greek_latin_ungegn => greek-latin-ungegn 2015-05-22 09:52:08 -04:00
Al
27171e068d [phrases] constant for NULL prefix results 2015-05-22 09:08:07 -04:00
Al
cb14e5eef1 [phrases] trie_get_prefix_from_index takes an optional tail position 2015-05-21 06:16:14 -04:00
Al
91ccdf6f7b [phrases] trie_get_prefix_* methods return a struct including tail position 2015-05-21 05:38:21 -04:00
Al
395fbcb8b5 [fix] get_prefix on tries searches tail as well 2015-05-21 05:22:44 -04:00
Al
e84f3d93d2 [fix] get_prefix on tries searches tail as well 2015-05-20 20:57:14 -04:00
Al
c9ff3f278f [transliteration] new transform data file 2015-05-20 14:45:16 -04:00
Al
d65f7747f0 [transliteration] Adding html escapes as the first step in the Latin-ASCII transformation 2015-05-20 14:44:55 -04:00
Al
1fee0a3e35 [phrases] separating get_data_node from tail_match for tries 2015-05-20 13:51:04 -04:00
Al
bfb9aa21a1 [fix] unused var 2015-05-19 18:04:06 -04:00
Al
3d25378456 [transliteration] fixing a few warnings 2015-05-19 18:03:36 -04:00
Al
fdf988cb27 [phrases] adding a public get_data_node method for tries 2015-05-19 18:02:29 -04:00
Al
9d309ca9d3 [fix] moving constant 2015-05-18 14:25:21 -04:00
Al
eecee39904 [fix] giving constant trie node names more specificity 2015-05-18 14:24:39 -04:00
Al
c66f6f0fbe [transliteration] adding begin set token for regex character sets and fixing off-by-one in concatenated trie keys 2015-05-18 14:00:14 -04:00
Al
3c1e5c0471 [transliteration] new data file with the escaped German transliterations 2015-05-18 13:57:45 -04:00
Al
58571f70cc [utils] adding a boolean flag on string tree iterators for single path trees 2015-05-18 13:57:11 -04:00
Al
4694371cdc [fix] unicode escaping the German transliterations 2015-05-18 13:55:57 -04:00
Al
7eaa94d2fb [transliteration] new data file 2015-05-17 18:31:52 -04:00
Al
e25f039ee4 [transliteration] Escaped single quotes in rules + ignoring rules with codepoints > \uffff 2015-05-17 18:31:35 -04:00
Al
c39a19a352 [transliteration] New data file with the Greek/Katakana additions 2015-05-17 17:59:39 -04:00