Commit Graph

207 Commits

Author  SHA1  Message  Date
Al  a278cfd12c  [transliteration] Using revisit strings instead of keeping a backtrack count so we don't have to later map logical characters to the actual string, removing any duplicate keys in the table builder so that if any rules happen to overlap within a step, the first will take precedence  2015-05-29 16:54:05 -04:00
Al  a9d5b91ac0  [transliteration] Not counting repeat character in group capture  2015-05-28 19:36:25 -04:00
Al  0177fd4b13  [fix] trie_search using proper length in utf8proc_iterate  2015-05-27 16:08:19 -04:00
Al  ad8e92182c  [phrases] trie I/O using the uint APIs, fixes to trie_get_prefix_result_from_index  2015-05-27 16:06:35 -04:00
Al  897c29ccb8  [fix] transliterate.h  2015-05-27 16:04:18 -04:00
Al  17f88c3adc  [utils] using unsigned ints in file_utils, adding doubles  2015-05-27 16:03:36 -04:00
Al  8ac8f83b7f  [utils] changing signature of utf8proc_iterate_reversed so it takes the same arguments as utf8proc_iterate for function pointer purposes  2015-05-25 15:35:28 -04:00
Al  26ff3292d2  [fix] new script name, prefix result  2015-05-23 21:41:11 -04:00
Al  31cc2bb5d1  [fix] merging repeat codepoints in trie builder  2015-05-22 22:45:23 -04:00
Al  c00ecf6ea8  [fix] minimizing c* into (c|'')+, using empty transition instead of zero-length string  2015-05-22 18:11:54 -04:00
Al  b2d15b29cf  [fix] greek_latin_ungegn => greek-latin-ungegn  2015-05-22 09:52:08 -04:00
Al  27171e068d  [phrases] constant for NULL prefix results  2015-05-22 09:08:07 -04:00
Al  cb14e5eef1  [phrases] trie_get_prefix_from_index takes an optional tail position  2015-05-21 06:16:14 -04:00
Al  91ccdf6f7b  [phrases] trie_get_prefix_* methods return a struct including tail position  2015-05-21 05:38:21 -04:00
Al  395fbcb8b5  [fix] get_prefix on tries searches tail as well  2015-05-21 05:22:44 -04:00
Al  e84f3d93d2  [fix] get_prefix on tries searches tail as well  2015-05-20 20:57:14 -04:00
Al  c9ff3f278f  [transliteration] new transform data file  2015-05-20 14:45:16 -04:00
Al  1fee0a3e35  [phrases] separating get_data_node from tail_match for tries  2015-05-20 13:51:04 -04:00
Al  bfb9aa21a1  [fix] unused var  2015-05-19 18:04:06 -04:00
Al  3d25378456  [transliteration] fixing a few warnings  2015-05-19 18:03:36 -04:00
Al  fdf988cb27  [phrases] adding a public get_data_node method for tries  2015-05-19 18:02:29 -04:00
Al  9d309ca9d3  [fix] moving constant  2015-05-18 14:25:21 -04:00
Al  eecee39904  [fix] giving constant trie node names more specificity  2015-05-18 14:24:39 -04:00
Al  c66f6f0fbe  [transliteration] adding begin set token for regex character sets and fixing off-by-one in concatenated trie keys  2015-05-18 14:00:14 -04:00
Al  3c1e5c0471  [transliteration] new data file with the escaped German transliterations  2015-05-18 13:57:45 -04:00
Al  58571f70cc  [utils] adding a boolean flag on string tree iterators for single path trees  2015-05-18 13:57:11 -04:00
Al  7eaa94d2fb  [transliteration] new data file  2015-05-17 18:31:52 -04:00
Al  c39a19a352  [transliteration] New data file with the Greek/Katakana additions  2015-05-17 17:59:39 -04:00
Al  30db201e8a  [fix] NUM_CHARS => NUM_CODEPOINTS  2015-05-17 13:53:19 -04:00
Al  1348cc8906  [transliteration] Switching the begin/end set chars  2015-05-17 12:02:46 -04:00
Al  f1cfb30209  [transliteration] generated scripts file  2015-05-17 00:00:14 -04:00
Al  b983a83a89  [transliteration] transliteration struct definitions, memory allocation, builder methods and I/O, stubbing transliterate method for the moment  2015-05-16 23:23:25 -04:00
Al  3a74a8c179  [transliteration] script to build transliteration table, trie, C structures, etc. from the rules  2015-05-16 23:22:16 -04:00
Al  65624c8985  [fix] vector_*_pop returns the element  2015-05-16 23:20:28 -04:00
Al  4a67294fbf  [phrases] adding get_prefix methods for tries, remove add_nodes_only, fixing a few things and inlining a few functions  2015-05-16 23:19:59 -04:00
Al  e8fdd4564d  [utils] adding string_tree for listing sets of token alternatives and string_tree_iterator to generate permutations over the strings, needed for transliteration and ambiguous address elements/place names  2015-05-16 23:16:10 -04:00
Al  f151a2232c  [transliteration] new transliteration rules data file  2015-05-16 23:14:47 -04:00
Al  5983cb6af0  [i18n] Adding NUM_SCRIPTS to the end of the scripts enum  2015-05-16 12:19:40 -04:00
Al  8699409f15  [transliteration] resulting data file  2015-05-14 16:34:49 -04:00
Al  2d49369e78  [utils] Adding read/write for 64-bit ints to file_utils  2015-05-13 17:51:03 -04:00
Al  6898f8ecd9  [transliteration] Adding context types back to transliteration rule struct since they don't matter in the actual transliteration table  2015-05-13 16:51:07 -04:00
Al  b777b60e07  [transliteration] new data file  2015-05-13 16:21:16 -04:00
Al  cbe83376f2  [transliteration] Adding new, even smaller, generated data file  2015-05-12 18:58:38 -04:00
Al  0984fb9ea4  [transliteration] new, more compact transliteration data file  2015-05-12 12:13:11 -04:00
Al  2a69488f9b  [fix] for transliteration rules, allowing the parsing of set differences and arbitrarily nested character set expressions, using non-NUL byte for the empty transition. Adding resulting data file.  2015-05-08 17:14:26 -04:00
Al  10ebaf147a  [transliteration] literal ^ and $ escaped  2015-05-01 19:16:36 -04:00
Al  ff851a464c  [fix] escaping curly braces for regex compilation  2015-04-30 13:27:17 -04:00
Al  fa43abd8d9  [transliteration] For ruleset steps in transliteration, the name is just the step number, which can be appended to the trie as part of the key  2015-04-29 14:31:15 -04:00
Al  1c25238af7  [fix] string lengths on the various transliteration rules  2015-04-27 13:51:35 -04:00
Al  1373843b86  [fix] setting last_node in tokenized trie search in the case where a prefix phrase matches but the longer string doesn't.  2015-04-27 01:49:08 -04:00
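Commit e8fdd4564d describes a string_tree that lists sets of token alternatives, along with a string_tree_iterator that generates permutations over those strings. The following is a minimal sketch of that idea, not the code added in that commit. It assumes each token position holds a fixed array of alternative strings and that combinations are enumerated odometer-style, with the last token varying fastest. The names token_alternatives_t, permutation_iter_t, permutation_iter_init and permutation_iter_next are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch, not the project's string_tree API. */
#define MAX_TOKENS 8

typedef struct {
    const char **alternatives[MAX_TOKENS]; /* alternatives[i]: candidate strings for token i */
    size_t counts[MAX_TOKENS];             /* number of candidates for token i */
    size_t num_tokens;
} token_alternatives_t;

typedef struct {
    const token_alternatives_t *tree;
    size_t indices[MAX_TOKENS]; /* currently selected candidate for each token */
    int done;
} permutation_iter_t;

static void permutation_iter_init(permutation_iter_t *it, const token_alternatives_t *tree) {
    assert(tree->num_tokens <= MAX_TOKENS);
    it->tree = tree;
    memset(it->indices, 0, sizeof(it->indices));
    it->done = (tree->num_tokens == 0);
    for (size_t i = 0; i < tree->num_tokens; i++) {
        if (tree->counts[i] == 0) it->done = 1; /* an empty slot yields no combinations */
    }
}

/* Fills buf with the current combination (tokens joined by single spaces)
   and returns 1, or returns 0 when exhausted or when buf is too small. */
static int permutation_iter_next(permutation_iter_t *it, char *buf, size_t buf_size) {
    if (it->done || buf_size == 0) return 0;
    const token_alternatives_t *t = it->tree;
    size_t pos = 0;
    buf[0] = '\0';
    for (size_t i = 0; i < t->num_tokens; i++) {
        const char *s = t->alternatives[i][it->indices[i]];
        int n = snprintf(buf + pos, buf_size - pos, "%s%s", i > 0 ? " " : "", s);
        if (n < 0 || (size_t)n >= buf_size - pos) return 0;
        pos += (size_t)n;
    }
    /* Advance like an odometer: bump the last index and carry leftwards. */
    size_t i = t->num_tokens;
    while (i > 0) {
        i--;
        if (++it->indices[i] < t->counts[i]) return 1;
        it->indices[i] = 0;
    }
    it->done = 1; /* every index wrapped, so the combination just emitted was the last */
    return 1;
}
```

For the input {"123"}, {"Main", "Maine"}, {"St", "Street"}, this iterator yields "123 Main St", "123 Main Street", "123 Maine St" and "123 Maine Street", then reports exhaustion. The number of combinations is the product of the per-token counts, so a real implementation would likely need to bound or prune it for long, highly ambiguous inputs.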
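Commit a278cfd12c removes duplicate keys in the table builder so that, when rules overlap within a step, the first rule takes precedence. The sketch below illustrates only that first-wins policy, under simplifying assumptions: rules are plain key/replacement string pairs held in a fixed-size array and looked up by linear scan. rule_t, rule_table_builder_t, builder_init, builder_add_rule and builder_lookup are hypothetical names, not the project's builder.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of a first-wins rule table; not the project's builder. */
#define MAX_RULES 64

typedef struct {
    const char *key;         /* string to match */
    const char *replacement; /* output for that key */
} rule_t;

typedef struct {
    rule_t rules[MAX_RULES];
    size_t count;
} rule_table_builder_t;

static void builder_init(rule_table_builder_t *b) {
    b->count = 0;
}

/* Adds a rule and returns 1. Returns 0 without modifying the table if the key
   is already present (the earlier rule wins) or if the table is full.
   Strings are stored by pointer, so the caller must keep them alive. */
static int builder_add_rule(rule_table_builder_t *b, const char *key, const char *replacement) {
    assert(b != NULL && key != NULL && replacement != NULL);
    for (size_t i = 0; i < b->count; i++) {
        if (strcmp(b->rules[i].key, key) == 0) return 0; /* duplicate: keep the first rule */
    }
    if (b->count >= MAX_RULES) return 0;
    b->rules[b->count].key = key;
    b->rules[b->count].replacement = replacement;
    b->count++;
    return 1;
}

/* Returns the replacement stored for key, or NULL if there is none. */
static const char *builder_lookup(const rule_table_builder_t *b, const char *key) {
    for (size_t i = 0; i < b->count; i++) {
        if (strcmp(b->rules[i].key, key) == 0) return b->rules[i].replacement;
    }
    return NULL;
}
```

Dropping duplicates at build time means that every later lookup is unambiguous and that rule order in the source data decides the winner. A hash table or trie would replace the linear scan in practice.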