Al | a278cfd12c | [transliteration] Using revisit strings instead of keeping a backtrack count, so logical characters no longer have to be mapped back to the actual string later; removing duplicate keys in the table builder so that, if any rules overlap within a step, the first takes precedence | 2015-05-29 16:54:05 -04:00
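The table-builder change in a278cfd12c keeps the first of any duplicate keys within a step. Below is a minimal C sketch of that first-wins rule. It assumes a hypothetical `rule_t` key/value pair and a hypothetical `dedupe_rules_first_wins` helper; neither comes from the project's source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule: the key to match and its replacement. */
typedef struct {
    const char *key;
    const char *value;
} rule_t;

/* Compacts rules[0..n) in place so each key appears once, keeping the FIRST
 * occurrence; later rules with an identical key are dropped. Returns the new
 * count. Quadratic, which is fine for a sketch. */
static size_t dedupe_rules_first_wins(rule_t *rules, size_t n) {
    assert(rules != NULL || n == 0);
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        int seen = 0;
        for (size_t j = 0; j < out; j++) {
            if (strcmp(rules[j].key, rules[i].key) == 0) {
                seen = 1;
                break;
            }
        }
        if (!seen) rules[out++] = rules[i];   /* relative order is preserved */
    }
    return out;
}
```

Given the rules (ae, a1), (oe, o1), (ae, a2), the result keeps (ae, a1) and (oe, o1), so the earlier rule wins.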
Al | a9d5b91ac0 | [transliteration] Not counting repeat character in group capture | 2015-05-28 19:36:25 -04:00
Al | 0177fd4b13 | [fix] trie_search using proper length in utf8proc_iterate | 2015-05-27 16:08:19 -04:00
Al | ad8e92182c | [phrases] trie I/O using the uint APIs, fixes to trie_get_prefix_result_from_index | 2015-05-27 16:06:35 -04:00
Al | 897c29ccb8 | [fix] transliterate.h | 2015-05-27 16:04:18 -04:00
Al | 17f88c3adc | [utils] using unsigned ints in file_utils, adding doubles | 2015-05-27 16:03:36 -04:00
Al | 8ac8f83b7f | [utils] changing signature of utf8proc_iterate_reversed so it takes the same arguments as utf8proc_iterate for function pointer purposes | 2015-05-25 15:35:28 -04:00
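Commit 8ac8f83b7f gives the reversed iterator the same arguments as the forward one so either can sit behind one function pointer. The sketch below illustrates that design with a hand-rolled decoder. `utf8_iter_fn`, `utf8_iterate_forward`, `utf8_iterate_reversed` and `count_codepoints` are hypothetical names modeled on the `(str, len, codepoint_out)` shape of `utf8proc_iterate`; they are not utf8proc or project code, and the decoder skips overlong and surrogate checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared signature: (bytes, byte length, codepoint out).
 * Returns bytes consumed, 0 at end of input, or -1 on malformed UTF-8. */
typedef ptrdiff_t (*utf8_iter_fn)(const uint8_t *str, ptrdiff_t len, int32_t *dst);

/* Decodes the FIRST codepoint of str[0..len). No overlong/surrogate checks. */
static ptrdiff_t utf8_iterate_forward(const uint8_t *str, ptrdiff_t len, int32_t *dst) {
    assert(dst != NULL);
    if (len <= 0) return 0;
    uint8_t b = str[0];
    ptrdiff_t n;
    int32_t cp;
    if (b < 0x80) { *dst = b; return 1; }
    else if ((b & 0xE0) == 0xC0) { n = 2; cp = b & 0x1F; }
    else if ((b & 0xF0) == 0xE0) { n = 3; cp = b & 0x0F; }
    else if ((b & 0xF8) == 0xF0) { n = 4; cp = b & 0x07; }
    else return -1;                       /* continuation or invalid lead byte */
    if (n > len) return -1;               /* truncated sequence */
    for (ptrdiff_t i = 1; i < n; i++) {
        if ((str[i] & 0xC0) != 0x80) return -1;
        cp = (cp << 6) | (str[i] & 0x3F);
    }
    *dst = cp;
    return n;
}

/* Same arguments, but decodes the LAST codepoint of str[0..len). */
static ptrdiff_t utf8_iterate_reversed(const uint8_t *str, ptrdiff_t len, int32_t *dst) {
    if (len <= 0) return 0;
    ptrdiff_t start = len - 1;
    /* Step back over up to three continuation bytes (10xxxxxx). */
    while (start > 0 && len - start < 4 && (str[start] & 0xC0) == 0x80) start--;
    ptrdiff_t n = utf8_iterate_forward(str + start, len - start, dst);
    return (n == len - start) ? n : -1;   /* must consume exactly the tail */
}

/* One loop for both directions, driven through the shared pointer type. */
static ptrdiff_t count_codepoints(utf8_iter_fn step, int reverse,
                                  const uint8_t *str, ptrdiff_t len) {
    ptrdiff_t count = 0;
    int32_t cp;
    while (len > 0) {
        ptrdiff_t n = step(str, len, &cp);
        if (n <= 0) return -1;
        if (!reverse) str += n;           /* forward: advance past the front */
        len -= n;                         /* both: shrink the remaining window */
        count++;
    }
    return count;
}
```

Because both functions match `utf8_iter_fn`, a caller can choose the direction at runtime and hand it to the same loop; only the window update differs.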
Al | 26ff3292d2 | [fix] new script name, prefix result | 2015-05-23 21:41:11 -04:00
Al | 31cc2bb5d1 | [fix] merging repeat codepoints in trie builder | 2015-05-22 22:45:23 -04:00
Al | c00ecf6ea8 | [fix] minimizing c* into (c|'')+, using empty transition instead of zero-length string | 2015-05-22 18:11:54 -04:00
Al | b2d15b29cf | [fix] greek_latin_ungegn => greek-latin-ungegn | 2015-05-22 09:52:08 -04:00
Al | 27171e068d | [phrases] constant for NULL prefix results | 2015-05-22 09:08:07 -04:00
Al | cb14e5eef1 | [phrases] trie_get_prefix_from_index takes an optional tail position | 2015-05-21 06:16:14 -04:00
Al | 91ccdf6f7b | [phrases] trie_get_prefix_* methods return a struct including tail position | 2015-05-21 05:38:21 -04:00
Al | 395fbcb8b5 | [fix] get_prefix on tries searches tail as well | 2015-05-21 05:22:44 -04:00
Al | e84f3d93d2 | [fix] get_prefix on tries searches tail as well | 2015-05-20 20:57:14 -04:00
Al | c9ff3f278f | [transliteration] new transform data file | 2015-05-20 14:45:16 -04:00
Al | 1fee0a3e35 | [phrases] separating get_data_node from tail_match for tries | 2015-05-20 13:51:04 -04:00
Al | bfb9aa21a1 | [fix] unused var | 2015-05-19 18:04:06 -04:00
Al | 3d25378456 | [transliteration] fixing a few warnings | 2015-05-19 18:03:36 -04:00
Al | fdf988cb27 | [phrases] adding a public get_data_node method for tries | 2015-05-19 18:02:29 -04:00
Al | 9d309ca9d3 | [fix] moving constant | 2015-05-18 14:25:21 -04:00
Al | eecee39904 | [fix] giving constant trie node names more specificity | 2015-05-18 14:24:39 -04:00
Al | c66f6f0fbe | [transliteration] adding begin set token for regex character sets and fixing off-by-one in concatenated trie keys | 2015-05-18 14:00:14 -04:00
Al | 3c1e5c0471 | [transliteration] new data file with the escaped German transliterations | 2015-05-18 13:57:45 -04:00
Al | 58571f70cc | [utils] adding a boolean flag on string tree iterators for single path trees | 2015-05-18 13:57:11 -04:00
Al | 7eaa94d2fb | [transliteration] new data file | 2015-05-17 18:31:52 -04:00
Al | c39a19a352 | [transliteration] New data file with the Greek/Katakana additions | 2015-05-17 17:59:39 -04:00
Al | 30db201e8a | [fix] NUM_CHARS => NUM_CODEPOINTS | 2015-05-17 13:53:19 -04:00
Al | 1348cc8906 | [transliteration] Switching the begin/end set chars | 2015-05-17 12:02:46 -04:00
Al | f1cfb30209 | [transliteration] generated scripts file | 2015-05-17 00:00:14 -04:00
Al | b983a83a89 | [transliteration] transliteration struct definitions, memory allocation, builder methods and I/O, stubbing transliterate method for the moment | 2015-05-16 23:23:25 -04:00
Al | 3a74a8c179 | [transliteration] script to build transliteration table, trie, C structures, etc. from the rules | 2015-05-16 23:22:16 -04:00
Al | 65624c8985 | [fix] vector_*_pop returns the element | 2015-05-16 23:20:28 -04:00
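Commit 65624c8985 makes `vector_*_pop` return the removed element. A minimal sketch of that convention, using a hypothetical `int_vector` type and hypothetical `int_vector_*` functions rather than the project's own vector code:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical minimal growable vector of ints. */
typedef struct {
    int *a;
    size_t n;   /* number of elements in use */
    size_t m;   /* allocated capacity */
} int_vector;

/* Appends value, doubling capacity as needed. Returns 0 on allocation failure. */
static int int_vector_push(int_vector *v, int value) {
    if (v->n == v->m) {
        size_t cap = v->m ? v->m * 2 : 4;
        int *p = realloc(v->a, cap * sizeof *p);
        if (p == NULL) return 0;   /* vector left unchanged */
        v->a = p;
        v->m = cap;
    }
    v->a[v->n++] = value;
    return 1;
}

/* Removes and RETURNS the last element, so callers can write
 * x = int_vector_pop(v) instead of reading a[n - 1] and then popping.
 * Precondition: the vector is non-empty. */
static int int_vector_pop(int_vector *v) {
    assert(v->n > 0);
    return v->a[--v->n];
}

static void int_vector_destroy(int_vector *v) {
    free(v->a);
    v->a = NULL;
    v->n = v->m = 0;
}
```

Returning the element makes stack-style loops such as `while (v.n > 0) use(int_vector_pop(&v));` read naturally.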
Al | 4a67294fbf | [phrases] adding get_prefix methods for tries, removing add_nodes_only, fixing a few things and inlining a few functions | 2015-05-16 23:19:59 -04:00
Al | e8fdd4564d | [utils] adding string_tree for listing sets of token alternatives and string_tree_iterator to generate permutations over the strings, needed for transliteration and ambiguous address elements/place names | 2015-05-16 23:16:10 -04:00
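The string_tree commit describes generating permutations over per-token alternatives, which amounts to picking one alternative per token (a Cartesian product). Below is an odometer-style sketch of that iteration. `string_tree`, `string_tree_iter`, `MAX_TOKENS` and the `iter_*` functions are hypothetical, and the tree is flattened into per-token arrays for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_TOKENS 16

/* Hypothetical flattened tree: token i has num_alts[i] alternative strings. */
typedef struct {
    const char ***alts;        /* alts[i][j] = alternative j for token i */
    const size_t *num_alts;
    size_t num_tokens;
} string_tree;

/* Odometer over the alternatives: idx[i] is the choice for token i. */
typedef struct {
    const string_tree *tree;
    size_t idx[MAX_TOKENS];
    int done;
} string_tree_iter;

static void iter_init(string_tree_iter *it, const string_tree *tree) {
    assert(tree->num_tokens <= MAX_TOKENS);
    it->tree = tree;
    memset(it->idx, 0, sizeof it->idx);
    it->done = 0;
    for (size_t i = 0; i < tree->num_tokens; i++)
        if (tree->num_alts[i] == 0) it->done = 1;   /* an empty slot yields nothing */
}

/* Writes the current combination, space-separated, into buf (truncating). */
static void iter_current(const string_tree_iter *it, char *buf, size_t size) {
    assert(size > 0);
    size_t used = 0;
    buf[0] = '\0';
    for (size_t i = 0; i < it->tree->num_tokens; i++) {
        int w = snprintf(buf + used, size - used, "%s%s",
                         i ? " " : "", it->tree->alts[i][it->idx[i]]);
        if (w < 0 || (size_t)w >= size - used) return;  /* out of room */
        used += (size_t)w;
    }
}

/* Advances the last token first and carries left, like an odometer. */
static void iter_next(string_tree_iter *it) {
    size_t i = it->tree->num_tokens;
    while (i > 0) {
        i--;
        if (++it->idx[i] < it->tree->num_alts[i]) return;
        it->idx[i] = 0;
    }
    it->done = 1;   /* carried past token 0: every combination was visited */
}
```

With the alternatives `st`/`saint`, `john`, `st`/`street`, the iterator visits four strings, from `st john st` through `saint john street`.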
Al | f151a2232c | [transliteration] new transliteration rules data file | 2015-05-16 23:14:47 -04:00
Al | 5983cb6af0 | [i18n] Adding NUM_SCRIPTS to the end of the scripts enum | 2015-05-16 12:19:40 -04:00
Al | 8699409f15 | [transliteration] resulting data file | 2015-05-14 16:34:49 -04:00
Al | 2d49369e78 | [utils] Adding read/write for 64-bit ints to file_utils | 2015-05-13 17:51:03 -04:00
Al | 6898f8ecd9 | [transliteration] Adding context types back to transliteration rule struct since they don't matter in the actual transliteration table | 2015-05-13 16:51:07 -04:00
Al | b777b60e07 | [transliteration] new data file | 2015-05-13 16:21:16 -04:00
Al | cbe83376f2 | [transliteration] Adding new, even smaller, generated data file | 2015-05-12 18:58:38 -04:00
Al | 0984fb9ea4 | [transliteration] new, more compact transliteration data file | 2015-05-12 12:13:11 -04:00
Al | 2a69488f9b | [fix] for transliteration rules, allowing the parsing of set differences and arbitrarily nested character set expressions, using non-NUL byte for the empty transition. Adding resulting data file. | 2015-05-08 17:14:26 -04:00
Al | 10ebaf147a | [transliteration] literal ^ and $ escaped | 2015-05-01 19:16:36 -04:00
Al | ff851a464c | [fix] escaping curly braces for regex compilation | 2015-04-30 13:27:17 -04:00
Al | fa43abd8d9 | [transliteration] For ruleset steps in transliteration, the name is just the step number, which can be appended to the trie as part of the key | 2015-04-29 14:31:15 -04:00
Al | 1c25238af7 | [fix] string lengths on the various transliteration rules | 2015-04-27 13:51:35 -04:00
Al | 1373843b86 | [fix] setting last_node in tokenized trie search in the case where a prefix phrase matches but the longer string doesn't. | 2015-04-27 01:49:08 -04:00
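The last_node fix in 1373843b86 covers a search where a shorter phrase matches but the walk toward a longer one fails. Below is a small sketch of that fallback over a hypothetical token-level trie. `token_trie`, `trie_init`, `trie_add` and `longest_phrase` are illustrative names, and this simple array trie is not the project's trie.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NODES 64
#define MAX_CHILDREN 8

/* Hypothetical token-level trie node: each edge is labeled with a whole token. */
typedef struct {
    const char *edge[MAX_CHILDREN];   /* token labels (borrowed from the caller) */
    int child[MAX_CHILDREN];          /* node index reached by each edge */
    int num_children;
    int is_phrase;                    /* 1 if the path to this node is a full phrase */
} token_trie_node;

typedef struct {
    token_trie_node nodes[MAX_NODES];
    int num_nodes;                    /* node 0 is the root */
} token_trie;

static void trie_init(token_trie *t) {
    memset(t, 0, sizeof *t);
    t->num_nodes = 1;
}

static int trie_child(const token_trie *t, int node, const char *tok) {
    const token_trie_node *n = &t->nodes[node];
    for (int i = 0; i < n->num_children; i++)
        if (strcmp(n->edge[i], tok) == 0) return n->child[i];
    return -1;
}

static void trie_add(token_trie *t, const char *const *toks, size_t len) {
    int node = 0;
    for (size_t i = 0; i < len; i++) {
        int next = trie_child(t, node, toks[i]);
        if (next < 0) {
            token_trie_node *n = &t->nodes[node];
            assert(n->num_children < MAX_CHILDREN && t->num_nodes < MAX_NODES);
            next = t->num_nodes++;
            n->edge[n->num_children] = toks[i];
            n->child[n->num_children++] = next;
        }
        node = next;
    }
    t->nodes[node].is_phrase = 1;
}

/* Returns the token count of the longest phrase starting at toks[0], or 0.
 * last_len remembers the most recent complete phrase, so when the walk toward
 * a longer phrase dies, the search falls back to it instead of reporting none. */
static size_t longest_phrase(const token_trie *t, const char *const *toks, size_t len) {
    int node = 0;
    size_t last_len = 0;
    for (size_t i = 0; i < len; i++) {
        node = trie_child(t, node, toks[i]);
        if (node < 0) break;                       /* longer phrase doesn't match */
        if (t->nodes[node].is_phrase) last_len = i + 1;
    }
    return last_len;
}
```

With the phrases `new york` and `new york city hall`, the query `new york city center` completes `new york`, keeps walking through `city`, finds no edge for `center`, and so returns the remembered two-token match.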