Al
|
cb14e5eef1
|
[phrases] trie_get_prefix_from_index takes an optional tail position
|
2015-05-21 06:16:14 -04:00 |
|
Al
|
91ccdf6f7b
|
[phrases] trie_get_prefix_* methods return a struct including tail position
|
2015-05-21 05:38:21 -04:00 |
|
Al
|
395fbcb8b5
|
[fix] get_prefix on tries searches tail as well
|
2015-05-21 05:22:44 -04:00 |
|
Al
|
e84f3d93d2
|
[fix] get_prefix on tries searches tail as well
|
2015-05-20 20:57:14 -04:00 |
|
Al
|
c9ff3f278f
|
[transliteration] new transform data file
|
2015-05-20 14:45:16 -04:00 |
|
Al
|
1fee0a3e35
|
[phrases] separating get_data_node from tail_match for tries
|
2015-05-20 13:51:04 -04:00 |
|
Al
|
bfb9aa21a1
|
[fix] unused var
|
2015-05-19 18:04:06 -04:00 |
|
Al
|
3d25378456
|
[transliteration] fixing a few warnings
|
2015-05-19 18:03:36 -04:00 |
|
Al
|
fdf988cb27
|
[phrases] adding a public get_data_node method for tries
|
2015-05-19 18:02:29 -04:00 |
|
Al
|
9d309ca9d3
|
[fix] moving constant
|
2015-05-18 14:25:21 -04:00 |
|
Al
|
eecee39904
|
[fix] giving constant trie node names more specificity
|
2015-05-18 14:24:39 -04:00 |
|
Al
|
c66f6f0fbe
|
[transliteration] adding begin set token for regex character sets and fixing off-by-one in concatenated trie keys
|
2015-05-18 14:00:14 -04:00 |
|
Al
|
3c1e5c0471
|
[transliteration] new data file with the escaped German transliterations
|
2015-05-18 13:57:45 -04:00 |
|
Al
|
58571f70cc
|
[utils] adding a boolean flag on string tree iterators for single path trees
|
2015-05-18 13:57:11 -04:00 |
|
Al
|
7eaa94d2fb
|
[transliteration] new data file
|
2015-05-17 18:31:52 -04:00 |
|
Al
|
c39a19a352
|
[transliteration] New data file with the Greek/Katakana additions
|
2015-05-17 17:59:39 -04:00 |
|
Al
|
30db201e8a
|
[fix] NUM_CHARS => NUM_CODEPOINTS
|
2015-05-17 13:53:19 -04:00 |
|
Al
|
1348cc8906
|
[transliteration] Switching the begin/end set chars
|
2015-05-17 12:02:46 -04:00 |
|
Al
|
f1cfb30209
|
[transliteration] generated scripts file
|
2015-05-17 00:00:14 -04:00 |
|
Al
|
b983a83a89
|
[transliteration] transliteration struct definitions, memory allocation, builder methods and I/O, stubbing transliterate method for the moment
|
2015-05-16 23:23:25 -04:00 |
|
Al
|
3a74a8c179
|
[transliteration] script to build transliteration table, trie, C structures, etc. from the rules
|
2015-05-16 23:22:16 -04:00 |
|
Al
|
65624c8985
|
[fix] vector_*_pop returns the element
|
2015-05-16 23:20:28 -04:00 |
|
Al
|
4a67294fbf
|
[phrases] adding get_prefix methods for tries, remove add_nodes_only, fixing a few things and inlining a few functions
|
2015-05-16 23:19:59 -04:00 |
|
Al
|
e8fdd4564d
|
[utils] adding string_tree for listing sets of token alternatives and string_tree_iterator to generate permutations over the strings, needed for transliteration and ambiguous address elements/place names
|
2015-05-16 23:16:10 -04:00 |
|
Al
|
f151a2232c
|
[transliteration] new transliteration rules data file
|
2015-05-16 23:14:47 -04:00 |
|
Al
|
5983cb6af0
|
[i18n] Adding NUM_SCRIPTS to the end of the scripts enum
|
2015-05-16 12:19:40 -04:00 |
|
Al
|
8699409f15
|
[transliteration] resulting data file
|
2015-05-14 16:34:49 -04:00 |
|
Al
|
2d49369e78
|
[utils] Adding read/write for 64-bit ints to file_utils
|
2015-05-13 17:51:03 -04:00 |
|
Al
|
6898f8ecd9
|
[transliteration] Adding context types back to transliteration rule struct since they don't matter in the actual transliteration table
|
2015-05-13 16:51:07 -04:00 |
|
Al
|
b777b60e07
|
[transliteration] new data file
|
2015-05-13 16:21:16 -04:00 |
|
Al
|
cbe83376f2
|
[transliteration] Adding new, even smaller, generated data file
|
2015-05-12 18:58:38 -04:00 |
|
Al
|
0984fb9ea4
|
[transliteration] new, more compact transliteration data file
|
2015-05-12 12:13:11 -04:00 |
|
Al
|
2a69488f9b
|
[fix] for transliteration rules, allowing the parsing of set differences and arbitrarily nested character set expressions, using non-NUL byte for the empty transition. Adding resulting data file.
|
2015-05-08 17:14:26 -04:00 |
|
Al
|
10ebaf147a
|
[transliteration] literal ^ and $ escaped
|
2015-05-01 19:16:36 -04:00 |
|
Al
|
ff851a464c
|
[fix] escaping curly braces for regex compilation
|
2015-04-30 13:27:17 -04:00 |
|
Al
|
fa43abd8d9
|
[transliteration] For ruleset steps in transliteration, the name is just the step number, which can be appended to the trie as part of the key
|
2015-04-29 14:31:15 -04:00 |
|
Al
|
1c25238af7
|
[fix] string lengths on the various transliteration rules
|
2015-04-27 13:51:35 -04:00 |
|
Al
|
1373843b86
|
[fix] setting last_node in tokenized trie search in the case where a prefix phrase matches but the longer string doesn't.
|
2015-04-27 01:49:08 -04:00 |
|
Al
|
b2ba629f95
|
[fix] trie_get methods just return node index rather than data value
|
2015-04-27 01:28:05 -04:00 |
|
Al
|
8fb9bacfa6
|
[phrases] New trie_add_nodes_only method for concatenating strings to the trie, plus boolean return values on trie_add_* APIs
|
2015-04-27 01:01:43 -04:00 |
|
Al
|
8bc77372ef
|
[phrases] exposing trie_add_at_index and trie_get_from_index for more control in the transliteration tries
|
2015-04-26 22:24:02 -04:00 |
|
Al
|
6ebea11640
|
[transliteration] fixing transliteration rules, fixing escape characters, adding sizes to all the strings as they may have null characters
|
2015-04-26 19:47:54 -04:00 |
|
Al
|
ff9b6735f8
|
[transliteration] Adding header + generated C data file for simplified transliteration rules
|
2015-04-25 15:44:36 -04:00 |
|
Al
|
1b33744956
|
[tokenization] Numeric tokens must end in number or letter
|
2015-04-22 14:55:18 -04:00 |
|
Al
|
9c0126a01c
|
[utils] two set types in collections.h
|
2015-04-19 09:32:53 -04:00 |
|
Al
|
908e3dc03c
|
[phrases] trie_search now only takes the original string and the token array. Fixed a bug where certain phrases were being found in string search but not in tokenized search
|
2015-04-19 09:32:20 -04:00 |
|
Al
|
606a669c01
|
[tokenization] breaking dashes or double hyphens break a word while other dashes don't
|
2015-04-17 19:14:42 -04:00 |
|
Al
|
6718182443
|
[tokenization] non-breaking dashes can be mid-word, em-dashes, etc. break words
|
2015-04-17 15:21:22 -04:00 |
|
Al
|
e21873635c
|
[utils] Using token offsets to calculate lengths for contiguous string arrays, inlining a few functions
|
2015-04-15 20:17:03 -04:00 |
|
Al
|
e241c1dfc8
|
[rm] Removing dependency on sds, char_array and cstring_array have similar benefits/functionality with fewer drawbacks
|
2015-04-12 18:07:33 -04:00 |
|