Al
|
27171e068d
|
[phrases] constant for NULL prefix results
|
2015-05-22 09:08:07 -04:00 |
|
Al
|
cb14e5eef1
|
[phrases] trie_get_prefix_from_index takes an optional tail position
|
2015-05-21 06:16:14 -04:00 |
|
Al
|
91ccdf6f7b
|
[phrases] trie_get_prefix_* methods return a struct including tail position
|
2015-05-21 05:38:21 -04:00 |
|
Al
|
395fbcb8b5
|
[fix] get_prefix on tries searches tail as well
|
2015-05-21 05:22:44 -04:00 |
|
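The trie commits above (27171e068d, 91ccdf6f7b, 395fbcb8b5) describe prefix lookups that may end partway through a node's tail, returning a struct that carries the tail position, plus a constant for the NULL result. The following is a minimal Python sketch of that idea. `TailTrie`, `PrefixResult` and `NULL_PREFIX_RESULT` are hypothetical names, and the toy dictionary-based layout stands in for the project's C trie. It is not the actual implementation.

```python
from typing import Dict, List, NamedTuple


class PrefixResult(NamedTuple):
    node_id: int   # trie node reached after consuming the key
    tail_pos: int  # how many of that node's tail characters were consumed (0 if none)


# Hypothetical sentinel for "key is not a prefix of any stored key".
NULL_PREFIX_RESULT = PrefixResult(-1, 0)


class TailTrie:
    """Toy trie that collapses unbranched suffixes into per-node tail strings."""

    def __init__(self, keys: List[str]) -> None:
        self.children: List[Dict[str, int]] = [{}]
        terminal = [False]
        for key in keys:
            node = 0
            for ch in key:
                nxt = self.children[node].get(ch)
                if nxt is None:
                    nxt = len(self.children)
                    self.children.append({})
                    terminal.append(False)
                    self.children[node][ch] = nxt
                node = nxt
            terminal[node] = True
        self.tails = [""] * len(self.children)
        self._compress(0, terminal)

    def _compress(self, node: int, terminal: List[bool]) -> None:
        # Follow a chain of non-terminal nodes that each have exactly one child.
        path, cur = [], node
        while not terminal[cur] and len(self.children[cur]) == 1:
            ch, nxt = next(iter(self.children[cur].items()))
            path.append(ch)
            cur = nxt
        if path and not self.children[cur]:
            # The chain ends in a leaf: store the whole suffix as this node's tail.
            self.tails[node] = "".join(path)
            self.children[node] = {}
            return
        for child in self.children[node].values():
            self._compress(child, terminal)

    def get_prefix(self, key: str) -> PrefixResult:
        node = 0
        for i, ch in enumerate(key):
            tail = self.tails[node]
            if tail:
                # The rest of the key must match the start of the tail.
                rest = key[i:]
                if tail.startswith(rest):
                    return PrefixResult(node, len(rest))
                return NULL_PREFIX_RESULT
            nxt = self.children[node].get(ch)
            if nxt is None:
                return NULL_PREFIX_RESULT
            node = nxt
        return PrefixResult(node, 0)
```

With the keys "cat", "car" and "dog", the "dog" branch collapses into a tail "og", so `get_prefix("do")` lands on that node with `tail_pos == 1` instead of failing.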
Al
|
e84f3d93d2
|
[fix] get_prefix on tries searches tail as well
|
2015-05-20 20:57:14 -04:00 |
|
Al
|
c9ff3f278f
|
[transliteration] new transform data file
|
2015-05-20 14:45:16 -04:00 |
|
Al
|
d65f7747f0
|
[transliteration] Adding html escapes as the first step in the Latin-ASCII transformation
|
2015-05-20 14:44:55 -04:00 |
|
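Commit d65f7747f0 decodes HTML escapes before the rest of the Latin-ASCII transform runs. A minimal Python sketch of that ordering follows, using the standard `html` and `unicodedata` modules. The function name `latin_ascii` is hypothetical, and the second step is a deliberate simplification (strip combining marks, drop non-ASCII); it is not the CLDR Latin-ASCII rule set.

```python
import html
import unicodedata


def latin_ascii(text: str) -> str:
    # Step 1: decode HTML escapes such as "&eacute;" or "&#233;" into characters,
    # so the later steps see real letters rather than entity syntax.
    text = html.unescape(text)
    # Step 2 (simplified stand-in): decompose, drop combining marks, keep ASCII only.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.encode("ascii", "ignore").decode("ascii")
```

Running the unescape first matters: without it, "&eacute;" would pass through as the literal characters `&`, `e`, `a`, ... and never be transliterated.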
Al
|
1fee0a3e35
|
[phrases] separating get_data_node from tail_match for tries
|
2015-05-20 13:51:04 -04:00 |
|
Al
|
bfb9aa21a1
|
[fix] unused var
|
2015-05-19 18:04:06 -04:00 |
|
Al
|
3d25378456
|
[transliteration] fixing a few warnings
|
2015-05-19 18:03:36 -04:00 |
|
Al
|
fdf988cb27
|
[phrases] adding a public get_data_node method for tries
|
2015-05-19 18:02:29 -04:00 |
|
Al
|
9d309ca9d3
|
[fix] moving constant
|
2015-05-18 14:25:21 -04:00 |
|
Al
|
eecee39904
|
[fix] giving constant trie node names more specificity
|
2015-05-18 14:24:39 -04:00 |
|
Al
|
c66f6f0fbe
|
[transliteration] adding begin set token for regex character sets and fixing off-by-one in concatenated trie keys
|
2015-05-18 14:00:14 -04:00 |
|
Al
|
3c1e5c0471
|
[transliteration] new data file with the escaped German transliterations
|
2015-05-18 13:57:45 -04:00 |
|
Al
|
58571f70cc
|
[utils] adding a boolean flag on string tree iterators for single path trees
|
2015-05-18 13:57:11 -04:00 |
|
Al
|
4694371cdc
|
[fix] unicode escaping the German transliterations
|
2015-05-18 13:55:57 -04:00 |
|
Al
|
7eaa94d2fb
|
[transliteration] new data file
|
2015-05-17 18:31:52 -04:00 |
|
Al
|
e25f039ee4
|
[transliteration] Escaped single quotes in rules + ignoring rules with codepoints > \uffff
|
2015-05-17 18:31:35 -04:00 |
|
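Commit e25f039ee4 skips rules containing code points above \uffff, i.e. outside the Basic Multilingual Plane. A minimal Python 3 sketch of such a filter follows; `exceeds_bmp` and `filter_rules` are hypothetical names. It assumes Python 3 strings, where `ord` returns the full code point; on narrow Python 2 builds astral characters appear as surrogate pairs and this check would not catch them.

```python
from typing import List


def exceeds_bmp(rule: str) -> bool:
    """True if any code point in the rule lies above U+FFFF."""
    return any(ord(ch) > 0xFFFF for ch in rule)


def filter_rules(rules: List[str]) -> List[str]:
    # Keep only rules whose characters each fit in a single UTF-16 code unit.
    return [rule for rule in rules if not exceeds_bmp(rule)]
```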
Al
|
c39a19a352
|
[transliteration] New data file with the Greek/Katakana additions
|
2015-05-17 17:59:39 -04:00 |
|
Al
|
d72348d47e
|
[transliteration] Using a restricted set of diacritical marks relevant to Greek; variants stand in for transliterator dependencies, e.g. use Katakana-Latin-BGN if Katakana-Latin cannot be found
|
2015-05-17 17:42:37 -04:00 |
|
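Commit d72348d47e lets a variant stand in for a missing dependency, e.g. Katakana-Latin-BGN when Katakana-Latin is absent. A minimal Python sketch of that lookup follows. `resolve_transform` and `VARIANT_SUFFIXES` are hypothetical, and treating variants as name suffixes is an assumption drawn only from the example in the commit message.

```python
from typing import Optional, Set

# Hypothetical variant suffixes, tried in order when a dependency is missing.
VARIANT_SUFFIXES = ("-BGN",)


def resolve_transform(name: str, available: Set[str]) -> Optional[str]:
    # Prefer the exact transform if it exists.
    if name in available:
        return name
    # Otherwise fall back to the first known variant that exists.
    for suffix in VARIANT_SUFFIXES:
        candidate = name + suffix
        if candidate in available:
            return candidate
    return None
```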
Al
|
30db201e8a
|
[fix] NUM_CHARS => NUM_CODEPOINTS
|
2015-05-17 13:53:19 -04:00 |
|
Al
|
1348cc8906
|
[transliteration] Switching the begin/end set chars
|
2015-05-17 12:02:46 -04:00 |
|
Al
|
f1cfb30209
|
[transliteration] generated scripts file
|
2015-05-17 00:00:14 -04:00 |
|
Al
|
b983a83a89
|
[transliteration] transliteration struct definitions, memory allocation, builder methods and I/O, stubbing transliterate method for the moment
|
2015-05-16 23:23:25 -04:00 |
|
Al
|
3a74a8c179
|
[transliteration] script to build transliteration table, trie, C structures, etc. from the rules
|
2015-05-16 23:22:16 -04:00 |
|
Al
|
65624c8985
|
[fix] vector_*_pop returns the element
|
2015-05-16 23:20:28 -04:00 |
|
Al
|
4a67294fbf
|
[phrases] adding get_prefix methods for tries, remove add_nodes_only, fixing a few things and inlining a few functions
|
2015-05-16 23:19:59 -04:00 |
|
Al
|
e8fdd4564d
|
[utils] adding string_tree for listing sets of token alternatives and string_tree_iterator to generate permutations over the strings, needed for transliteration and ambiguous address elements/place names
|
2015-05-16 23:16:10 -04:00 |
|
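Commits e8fdd4564d and 58571f70cc describe a string tree holding token alternatives, an iterator over its permutations, and a flag for trees with a single path. The idea can be sketched in Python as a list of alternative lists expanded with `itertools.product`. `StringTree`, `permutations` and `is_single_path` are hypothetical names, and this flat layout is an assumption, not the project's C structure.

```python
from itertools import product
from typing import Iterator, List

# Each position holds one or more alternative strings (hypothetical layout).
StringTree = List[List[str]]


def is_single_path(tree: StringTree) -> bool:
    # Only one permutation exists when every position has exactly one alternative.
    return all(len(alternatives) == 1 for alternatives in tree)


def permutations(tree: StringTree, sep: str = "") -> Iterator[str]:
    # itertools.product enumerates combinations like an odometer, last position fastest.
    for combo in product(*tree):
        yield sep.join(combo)
```

A single-path flag lets a caller skip the iterator entirely and take the one string directly, which is the common case for unambiguous input.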
Al
|
f151a2232c
|
[transliteration] new transliteration rules data file
|
2015-05-16 23:14:47 -04:00 |
|
Al
|
99115fa53c
|
[transliteration] converting one of the more complicated and frequently used rules to its utf8proc equivalent, adding better support for escaped unicode characters and set differences, generating a header file indicating which unicode script/language pairs warrant various transliterators.
|
2015-05-16 23:13:01 -04:00 |
|
Al
|
5983cb6af0
|
[i18n] Adding NUM_SCRIPTS to the end of the scripts enum
|
2015-05-16 12:19:40 -04:00 |
|
Al
|
8699409f15
|
[transliteration] resulting data file
|
2015-05-14 16:34:49 -04:00 |
|
Al
|
1f3ac0c3f9
|
[transliteration] using a proper lexer on the entire rule to correct some parses, allowing bracketed multiple characters in sets, fixing optionals
|
2015-05-14 16:34:03 -04:00 |
|
Al
|
2d49369e78
|
[utils] Adding read/write for 64-bit ints to file_utils
|
2015-05-13 17:51:03 -04:00 |
|
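Commit 2d49369e78 adds 64-bit integer read/write helpers to file_utils. A minimal Python sketch of the same round trip follows, using `struct`. The function names are hypothetical, and big-endian byte order is an assumption; the project's actual on-disk format may differ.

```python
import io
import struct

# Assumption: unsigned 64-bit, big-endian ("network") byte order.
_UINT64 = struct.Struct(">Q")


def file_write_uint64(f, value: int) -> None:
    f.write(_UINT64.pack(value))


def file_read_uint64(f) -> int:
    data = f.read(_UINT64.size)
    if len(data) != _UINT64.size:
        raise EOFError("truncated uint64")
    return _UINT64.unpack(data)[0]


if __name__ == "__main__":
    buf = io.BytesIO()
    file_write_uint64(buf, 2**40 + 5)
    buf.seek(0)
    print(file_read_uint64(buf))
```

Fixing the byte order explicitly keeps files portable between machines of different endianness.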
Al
|
6898f8ecd9
|
[transliteration] Adding context types back to the transliteration rule struct since they don't matter in the actual transliteration table
|
2015-05-13 16:51:07 -04:00 |
|
Al
|
b777b60e07
|
[transliteration] new data file
|
2015-05-13 16:21:16 -04:00 |
|
Al
|
304dc9525a
|
[transliteration] fixing variable assignments, literal wide characters (for narrow Python builds), ignoring rules related to spaced Han
|
2015-05-13 16:20:52 -04:00 |
|
Al
|
cbe83376f2
|
[transliteration] Adding new, even smaller, generated data file
|
2015-05-12 18:58:38 -04:00 |
|
Al
|
5bbf71ccbb
|
[transliteration] Using breadth-first search for tracking dependencies between transforms, removing Han-Spacedhan since our tokenizer does the equivalent already
|
2015-05-12 18:57:57 -04:00 |
|
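Commit 5bbf71ccbb tracks dependencies between transforms with a breadth-first search. A minimal Python sketch follows. `transform_dependencies` and the dictionary-of-lists dependency format are hypothetical; the search uses `collections.deque` as the queue and a seen-set so shared or cyclic dependencies are visited once.

```python
from collections import deque
from typing import Dict, List


def transform_dependencies(start: str, deps: Dict[str, List[str]]) -> List[str]:
    """Breadth-first walk returning every transform reachable from `start`."""
    seen = {start}
    order: List[str] = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for dep in deps.get(current, []):
            # Visit each transform once, even with shared (diamond) or cyclic deps.
            if dep not in seen:
                seen.add(dep)
                order.append(dep)
                queue.append(dep)
    return order
```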
Al
|
b55db5fcda
|
[fix] usage text
|
2015-05-12 12:15:51 -04:00 |
|
Al
|
d5f9d8a29a
|
[mv] unicode_scripts => unicode_properties
|
2015-05-12 12:14:59 -04:00 |
|
Al
|
0984fb9ea4
|
[transliteration] new, more compact transliteration data file
|
2015-05-12 12:13:11 -04:00 |
|
Al
|
ff0e7cb9e1
|
[i18n] downloading several files from the Unicode Character Database
|
2015-05-12 12:12:17 -04:00 |
|
Al
|
3814af52ec
|
[transliteration] Python script now implements the full TR-35 spec, including filter rules, which cuts down significantly on the size of the data file and complexity of generating the trie
|
2015-05-12 12:10:15 -04:00 |
|
Al
|
fe044cebef
|
[transliteration] char set mapping for some of the more complicated sets found in CLDR
|
2015-05-10 18:34:53 -04:00 |
|
Al
|
2a69488f9b
|
[fix] for transliteration rules, allowing the parsing of set differences and arbitrarily nested character set expressions, using non-NUL byte for the empty transition. Adding resulting data file.
|
2015-05-08 17:14:26 -04:00 |
|
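Commit 2a69488f9b parses set differences and arbitrarily nested character sets in the transliteration rules. A minimal recursive-descent sketch in Python follows, for a simplified UnicodeSet-like syntax such as `[[a-z]-[aeiou]]`. `parse_char_set` is a hypothetical name, and the sketch assumes no escapes and no property classes such as `[:Latin:]`, both of which the real rules use.

```python
from typing import Set, Tuple


def parse_char_set(s: str, i: int = 0) -> Tuple[Set[str], int]:
    """Parse a bracketed set starting at s[i] == '['; return (chars, index after ']')."""
    if s[i] != "[":
        raise ValueError("expected '[' at position %d" % i)
    i += 1
    result: Set[str] = set()
    while s[i] != "]":
        if s[i] == "[":
            # Nested set: union its members into the current set.
            inner, i = parse_char_set(s, i)
            result |= inner
        elif s[i] == "-" and s[i + 1] == "[":
            # Set difference: remove the members of the following nested set.
            inner, i = parse_char_set(s, i + 1)
            result -= inner
        elif i + 2 < len(s) and s[i + 1] == "-" and s[i + 2] not in "[]":
            # Character range such as a-z.
            result |= {chr(c) for c in range(ord(s[i]), ord(s[i + 2]) + 1)}
            i += 3
        else:
            # Any other character is a literal member.
            result.add(s[i])
            i += 1
    return result, i + 1
```

Recursion on `[` is what makes arbitrary nesting work: each nested set is parsed to completion and then unioned or subtracted.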
Al
|
10ebaf147a
|
[transliteration] literal ^ and $ escaped
|
2015-05-01 19:16:36 -04:00 |
|
Al
|
ff851a464c
|
[fix] escaping curly braces for regex compilation
|
2015-04-30 13:27:17 -04:00 |
|
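Commits 10ebaf147a and ff851a464c escape literal `^`, `$` and curly braces so that rule text compiles as a regex without being read as anchors or quantifiers. A minimal Python sketch follows; `escape_literals` is a hypothetical helper limited to those four characters. Python's own `re.escape` also escapes them, along with other metacharacters.

```python
import re

# The characters named in the commits: anchors ^ $ and quantifier braces { }.
_SPECIAL = {"^": r"\^", "$": r"\$", "{": r"\{", "}": r"\}"}


def escape_literals(text: str) -> str:
    return "".join(_SPECIAL.get(ch, ch) for ch in text)


if __name__ == "__main__":
    # Unescaped, "x{2}" is a quantifier matching "xx"; escaped, it matches itself.
    print(re.fullmatch("x{2}", "xx") is not None)
    print(re.fullmatch(escape_literals("x{2}"), "x{2}") is not None)
```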
Al
|
fa43abd8d9
|
[transliteration] For ruleset steps in transliteration, the name is just the step number, which can be appended to the trie as part of the key
|
2015-04-29 14:31:15 -04:00 |
|