1196 Commits

Author SHA1 Message Date
Al
3400a59e1c [numex] adding a NUMEX_NULL_RULE 2015-06-04 17:21:16 -04:00
Al
95a4bb8e7c [numex] teardown in numex table builder 2015-06-04 17:20:26 -04:00
Al
114b728f96 [fix] var 2015-06-04 17:18:05 -04:00
Al
528dd05983 [numex] Adding utf8_is_number_or_letter 2015-06-04 14:49:12 -04:00
Al
ca746304e3 [utils] Adding a few methods to string_utils for finding utf8proc category groups 2015-06-04 13:20:14 -04:00
Al
eac7a296ba [numex] New numex data file including top 15 languages in OSM 2015-06-04 11:55:07 -04:00
Al
d98c535c52 [numex] Adding ordinal indicator to enum 2015-06-04 11:25:24 -04:00
Al
3cb8b2d297 [numex] trie builder adding a separate suffix-based namespace for looking up ordinal indicators 2015-06-04 03:17:03 -04:00
Al
7d3ef39463 [numex] struct/method changes for new ordinal indicators 2015-06-04 03:15:51 -04:00
Al
65abde908b [numex] New numex data file 2015-06-04 03:10:00 -04:00
Al
3d95875a11 [phrases] trie_add_len 2015-06-04 02:41:48 -04:00
Al
fa784677f2 [phrases] trie_add_suffix_at_index method 2015-06-04 02:30:53 -04:00
Al
9bdf118423 [transliteration] Fix to transliteration in cases where the pre/post context doesn't match and we fall back to the no-context match 2015-06-03 22:58:29 -04:00
Al
48d2ca31c4 [transliteration] New generated data file with the German/Scandinavian additions 2015-06-03 22:56:50 -04:00
Al
760714a234 [fix] warnings in transliterate.c 2015-06-03 19:29:35 -04:00
Al
7dcb4bf6f4 [numex] correct signature 2015-06-02 16:08:25 -04:00
Al
93d65d0186 [numex] numex table builder, fix to constant 2015-06-02 13:57:34 -04:00
Al
a44997c71c [fix] new generated numex data file 2015-06-02 13:45:06 -04:00
Al
2d5d854754 [fix] compilation/warnings 2015-06-02 13:43:55 -04:00
Al
208366af98 [fix] removing stopwords index 2015-06-02 12:43:48 -04:00
Al
49816382c1 [numex] New generated data file 2015-06-02 12:37:39 -04:00
Al
9d0d83bc14 [numex] adding stopword rules with the regular numex rules 2015-06-02 12:37:22 -04:00
Al
816a0408ab [numex] numex_rule.h 2015-06-02 12:30:56 -04:00
Al
8ef3a50b79 [numex] Initial generated numex data file 2015-06-02 12:28:28 -04:00
Al
4ad978f22c [numex] Using the new representation for generated data 2015-06-02 12:28:07 -04:00
Al
958c219b88 [utils] constants.h 2015-06-02 12:26:19 -04:00
Al
505456d9d2 [fix] removing unnecessary header 2015-06-01 17:12:33 -04:00
Al
080f382065 [numex] Removing concatenated property from language struct as all numeric spellouts might be concatenated 2015-06-01 17:12:07 -04:00
Al
920e15bd4d [numex] Adding numex setup/IO methods 2015-06-01 15:43:23 -04:00
Al
c0347a3431 [numex] numex header and structs 2015-06-01 15:41:34 -04:00
Al
b74fa0da99 [config] Adding config header 2015-06-01 15:40:59 -04:00
Al
93172bd16d [transliteration] New transliterator_scripts file 2015-05-31 02:09:28 -04:00
Al
0575984144 [transliteration] New data file 2015-05-31 02:08:26 -04:00
Al
664d5e90db [fix] Removing the stub comment and a few more random comments 2015-05-29 20:10:44 -04:00
Al
06318a6fab [fix] logging code 2015-05-29 20:08:49 -04:00
Al
55568e9ffa [fix] Removing commented out section 2015-05-29 20:01:17 -04:00
Al
583cadd44f [transliteration] transliterate implementation from trie (need to build/save the tables first) 2015-05-29 19:59:45 -04:00
Al
6239c2fcfc [transliteration] regenerated data file including InterIndic-Latin dependency 2015-05-29 19:48:19 -04:00
Al
8b56d63fde [fix] only count non-set chars in parse_groups 2015-05-29 19:42:05 -04:00
Al
a278cfd12c [transliteration] Using revisit strings instead of keeping a backtrack count so we don't have to later map logical characters to the actual string, removing any duplicate keys in the table builder so that if any rules happen to overlap within a step, the first will take precedence 2015-05-29 16:54:05 -04:00
Al
a9d5b91ac0 [transliteration] Not counting repeat character in group capture 2015-05-28 19:36:25 -04:00
Al
0177fd4b13 [fix] trie_search using proper length in utf8proc_iterate 2015-05-27 16:08:19 -04:00
Al
ad8e92182c [phrases] trie I/O using the uint APIs, fixes to trie_get_prefix_result_from_index 2015-05-27 16:06:35 -04:00
Al
897c29ccb8 [fix] transliterate.h 2015-05-27 16:04:18 -04:00
Al
17f88c3adc [utils] using unsigned ints in file_utils, adding doubles 2015-05-27 16:03:36 -04:00
Al
8ac8f83b7f [utils] changing signature of utf8proc_iterate_reversed so it takes the same arguments as utf8proc_iterate for function pointer purposes 2015-05-25 15:35:28 -04:00
Al
26ff3292d2 [fix] new script name, prefix result 2015-05-23 21:41:11 -04:00
Al
31cc2bb5d1 [fix] merging repeat codepoints in trie builder 2015-05-22 22:45:23 -04:00
Al
c00ecf6ea8 [fix] minimizing c* into (c|'')+, using empty transition instead of zero-length string 2015-05-22 18:11:54 -04:00
Al
b2d15b29cf [fix] greek_latin_ungegn => greek-latin-ungegn 2015-05-22 09:52:08 -04:00