205 Commits

Author SHA1 Message Date
Al ab802bc361 [numex] Changes to existing numex rules files. Adding Dutch, Japanese, Polish, Danish, Swedish and Finnish numex rules (priority based on frequency in OpenStreetMap) 2015-06-04 03:13:39 -04:00
Al 65abde908b [numex] New numex data file 2015-06-04 03:10:00 -04:00
Al 4c49f63caf [numex] Adding categories to numex for plurals, etc. Ordinal indicators support multiple variants (e.g. Spanish "primer" can be written as 1er or 1r) and longer digit suffixes (e.g. 1 => 1st but 11 => 11th) 2015-06-04 03:09:39 -04:00
Al 3d95875a11 [phrases] trie_add_len 2015-06-04 02:41:48 -04:00
Al fa784677f2 [phrases] trie_add_suffix_at_index method 2015-06-04 02:30:53 -04:00
Al 9bdf118423 [transliteration] Fix to transliteration in cases where the pre/post context doesn't match and we fall back to the no-context match 2015-06-03 22:58:29 -04:00
Al 48d2ca31c4 [transliteration] New generated data file with the German/Scandinavian additions 2015-06-03 22:56:50 -04:00
Al b2fe9d4db0 [transliteration] Adding uppercase umlauts and Scandinavian a-ring 2015-06-03 22:55:45 -04:00
Al 760714a234 [fix] warnings in transliterate.c 2015-06-03 19:29:35 -04:00
Al 7dcb4bf6f4 [numex] correct signature 2015-06-02 16:08:25 -04:00
Al 93d65d0186 [numex] numex table builder, fix to constant 2015-06-02 13:57:34 -04:00
Al a44997c71c [fix] new generated numex data file 2015-06-02 13:45:06 -04:00
Al 2ea21dfffb [fix] constants 2015-06-02 13:44:25 -04:00
Al 2d5d854754 [fix] compilation/warnings 2015-06-02 13:43:55 -04:00
Al 208366af98 [fix] removing stopwords index 2015-06-02 12:43:48 -04:00
Al 49816382c1 [numex] New generated data file 2015-06-02 12:37:39 -04:00
Al 9d0d83bc14 [numex] adding stopword rules with the regular numex rules 2015-06-02 12:37:22 -04:00
Al 816a0408ab [numex] numex_rule.h 2015-06-02 12:30:56 -04:00
Al 8ef3a50b79 [numex] Initial generated numex data file 2015-06-02 12:28:28 -04:00
Al 4ad978f22c [numex] Using the new representation for generated data 2015-06-02 12:28:07 -04:00
Al 958c219b88 [utils] constants.h 2015-06-02 12:26:19 -04:00
Al 2dc870b3da [numex] Python script to generate numex data 2015-06-02 10:15:02 -04:00
Al 6b3d434c31 [fix] removing unnecessary definition 2015-06-01 17:13:57 -04:00
Al 9c935c9cc7 [fix] Base data dir path 2015-06-01 17:13:06 -04:00
Al 505456d9d2 [fix] removing unnecessary header 2015-06-01 17:12:33 -04:00
Al 080f382065 [numex] Removing concatenated property from language struct, as all numeric spellouts might be concatenated 2015-06-01 17:12:07 -04:00
Al a20b768237 [numex] Russian numex rules (a start at least; might need a native speaker to review the RBNF transform in CLDR) 2015-06-01 17:08:57 -04:00
Al 05ffbffb23 [numex] Latin numex rules, i.e. Roman numerals, used for most languages 2015-06-01 17:08:04 -04:00
Al 028bb5a1aa [numex] German numex rules 2015-06-01 17:07:35 -04:00
Al 9bd75cee23 [numex] Romance language numex rules (Spanish, French, Italian, Portuguese) 2015-06-01 17:07:23 -04:00
Al 99aed992da [numex] English numex rules 2015-06-01 17:06:53 -04:00
Al 920e15bd4d [numex] Adding numex setup/IO methods 2015-06-01 15:43:23 -04:00
Al c0347a3431 [numex] numex header and structs 2015-06-01 15:41:34 -04:00
Al b74fa0da99 [config] Adding config header 2015-06-01 15:40:59 -04:00
Al 93172bd16d [transliteration] New transliterator_scripts file 2015-05-31 02:09:28 -04:00
Al 0575984144 [transliteration] New data file 2015-05-31 02:08:26 -04:00
Al 6ac4ff6021 [transliteration] Adding reverse/bidirectional transforms, e.g. for Katakana-Latin 2015-05-31 02:07:36 -04:00
Al 664d5e90db [fix] Removing the stub comment and a few more random comments 2015-05-29 20:10:44 -04:00
Al 06318a6fab [fix] logging code 2015-05-29 20:08:49 -04:00
Al 55568e9ffa [fix] Removing commented-out section 2015-05-29 20:01:17 -04:00
Al 583cadd44f [transliteration] transliterate implementation from trie (need to build/save the tables first) 2015-05-29 19:59:45 -04:00
Al 6239c2fcfc [transliteration] regenerated data file including InterIndic-Latin dependency 2015-05-29 19:48:19 -04:00
Al 9547c93a38 [fix] InterIndic-Latin is an internal transliterator, but needed for most of the Indic languages. Also fixing the string lengths for HTML entity replacements 2015-05-29 19:47:49 -04:00
Al 8b56d63fde [fix] only count non-set chars in parse_groups 2015-05-29 19:42:05 -04:00
Al a278cfd12c [transliteration] Using revisit strings instead of keeping a backtrack count, so we don't have to later map logical characters back to the actual string. Removing any duplicate keys in the table builder so that if any rules happen to overlap within a step, the first takes precedence 2015-05-29 16:54:05 -04:00
Al a9d5b91ac0 [transliteration] Not counting repeat character in group capture 2015-05-28 19:36:25 -04:00
Al 0177fd4b13 [fix] trie_search using proper length in utf8proc_iterate 2015-05-27 16:08:19 -04:00
Al ad8e92182c [phrases] trie I/O using the uint APIs, fixes to trie_get_prefix_result_from_index 2015-05-27 16:06:35 -04:00
Al 897c29ccb8 [fix] transliterate.h 2015-05-27 16:04:18 -04:00
Al 17f88c3adc [utils] using unsigned ints in file_utils, adding doubles 2015-05-27 16:03:36 -04:00