Al
|
ab802bc361
|
[numex] Changes to existing numex rules files. Adding Dutch, Japanese, Polish, Danish, Swedish and Finnish numex rules (priority based on frequency in OpenStreetMap)
|
2015-06-04 03:13:39 -04:00 |
|
Al
|
65abde908b
|
[numex] New numex data file
|
2015-06-04 03:10:00 -04:00 |
|
Al
|
4c49f63caf
|
[numex] Adding categories to numex for plurals, etc. Ordinal indicators support multiple variants (for instance, primer in Spanish can be written as 1er or 1r) and longer suffixes, e.g. for tracking 1=>1st but 11=>11th
|
2015-06-04 03:09:39 -04:00 |
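
The 1=>1st but 11=>11th case comes down to letting a longer suffix rule win over a shorter one. The following is a minimal Python sketch. The suffix table and function names are hypothetical and are not the numex data format.

```python
# Hypothetical English ordinal rules: trailing digits -> suffix.
# Longer keys ("11", "12", "13") must take precedence over "1", "2", "3".
ORDINAL_SUFFIXES = {
    "1": "st", "2": "nd", "3": "rd",
    "11": "th", "12": "th", "13": "th",
}
DEFAULT_SUFFIX = "th"


def ordinal_suffix(n: int) -> str:
    """Return the English ordinal suffix for a non-negative integer."""
    digits = str(n)
    max_len = max(len(k) for k in ORDINAL_SUFFIXES)
    # Try the longest trailing substring first, so 11 -> "th" rather than "st".
    for length in range(min(max_len, len(digits)), 0, -1):
        suffix = ORDINAL_SUFFIXES.get(digits[-length:])
        if suffix is not None:
            return suffix
    return DEFAULT_SUFFIX


def ordinal(n: int) -> str:
    return f"{n}{ordinal_suffix(n)}"
```

For example, `[ordinal(n) for n in (1, 2, 11, 21, 112)]` gives `['1st', '2nd', '11th', '21st', '112th']`.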
|
Al
|
3d95875a11
|
[phrases] trie_add_len
|
2015-06-04 02:41:48 -04:00 |
|
Al
|
fa784677f2
|
[phrases] trie_add_suffix_at_index method
|
2015-06-04 02:30:53 -04:00 |
|
Al
|
9bdf118423
|
[transliteration] Fix to transliteration in cases where the pre/post context doesn't match and we fall back to the no-context match
|
2015-06-03 22:58:29 -04:00 |
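
The fallback behaviour can be illustrated with a toy rule table. Each source character maps to zero or more context-specific rules plus an optional no-context rule. This is a minimal Python sketch. The rule tuple, the `transliterate` function and the example rules are all hypothetical and do not reflect the actual transliterate.c structures.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rule: (pre_context, post_context, replacement).
# Empty pre and post context means "applies without any context".
Rule = Tuple[str, str, str]


def transliterate(text: str, rules: Dict[str, List[Rule]]) -> str:
    """Apply single-character rules left to right.

    Context-specific rules are tried first. If none of their pre/post
    contexts match, fall back to the no-context rule for that character,
    or copy the character unchanged if there is none.
    """
    out = []
    for i, ch in enumerate(text):
        replacement: Optional[str] = None
        fallback: Optional[str] = None
        for pre, post, repl in rules.get(ch, []):
            if not pre and not post:
                if fallback is None:
                    fallback = repl  # remember the first no-context rule
                continue
            if text[:i].endswith(pre) and text[i + 1:].startswith(post):
                replacement = repl
                break
        if replacement is None:
            replacement = fallback if fallback is not None else ch
        out.append(replacement)
    return "".join(out)
```

In the toy rules `{"Ä": [("", "R", "AE"), ("", "", "Ae")]}`, "ÄRGER" matches the post-context and becomes "AERGER". "Ärger" does not match, so it falls back to "Aerger".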
|
Al
|
48d2ca31c4
|
[transliteration] New generated data file with the German/Scandinavian additions
|
2015-06-03 22:56:50 -04:00 |
|
Al
|
b2fe9d4db0
|
[transliteration] Adding uppercase umlauts and Scandinavian a-ring
|
2015-06-03 22:55:45 -04:00 |
|
Al
|
760714a234
|
[fix] warnings in transliterate.c
|
2015-06-03 19:29:35 -04:00 |
|
Al
|
7dcb4bf6f4
|
[numex] correct signature
|
2015-06-02 16:08:25 -04:00 |
|
Al
|
93d65d0186
|
[numex] numex table builder, fix to constant
|
2015-06-02 13:57:34 -04:00 |
|
Al
|
a44997c71c
|
[fix] new generated numex data file
|
2015-06-02 13:45:06 -04:00 |
|
Al
|
2ea21dfffb
|
[fix] constants
|
2015-06-02 13:44:25 -04:00 |
|
Al
|
2d5d854754
|
[fix] compilation/warnings
|
2015-06-02 13:43:55 -04:00 |
|
Al
|
208366af98
|
[fix] removing stopwords index
|
2015-06-02 12:43:48 -04:00 |
|
Al
|
49816382c1
|
[numex] New generated data file
|
2015-06-02 12:37:39 -04:00 |
|
Al
|
9d0d83bc14
|
[numex] adding stopword rules with the regular numex rules
|
2015-06-02 12:37:22 -04:00 |
|
Al
|
816a0408ab
|
[numex] numex_rule.h
|
2015-06-02 12:30:56 -04:00 |
|
Al
|
8ef3a50b79
|
[numex] Initial generated numex data file
|
2015-06-02 12:28:28 -04:00 |
|
Al
|
4ad978f22c
|
[numex] Using the new representation for generated data
|
2015-06-02 12:28:07 -04:00 |
|
Al
|
958c219b88
|
[utils] constants.h
|
2015-06-02 12:26:19 -04:00 |
|
Al
|
2dc870b3da
|
[numex] Python script to generate numex data
|
2015-06-02 10:15:02 -04:00 |
|
Al
|
6b3d434c31
|
[fix] removing unnecessary definition
|
2015-06-01 17:13:57 -04:00 |
|
Al
|
9c935c9cc7
|
[fix] Base data dir path
|
2015-06-01 17:13:06 -04:00 |
|
Al
|
505456d9d2
|
[fix] removing unnecessary header
|
2015-06-01 17:12:33 -04:00 |
|
Al
|
080f382065
|
[numex] Removing concatenated property from language struct as all numeric spellouts might be concatenated
|
2015-06-01 17:12:07 -04:00 |
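
Any spelled-out number may be written as one concatenated word, for example German einundzwanzig (21). A parser therefore has to segment the word into number tokens before it can evaluate it. This is a minimal Python sketch that assumes a tiny hypothetical German vocabulary and uses greedy longest-match segmentation. It is not the numex implementation.

```python
from typing import Dict, List, Optional

# Toy German vocabulary (hypothetical, far from complete): token -> value.
# "und" is a connector; "hundert"/"tausend" act as multipliers.
VOCAB: Dict[str, int] = {
    "ein": 1, "eins": 1, "zwei": 2, "drei": 3, "vier": 4, "fünf": 5,
    "sechs": 6, "sieben": 7, "acht": 8, "neun": 9, "zehn": 10,
    "elf": 11, "zwölf": 12, "sechzehn": 16, "siebzehn": 17,
    "zwanzig": 20, "dreißig": 30, "vierzig": 40, "fünfzig": 50,
    "sechzig": 60, "siebzig": 70, "achtzig": 80, "neunzig": 90,
    "und": 0, "hundert": 100, "tausend": 1000,
}


def segment(word: str, vocab: Dict[str, int]) -> Optional[List[str]]:
    """Split a concatenated word into vocabulary tokens by greedy longest
    match. Returns None if no token starts at some position. Greedy
    matching is a heuristic; it can fail where backtracking would not."""
    tokens: List[str] = []
    longest = max(len(k) for k in vocab)
    i = 0
    while i < len(word):
        for length in range(min(longest, len(word) - i), 0, -1):
            piece = word[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            return None
    return tokens


def parse_german_number(word: str) -> Optional[int]:
    tokens = segment(word.lower(), VOCAB)
    if tokens is None:
        return None
    total, current = 0, 0
    for tok in tokens:
        if tok == "und":
            continue
        if tok == "hundert":
            current = (current or 1) * 100
        elif tok == "tausend":
            total += (current or 1) * 1000
            current = 0
        else:
            current += VOCAB[tok]
    return total + current
```

For example, "dreihundertzweiundvierzig" segments into drei, hundert, zwei, und, vierzig and evaluates to 342.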
|
Al
|
a20b768237
|
[numex] Russian numex rules (a start at least, might need a native speaker to review the RBNF transform in CLDR)
|
2015-06-01 17:08:57 -04:00 |
|
Al
|
05ffbffb23
|
[numex] Latin numex rules i.e. Roman numerals, used for most languages
|
2015-06-01 17:08:04 -04:00 |
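
Roman numerals follow an additive rule with a subtractive exception, unlike the spelled-out rules for individual languages. This is a minimal Python sketch of the standard conversion. It is only an illustration and does not show how the numex rules encode Roman numerals.

```python
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def parse_roman(numeral: str) -> int:
    """Convert a Roman numeral to an int.

    A symbol smaller than the one after it is subtracted (IV = 4, XC = 90);
    otherwise it is added. Case-insensitive. Does not validate canonical
    form (e.g. "IIII" is accepted), and raises KeyError on other characters.
    """
    upper = numeral.upper()
    total = 0
    for i, ch in enumerate(upper):
        value = ROMAN_VALUES[ch]
        if i + 1 < len(upper) and value < ROMAN_VALUES[upper[i + 1]]:
            total -= value
        else:
            total += value
    return total
```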
|
Al
|
028bb5a1aa
|
[numex] German numex rules
|
2015-06-01 17:07:35 -04:00 |
|
Al
|
9bd75cee23
|
[numex] Romance language numex rules (Spanish, French, Italian, Portuguese)
|
2015-06-01 17:07:23 -04:00 |
|
Al
|
99aed992da
|
[numex] English numex rules
|
2015-06-01 17:06:53 -04:00 |
|
Al
|
920e15bd4d
|
[numex] Adding numex setup/IO methods
|
2015-06-01 15:43:23 -04:00 |
|
Al
|
c0347a3431
|
[numex] numex header and structs
|
2015-06-01 15:41:34 -04:00 |
|
Al
|
b74fa0da99
|
[config] Adding config header
|
2015-06-01 15:40:59 -04:00 |
|
Al
|
93172bd16d
|
[transliteration] New transliterator_scripts file
|
2015-05-31 02:09:28 -04:00 |
|
Al
|
0575984144
|
[transliteration] New data file
|
2015-05-31 02:08:26 -04:00 |
|
Al
|
6ac4ff6021
|
[transliteration] Adding reverse/bidirectional transforms e.g. for Katakana-Latin
|
2015-05-31 02:07:36 -04:00 |
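
In ICU/CLDR transform rule syntax, `a > b` is a forward rule, `a < b` is a backward rule, and `a <> b` applies in both directions. This minimal Python sketch shows how simple rules without contexts or variables can be split into a forward map and a reverse map. The function name and the Katakana toy rules are hypothetical.

```python
from typing import Dict, List, Tuple


def split_rules(rules: List[str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split simple transform rules into forward and reverse maps.

    Handles only literal "a > b", "a < b" and "a <> b" rules; contexts,
    variables and sets are out of scope for this sketch. The first rule for
    a given source string wins (setdefault keeps the earliest entry).
    """
    forward: Dict[str, str] = {}
    reverse: Dict[str, str] = {}
    for rule in rules:
        rule = rule.strip().rstrip(";").strip()
        if "<>" in rule:  # check the dual operator before the single ones
            left, right = (s.strip() for s in rule.split("<>", 1))
            forward.setdefault(left, right)
            reverse.setdefault(right, left)
        elif ">" in rule:
            left, right = (s.strip() for s in rule.split(">", 1))
            forward.setdefault(left, right)
        elif "<" in rule:
            left, right = (s.strip() for s in rule.split("<", 1))
            reverse.setdefault(right, left)
        else:
            raise ValueError(f"no operator in rule: {rule!r}")
    return forward, reverse
```

With `["カ <> ka;", "ガ <> ga;", "ヵ > ka;"]`, the small ヵ maps to "ka" only in the forward direction. In reverse, "ka" still maps back to カ.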
|
Al
|
664d5e90db
|
[fix] Removing the stub comment and a few more random comments
|
2015-05-29 20:10:44 -04:00 |
|
Al
|
06318a6fab
|
[fix] logging code
|
2015-05-29 20:08:49 -04:00 |
|
Al
|
55568e9ffa
|
[fix] Removing commented out section
|
2015-05-29 20:01:17 -04:00 |
|
Al
|
583cadd44f
|
[transliteration] transliterate implementation from trie (need to build/save the tables first)
|
2015-05-29 19:59:45 -04:00 |
|
Al
|
6239c2fcfc
|
[transliteration] regenerated data file including InterIndic-Latin dependency
|
2015-05-29 19:48:19 -04:00 |
|
Al
|
9547c93a38
|
[fix] InterIndic-Latin is an internal transliterator, but needed for most of the Indic languages. Also fixing the string lengths for HTML entity replacements
|
2015-05-29 19:47:49 -04:00 |
|
Al
|
8b56d63fde
|
[fix] only count non-set chars in parse_groups
|
2015-05-29 19:42:05 -04:00 |
|
Al
|
a278cfd12c
|
[transliteration] Using revisit strings instead of keeping a backtrack count, so we don't have to later map logical characters back to the actual string. Also removing any duplicate keys in the table builder, so that if any rules happen to overlap within a step, the first takes precedence
|
2015-05-29 16:54:05 -04:00 |
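
The "first rule takes precedence" behaviour amounts to an order-preserving de-duplication of the rules within a step. This is a minimal Python sketch. The `(key, value)` rule shape and the function name are hypothetical and do not come from the table builder.

```python
from typing import Iterable, List, Tuple


def dedupe_keep_first(rules: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Drop any rule whose key already appeared earlier in the same step,
    so the first rule for each key wins and the original order is kept."""
    seen = set()
    kept: List[Tuple[str, str]] = []
    for key, value in rules:
        if key in seen:
            continue
        seen.add(key)
        kept.append((key, value))
    return kept
```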
|
Al
|
a9d5b91ac0
|
[transliteration] Not counting repeat character in group capture
|
2015-05-28 19:36:25 -04:00 |
|
Al
|
0177fd4b13
|
[fix] trie_search using proper length in utf8proc_iterate
|
2015-05-27 16:08:19 -04:00 |
|
Al
|
ad8e92182c
|
[phrases] trie I/O using the uint APIs, fixes to trie_get_prefix_result_from_index
|
2015-05-27 16:06:35 -04:00 |
|
Al
|
897c29ccb8
|
[fix] transliterate.h
|
2015-05-27 16:04:18 -04:00 |
|
Al
|
17f88c3adc
|
[utils] using unsigned ints in file_utils, adding doubles
|
2015-05-27 16:03:36 -04:00 |
|