Al
|
8520df96c8
|
[utils] utf8 comparison can handle an invalid UTF-8 sequence, e.g. for trie suffix comparison, where we may be in the middle of a multi-byte character. Adding a standard utf8_common_prefix method
|
2015-06-12 16:11:40 -04:00 |
|
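As a rough illustration of the idea in the commit above, here is a minimal sketch of a byte-wise common-prefix length in C. The function name `common_prefix_len_bytes` is hypothetical and is not the repository's `utf8_common_prefix`; the sketch assumes NUL-terminated inputs and compares raw bytes, so it does not need either string to start on a character boundary.

```c
#include <stddef.h>

/*
 * Hypothetical helper for illustration only (not the project's actual
 * utf8_common_prefix). Returns the number of leading bytes shared by two
 * NUL-terminated strings. Because it works on raw bytes rather than decoded
 * codepoints, it also behaves sensibly when a string begins in the middle of
 * a multi-byte UTF-8 character, as can happen when comparing trie suffixes.
 */
size_t common_prefix_len_bytes(const char *s1, const char *s2) {
    size_t n = 0;
    /* Stop at the end of s1 or at the first differing byte; if s2 ends
       first, its terminating NUL differs from s1[n] and the loop stops. */
    while (s1[n] != '\0' && s1[n] == s2[n]) {
        n++;
    }
    return n;
}
```

Note that the returned length counts bytes, so it may end partway through a multi-byte character; a caller that needs whole characters would have to trim back to a character boundary.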
Al
|
5c2839e534
|
[numex] header and table builder changes to support whole-word languages
|
2015-06-12 16:10:57 -04:00 |
|
Al
|
6b60446dbe
|
[phrases] no longer ignoring spaces in the input string, just trying different methods for hyphens; getting indexes right in the case where a space or hyphen precedes the match, and backtracking on matches if the rest of the string falls off the trie
|
2015-06-12 11:30:24 -04:00 |
|
Al
|
3442b9ad92
|
[utils] require at least one non-space/non-hyphen match in utf8_common_prefix_len_ignore_separators
|
2015-06-12 11:19:37 -04:00 |
|
Al
|
6841ed8fb3
|
[phrases] Ignoring separators and dashes in trie_search_prefixes so it can be used for languages like German where numbers, phrases, etc. may just be concatenated together as a single token
|
2015-06-11 11:05:56 -04:00 |
|
Al
|
ab5ea6d791
|
[utils] Common prefix-style return value instead of a utf8 strcmp
|
2015-06-11 10:59:51 -04:00 |
|
Al
|
aad5f3edd3
|
[utils] UTF-8 lowercasing and string comparison, including a version which ignores dashes/spaces
|
2015-06-10 18:27:14 -04:00 |
|
Al
|
cb603562e0
|
[phrases] Adding *_from_index methods to trie_search
|
2015-06-09 11:14:42 -04:00 |
|
Al
|
81be8e771e
|
[numex] regen data file. utf8_is_hyphen requires a character, all other methods use category
|
2015-06-08 21:32:38 -04:00 |
|
Al
|
c1bed8b410
|
[numex] header changes
|
2015-06-08 21:29:36 -04:00 |
|
Al
|
fd1ebba720
|
[numex] Initial implementation of multilingual numeric expression parser
|
2015-06-08 21:29:04 -04:00 |
|
Al
|
6267b3a431
|
[numex] Adding numex phrase structure to the API
|
2015-06-07 23:56:24 -04:00 |
|
Al
|
06835d5c37
|
[utils] string_utils category functions take a category instead of a codepoint
|
2015-06-06 20:41:07 -04:00 |
|
Al
|
2856c2b401
|
[utils] string_utils category functions take a category instead of a codepoint
|
2015-06-05 16:55:21 -04:00 |
|
Al
|
3030dbe4be
|
[fix] transliteration states
|
2015-06-05 00:09:29 -04:00 |
|
Al
|
e32916f3df
|
[fix] closing file in numex table builder
|
2015-06-04 23:59:21 -04:00 |
|
Al
|
b244aa30f2
|
[numex] Setting numex_table to NULL during teardown, adding some logging
|
2015-06-04 23:57:52 -04:00 |
|
Al
|
3bd5172afd
|
[numex] Adding NUMEX_NULL_RULE at the first index
|
2015-06-04 17:21:44 -04:00 |
|
Al
|
3400a59e1c
|
[numex] adding a NUMEX_NULL_RULE
|
2015-06-04 17:21:16 -04:00 |
|
Al
|
95a4bb8e7c
|
[numex] teardown in numex table builder
|
2015-06-04 17:20:26 -04:00 |
|
Al
|
114b728f96
|
[fix] var
|
2015-06-04 17:18:05 -04:00 |
|
Al
|
528dd05983
|
[numex] Adding utf8_is_number_or_letter
|
2015-06-04 14:49:12 -04:00 |
|
Al
|
ca746304e3
|
[utils] Adding a few methods to string_utils for finding utf8proc category groups
|
2015-06-04 13:20:14 -04:00 |
|
Al
|
eac7a296ba
|
[numex] New numex data file including top 15 languages in OSM
|
2015-06-04 11:55:07 -04:00 |
|
Al
|
d98c535c52
|
[numex] Adding ordinal indicator to enum
|
2015-06-04 11:25:24 -04:00 |
|
Al
|
3cb8b2d297
|
[numex] trie builder adding a separate suffix-based namespace for looking up ordinal indicators
|
2015-06-04 03:17:03 -04:00 |
|
Al
|
7d3ef39463
|
[numex] struct/method changes for new ordinal indicators
|
2015-06-04 03:15:51 -04:00 |
|
Al
|
65abde908b
|
[numex] New numex data file
|
2015-06-04 03:10:00 -04:00 |
|
Al
|
3d95875a11
|
[phrases] trie_add_len
|
2015-06-04 02:41:48 -04:00 |
|
Al
|
fa784677f2
|
[phrases] trie_add_suffix_at_index method
|
2015-06-04 02:30:53 -04:00 |
|
Al
|
9bdf118423
|
[transliteration] Fix to transliteration in cases where the pre/post context doesn't match and we fall back to the no-context match
|
2015-06-03 22:58:29 -04:00 |
|
Al
|
48d2ca31c4
|
[transliteration] New generated data file with the German/Scandinavian additions
|
2015-06-03 22:56:50 -04:00 |
|
Al
|
760714a234
|
[fix] warnings in transliterate.c
|
2015-06-03 19:29:35 -04:00 |
|
Al
|
7dcb4bf6f4
|
[numex] correct signature
|
2015-06-02 16:08:25 -04:00 |
|
Al
|
93d65d0186
|
[numex] numex table builder, fix to constant
|
2015-06-02 13:57:34 -04:00 |
|
Al
|
a44997c71c
|
[fix] new generated numex data file
|
2015-06-02 13:45:06 -04:00 |
|
Al
|
2d5d854754
|
[fix] compilation/warnings
|
2015-06-02 13:43:55 -04:00 |
|
Al
|
208366af98
|
[fix] removing stopwords index
|
2015-06-02 12:43:48 -04:00 |
|
Al
|
49816382c1
|
[numex] New generated data file
|
2015-06-02 12:37:39 -04:00 |
|
Al
|
9d0d83bc14
|
[numex] adding stopword rules with the regular numex rules
|
2015-06-02 12:37:22 -04:00 |
|
Al
|
816a0408ab
|
[numex] numex_rule.h
|
2015-06-02 12:30:56 -04:00 |
|
Al
|
8ef3a50b79
|
[numex] Initial generated numex data file
|
2015-06-02 12:28:28 -04:00 |
|
Al
|
4ad978f22c
|
[numex] Using the new representation for generated data
|
2015-06-02 12:28:07 -04:00 |
|
Al
|
958c219b88
|
[utils] constants.h
|
2015-06-02 12:26:19 -04:00 |
|
Al
|
505456d9d2
|
[fix] removing unnecessary header
|
2015-06-01 17:12:33 -04:00 |
|
Al
|
080f382065
|
[numex] Removing concatenated property from language struct as all numeric spellouts might be concatenated
|
2015-06-01 17:12:07 -04:00 |
|
Al
|
920e15bd4d
|
[numex] Adding numex setup/IO methods
|
2015-06-01 15:43:23 -04:00 |
|
Al
|
c0347a3431
|
[numex] numex header and structs
|
2015-06-01 15:41:34 -04:00 |
|
Al
|
b74fa0da99
|
[config] Adding config header
|
2015-06-01 15:40:59 -04:00 |
|
Al
|
93172bd16d
|
[transliteration] New transliterator_scripts file
|
2015-05-31 02:09:28 -04:00 |
|