Al
|
8520df96c8
|
[utils] utf8 comparison can handle a non-valid UTF-8 sequence e.g. for trie suffix comparison where we may be in the middle of a multi-byte character. Adding a standard utf8_common_prefix method
|
2015-06-12 16:11:40 -04:00 |
|
Al
|
5c2839e534
|
[numex] header and table builder changes to support whole-words-only languages
|
2015-06-12 16:10:57 -04:00 |
|
Al
|
1c4657b631
|
[numex] Setting Latin to whole_words_only
|
2015-06-12 16:10:07 -04:00 |
|
Al
|
fc735bb5c3
|
[numex] Adding a whole words only option on numex languages e.g. for Latin so we don't match an initial D with 500
|
2015-06-12 16:09:45 -04:00 |
|
Al
|
6b60446dbe
|
[phrases] no longer ignoring spaces in the input string, just trying different methods for hyphens, getting indexes right in the case where a space or hyphen precedes the match and backtracking on matches if the rest of the string falls off the trie
|
2015-06-12 11:30:24 -04:00 |
|
Al
|
3442b9ad92
|
[utils] require at least one non-space/non-hyphen match in utf8_common_prefix_len_ignore_separators
|
2015-06-12 11:19:37 -04:00 |
|
Al
|
6841ed8fb3
|
[phrases] Ignoring separators and dashes in trie_search_prefixes so it can be used for languages like German where numbers, phrases, etc. may just be concatenated together as a single token
|
2015-06-11 11:05:56 -04:00 |
|
Al
|
ab5ea6d791
|
[utils] Common prefix-style return value instead of a utf8 strcmp
|
2015-06-11 10:59:51 -04:00 |
|
Al
|
aad5f3edd3
|
[utils] UTF-8 lowercasing and string comparison, including a version which ignores dashes/spaces
|
2015-06-10 18:27:14 -04:00 |
|
Al
|
cb603562e0
|
[phrases] Adding *_from_index methods to trie_search
|
2015-06-09 11:14:42 -04:00 |
|
Al
|
81be8e771e
|
[numex] regen data file. utf8_is_hyphen requires a character, all other methods use category
|
2015-06-08 21:32:38 -04:00 |
|
Al
|
c1d0afa52c
|
[fix] additional French numex
|
2015-06-08 21:30:32 -04:00 |
|
Al
|
c1bed8b410
|
[numex] header changes
|
2015-06-08 21:29:36 -04:00 |
|
Al
|
fd1ebba720
|
[numex] Initial implementation of multilingual numeric expression parser
|
2015-06-08 21:29:04 -04:00 |
|
Al
|
6267b3a431
|
[numex] Adding numex phrase structure to the API
|
2015-06-07 23:56:24 -04:00 |
|
Al
|
06835d5c37
|
[utils] string_utils category functions take a category instead of a codepoint
|
2015-06-06 20:41:07 -04:00 |
|
Al
|
fc250724e1
|
[numex] tercera=>3ra
|
2015-06-06 20:39:57 -04:00 |
|
Al
|
7c613a068f
|
[dictionaries] English dictionary updates
|
2015-06-06 20:39:27 -04:00 |
|
Al
|
2856c2b401
|
[utils] string_utils category functions take a category instead of a codepoint
|
2015-06-05 16:55:21 -04:00 |
|
Al
|
3030dbe4be
|
[fix] transliteration states
|
2015-06-05 00:09:29 -04:00 |
|
Al
|
e32916f3df
|
[fix] closing file in numex table builder
|
2015-06-04 23:59:21 -04:00 |
|
Al
|
b244aa30f2
|
[numex] Setting numex_table to NULL during teardown, adding some logging
|
2015-06-04 23:57:52 -04:00 |
|
Al
|
3bd5172afd
|
[numex] Adding NUMEX_NULL_RULE at the first index
|
2015-06-04 17:21:44 -04:00 |
|
Al
|
3400a59e1c
|
[numex] adding a NUMEX_NULL_RULE
|
2015-06-04 17:21:16 -04:00 |
|
Al
|
95a4bb8e7c
|
[numex] teardown in numex table builder
|
2015-06-04 17:20:26 -04:00 |
|
Al
|
114b728f96
|
[fix] var
|
2015-06-04 17:18:05 -04:00 |
|
Al
|
528dd05983
|
[numex] Adding utf8_is_number_or_letter
|
2015-06-04 14:49:12 -04:00 |
|
Al
|
ca746304e3
|
[utils] Adding a few methods to string_utils for finding utf8proc category groups
|
2015-06-04 13:20:14 -04:00 |
|
Al
|
eac7a296ba
|
[numex] New numex data file including top 15 languages in OSM
|
2015-06-04 11:55:07 -04:00 |
|
Al
|
6470cbe467
|
[numex] Catalan and Chinese numex rules converted from RBNF, now covering top 15 languages in OSM addresses
|
2015-06-04 11:53:43 -04:00 |
|
Al
|
e2c8c08772
|
[numex] 1era for Spanish feminine ordinal indicator
|
2015-06-04 11:52:50 -04:00 |
|
Al
|
0429db3507
|
[numex] Adding ordinal indicator type for Japanese
|
2015-06-04 11:52:25 -04:00 |
|
Al
|
d98c535c52
|
[numex] Adding ordinal indicator to enum
|
2015-06-04 11:25:24 -04:00 |
|
Al
|
2d098fdab6
|
[numex] Adding ordinal_indicator rule type for CJK ordinals
|
2015-06-04 11:24:13 -04:00 |
|
Al
|
3cb8b2d297
|
[numex] trie builder adding a separate suffix-based namespace for looking up ordinal indicators
|
2015-06-04 03:17:03 -04:00 |
|
Al
|
7d3ef39463
|
[numex] struct/method changes for new ordinal indicators
|
2015-06-04 03:15:51 -04:00 |
|
Al
|
ab802bc361
|
[numex] Changes to existing numex rules files. Adding Dutch, Japanese, Polish, Danish, Swedish and Finnish numex rules (priority based on frequency in OpenStreetMap)
|
2015-06-04 03:13:39 -04:00 |
|
Al
|
65abde908b
|
[numex] New numex data file
|
2015-06-04 03:10:00 -04:00 |
|
Al
|
4c49f63caf
|
[numex] Adding categories to numex for plurals, etc. Ordinal indicators support multiple variants (primer in Spanish can be written as 1er or 1r for instance) and longer suffixes e.g. for tracking 1=>1st but 11=>11th
|
2015-06-04 03:09:39 -04:00 |
|
Al
|
3d95875a11
|
[phrases] trie_add_len
|
2015-06-04 02:41:48 -04:00 |
|
Al
|
fa784677f2
|
[phrases] trie_add_suffix_at_index method
|
2015-06-04 02:30:53 -04:00 |
|
Al
|
9bdf118423
|
[transliteration] Fix to transliteration in cases where the pre/post context doesn't match and we fall back to the no-context match
|
2015-06-03 22:58:29 -04:00 |
|
Al
|
48d2ca31c4
|
[transliteration] New generated data file with the German/Scandinavian additions
|
2015-06-03 22:56:50 -04:00 |
|
Al
|
b2fe9d4db0
|
[transliteration] Adding uppercase umlauts and Scandinavian a-ring
|
2015-06-03 22:55:45 -04:00 |
|
Al
|
760714a234
|
[fix] warnings in transliterate.c
|
2015-06-03 19:29:35 -04:00 |
|
Al
|
7dcb4bf6f4
|
[numex] correct signature
|
2015-06-02 16:08:25 -04:00 |
|
Al
|
93d65d0186
|
[numex] numex table builder, fix to constant
|
2015-06-02 13:57:34 -04:00 |
|
Al
|
a44997c71c
|
[fix] new generated numex data file
|
2015-06-02 13:45:06 -04:00 |
|
Al
|
2ea21dfffb
|
[fix] constants
|
2015-06-02 13:44:25 -04:00 |
|
Al
|
2d5d854754
|
[fix] compilation/warnings
|
2015-06-02 13:43:55 -04:00 |
|