Al | b4fdc51bf9 | [numex] changing is_roman_numeral to is_likely_roman_numeral to get rid of most of the false positives like "La" in Spanish, which could be L (=50) + the ordinal suffix "a" but in practice never means that. Roman numerals that are two characters or shorter (whether on their own like "DC" or "MD", or attached to a potential ordinal suffix like "Ce" in French) will be ignored unless they're composed of the more likely, smaller Roman numerals I, V, and X, so VI, IX, etc. are expanded as Roman numerals but LI is not. | 2017-12-27 19:38:02 -05:00
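
A minimal C sketch of the short-numeral rule described in b4fdc51bf9. It assumes the candidate has already been uppercased and that any ordinal suffix has been stripped, so "Ce" arrives as "C". The names likely_roman_numeral, is_roman_char and is_small_roman_char are hypothetical; this is not libpostal's is_likely_roman_numeral.

```c
#include <stdbool.h>
#include <stddef.h>

/* True for any of the seven Roman numeral letters (uppercase only). */
static bool is_roman_char(char c) {
    switch (c) {
        case 'I': case 'V': case 'X': case 'L':
        case 'C': case 'D': case 'M':
            return true;
        default:
            return false;
    }
}

/* The "more likely" small numerals that short candidates may use. */
static bool is_small_roman_char(char c) {
    return c == 'I' || c == 'V' || c == 'X';
}

/* Hypothetical heuristic: every character must be a Roman numeral letter,
   and candidates of two characters or fewer may only use I, V and X. */
bool likely_roman_numeral(const char *s, size_t len) {
    if (s == NULL || len == 0) return false;
    for (size_t i = 0; i < len; i++) {
        if (!is_roman_char(s[i])) return false;
        if (len <= 2 && !is_small_roman_char(s[i])) return false;
    }
    return true;
}
```

Under this rule "VI" and "XIV" pass, while "LI", "DC" and "C" are rejected because they are two characters or shorter and contain a letter other than I, V, or X.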
Al | 1a64ad682b | [merge] merging in the Ohio expansion numex changes from master | 2017-11-29 11:51:43 -05:00
Al | ef098fd2e7 | [numex] implementing the numex concat_only_if_number left context, which helps in the case of e.g. Columbus, OH in #271 | 2017-11-24 15:42:50 -05:00
Al | e38e57b8e8 | [numex] fixing edge case where something like "IV Michael" could cause a partial Roman numeral to get added for the MI portion of "Michael" | 2017-10-27 04:04:12 -04:00
Al | 1fbc238b60 | [numex] adding functions to parse and validate a Roman numeral | 2017-10-20 02:45:32 -04:00
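
One way to parse and validate a Roman numeral, sketched under the assumption that only canonical subtractive forms between 1 and 3999 should be accepted: sum the symbol values (subtracting a symbol that precedes a larger one), then re-encode the total and require it to match the input exactly. parse_roman_numeral and int_to_roman are hypothetical names; 1fbc238b60 does not say how libpostal itself validates numerals.

```c
#include <stdbool.h>
#include <string.h>

/* Value of a single uppercase Roman numeral letter, or 0 if invalid. */
static int roman_value(char c) {
    switch (c) {
        case 'I': return 1;
        case 'V': return 5;
        case 'X': return 10;
        case 'L': return 50;
        case 'C': return 100;
        case 'D': return 500;
        case 'M': return 1000;
        default:  return 0;
    }
}

/* Canonical encoding of n (1..3999); buf must hold at least 16 bytes,
   since the longest form, 3888 = "MMMDCCCLXXXVIII", is 15 characters. */
static void int_to_roman(int n, char *buf) {
    static const int values[] = {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
    static const char *const symbols[] = {"M", "CM", "D", "CD", "C", "XC", "L",
                                          "XL", "X", "IX", "V", "IV", "I"};
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(values) / sizeof(values[0]); i++) {
        while (n >= values[i]) {
            strcat(buf, symbols[i]);
            n -= values[i];
        }
    }
}

/* Hypothetical parser: returns true and stores the value in *out only if s
   is a canonical Roman numeral between 1 and 3999. */
bool parse_roman_numeral(const char *s, int *out) {
    size_t len = strlen(s);
    if (len == 0 || len > 15) return false;
    int total = 0;
    for (size_t i = 0; i < len; i++) {
        int v = roman_value(s[i]);
        if (v == 0) return false;
        int next = (i + 1 < len) ? roman_value(s[i + 1]) : 0;
        /* A smaller symbol before a larger one is subtracted (IV = 4). */
        total += (v < next) ? -v : v;
    }
    if (total < 1 || total > 3999) return false;
    /* Validate by round trip: only the canonical spelling re-encodes to s. */
    char canonical[16];
    int_to_roman(total, canonical);
    if (strcmp(canonical, s) != 0) return false;
    *out = total;
    return true;
}
```

The round trip rejects non-canonical strings such as "IIII", "IC", or "VL" (which sums to 45 but re-encodes as "XLV").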
Al | 9d2a111286 | [numex] when parsing numex, bail on rules in whole_tokens_only languages if there are contiguous rules with no right context rules (example: something that wouldn't make sense like VL in Latin) | 2017-10-20 02:34:30 -04:00
Al | 97044f5a8b | [fix] 32-bit safety in numex table loading | 2017-07-20 17:55:43 -04:00
Iestyn Pryce | 73d27caeb9 | Fix log_* formats which expect long long uint but receive uint64_t. | 2017-05-21 10:57:20 +01:00
Al | f3adde746e | [numex] adding ability to handle the degree symbol in numex parsing since it's technically a separate token | 2017-04-19 20:18:21 -04:00
Al | 92051863ba | [numex] adding ordinal suffixes themselves to the numex trie so they can be removed from strings | 2017-04-18 17:20:02 -04:00
Al | 413c584f08 | [fix] need to set prev_state to the NULL state in numex parsing after a non-space/non-hyphen is encountered and the previous match, if any, is added to the result array | 2017-04-13 16:01:46 -04:00
Al | b464eb6c07 | [numex] fix numex parsing when the spelled-out number is followed by a comma or other punctuation | 2017-04-11 16:28:33 -04:00
Al | df89387b5c | [fix] calloc instead of malloc when performing initialization on structs that may fail halfway and need to clean up while partially initialized (calloc will set all the bytes to zero so the member pointers are NULL instead of garbage memory) | 2017-01-13 18:30:04 -05:00
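
The calloc pattern from df89387b5c, sketched with a hypothetical example_table_t and example_table_new/example_table_destroy (not libpostal types). Because the struct starts out zeroed, one destroy function can clean up after a failure at any step. This relies on all-bits-zero being the null pointer representation, which holds on mainstream platforms although the C standard does not guarantee it.

```c
#include <stdlib.h>

/* Hypothetical struct; the names are illustrative only. */
typedef struct {
    char *names;
    int *values;
    double *weights;
} example_table_t;

void example_table_destroy(example_table_t *t) {
    if (t == NULL) return;
    /* free(NULL) is a no-op, so members that were never allocated are safe. */
    free(t->names);
    free(t->values);
    free(t->weights);
    free(t);
}

example_table_t *example_table_new(size_t n) {
    /* calloc zeroes the struct, so every member pointer starts out NULL. */
    example_table_t *t = calloc(1, sizeof(example_table_t));
    if (t == NULL) return NULL;

    t->names = malloc(n);
    if (t->names == NULL) goto fail;
    t->values = malloc(n * sizeof(int));
    if (t->values == NULL) goto fail;
    t->weights = malloc(n * sizeof(double));
    if (t->weights == NULL) goto fail;
    return t;

fail:
    /* Safe even when some members were never assigned. */
    example_table_destroy(t);
    return NULL;
}
```

With malloc instead of calloc, the fail path would pass uninitialized pointers to free whenever an early allocation failed.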
Al | 0356b45069 | [fix] Log errors in numex module if not loaded | 2016-03-21 18:15:53 -04:00
Al | b5807926bc | [fix] Using PRId64 in all cases for int64_t printf formatting | 2016-03-02 16:47:49 -05:00
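
A small illustration of portable int64_t/uint64_t formatting with the <inttypes.h> macros, the idea behind b5807926bc (and the uint64_t format fix in 73d27caeb9). format_ids is a hypothetical helper, not a libpostal function.

```c
#include <inttypes.h>
#include <stdio.h>

/* PRId64/PRIu64 expand to the right conversion for the platform's int64_t
   and uint64_t (for example "ld" on LP64 Linux, "lld" elsewhere). */
int format_ids(char *buf, size_t size, int64_t signed_id, uint64_t unsigned_id) {
    return snprintf(buf, size, "signed=%" PRId64 " unsigned=%" PRIu64,
                    signed_id, unsigned_id);
}
```

A hard-coded "%lld" happens to work where int64_t is long long but is a mismatched format where it is long, which is the kind of warning these commits address.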
Al | d35f97f6f1 | [fix] All file_read_uint64 calls that use stack variables read into a uint64_t not a size_t so as not to smash the stack under a 32-bit arch (issue #18) | 2016-02-29 22:36:00 -05:00
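
A sketch of why d35f97f6f1 reads into a uint64_t: writing an 8-byte value through a pointer to a 4-byte size_t on a 32-bit build would overwrite adjacent stack memory. read_u64_be and read_count are hypothetical helpers, and the big-endian on-disk layout is an assumption for this sketch, not necessarily libpostal's file format.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical reader: decodes 8 bytes as a big-endian unsigned integer.
   The byte order is an assumption made for this sketch. */
bool read_u64_be(FILE *f, uint64_t *out) {
    unsigned char b[8];
    if (fread(b, 1, sizeof b, f) != sizeof b) return false;
    uint64_t v = 0;
    for (size_t i = 0; i < sizeof b; i++) {
        v = (v << 8) | b[i];
    }
    *out = v;
    return true;
}

/* Reads a count into a full-width uint64_t stack variable first, then
   narrows to size_t only after checking that the value fits. */
bool read_count(FILE *f, size_t *count) {
    uint64_t raw;
    if (!read_u64_be(f, &raw)) return false;
#if SIZE_MAX < UINT64_MAX
    if (raw > SIZE_MAX) return false; /* too large for a 32-bit size_t */
#endif
    *count = (size_t)raw;
    return true;
}
```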
Federico Mena Quintero | 2ae2450db7 | [fix] Check the return of malloc() in numex.c | 2016-02-25 14:53:27 -06:00
Al | 98c395d34c | [numex] Concatenating a string of numeric expressions with no intervening tokens like Seventeen Eighty or Ten Oh Four | 2016-02-10 09:21:31 -05:00
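
A sketch of the concatenation in 98c395d34c, assuming each adjacent expression has already been parsed to an integer (Seventeen Eighty as 17 and 80, Ten Oh Four as 10, 0 and 4) and that joining their decimal strings is the desired output. concat_numbers is a hypothetical name.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: writes the decimal digits of each value back to back
   into buf. Returns false if buf is too small to hold the result. */
bool concat_numbers(const int64_t *values, size_t n, char *buf, size_t size) {
    if (size == 0) return false;
    size_t used = 0;
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int written = snprintf(buf + used, size - used, "%" PRId64, values[i]);
        /* snprintf reports the untruncated length, so >= means it didn't fit. */
        if (written < 0 || (size_t)written >= size - used) return false;
        used += (size_t)written;
    }
    return true;
}
```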
Al | 59cf5bfc62 | [numex] Fixing cases with stopwords not attached to a numeric expression | 2016-02-10 08:30:01 -05:00
Al | 1e65fafaaf | [fix] char * | 2016-01-30 13:39:36 -05:00
Al | f8de9d8e5a | [fix] static methods in numex table loading, mallocs instead of stack variables | 2016-01-30 13:25:48 -05:00
Al | deeb8f007e | [fix] Check for result.len > 0 in false start continuation numex parsing, plus additional safety check during replacement | 2015-12-24 02:26:53 -05:00
Al | 2eea999692 | [fix] Fixing false start continuations in numex parsing | 2015-12-23 19:19:14 -05:00
Al | 39e83961ef | [fix] Bug in suffix expansion affecting inseparable suffixes like burg as well as ordinal suffixes like first=>1st | 2015-12-19 01:30:08 -05:00
Al | e0c0ed2d04 | [numex] Return true if numex table already loaded | 2015-12-15 14:28:40 -05:00
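
A sketch of the load-once pattern behind e0c0ed2d04, paired with the teardown behavior from b244aa30f2 (resetting the global pointer to NULL so a later setup reloads instead of reusing freed memory). example_state_t and the example_module_* functions are hypothetical stand-ins for the numex table and its setup/teardown.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a loaded table. */
typedef struct {
    size_t num_rules;
} example_state_t;

static example_state_t *global_state = NULL;

/* Loads once; a second call while already loaded just reports success. */
bool example_module_setup(void) {
    if (global_state != NULL) return true;
    global_state = calloc(1, sizeof(*global_state));
    return global_state != NULL;
}

bool example_module_loaded(void) {
    return global_state != NULL;
}

/* Frees the state and resets the pointer so a later setup reloads instead
   of treating a dangling pointer as "already loaded". */
void example_module_teardown(void) {
    free(global_state);
    global_state = NULL;
}
```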
Al | 1a1d74785c | [fix] Compiler warnings for casts/printf | 2015-10-26 18:52:18 -04:00
Al | b11362ab98 | [numex] using module init method for building, otherwise passing NULL path uses the default path | 2015-09-16 21:13:05 -04:00
Al | e122824448 | [expansion] Adding the ability to search address dictionary phrases with a NULL language, will return phrases in any language | 2015-09-15 14:00:26 -04:00
Al | 2eb67ad850 | [phrases] trie_search_prefixes/trie_search_suffixes now take a length param | 2015-08-09 02:01:37 -04:00
Al | 5df9e123af | [numex] Fix to whole_tokens_only numeric expression parsing where numex was pushing a number onto the stack upon encountering a new rule context even though the token was not completely parsed | 2015-08-08 20:49:54 -04:00
Al | df1410da8c | [numex] Fixing numex parsing for lone stopwords and certain prefix matches that were getting mistakenly converted e.g. settembre => 7mbre | 2015-07-28 18:11:23 -04:00
Al | a16f0dabcb | [numex] Fixing hyphen-initial numeric phrases that end the string | 2015-07-28 03:28:44 -04:00
Al | d2539f5b57 | [numex] Fixing case of hyphen/space-initial phrases in numex, as well as whole token only languages with ordinals | 2015-07-27 01:44:33 -04:00
Al | 359cd62e20 | [numex] Adding a replace_numeric_expressions method (returns NULL if no replacements were made), fixing lengths in situations where two unrelated numbers are joined by a stopword e.g. in the phrase "one and one" the "and" acts as a delimiter vs a phrase where the stopword acts as a joiner like "one hundred and twenty" | 2015-07-24 15:31:05 -04:00
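
A toy version of the joiner/delimiter distinction in 359cd62e20, where "one hundred and twenty" should become 120 but "one and one" should stay 1 and 1. The rule used here (join when the left value is a nonzero multiple of 100 and the right value is below 100) is an assumption made for illustration; the commit does not state the actual rule. stopword_joins and apply_and are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed rule for illustration only: "and" joins when the left side is a
   round hundreds/thousands value and the right side is below 100. */
static bool stopword_joins(int64_t left, int64_t right) {
    return left >= 100 && left % 100 == 0 && right >= 0 && right < 100;
}

/* Writes the result of "<left> and <right>" to out; returns how many
   numbers it produced (1 if joined, 2 if "and" acted as a delimiter). */
size_t apply_and(int64_t left, int64_t right, int64_t out[2]) {
    if (stopword_joins(left, right)) {
        out[0] = left + right;
        return 1;
    }
    out[0] = left;
    out[1] = right;
    return 2;
}
```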
Al | 4fd4fa7dca | [fix] moving int string size constants to string_utils.h | 2015-07-02 17:50:09 -04:00
Al | 6a8ab48662 | [numex] Adding method to get ordinal suffixes, using single representation | 2015-06-25 17:28:06 -04:00
Al | 5f5efad6ac | [numex] Working numex implementation. Tested on most languages, Germanic, Latin/whole_tokens_only, English concatenated or with separators, French numerals like quatre-vingt-douze, Spanish multiple-token ordinals, Japanese numerals, etc. All looking good | 2015-06-12 16:21:36 -04:00
Al | fd1ebba720 | [numex] Initial implementation of multilingual numeric expression parser | 2015-06-08 21:29:04 -04:00
Al | b244aa30f2 | [numex] Setting numex_table to NULL during teardown, adding some logging | 2015-06-04 23:57:52 -04:00
Al | 3bd5172afd | [numex] Adding NUMEX_NULL_RULE at the first index | 2015-06-04 17:21:44 -04:00
Al | 7d3ef39463 | [numex] struct/method changes for new ordinal indicators | 2015-06-04 03:15:51 -04:00
Al | 2d5d854754 | [fix] compilation/warnings | 2015-06-02 13:43:55 -04:00
Al | 080f382065 | [numex] Removing concatenated property from language struct as all numeric spellouts might be concatenated | 2015-06-01 17:12:07 -04:00
Al | 920e15bd4d | [numex] Adding numex setup/IO methods | 2015-06-01 15:43:23 -04:00