Commit Graph

29 Commits

Author SHA1 Message Date
Al
283be99b44 [numex] helper function to retrieve ordinal suffix lengths from a tokenized string for use in deduping 2018-02-24 00:31:26 -05:00
Al
b4fdc51bf9 [numex] changing is_roman_numeral to is_likely_roman_numeral to get rid of most of the false positives like "La" in Spanish, which could be L (=50) + the ordinal suffix "a", but in practice it never means that. Roman numerals of two characters or fewer (whether on their own like "DC" or "MD", or attached to a potential ordinal suffix like "Ce" in French) will be ignored unless they're composed of the more likely, smaller Roman numerals I, V, and X, so VI, IX, etc. are expanded as Roman numerals but LI is not. 2017-12-27 19:38:02 -05:00
Al
1a64ad682b [merge] merging in the Ohio expansion numex changes from master 2017-11-29 11:51:43 -05:00
Al
c276cf1529 [numex] adding a new type of left context for numeric expressions called concat_only_if_number (for something like "oh", which can be "Columbus, OH" or something like "Twenty-One Oh One") 2017-11-24 15:36:53 -05:00
Al
1fbc238b60 [numex] adding functions to parse and validate a Roman numeral 2017-10-20 02:45:32 -04:00
Al
f3adde746e [numex] adding ability to handle the degree symbol in numex parsing since it's technically a separate token 2017-04-19 20:18:21 -04:00
Al
92051863ba [numex] adding ordinal suffixes themselves to the numex trie so they can be removed from strings 2017-04-18 17:20:02 -04:00
Al
a3506131fe [build] adding libpostal_setup_datadir, libpostal_setup_parser_datadir, libpostal_setup_language_classifier_datadir functions for configuring the datadir at runtime 2017-01-09 16:11:26 -05:00
Al
b11362ab98 [numex] using module init method for building, otherwise passing NULL path uses the default path 2015-09-16 21:13:05 -04:00
Al
bbaa302e2e [fix] NUMEX_STOPWORD_RULE define 2015-08-09 01:03:23 -04:00
Al
9b69d1f67a [fix] Removing C++ checks from all but the main API functions 2015-08-07 17:15:39 -04:00
Al
359a1efb03 [fix] Adding stdint.h include to most of the header files for portability 2015-08-07 02:43:44 -04:00
Al
61d586fa1d [config] config.h=>libpostal_config.h so as not to conflict with autoconf 2015-08-06 17:50:55 -04:00
Al
9bc902f575 [numex] LATIN_LANGUAGE_CODE constant for Roman numeral normalization 2015-07-28 18:12:12 -04:00
Al
b812d90c59 [fix] specifying numex dir with cross-platform PATH_SEPARATOR 2015-07-27 12:36:06 -04:00
Al
359cd62e20 [numex] Adding a replace_numeric_expressions method (returns NULL if no replacements were made), fixing lengths in situations where two unrelated numbers are joined by a stopword e.g. in the phrase "one and one" the "and" acts as a delimiter vs a phrase where the stopword acts as a joiner like "one hundred and twenty" 2015-07-24 15:31:05 -04:00
Al
6a8ab48662 [numex] Adding method to get ordinal suffixes, using single representation 2015-06-25 17:28:06 -04:00
Al
5c2839e534 [numex] header and table builder changes to support whole-word languages 2015-06-12 16:10:57 -04:00
Al
c1bed8b410 [numex] header changes 2015-06-08 21:29:36 -04:00
Al
6267b3a431 [numex] Adding numex phrase structure to the API 2015-06-07 23:56:24 -04:00
Al
3400a59e1c [numex] adding a NUMEX_NULL_RULE 2015-06-04 17:21:16 -04:00
Al
d98c535c52 [numex] Adding ordinal indicator to enum 2015-06-04 11:25:24 -04:00
Al
7d3ef39463 [numex] struct/method changes for new ordinal indicators 2015-06-04 03:15:51 -04:00
Al
7dcb4bf6f4 [numex] correct signature 2015-06-02 16:08:25 -04:00
Al
93d65d0186 [numex] numex table builder, fix to constant 2015-06-02 13:57:34 -04:00
Al
2d5d854754 [fix] compilation/warnings 2015-06-02 13:43:55 -04:00
Al
4ad978f22c [numex] Using the new representation for generated data 2015-06-02 12:28:07 -04:00
Al
080f382065 [numex] Removing concatenated property from language struct as all numeric spellouts might be concatenated 2015-06-01 17:12:07 -04:00
Al
c0347a3431 [numex] numex header and structs 2015-06-01 15:41:34 -04:00