Al
|
90d4da9e72
|
[geodb] Adding an is_canonical bit field to geodb trie values
|
2015-07-28 19:08:24 -04:00 |
|
Al
|
9bc902f575
|
[numex] LATIN_LANGUAGE_CODE constant for Roman numeral normalization
|
2015-07-28 18:12:12 -04:00 |
|
Al
|
df1410da8c
|
[numex] Fixing numex parsing for lone stopwords and certain prefix matches that were mistakenly being converted, e.g. settembre => 7mbre
|
2015-07-28 18:11:23 -04:00 |
|
Al
|
a16f0dabcb
|
[numex] Fixing hyphen-initial numeric phrases that end the string
|
2015-07-28 03:28:44 -04:00 |
|
Al
|
3dc6115a4e
|
[dictionaries] Updates to English and Spanish dictionaries on looking through a data set of real test addresses
|
2015-07-27 16:42:09 -04:00 |
|
Al
|
0f5b69c06b
|
[fix] transition to SEARCH_STATE_NO_MATCH in trie_search_tokens_from_index on a return to the start node
|
2015-07-27 16:35:27 -04:00 |
|
Al
|
243f327928
|
[fix] NULL check
|
2015-07-27 16:32:01 -04:00 |
|
Al
|
7aee159c0c
|
[utils] string_tree_num_tokens
|
2015-07-27 12:36:34 -04:00 |
|
Al
|
b812d90c59
|
[fix] specifying numex dir with cross-platform PATH_SEPARATOR
|
2015-07-27 12:36:06 -04:00 |
|
Al
|
7ff9a6054d
|
[geodb] trim strings in geodb builder
|
2015-07-27 02:37:20 -04:00 |
|
Al
|
053b987d58
|
[normalize] adding an option for string trimming in normalize
|
2015-07-27 01:59:14 -04:00 |
|
Al
|
b94526a27b
|
[utils] Making string_trim handle all kinds of UTF-8 whitespace/separators
|
2015-07-27 01:55:46 -04:00 |
|
Al
|
eab4c554d6
|
[numex] Regenerating numex data file
|
2015-07-27 01:53:13 -04:00 |
|
Al
|
0ab1434f20
|
[numex] Making all languages except the ideographic writing systems (CJK) whole_tokens_only for numex. Otherwise non-number prefixes may accidentally get converted into numbers. May add some more options around this in the future.
|
2015-07-27 01:52:44 -04:00 |
|
Al
|
d2539f5b57
|
[numex] Fixing case of hyphen/space-initial phrases in numex, as well as whole token only languages with ordinals
|
2015-07-27 01:44:33 -04:00 |
|
Al
|
8ff4ace63b
|
[phrases] Allowing trie_search to process tokenized input with or without whitespace, and to handle ideographic characters correctly
|
2015-07-26 23:41:57 -04:00 |
|
Al
|
38b10b9dd0
|
[fix] Clearing paths before reuse in geodb_builder
|
2015-07-26 23:36:34 -04:00 |
|
Al
|
93042761ac
|
[fix] warnings in string_utils.c
|
2015-07-26 23:36:03 -04:00 |
|
Al
|
50ee95ff7d
|
[geodb] Adding a msgpack'd list of ids for naked string keys in geodb builder
|
2015-07-25 18:42:13 -04:00 |
|
Al
|
a67ec44a08
|
[utils] cstring_array_terminate, moving msgpack_utils to separate file
|
2015-07-25 18:41:02 -04:00 |
|
Al
|
42f6be7434
|
[fix] county road
|
2015-07-25 14:19:38 -04:00 |
|
Al
|
2ff8c0fd1e
|
[transliteration] fixing length-based transliteration
|
2015-07-25 13:53:28 -04:00 |
|
Al
|
71ffdf9cbc
|
[expansion] tokenized version of search_address_dictionaries
|
2015-07-25 13:50:53 -04:00 |
|
Al
|
ee96dab93c
|
[fix] unnecessary headers
|
2015-07-25 13:49:42 -04:00 |
|
Al
|
e549e76806
|
[utils] string_tree_iterator_foreach_token
|
2015-07-25 13:49:02 -04:00 |
|
Al
|
2adaf475c2
|
[utils] cstring_array (contiguous) to array of malloc'd strings
|
2015-07-25 12:14:01 -04:00 |
|
Al
|
e9277d7339
|
[utils] vector extend method
|
2015-07-25 01:33:45 -04:00 |
|
Al
|
cdb9afddd3
|
[fix] address training data carriage returns
|
2015-07-25 00:35:27 -04:00 |
|
Al
|
9fb1eae877
|
[expansion] Regenerating address data file
|
2015-07-24 16:09:22 -04:00 |
|
Al
|
cff72a0cb3
|
[dictionaries] Adding a few versions of the phrase "centro commercial" in French, Spanish and Italian after a review of addresses in those languages
|
2015-07-24 16:07:34 -04:00 |
|
Al
|
351c7c8c2e
|
[expansion] Add concatenated suffixes to the suffix keyspace of the address dictionary trie and concatenated prefixes and elisions to the prefix keyspace
|
2015-07-24 16:02:47 -04:00 |
|
Al
|
90a91cadd0
|
[search] Modifying trie_search_prefixes to use the new key schema
|
2015-07-24 15:59:49 -04:00 |
|
Al
|
bb7688d8d1
|
[phrases] trie_add_prefix method and a schema for prefix keys, e.g. elisions in French and Italian, separable prefixes like Hinter in German, etc.
|
2015-07-24 15:56:19 -04:00 |
|
Al
|
359cd62e20
|
[numex] Adding a replace_numeric_expressions method (returns NULL if no replacements were made), fixing lengths in situations where two unrelated numbers are joined by a stopword, e.g. in the phrase "one and one" the "and" acts as a delimiter, vs. a phrase where the stopword acts as a joiner, like "one hundred and twenty"
|
2015-07-24 15:31:05 -04:00 |
|
Al
|
12959aa483
|
[numex] Re-generating numex data
|
2015-07-24 15:24:03 -04:00 |
|
Al
|
5239c365d0
|
[docs] Adding some documentation for normalize.h options
|
2015-07-24 15:23:25 -04:00 |
|
Al
|
caf714f06f
|
[fix] typo and frivolous key
|
2015-07-24 15:22:57 -04:00 |
|
Al
|
87566bb6a5
|
[numex] Adding validation checks for numex JSON
|
2015-07-24 15:22:07 -04:00 |
|
Al
|
96538469dd
|
[utils] Adding a cstring_array_foreach macro
|
2015-07-23 15:57:12 -04:00 |
|
Al
|
27af28eacf
|
[expansion] Changes to address_expansion struct to allow for multiple dictionaries per record. Only adding unique canonical strings to the string array
|
2015-07-22 20:35:29 -04:00 |
|
Al
|
454be89121
|
[expansion] generated header and data files
|
2015-07-22 20:31:54 -04:00 |
|
Al
|
b27af13f8a
|
[expansion] Adding an array of dictionaries to each (phrase, canonical) pair
|
2015-07-22 20:24:14 -04:00 |
|
Al
|
0a9e92f11f
|
[expansion] Adding both key (for membership tests) and language-prefixed key to address dictionary
|
2015-07-22 17:21:09 -04:00 |
|
Al
|
09004aa5f1
|
[expansion] Constant for the "all" dictionary
|
2015-07-22 17:18:19 -04:00 |
|
Al
|
f61d993157
|
[expansion] removing the self param from address_dictionary methods, adding search_address_dictionaries method which searches a string for phrases in a particular language
|
2015-07-22 03:51:28 -04:00 |
|
Al
|
3da4b5d8c2
|
[numex] New numex generated data file
|
2015-07-22 02:24:16 -04:00 |
|
Al
|
ba8ff2b0c6
|
[expansion] Language prefixed keys
|
2015-07-22 02:16:22 -04:00 |
|
Al
|
157727d249
|
[fix] method name, strlen and fclose
|
2015-07-22 02:15:45 -04:00 |
|
Al
|
64a63fdf51
|
[mv] Moving all repo data files to a resources dir; data is now only for runtime files
|
2015-07-21 18:11:36 -04:00 |
|
Al
|
a38b924c5d
|
[fix] add_token_alternatives
|
2015-07-21 17:26:59 -04:00 |
|