Al | 71ffdf9cbc | [expansion] tokenized version of search_address_dictionaries | 2015-07-25 13:50:53 -04:00
Al | ee96dab93c | [fix] unnecessary headers | 2015-07-25 13:49:42 -04:00
Al | e549e76806 | [utils] string_tree_iterator_foreach_token | 2015-07-25 13:49:02 -04:00
Al | 2adaf475c2 | [utils] cstring_array (contiguous) to array of malloc'd strings | 2015-07-25 12:14:01 -04:00
Al | e9277d7339 | [utils] vector extend method | 2015-07-25 01:33:45 -04:00
Al | 9fb1eae877 | [expansion] Regenerating address data file | 2015-07-24 16:09:22 -04:00
Al | 351c7c8c2e | [expansion] Add concatenated suffixes to the suffix keyspace of the address dictionary trie and concatenated prefixes and elisions to the prefix keyspace | 2015-07-24 16:02:47 -04:00
Al | 90a91cadd0 | [search] Modifying trie_search_prefixes to use the new key schema | 2015-07-24 15:59:49 -04:00
Al | bb7688d8d1 | [phrases] trie_add_prefix method and a schema for prefix keys, e.g. elisions in French and Italian, separable prefixes like Hinter in German, etc. | 2015-07-24 15:56:19 -04:00
Al | 359cd62e20 | [numex] Adding a replace_numeric_expressions method (returns NULL if no replacements were made); fixing lengths in situations where two unrelated numbers are joined by a stopword, e.g. in the phrase "one and one" the "and" acts as a delimiter, vs. a phrase like "one hundred and twenty" where the stopword acts as a joiner | 2015-07-24 15:31:05 -04:00
Al | 12959aa483 | [numex] Re-generating numex data | 2015-07-24 15:24:03 -04:00
Al | 5239c365d0 | [docs] Adding some documentation for normalize.h options | 2015-07-24 15:23:25 -04:00
Al | 96538469dd | [utils] Adding a cstring_array_foreach macro | 2015-07-23 15:57:12 -04:00
Al | 27af28eacf | [expansion] Changes to address_expansion struct to allow for multiple dictionaries per record. Only adding unique canonical strings to the string array | 2015-07-22 20:35:29 -04:00
Al | 454be89121 | [expansion] generated header and data files | 2015-07-22 20:31:54 -04:00
Al | 0a9e92f11f | [expansion] Adding both key (for membership tests) and language-prefixed key to address dictionary | 2015-07-22 17:21:09 -04:00
Al | 09004aa5f1 | [expansion] Constant for the "all" dictionary | 2015-07-22 17:18:19 -04:00
Al | f61d993157 | [expansion] removing the self param from address_dictionary methods, adding search_address_dictionaries method which searches a string for phrases in a particular language | 2015-07-22 03:51:28 -04:00
Al | 3da4b5d8c2 | [numex] New numex generated data file | 2015-07-22 02:24:16 -04:00
Al | ba8ff2b0c6 | [expansion] Language prefixed keys | 2015-07-22 02:16:22 -04:00
Al | 157727d249 | [fix] method name, strlen and fclose | 2015-07-22 02:15:45 -04:00
Al | a38b924c5d | [fix] add_token_alternatives | 2015-07-21 17:26:59 -04:00
Al | 71be52275d | [tokenization] Adding a version of tokenize which keeps whitespace tokens | 2015-07-21 17:26:20 -04:00
Al | 5d21cb1604 | [expansion] Address dictionary builder | 2015-07-21 16:46:57 -04:00
Al | 6eccde0df8 | [fix] trie_set_data_at_index | 2015-07-21 16:46:38 -04:00
Al | c798876b3d | [expansion] Address dictionary allocation, I/O, get/set | 2015-07-21 16:46:15 -04:00
Al | 3509b203f8 | [gazetteers] Moving data out of the header file | 2015-07-21 16:06:49 -04:00
Al | 179918917a | [fix] header guard and include | 2015-07-21 15:38:45 -04:00
Al | f99a90d64e | [expansion] Generated data file for address expansions | 2015-07-21 15:38:10 -04:00
Al | 68a6d8ee33 | [fix] return NULL from transliterator_read on failure | 2015-07-21 00:58:01 -04:00
Al | 9360ff2c4b | [geodb] geodb_builder using new trie_get/set_data_at_index methods | 2015-07-20 16:53:48 -04:00
Al | 9374745140 | [fix] var name and placement | 2015-07-20 16:53:19 -04:00
Al | 9f697e0256 | [transliteration] transliterate now using the new trie_get_data_at_index API | 2015-07-20 16:47:56 -04:00
Al | 7f96726e82 | [phrases] Adding trie_get_data/trie_set_data + at_index methods | 2015-07-20 16:39:58 -04:00
Al | b9771921fc | [fix] Path joins in geodb_builder use new char_array methods | 2015-07-20 16:31:43 -04:00
Al | d55d505329 | [phrases] trie_get_data and trie_set_data interface for simpler dictionary-style trie get/set | 2015-07-20 16:29:48 -04:00
Al | 1d7247d7e1 | [polygons] Adding Belgium regional languages | 2015-07-17 00:53:25 -04:00
Al | 5f2be3022b | [expansion] dictionary_type_t enum instead of uint64_t | 2015-07-16 03:49:37 -04:00
Al | f713c53993 | [utils] Adding an option to char_array_add_joined to strip separators for path manipulation | 2015-07-16 03:49:00 -04:00
Al | f181c04e7a | [expansion] expansion rule structs and Python script to generate rules from the dictionaries tree. Note that a canonical_index of -1 indicates that a given phrase is itself the canonical form (saves space) | 2015-07-16 02:49:53 -04:00
Al | a8b2fb5b90 | [tokenization] Regenerating scanner file | 2015-07-14 18:16:24 -04:00
Al | 43293d0ae3 | [tokenization] Fixing tokenization where mid-number characters appear in the middle of a word+numeric sequence, e.g. Zigor,2 should be 3 separate tokens. Sequences like 35,37,39 are still treated as a single token for the moment. | 2015-07-14 18:15:58 -04:00
Al | a9967ec9bd | [numex] Regenerating numex file | 2015-07-13 01:16:39 -04:00
Al | 86fe289320 | [numex] Re-generated numex data file | 2015-07-13 00:56:48 -04:00
Al | fbef0a15fe | [geodb] Adding sparkey dependency | 2015-07-09 15:26:11 -04:00
Al | 4f1b4756d0 | [geodb] Adding builder program (requires 11GB disk space and ~4GB RAM to build, but only ~300MB RAM to use after building) | 2015-07-09 15:25:29 -04:00
Al | 8889a5c0c3 | [geodb] GeoDB memory allocation and I/O | 2015-07-09 15:01:06 -04:00
Al | 2d5641892a | [config] lower Bloom filter error rate | 2015-07-09 14:59:23 -04:00
Al | 20c6436e6d | [geodisambig] Return success if admin1/admin2 IDs are 0 | 2015-07-09 04:19:49 -04:00
Al | 20303ad94f | [geohash] Adding bounds checks from python-geohash | 2015-07-09 04:13:53 -04:00