| Al | b5807926bc | [fix] Using PRId64 in all cases for int64_t printf formatting | 2016-03-02 16:47:49 -05:00 |
| Al | 72fa6c0a6c | [fix] numex_table builder program using new API (heap-allocated strings) | 2016-03-02 16:28:28 -05:00 |
| Al | 999a9e24cb | [numex] Regenerating numex_data.c | 2016-03-02 16:11:09 -05:00 |
| Al | d1f62ddc63 | [dictionaries] Re-generating address_expansion_data.c | 2016-03-02 16:10:30 -05:00 |
| Al | 1ddc69d984 | [fix] var declaration during trie creation | 2016-03-02 16:05:32 -05:00 |
| Al | 122397759d | [dictionaries] Re-generating address_expansion_data.c | 2016-03-01 18:37:40 -05:00 |
| Al | d35f97f6f1 | [fix] All file_read_uint64 calls that use stack variables read into a uint64_t not a size_t so as not to smash the stack under a 32-bit arch (issue #18) | 2016-02-29 22:36:00 -05:00 |
| Federico Mena Quintero | 4eac38c40c | [fix] Check the return of malloc() in geonames.c | 2016-02-25 14:53:31 -06:00 |
| Federico Mena Quintero | 2ae2450db7 | [fix] Check the return of malloc() in numex.c | 2016-02-25 14:53:27 -06:00 |
| Federico Mena Quintero | b172071d3b | [fix] Remove superfluous #define; the caller actually uses sizeof(DEFAULT_ALPHABET) itself | 2016-02-25 14:53:27 -06:00 |
| Federico Mena Quintero | 10c6768b5b | [fix] Don't leak the trie if the number of nodes can't be read from a file | 2016-02-25 14:53:27 -06:00 |
| Federico Mena Quintero | e60ad47677 | [fix] Check return of malloc() in trie.c | 2016-02-25 14:53:22 -06:00 |
| Al | 87cf63942e | [dictionaries] Regenerating address_expansion_data.c | 2016-02-22 18:39:38 -05:00 |
| Al | 37c09d1ed9 | [api] Adding function to free expansions from expand_address | 2016-02-16 10:56:45 -05:00 |
| Al | 98165e89ad | [api] Using bools instead of bit fields in the public API | 2016-02-15 18:33:39 -05:00 |
| Al | cf2a79bef1 | [api] Default options accessible through getters, not static structs | 2016-02-15 17:34:00 -05:00 |
| Al | 98c395d34c | [numex] Concatenating a string of numeric expressions with no intervening tokens like Seventeen Eighty or Ten Oh Four | 2016-02-10 09:21:31 -05:00 |
| Al | 59cf5bfc62 | [numex] Fixing cases with stopwords not attached to a numeric expression | 2016-02-10 08:30:01 -05:00 |
| Al | c32ef9ccf8 | [fix] freeing up iterator in normalize_string | 2016-02-09 01:06:51 -05:00 |
| Al | 12c2477359 | [phrases] Another fix to tail token search | 2016-02-08 17:55:21 -05:00 |
| Al | 39f162b029 | [phrases] fix in tokenized tail search when whitespace tokens are preserved | 2016-02-08 16:37:52 -05:00 |
| Al | 84d5ba18f0 | [api] Fixing multi-language expansions with overlapping expansions, whitespace, utf8 normalization of canonical strings | 2016-02-08 02:50:34 -05:00 |
| Al | 0695738253 | [fix] cleaning up memory in normalize_string_languages | 2016-02-08 02:43:12 -05:00 |
| Al | afd5844f21 | [normalize] Permuting transliterators only once on the entire string rather than at each script break (so # permutations is bounded and can't get huge). Fixing some spacing issues. Adding method to check for an alpha+numeric token in normalization. | 2016-02-08 01:16:47 -05:00 |
| Al | aaad213a20 | [cli] Adding printf while models are being loaded in address parser cli | 2016-02-08 01:10:06 -05:00 |
| Al | 9ac0379a65 | [phrases] Case where trie search finds a match, makes progress beyond the next token but has to fall back. Adding trie search test case | 2016-02-08 01:07:56 -05:00 |
| Al | 3701d8380f | [cli] Command-line expansion client now supports piping in stdin, Unix-style | 2016-02-03 13:48:51 -05:00 |
| Al Barrentine | 7536fa4647 | [fix] static inline | 2016-02-02 00:53:13 -05:00 |
| Al | c0b548833b | [fix] create data dir if it doesn't exist | 2016-01-30 13:40:10 -05:00 |
| Al | 1e65fafaaf | [fix] char * | 2016-01-30 13:39:36 -05:00 |
| Al | f8de9d8e5a | [fix] static methods in numex table loading, mallocs instead of stack variables | 2016-01-30 13:25:48 -05:00 |
| Al | 085bfd6ada | [fix] static methods for libpostal.c | 2016-01-30 02:20:59 -05:00 |
| Al | 63d239eef0 | [tokenization] Using the new re2c 0.16 generates a 75% smaller DFA for scanner, should speed up compile times on gcc | 2016-01-30 02:20:01 -05:00 |
| Al | 9b3296914a | [build] Defining LIBPOSTAL_DATA_DIR at compile time, not configure | 2016-01-30 02:18:12 -05:00 |
| Al | cd76c660d8 | [fix] French numex | 2016-01-28 16:40:50 -05:00 |
| Al | 95a7978131 | [build] Adding relevant language_classifier sources to build | 2016-01-27 03:34:35 -05:00 |
| Al | 93ed2bf15b | [api] Making language optional in libpostal cli | 2016-01-27 03:32:29 -05:00 |
| Al | 789db8f582 | [build] Adding language classifier to data file download script. As the current file is rather large, added multipart downloads from S3 to speed things up | 2016-01-27 03:31:45 -05:00 |
| Al | 42d169feee | [api] Libpostal expand API will now detect language automatically using a high accuracy language classifier trained on OSM streets/addresses/toponyms. Hooray batch geocoding! | 2016-01-27 03:23:51 -05:00 |
| Al | 71c51f2e45 | [language_classification] Making directory optional on language_classifier client/test program | 2016-01-27 03:18:53 -05:00 |
| Al | c770468d03 | [expansion] Regenerated address_expansion_data.c | 2016-01-27 03:17:59 -05:00 |
| Al | 36f52d9707 | [fix] Removing feature printing | 2016-01-26 15:34:56 -05:00 |
| Al | 5077462754 | [fix] temporary files for language classifier training | 2016-01-26 01:42:21 -05:00 |
| Al | 426edccbf8 | [language_classification] Simple accuracy-based test program for language classifier. | 2016-01-26 01:29:56 -05:00 |
| Al | 9abbf42bf4 | [language_classifier] Command-line client for language classification | 2016-01-26 01:20:59 -05:00 |
| Al | 314b65e192 | [build] Adding shuffle.c to language_classifier_train | 2016-01-26 01:18:35 -05:00 |
| Al | ababb8f2d0 | [fix] sign comparison in regularized gradient computation for logistic regression | 2016-01-26 01:16:16 -05:00 |
| Al | ae2b839f17 | [build] Adding language classifier train/test/cli programs to the build | 2016-01-26 00:09:07 -05:00 |
| Al | 5d5d5713cc | [transliteration] Regenerating transliterator scripts | 2016-01-18 12:04:14 -05:00 |
| Al | 0dfd8d6439 | [language_classification] Adding script feature for any non-Latin script. Even if the script doesn't directly identify the language, it can act as a modified intercept (all Han script addresses will share the Han feature, even if we haven't seen one of the > 80k Han characters) | 2016-01-17 21:37:45 -05:00 |