| Al | 98c395d34c | [numex] Concatenating a string of numeric expressions with no intervening tokens like Seventeen Eighty or Ten Oh Four | 2016-02-10 09:21:31 -05:00 |
| Al | 59cf5bfc62 | [numex] Fixing cases with stopwords not attached to a numeric expression | 2016-02-10 08:30:01 -05:00 |
| Al | c32ef9ccf8 | [fix] freeing up iterator in normalize_string | 2016-02-09 01:06:51 -05:00 |
| Al | 12c2477359 | [phrases] Another fix to tail token search | 2016-02-08 17:55:21 -05:00 |
| Al | 39f162b029 | [phrases] fix in tokenized tail search when whitespace tokens are preserved | 2016-02-08 16:37:52 -05:00 |
| Al | 84d5ba18f0 | [api] Fixing multi-language expansions with overlapping expansions, whitespace, utf8 normalization of canonical strings | 2016-02-08 02:50:34 -05:00 |
| Al | 0695738253 | [fix] cleaning up memory in normalize_string_languages | 2016-02-08 02:43:12 -05:00 |
| Al | afd5844f21 | [normalize] Permuting transliterators only once on the entire string rather than at each script break (so # permutations is bounded and can't get huge). Fixing some spacing issues. Adding method to check for an alpha+numeric token in normalization. | 2016-02-08 01:16:47 -05:00 |
| Al | aaad213a20 | [cli] Adding printf while models are being loaded in address parser cli | 2016-02-08 01:10:06 -05:00 |
| Al | 9ac0379a65 | [phrases] Case where trie search finds a match, makes progress beyond the next token but has to fall back. Adding trie search test case | 2016-02-08 01:07:56 -05:00 |
| Al | 3701d8380f | [cli] Command-line expansion client now supports piping in stdin, Unix-style | 2016-02-03 13:48:51 -05:00 |
| Al Barrentine | 7536fa4647 | [fix] static inline | 2016-02-02 00:53:13 -05:00 |
| Al | c0b548833b | [fix] create data dir if it doesn't exist | 2016-01-30 13:40:10 -05:00 |
| Al | 1e65fafaaf | [fix] char * | 2016-01-30 13:39:36 -05:00 |
| Al | f8de9d8e5a | [fix] static methods in numex table loading, mallocs instead of stack variables | 2016-01-30 13:25:48 -05:00 |
| Al | 085bfd6ada | [fix] static methods for libpostal.c | 2016-01-30 02:20:59 -05:00 |
| Al | 63d239eef0 | [tokenization] Using the new re2c 0.16 generates a 75% smaller DFA for scanner, should speed up compile times on gcc | 2016-01-30 02:20:01 -05:00 |
| Al | 9b3296914a | [build] Defining LIBPOSTAL_DATA_DIR at compile time, not configure | 2016-01-30 02:18:12 -05:00 |
| Al | cd76c660d8 | [fix] French numex | 2016-01-28 16:40:50 -05:00 |
| Al | 95a7978131 | [build] Adding relevant language_classifier sources to build | 2016-01-27 03:34:35 -05:00 |
| Al | 93ed2bf15b | [api] Making language optional in libpostal cli | 2016-01-27 03:32:29 -05:00 |
| Al | 789db8f582 | [build] Adding language classifier to data file download script. As the current file is rather large, added multipart downloads from S3 to speed things up | 2016-01-27 03:31:45 -05:00 |
| Al | 42d169feee | [api] Libpostal expand API will now detect language automatically using a high accuracy language classifier trained on OSM streets/addresses/toponyms. Hooray batch geocoding! | 2016-01-27 03:23:51 -05:00 |
| Al | 71c51f2e45 | [language_classification] Making directory optional on language_classifier client/test program | 2016-01-27 03:18:53 -05:00 |
| Al | c770468d03 | [expansion] Regenerated address_expansion_data.c | 2016-01-27 03:17:59 -05:00 |
| Al | 36f52d9707 | [fix] Removing feature printing | 2016-01-26 15:34:56 -05:00 |
| Al | 5077462754 | [fix] temporary files for language classifier training | 2016-01-26 01:42:21 -05:00 |
| Al | 426edccbf8 | [language_classification] Simple accuracy-based test program for language classifier. | 2016-01-26 01:29:56 -05:00 |
| Al | 9abbf42bf4 | [language_classifier] Command-line client for language classification | 2016-01-26 01:20:59 -05:00 |
| Al | 314b65e192 | [build] Adding shuffle.c to language_classifier_train | 2016-01-26 01:18:35 -05:00 |
| Al | ababb8f2d0 | [fix] sign comparison in regularized gradient computation for logistic regression | 2016-01-26 01:16:16 -05:00 |
| Al | ae2b839f17 | [build] Adding language classifier train/test/cli programs to the build | 2016-01-26 00:09:07 -05:00 |
| Al | 5d5d5713cc | [transliteration] Regenerating transliterator scripts | 2016-01-18 12:04:14 -05:00 |
| Al | 0dfd8d6439 | [language_classification] Adding script feature for any non-Latin script. Even if the script doesn't directly identify the language, it can act as a modified intercept (all Han script addresses will share the Han feature, even if we haven't seen one of the > 80k Han characters) | 2016-01-17 21:37:45 -05:00 |
| Al | b9a3230f65 | [language_classification] Removing the per-country classifier, text-based alone is doing close to 99% accuracy now | 2016-01-17 21:13:14 -05:00 |
| Al | f808f74271 | [language_classification] Automatic hyperparameter optimization using either the cross-validation set or two distinct subsets of the training set | 2016-01-17 21:11:37 -05:00 |
| Al | af5689ee52 | [fix] removing unused var | 2016-01-17 21:00:17 -05:00 |
| Al | 7d727fc8f0 | [optimization] Using adapted learning rate in stochastic gradient descent (if lambda > 0) | 2016-01-17 20:59:47 -05:00 |
| Al | 7b300639f1 | [fix] Trie prefix search tail comparison | 2016-01-17 20:56:37 -05:00 |
| Al | 70dbfdd560 | [unicode] Regenerating unicode_script_data.c | 2016-01-17 20:53:44 -05:00 |
| Al | de240d2b94 | [fix] tokenize_add_tokens respects specified length | 2016-01-17 20:51:47 -05:00 |
| Al | 10cadc67d7 | [io] matrix_read using array I/O functions | 2016-01-17 20:40:18 -05:00 |
| Al | baba826d21 | [io] Cutting down on system calls in trie_read | 2016-01-17 20:39:19 -05:00 |
| Al | cba2acc21f | [io] Sparse matrix using array I/O methods | 2016-01-17 20:38:16 -05:00 |
| Al | 46b35c5202 | [utils] Adding functions to read numeric arrays from files | 2016-01-17 20:36:57 -05:00 |
| Al | d4143c1685 | [parsing] Adding an optimization to the parser API where, if the entire input is a single known geographic phrase like New York, it returns the most likely label from the training data. That way e.g. a search for 'Florida' doesn't get tagged as 'house.' This doesn't affect training, only prediction. | 2016-01-15 20:07:21 -05:00 |
| Al | 622dc354e7 | [optimization] Adding learning rate to lazy sparse update in stochastic gradient descent | 2016-01-12 11:04:16 -05:00 |
| Al | 79f2b7c192 | [build] Removing source from libpostal shared lib | 2016-01-12 10:31:22 -05:00 |
| Al | 6a9c1e8c6d | [build] Adding trie_utils.c to address parser train/test | 2016-01-12 10:22:34 -05:00 |
| Al | 7cc201dec3 | [optimization] Moving gamma_t calculation to the header in SGD | 2016-01-11 16:40:50 -05:00 |