716 Commits

Author | SHA1 | Message | Date
Al | d9d53ce17e | [math] Matrix method updates | 2015-12-08 15:39:52 -05:00
Al | 48ee665e71 | [scripts] Benchmark script using default options | 2015-12-08 15:38:44 -05:00
Al | 2fcc72ae07 | [fix] multitoken canonical strings | 2015-12-08 15:38:04 -05:00
Al | a857138d95 | [api] Adding place name expansions by default | 2015-12-08 15:31:36 -05:00
Al | beec43fe15 | [expansion] regenerating expansion data | 2015-12-08 15:28:54 -05:00
Al | e1ea2ac704 | [expansion] Toponym dictionaries can apply to street names and place names | 2015-12-08 02:10:22 -05:00
Al | cbe5cd7429 | [expansion] The ambiguous expansions dictionary shouldn't add to the component bitset | 2015-12-07 20:36:56 -05:00
Al | d35f519629 | [expansion] Fixing case where non-ideographic tokens like # can potentially be concatenated with surrounding tokens and should normalized with whitespace in between | 2015-12-07 19:18:46 -05:00
Al | f5739dd42b | [math] Signatures for array_exp and array_log | 2015-12-07 18:10:04 -05:00
Al | 0d8d396108 | [expansion] Fixing cases like ML King where a global (all languages) expansion subsumes the specific language expansion (like English) | 2015-12-07 18:09:25 -05:00
Al | 9bab70909d | [numex] Always adding a version of the string without Roman numeral expansion since many times those tokens can be ambiguous | 2015-12-07 14:29:18 -05:00
Al | a066ee9aad | [math] Only reallocate on matrix_resize if needed | 2015-12-07 01:20:16 -05:00
Al | cfd0dc69f2 | [parsing] Using the entire phrase as the ith word | 2015-12-07 01:19:38 -05:00
Al | 8186e2606e | [dictionaries] Regenerating address expansion data file | 2015-12-06 16:56:27 -05:00
Al | 44f7fd0844 | [math] Matrix resize | 2015-12-06 03:20:03 -05:00
Al | 596c5ffdd3 | [fix] Tokenized trie search | 2015-12-05 15:21:52 -05:00
Al | 24208c209f | [parsing] Adding a training data derived index of complete phrases from suburb up to country. Only adding bias and word features for non phrases, using UNKNOWN_WORD and UNKNOWN_NUMERIC for infrequent tokens (not meeting minimum vocab count threshold). | 2015-12-05 14:34:19 -05:00
Al | f41158b8b3 | [osm] Avoid using the alternate name (e.g. Brooklyn instead of Kings County) when it is the same as city | 2015-12-05 14:21:07 -05:00
Al | 25e89bcc41 | [fix] tokenized trie search edge case where tail is stored on the space node | 2015-12-03 12:25:21 -05:00
Al | 43287db90a | [normalization/phrases] Fixing a bug which occurs with an already-separated elision | 2015-12-02 16:04:39 -05:00
Al | 746b5d0f34 | [fix] transliterate using string_equals | 2015-12-02 13:09:43 -05:00
Al | d0aaff1482 | [utils] string_equals with NULL check | 2015-12-01 13:12:08 -05:00
Al | f322ae0a1c | [build] adding shuffle.c to Makefile rule | 2015-12-01 11:28:33 -05:00
Al | b94264b745 | [parser] Forgot to add shuffle.h/.c | 2015-12-01 11:25:28 -05:00
Al | 116fe857db | [parser] gshuf (Mac equivalent of shuf) is quite a bit slower than shuf, so removing it. Need to train on Linux unless a better alternative is found for shuffling large files on Mac | 2015-12-01 11:24:44 -05:00
Al | 5f13041140 | [parsing/build] Makefile changes for address parser | 2015-11-30 14:51:43 -05:00
Al | 4ca911baf8 | [parsing] Adding a command-line client (with history) to test address parsing | 2015-11-30 14:51:01 -05:00
Al | 89677d94a3 | [parsing] Initial commit of the address parser, training/testing, feature function, I/O | 2015-11-30 14:48:13 -05:00
Al | e62eb1e697 | [math] Matrix file I/O | 2015-11-30 12:53:18 -05:00
Al | 5682c347ac | [fix] close file handle | 2015-11-30 12:51:13 -05:00
Al | feab77970b | [cli] Adding antirez's linenoise for command-line interfaces | 2015-11-29 11:28:31 -05:00
Al | d3040036ec | [fix] moving separator definitions | 2015-11-28 13:53:13 -05:00
Al | 094a5bf5f4 | [dictionaries] adding Jnr and Snr forms for generational suffixes | 2015-10-28 00:00:34 -04:00
Al | c2d112f4fc | [fix] compile flags in Makefile.am | 2015-10-27 19:01:37 -04:00
Al | 6aaa08c220 | [fix] Usage on libpostal_data script | 2015-10-27 13:33:03 -04:00
Al | 40918812e2 | [normalize] Adding hyphen elimination as a string option (changes tokenization) | 2015-10-27 13:32:47 -04:00
Al | 3fe2365234 | [fix] signed size_t in trie_set_tail | 2015-10-27 13:21:26 -04:00
Al | ad59ba7a7b | [fix] Re-generating transliteration tables | 2015-10-27 12:28:08 -04:00
Al | 1a1d74785c | [fix] Compiler warnings for casts/printf | 2015-10-26 18:52:18 -04:00
Al | 6b456025b4 | [fix] warnings in klib/ksort.h | 2015-10-26 18:50:22 -04:00
Al | 3b3513ffe3 | [fix] warnings in collections.h/vector_math.h | 2015-10-26 18:49:58 -04:00
Al | 83c6a87ab1 | [build] substitution for use of LIBPOSTAL_DATA_DIR in Makefile.am | 2015-10-26 18:47:07 -04:00
Al | a319c1f6a0 | [build] defining LIBPOSTAL_DATA_DIR in Autoconf rather than Automake, becomes part of config.h | 2015-10-26 18:06:05 -04:00
Al | 309d41a652 | [math] adding matrix_zero method | 2015-10-25 21:38:59 -04:00
Al | a2ad829d52 | [math] matrix scalar arithmetic | 2015-10-16 16:26:27 -04:00
Al | 080ccf0ddd | [fix] logging warnings in transliterate | 2015-10-12 13:50:42 -05:00
Al | baef090793 | [logging] Wrapping logging statements in a do while (0) so the compiler always at least sees debug code | 2015-10-12 13:42:10 -05:00
Al | b88f237d82 | [build] Adding separate Makefile target for downloading geodb | 2015-10-11 22:27:25 -05:00
Al | 588cf1df86 | [build] Changing options to libpostal_data script to allow downloading geodb, uploaded first version to S3 | 2015-10-11 22:25:37 -05:00
Al | 372e952cd3 | [geodb] Adding some logging to geodb | 2015-10-11 01:00:08 -05:00