Commit Graph

685 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Travis | 576e91d3fa | [auto][ci skip] Adding data files from Travis build #84 | 2016-03-16 19:08:17 +00:00 |
| Travis | 2dc9643b29 | [auto][ci skip] Adding data files from Travis build #82 | 2016-03-14 16:29:21 +00:00 |
| Al | 0d7f9f2032 | [data] Using UTC dates for libpostal data file tracking for #38. Also silencing curl when checking if file was updated | 2016-03-10 16:44:02 -05:00 |
| Travis | c4203c6ea9 | [auto][ci skip] Adding data files from Travis build #63 | 2016-03-06 18:00:40 +00:00 |
| Travis | 73140a8239 | [auto][ci skip] Adding data files from Travis build #62 | 2016-03-06 17:51:23 +00:00 |
| Travis | d8e0945d5b | [auto][build] Adding data files from Travis build #57 | 2016-03-06 16:11:32 +00:00 |
| Al | b5807926bc | [fix] Using PRId64 in all cases for int64_t printf formatting | 2016-03-02 16:47:49 -05:00 |
| Al | 72fa6c0a6c | [fix] numex_table builder program using new API (heap-allocated strings) | 2016-03-02 16:28:28 -05:00 |
| Al | 999a9e24cb | [numex] Regenerating numex_data.c | 2016-03-02 16:11:09 -05:00 |
| Al | d1f62ddc63 | [dictionaries] Re-generating address_expansion_data.c | 2016-03-02 16:10:30 -05:00 |
| Al | 1ddc69d984 | [fix] var declaration during trie creation | 2016-03-02 16:05:32 -05:00 |
| Al | 122397759d | [dictionaries] Re-generating address_expansion_data.c | 2016-03-01 18:37:40 -05:00 |
| Al | d35f97f6f1 | [fix] All file_read_uint64 calls that use stack variables read into a uint64_t not a size_t so as not to smash the stack under a 32-bit arch (issue #18) | 2016-02-29 22:36:00 -05:00 |
| Federico Mena Quintero | 4eac38c40c | [fix] Check the return of malloc() in geonames.c | 2016-02-25 14:53:31 -06:00 |
| Federico Mena Quintero | 2ae2450db7 | [fix] Check the return of malloc() in numex.c | 2016-02-25 14:53:27 -06:00 |
| Federico Mena Quintero | b172071d3b | [fix] Remove superfluous #define; the caller actually uses sizeof(DEFAULT_ALPHABET) itself | 2016-02-25 14:53:27 -06:00 |
| Federico Mena Quintero | 10c6768b5b | [fix] Don't leak the trie if the number of nodes can't be read from a file | 2016-02-25 14:53:27 -06:00 |
| Federico Mena Quintero | e60ad47677 | [fix] Check return of malloc() in trie.c | 2016-02-25 14:53:22 -06:00 |
| Al | 87cf63942e | [dictionaries] Regenerating address_expansion_data.c | 2016-02-22 18:39:38 -05:00 |
| Al | 37c09d1ed9 | [api] Adding function to free expansions from expand_address | 2016-02-16 10:56:45 -05:00 |
| Al | 98165e89ad | [api] Using bools instead of bit fields in the public API | 2016-02-15 18:33:39 -05:00 |
| Al | cf2a79bef1 | [api] Default options accessible through getters, not static structs | 2016-02-15 17:34:00 -05:00 |
| Al | 98c395d34c | [numex] Concatenating a string of numeric expressions with no intervening tokens like Seventeen Eighty or Ten Oh Four | 2016-02-10 09:21:31 -05:00 |
| Al | 59cf5bfc62 | [numex] Fixing cases with stopwords not attached to a numeric expression | 2016-02-10 08:30:01 -05:00 |
| Al | c32ef9ccf8 | [fix] freeing up iterator in normalize_string | 2016-02-09 01:06:51 -05:00 |
| Al | 12c2477359 | [phrases] Another fix to tail token search | 2016-02-08 17:55:21 -05:00 |
| Al | 39f162b029 | [phrases] fix in tokenized tail search when whitespace tokens are preserved | 2016-02-08 16:37:52 -05:00 |
| Al | 84d5ba18f0 | [api] Fixing multi-language expansions with overlapping expansions, whitespace, utf8 normalization of canonical strings | 2016-02-08 02:50:34 -05:00 |
| Al | 0695738253 | [fix] cleaning up memory in normalize_string_languages | 2016-02-08 02:43:12 -05:00 |
| Al | afd5844f21 | [normalize] Permuting transliterators only once on the entire string rather than at each script break (so # permutations is bounded and can't get huge). Fixing some spacing issues. Adding method to check for an alpha+numeric token in normalization. | 2016-02-08 01:16:47 -05:00 |
| Al | aaad213a20 | [cli] Adding printf while models are being loaded in address parser cli | 2016-02-08 01:10:06 -05:00 |
| Al | 9ac0379a65 | [phrases] Case where trie search finds a match, makes progress beyond the next token but has to fall back. Adding trie search test case | 2016-02-08 01:07:56 -05:00 |
| Al | 3701d8380f | [cli] Command-line expansion client now supports piping in stdin, Unix-style | 2016-02-03 13:48:51 -05:00 |
| Al Barrentine | 7536fa4647 | [fix] static inline | 2016-02-02 00:53:13 -05:00 |
| Al | c0b548833b | [fix] create data dir if it doesn't exist | 2016-01-30 13:40:10 -05:00 |
| Al | 1e65fafaaf | [fix] char * | 2016-01-30 13:39:36 -05:00 |
| Al | f8de9d8e5a | [fix] static methods in numex table loading, mallocs instead of stack variables | 2016-01-30 13:25:48 -05:00 |
| Al | 085bfd6ada | [fix] static methods for libpostal.c | 2016-01-30 02:20:59 -05:00 |
| Al | 63d239eef0 | [tokenization] Using the new re2c 0.16 generates a 75% smaller DFA for scanner, should speed up compile times on gcc | 2016-01-30 02:20:01 -05:00 |
| Al | 9b3296914a | [build] Defining LIBPOSTAL_DATA_DIR at compile time, not configure | 2016-01-30 02:18:12 -05:00 |
| Al | cd76c660d8 | [fix] French numex | 2016-01-28 16:40:50 -05:00 |
| Al | 95a7978131 | [build] Adding relevant language_classifier sources to build | 2016-01-27 03:34:35 -05:00 |
| Al | 93ed2bf15b | [api] Making language optional in libpostal cli | 2016-01-27 03:32:29 -05:00 |
| Al | 789db8f582 | [build] Adding language classifier to data file download script. As the current file is rather large, added multipart downloads from S3 to speed things up | 2016-01-27 03:31:45 -05:00 |
| Al | 42d169feee | [api] Libpostal expand API will now detect language automatically using a high accuracy language classifier trained on OSM streets/addresses/toponyms. Hooray batch geocoding! | 2016-01-27 03:23:51 -05:00 |
| Al | 71c51f2e45 | [language_classification] Making directory optional on language_classifier client/test program | 2016-01-27 03:18:53 -05:00 |
| Al | c770468d03 | [expansion] Regenerated address_expansion_data.c | 2016-01-27 03:17:59 -05:00 |
| Al | 36f52d9707 | [fix] Removing feature printing | 2016-01-26 15:34:56 -05:00 |
| Al | 5077462754 | [fix] temporary files for language classifier training | 2016-01-26 01:42:21 -05:00 |
| Al | 426edccbf8 | [language_classification] Simple accuracy-based test program for language classifier. | 2016-01-26 01:29:56 -05:00 |