Commit Graph

741 Commits

Author SHA1 Message Date
Al
8fe7958969 [build] allowing --disable-data-download option to configure. N.B. this is mostly for people building Docker images. The data files are NOT optional. 2016-12-22 12:31:27 -05:00
Al
09b4e2ba2f [build] pulling in change from parser-data that allows user to pass CFLAGS 2016-12-21 14:39:27 -05:00
Al
8f1e69960f [fix] loading transliteration module in address_parser_test.c as well 2016-12-12 11:37:27 -05:00
Al
3939dd0ca6 [fix] cstring_array_split calls 2016-12-12 11:37:27 -05:00
Al
a42d0e917a [fix] brace 2016-12-12 11:37:27 -05:00
Al
ced8f9ae27 [parser] Ignore multiple spaces in parser input post-normalization. If normalizing the string creates several distinct tokens (namely in vulgar fractions, e.g. ½ => 1/2), add all the sub-tokens with the same label as the parent 2016-12-12 11:37:27 -05:00
Al
b1816e9b70 [utils] Adding cstring_array_split_ignore_consecutive 2016-12-12 11:37:27 -05:00
Al
6baa7087fe [fix] calls and NULL checks 2016-12-12 11:37:27 -05:00
Al
5e07f5e8c5 [fix] tokenized_string_t should copy its source string 2016-12-12 11:37:27 -05:00
Al
521a094a47 [fix] Need to load transliteration module for Latin-ASCII normalization 2016-12-12 11:37:27 -05:00
Al
d575caba8a [data] using UTC for libpostal data files on the Mac version of the download script as well 2016-12-09 19:43:05 -05:00
Al
c3f3896b48 [fix] update test for date function in data download script 2016-12-09 19:29:00 -05:00
Travis
04f8130c46 [auto][ci skip] Adding data files from Travis build #168 2016-10-07 00:46:48 +00:00
Al
01afbf80ef [data] Each curl process will retry the chunk up to 3 times 2016-08-25 23:18:39 -04:00
Travis
de1255af00 [auto][ci skip] Adding data files from Travis build #161 2016-08-23 22:48:20 +00:00
Travis
f19c9852aa [auto][ci skip] Adding data files from Travis build #160 2016-08-23 22:24:19 +00:00
Travis
d797d6c863 [auto][ci skip] Adding data files from Travis build #159 2016-08-23 22:14:07 +00:00
Tom Davis
18c8e90eb3 Use xargs to start workers as soon as possible 2016-07-27 17:46:44 -04:00
Tom Davis
11abf6cb22 Use posix sh for systems without bash 2016-07-26 20:17:18 -04:00
Al Barrentine
65c4688f89 Merge pull request #97 from uberbaud/multipart_edgecase
Don't call `download_multipart` for 1 chunk
2016-07-24 00:03:51 -04:00
Travis
3f0eff228e [auto][ci skip] Adding data files from Travis build #145 2016-07-23 22:28:32 +00:00
Tom Davis
2991ffd193 Don't call download_multipart for 1 chunk
Previously, where a file was larger than `$LARGE_FILE_SIZE` but smaller
than `$CHUNK_SIZE*2`, `download_multipart` would be called but would
only download one (1) chunk that was the whole file.

This fix keeps the same download performance as before but skips the
unnecessary chunk processing.
2016-07-23 16:41:04 -04:00
Tom Davis
24e0314e71 Remove call to seq which may not exist 2016-07-23 01:03:15 -04:00
Al Barrentine
e02c6adc85 Merge pull request #91 from uberbaud/openbsd
Add support for OpenBSD
2016-07-20 19:47:18 -04:00
Tom Davis
c0366147e8 Add support for OpenBSD 2016-07-20 18:19:31 -04:00
Tom Davis
a8bb798ce0 Call libpostal_data in source path, not build path
This fix updates Makefile to find the actual libpostal_data file when
`configure` is called from another directory, which it uses as the build
directory.
2016-07-20 17:31:52 -04:00
Travis
a0f6e100f1 [auto][ci skip] Adding data files from Travis build #133 2016-07-17 19:13:46 +00:00
Al
12d50aac12 Merge branch 'master' of https://github.com/openvenues/libpostal 2016-07-17 15:03:52 -04:00
Al
83381e9d8a [expand] Adding exception for a few types of special punctuation (ampersand, plus, pound sign) which should be left in the original string and separated by whitespace. Closes #84. Closes #85 2016-07-17 15:02:47 -04:00
Travis
2fb677ca73 [auto][ci skip] Adding data files from Travis build #132 2016-07-17 18:47:28 +00:00
David Farrell
a7a9708d2b don't error on multiple setup_parser() 2016-07-17 11:25:03 -04:00
Al
d7996ed56c [fix] setting garbage pointer to NULL on language_classifier_teardown (fixes #82) 2016-07-17 01:56:09 -04:00
Al
ce78064988 [fix] NULL checks 2016-07-15 13:23:23 -04:00
Al
2f5f226faa [fix] Add original string to normalizations if all options were set to false 2016-07-15 13:23:23 -04:00
Al
e816b4f77e [parser] Ignore language/country options explicitly in the parser. The purpose of these options is to be able to create language-specific/country-specific models at some point; they shouldn't be used in the global model 2016-07-06 14:56:46 -04:00
Al
58a5dbe7e0 [logging] Logging the value of LIBPOSTAL_DATA_DIR when a setup error occurs 2016-07-01 14:51:04 -04:00
Al
ad9dfb46bd [build] Using a process pool with 64MB chunks (similar to aws cli) for S3 downloads. Setting the max concurrent requests to 10, also the default in aws cli. 2016-07-01 14:37:13 -04:00
Al
a9ba61585b [fix] Adding set -e to data download script so it fails if any subcommands fail 2016-05-04 23:08:06 -04:00
Al
9819ebf949 [fix] always include expansions in the ambiguous expansion dictionary, no matter which component 2016-04-29 13:26:13 -04:00
Al
0bc3550c11 [expansion] Adding address_expansion_in_dictionary 2016-04-29 13:23:48 -04:00
Al
59e5fcd1b4 [fix] LC_ALL=C in data download script 2016-04-11 12:47:50 -04:00
Travis
b8d4d71522 [auto][ci skip] Adding data files from Travis build #112 2016-03-30 20:04:52 +00:00
Al
14e8f50cf1 [fix] Expansions when passing in the address_components= option. Was only limiting results at the phrase level, should work at the individual expansion level 2016-03-29 16:46:29 -04:00
Travis
2795d258d1 [auto][ci skip] Adding data files from Travis build #108 2016-03-29 19:11:57 +00:00
Al
6dad58c696 [fix][ci skip] last remaining instance of vignt in libpostal 2016-03-29 12:51:19 -04:00
Travis
08d873ac15 [auto][ci skip] Adding data files from Travis build #105 2016-03-29 15:39:14 +00:00
Travis
49adcfe9b5 [auto][ci skip] Adding data files from Travis build #97 2016-03-22 14:33:13 +00:00
Al
25c8ba8603 [fix] Log more helpful error message in language_classifier if not loaded 2016-03-21 18:18:25 -04:00
Al
0356b45069 [fix] Log errors in numex module if not loaded 2016-03-21 18:15:53 -04:00
Al
943cd4443a [fix] Log errors if address dictionaries not loaded 2016-03-21 18:13:14 -04:00