752 Commits

Author SHA1 Message Date
Rinigus
67624f89d0 cstring_array_from_char_array: return empty initialized cstring_array from empty string 2017-01-14 10:43:47 +02:00
Al
df89387b5c [fix] calloc instead of malloc when performing initialization on structs that may fail halfway and need to clean up while partially initialized (calloc will set all the bytes to zero so the member pointers are NULL instead of garbage memory) 2017-01-13 18:30:04 -05:00
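The calloc pattern described in the commit above can be sketched roughly as follows. This is a minimal illustration, not libpostal code: `model_t`, `model_new`, and `model_destroy` are hypothetical names. It assumes, as the commit message does, that zeroed bytes read back as NULL pointers, which holds on mainstream platforms.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct for illustration only; not a libpostal type. */
typedef struct {
    char *name;
    double *weights;
    size_t num_weights;
} model_t;

/* Safe to call on a partially initialized model: members that were never
 * allocated are still NULL (thanks to calloc), and free(NULL) is a no-op. */
void model_destroy(model_t *self) {
    if (self == NULL) return;
    free(self->name);
    free(self->weights);
    free(self);
}

model_t *model_new(const char *name, size_t num_weights) {
    /* calloc zeroes the struct, so name and weights start out as NULL
     * instead of whatever garbage malloc would have left there. */
    model_t *self = calloc(1, sizeof(model_t));
    if (self == NULL) return NULL;

    if (name == NULL) {
        model_destroy(self); /* nothing allocated yet; both frees are no-ops */
        return NULL;
    }

    size_t len = strlen(name) + 1;
    self->name = malloc(len);
    if (self->name == NULL) {
        model_destroy(self);
        return NULL;
    }
    memcpy(self->name, name, len);

    /* Failing halfway: name is set, weights is still NULL. With malloc for
     * the struct, free(self->weights) here would act on an uninitialized
     * pointer. */
    if (num_weights == 0) {
        model_destroy(self);
        return NULL;
    }

    self->weights = calloc(num_weights, sizeof(double));
    if (self->weights == NULL) {
        model_destroy(self);
        return NULL;
    }
    self->num_weights = num_weights;
    return self;
}
```

With this shape, every error path can share the same destroy function, which is the cleanup property the commit message is after.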
Al
1398df1260 [fix] accept 0 for array_new_size 2017-01-13 17:49:31 -05:00
Al
e1f258171f [fix] handle cstring_array_from_char_array where char_array is NULL or 0-length 2017-01-13 16:52:41 -05:00
Al
a3506131fe [build] adding libpostal_setup_datadir, libpostal_setup_parser_datadir, libpostal_setup_language_classifier_datadir functions for configuring the datadir at runtime 2017-01-09 16:11:26 -05:00
Al
953a26e54e [utils] char_array_add_vjoined to stay consistent (add_* methods NUL terminate) 2017-01-09 16:10:07 -05:00
Rinigus
26aeb0ebec drop AC_FUNC_MALLOC and _REALLOC and check for them as regular functions; add extra cflags for scanner 2017-01-05 07:34:24 +02:00
Travis
d61e90a33d [auto][ci skip] Adding data files from Travis build #188 2017-01-01 19:20:54 +00:00
Travis
6c35eb9e65 [auto][ci skip] Adding data files from Travis build #186 2016-12-28 06:29:35 +00:00
Travis
dc528affd5 [auto][ci skip] Adding data files from Travis build #184 2016-12-27 23:45:40 +00:00
Brad Hards
fb68e22bbf [fix] Use UTC date reference to avoid repeating S3 downloads.
Resolves https://github.com/openvenues/libpostal/issues/143
2016-12-26 12:04:02 +11:00
Al
8fe7958969 [build] allowing --disable-data-download option to configure. N.B. this is mostly for people building Docker images. The data files are NOT optional. 2016-12-22 12:31:27 -05:00
Al
09b4e2ba2f [build] pulling in change from parser-data that allows user to pass CFLAGS 2016-12-21 14:39:27 -05:00
Al
8f1e69960f [fix] loading transliteration module in address_parser_test.c as well 2016-12-12 11:37:27 -05:00
Al
3939dd0ca6 [fix] cstring_array_split calls 2016-12-12 11:37:27 -05:00
Al
a42d0e917a [fix] brace 2016-12-12 11:37:27 -05:00
Al
ced8f9ae27 [parser] Ignore multiple spaces in parser input post-normalization. If normalizing the string creates several distinct tokens (namely in Vulgar fractions e.g. ½ => 1/2), add all the sub-tokens with the same label as the parent 2016-12-12 11:37:27 -05:00
Al
b1816e9b70 [utils] Adding cstring_array_split_ignore_consecutive 2016-12-12 11:37:27 -05:00
Al
6baa7087fe [fix] calls and NULL checks 2016-12-12 11:37:27 -05:00
Al
5e07f5e8c5 [fix] tokenized_string_t should copy its source string 2016-12-12 11:37:27 -05:00
Al
521a094a47 [fix] Need to load transliteration module for Latin-ASCII normalization 2016-12-12 11:37:27 -05:00
Al
d575caba8a [data] using UTC for libpostal data files on the Mac version of the download script as well 2016-12-09 19:43:05 -05:00
Al
c3f3896b48 [fix] update test for date function in data download script 2016-12-09 19:29:00 -05:00
Travis
04f8130c46 [auto][ci skip] Adding data files from Travis build #168 2016-10-07 00:46:48 +00:00
Al
01afbf80ef [data] Each curl process will retry the chunk up to 3 times 2016-08-25 23:18:39 -04:00
Travis
de1255af00 [auto][ci skip] Adding data files from Travis build #161 2016-08-23 22:48:20 +00:00
Travis
f19c9852aa [auto][ci skip] Adding data files from Travis build #160 2016-08-23 22:24:19 +00:00
Travis
d797d6c863 [auto][ci skip] Adding data files from Travis build #159 2016-08-23 22:14:07 +00:00
Tom Davis
18c8e90eb3 Use xargs to start workers as soon as possible 2016-07-27 17:46:44 -04:00
Tom Davis
11abf6cb22 Use posix sh for systems without bash 2016-07-26 20:17:18 -04:00
Al Barrentine
65c4688f89 Merge pull request #97 from uberbaud/multipart_edgecase
Don't call `download_multipart` for 1 chunk
2016-07-24 00:03:51 -04:00
Travis
3f0eff228e [auto][ci skip] Adding data files from Travis build #145 2016-07-23 22:28:32 +00:00
Tom Davis
2991ffd193 Don't call download_multipart for 1 chunk
Previously, where a file was larger than `$LARGE_FILE_SIZE` but smaller
than `$CHUNK_SIZE*2`, `download_multipart` would be called but would
only download one (1) chunk that was the whole file.

This fix keeps the same download performance as before but optimizes
processing chunks out.
2016-07-23 16:41:04 -04:00
Tom Davis
24e0314e71 Remove call to seq which may not exist 2016-07-23 01:03:15 -04:00
Al Barrentine
e02c6adc85 Merge pull request #91 from uberbaud/openbsd
Add support for OpenBSD
2016-07-20 19:47:18 -04:00
Tom Davis
c0366147e8 Add support for OpenBSD 2016-07-20 18:19:31 -04:00
Tom Davis
a8bb798ce0 Call libpostal_data in source path, not build path
This fix updates Makefile to find the actual libpostal_data file when
`configure` is called from another directory, which it uses as the build
directory.
2016-07-20 17:31:52 -04:00
Travis
a0f6e100f1 [auto][ci skip] Adding data files from Travis build #133 2016-07-17 19:13:46 +00:00
Al
12d50aac12 Merge branch 'master' of https://github.com/openvenues/libpostal 2016-07-17 15:03:52 -04:00
Al
83381e9d8a [expand] Adding exception for a few types of special punctuation (ampersand, plus, pound sign) which should be left in the original string and separated by whitespace. Closes #84. Closes #85 2016-07-17 15:02:47 -04:00
Travis
2fb677ca73 [auto][ci skip] Adding data files from Travis build #132 2016-07-17 18:47:28 +00:00
David Farrell
a7a9708d2b don't error on multiple setup_parser() 2016-07-17 11:25:03 -04:00
Al
d7996ed56c [fix] setting garbage pointer to NULL on language_classifier_teardown (fixes #82) 2016-07-17 01:56:09 -04:00
Al
ce78064988 [fix] NULL checks 2016-07-15 13:23:23 -04:00
Al
2f5f226faa [fix] Add original string to normalizations if all options were set to false 2016-07-15 13:23:23 -04:00
Al
e816b4f77e [parser] Ignore language/country options explicitly in the parser. The purpose of these options is to be able to create language-specific/country-specific models at some point; they shouldn't be used in the global model 2016-07-06 14:56:46 -04:00
Al
58a5dbe7e0 [logging] Logging the value of LIBPOSTAL_DATA_DIR when a setup error occurs 2016-07-01 14:51:04 -04:00
Al
ad9dfb46bd [build] Using a process pool with 64MB chunks (similar to aws cli) for S3 downloads. Setting the max concurrent requests to 10, also the default in aws cli. 2016-07-01 14:37:13 -04:00
Al
a9ba61585b [fix] Adding set -e to data download script so it fails if any subcommands fail 2016-05-04 23:08:06 -04:00
Al
9819ebf949 [fix] always include expansions in the ambiguous expansion dictionary, no matter which component 2016-04-29 13:26:13 -04:00