Rinigus
67624f89d0
cstring_array_from_char_array: return an empty, initialized cstring_array when given an empty string
2017-01-14 10:43:47 +02:00
Al
df89387b5c
[fix] calloc instead of malloc when performing initialization on structs that may fail halfway and need to clean up while partially initialized (calloc will set all the bytes to zero so the member pointers are NULL instead of garbage memory)
2017-01-13 18:30:04 -05:00
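The calloc fix above can be illustrated with a minimal sketch. The struct and function names here (`example_model_t`, `example_model_new`, `example_model_destroy`) and the `fail_midway` flag are hypothetical and are not taken from libpostal; they only show why zeroed memory makes cleanup of a half-initialized struct safe. The sketch assumes a platform where all-bits-zero is a null pointer, which holds on mainstream systems.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct for illustration only; not a libpostal type. */
typedef struct {
    char *name;
    double *weights;
    size_t num_weights;
} example_model_t;

/* Safe on a partially initialized struct: members that were never
 * allocated are still NULL thanks to calloc, and free(NULL) is a no-op. */
void example_model_destroy(example_model_t *self) {
    if (self == NULL) return;
    free(self->name);
    free(self->weights);
    free(self);
}

/* fail_midway is a hypothetical switch that simulates an error after the
 * first member has been allocated but before the second one has. */
example_model_t *example_model_new(const char *name, size_t num_weights, int fail_midway) {
    assert(name != NULL);

    /* calloc zeroes the struct, so name and weights start out as NULL.
     * With malloc they would hold garbage and the cleanup path below
     * could call free() on an invalid pointer. */
    example_model_t *self = calloc(1, sizeof(example_model_t));
    if (self == NULL) return NULL;

    size_t len = strlen(name) + 1;
    self->name = malloc(len);
    if (self->name == NULL) goto fail;
    memcpy(self->name, name, len);

    if (fail_midway) goto fail;

    self->weights = calloc(num_weights, sizeof(double));
    if (self->weights == NULL && num_weights > 0) goto fail;
    self->num_weights = num_weights;
    return self;

fail:
    example_model_destroy(self);
    return NULL;
}
```

Because the destructor is the single cleanup path for both full and partial initialization, every early exit in the constructor can jump to the same label.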
Al
1398df1260
[fix] accept 0 for array_new_size
2017-01-13 17:49:31 -05:00
Al
e1f258171f
[fix] handle cstring_array_from_char_array where char_array is NULL or 0-length
2017-01-13 16:52:41 -05:00
Al
a3506131fe
[build] adding libpostal_setup_datadir, libpostal_setup_parser_datadir, libpostal_setup_language_classifier_datadir functions for configuring the datadir at runtime
2017-01-09 16:11:26 -05:00
Al
953a26e54e
[utils] char_array_add_vjoined to stay consistent (add_* methods NUL terminate)
2017-01-09 16:10:07 -05:00
Rinigus
26aeb0ebec
drop AC_FUNC_MALLOC and _REALLOC and check for them as regular functions; add extra cflags for scanner
2017-01-05 07:34:24 +02:00
Travis
d61e90a33d
[auto][ci skip] Adding data files from Travis build #188
2017-01-01 19:20:54 +00:00
Travis
6c35eb9e65
[auto][ci skip] Adding data files from Travis build #186
2016-12-28 06:29:35 +00:00
Travis
dc528affd5
[auto][ci skip] Adding data files from Travis build #184
2016-12-27 23:45:40 +00:00
Brad Hards
fb68e22bbf
[fix] Use UTC date reference to avoid repeating S3 downloads.
...
Resolves https://github.com/openvenues/libpostal/issues/143
2016-12-26 12:04:02 +11:00
Al
8fe7958969
[build] allowing --disable-data-download option to configure. N.B. this is mostly for people building Docker images. The data files are NOT optional.
2016-12-22 12:31:27 -05:00
Al
09b4e2ba2f
[build] pulling in change from parser-data that allows user to pass CFLAGS
2016-12-21 14:39:27 -05:00
Al
8f1e69960f
[fix] loading transliteration module in address_parser_test.c as well
2016-12-12 11:37:27 -05:00
Al
3939dd0ca6
[fix] cstring_array_split calls
2016-12-12 11:37:27 -05:00
Al
a42d0e917a
[fix] brace
2016-12-12 11:37:27 -05:00
Al
ced8f9ae27
[parser] Ignore multiple spaces in parser input post-normalization. If normalizing the string creates several distinct tokens (namely in vulgar fractions, e.g. ½ => 1/2), add all the sub-tokens with the same label as the parent.
2016-12-12 11:37:27 -05:00
Al
b1816e9b70
[utils] Adding cstring_array_split_ignore_consecutive
2016-12-12 11:37:27 -05:00
Al
6baa7087fe
[fix] calls and NULL checks
2016-12-12 11:37:27 -05:00
Al
5e07f5e8c5
[fix] tokenized_string_t should copy its source string
2016-12-12 11:37:27 -05:00
Al
521a094a47
[fix] Need to load transliteration module for Latin-ASCII normalization
2016-12-12 11:37:27 -05:00
Al
d575caba8a
[data] using UTC for libpostal data files on the Mac version of the download script as well
2016-12-09 19:43:05 -05:00
Al
c3f3896b48
[fix] update test for date function in data download script
2016-12-09 19:29:00 -05:00
Travis
04f8130c46
[auto][ci skip] Adding data files from Travis build #168
2016-10-07 00:46:48 +00:00
Al
01afbf80ef
[data] Each curl process will retry the chunk up to 3 times
2016-08-25 23:18:39 -04:00
Travis
de1255af00
[auto][ci skip] Adding data files from Travis build #161
2016-08-23 22:48:20 +00:00
Travis
f19c9852aa
[auto][ci skip] Adding data files from Travis build #160
2016-08-23 22:24:19 +00:00
Travis
d797d6c863
[auto][ci skip] Adding data files from Travis build #159
2016-08-23 22:14:07 +00:00
Tom Davis
18c8e90eb3
Use xargs to start workers as soon as possible
2016-07-27 17:46:44 -04:00
Tom Davis
11abf6cb22
Use posix sh for systems without bash
2016-07-26 20:17:18 -04:00
Al Barrentine
65c4688f89
Merge pull request #97 from uberbaud/multipart_edgecase
...
Don't call `download_multipart` for 1 chunk
2016-07-24 00:03:51 -04:00
Travis
3f0eff228e
[auto][ci skip] Adding data files from Travis build #145
2016-07-23 22:28:32 +00:00
Tom Davis
2991ffd193
Don't call download_multipart for 1 chunk
...
Previously, where a file was larger than `$LARGE_FILE_SIZE` but smaller
than `$CHUNK_SIZE*2`, `download_multipart` would be called but would
only download one (1) chunk that was the whole file.
This fix keeps the same download performance as before but skips the
unnecessary chunk processing for single-chunk files.
2016-07-23 16:41:04 -04:00
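The decision described in the commit above can be sketched as follows. The real logic lives in the shell download script; this C version is only an illustration, and the function names and the chunking rule (integer division, with the last chunk absorbing the remainder) are assumptions inferred from the commit message rather than taken from the script.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed chunking rule: the file is cut into chunk_size pieces and the
 * last piece absorbs the remainder, so any file smaller than
 * 2 * chunk_size ends up as a single chunk. */
size_t num_chunks(size_t file_size, size_t chunk_size) {
    if (chunk_size == 0) return 1;
    size_t n = file_size / chunk_size;
    return n > 0 ? n : 1;
}

/* Use the multipart path only when the file is larger than the
 * large-file threshold AND would actually split into several chunks. */
bool should_download_multipart(size_t file_size, size_t large_file_size, size_t chunk_size) {
    return file_size > large_file_size && num_chunks(file_size, chunk_size) > 1;
}
```

With a 64MB chunk size and a hypothetical 80MB threshold, a 100MB file is above the threshold but still forms one chunk, so it is fetched with a plain single download.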
Tom Davis
24e0314e71
Remove call to seq which may not exist
2016-07-23 01:03:15 -04:00
Al Barrentine
e02c6adc85
Merge pull request #91 from uberbaud/openbsd
...
Add support for OpenBSD
2016-07-20 19:47:18 -04:00
Tom Davis
c0366147e8
Add support for OpenBSD
2016-07-20 18:19:31 -04:00
Tom Davis
a8bb798ce0
Call libpostal_data in source path, not build path
...
This fix updates Makefile to find the actual libpostal_data file when
`configure` is called from another directory, which it uses as the build
directory.
2016-07-20 17:31:52 -04:00
Travis
a0f6e100f1
[auto][ci skip] Adding data files from Travis build #133
2016-07-17 19:13:46 +00:00
Al
12d50aac12
Merge branch 'master' of https://github.com/openvenues/libpostal
2016-07-17 15:03:52 -04:00
Al
83381e9d8a
[expand] Adding exception for a few types of special punctuation (ampersand, plus, pound sign) which should be left in the original string and separated by whitespace. Closes #84. Closes #85.
2016-07-17 15:02:47 -04:00
Travis
2fb677ca73
[auto][ci skip] Adding data files from Travis build #132
2016-07-17 18:47:28 +00:00
David Farrell
a7a9708d2b
don't error on multiple setup_parser()
2016-07-17 11:25:03 -04:00
Al
d7996ed56c
[fix] setting garbage pointer to NULL on language_classifier_teardown (fixes #82)
2016-07-17 01:56:09 -04:00
Al
ce78064988
[fix] NULL checks
2016-07-15 13:23:23 -04:00
Al
2f5f226faa
[fix] Add original string to normalizations if all options were set to false
2016-07-15 13:23:23 -04:00
Al
e816b4f77e
[parser] Ignore language/country options explicitly in the parser. The purpose of these options is to be able to create language-specific/country-specific models at some point; they shouldn't be used in the global model.
2016-07-06 14:56:46 -04:00
Al
58a5dbe7e0
[logging] Logging the value of LIBPOSTAL_DATA_DIR when a setup error occurs
2016-07-01 14:51:04 -04:00
Al
ad9dfb46bd
[build] Using a process pool with 64MB chunks (similar to aws cli) for S3 downloads. Setting the max concurrent requests to 10, also the default in aws cli.
2016-07-01 14:37:13 -04:00
Al
a9ba61585b
[fix] Adding set -e to data download script so it fails if any subcommands fail
2016-05-04 23:08:06 -04:00
Al
9819ebf949
[fix] always include expansions in the ambiguous expansion dictionary, no matter which component
2016-04-29 13:26:13 -04:00