25 Commits

Author SHA1 Message Date
Al
d2732922c2 [data] deployed model files and training data to CloudFront for easier downloading around the world, including in places like China where the Great Firewall may prevent large downloads from abroad. TTL is set to 0, so CloudFront still caches the files themselves but checks with the origin using If-Modified-Since headers, allowing the files to be updated dynamically 2017-04-17 14:11:44 -04:00
Al
7f7aada32a [build] add another housekeeping file in the datadir for data_version. Blow away the existing files if that file either doesn't exist or doesn't contain a matching version string, to help with upgrades 2017-04-07 17:40:27 -04:00
Al
5a96be5d5c [fix][ci skip] S3 upload paths in data upload/download script 2017-04-06 00:37:12 -04:00
Al
267be6c05c [data] 12 worker pool in data download instead of 10 to download the new parser in one shot 2017-03-31 15:52:17 -04:00
Al
a64c81b45b [data/models] updating libpostal download script to download new models. The simple data files are stored by libpostal major version, whereas the models are stored by the version of the training data they used. A file called "latest" is stored in S3 to indicate the latest version of the model and checked on make 2017-03-31 13:35:07 -04:00
Al
6d4c7984df [api] doing this now since we're bumping a major version. Using a libpostal prefix for all public header functions and definitions 2017-03-31 03:35:51 -04:00
Brad Hards
fb68e22bbf [fix] Use UTC date reference to avoid repeating S3 downloads.
Resolves https://github.com/openvenues/libpostal/issues/143
2016-12-26 12:04:02 +11:00
Al
d575caba8a [data] using UTC for libpostal data files on the Mac version of the download script as well 2016-12-09 19:43:05 -05:00
Al
c3f3896b48 [fix] update test for date function in data download script 2016-12-09 19:29:00 -05:00
Al
01afbf80ef [data] Each curl process will retry the chunk up to 3 times 2016-08-25 23:18:39 -04:00
Tom Davis
18c8e90eb3 Use xargs to start workers as soon as possible 2016-07-27 17:46:44 -04:00
Tom Davis
11abf6cb22 Use posix sh for systems without bash 2016-07-26 20:17:18 -04:00
Tom Davis
2991ffd193 Don't call download_multipart for 1 chunk
Previously, where a file was larger than `$LARGE_FILE_SIZE` but smaller
than `$CHUNK_SIZE*2`, `download_multipart` would be called but would
only download one (1) chunk that was the whole file.

This fix keeps the same download performance as before but optimizes
processing chunks out.
2016-07-23 16:41:04 -04:00
Tom Davis
24e0314e71 Remove call to seq which may not exist 2016-07-23 01:03:15 -04:00
Al
ad9dfb46bd [build] Using a process pool with 64MB chunks (similar to aws cli) for S3 downloads. Setting the max concurrent requests to 10, also the default in aws cli. 2016-07-01 14:37:13 -04:00
Al
a9ba61585b [fix] Adding set -e to data download script so it fails if any subcommands fail 2016-05-04 23:08:06 -04:00
Al
59e5fcd1b4 [fix] LC_ALL=C in data download script 2016-04-11 12:47:50 -04:00
Al
0d7f9f2032 [data] Using UTC dates for libpostal data file tracking for #38. Also silencing curl when checking if file was updated 2016-03-10 16:44:02 -05:00
Al
c0b548833b [fix] create data dir if it doesn't exist 2016-01-30 13:40:10 -05:00
Al
789db8f582 [build] Adding language classifier to data file download script. As the current file is rather large, added multipart downloads from S3 to speed things up 2016-01-27 03:31:45 -05:00
Al
2950358697 [build] address_parser client now links to libpostal, adding address_parser to download script with an "all" option 2015-12-12 12:49:50 -05:00
Al
6aaa08c220 [fix] Usage on libpostal_data script 2015-10-27 13:33:03 -04:00
Al
588cf1df86 [build] Changing options to libpostal_data script to allow downloading geodb, uploaded first version to S3 2015-10-11 22:25:37 -05:00
Al
91f4e477ad [fix] typo 2015-10-06 12:04:07 -04:00
Al
abfa744d59 [build] Adding libpostal_data script for downloading data from S3, Makefile uses that now as part of the all-local target. Can be run periodically after install 2015-09-28 17:26:15 -04:00
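
Several of the commits above (ad9dfb46bd, 24e0314e71, 2991ffd193, 11abf6cb22) concern how a large file is split into chunks for parallel download. The following is a minimal POSIX sh sketch of that idea, not the script's actual code. The 64MB chunk size comes from the log; the `LARGE_FILE_SIZE` value, the function names `chunk_ranges` and `download_strategy`, and the rule that the last chunk absorbs the remainder are assumptions made for illustration.

```sh
#!/bin/sh
# Sketch only: the function names and the LARGE_FILE_SIZE value are hypothetical.
CHUNK_SIZE=$((64 * 1024 * 1024))        # 64MB chunks, as mentioned in the log
LARGE_FILE_SIZE=$((100 * 1024 * 1024))  # assumed threshold, not from the log

# Print one inclusive "start-end" byte range per chunk for a file of $1 bytes.
# Assumption: the number of chunks is size / CHUNK_SIZE (integer division),
# and the last chunk runs to the end of the file.
chunk_ranges() {
    size=$1
    num_chunks=$((size / CHUNK_SIZE))
    if [ "$num_chunks" -lt 1 ]; then
        num_chunks=1
    fi
    i=0
    # Plain while loop with shell arithmetic, so seq is not required.
    while [ "$i" -lt "$num_chunks" ]; do
        start=$((i * CHUNK_SIZE))
        if [ "$i" -eq $((num_chunks - 1)) ]; then
            end=$((size - 1))
        else
            end=$((start + CHUNK_SIZE - 1))
        fi
        echo "$start-$end"
        i=$((i + 1))
    done
}

# Print "multipart" only when the file is large AND would yield more than
# one chunk; otherwise print "single".
download_strategy() {
    size=$1
    if [ "$size" -ge "$LARGE_FILE_SIZE" ] && [ $((size / CHUNK_SIZE)) -gt 1 ]; then
        echo multipart
    else
        echo single
    fi
}
```

Under these assumptions, a file larger than `$LARGE_FILE_SIZE` but smaller than `$CHUNK_SIZE*2` yields a single range covering the whole file, which is the case commit 2991ffd193 routes away from `download_multipart`. Each printed range could then be passed to a worker, for example through `xargs` with a curl invocation along the lines of `curl --retry 3 --range "$range"`; the exact command line used by the script is not shown in this log.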