Merge pull request #183 from openvenues/cdn
Hosting model files and training data on CloudFront CDN
@@ -405,7 +405,7 @@ Libpostal is a bit different because it's trained on open data that's available
 Training data are stored on S3 by the date they were created. There's also a file stored on S3 to point to the most recent training data. To always point to the latest data, use something like: ```latest=$(curl https://s3.amazonaws.com/libpostal/training_data/latest)``` and use that variable in place of the date.
 
 ### Parser training sets ###
-All files can be found under s3://libpostal/training_data/YYYY-MM-DD/parser/ as gzip'd tab-separated values (TSV) files formatted like:```language\tcountry\taddress```.
+All files can be found at https://d1p366rbd94x8u.cloudfront.net/training_data/$YYYY-MM-DD/parser/$FILE as gzip'd tab-separated values (TSV) files formatted like:```language\tcountry\taddress```.
 
 - **formatted_addresses_tagged.random.tsv.gz** (ODBL): OSM addresses. Apartments, PO boxes, categories, etc. are added primarily to these examples
 - **formatted_places_tagged.random.tsv.gz** (ODBL): every toponym in OSM (even cities represented as points, etc.), reverse-geocoded to its parent admins, possibly including postal codes if they're listed on the point/polygon. Every place gets a base level of representation and places with higher populations get proportionally more.
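The URL layout introduced in the README hunk above can be sketched in shell as follows. This is a minimal illustration, not part of the commit: `training_file_url` is a hypothetical helper, and the sketch assumes the `latest` pointer file still holds a date in the same `YYYY-MM-DD` form used for the directory names.

```shell
#!/bin/sh
# CDN base URL as it appears in the diff.
CLOUDFRONT_URL="https://d1p366rbd94x8u.cloudfront.net"

# Hypothetical helper: build the URL of one parser training file.
#   $1 = date directory (YYYY-MM-DD)
#   $2 = file name, e.g. formatted_places_tagged.random.tsv.gz
training_file_url() {
    printf '%s/training_data/%s/parser/%s\n' "$CLOUDFRONT_URL" "$1" "$2"
}

# Example usage (needs network access, so it is left commented out):
# latest=$(curl -s https://s3.amazonaws.com/libpostal/training_data/latest)
# curl -sO "$(training_file_url "$latest" formatted_addresses_tagged.random.tsv.gz)"
# gunzip -c formatted_addresses_tagged.random.tsv.gz | head -n 3
```

Each decompressed line should then follow the `language\tcountry\taddress` layout described in the README text.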
@@ -11,7 +11,8 @@ LIBPOSTAL_VERSION_STRING="v1"
 
 LIBPOSTAL_S3_BUCKET_NAME="libpostal"
 LIBPOSTAL_S3_KEY="s3://$LIBPOSTAL_S3_BUCKET_NAME"
-LIBPOSTAL_S3_BUCKET_URL="http://$LIBPOSTAL_S3_BUCKET_NAME.s3.amazonaws.com"
+LIBPOSTAL_S3_BUCKET_URL="https://$LIBPOSTAL_S3_BUCKET_NAME.s3.amazonaws.com"
+LIBPOSTAL_CLOUDFRONT_URL="https://d1p366rbd94x8u.cloudfront.net"
 LIBPOSTAL_DATA_FILE="libpostal_data.tar.gz"
 LIBPOSTAL_PARSER_FILE="parser.tar.gz"
 LIBPOSTAL_LANG_CLASS_FILE="language_classifier.tar.gz"
@@ -112,7 +113,7 @@ download_file() {
 
 echo "Checking for new libpostal $name..."
 
-url=$LIBPOSTAL_S3_BUCKET_URL/$prefix/$filename
+url=$LIBPOSTAL_CLOUDFRONT_URL/$prefix/$filename
 
 if [ $(curl -sI $url -z "$(cat $updated_path)" --remote-time -w %{http_code} -o /dev/null | grep "^200$") ]; then
 echo "New libpostal $name available"
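The update check in `download_file` relies on a conditional request: `curl -z` sends an `If-Modified-Since` header, and a server that honors it answers `200` when the file is newer than the given time and `304` when it is not. A minimal sketch of the same idea with quoted arguments is shown below. Both function names are hypothetical and are not taken from the libpostal script; the sketch assumes the second argument is a date string such as the contents of `$updated_path`.

```shell
#!/bin/sh
# Hypothetical helper: decide from the HTTP status of a conditional
# request whether a newer file is available (200 = modified since).
status_means_update() {
    [ "$1" = "200" ]
}

# Hypothetical helper: issue a conditional HEAD request.
#   $1 = URL to check
#   $2 = date of the last successful download
remote_is_newer() {
    # -s silent, -I HEAD request, -z conditional on the given time,
    # -o /dev/null discards the headers, -w prints only the status code.
    status=$(curl -sI "$1" -z "$2" -o /dev/null -w '%{http_code}')
    status_means_update "$status"
}

# Example usage (needs network access, so it is left commented out):
# if remote_is_newer "$url" "$(cat "$updated_path")"; then
#     echo "New libpostal $name available"
# fi
```

Comparing the status code directly avoids the unquoted `[ $(...) ]` test and the `grep` pipeline, while keeping the same 200-versus-304 logic.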