[data] deployed model files and training data to CloudFront for easier downloading around the world, including places like China where the Great Firewall may prevent large downloads from abroad. TTL is set to 0, so CloudFront still caches the files themselves but revalidates them with the origin using If-Modified-Since headers, allowing the files to be updated dynamically
@@ -405,7 +405,7 @@ Libpostal is a bit different because it's trained on open data that's available
Training data are stored on S3 by the date they were created. There's also a file stored on S3 to point to the most recent training data. To always point to the latest data, use something like: ```latest=$(curl https://s3.amazonaws.com/libpostal/training_data/latest)``` and use that variable in place of the date.
### Parser training sets ###
-All files can be found under s3://libpostal/training_data/YYYY-MM-DD/parser/ as gzip'd tab-separated values (TSV) files formatted like:```language\tcountry\taddress```.
+All files can be found at https://d1p366rbd94x8u.cloudfront.net/training_data/$YYYY-MM-DD/parser/$FILE as gzip'd tab-separated values (TSV) files formatted like:```language\tcountry\taddress```.
- **formatted_addresses_tagged.random.tsv.gz** (ODBL): OSM addresses. Apartments, PO boxes, categories, etc. are added primarily to these examples
- **formatted_places_tagged.random.tsv.gz** (ODBL): every toponym in OSM (even cities represented as points, etc.), reverse-geocoded to its parent admins, possibly including postal codes if they're listed on the point/polygon. Every place gets a base level of representation and places with higher populations get proportionally more.
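
The URL scheme and the `latest` pointer described above can be combined into a small helper. This is a minimal sketch, not part of the commit: the function names `parser_file_url` and `fetch_latest_parser_file` are hypothetical, the two base URLs are copied from the text above, and it assumes the `latest` file contains only a `YYYY-MM-DD` date string. Whether the `latest` pointer is also served through CloudFront is not stated, so the sketch reads it from S3.

```shell
#!/usr/bin/env bash
# Sketch only: both URLs are taken from the README text; function names are hypothetical.
BASE_URL="https://d1p366rbd94x8u.cloudfront.net/training_data"
LATEST_URL="https://s3.amazonaws.com/libpostal/training_data/latest"

# Build the URL of one parser training file for a given date (YYYY-MM-DD).
parser_file_url() {
    local date="$1" file="$2"
    printf '%s/%s/parser/%s\n' "$BASE_URL" "$date" "$file"
}

# Download one parser training file from the most recent training run.
# Assumes the "latest" pointer holds nothing but the date string.
fetch_latest_parser_file() {
    local file="$1" latest
    latest=$(curl -fsS "$LATEST_URL") || return 1
    curl -fSL -O "$(parser_file_url "$latest" "$file")"
}
```

For example, `fetch_latest_parser_file formatted_addresses_tagged.random.tsv.gz` would save that file into the current directory. No network request is made until that function is called.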