From b8f65d0a06abd9480e383136bdee3c0875067ad9 Mon Sep 17 00:00:00 2001
From: Al
Date: Mon, 3 Apr 2017 18:18:02 -0400
Subject: [PATCH] [docs] aesthetic README changes

---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 30a23e0e..3cc5f2fc 100644
--- a/README.md
+++ b/README.md
@@ -398,8 +398,8 @@ Libpostal is a bit different because it's trained on open data that's available
 
 Training data are stored on S3 by the date they were created. There's also a file stored on S3 to point to the most recent training data. To always point to the latest data, use something like: ```latest=$(curl https://s3.amazonaws.com/libpostal/training_data/latest)``` and use that variable in place of the date.
 
-### Parser training data ###
-All files can be found at s3://libpostal/training_data/YYYY-MM-DD/parser/ as gzip'd tab-separated values (TSV) files formatted as ```language\tcountry\tformatted_address```.
+### Parser training sets ###
+All files can be found under s3://libpostal/training_data/YYYY-MM-DD/parser/ as gzip'd tab-separated values (TSV) files formatted like:```language\tcountry\taddress```.
 
 - **formatted_addresses_tagged.random.tsv.gz** (ODBL): OSM addresses. Apartments, PO boxes, categories, etc. are added primarily to these examples
 - **formatted_places_tagged.random.tsv.gz** (ODBL): every toponym in OSM (even cities represented as points, etc.), reverse-geocoded to its parent admins, possibly including postal codes if they're listed on the point/polygon. Every place gets a base level of representation and places with higher populations get proportionally more.
@@ -586,7 +586,7 @@ Preprocessing (Python)
 
 The [geodata](https://github.com/openvenues/libpostal/tree/master/scripts/geodata) Python package in the libpostal repo contains the pipeline for preprocessing the various geo data sets and building training data for the C models to use.
 
-This package shouldn't be needed for most users. However, for those interested in generating new types of addresses or improving libpostal's training data, this is where it all happens.
+This package shouldn't be needed for most users, but for those interested in generating new types of addresses or improving libpostal's training data, this is where to look.
 
 Address parser accuracy
 -----------------------