From a2b84a01771564386b45dc7f542a1babfbfaaa85 Mon Sep 17 00:00:00 2001
From: Al Barrentine
Date: Sat, 7 Jan 2017 14:17:31 -0500
Subject: [PATCH] [docs][ci skip] Adding parser label definitions to the README

---
 README.md | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/README.md b/README.md
index 4ad4009c..8f9792d4 100644
--- a/README.md
+++ b/README.md
@@ -148,6 +148,20 @@ int main(int argc, char **argv) {
 }
 ```
 
+Parser labels
+-------------
+
+The address parser can use any string labels that are defined in the training data, but these are the default labels, based on the fields defined in [OpenCage's address-formatting library](https://github.com/OpenCageData/address-formatting):
+
+- **house**: venue name e.g. "Brooklyn Academy of Music", and building names e.g. "Empire State Building"
+- **house_number**: usually refers to the external (street-facing) building number. In some countries this may be a compound, hyphenated number which also includes an apartment number, or a block number (a la Japan), but libpostal will just call it the house_number for simplicity.
+- **road**: street name(s)
+- **suburb**: usually an unofficial neighborhood name like "Harlem", "South Bronx", or "Crown Heights"
+- **city_district**: these are usually boroughs or districts within a city that serve some official purpose e.g. "Brooklyn" or "Hackney" or "Bratislava IV"
+- **city**: any human settlement including cities, towns, villages, hamlets, localities, etc.
+- **state_district**: usually a second-level administrative division or county.
+- **state**: a first-level administrative division. Scotland, Northern Ireland, Wales, and England in the UK are mapped to "state" as well (convention used in OSM, GeoPlanet, etc.)
+- **country**: sovereign nations and their dependent territories, anything with an [ISO-3166 code](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2).
 Examples of normalization
 -------------------------
 
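
The label list added by this patch can also be read as a small lookup table. The following is a minimal C sketch, not part of the commit and not libpostal's API: it stores the nine default labels and checks whether a given label string is one of them. The names `default_labels` and `is_default_label` are hypothetical, introduced only for illustration. Because the parser can emit any label present in the training data, a `false` result means "not a default label", not "invalid label".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Default parser labels, copied from the README section added above.
 * This table is an illustration, not a structure exported by libpostal. */
static const char *default_labels[] = {
    "house", "house_number", "road", "suburb", "city_district",
    "city", "state_district", "state", "country"
};

#define NUM_DEFAULT_LABELS (sizeof(default_labels) / sizeof(default_labels[0]))

/* Hypothetical helper: returns true if `label` exactly matches one of the
 * default labels above (case-sensitive), false otherwise or if NULL. */
static bool is_default_label(const char *label) {
    if (label == NULL) {
        return false;
    }
    for (size_t i = 0; i < NUM_DEFAULT_LABELS; i++) {
        if (strcmp(label, default_labels[i]) == 0) {
            return true;
        }
    }
    return false;
}
```

A caller that iterates over the labels returned by the parser could use such a check to separate the default fields from any custom labels introduced by its own training data.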