@@ -22,11 +22,11 @@ addons:
|
||||
- ubuntu-toolchain-r-test
|
||||
packages:
|
||||
- gcc-4.8
|
||||
- libsnappy-dev
|
||||
- pkg-config
|
||||
before_script:
|
||||
- ./bootstrap.sh
|
||||
- if [[ $DICTIONARIES_CHANGED -ne 0 || $NUMEX_CHANGED -ne 0 ]]; then git clone https://github.com/pypa/virtualenv; cd virtualenv; git checkout master; python virtualenv.py ../env; cd ..; env/bin/pip install -r scripts/requirements-simple.txt; fi;
|
||||
- if [ $NUMEX_CHANGED -ne 0 ]; then env/bin/python scripts/geodata/i18n/numex.py; fi;
|
||||
- if [ $NUMEX_CHANGED -ne 0 ]; then env/bin/python scripts/geodata/numbers/numex.py; fi;
|
||||
- if [ $DICTIONARIES_CHANGED -ne 0 ]; then env/bin/python scripts/geodata/address_expansions/address_dictionaries.py; fi;
|
||||
install:
|
||||
- if [ "$CC" = "gcc" ]; then export CC="gcc-4.8"; fi
|
||||
|
||||
284
README.md
@@ -4,20 +4,20 @@
|
||||
[](#sponsors)
|
||||
[](#backers)
|
||||
|
||||
libpostal is a C library for parsing/normalizing street addresses around the world using statistical NLP and open data. The goal of this project is to understand location-based strings in every language, everywhere. For a more comprehensive overview of the research behind libpostal, check out the introductory blog post: [Statistical NLP on OpenStreetMap](https://medium.com/@albarrentine/statistical-nlp-on-openstreetmap-b9d573e6cc86)
|
||||
|
||||
<span>🇧🇷</span> <span>🇫🇮</span> <span>🇳🇬</span> :jp: <span>🇽🇰 </span> <span>🇧🇩 </span> <span>🇵🇱 </span> <span>🇻🇳 </span> <span>🇧🇪 </span> <span>🇲🇦 </span> <span>🇺🇦 </span> <span>🇯🇲 </span> :ru: <span>🇮🇳 </span> <span>🇱🇻 </span> <span>🇧🇴 </span> :de: <span>🇸🇳 </span> <span>🇦🇲 </span> :kr: <span>🇳🇴 </span> <span>🇲🇽 </span> <span>🇨🇿 </span> <span>🇹🇷 </span> :es: <span>🇸🇸 </span> <span>🇪🇪 </span> <span>🇧🇭 </span> <span>🇳🇱 </span> :cn: <span>🇵🇹 </span> <span>🇵🇷 </span> :gb: <span>🇵🇸 </span>
|
||||
|
||||
libpostal is a C library for parsing/normalizing street addresses around the world using statistical NLP and open data. For a more comprehensive overview of the research, check out the [introductory blog post](https://medium.com/@albarrentine/statistical-nlp-on-openstreetmap-b9d573e6cc86), but to sum up, the goal of this project is to understand location-based strings in every language, everywhere.
|
||||
Addresses and the locations they represent are essential for any application dealing with maps (place search, transportation, on-demand/delivery services, check-ins, reviews). Yet even the simplest addresses are packed with local conventions, abbreviations and context, making them difficult to index/query effectively with traditional full-text search engines. This library helps convert the free-form addresses that humans use into clean normalized forms suitable for machine comparison and full-text indexing. Though libpostal is not itself a full geocoder, it can be used as a preprocessing step to make any geocoding application smarter, simpler, and more consistent internationally.
|
||||
|
||||
<span>🇷🇴 </span> <span>🇬🇭 </span> <span>🇦🇺 </span> <span>🇲🇾 </span> <span>🇭🇷 </span> <span>🇭🇹 </span> :us: <span>🇿🇦 </span> <span>🇷🇸 </span> <span>🇨🇱 </span> :it: <span>🇰🇪 </span> <span>🇨🇭 </span> <span>🇨🇺 </span> <span>🇸🇰 </span> <span>🇦🇴 </span> <span>🇩🇰 </span> <span>🇹🇿 </span> <span>🇦🇱 </span> <span>🇨🇴 </span> <span>🇮🇱 </span> <span>🇬🇹 </span> :fr: <span>🇵🇭 </span> <span>🇦🇹 </span> <span>🇱🇨 </span> <span>🇮🇸 </span> <span>🇮🇩 </span> <span>🇦🇪 </span> <span>🇸🇰 </span> <span>🇹🇳 </span> <span>🇰🇭 </span> <span>🇦🇷 </span> <span>🇭🇰 </span>
|
||||
|
||||
Addresses and the locations they represent are essential for any application dealing with maps (place search, transportation, on-demand/delivery services, check-ins, reviews). Yet even the simplest addresses are packed with local conventions, abbreviations and context, making them difficult to index/query effectively with traditional full-text search engines. This library helps convert the free-form addresses that humans use into clean normalized forms suitable for machine comparison and full-text indexing. Though libpostal is not itself a full geocoder, it can be used as a preprocessing step to make any geocoding application smarter, simpler, and more consistent internationally.
|
||||
|
||||
The core library is written in pure C. Language bindings for [Python](https://github.com/openvenues/pypostal), [Ruby](https://github.com/openvenues/ruby_postal), [Go](https://github.com/openvenues/gopostal), [Java](https://github.com/openvenues/jpostal), [PHP](https://github.com/openvenues/php-postal), and [NodeJS](https://github.com/openvenues/node-postal) are officially supported and it's easy to write bindings in other languages.
|
||||
|
||||
Sponsors
|
||||
------------
|
||||
--------
|
||||
|
||||
If your company is using libpostal, consider asking your organization to sponsor the project and help fund our continued research into geo + NLP. Interpreting what humans mean when they refer to locations is far from a solved problem, and sponsorships help us pursue new frontiers in machine geospatial intelligence. As a sponsor, your company logo will appear prominently on the Github repo page along with a link to your site. [Sponsorship info](https://opencollective.com/libpostal#sponsor)
|
||||
If your company is using libpostal, consider asking your organization to sponsor the project. Interpreting what humans mean when they refer to locations is far from a solved problem, and sponsorships help us pursue new frontiers in geospatial NLP. As a sponsor, your company logo will appear prominently on the Github repo page along with a link to your site. [Sponsorship info](https://opencollective.com/libpostal#sponsor)
|
||||
|
||||
<a href="https://opencollective.com/libpostal/sponsor/0/website" target="_blank"><img src="https://opencollective.com/libpostal/sponsor/0/avatar.svg"></a>
|
||||
<a href="https://opencollective.com/libpostal/sponsor/1/website" target="_blank"><img src="https://opencollective.com/libpostal/sponsor/1/avatar.svg"></a>
|
||||
@@ -86,25 +86,67 @@ Individual users can also help support open geo NLP research by making a monthly
|
||||
<a href="https://opencollective.com/libpostal/backer/28/website" target="_blank"><img src="https://opencollective.com/libpostal/backer/28/avatar.svg"></a>
|
||||
<a href="https://opencollective.com/libpostal/backer/29/website" target="_blank"><img src="https://opencollective.com/libpostal/backer/29/avatar.svg"></a>
|
||||
|
||||
Installation
|
||||
------------
|
||||
|
||||
Before you install, make sure you have the following prerequisites:

**On Ubuntu/Debian**
```
sudo apt-get install curl autoconf automake libtool pkg-config
```

**On CentOS/RHEL**
```
sudo yum install curl autoconf automake libtool pkgconfig
```

**On Mac OSX**
```
brew install curl autoconf automake libtool pkg-config
```

Then to install the C library:

```
git clone https://github.com/openvenues/libpostal
cd libpostal
./bootstrap.sh
./configure --datadir=[...some dir with a few GB of space...]
make
sudo make install

# On Linux it's probably a good idea to run
sudo ldconfig
```

libpostal has support for pkg-config, so you can use pkg-config to print the flags needed to compile and link your program against it:

```
pkg-config --cflags libpostal # print compiler flags
pkg-config --libs libpostal # print linker flags
pkg-config --cflags --libs libpostal # print both
```

For example, if you write a program called app.c, you can compile it like this:

```
gcc app.c `pkg-config --cflags --libs libpostal`
```
|
||||
|
||||
Examples of parsing
|
||||
-------------------
|
||||
|
||||
libpostal implements the first statistical address parser that works well internationally,
|
||||
trained on ~50 million addresses in over 100 countries and as many
|
||||
languages. We use OpenStreetMap (anything with an addr:* tag) and the OpenCage
|
||||
address format templates at: https://github.com/OpenCageData/address-formatting
|
||||
to construct the training data, supplementing with containing polygons and
|
||||
perturbing the inputs in a number of ways to make the parser as robust as possible
|
||||
to messy real-world input.
|
||||
libpostal's international address parser uses machine learning (Conditional Random Fields) and is trained on over 1 billion addresses in every inhabited country on Earth. We use [OpenStreetMap](https://openstreetmap.org) and [OpenAddresses](https://openaddresses.io) as sources of structured addresses, and the OpenCage address format templates at: https://github.com/OpenCageData/address-formatting to construct the training data, supplementing with containing polygons, and generating sub-building components like apartment/floor numbers and PO boxes. We also add abbreviations, drop out components at random, etc. to make the parser as robust as possible to messy real-world input.
|
||||
|
||||
These example parse results are taken from the interactive address_parser program
|
||||
that builds with libpostal when you run ```make```. Note that the parser is robust to
|
||||
commas vs. no commas, casing, different permutations of components (if the input
|
||||
that builds with libpostal when you run ```make```. Note that the parser can handle
|
||||
commas vs. no commas as well as various casings and permutations of components (if the input
|
||||
is e.g. just city or just city/postcode).
|
||||
|
||||

|
||||

|
||||
|
||||
The parser achieves very high accuracy on held-out data, currently 98.9%
|
||||
The parser achieves very high accuracy on held-out data, currently 99.45%
|
||||
correct full parses (meaning a 1 in the numerator for getting *every* token
|
||||
in the address correct).
|
||||
|
||||
@@ -132,15 +174,15 @@ int main(int argc, char **argv) {
|
||||
exit(EXIT_FAILURE);
|
||||
}
|
||||
|
||||
address_parser_options_t options = get_libpostal_address_parser_default_options();
|
||||
address_parser_response_t *parsed = parse_address("781 Franklin Ave Crown Heights Brooklyn NYC NY 11216 USA", options);
|
||||
libpostal_address_parser_options_t options = libpostal_get_address_parser_default_options();
|
||||
libpostal_address_parser_response_t *parsed = libpostal_parse_address("781 Franklin Ave Crown Heights Brooklyn NYC NY 11216 USA", options);
|
||||
|
||||
for (size_t i = 0; i < parsed->num_components; i++) {
|
||||
printf("%s: %s\n", parsed->labels[i], parsed->components[i]);
|
||||
}
|
||||
|
||||
// Free parse result
|
||||
address_parser_response_destroy(parsed);
|
||||
libpostal_address_parser_response_destroy(parsed);
|
||||
|
||||
// Teardown (only called once at the end of your program)
|
||||
libpostal_teardown();
|
||||
@@ -151,17 +193,28 @@ int main(int argc, char **argv) {
|
||||
Parser labels
|
||||
-------------
|
||||
|
||||
The address parser can use any string labels that are defined in the training data, but these are the default labels, based on the fields defined in [OpenCage's address-formatting library](https://github.com/OpenCageData/address-formatting):
|
||||
The address parser can technically use any string labels that are defined in the training data, but these are the ones currently defined, based on the fields defined in [OpenCage's address-formatting library](https://github.com/OpenCageData/address-formatting), as well as a few added by libpostal to handle specific patterns:
|
||||
|
||||
- **house**: venue name e.g. "Brooklyn Academy of Music", and building names e.g. "Empire State Building"
- **category**: for category queries like "restaurants", etc.
- **near**: phrases like "in", "near", etc. used after a category phrase to help with parsing queries like "restaurants in Brooklyn"
- **house_number**: usually refers to the external (street-facing) building number. In some countries this may be a compound, hyphenated number which also includes an apartment number, or a block number (a la Japan), but libpostal will just call it the house_number for simplicity.
- **road**: street name(s)
- **unit**: an apartment, unit, office, lot, or other secondary unit designator
- **level**: expressions indicating a floor number e.g. "3rd Floor", "Ground Floor", etc.
- **staircase**: numbered/lettered staircase
- **entrance**: numbered/lettered entrance
- **po_box**: post office box: typically found in non-physical (mail-only) addresses
- **postcode**: postal codes used for mail sorting
- **suburb**: usually an unofficial neighborhood name like "Harlem", "South Bronx", or "Crown Heights"
- **city_district**: these are usually boroughs or districts within a city that serve some official purpose e.g. "Brooklyn" or "Hackney" or "Bratislava IV"
- **city**: any human settlement including cities, towns, villages, hamlets, localities, etc.
- **island**: named islands e.g. "Maui"
- **state_district**: usually a second-level administrative division or county.
- **state**: a first-level administrative division. Scotland, Northern Ireland, Wales, and England in the UK are mapped to "state" as well (convention used in OSM, GeoPlanet, etc.)
- **country_region**: informal subdivision of a country without any political status
- **country**: sovereign nations and their dependent territories, anything with an [ISO-3166 code](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2).
- **world_region**: currently only used for appending “West Indies” after the country name, a pattern frequently used in the English-speaking Caribbean e.g. “Jamaica, West Indies”
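
Because the parser returns labels and components as parallel arrays, picking out one field means scanning the labels. The sketch below shows one way to do that. `parsed_address_t` and `find_component` are hypothetical names introduced for illustration, not part of libpostal; the struct only mirrors the `num_components`, `labels` and `components` fields used in the example program, so the snippet compiles without the library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a parse result: parallel arrays of labels and
 * components, mirroring the fields used in the README example. This is not
 * the library's own response type. */
typedef struct {
    size_t num_components;
    const char **labels;
    const char **components;
} parsed_address_t;

/* Return the first component tagged with `label`, or NULL if the label
 * does not occur in this parse. */
const char *find_component(const parsed_address_t *parsed, const char *label) {
    if (parsed == NULL || label == NULL) return NULL;
    for (size_t i = 0; i < parsed->num_components; i++) {
        if (strcmp(parsed->labels[i], label) == 0) {
            return parsed->components[i];
        }
    }
    return NULL;
}
```

A caller would check the result against NULL before using it, since most addresses carry only a handful of the labels listed above.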
|
||||
|
||||
Examples of normalization
|
||||
-------------------------
|
||||
@@ -188,8 +241,7 @@ Here's a short list of some less straightforward normalizations in various langu
|
||||
| Marktstrasse 14 | markt straße 14 |
|
||||
|
||||
libpostal currently supports these types of normalizations in *60+ languages*,
|
||||
and you can [add more](https://github.com/openvenues/libpostal/tree/master/resources/dictionaries)
|
||||
(without having to write any C).
|
||||
and you can [add more](https://github.com/openvenues/libpostal/tree/master/resources/dictionaries) (without having to write any C).
|
||||
|
||||
For further reading and some bizarre address edge-cases, see:
|
||||
[Falsehoods Programmers Believe About Addresses](https://www.mjt.me.uk/posts/falsehoods-programmers-believe-about-addresses/).
|
||||
@@ -220,15 +272,15 @@ int main(int argc, char **argv) {
|
||||
}
|
||||
|
||||
size_t num_expansions;
|
||||
normalize_options_t options = get_libpostal_default_options();
|
||||
char **expansions = expand_address("Quatre-vingt-douze Ave des Champs-Élysées", options, &num_expansions);
|
||||
libpostal_normalize_options_t options = libpostal_get_default_options();
|
||||
char **expansions = libpostal_expand_address("Quatre-vingt-douze Ave des Champs-Élysées", options, &num_expansions);
|
||||
|
||||
for (size_t i = 0; i < num_expansions; i++) {
|
||||
printf("%s\n", expansions[i]);
|
||||
}
|
||||
|
||||
// Free expansions
|
||||
expansion_array_destroy(expansions, num_expansions);
|
||||
libpostal_expansion_array_destroy(expansions, num_expansions);
|
||||
|
||||
// Teardown (only called once at the end of your program)
|
||||
libpostal_teardown();
|
||||
@@ -236,54 +288,38 @@ int main(int argc, char **argv) {
|
||||
}
|
||||
```
|
||||
|
||||
Installation
|
||||
------------
|
||||
Command-line usage (expand)
|
||||
---------------------------
|
||||
|
||||
Before you install, make sure you have the following prerequisites:
|
||||
|
||||
**On Ubuntu/Debian**
|
||||
```
|
||||
sudo apt-get install curl libsnappy-dev autoconf automake libtool pkg-config
|
||||
```
|
||||
|
||||
**On CentOS/RHEL**
|
||||
```
|
||||
sudo yum install snappy snappy-devel autoconf automake libtool pkgconfig
|
||||
```
|
||||
|
||||
**On Mac OSX**
|
||||
```
|
||||
brew install snappy autoconf automake libtool pkg-config
|
||||
```
|
||||
|
||||
Then to install the C library:
|
||||
After building libpostal:
|
||||
|
||||
```
|
||||
git clone https://github.com/openvenues/libpostal
|
||||
cd libpostal
|
||||
./bootstrap.sh
|
||||
./configure --datadir=[...some dir with a few GB of space...]
|
||||
make
|
||||
sudo make install
|
||||
cd src/
|
||||
|
||||
# On Linux it's probably a good idea to run
|
||||
sudo ldconfig
|
||||
./libpostal "Quatre vingt douze Ave des Champs-Élysées"
|
||||
```
|
||||
|
||||
libpostal has support for pkg-config, so you can use pkg-config to print the flags needed to link your program against it:
|
||||
If you have a text file or stream with one address per line, the command-line interface also accepts input from stdin:
|
||||
|
||||
```
|
||||
pkg-config --cflags libpostal # print compiler flags
|
||||
pkg-config --libs libpostal # print linker flags
|
||||
pkg-config --cflags --libs libpostal # print both
|
||||
cat some_file | ./libpostal --json
|
||||
```
|
||||
|
||||
For example, if you write a program called app.c, you can compile it like this:
|
||||
Command-line usage (parser)
|
||||
---------------------------
|
||||
|
||||
After building libpostal:
|
||||
|
||||
```
|
||||
gcc app.c `pkg-config --cflags --libs libpostal`
|
||||
cd src/
|
||||
|
||||
./address_parser
|
||||
```
|
||||
|
||||
address_parser is an interactive shell. Just type addresses and libpostal will
|
||||
parse them and print the result.
|
||||
|
||||
|
||||
Bindings
|
||||
--------
|
||||
|
||||
@@ -316,43 +352,11 @@ Libpostal is designed to be used by higher-level languages. If you don't see yo
|
||||
|
||||
- Libpostal REST Docker [Libpostal REST Docker](https://github.com/johnlonganecker/libpostal-rest-docker)
|
||||
|
||||
|
||||
**Libpostal ZeroMQ Docker**
|
||||
|
||||
- Libpostal ZeroMQ Docker image: [pasupulaphani/libpostal-zeromq](https://hub.docker.com/r/pasupulaphani/libpostal-zeromq/) , Source: [Github](https://github.com/pasupulaphani/libpostal-docker)
|
||||
|
||||
|
||||
Command-line usage (expand)
|
||||
---------------------------
|
||||
|
||||
After building libpostal:
|
||||
|
||||
```
|
||||
cd src/
|
||||
|
||||
./libpostal "Quatre vingt douze Ave des Champs-Élysées"
|
||||
```
|
||||
|
||||
If you have a text file or stream with one address per line, the command-line interface also accepts input from stdin:
|
||||
|
||||
```
|
||||
cat some_file | ./libpostal --json
|
||||
```
|
||||
|
||||
Command-line usage (parser)
|
||||
---------------------------
|
||||
|
||||
After building libpostal:
|
||||
|
||||
```
|
||||
cd src/
|
||||
|
||||
./address_parser
|
||||
```
|
||||
|
||||
address_parser is an interactive shell. Just type addresses and libpostal will
|
||||
parse them and print the result.
|
||||
|
||||
Tests
|
||||
-----
|
||||
|
||||
@@ -364,19 +368,18 @@ make check
|
||||
|
||||
Adding [test cases](https://github.com/openvenues/libpostal/tree/master/test) is easy, even if your C is rusty/non-existent, and we'd love contributions. We use mostly functional tests checking string input against string output.
|
||||
|
||||
libpostal also gets periodically battle-tested on tens of millions of addresses from OSM (clean) as well as anonymized queries from a production geocoder (not so clean). During this process we use valgrind to check for memory leaks and other errors.
|
||||
libpostal also gets periodically battle-tested on millions of addresses from OSM (clean) as well as anonymized queries from a production geocoder (not so clean). During this process we use valgrind to check for memory leaks and other errors.
|
||||
|
||||
Data files
|
||||
----------
|
||||
|
||||
libpostal needs to download some data files from S3. The basic files are on-disk
|
||||
representations of the data structures necessary to perform expansion. For address
|
||||
parsing, since model training takes about a day, we publish the fully trained model
|
||||
to S3 and will update it automatically as new addresses get added to OSM. Same goes for
|
||||
the language classifier model.
|
||||
parsing, since model training takes a few days, we publish the fully trained model
|
||||
to S3 and will update it automatically as new addresses get added to OSM, OpenAddresses, etc. Same goes for the language classifier model.
|
||||
|
||||
Data files are automatically downloaded when you run make. To check for and download
|
||||
any new data files, run:
|
||||
any new data files, you can either run ```make```, or run:
|
||||
|
||||
```
|
||||
libpostal_data download all $YOUR_DATA_DIR/libpostal
|
||||
@@ -389,6 +392,27 @@ Language dictionaries
|
||||
|
||||
libpostal contains a number of per-language dictionaries that influence expansion, the language classifier, and the parser. To explore the dictionaries or contribute abbreviations/phrases in your language, see [resources/dictionaries](https://github.com/openvenues/libpostal/tree/master/resources/dictionaries).
|
||||
|
||||
Training data
|
||||
-------------
|
||||
|
||||
In machine learning, large amounts of training data are often essential for getting good results. Many open-source machine learning projects either release only the model code (results reproducible if and only if you're Google), or a pre-baked model where the training conditions are unknown.

Libpostal is a bit different because it's trained on open data that's available to everyone, so we've released the entire training pipeline (the [geodata](https://github.com/openvenues/libpostal/tree/master/scripts/geodata) package in this repo), as well as the resulting training data itself on S3. It's over 100GB unzipped.

Training data are stored on S3 by the date they were created. There's also a file stored on S3 to point to the most recent training data. To always point to the latest data, use something like: ```latest=$(curl https://s3.amazonaws.com/libpostal/training_data/latest)``` and use that variable in place of the date.
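
As a rough illustration of "use that variable in place of the date", the sketch below assembles a location under the dated layout `s3://libpostal/training_data/YYYY-MM-DD/parser/` that the parser training sets use. It assumes the `latest` file contains a plain `YYYY-MM-DD` string, which the text implies but does not state. `looks_like_date` and `parser_training_path` are hypothetical helpers, and the date in the usage note is made up.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Check that `s` has the shape YYYY-MM-DD: digits with dashes at positions
 * 4 and 7. This is a shape check only, not calendar validation. */
int looks_like_date(const char *s) {
    if (s == NULL || strlen(s) != 10) return 0;
    for (size_t i = 0; i < 10; i++) {
        if (i == 4 || i == 7) {
            if (s[i] != '-') return 0;
        } else if (!isdigit((unsigned char)s[i])) {
            return 0;
        }
    }
    return 1;
}

/* Write "s3://libpostal/training_data/<date>/parser/<filename>" into buf.
 * Returns 0 on success, -1 on a malformed date or a buffer that is too small. */
int parser_training_path(char *buf, size_t buf_size,
                         const char *date, const char *filename) {
    if (buf == NULL || filename == NULL || !looks_like_date(date)) return -1;
    int n = snprintf(buf, buf_size,
                     "s3://libpostal/training_data/%s/parser/%s",
                     date, filename);
    return (n < 0 || (size_t)n >= buf_size) ? -1 : 0;
}
```

For instance, calling `parser_training_path(buf, sizeof buf, "2017-01-01", "formatted_ways_tagged.random.tsv.gz")` with a hypothetical date fills `buf` with the full location of that file.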
|
||||
|
||||
### Parser training sets ###
All files can be found under s3://libpostal/training_data/YYYY-MM-DD/parser/ as gzip'd tab-separated values (TSV) files formatted like: ```language\tcountry\taddress```.

- **formatted_addresses_tagged.random.tsv.gz** (ODBL): OSM addresses. Apartments, PO boxes, categories, etc. are added primarily to these examples
- **formatted_places_tagged.random.tsv.gz** (ODBL): every toponym in OSM (even cities represented as points, etc.), reverse-geocoded to its parent admins, possibly including postal codes if they're listed on the point/polygon. Every place gets a base level of representation and places with higher populations get proportionally more.
- **formatted_ways_tagged.random.tsv.gz** (ODBL): every street in OSM (ways with highway=*, with a few conditions), reverse-geocoded to its admins
- **geoplanet_formatted_addresses_tagged.random.tsv.gz** (CC-BY): every postal code in Yahoo GeoPlanet (includes almost every postcode in the UK, Canada, etc.) and their parent admins. The GeoPlanet admins have been cleaned up and mapped to libpostal's tagset
- **openaddresses_formatted_addresses_tagged.random.tsv.gz** (various licenses, mostly CC-BY): most of the address data sets from [OpenAddresses](https://openaddresses.io/), which in turn come directly from government sources
- **uk_openaddresses_formatted_addresses_tagged.random.tsv.gz** (CC-BY): addresses from [OpenAddresses UK](https://alpha.openaddressesuk.org/)

If the parser doesn't perform as well as you'd hoped on a particular type of address, the best recourse is to use grep/awk to look through the training data and try to determine if there's some pattern/style of address that's not being captured.
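
For anyone who prefers to inspect the decompressed files from C rather than grep/awk, here is a minimal sketch of splitting one line of the `language\tcountry\taddress` format into its three columns. `training_example_t` and `parse_training_line` are hypothetical names, and the sample line in the usage note is invented; the exact tagging format of the address column is not described here, so the function leaves that column untouched.

```c
#include <stddef.h>
#include <string.h>

/* Pointers into the caller's line buffer, one per TSV column. */
typedef struct {
    char *language;
    char *country;
    char *address;
} training_example_t;

/* Split `line` in place at its first two tabs. Anything after the second
 * tab, including further tabs, stays in the address field. Returns 0 on
 * success, -1 if the line has fewer than three fields. */
int parse_training_line(char *line, training_example_t *out) {
    if (line == NULL || out == NULL) return -1;
    line[strcspn(line, "\r\n")] = '\0';   /* cut at the first CR or LF */
    char *tab1 = strchr(line, '\t');
    if (tab1 == NULL) return -1;
    char *tab2 = strchr(tab1 + 1, '\t');
    if (tab2 == NULL) return -1;
    *tab1 = '\0';
    *tab2 = '\0';
    out->language = line;
    out->country = tab1 + 1;
    out->address = tab2 + 1;
    return 0;
}
```

Given a made-up line such as `"en\tus\t781 Franklin Ave Brooklyn NY 11216\n"`, the three fields come back as `en`, `us` and the address text. Filtering on `language` or `country` before looking at addresses is one way to narrow down a pattern that the parser is missing.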
|
||||
|
||||
Features
|
||||
--------
|
||||
|
||||
@@ -399,13 +423,13 @@ whitespace e.g. Chinese) are supported, as are Germanic languages where
|
||||
thoroughfare types are concatenated onto the end of the string, and may
|
||||
optionally be separated so Rosenstraße and Rosen Straße are equivalent.
|
||||
|
||||
- **International address parsing**: sequence model which parses
|
||||
- **International address parsing**: [Conditional Random Field](http://blog.echen.me/2012/01/03/introduction-to-conditional-random-fields/) which parses
|
||||
"123 Main Street New York New York" into {"house_number": 123, "road":
|
||||
"Main Street", "city": "New York", "state": "New York"}. The parser works
|
||||
for a wide variety of countries and languages, not just US/English.
|
||||
The model is trained on > 50M OSM addresses, using the
|
||||
The model is trained on over 1 billion addresses and address-like strings, using the
|
||||
templates in the [OpenCage address formatting repo](https://github.com/OpenCageData/address-formatting) to construct formatted,
|
||||
tagged training examples for most countries around the world. Many types of [normalizations](https://github.com/openvenues/libpostal/blob/master/scripts/geodata/osm/osm_address_training_data.py)
|
||||
tagged training examples for every inhabited country in the world. Many types of [normalizations](https://github.com/openvenues/libpostal/blob/master/scripts/geodata/addresses/components.py)
|
||||
are performed to make the training data resemble real messy geocoder input as closely as possible.
|
||||
|
||||
- **Language classification**: multinomial logistic regression
|
||||
@@ -446,34 +470,11 @@ Latin scripts in the same address). In transliteration we can use all
|
||||
applicable transliterators for a given Unicode script (Greek can for instance
|
||||
be transliterated with Greek-Latin, Greek-Latin-BGN and Greek-Latin-UNGEGN).
|
||||
|
||||
Roadmap
|
||||
-------
|
||||
|
||||
- **Geographic name aliasing (coming soon)**: New York, NYC and Nueva York alias
|
||||
to New York City. Uses the crowd-sourced GeoNames (geonames.org) database, so alternate
|
||||
names added by contributors can automatically improve libpostal.
|
||||
|
||||
- **Geographic disambiguation (coming soon)**: There are several equally
|
||||
likely Springfields in the US (formally known as The Simpsons problem), and
|
||||
some context like a state is required to disambiguate. There are also > 1200
|
||||
distinct San Franciscos in the world but the term "San Francisco" almost always
|
||||
refers to the one in California. Williamsburg can refer to a neighborhood in
|
||||
Brooklyn or a city in Virginia. Geo disambiguation is a subset of Word Sense
|
||||
Disambiguation, and attempts to resolve place names in a string to GeoNames
|
||||
entities. This can be useful for city-level geocoding suitable for polygon/area
|
||||
lookup. By default, if there is no other context, as in the San Francisco case,
|
||||
the most populous entity will be selected.
|
||||
|
||||
- **Ambiguous token classification (coming soon)**: e.g. "dr" => "doctor" or
|
||||
"drive" for an English address depending on the context. Multiclass logistic
|
||||
regression trained on OSM addresses, where abbreviations are discouraged,
|
||||
giving us many examples of fully qualified addresses on which to train.
|
||||
|
||||
Non-goals
|
||||
---------
|
||||
|
||||
- Verifying that a location is a valid address
|
||||
- Street-level geocoding
|
||||
- Actually geocoding addresses to a lat/lon (that requires a database/search index)
|
||||
|
||||
Raison d'être
|
||||
-------------
|
||||
@@ -568,8 +569,8 @@ isn't as important because everything's being done in parallel, but there are
|
||||
some streaming ingestion applications at Mapzen where this needs to
|
||||
run in-process.
|
||||
|
||||
C codebase
|
||||
----------
|
||||
C conventions
|
||||
-------------
|
||||
|
||||
libpostal is written in modern, legible, C99 and uses the following conventions:
|
||||
|
||||
@@ -581,31 +582,30 @@ libpostal is written in modern, legible, C99 and uses the following conventions:
|
||||
- Generic containers (via [klib](https://github.com/attractivechaos/klib)) whenever possible
|
||||
- Data structures take advantage of sparsity as much as possible
|
||||
- Efficient double-array trie implementation for most string dictionaries
|
||||
- Tries to stay cross-platform as much as possible, particularly for *nix
|
||||
- Cross-platform as much as possible, particularly for *nix
|
||||
|
||||
Python codebase
|
||||
---------------
|
||||
Preprocessing (Python)
|
||||
----------------------
|
||||
|
||||
The [geodata](https://github.com/openvenues/libpostal/tree/master/scripts/geodata) package in the libpostal repo is a confederation of scripts for preprocessing the various geo
|
||||
data sets and building input files for the C lib to use during model training.
|
||||
Said scripts shouldn't be needed for most users unless you're rebuilding data
|
||||
files for the C lib.
|
||||
The [geodata](https://github.com/openvenues/libpostal/tree/master/scripts/geodata) Python package in the libpostal repo contains the pipeline for preprocessing the various geo
|
||||
data sets and building training data for the C models to use.
|
||||
This package shouldn't be needed for most users, but for those interested in generating new types of addresses or improving libpostal's training data, this is where to look.
|
||||
|
||||
Address parser accuracy
|
||||
-----------------------
|
||||
|
||||
On held-out test data (meaning labeled parses that the model has _not_ seen
|
||||
before), the address parser achieves 98.9% full parse accuracy.
|
||||
before), the address parser achieves 99.45% full parse accuracy.
|
||||
|
||||
For some tasks like named entity recognition it's preferable to use something
|
||||
like an F1 score or variants, mostly because there's a class bias problem (most
|
||||
tokens are non-entities, and a system that simply predicted non-entity for
|
||||
words are non-entities, and a system that simply predicted non-entity for
|
||||
every token would actually do fairly well in terms of accuracy). That is not
|
||||
the case for address parsing. Every token has a label and there are millions
|
||||
of examples of each class in the training data, so accuracy is preferable as it's
|
||||
a clean, simple and intuitive measure of performance.
|
||||
|
||||
Here we use full parse accuracy, meaning we only give the parser a "point" in
|
||||
Here we use full parse accuracy, meaning we only give the parser one "point" in
|
||||
the numerator if it gets every single token in the address correct. That should
|
||||
be a better measure than simply looking at whether each token was correct.
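
To make the difference between the two measures concrete, here is a small sketch that computes both over a set of labeled parses. `labeled_parse_t`, `full_parse_accuracy` and `token_accuracy` are hypothetical names for illustration and are not taken from libpostal's evaluation code.

```c
#include <stddef.h>
#include <string.h>

/* One evaluated address: gold and predicted labels, one per token. */
typedef struct {
    size_t num_tokens;
    const char **gold;
    const char **predicted;
} labeled_parse_t;

/* Fraction of addresses in which every token's predicted label matches
 * the gold label. A single wrong token scores the whole address as 0. */
double full_parse_accuracy(const labeled_parse_t *parses, size_t n) {
    if (n == 0) return 0.0;
    size_t correct = 0;
    for (size_t i = 0; i < n; i++) {
        int all_match = 1;
        for (size_t j = 0; j < parses[i].num_tokens; j++) {
            if (strcmp(parses[i].gold[j], parses[i].predicted[j]) != 0) {
                all_match = 0;
                break;
            }
        }
        correct += (size_t)all_match;
    }
    return (double)correct / (double)n;
}

/* Fraction of individual tokens labeled correctly, for comparison. */
double token_accuracy(const labeled_parse_t *parses, size_t n) {
    size_t correct = 0, total = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < parses[i].num_tokens; j++) {
            if (strcmp(parses[i].gold[j], parses[i].predicted[j]) == 0) {
                correct++;
            }
            total++;
        }
    }
    return total == 0 ? 0.0 : (double)correct / (double)total;
}
```

With two three-token addresses where one token in the second is mislabeled, token accuracy is 5/6 while full parse accuracy is 1/2, which shows how much stricter the full-parse measure is.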
|
||||
|
||||
@@ -614,7 +614,7 @@ Improving the address parser
|
||||
|
||||
Though the current parser works quite well for most standard addresses, there
|
||||
is still room for improvement, particularly in making sure the training data
|
||||
we use is as close as possible to addresses in the wild. There are four primary
|
||||
we use is as close as possible to addresses in the wild. There are two primary
|
||||
ways the address parser can be improved even further (in order of difficulty):
|
||||
|
||||
1. Contribute addresses to OSM. Anything with an addr:housenumber tag will be
|
||||
@@ -622,22 +622,12 @@ ways the address parser can be improved even further (in order of difficulty):
|
||||
2. If the address parser isn't working well for a particular country, language
|
||||
or style of address, chances are that some name variations or places are being
|
||||
missed/mislabeled during training data creation. Sometimes the fix is to
|
||||
add more countries at: https://github.com/OpenCageData/address-formatting,
|
||||
update the formats at: https://github.com/OpenCageData/address-formatting,
|
||||
and in many other cases there are relatively simple tweaks we can make
|
||||
when creating the training data that will ensure the model is trained to
|
||||
handle your use case without you having to do any manual data entry.
|
||||
If you see a pattern of obviously bad address parses, the best thing to
|
||||
do is post an issue to Github.
|
||||
3. We currently don't have training data for things like apartment/flat numbers.
|
||||
The tags are fairly uncommon in OSM and the address-formatting templates
|
||||
don't use floor, level, apartment/flat number, etc. This would be a slightly
|
||||
more involved effort, but would be worth starting a discussion.
|
||||
4. We use a greedy averaged perceptron for the parser model primarily for its
|
||||
speed and relatively good performance compared to slower, fancier models.
|
||||
Viterbi inference using a linear-chain CRF may improve parser performance
|
||||
on certain classes of input since the score is the argmax over the entire
|
||||
label sequence not just the token. This may slow down training significantly
|
||||
although runtime performance would be relatively unaffected.
|
||||
|
||||
Contributing
|
||||
------------
|
||||
|
||||
34
configure.ac
@@ -1,7 +1,13 @@
# -*- Autoconf -*-
# Process this file with autoconf to produce a configure script.

AC_INIT([libpostal], [0.3.3])
m4_define(LIBPOSTAL_MAJOR_VERSION, [1])
m4_define(LIBPOSTAL_MINOR_VERSION, [0])
m4_define(LIBPOSTAL_PATCH_VERSION, [0])

AC_INIT([libpostal], LIBPOSTAL_MAJOR_VERSION.LIBPOSTAL_MINOR_VERSION.LIBPOSTAL_PATCH_VERSION)

AC_CONFIG_MACRO_DIR([m4])

AM_INIT_AUTOMAKE([foreign subdir-objects])
AC_CONFIG_SRCDIR([src])
@@ -16,9 +22,6 @@ AC_PROG_INSTALL
LDFLAGS="$LDFLAGS -L/usr/local/lib"

# Checks for libraries.
AC_SEARCH_LIBS([snappy_compress],
[snappy],,[AC_MSG_ERROR([Could not find snappy])
])
AC_SEARCH_LIBS([log],
[m],,[AC_MSG_ERROR([Could not find math library])])

@@ -45,18 +48,34 @@ AC_TYPE_UINT8_T
AC_CHECK_TYPES([ptrdiff_t])

# Checks for library functions.
AC_FUNC_MMAP
AC_CHECK_FUNCS([malloc realloc getcwd gettimeofday memmove memset munmap regcomp setlocale sqrt strdup strndup])
AC_CHECK_FUNCS([malloc realloc getcwd gettimeofday memmove memset regcomp setlocale sqrt strdup strndup])

AC_CONFIG_FILES([Makefile
libpostal.pc
src/Makefile
src/sparkey/Makefile
test/Makefile])

AC_CHECK_PROG([FOUND_SHUF], [shuf], [yes])
AC_CHECK_PROG([FOUND_GSHUF], [gshuf], [yes])

AS_IF([test "x$FOUND_SHUF" = xyes], [AC_DEFINE([HAVE_SHUF], [1], [shuf available])])
AS_IF([test "x$FOUND_GSHUF" = xyes], [AC_DEFINE([HAVE_GSHUF], [1], [gshuf available])])

# ------------------------------------------------------------------
# Checks for SSE2 build
# ------------------------------------------------------------------
AC_ARG_ENABLE([sse2],
AS_HELP_STRING(
[--disable-sse2],
[disable SSE2 optimization routines]
)
)

AS_IF([test "x$enable_sse2" != "xno"], [
CFLAGS="-mfpmath=sse -msse2 -DUSE_SSE ${CFLAGS}"
])

AC_CHECK_HEADER(cblas.h, [AX_CBLAS])

AC_ARG_ENABLE([data-download],
[ --disable-data-download Disable downloading data],
@@ -81,5 +100,6 @@ AC_ARG_WITH(cflags-scanner-extra, [AS_HELP_STRING([--with-cflags-scanner-extra@<

AC_MSG_NOTICE([extra cflags for scanner.c: $CFLAGS_SCANNER_EXTRA])
AC_SUBST(CFLAGS_SCANNER_EXTRA)
AC_SUBST(LIBPOSTAL_SO_VERSION, LIBPOSTAL_MAJOR_VERSION:LIBPOSTAL_MINOR_VERSION:LIBPOSTAL_PATCH_VERSION)

AC_OUTPUT
172
m4/ax_cblas.m4
Normal file
@@ -0,0 +1,172 @@
|
||||
# ===========================================================================
|
||||
# http://autoconf-archive.cryp.to/acx_blas.html
|
||||
# ===========================================================================
|
||||
#
|
||||
# SYNOPSIS
|
||||
#
|
||||
# AX_CBLAS([ACTION-IF-FOUND[, ACTION-IF-NOT-FOUND]])
|
||||
#
|
||||
# DESCRIPTION
|
||||
#
|
||||
# This macro looks for a library that implements the CBLAS linear-algebra
|
||||
# interface (see http://www.netlib.org/blas/). On success, it sets the
|
||||
# CBLAS_LIBS output variable to hold the requisite library linkages.
|
||||
#
|
||||
# To link with CBLAS, you should link with:
|
||||
#
|
||||
# $CBLAS_LIBS $LIBS
|
||||
#
|
||||
# in that order.
|
||||
#
|
||||
# Many libraries are searched for, from ATLAS to CXML to ESSL. The user
|
||||
# may also use --with-cblas=<lib> in order to use some specific CBLAS
|
||||
# library <lib>.
|
||||
#
|
||||
# ACTION-IF-FOUND is a list of shell commands to run if a BLAS library is
|
||||
# found, and ACTION-IF-NOT-FOUND is a list of commands to run it if it is
|
||||
# not found. If ACTION-IF-FOUND is not specified, the default action will
|
||||
# define HAVE_BLAS.
|
||||
#
|
||||
# This macro requires autoconf 2.50 or later.
|
||||
#
|
||||
# LAST MODIFICATION
|
||||
#
|
||||
# 2008-12-29
|
||||
#
|
||||
# COPYLEFT
|
||||
#
|
||||
# Copyright (c) 2008 Patrick O. Perry <patperry@stanfordalumni.org>
|
||||
# Copyright (c) 2008 Steven G. Johnson <stevenj@alum.mit.edu>
|
||||
#
|
||||
# This program is free software: you can redistribute it and/or modify it
|
||||
# under the terms of the GNU General Public License as published by the
|
||||
# Free Software Foundation, either version 3 of the License, or (at your
|
||||
# option) any later version.
|
||||
#
|
||||
# This program is distributed in the hope that it will be useful, but
|
||||
# WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
|
||||
# Public License for more details.
|
||||
#
|
||||
# You should have received a copy of the GNU General Public License along
|
||||
# with this program. If not, see <http://www.gnu.org/licenses/>.
|
||||
#
|
||||
# As a special exception, the respective Autoconf Macro's copyright owner
|
||||
# gives unlimited permission to copy, distribute and modify the configure
|
||||
# scripts that are the output of Autoconf when processing the Macro. You
|
||||
# need not follow the terms of the GNU General Public License when using
|
||||
# or distributing such scripts, even though portions of the text of the
|
||||
# Macro appear in them. The GNU General Public License (GPL) does govern
|
||||
# all other use of the material that constitutes the Autoconf Macro.
|
||||
#
|
||||
# This special exception to the GPL applies to versions of the Autoconf
|
||||
# Macro released by the Autoconf Macro Archive. When you make and
|
||||
# distribute a modified version of the Autoconf Macro, you may extend this
|
||||
# special exception to the GPL to apply to your modified version as well.
|
||||
|
||||
AC_DEFUN([AX_CBLAS], [
|
||||
AC_PREREQ(2.50)
|
||||
ax_cblas_ok=no
|
||||
|
||||
AC_ARG_WITH(cblas,
|
||||
[AC_HELP_STRING([--with-cblas=<lib>], [use CBLAS library <lib>])])
|
||||
case $with_cblas in
|
||||
yes | "") ;;
|
||||
no) ax_cblas_ok=disable ;;
|
||||
-* | */* | *.a | *.so | *.so.* | *.o) CBLAS_LIBS="$with_cblas" ;;
|
||||
*) CBLAS_LIBS="-l$with_cblas" ;;
|
||||
esac
|
||||
|
||||
ax_cblas_save_LIBS="$LIBS"
|
||||
|
||||
# First, check CBLAS_LIBS environment variable
|
||||
if test $ax_cblas_ok = no; then
|
||||
if test "x$CBLAS_LIBS" != x; then
|
||||
save_LIBS="$LIBS"; LIBS="$CBLAS_LIBS $LIBS"
|
||||
AC_MSG_CHECKING([for cblas_dgemm in $CBLAS_LIBS])
|
||||
AC_TRY_LINK_FUNC(cblas_dgemm, [ax_cblas_ok=yes], [CBLAS_LIBS=""])
|
||||
AC_MSG_RESULT($ax_cblas_ok)
|
||||
LIBS="$save_LIBS"
|
||||
fi
|
||||
fi
|
||||
|
||||
# CBLAS linked to by default? (happens on some supercomputers)
|
||||
if test $ax_cblas_ok = no; then
|
||||
save_LIBS="$LIBS"; LIBS="$LIBS"
|
||||
AC_CHECK_FUNC(cblas_dgemm, [ax_cblas_ok=yes])
|
||||
LIBS="$save_LIBS"
|
||||
fi
|
||||
|
||||
# BLAS in ATLAS library? (http://math-atlas.sourceforge.net/)
|
||||
if test $ax_cblas_ok = no; then
|
||||
AC_CHECK_LIB(atlas, ATL_xerbla,
|
||||
[AC_CHECK_LIB(cblas, cblas_dgemm,
|
||||
[ax_cblas_ok=yes
|
||||
CBLAS_LIBS="-lcblas -latlas"],
|
||||
[], [-latlas])])
|
||||
fi
|
||||
|
||||
# BLAS in Intel MKL library?
|
||||
if test $ax_cblas_ok = no; then
|
||||
AC_CHECK_LIB(mkl, cblas_dgemm, [ax_cblas_ok=yes;CBLAS_LIBS="-lmkl"])
|
||||
fi
|
||||
|
||||
# BLAS in Apple vecLib library?
|
||||
if test $ax_cblas_ok = no; then
|
||||
save_LIBS="$LIBS"; LIBS="-framework vecLib $LIBS"
|
||||
AC_CHECK_FUNC(cblas_dgemm, [ax_cblas_ok=yes;CBLAS_LIBS="-framework vecLib"])
|
||||
LIBS="$save_LIBS"
|
||||
fi
|
||||
|
||||
# BLAS in Alpha DXML library? (now called CXML, see above)
|
||||
if test $ax_cblas_ok = no; then
|
||||
AC_CHECK_LIB(dxml, cblas_dgemm, [ax_cblas_ok=yes;CBLAS_LIBS="-ldxml"])
|
||||
fi
|
||||
|
||||
# BLAS in Sun Performance library?
|
||||
if test $ax_cblas_ok = no; then
|
||||
if test "x$GCC" != xyes; then # only works with Sun CC
|
||||
AC_CHECK_LIB(sunmath, acosp,
|
||||
[AC_CHECK_LIB(sunperf, cblas_dgemm,
|
||||
[CBLAS_LIBS="-xlic_lib=sunperf -lsunmath"
|
||||
ax_cblas_ok=yes],[],[-lsunmath])])
|
||||
fi
|
||||
fi
|
||||
|
||||
# BLAS in SCSL library? (SGI/Cray Scientific Library)
|
||||
if test $ax_cblas_ok = no; then
|
||||
AC_CHECK_LIB(scs, cblas_dgemm, [ax_cblas_ok=yes; CBLAS_LIBS="-lscs"])
|
||||
fi
|
||||
|
||||
# BLAS in SGIMATH library?
|
||||
if test $ax_cblas_ok = no; then
|
||||
AC_CHECK_LIB(complib.sgimath, cblas_dgemm,
|
||||
[ax_cblas_ok=yes; CBLAS_LIBS="-lcomplib.sgimath"])
|
||||
fi
|
||||
|
||||
# BLAS in IBM ESSL library? (requires generic BLAS lib, too)
|
||||
if test $ax_cblas_ok = no; then
|
||||
AC_CHECK_LIB(blas, cblas_dgemm,
|
||||
[AC_CHECK_LIB(essl, cblas_dgemm,
|
||||
[ax_cblas_ok=yes; CBLAS_LIBS="-lessl -lblas"],
|
||||
[], [-lblas])])
|
||||
fi
|
||||
|
||||
# Generic CBLAS library?
|
||||
if test $ax_cblas_ok = no; then
|
||||
AC_CHECK_LIB(cblas, cblas_dgemm, [ax_cblas_ok=yes; CBLAS_LIBS="-lcblas"])
|
||||
fi
|
||||
|
||||
AC_SUBST(CBLAS_LIBS)
|
||||
|
||||
LIBS="$ax_cblas_save_LIBS"
|
||||
|
||||
# Finally, execute ACTION-IF-FOUND/ACTION-IF-NOT-FOUND:
|
||||
if test x"$ax_cblas_ok" = xyes; then
|
||||
ifelse([$1],,AC_DEFINE(HAVE_CBLAS,1,[Define if you have a CBLAS library.]),[$1])
|
||||
:
|
||||
else
|
||||
ax_cblas_ok=no
|
||||
$2
|
||||
fi
|
||||
])dnl AX_CBLAS
|
||||
1001
resources/addresses/bg.yaml
Normal file
File diff suppressed because it is too large
585
resources/addresses/bs.yaml
Normal file
@@ -0,0 +1,585 @@
|
||||
# bs.yaml
|
||||
# -------
|
||||
# Bosnian language specification
|
||||
|
||||
components:
|
||||
level:
|
||||
null_probability: 0.9
|
||||
alphanumeric_probability: 0.1
|
||||
|
||||
staircase:
|
||||
null_probability: 0.99
|
||||
alphanumeric_probability: 0.01
|
||||
|
||||
entrance:
|
||||
null_probability: 0.999
|
||||
alphanumeric_probability: 0.001
|
||||
|
||||
unit:
|
||||
null_probability: 0.7
|
||||
alphanumeric_probability: 0.3
|
||||
|
||||
combinations:
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- staircase
|
||||
- level
|
||||
- unit
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: "/"
|
||||
probability: 0.95
|
||||
- separator: "-"
|
||||
probability: 0.05
|
||||
probability: 0.005
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- level
|
||||
- unit
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: "/"
|
||||
probability: 0.95
|
||||
- separator: "-"
|
||||
probability: 0.05
|
||||
probability: 0.005
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- level
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: "/"
|
||||
probability: 0.95
|
||||
- separator: "-"
|
||||
probability: 0.05
|
||||
probability: 0.1
|
||||
# For unit types like 2/34
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- unit
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: "/"
|
||||
probability: 0.95
|
||||
- separator: "-"
|
||||
probability: 0.05
|
||||
probability: 0.005
|
||||
|
||||
|
||||
numbers:
|
||||
no_number:
|
||||
default:
|
||||
canonical: bez broja
|
||||
abbreviated: bb
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.3
|
||||
|
||||
default: &broj
|
||||
canonical: broj
|
||||
abbreviated: br
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.6
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
numeric_affix:
|
||||
affix: "br."
|
||||
whitespace_probability: 0.6
|
||||
direction: left
|
||||
numeric_probability: 0.4
|
||||
numeric_affix_probability: 0.6
|
||||
|
||||
alphanumeric_phrase_probability: 0.05
|
||||
no_number_probability: 0.05
|
||||
|
||||
|
||||
and:
|
||||
default: &i
|
||||
canonical: i
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
|
||||
cross_streets:
|
||||
i: *i
|
||||
at: &na
|
||||
canonical: na
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
corner: &ugao
|
||||
canonical: ugao
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
corner_of: &uglu
|
||||
canonical: uglu
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
na_uglu: &na_uglu
|
||||
canonical: na uglu
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
intersection:
|
||||
default: *i
|
||||
probability: 0.65
|
||||
alternatives:
|
||||
- alternative: *na
|
||||
probability: 0.1
|
||||
- alternative: *uglu
|
||||
probability: 0.1
|
||||
- alternative: *na_uglu
|
||||
probability: 0.1
|
||||
- alternative: *ugao
|
||||
probability: 0.05
|
||||
|
||||
izmedu: &izmedu
|
||||
canonical: između
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
parentheses_probability: 0.5
|
||||
between:
|
||||
default: *izmedu
|
||||
|
||||
levels:
|
||||
sprat: &sprat
|
||||
canonical: sprat
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
direction_probability: 0.9
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.3
|
||||
roman_numeral_probability: 0.7
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
numeric_probability: 0.4
|
||||
ordinal_probability: 0.6
|
||||
kat: &kat
|
||||
canonical: kat
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
direction_probability: 0.9
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.3
|
||||
roman_numeral_probability: 0.7
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
numeric_probability: 0.4
|
||||
ordinal_probability: 0.6
|
||||
prizemlje: &prizemlje
|
||||
canonical: prizemlje
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
parter: &parter
|
||||
canonical: parter
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
mezanino: &mezanin
|
||||
canonical: mezanin
|
||||
half_floors: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
sample: true
|
||||
# e.g. mezanin 2
|
||||
numeric:
|
||||
direction: left
|
||||
# e.g. 2. mezanin
|
||||
ordinal:
|
||||
direction: right
|
||||
numeric_probability: 0.1
|
||||
ordinal_probability: 0.2
|
||||
standalone_probability: 0.6
|
||||
podrum: &podrum
|
||||
canonical: podrum
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
# e.g. podrum 1
|
||||
numeric:
|
||||
direction: left
|
||||
direction_probability: 0.8
|
||||
# e.g. 1. podrum
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
standalone_probability: 0.99
|
||||
number_abs_value: true
|
||||
number_min_abs_value: 1
|
||||
numeric_probability: 0.005
|
||||
ordinal_probability: 0.005
|
||||
|
||||
aliases:
|
||||
"<-1":
|
||||
default: *podrum
|
||||
"-1":
|
||||
default: *podrum
|
||||
# Special token for half-floors
|
||||
half_floors:
|
||||
default: *mezanin
|
||||
"0":
|
||||
default: *prizemlje
|
||||
probability: 0.5
|
||||
alternatives:
|
||||
- alternative: *parter
|
||||
probability: 0.4
|
||||
- alternative: *kat
|
||||
probability: 0.05
|
||||
- alternative: *sprat
|
||||
probability: 0.05
|
||||
|
||||
numbering_starts_at: 0
|
||||
|
||||
alphanumeric:
|
||||
default: *kat
|
||||
probability: 0.5
|
||||
alternatives:
|
||||
- alternative: *sprat
|
||||
probability: 0.5
|
||||
numeric_probability: 0.69 # With this probability, pick an integer
|
||||
roman_numeral_probability: 0.3 # Pick a Roman numeral for the actual value
|
||||
alpha_probability: 0.0098 # With this probability, pick a letter e.g. A
|
||||
numeric_plus_alpha_probability: 0.0001 # e.g. 2A
|
||||
alpha_plus_numeric_probability: 0.0001 # e.g. A2
|
||||
|
||||
|
||||
categories:
|
||||
near:
|
||||
default:
|
||||
canonical: u blizini
|
||||
nearby:
|
||||
default:
|
||||
canonical: u blizini
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: u blizini ovdje
|
||||
probability: 0.3
|
||||
- alternative:
|
||||
canonical: ovde
|
||||
probability: 0.1
|
||||
|
||||
near_me:
|
||||
default:
|
||||
canonical: u blizini mene
|
||||
|
||||
# Don't worry about agreement
|
||||
in:
|
||||
default:
|
||||
canonical: u
|
||||
|
||||
# Probabilities of each phrase
|
||||
near_probability: 0.35
|
||||
nearby_probability: 0.2
|
||||
near_me_probability: 0.1
|
||||
in_probability: 0.35
|
||||
|
||||
directions:
|
||||
right: &desno
|
||||
canonical: desno
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
left: &lijevo
|
||||
canonical: lijevo
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
alternatives:
|
||||
- alternative: *desno
|
||||
probability: 0.5
|
||||
- alternative: *lijevo
|
||||
probability: 0.5
|
||||
|
||||
cardinal_directions:
|
||||
east: &istok
|
||||
canonical: istok
|
||||
abbreviated: i
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: i
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
west: &zapad
|
||||
canonical: zapad
|
||||
abbreviated: z
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: z
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
north: &sjever
|
||||
canonical: sjever
|
||||
abbreviated: s
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: s
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
south: &jug
|
||||
canonical: jug
|
||||
abbreviated: j
|
||||
sample: true
|
||||
canonical_probability: 0.75
|
||||
abbreviated_probability: 0.1
|
||||
sample_probability: 0.15
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: j
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
alternatives:
|
||||
- alternative: *sjever
|
||||
probability: 0.25
|
||||
- alternative: *istok
|
||||
probability: 0.23
|
||||
- alternative: *jug
|
||||
probability: 0.23
|
||||
- alternative: *zapad
|
||||
probability: 0.23
|
||||
|
||||
entrances:
|
||||
ulaz: &ulaz
|
||||
canonical: ulaz
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
# Ulaz 1, Ulaz A, etc.
|
||||
alphanumeric: &entrance_alphanumeric
|
||||
default: *ulaz
|
||||
numeric_probability: 0.1 # e.g. Ulaz 1
|
||||
alpha_probability: 0.85 # e.g. Ulaz A
|
||||
numeric_plus_alpha_probability: 0.025 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.025 # e.g. A1
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
|
||||
staircases:
|
||||
stubiste: &stubiste
|
||||
canonical: stubište
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
|
||||
alphanumeric: &staircase_alphanumeric
|
||||
default: *stubiste
|
||||
numeric_probability: 0.75
|
||||
alpha_probability: 0.2
|
||||
numeric_plus_alpha_probability: 0.025
|
||||
alpha_plus_numeric_probability: 0.025
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
directional:
|
||||
direction: right
|
||||
direction_probability: 0.85
|
||||
modifier:
|
||||
alternatives:
|
||||
- alternative: *desno
|
||||
probability: 0.2
|
||||
- alternative: *lijevo
|
||||
probability: 0.2
|
||||
- alternative: *sjever
|
||||
probability: 0.15
|
||||
- alternative: *jug
|
||||
probability: 0.15
|
||||
- alternative: *istok
|
||||
probability: 0.15
|
||||
- alternative: *zapad
|
||||
probability: 0.15
|
||||
|
||||
po_boxes:
|
||||
postanski_pretinac: &postanski_pretinac
|
||||
canonical: poštanski pretinac
|
||||
abbreviated: p.p
|
||||
sample: true
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.4
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2
|
||||
|
||||
alphanumeric:
|
||||
default: *postanski_pretinac
|
||||
numeric_probability: 0.9 # pp 123
|
||||
alpha_probability: 0.05 # p.p A
|
||||
numeric_plus_alpha_probability: 0.04 # pp 123G
|
||||
alpha_plus_numeric_probability: 0.01 # pp A123
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
digits:
|
||||
- length: 1
|
||||
probability: 0.05
|
||||
- length: 2
|
||||
probability: 0.1
|
||||
- length: 3
|
||||
probability: 0.2
|
||||
- length: 4
|
||||
probability: 0.5
|
||||
- length: 5
|
||||
probability: 0.1
|
||||
- length: 6
|
||||
probability: 0.05
|
||||
|
||||
units:
|
||||
stan: &stan
|
||||
canonical: stan
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
apartman: &apartman
|
||||
canonical: apartman
|
||||
abbreviated: ap
|
||||
sample: true
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.2
|
||||
sample_probability: 0.4
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
|
||||
soba: &soba
|
||||
canonical: soba
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
ured: &ured
|
||||
canonical: ured
|
||||
sample: true
|
||||
canonical_probability: 0.6
|
||||
sample_probability: 0.4
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
|
||||
alphanumeric: &unit_alphanumeric
|
||||
default: *stan
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative: *apartman
|
||||
probability: 0.3
|
||||
- alternative: *soba
|
||||
probability: 0.1
|
||||
numeric_probability: 0.9 # e.g. stan. 1
|
||||
numeric_plus_alpha_probability: 0.03 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.03 # e.g. A1
|
||||
alpha_probability: 0.04 # e.g. stan A
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
# If there are 10 floors, create unit numbers like #301 or #1032
|
||||
use_floor_probability: 0.05
|
||||
|
||||
zones:
|
||||
commercial: &commercial_unit_types
|
||||
default: *soba
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative: *ured
|
||||
probability: 0.4
|
||||
numeric_probability: 0.95 # e.g. soba 1
|
||||
numeric_plus_alpha_probability: 0.01 # e.g. soba 1A
|
||||
alpha_plus_numeric_probability: 0.01 # e.g. soba A1
|
||||
alpha_probability: 0.03 # e.g. soba A
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
university:
|
||||
default: *soba
|
||||
numeric_probability: 0.95 # e.g. soba 1
|
||||
numeric_plus_alpha_probability: 0.01 # e.g. soba 1A
|
||||
alpha_plus_numeric_probability: 0.01 # e.g. soba A1
|
||||
alpha_probability: 0.03 # e.g. soba A
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
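Many entries in bs.yaml above pair a `default` phrase and its `probability` with a list of `alternatives`, each carrying its own `probability`. One plausible reading is that a phrase is drawn at random with those weights. The sketch below shows that reading only; `pick_phrase` is a hypothetical helper, not the generator that actually consumes these files, and the phrases are reduced to plain strings, whereas the YAML uses anchors that point to full phrase mappings.

```python
import random
from typing import Any, Dict


def pick_phrase(spec: Dict[str, Any], rng: random.Random) -> str:
    """Draw the default or one of the alternatives, weighted by probability.

    Assumption: a spec without alternatives always yields its default.
    """
    alternatives = spec.get("alternatives", [])
    if not alternatives:
        return spec["default"]
    choices = [spec["default"]] + [a["alternative"] for a in alternatives]
    weights = [spec["probability"]] + [a["probability"] for a in alternatives]
    # random.choices treats the weights as relative, so they need not sum to 1.
    return rng.choices(choices, weights=weights, k=1)[0]


# Mirrors the "0" (ground floor) alias from the levels section of bs.yaml.
ground_floor = {
    "default": "prizemlje",
    "probability": 0.5,
    "alternatives": [
        {"alternative": "parter", "probability": 0.4},
        {"alternative": "kat", "probability": 0.05},
        {"alternative": "sprat", "probability": 0.05},
    ],
}

rng = random.Random(0)  # seeded so repeated runs draw the same sequence
samples = [pick_phrase(ground_floor, rng) for _ in range(1000)]
print({phrase: samples.count(phrase) for phrase in sorted(set(samples))})
```

Over many draws, "prizemlje" and "parter" should dominate and "kat" and "sprat" should be rare, in line with the weights in the file.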
856
resources/addresses/ca.yaml
Normal file
@@ -0,0 +1,856 @@
|
||||
# ca.yaml
|
||||
# -------
|
||||
# Catalan language specification
|
||||
|
||||
components:
|
||||
level:
|
||||
# If no floor number is specified
|
||||
null_probability: 0.6
|
||||
alphanumeric_probability: 0.35
|
||||
standalone_probability: 0.05
|
||||
|
||||
staircase:
|
||||
null_probability: 0.99
|
||||
alphanumeric_probability: 0.01
|
||||
|
||||
entrance:
|
||||
null_probability: 0.999
|
||||
alphanumeric_probability: 0.001
|
||||
|
||||
unit:
|
||||
# If no unit number is specified
|
||||
null_probability: 0.3
|
||||
alphanumeric_probability: 0.65
|
||||
standalone_probability: 0.05
|
||||
|
||||
numbers:
|
||||
default: &numero
|
||||
canonical: número
|
||||
abbreviated: "nº"
|
||||
sample: true
|
||||
canonical_probability: 0.1
|
||||
abbreviated_probability: 0.7
|
||||
sample_probability: 0.2
|
||||
sample_exclude:
|
||||
- "#"
|
||||
numeric:
|
||||
direction: left
|
||||
numeric_affix:
|
||||
affix: "#" # e.g. #3, #2F, etc.
|
||||
probability: 0.5
|
||||
alternatives:
|
||||
- alternative:
|
||||
direction: left # affix goes on the number's left
|
||||
|
||||
# Probabilities for numbers
|
||||
numeric_probability: 0.7
|
||||
numeric_affix_probability: 0.3
|
||||
|
||||
and:
|
||||
default: &i
|
||||
canonical: i
|
||||
abbreviated: "&"
|
||||
sample: true
|
||||
canonical_probability: 0.5
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.1
|
||||
|
||||
house_numbers:
|
||||
# sense número (s/n) addresses
|
||||
no_number:
|
||||
default:
|
||||
canonical: sense número
|
||||
abbreviated: s/n
|
||||
sample: true
|
||||
canonical_probability: 0.1
|
||||
abbreviated_probability: 0.7
|
||||
sample_probability: 0.2
|
||||
alphanumeric:
|
||||
default: *numero
|
||||
|
||||
alphanumeric_phrase_probability: 0.01
|
||||
no_number_probability: 0.1 # With this probability, use sense número if no house_number is specified
|
||||
|
||||
|
||||
|
||||
levels:
|
||||
# Everywhere except Spain
|
||||
floor: &pis
|
||||
canonical: pis
|
||||
abbreviated: p
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
abbreviated_probability: 0.1
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true # Occasionally add variation of "number", e.g. Pis No 2
|
||||
add_number_phrase_probability: 0.05
|
||||
numeric_affix:
|
||||
affix: p
|
||||
direction: left # P2
|
||||
# e.g. 2o piso
|
||||
ordinal:
|
||||
direction: right
|
||||
direction_probability: 0.95 # Let it vary occasionally e.g. Pis 2o
|
||||
standalone_probability: 0.2 # Let e.g. 5º be the entire floor string
|
||||
# If ordinal is selected, chance of e.g. just using 2o without Piso
|
||||
null_phrase_probability: 0.6
|
||||
numeric_probability: 0.2
|
||||
numeric_affix_probability: 0.05
|
||||
ordinal_probability: 0.75
|
||||
# Ground floor
|
||||
baixos: &baixos
|
||||
canonical: baixos
|
||||
abbreviated: bxs
|
||||
sample: true
|
||||
canonical_probability: 0.6
|
||||
abbreviated_probability: 0.3
|
||||
sample_probability: 0.1
|
||||
pis_baix: &pis_baix
|
||||
canonical: pis baix
|
||||
abbreviated: pb
|
||||
sample: true
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.5
|
||||
sample_probability: 0.1
|
||||
sota: &sota
|
||||
canonical: sota
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
# Used when floor number is < 0 (starts at -1 in all countries)
|
||||
soterrani: &soterrani
|
||||
canonical: soterrani
|
||||
abbreviated: so
|
||||
sample: true
|
||||
canonical_probability: 0.5
|
||||
abbreviated_probability: 0.3
|
||||
sample_probability: 0.2
|
||||
# e.g. soterrani 1
|
||||
numeric:
|
||||
direction: left
|
||||
numeric_affix:
|
||||
affix: so
|
||||
direction: left
|
||||
# e.g. segon soterrani
|
||||
ordinal:
|
||||
direction: right
|
||||
standalone_probability: 0.985
|
||||
number_abs_value: true
|
||||
number_min_abs_value: 1
|
||||
numeric_probability: 0.005
|
||||
numeric_affix_probability: 0.005
|
||||
ordinal_probability: 0.005
|
||||
sub_soterrani: &sub_soterrani
|
||||
canonical: sub soterrani
|
||||
abbreviated: ss
|
||||
sample: true
|
||||
# e.g. sub soterrani 1
|
||||
numeric:
|
||||
direction: left
|
||||
numeric_affix:
|
||||
affix: ss
|
||||
direction: left
|
||||
# e.g. segon sub soterrani
|
||||
ordinal:
|
||||
direction: right
|
||||
number_abs_value: true
|
||||
number_min_abs_value: 2
|
||||
# Soterrani 2 == Sub-soterrani 1
|
||||
number_subtract_abs_value: 1
|
||||
standalone_probability: 0.985
|
||||
numeric_probability: 0.005
|
||||
numeric_affix_probability: 0.005
|
||||
ordinal_probability: 0.005
|
||||
entresol: &entresol
|
||||
canonical: entresòl
|
||||
abbreviated: entl
|
||||
half_floors: true
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
abbreviated_probability: 0.2
|
||||
sample_probability: 0.1
|
||||
# e.g. entresòl 2
|
||||
numeric:
|
||||
direction: left
|
||||
# e.g. ent2
|
||||
numeric_affix:
|
||||
affix: ent
|
||||
direction: left
|
||||
# e.g. segon entresòl
|
||||
ordinal:
|
||||
direction: right
|
||||
numeric_probability: 0.1
|
||||
numeric_affix_probability: 0.1
|
||||
ordinal_probability: 0.2
|
||||
standalone_probability: 0.6
|
||||
pis_principal: &pis_principal
|
||||
canonical: pis principal
|
||||
abbreviated: pis pral
|
||||
sample: true
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.3
|
||||
sample_probability: 0.5
|
||||
principal: &principal
|
||||
canonical: principal
|
||||
abbreviated: pral
|
||||
sample: true
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.6
|
||||
sample_probability: 0.2
|
||||
atic: &atic
|
||||
canonical: àtic
|
||||
abbreviated: át
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
abbreviated_probability: 0.1
|
||||
sample_probability: 0.2
|
||||
sobreatic: &sobreatic
|
||||
canonical: sobreàtic
|
||||
aliases:
|
||||
"<-1":
|
||||
default: *soterrani
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative: *sub_soterrani
|
||||
probability: 0.3995
|
||||
- alternative: *pis
|
||||
probability: 0.0005
|
||||
"-1":
|
||||
default: *soterrani
|
||||
probability: 0.9995
|
||||
alternatives:
|
||||
- alternative: *pis
|
||||
probability: 0.0005
|
||||
# Special token for half-floors
|
||||
half_floors:
|
||||
default: *entresol
|
||||
"0":
|
||||
default: *baixos
|
||||
probability: 0.495
|
||||
alternatives:
|
||||
- alternative: *pis_baix
|
||||
probability: 0.395
|
||||
- alternative: *sota
|
||||
probability: 0.1
|
||||
- alternative: *pis
|
||||
# Pis 0 is uncommon
|
||||
probability: 0.01
|
||||
top:
|
||||
default: *pis
|
||||
probability: 0.85
|
||||
alternatives:
|
||||
- alternative: *atic
|
||||
probability: 0.1
|
||||
- alternative: *sobreatic
|
||||
probability: 0.05
|
||||
|
||||
numbering_starts_at: 0
|
||||
|
||||
alphanumeric:
|
||||
default: *pis
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.05
|
||||
numeric_probability: 0.99
|
||||
alpha_probability: 0.01
|
||||
|
||||
blocks:
|
||||
alphanumeric:
|
||||
default:
|
||||
canonical: bloc
|
||||
abbreviated: bl
|
||||
sample: true
|
||||
canonical_probability: 0.6
|
||||
abbreviated_probability: 0.2
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
categories:
|
||||
near:
|
||||
default:
|
||||
canonical: a prop de
|
||||
probability: 0.5
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: prop de
|
||||
probability: 0.2
|
||||
- alternative:
|
||||
canonical: prop
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: a prop
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: proper
|
||||
probability: 0.05
|
||||
- alternative:
|
||||
canonical: proper a
|
||||
probability: 0.05
|
||||
|
||||
nearby:
|
||||
default:
|
||||
canonical: proper
|
||||
probability: 0.5
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: a prop
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: a prop d'aquí
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: a prop d'aqui
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: aquí
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: aqui
|
||||
probability: 0.1
|
||||
near_me:
|
||||
default:
|
||||
canonical: a prop meu
|
||||
in:
|
||||
default:
|
||||
canonical: a
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: dins
|
||||
probability: 0.2
|
||||
- alternative:
|
||||
canonical: en
|
||||
probability: 0.2
|
||||
# Probabilities of each phrase
|
||||
near_probability: 0.35
|
||||
nearby_probability: 0.2
|
||||
near_me_probability: 0.1
|
||||
in_probability: 0.35
|
||||
|
||||
cross_streets:
|
||||
and: *i
|
||||
amb: &amb
|
||||
canonical: amb
|
||||
a: &a
|
||||
canonical: a
|
||||
corner_of: &cantonada_de
|
||||
canonical: cantonada de
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
at_the_corner_of: &a_la_cantonada_de
|
||||
canonical: a la cantonada de
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
corner: &cantonada
|
||||
canonical: cantonada
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
|
||||
intersection:
|
||||
default: *i
|
||||
probability: 0.55
|
||||
alternatives:
|
||||
- alternative: *amb
|
||||
probability: 0.2
|
||||
- alternative: *a
|
||||
probability: 0.1
|
||||
- alternative: *cantonada_de
|
||||
probability: 0.09
|
||||
- alternative: *a_la_cantonada_de
|
||||
probability: 0.05
|
||||
- alternative: *cantonada
|
||||
probability: 0.01
|
||||
|
||||
between:
|
||||
canonical: entre
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
parentheses_probability: 0.5
|
||||
|
||||
|
||||
po_boxes:
|
||||
apartat: &apartat
|
||||
canonical: apartat
|
||||
abbreviated: apt
|
||||
sample: true
|
||||
canonical_probability: 0.5
|
||||
abbreviated_probability: 0.3
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.4 # Apt No 1234
|
||||
numeric_probability: 1.0
|
||||
alphanumeric:
|
||||
sample: false
|
||||
default: *apartat
|
||||
numeric_probability: 0.9 # Apt 123
|
||||
alpha_probability: 0.05 # Apt A
|
||||
numeric_plus_alpha_probability: 0.04 # Apt 123G
|
||||
alpha_plus_numeric_probability: 0.01 # Apt A123
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
digits:
|
||||
- length: 1
|
||||
probability: 0.05
|
||||
- length: 2
|
||||
probability: 0.1
|
||||
- length: 3
|
||||
probability: 0.2
|
||||
- length: 4
|
||||
probability: 0.5
|
||||
- length: 5
|
||||
probability: 0.1
|
||||
- length: 6
|
||||
probability: 0.05
|
||||
|
||||
postcodes:
|
||||
alphanumeric:
|
||||
default:
|
||||
canonical: codi postal
|
||||
abbreviated: cp
|
||||
sample: true
|
||||
canonical_probability: 0.01
|
||||
abbreviated_probability: 0.95
|
||||
sample_probability: 0.04
|
||||
|
||||
numeric:
|
||||
# Postcodes in Spain and Latin America are sometimes prefixed by CP
|
||||
direction: left
|
||||
|
||||
numeric_affix:
|
||||
affix: cp
|
||||
direction: left
|
||||
# null_probability is the chance of adding no phrase at all, i.e. emitting just the postal code
|
||||
null_probability: 0.7
|
||||
numeric_probability: 0.18
|
||||
numeric_affix_probability: 0.12
|
||||
strict_numeric: true
|
||||
|
||||
directions:
|
||||
right: &dreta
|
||||
canonical: dreta
|
||||
abbreviated: dta
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: d
|
||||
direction: right
|
||||
whitespace_probability: 0.1
|
||||
numeric_probability: 0.4
|
||||
numeric_affix_probability: 0.6
|
||||
left: &esquerra
|
||||
canonical: esquerra
|
||||
abbreviated: esq
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: e
|
||||
direction: right
|
||||
whitespace_probability: 0.1
|
||||
numeric_probability: 0.4
|
||||
numeric_affix_probability: 0.6
|
||||
rear: &posterior
|
||||
canonical: posterior
|
||||
abbreviated: pos
|
||||
sample: true
|
||||
canonical_probability: 0.6
|
||||
abbreviated_probability: 0.2
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
front: &front
|
||||
canonical: front
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
alternatives:
|
||||
- alternative: *dreta
|
||||
probability: 0.45
|
||||
- alternative: *esquerra
|
||||
probability: 0.45
|
||||
- alternative: *posterior
|
||||
probability: 0.05
|
||||
- alternative: *front
|
||||
probability: 0.05
|
||||
|
||||
anteroposterior:
|
||||
alternatives:
|
||||
- alternative: *front
|
||||
probability: 0.5
|
||||
- alternative: *posterior
|
||||
probability: 0.5
|
||||
|
||||
lateral:
|
||||
alternatives:
|
||||
- alternative: *dreta
|
||||
probability: 0.5
|
||||
- alternative: *esquerra
|
||||
probability: 0.5
|
||||
|
||||
|
||||
|
||||
|
||||
cardinal_directions:
|
||||
east: &est
|
||||
canonical: est
|
||||
abbreviated: e
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.6
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: e
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
west: &oest
|
||||
canonical: oest
|
||||
abbreviated: w
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.6
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: w
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
north: &nord
|
||||
canonical: nord
|
||||
abbreviated: n
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.6
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: n
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
south: &sud
|
||||
canonical: sud
|
||||
abbreviated: s
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.6
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: s
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
alternatives:
|
||||
- alternative: *nord
|
||||
probability: 0.25
|
||||
- alternative: *est
|
||||
probability: 0.25
|
||||
- alternative: *sud
|
||||
probability: 0.25
|
||||
- alternative: *oest
|
||||
probability: 0.25
|
||||
|
||||
entrances:
|
||||
entrada: &entrada
|
||||
canonical: entrada
|
||||
abbreviated: entr
|
||||
sample: true
|
||||
canonical_probability: 0.5
|
||||
abbreviated_probability: 0.2
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
# Entrance 1, Entrance A, etc.
|
||||
alphanumeric:
|
||||
default: *entrada
|
||||
numeric_probability: 0.1 # e.g. Entrance 1
|
||||
alpha_probability: 0.85 # e.g. Entrance A
|
||||
numeric_plus_alpha_probability: 0.025 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.025 # e.g. A1
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
directional:
|
||||
modifier:
|
||||
alternatives:
|
||||
- alternative: *nord
|
||||
- alternative: *sud
|
||||
- alternative: *est
|
||||
- alternative: *oest
|
||||
- alternative: *dreta
|
||||
- alternative: *esquerra
|
||||
- alternative: *posterior
|
||||
- alternative: *front
|
||||
|
||||
staircases:
|
||||
escala: &escala
|
||||
canonical: escala
|
||||
abbreviated: esc
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
alphanumeric:
|
||||
# For alphanumerics, Stair A, Stair 1, etc.
|
||||
default: *escala
|
||||
numeric_probability: 0.6 # e.g. Escala 1
|
||||
alpha_probability: 0.35 # e.g. Escala A
|
||||
numeric_plus_alpha_probability: 0.025 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.025 # e.g. A1
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
directional:
|
||||
direction: right # e.g. Escala Esq
|
||||
direction_probability: 0.8
|
||||
modifier:
|
||||
alternatives:
|
||||
- alternative: *nord
|
||||
- alternative: *sud
|
||||
- alternative: *est
|
||||
- alternative: *oest
|
||||
- alternative: *dreta
|
||||
- alternative: *esquerra
|
||||
- alternative: *posterior
|
||||
- alternative: *front
|
||||
|
||||
units:
|
||||
flat: &apartament
|
||||
canonical: apartament
|
||||
abbreviated: apmt
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
door: &porta
|
||||
canonical: porta
|
||||
abbreviated: pta
|
||||
sample: true
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
# If it's just porta B, many times it's written as e.g. 3r B for "tercer pis porta B"
|
||||
null_phrase_probability: 0.15
|
||||
ordinal:
|
||||
direction: right
|
||||
gender: f
|
||||
direction_probability: 0.95 # Let it vary occasionally e.g. Porta 2a
|
||||
null_phrase_probability: 0.8 # Let e.g. 5a be the entire unit string
|
||||
numeric_probability: 0.25
|
||||
ordinal_probability: 0.75
|
||||
lletra: &lletra
|
||||
canonical: lletra
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
office: &oficina
|
||||
canonical: oficina
|
||||
abbreviated: of
|
||||
sample: true
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.3
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
# Another word for unit
|
||||
unitat: &unitat
|
||||
canonical: unitat
|
||||
abbreviated: un
|
||||
sample: true
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.2
|
||||
lot: &lot
|
||||
canonical: lot
|
||||
abbreviated: lt
|
||||
sample: true
|
||||
canonical_probability: 0.6
|
||||
abbreviated_probability: 0.2
|
||||
sample_probability: 0.2
|
||||
parcel: &parcella
|
||||
canonical: parcel·la
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
habitacio: &habitacio
|
||||
canonical: habitació
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
casa: &casa
|
||||
canonical: casa
|
||||
numeric:
|
||||
direction: left
|
||||
room: &sala
|
||||
canonical: sala
|
||||
numeric:
|
||||
direction: left
|
||||
alphanumeric: &unit_alphanumeric
|
||||
default: *porta
|
||||
probability: 0.8
|
||||
sample: true
|
||||
alternatives:
|
||||
- alternative: *apartament
|
||||
probability: 0.1
|
||||
- alternative: *casa
|
||||
probability: 0.1
|
||||
|
||||
# Separate random probability for adding directions like 2n Esq, 2 Dta, etc.
|
||||
add_direction: true
|
||||
add_direction_probability: 0.1
|
||||
add_direction_numeric: true # Only for numbers
|
||||
add_direction_standalone: true # A unit can be as simple as "D"
|
||||
|
||||
numeric_probability: 0.7 # e.g. Porta 1a
|
||||
numeric_plus_alpha_probability: 0.01 # e.g. Porta 1A
|
||||
alpha_plus_numeric_probability: 0.01 # e.g. Porta A1
|
||||
alpha_probability: 0.28 # e.g. Porta A
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
alpha:
|
||||
default: *porta
|
||||
probability: 0.8
|
||||
alternatives:
|
||||
- alternative: *lletra
|
||||
probability: 0.12
|
||||
- alternative: *apartament
|
||||
probability: 0.05
|
||||
- alternative: *casa
|
||||
probability: 0.01
|
||||
- alternative: *unitat
|
||||
probability: 0.01
|
||||
- alternative: *habitacio
|
||||
probability: 0.01
|
||||
|
||||
zones:
|
||||
residential: *unit_alphanumeric
|
||||
commercial:
|
||||
default: *oficina
|
||||
probability: 0.8
|
||||
alternatives:
|
||||
- alternative: *sala
|
||||
probability: 0.2
|
||||
|
||||
numeric_probability: 0.9 # e.g. Oficina 1
|
||||
numeric_plus_alpha_probability: 0.01 # e.g. Oficina 1A
|
||||
alpha_plus_numeric_probability: 0.01 # e.g. Oficina A1
|
||||
alpha_probability: 0.08 # e.g. Oficina A
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
alpha:
|
||||
default: *oficina
|
||||
probability: 0.8
|
||||
alternatives:
|
||||
- alternative: *sala
|
||||
probability: 0.15
|
||||
- alternative: *lletra
|
||||
probability: 0.05
|
||||
|
||||
industrial:
|
||||
default: *lot
|
||||
probability: 0.5
|
||||
alternatives:
|
||||
- alternative: *oficina
|
||||
probability: 0.3
|
||||
- alternative: *unitat
|
||||
probability: 0.19
|
||||
- alternative: *parcella
|
||||
probability: 0.01
|
||||
|
||||
numeric_probability: 0.9 # e.g. Lot 1
|
||||
numeric_plus_alpha_probability: 0.01 # e.g. Lot 1A
|
||||
alpha_plus_numeric_probability: 0.01 # e.g. Lot A1
|
||||
alpha_probability: 0.08 # e.g. Lot A
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
university:
|
||||
default: *sala
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *porta
|
||||
probability: 0.1
|
||||
|
||||
numeric_probability: 0.9 # e.g. Sala 1
|
||||
numeric_plus_alpha_probability: 0.01 # e.g. Sala 1A
|
||||
alpha_plus_numeric_probability: 0.01 # e.g. Sala A1
|
||||
alpha_probability: 0.08 # e.g. Sala A
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
alpha:
|
||||
default: *sala
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *porta
|
||||
probability: 0.08
|
||||
- alternative: *lletra
|
||||
probability: 0.02
|
||||
|
||||
allotments:
|
||||
lot:
|
||||
default: *lot
|
||||
numeric_probability: 0.8
|
||||
alphanumeric_probability: 0.1
|
||||
alpha_probability: 0.1
|
||||
parcel:
|
||||
default: *parcella
|
||||
numeric_probability: 0.3
|
||||
alphanumeric_probability: 0.3
|
||||
alpha_probability: 0.4
|
||||
lot_probability: 0.9
|
||||
parcel_probability: 0.06
|
||||
lot_plus_parcel_probability: 0.02
|
||||
parcel_plus_lot_probability: 0.02
|
||||
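The `default` / `alternatives` blocks in ca.yaml pair each phrase with a probability. As a rough illustration of how such a block could be consumed, the Python sketch below draws one phrase from the `in` category by weighted random choice. The dict layout and the function names `weighted_choices` and `sample_phrase` are assumptions made for this example; they are not taken from libpostal's own code.

```python
import random

# Hypothetical in-memory form of the `categories.in` block from ca.yaml.
# The phrases and probabilities are copied from the file; the dict layout
# is an assumption about how a YAML loader would present them.
IN_PHRASE = {
    "default": {"canonical": "a"},
    "probability": 0.6,
    "alternatives": [
        {"alternative": {"canonical": "dins"}, "probability": 0.2},
        {"alternative": {"canonical": "en"}, "probability": 0.2},
    ],
}


def weighted_choices(spec):
    """Return parallel lists of phrase dicts and their weights."""
    phrases = [spec["default"]]
    # A block with no alternatives (e.g. near_me) may omit `probability`;
    # this sketch treats that case as probability 1.0.
    weights = [spec.get("probability", 1.0)]
    for alt in spec.get("alternatives", []):
        phrases.append(alt["alternative"])
        weights.append(alt["probability"])
    return phrases, weights


def sample_phrase(spec, rng):
    """Draw one canonical phrase according to the configured weights."""
    phrases, weights = weighted_choices(spec)
    chosen = rng.choices(phrases, weights=weights, k=1)[0]
    return chosen["canonical"]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only so repeated runs match
    print([sample_phrase(IN_PHRASE, rng) for _ in range(5)])
```

Passing the random generator in explicitly keeps the sampling reproducible in tests, which can be handy when checking that rare alternatives still show up.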
570
resources/addresses/cs.yaml
Normal file
@@ -0,0 +1,570 @@
|
||||
# cs.yaml
|
||||
# -------
|
||||
# Czech language specification
|
||||
|
||||
|
||||
components:
|
||||
level:
|
||||
null_probability: 0.95
|
||||
alphanumeric_probability: 0.04
|
||||
standalone_probability: 0.01
|
||||
|
||||
staircase:
|
||||
null_probability: 0.99
|
||||
alphanumeric_probability: 0.01
|
||||
|
||||
entrance:
|
||||
null_probability: 0.999
|
||||
alphanumeric_probability: 0.001
|
||||
|
||||
unit:
|
||||
null_probability: 0.9
|
||||
alphanumeric_probability: 0.1
|
||||
|
||||
# Note: no combinations because of the house numbering scheme
|
||||
|
||||
numbers:
|
||||
default: &cislo
|
||||
canonical: číslo
|
||||
abbreviated: č
|
||||
sample: true
|
||||
# Probabilities
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.6
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
numeric_affix:
|
||||
affix: "č."
|
||||
direction: left
|
||||
numeric_probability: 0.4
|
||||
numeric_affix_probability: 0.6
|
||||
|
||||
and:
|
||||
default: &a
|
||||
canonical: a
|
||||
abbreviated: "&"
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.75
|
||||
sample: true
|
||||
sample_probability: 0.05
|
||||
|
||||
conscription_numbers:
|
||||
alphanumeric:
|
||||
default:
|
||||
canonical: číslo popisné
|
||||
abbreviated: "č.p."
|
||||
canonical_probability: 0.05
|
||||
abbreviated_probability: 0.85
|
||||
sample: true
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
cross_streets:
|
||||
and: *a
|
||||
at: &na
|
||||
canonical: na
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
corner_of: &rohu
|
||||
canonical: rohu
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
corner: &roh
|
||||
canonical: roh
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
at_the_corner_of: &na_rohu
|
||||
canonical: na rohu
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
intersection:
|
||||
default: *a
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative: *na
|
||||
probability: 0.1
|
||||
- alternative: *rohu
|
||||
probability: 0.1
|
||||
- alternative: *roh
|
||||
probability: 0.1
|
||||
- alternative: *na_rohu
|
||||
probability: 0.1
|
||||
|
||||
between:
|
||||
canonical: mezi
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
parentheses_probability: 0.5
|
||||
|
||||
levels:
|
||||
floor: &patro
|
||||
canonical: patro
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
direction_probability: 0.9
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.3
|
||||
roman_numeral_probability: 0.7
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
numeric_probability: 0.4
|
||||
ordinal_probability: 0.6
|
||||
nadzemni_podlazi: &nadzemni_podlazi
|
||||
canonical: nadzemní podlaží
|
||||
abbreviated: np
|
||||
sample: true
|
||||
canonical_probability: 0.1
|
||||
abbreviated_probability: 0.8
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
direction_probability: 0.9
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
numeric_probability: 0.4
|
||||
ordinal_probability: 0.6
|
||||
etaz: &etaz
|
||||
canonical: etáž
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
direction_probability: 0.9
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
numeric_probability: 0.4
|
||||
ordinal_probability: 0.6
|
||||
prizemi: &prizemi
|
||||
canonical: přízemí
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
podzemni_podlazi: &podzemni_podlazi
|
||||
canonical: podzemní podlaží
|
||||
abbreviated: pp
|
||||
sample: true
|
||||
canonical_probability: 0.5
|
||||
abbreviated_probability: 0.2
|
||||
sample_probability: 0.3
|
||||
# e.g. podzemní podlaží 1
|
||||
numeric:
|
||||
direction: left
|
||||
direction_probability: 0.8
|
||||
# e.g. pp1
|
||||
numeric_affix:
|
||||
affix: pp
|
||||
direction: left
|
||||
# e.g. 1. podzemní podlaží
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
standalone_probability: 0.985
|
||||
number_abs_value: true
|
||||
number_min_abs_value: 1
|
||||
numeric_probability: 0.005
|
||||
numeric_affix_probability: 0.005
|
||||
ordinal_probability: 0.005
|
||||
aliases:
|
||||
"<-1":
|
||||
default: *podzemni_podlazi
|
||||
"-1":
|
||||
default: *podzemni_podlazi
|
||||
"0":
|
||||
default: *prizemi
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *patro
|
||||
probability: 0.1
|
||||
|
||||
numbering_starts_at: 0
|
||||
|
||||
alphanumeric:
|
||||
default: *patro
|
||||
probability: 0.8
|
||||
alternatives:
|
||||
- alternative: *nadzemni_podlazi
|
||||
probability: 0.19
|
||||
- alternative: *etaz
|
||||
probability: 0.01
|
||||
numeric_probability: 0.99 # With this probability, pick an integer
|
||||
alpha_probability: 0.0098 # With this probability, pick a letter e.g. A
|
||||
numeric_plus_alpha_probability: 0.0001 # e.g. 2A
|
||||
alpha_plus_numeric_probability: 0.0001 # e.g. A2
|
||||
|
||||
categories:
|
||||
near:
|
||||
default:
|
||||
canonical: poblíž
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.75
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: v blízkém okolí
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: u
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: kolem
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
probability: 0.05
|
||||
nearby:
|
||||
default:
|
||||
canonical: poblíž
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.45
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: blízko
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.2
|
||||
- alternative:
|
||||
canonical: v blízkosti
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: tady poblíž
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: tady
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.05
|
||||
- alternative:
|
||||
canonical: okolo
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.05
|
||||
- alternative:
|
||||
canonical: v okolí
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.05
|
||||
near_me:
|
||||
default:
|
||||
canonical: v blízkosti mně
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
|
||||
# Don't worry about agreement
|
||||
in:
|
||||
default:
|
||||
canonical: v
|
||||
probability: 0.7
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: ve
|
||||
probability: 0.3
|
||||
|
||||
# Probabilities of each phrase
|
||||
near_probability: 0.35
|
||||
nearby_probability: 0.2
|
||||
near_me_probability: 0.1
|
||||
in_probability: 0.35
|
||||
|
||||
directions:
|
||||
right: &prava
|
||||
canonical: pravá
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
left: &leva
|
||||
canonical: levá
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
alternatives:
|
||||
- alternative: *prava
|
||||
probability: 0.5
|
||||
- alternative: *leva
|
||||
probability: 0.5
|
||||
|
||||
cardinal_directions:
|
||||
east: &vychod
|
||||
canonical: východ
|
||||
abbreviated: v
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: v
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
west: &zapad
|
||||
canonical: západ
|
||||
abbreviated: z
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: z
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
north: &sever
|
||||
canonical: sever
|
||||
abbreviated: s
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: s
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
south: &jih
|
||||
canonical: jih
|
||||
abbreviated: j
|
||||
sample: true
|
||||
canonical_probability: 0.75
|
||||
abbreviated_probability: 0.1
|
||||
sample_probability: 0.15
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: j
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
alternatives:
|
||||
- alternative: *sever
|
||||
probability: 0.25
|
||||
- alternative: *vychod
|
||||
probability: 0.25
|
||||
- alternative: *jih
|
||||
probability: 0.25
|
||||
- alternative: *zapad
|
||||
probability: 0.25
|
||||
entrances:
|
||||
vchod: &vchod
|
||||
canonical: vchod
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
# Vchod 1, Vchod A, etc.
|
||||
alphanumeric: &entrance_alphanumeric
|
||||
default: *vchod
|
||||
numeric_probability: 0.1 # e.g. Vchod 1
|
||||
alpha_probability: 0.85 # e.g. Vchod A
|
||||
numeric_plus_alpha_probability: 0.025 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.025 # e.g. A1
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
staircases:
|
||||
schodiste: &schodiste
|
||||
canonical: schodiště
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
alphanumeric: &staircase_alphanumeric
|
||||
default: *schodiste
|
||||
numeric_probability: 0.75
|
||||
alpha_probability: 0.2
|
||||
numeric_plus_alpha_probability: 0.025
|
||||
alpha_plus_numeric_probability: 0.025
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
directional:
|
||||
direction: left
|
||||
direction_probability: 0.85
|
||||
modifier:
|
||||
alternatives:
|
||||
- alternative: *sever
|
||||
- alternative: *jih
|
||||
- alternative: *vychod
|
||||
- alternative: *zapad
|
||||
|
||||
po_boxes:
|
||||
postovni_prihradka: &postovni_prihradka
|
||||
canonical: poštovní přihrádka
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2 # poštovní přihrádka 1234
|
||||
alphanumeric:
|
||||
default: *postovni_prihradka
|
||||
numeric_probability: 0.9 # poštovní přihrádka 123
|
||||
alpha_probability: 0.05 # poštovní přihrádka A
|
||||
numeric_plus_alpha_probability: 0.04 # poštovní přihrádka 123G
|
||||
alpha_plus_numeric_probability: 0.01 # poštovní přihrádka A123
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
digits:
|
||||
- length: 1
|
||||
probability: 0.05
|
||||
- length: 2
|
||||
probability: 0.1
|
||||
- length: 3
|
||||
probability: 0.2
|
||||
- length: 4
|
||||
probability: 0.5
|
||||
- length: 5
|
||||
probability: 0.1
|
||||
- length: 6
|
||||
probability: 0.05
|
||||
|
||||
units:
|
||||
apartaman: &apartaman
|
||||
canonical: apartmán
|
||||
abbreviated: apt
|
||||
sample: true
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.5
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
pokoj: &pokoj
|
||||
canonical: pokoj
|
||||
abbreviated: pok
|
||||
sample: true
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.5
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
kancelar: &kancelar
|
||||
canonical: kancelář
|
||||
sample: true
|
||||
canonical_probability: 0.6
|
||||
sample_probability: 0.4
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
alphanumeric: &unit_alphanumeric
|
||||
default: *apartaman
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *pokoj
|
||||
probability: 0.1
|
||||
numeric_probability: 0.9 # e.g. apt. 1
|
||||
numeric_plus_alpha_probability: 0.03 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.03 # e.g. A1
|
||||
alpha_probability: 0.04 # e.g. apt. A
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
# If there are 10 floors, create unit numbers like #301 or #1032
|
||||
use_floor_probability: 0.01
|
||||
|
||||
zones:
|
||||
commercial: &commercial_unit_types
|
||||
default: *pokoj
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative: *kancelar
|
||||
probability: 0.4
|
||||
numeric_probability: 0.95 # e.g. pokoj 1
|
||||
numeric_plus_alpha_probability: 0.01 # e.g. pokoj 1A
|
||||
alpha_plus_numeric_probability: 0.01 # e.g. pokoj A1
|
||||
alpha_probability: 0.03 # e.g. pokoj A
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
university:
|
||||
default: *pokoj
|
||||
numeric_probability: 0.95 # e.g. pokoj 1
|
||||
numeric_plus_alpha_probability: 0.01 # e.g. pok 1A
|
||||
alpha_plus_numeric_probability: 0.01 # e.g. pokoj A1
|
||||
alpha_probability: 0.03 # e.g. pokoj A
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
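The probabilities that choose between number types in cs.yaml (for example `numeric_probability`, `alpha_probability` and the two mixed forms under `levels.alphanumeric`) appear to be chosen so that each group sums to 1. Assuming that is the intent, a small check like the Python sketch below could catch a mistyped value. The helper name `sums_to_one` and the explicit key list are assumptions for illustration, not part of libpostal.

```python
import math

# Values copied from the `levels.alphanumeric` block of cs.yaml.
FLOOR_NUMBER_TYPES = {
    "numeric_probability": 0.99,
    "alpha_probability": 0.0098,
    "numeric_plus_alpha_probability": 0.0001,
    "alpha_plus_numeric_probability": 0.0001,
}


def sums_to_one(block, keys, tol=1e-9):
    """Check that the listed probability keys add up to 1 within `tol`.

    The keys are passed explicitly because a block can also hold
    unrelated probabilities (e.g. whitespace_probability) that are not
    part of the same distribution.
    """
    total = math.fsum(block[key] for key in keys)
    return math.isclose(total, 1.0, abs_tol=tol)


if __name__ == "__main__":
    print(sums_to_one(FLOOR_NUMBER_TYPES, list(FLOOR_NUMBER_TYPES)))
```

`math.fsum` and a tolerance are used because values such as 0.0098 are not exactly representable in binary floating point, so a plain equality test against 1 could fail for a correct file.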
599
resources/addresses/da.yaml
Normal file
@@ -0,0 +1,599 @@
|
||||
# da.yaml
|
||||
# -------
|
||||
# Danish language specification.
|
||||
|
||||
components:
|
||||
level:
|
||||
null_probability: 0.85
|
||||
alphanumeric_probability: 0.1
|
||||
standalone_probability: 0.05
|
||||
|
||||
staircase:
|
||||
null_probability: 0.99
|
||||
alphanumeric_probability: 0.01
|
||||
|
||||
entrance:
|
||||
null_probability: 0.999
|
||||
alphanumeric_probability: 0.001
|
||||
|
||||
unit:
|
||||
null_probability: 0.75
|
||||
alphanumeric_probability: 0.25
|
||||
|
||||
combinations:
|
||||
-
|
||||
components:
|
||||
- level
|
||||
- unit
|
||||
label: unit
|
||||
separators:
|
||||
- separator: "-"
|
||||
probability: 0.9
|
||||
- separator: " - "
|
||||
probability: 0.1
|
||||
probability: 0.005
|
||||
-
|
||||
components:
|
||||
- entrance
|
||||
- unit
|
||||
label: unit
|
||||
separators:
|
||||
- separator: "-"
|
||||
probability: 0.9
|
||||
- separator: " - "
|
||||
probability: 0.1
|
||||
probability: 0.001
|
||||
|
||||
|
||||
numbers:
|
||||
default: &nummer
|
||||
canonical: nummer
|
||||
abbreviated: nr
|
||||
sample: true
|
||||
# Probabilities
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.5
|
||||
sample_probability: 0.2
|
||||
sample_exclude:
|
||||
- "#"
|
||||
numeric:
|
||||
direction: left
|
||||
numeric_affix:
|
||||
affix: "#"
|
||||
direction: left
|
||||
|
||||
numeric_probability: 0.4
|
||||
numeric_affix_probability: 0.6
|
||||
|
||||
|
||||
house_numbers:
|
||||
alphanumeric:
|
||||
default: *nummer
|
||||
|
||||
alphanumeric_phrase_probability: 0.0001
|
||||
|
||||
|
||||
and:
|
||||
default: &og
|
||||
canonical: og
|
||||
abbreviated: "&"
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.75
|
||||
sample: true
|
||||
sample_probability: 0.05
|
||||
|
||||
cross_streets:
|
||||
and: *og
|
||||
corner_of: &hjorne_af
|
||||
canonical: hjørne af
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
at_the_corner_of: &pa_hjornet_af
|
||||
canonical: på hjørnet af
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
intersection:
|
||||
default: *og
|
||||
probability: 0.7
|
||||
alternatives:
|
||||
- alternative: *hjorne_af
|
||||
probability: 0.15
|
||||
- alternative: *pa_hjornet_af
|
||||
probability: 0.15
|
||||
|
||||
between:
|
||||
canonical: mellem
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
parentheses_probability: 0.5
|
||||
|
||||
levels:
|
||||
floor: &sal
|
||||
canonical: sal
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: right
|
||||
direction_probability: 0.9
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.8
|
||||
spellout_probability: 0.2
|
||||
numeric_probability: 0.4
|
||||
ordinal_probability: 0.6
|
||||
etage: &etage
|
||||
canonical: etage
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: right
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.8
|
||||
spellout_probability: 0.2
|
||||
numeric_probability: 0.4
|
||||
ordinal_probability: 0.6
|
||||
stuen: &stuen
|
||||
canonical: stuen
|
||||
abbreviated: st
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.6
|
||||
sample_probability: 0.1
|
||||
stueetage: &stueetage
|
||||
canonical: stueetage
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
sample_probability: 0.7
|
||||
kaelderen: &kaelderen
|
||||
canonical: kælderen
|
||||
abbreviated: kl
|
||||
sample: true
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.6
|
||||
sample_probability: 0.2
|
||||
# e.g. 1 kælderen
|
||||
numeric:
|
||||
direction: right
|
||||
direction_probability: 0.8
|
||||
# e.g. k1
|
||||
numeric_affix:
|
||||
affix: k
|
||||
direction: left
|
||||
# e.g. 1. kl
|
||||
ordinal:
|
||||
direction: right
|
||||
standalone_probability: 0.985
|
||||
number_abs_value: true
|
||||
number_min_abs_value: 1
|
||||
numeric_probability: 0.005
|
||||
numeric_affix_probability: 0.005
|
||||
ordinal_probability: 0.005
|
||||
aliases:
|
||||
"<-1":
|
||||
default: *kaelderen
|
||||
"-1":
|
||||
default: *kaelderen
|
||||
"0":
|
||||
default: *stuen
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *stueetage
|
||||
probability: 0.1
|
||||
|
||||
numbering_starts_at: 0
|
||||
|
||||
alphanumeric:
|
||||
default: *sal
|
||||
probability: 0.7
|
||||
alternatives:
|
||||
- alternative: *etage
|
||||
probability: 0.3
|
||||
numeric_probability: 0.99 # With this probability, pick an integer
|
||||
alpha_probability: 0.0098 # With this probability, pick a letter e.g. A
|
||||
numeric_plus_alpha_probability: 0.0001 # e.g. 2A
|
||||
alpha_plus_numeric_probability: 0.0001 # e.g. A2
|
||||
|
||||
categories:
|
||||
near:
|
||||
default:
|
||||
canonical: i nærheden af
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.7
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: tæt på
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.2
|
||||
- alternative:
|
||||
canonical: tæt ved
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.1
|
||||
nearby:
|
||||
default:
|
||||
canonical: i nærheden
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.4
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: rundt her
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.2
|
||||
- alternative:
|
||||
canonical: nær her
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: nær
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: omkring her
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: tæt på her
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.1
|
||||
near_me:
|
||||
default:
|
||||
canonical: nær mig
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.8
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: i nærheden af mig
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: tæt på mig
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.1
|
||||
|
||||
# Don't worry about agreement
|
||||
in:
|
||||
default:
|
||||
canonical: i
|
||||
probability: 0.8
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: om
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: på
|
||||
probability: 0.1
|
||||
|
||||
|
||||
# Probabilities of each phrase
|
||||
near_probability: 0.35
|
||||
nearby_probability: 0.2
|
||||
near_me_probability: 0.1
|
||||
in_probability: 0.35
|
||||
|
||||
|
||||
directions:
|
||||
right: &til_hojre
|
||||
canonical: til højre
|
||||
abbreviated: t.h
|
||||
sample: true
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.5
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: t.h
|
||||
direction: right
|
||||
whitespace_probability: 0.1
|
||||
numeric_probability: 0.8
|
||||
numeric_affix_probability: 0.2
|
||||
left: &til_venstre
|
||||
canonical: til venstre
|
||||
abbreviated: t.v
|
||||
sample: true
|
||||
canonical_probability: 0.1
|
||||
abbreviated_probability: 0.6
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: t.v
|
||||
direction: right
|
||||
whitespace_probability: 0.1
|
||||
numeric_probability: 0.8
|
||||
numeric_affix_probability: 0.2
|
||||
middle: &midt_for
|
||||
canonical: midt for
|
||||
abbreviated: m.f
|
||||
sample: true
|
||||
canonical_probability: 0.1
|
||||
abbreviated_probability: 0.6
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: m.f
|
||||
direction: right
|
||||
whitespace_probability: 0.1
|
||||
alternatives:
|
||||
- alternative: *til_hojre
|
||||
probability: 0.45
|
||||
- alternative: *til_venstre
|
||||
probability: 0.45
|
||||
- alternative: *midt_for
|
||||
probability: 0.1
|
||||
|
||||
|
||||
cardinal_directions:
|
||||
east: &ost
|
||||
canonical: øst
|
||||
abbreviated: ø
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: ø
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
west: &vest
|
||||
canonical: vest
|
||||
abbreviated: v
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: v
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
north: &nord
|
||||
canonical: nord
|
||||
abbreviated: n
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: n
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
south: &syd
|
||||
canonical: syd
|
||||
abbreviated: s
|
||||
sample: true
|
||||
canonical_probability: 0.75
|
||||
abbreviated_probability: 0.1
|
||||
sample_probability: 0.15
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: s
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
alternatives:
|
||||
- alternative: *nord
|
||||
probability: 0.25
|
||||
- alternative: *ost
|
||||
probability: 0.25
|
||||
- alternative: *syd
|
||||
probability: 0.25
|
||||
- alternative: *vest
|
||||
probability: 0.25
|
||||
|
||||
|
||||
entrances:
|
||||
indgang: &indgang
|
||||
canonical: indgang
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
# Indgang 1, Indgang A, etc.
|
||||
alphanumeric: &entrance_alphanumeric
|
||||
default: *indgang
|
||||
numeric_probability: 0.1 # e.g. Indgang 1
|
||||
alpha_probability: 0.85 # e.g. Indgang A
|
||||
numeric_plus_alpha_probability: 0.025 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.025 # e.g. A1
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
staircases:
|
||||
stiege: &stiege
|
||||
canonical: stiege
|
||||
abbreviated: stg
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
abbreviated_probability: 0.2
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
trappe: &trappe
|
||||
canonical: trappe
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
alphanumeric: &staircase_alphanumeric
|
||||
default: *trappe
|
||||
probability: 0.8
|
||||
alternatives:
|
||||
- alternative: *stiege
|
||||
probability: 0.2
|
||||
numeric_probability: 0.75
|
||||
alpha_probability: 0.2
|
||||
numeric_plus_alpha_probability: 0.025
|
||||
alpha_plus_numeric_probability: 0.025
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
directional:
|
||||
direction: left
|
||||
direction_probability: 0.85
|
||||
modifier:
|
||||
alternatives:
|
||||
- alternative: *nord
|
||||
- alternative: *syd
|
||||
- alternative: *ost
|
||||
- alternative: *vest
|
||||
|
||||
po_boxes:
|
||||
postboks: &postboks
|
||||
canonical: postboks
|
||||
abbreviated: pb
|
||||
sample: true
|
||||
canonical_probability: 0.6
|
||||
abbreviated_probability: 0.2
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2 # Pb No 1234
|
||||
boks: &boks
|
||||
canonical: boks
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2 # Boks No 1234
|
||||
alphanumeric:
|
||||
sample: false
|
||||
default: *postboks
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *boks
|
||||
probability: 0.1
|
||||
numeric_probability: 0.9 # Pb 123
|
||||
alpha_probability: 0.05 # Pb A
|
||||
numeric_plus_alpha_probability: 0.04 # Pb 123G
|
||||
alpha_plus_numeric_probability: 0.01 # Pb A123
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
digits:
|
||||
- length: 1
|
||||
probability: 0.05
|
||||
- length: 2
|
||||
probability: 0.1
|
||||
- length: 3
|
||||
probability: 0.2
|
||||
- length: 4
|
||||
probability: 0.5
|
||||
- length: 5
|
||||
probability: 0.1
|
||||
- length: 6
|
||||
probability: 0.05
|
||||
|
||||
units:
|
||||
lejlighed: &lejlighed
|
||||
canonical: lejlighed
|
||||
abbreviated: ljd
|
||||
sample: true
|
||||
canonical_probability: 0.6
|
||||
abbreviated_probability: 0.1
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
null_phrase_probability: 0.5
|
||||
# Lejlighed nummer 4
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.05
|
||||
hus: &hus
|
||||
canonical: hus
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
vaerelse: &vaerelse
|
||||
canonical: værelse
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
alphanumeric: &unit_alphanumeric
|
||||
default: *lejlighed
|
||||
probability: 0.8
|
||||
alternatives:
|
||||
- alternative: *hus
|
||||
probability: 0.1
|
||||
- alternative: *vaerelse
|
||||
probability: 0.1
|
||||
numeric_probability: 0.9 # e.g. Lejlighed 1
|
||||
numeric_plus_alpha_probability: 0.03 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.03 # e.g. A1
|
||||
alpha_probability: 0.04 # e.g. Lejl A
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
# Separate random probability for adding directions like 2R, 2L, etc.
|
||||
add_direction: true
|
||||
add_direction_probability: 0.5
|
||||
|
||||
# Add directions for plain numbers
|
||||
add_direction_numeric: true
|
||||
# Add direction only e.g. Lejlighed til højre
|
||||
add_direction_standalone: true
|
||||
|
||||
# If there are 10 floors, create unit numbers like #301 or #1032
|
||||
use_floor_probability: 0.1
|
||||
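Wherever a block pairs a `default` with a list of `alternatives`, the default's `probability` and the alternatives' probabilities are expected to sum to 1. A minimal sketch of checking that, using two entries transcribed by hand from da.yaml above; `choice_probabilities` and `is_normalized` are hypothetical helper names, and treating an entry with no alternatives as a certain default is an assumption.

```python
import math


def choice_probabilities(entry):
    """Return [default probability, *alternative probabilities] for one entry."""
    alternatives = entry.get("alternatives", [])
    if not alternatives:
        # Assumption: a default with no alternatives is always chosen.
        return [entry.get("probability", 1.0)]
    if "probability" not in entry:
        raise ValueError("default needs a probability when alternatives exist")
    return [entry["probability"]] + [alt["probability"] for alt in alternatives]


def is_normalized(entry, tolerance=1e-9):
    """True if the default and its alternatives sum to 1 within the tolerance."""
    return math.isclose(sum(choice_probabilities(entry)), 1.0, abs_tol=tolerance)


# levels -> alphanumeric in da.yaml: sal 0.7, etage 0.3
levels_alphanumeric = {
    "default": "sal",
    "probability": 0.7,
    "alternatives": [{"alternative": "etage", "probability": 0.3}],
}

# cross_streets -> intersection in da.yaml: og 0.7, two corner phrases at 0.15
intersection = {
    "default": "og",
    "probability": 0.7,
    "alternatives": [
        {"alternative": "hjørne af", "probability": 0.15},
        {"alternative": "på hjørnet af", "probability": 0.15},
    ],
}

levels_ok = is_normalized(levels_alphanumeric)
intersection_ok = is_normalized(intersection)
```

The check only applies to entries that carry explicit probabilities; lists such as the staircase `modifier` alternatives, which list phrases without weights, would need separate handling.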
715
resources/addresses/de.yaml
Normal file
@@ -0,0 +1,715 @@
|
||||
# de.yaml
|
||||
# -------
|
||||
# Note: this will only apply to the German language code, which encompasses Germany,
|
||||
# Austria, Switzerland (but not Swiss-German, which has its own language code),
|
||||
# Liechtenstein, Luxembourg (Luxembourgish has its own language code), and part of Belgium.
|
||||
|
||||
|
||||
components:
|
||||
level:
|
||||
null_probability: 0.85
|
||||
alphanumeric_probability: 0.1
|
||||
standalone_probability: 0.05
|
||||
|
||||
staircase:
|
||||
null_probability: 0.99
|
||||
alphanumeric_probability: 0.01
|
||||
|
||||
entrance:
|
||||
null_probability: 0.999
|
||||
alphanumeric_probability: 0.001
|
||||
|
||||
unit:
|
||||
null_probability: 0.9
|
||||
alphanumeric_probability: 0.1
|
||||
|
||||
combinations:
|
||||
# e.g. 2/34, more common way to specify a unit number in German
|
||||
# if unit exists in the first place
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- unit
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: /
|
||||
probability: 0.8
|
||||
- separator: "-"
|
||||
probability: 0.1
|
||||
- separator: " - "
|
||||
probability: 0.1
|
||||
probability: 0.05
|
||||
|
||||
|
||||
numbers:
|
||||
default: &nummer
|
||||
canonical: nummer
|
||||
abbreviated: nr
|
||||
sample: true
|
||||
# Probabilities
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.5
|
||||
sample_probability: 0.2
|
||||
sample_exclude:
|
||||
- "#"
|
||||
numeric:
|
||||
direction: left
|
||||
numeric_affix:
|
||||
affix: "#"
|
||||
direction: left
|
||||
|
||||
numeric_probability: 0.4
|
||||
numeric_affix_probability: 0.6
|
||||
|
||||
|
||||
house_numbers:
|
||||
alphanumeric:
|
||||
default: *nummer
|
||||
|
||||
alphanumeric_phrase_probability: 0.0001
|
||||
|
||||
conscription_numbers:
|
||||
alphanumeric:
|
||||
default:
|
||||
canonical: konskriptionsnummer
|
||||
abbreviated: konskr. nr
|
||||
canonical_probability: 0.15
|
||||
abbreviated_probability: 0.65
|
||||
sample: true
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
and:
|
||||
default: &und
|
||||
canonical: und
|
||||
abbreviated: "&"
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.75
|
||||
sample: true
|
||||
sample_probability: 0.05
|
||||
|
||||
cross_streets:
|
||||
and: *und
|
||||
corner_of: &ecke_von
|
||||
canonical: ecke von
|
||||
at_the_corner_of: &an_der_ecke_von
|
||||
canonical: an der ecke von
|
||||
intersection:
|
||||
default: *und
|
||||
probability: 0.7
|
||||
alternatives:
|
||||
- alternative: *ecke_von
|
||||
probability: 0.15
|
||||
- alternative: *an_der_ecke_von
|
||||
probability: 0.15
|
||||
|
||||
between:
|
||||
canonical: zwischen
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
parentheses_probability: 0.5
|
||||
|
||||
levels:
|
||||
floor: &obergeschoss
|
||||
canonical: obergeschoss
|
||||
abbreviated: og
|
||||
sample: true
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
canonical_probability: 0.5
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: og
|
||||
direction: right
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.8
|
||||
spellout_probability: 0.2
|
||||
numeric_probability: 0.3
|
||||
numeric_affix_probability: 0.5
|
||||
ordinal_probability: 0.2
|
||||
etage: &etage
|
||||
canonical: etage
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: right
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.8
|
||||
spellout_probability: 0.2
|
||||
numeric_probability: 0.4
|
||||
ordinal_probability: 0.6
|
||||
stock: &stock
|
||||
canonical: stock
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: right
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.8
|
||||
spellout_probability: 0.2
|
||||
numeric_probability: 0.1
|
||||
ordinal_probability: 0.9
|
||||
erdgeschoss: &erdgeschoss
|
||||
canonical: erdgeschoss
|
||||
abbreviated: eg
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.6
|
||||
sample_probability: 0.1
|
||||
untergeschoss: &untergeschoss
|
||||
canonical: untergeschoss
|
||||
abbreviated: ug
|
||||
sample: true
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.5
|
||||
sample_probability: 0.1
|
||||
# e.g. Untergeschoss 1
|
||||
numeric:
|
||||
direction: left
|
||||
# e.g. 1ug
|
||||
numeric_affix:
|
||||
affix: ug
|
||||
direction: left
|
||||
# e.g. 1. UG
|
||||
ordinal:
|
||||
direction: right
|
||||
standalone_probability: 0.985
|
||||
number_abs_value: true
|
||||
number_min_abs_value: 1
|
||||
numeric_probability: 0.005
|
||||
numeric_affix_probability: 0.005
|
||||
ordinal_probability: 0.005
|
||||
unterste_etage: &unterste_etage
|
||||
canonical: unterste etage
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
oberste_etage: &oberste_etage
|
||||
canonical: oberste etage
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
|
||||
aliases:
|
||||
"<-1":
|
||||
default: *untergeschoss
|
||||
"-1":
|
||||
default: *untergeschoss
|
||||
"0":
|
||||
default: *erdgeschoss
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *unterste_etage
|
||||
probability: 0.1
|
||||
"top":
|
||||
default: *obergeschoss
|
||||
probability: 0.75
|
||||
alternatives:
|
||||
- alternative: *stock
|
||||
probability: 0.1
|
||||
- alternative: *etage
|
||||
probability: 0.05
|
||||
- alternative: *oberste_etage
|
||||
probability: 0.1
|
||||
|
||||
numbering_starts_at: 0
|
||||
|
||||
alphanumeric:
|
||||
default: *obergeschoss
|
||||
probability: 0.85
|
||||
alternatives:
|
||||
- alternative: *stock
|
||||
probability: 0.1
|
||||
- alternative: *etage
|
||||
probability: 0.05
|
||||
numeric_probability: 0.99 # With this probability, pick an integer
|
||||
alpha_probability: 0.0098 # With this probability, pick a letter e.g. A
|
||||
numeric_plus_alpha_probability: 0.0001 # e.g. 2A
|
||||
alpha_plus_numeric_probability: 0.0001 # e.g. A2
|
||||
|
||||
categories:
|
||||
near:
|
||||
default:
|
||||
canonical: nähe
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.5
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: bei
|
||||
probability: 0.3
|
||||
- alternative:
|
||||
canonical: nah
|
||||
probability: 0.15
|
||||
- alternative:
|
||||
canonical: nahe an
|
||||
probability: 0.05
|
||||
nearby:
|
||||
default:
|
||||
canonical: hier in der nähe
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.4
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: in der nähe
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.3
|
||||
- alternative:
|
||||
canonical: in der nähe hier
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: in der nähe von
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: nahe gelegen
|
||||
probability: 0.05
|
||||
- alternative:
|
||||
canonical: hier in der gegend
|
||||
probability: 0.05
|
||||
|
||||
near_me:
|
||||
default:
|
||||
canonical: in meiner nähe
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: in der nähe zu mir
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.1
|
||||
# Don't worry about agreement
|
||||
in:
|
||||
default:
|
||||
canonical: in
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: im
|
||||
probability: 0.2
|
||||
- alternative:
|
||||
canonical: um
|
||||
probability: 0.2
|
||||
|
||||
|
||||
# Probabilities of each phrase
|
||||
near_probability: 0.35
|
||||
nearby_probability: 0.2
|
||||
near_me_probability: 0.1
|
||||
in_probability: 0.35
|
||||
|
||||
|
||||
directions:
|
||||
right: &rechts
|
||||
canonical: rechts
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: r
|
||||
direction: right
|
||||
whitespace_probability: 0.1
|
||||
numeric_probability: 0.8
|
||||
numeric_affix_probability: 0.2
|
||||
left: &links
|
||||
canonical: links
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: l
|
||||
direction: right
|
||||
whitespace_probability: 0.1
|
||||
numeric_probability: 0.4
|
||||
numeric_affix_probability: 0.6
|
||||
alternatives:
|
||||
- alternative: *rechts
|
||||
probability: 0.5
|
||||
- alternative: *links
|
||||
probability: 0.5
|
||||
|
||||
|
||||
cardinal_directions:
|
||||
east: &ost
|
||||
canonical: ost
|
||||
abbreviated: o
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: o
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
west: &west
|
||||
canonical: west
|
||||
abbreviated: w
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: w
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
north: &nord
|
||||
canonical: nord
|
||||
abbreviated: n
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: n
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
south: &sud
|
||||
canonical: süd
|
||||
abbreviated: s
|
||||
sample: true
|
||||
canonical_probability: 0.75
|
||||
abbreviated_probability: 0.1
|
||||
sample_probability: 0.15
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: s
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
alternatives:
|
||||
- alternative: *nord
|
||||
probability: 0.25
|
||||
- alternative: *ost
|
||||
probability: 0.25
|
||||
- alternative: *sud
|
||||
probability: 0.25
|
||||
- alternative: *west
|
||||
probability: 0.25
|
||||
|
||||
|
||||
entrances:
|
||||
eingang: &eingang
|
||||
canonical: eingang
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
# Eingang 1, Eingang A, etc.
|
||||
alphanumeric: &entrance_alphanumeric
|
||||
default: *eingang
|
||||
numeric_probability: 0.1 # e.g. Eingang 1
|
||||
alpha_probability: 0.85 # e.g. Eingang A
|
||||
numeric_plus_alpha_probability: 0.025 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.025 # e.g. A1
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
staircases:
|
||||
stiege: &stiege
|
||||
canonical: stiege
|
||||
abbreviated: stg
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
abbreviated_probability: 0.2
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
treppe: &treppe
|
||||
canonical: treppe
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
alphanumeric: &staircase_alphanumeric
|
||||
default: *stiege
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative: *treppe
|
||||
probability: 0.4
|
||||
numeric_probability: 0.75
|
||||
alpha_probability: 0.2
|
||||
numeric_plus_alpha_probability: 0.025
|
||||
alpha_plus_numeric_probability: 0.025
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
directional:
|
||||
direction: left
|
||||
direction_probability: 0.85
|
||||
modifier:
|
||||
alternatives:
|
||||
- alternative: *nord
|
||||
- alternative: *sud
|
||||
- alternative: *ost
|
||||
- alternative: *west
|
||||
|
||||
po_boxes:
|
||||
postfach: &postfach
|
||||
canonical: postfach
|
||||
abbreviated: pf
|
||||
sample: true
|
||||
canonical_probability: 0.6
|
||||
abbreviated_probability: 0.2
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2 # PF No 1234
|
||||
numeric_probability: 1.0
|
||||
alphanumeric:
|
||||
sample: false
|
||||
default: *postfach
|
||||
numeric_probability: 0.9 # PF 123
|
||||
alpha_probability: 0.05 # PF A
|
||||
numeric_plus_alpha_probability: 0.04 # PF 123G
|
||||
alpha_plus_numeric_probability: 0.01 # PF A123
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
digits:
|
||||
- length: 1
|
||||
probability: 0.05
|
||||
- length: 2
|
||||
probability: 0.1
|
||||
- length: 3
|
||||
probability: 0.2
|
||||
- length: 4
|
||||
probability: 0.5
|
||||
- length: 5
|
||||
probability: 0.1
|
||||
- length: 6
|
||||
probability: 0.05
|
||||
|
||||
units:
|
||||
halle: &halle
|
||||
canonical: halle
|
||||
numeric:
|
||||
direction: left
|
||||
wohnung: &wohnung
|
||||
canonical: wohnung
|
||||
abbreviated: whg
|
||||
sample: true
|
||||
canonical_probability: 0.6
|
||||
abbreviated_probability: 0.1
|
||||
sample_probability: 0.3
|
||||
plural:
|
||||
canonical: wohnungen
|
||||
numeric:
|
||||
direction: left
|
||||
# Wohnung nummer 4
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2
|
||||
haus: &haus
|
||||
canonical: haus
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
wohnungsnummer: &wohnungsnummer
|
||||
canonical: wohnungsnummer
|
||||
sample: true
|
||||
canonical_probability: 0.6
|
||||
sample_probability: 0.4
|
||||
numeric:
|
||||
direction: left
|
||||
appartement: &appartement
|
||||
canonical: appartement
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
buro: &buro
|
||||
canonical: büro
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
zimmer: &zimmer
|
||||
canonical: zimmer
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
alphanumeric: &unit_alphanumeric
|
||||
default: *wohnung
|
||||
probability: 0.8
|
||||
alternatives:
|
||||
- alternative: *wohnungsnummer
|
||||
probability: 0.1
|
||||
- alternative: *appartement
|
||||
probability: 0.05
|
||||
- alternative: *haus
|
||||
probability: 0.05
|
||||
|
||||
numeric_probability: 0.9 # e.g. Wohnung 1
|
||||
numeric_plus_alpha_probability: 0.03 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.03 # e.g. A1
|
||||
alpha_probability: 0.04 # e.g. Wohnung A
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
# Separate random probability for adding directions like 2R, 2L, etc.
|
||||
add_direction: true
|
||||
add_direction_probability: 0.1
|
||||
|
||||
# Add directions for plain numbers
|
||||
add_direction_numeric: true
|
||||
# Add direction only e.g. Wohnung Rechts
|
||||
add_direction_standalone: true
|
||||
|
||||
# If there are 10 floors, create unit numbers like #301 or #1032
|
||||
use_floor_probability: 0.1
|
||||
|
||||
zone:
|
||||
residential: *unit_alphanumeric
|
||||
commercial:
|
||||
default: *buro
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *zimmer
|
||||
probability: 0.1
|
||||
university:
|
||||
default: *halle
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *zimmer
|
||||
probability: 0.1
|
||||
|
||||
|
||||
countries:
|
||||
# Austria
|
||||
at:
|
||||
# Staircase and entrance numbers more common
|
||||
components:
|
||||
level:
|
||||
null_probability: 0.6
|
||||
alphanumeric_probability: 0.3
|
||||
standalone_probability: 0.1
|
||||
staircase:
|
||||
null_probability: 0.9
|
||||
alphanumeric_probability: 0.1
|
||||
|
||||
entrance:
|
||||
null_probability: 0.99
|
||||
alphanumeric_probability: 0.01
|
||||
|
||||
unit:
|
||||
null_probability: 0.4
|
||||
alphanumeric_probability: 0.6
|
||||
|
||||
# Combined apartment numbers are very common
|
||||
combinations:
|
||||
# e.g. Neubaugasse 55/A/1/5
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- entrance
|
||||
- staircase
|
||||
- unit
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: /
|
||||
probability: 0.98
|
||||
- separator: "-"
|
||||
probability: 0.02
|
||||
probability: 0.9
|
||||
# e.g. Neubaugasse 55/1/5
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- staircase
|
||||
- unit
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: /
|
||||
probability: 0.98
|
||||
- separator: "-"
|
||||
probability: 0.02
|
||||
probability: 0.8
|
||||
# e.g. Neubaugasse 55/5
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- unit
|
||||
label: house_number
|
||||
probability: 0.7
|
||||
separators:
|
||||
- separator: /
|
||||
probability: 0.98
|
||||
- separator: "-"
|
||||
probability: 0.02
|
||||
|
||||
units:
|
||||
top: &top
|
||||
canonical: top
|
||||
numeric:
|
||||
direction: left
|
||||
alphanumeric: &austria_units_alphanumeric
|
||||
default: *top
|
||||
probability: 0.75
|
||||
alternatives:
|
||||
- alternative: *haus
|
||||
probability: 0.15
|
||||
- alternative: *wohnung
|
||||
probability: 0.05
|
||||
- alternative: *wohnungsnummer
|
||||
probability: 0.025
|
||||
- alternative: *appartement
|
||||
probability: 0.025
|
||||
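The `default` / `probability` / `alternatives` pattern used in the Austrian `units.alphanumeric` block above reads like a weighted choice between phrases. A minimal Python sketch of that reading follows. The dictionary is a hand-copied stand-in for the parsed YAML, with anchor names in place of the resolved phrase objects, and `choose_phrase` is a hypothetical helper, not a function from the repository.

```python
import random

# Hand-copied stand-in for the Austrian `units.alphanumeric` block.
# Anchor names (top, haus, ...) stand in for the resolved YAML aliases.
AUSTRIA_UNIT_PHRASES = {
    "default": "top",
    "probability": 0.75,
    "alternatives": [
        {"alternative": "haus", "probability": 0.15},
        {"alternative": "wohnung", "probability": 0.05},
        {"alternative": "wohnungsnummer", "probability": 0.025},
        {"alternative": "appartement", "probability": 0.025},
    ],
}


def choose_phrase(block, rng=random):
    """Pick the default or one alternative, weighted by the listed probabilities."""
    phrases = [block["default"]]
    weights = [block["probability"]]
    for alt in block.get("alternatives", []):
        phrases.append(alt["alternative"])
        weights.append(alt["probability"])
    # random.choices draws one item in proportion to its weight.
    return rng.choices(phrases, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs are comparable
    print([choose_phrase(AUSTRIA_UNIT_PHRASES, rng) for _ in range(5)])
```

Under this reading, roughly three out of four generated Austrian unit phrases would use `top`, with the remaining quarter spread over the alternatives.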
368
resources/addresses/el.yaml
Normal file
@@ -0,0 +1,368 @@
|
||||
# el.yaml
|
||||
# -------
|
||||
# Greek language specification
|
||||
|
||||
|
||||
alphabet: αβγδεζηθικλμνξοπρστυφχψω
|
||||
alphabet_probability: 0.8
|
||||
|
||||
components:
|
||||
level:
|
||||
null_probability: 0.95
|
||||
alphanumeric_probability: 0.05
|
||||
|
||||
entrance:
|
||||
null_probability: 0.9
|
||||
alphanumeric_probability: 0.1
|
||||
|
||||
unit:
|
||||
null_probability: 0.6
|
||||
alphanumeric_probability: 0.4
|
||||
|
||||
|
||||
combinations:
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- unit
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: "/"
|
||||
probability: 0.95
|
||||
- separator: "-"
|
||||
probability: 0.05
|
||||
probability: 0.1
|
||||
|
||||
levels:
|
||||
orofos: &orofos
|
||||
canonical: όροφος
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
ordinal:
|
||||
direction: left
|
||||
numeric_probability: 0.4
|
||||
ordinal_probability: 0.6
|
||||
orofos_latin: &orofos_latin
|
||||
canonical: órofos
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
ordinal:
|
||||
direction: left
|
||||
numeric_probability: 0.4
|
||||
ordinal_probability: 0.6
|
||||
|
||||
isogelo: &isogelo
|
||||
canonical: ισόγειο
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
isogelo_latin: &isogelo_latin
|
||||
canonical: isógeio
|
||||
sample: true
|
||||
canonical_probability: 0.6
|
||||
sample_probability: 0.4
|
||||
imiorofos: &imiorofos
|
||||
canonical: ημιώροφος
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
imiorofos_latin: &imiorofos_latin
|
||||
canonical: imiórofos
|
||||
sample: true
|
||||
canonical_probability: 0.6
|
||||
sample_probability: 0.4
|
||||
|
||||
ypogeio: &ypogeio
|
||||
canonical: υπόγειο
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
ordinal:
|
||||
direction: left
|
||||
number_abs_value: true
|
||||
number_min_abs_value: 1
|
||||
standalone_probability: 0.985
|
||||
numeric_probability: 0.01
|
||||
ordinal_probability: 0.005
|
||||
ypogeio_latin: &ypogeio_latin
|
||||
canonical: ypógeio
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
ordinal:
|
||||
direction: left
|
||||
number_abs_value: true
|
||||
number_min_abs_value: 1
|
||||
standalone_probability: 0.985
|
||||
numeric_probability: 0.01
|
||||
ordinal_probability: 0.005
|
||||
|
||||
aliases:
|
||||
"<-1":
|
||||
default: *ypogeio
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *ypogeio_latin
|
||||
probability: 0.1
|
||||
"-1":
|
||||
default: *ypogeio
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *ypogeio_latin
|
||||
probability: 0.1
|
||||
|
||||
half_floors:
|
||||
default: *imiorofos
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *imiorofos_latin
|
||||
probability: 0.1
|
||||
|
||||
"0":
|
||||
default: *isogelo
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *isogelo_latin
|
||||
probability: 0.1
|
||||
|
||||
numbering_starts_at: 0
|
||||
|
||||
alphanumeric:
|
||||
default: *orofos
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *orofos_latin
|
||||
probability: 0.1
|
||||
numeric_probability: 0.99 # With this probability, pick an integer
|
||||
alpha_probability: 0.0098 # With this probability, pick a letter e.g. A
|
||||
numeric_plus_alpha_probability: 0.0001 # e.g. 2A
|
||||
alpha_plus_numeric_probability: 0.0001 # e.g. A2
|
||||
|
||||
entrances:
|
||||
eisodos: &eisodos
|
||||
canonical: είσοδος
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
eisodos_latin: &eisodos_latin
|
||||
canonical: eísodos
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
# είσοδος 1, etc.
|
||||
alphanumeric:
|
||||
default: *eisodos
|
||||
probability: 0.99
|
||||
alternatives:
|
||||
- alternative: *eisodos_latin
|
||||
probability: 0.01
|
||||
numeric_probability: 0.1
|
||||
alpha_probability: 0.9
|
||||
|
||||
staircases:
|
||||
skala: &skala
|
||||
canonical: σκάλα
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
skala_latin: &skala_latin
|
||||
canonical: skála
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
alphanumeric:
|
||||
# For alphanumerics, skála A, skála 1, etc.
|
||||
default: *skala
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *skala_latin
|
||||
probability: 0.1
|
||||
numeric_probability: 0.6 # e.g. skála 1
|
||||
alpha_probability: 0.35 # e.g. skála A
|
||||
numeric_plus_alpha_probability: 0.025 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.025 # e.g. A1
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
po_boxes:
|
||||
tachydromiki_thyrida: &tachydromiki_thyrida
|
||||
canonical: ταχυδρομική θυρίδα
|
||||
abbreviated: τ.θ
|
||||
sample: true
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
tachydromiki_thyrida_latin: &tachydromiki_thyrida_latin
|
||||
canonical: tachydromikí thyrída
|
||||
abbreviated: t.th
|
||||
sample: true
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
alphanumeric:
|
||||
default: *tachydromiki_thyrida
|
||||
probability: 0.8
|
||||
alternatives:
|
||||
- alternative: *tachydromiki_thyrida_latin
|
||||
probability: 0.2
|
||||
numeric_probability: 0.9 # t.th 123
|
||||
alpha_probability: 0.05 # t.th А
|
||||
numeric_plus_alpha_probability: 0.04 # t.th 123А
|
||||
alpha_plus_numeric_probability: 0.01 # t.th А123
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
digits:
|
||||
- length: 1
|
||||
probability: 0.05
|
||||
- length: 2
|
||||
probability: 0.1
|
||||
- length: 3
|
||||
probability: 0.2
|
||||
- length: 4
|
||||
probability: 0.5
|
||||
- length: 5
|
||||
probability: 0.1
|
||||
- length: 6
|
||||
probability: 0.05
|
||||
|
||||
units:
|
||||
diamerisma: &diamerisma
|
||||
canonical: διαμέρισμα
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
ordinal:
|
||||
direction: right
|
||||
numeric_probability: 0.6
|
||||
ordinal_probability: 0.4
|
||||
diamerisma_latin: &diamerisma_latin
|
||||
canonical: diamérisma
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
ordinal:
|
||||
direction: right
|
||||
numeric_probability: 0.6
|
||||
ordinal_probability: 0.4
|
||||
|
||||
domatio: &domatio
|
||||
canonical: δωμάτιο
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
ordinal:
|
||||
direction: right
|
||||
numeric_probability: 0.6
|
||||
ordinal_probability: 0.4
|
||||
domatio_latin: &domatio_latin
|
||||
canonical: domátio
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
ordinal:
|
||||
direction: right
|
||||
numeric_probability: 0.6
|
||||
ordinal_probability: 0.4
|
||||
|
||||
grafeiou: &grafeiou
|
||||
canonical: γραφείου
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
ordinal:
|
||||
direction: right
|
||||
numeric_probability: 0.6
|
||||
ordinal_probability: 0.4
|
||||
grafeiou_latin: &grafeiou_latin
|
||||
canonical: grafeíou
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
ordinal:
|
||||
direction: right
|
||||
numeric_probability: 0.6
|
||||
ordinal_probability: 0.4
|
||||
|
||||
alphanumeric: &unit_alphanumeric
|
||||
default: *diamerisma
|
||||
probability: 0.8
|
||||
alternatives:
|
||||
- alternative: *diamerisma_latin
|
||||
probability: 0.1
|
||||
- alternative: *domatio
|
||||
probability: 0.09
|
||||
- alternative: *domatio_latin
|
||||
probability: 0.01
|
||||
|
||||
numeric_probability: 0.9 # e.g. diamérisma 1
|
||||
numeric_plus_alpha_probability: 0.03 # e.g. 1А
|
||||
alpha_plus_numeric_probability: 0.03 # e.g. A1
|
||||
alpha_probability: 0.04 # e.g. διαμέρισμα А
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
# If there are 10 floors, create unit numbers like #301 or #1032
|
||||
use_floor_probability: 0.1
|
||||
|
||||
zone:
|
||||
residential: *unit_alphanumeric
|
||||
commercial:
|
||||
default: *grafeiou
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *grafeiou_latin
|
||||
probability: 0.1
|
||||
university:
|
||||
default: *domatio
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *domatio_latin
|
||||
probability: 0.1
|
||||
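The four `*_probability` keys in the Greek `units.alphanumeric` block, together with `whitespace_probability`, suggest how a unit identifier such as `12`, `Α`, `12Α` or `Α 12` could be drawn. The Python sketch below is one possible interpretation under stated assumptions: `sample_unit_identifier` and `max_number` are hypothetical, letters are upper-cased for readability, and `alphabet_probability` is not modelled at all.

```python
import random

# Alphabet string copied from the top of el.yaml.
GREEK_ALPHABET = "αβγδεζηθικλμνξοπρστυφχψω"

# Values copied from the `units.alphanumeric` block.
KIND_PROBABILITIES = {
    "numeric": 0.9,
    "numeric_plus_alpha": 0.03,
    "alpha_plus_numeric": 0.03,
    "alpha": 0.04,
}
WHITESPACE_PROBABILITY = 0.1


def sample_unit_identifier(rng, max_number=20):
    """Draw one identifier; max_number is an illustrative upper bound."""
    kinds = list(KIND_PROBABILITIES)
    weights = [KIND_PROBABILITIES[k] for k in kinds]
    kind = rng.choices(kinds, weights=weights, k=1)[0]

    number = str(rng.randint(1, max_number))
    letter = rng.choice(GREEK_ALPHABET).upper()  # upper-casing is an assumption

    if kind == "numeric":
        return number
    if kind == "alpha":
        return letter
    # Mixed forms occasionally get a space between the two parts.
    sep = " " if rng.random() < WHITESPACE_PROBABILITY else ""
    if kind == "numeric_plus_alpha":
        return number + sep + letter
    return letter + sep + number


if __name__ == "__main__":
    rng = random.Random(1)
    print([sample_unit_identifier(rng) for _ in range(10)])
```

With these weights, nine out of ten identifiers would be plain integers, which matches the `# e.g. diamérisma 1` comment on `numeric_probability`.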
1468
resources/addresses/en.yaml
Normal file
File diff suppressed because it is too large
1189
resources/addresses/es.yaml
Normal file
File diff suppressed because it is too large
470
resources/addresses/et.yaml
Normal file
@@ -0,0 +1,470 @@
|
||||
# et.yaml
|
||||
# -------
|
||||
# Estonian language specification.
|
||||
|
||||
components:
|
||||
level:
|
||||
null_probability: 0.97
|
||||
alphanumeric_probability: 0.02
|
||||
standalone_probability: 0.01
|
||||
|
||||
staircase:
|
||||
null_probability: 0.99
|
||||
alphanumeric_probability: 0.01
|
||||
|
||||
entrance:
|
||||
null_probability: 0.999
|
||||
alphanumeric_probability: 0.001
|
||||
|
||||
unit:
|
||||
null_probability: 0.75
|
||||
alphanumeric_probability: 0.25
|
||||
|
||||
combinations:
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- unit
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: "-"
|
||||
probability: 0.95
|
||||
- separator: " - "
|
||||
probability: 0.05
|
||||
probability: 0.7
|
||||
|
||||
|
||||
numbers:
|
||||
default: &number
|
||||
canonical: number
|
||||
abbreviated: nbr
|
||||
sample: true
|
||||
# Probabilities
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.5
|
||||
sample_probability: 0.2
|
||||
sample_exclude:
|
||||
- "#"
|
||||
numeric:
|
||||
direction: left
|
||||
numeric_affix:
|
||||
affix: "#"
|
||||
direction: left
|
||||
|
||||
numeric_probability: 0.4
|
||||
numeric_affix_probability: 0.6
|
||||
|
||||
|
||||
house_numbers:
|
||||
alphanumeric:
|
||||
default: *number
|
||||
|
||||
alphanumeric_phrase_probability: 0.0001
|
||||
|
||||
|
||||
and:
|
||||
default: &ja
|
||||
canonical: ja
|
||||
abbreviated: "&"
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.75
|
||||
sample: true
|
||||
sample_probability: 0.05
|
||||
|
||||
cross_streets:
|
||||
and: *ja
|
||||
corner_of: &nurgas
|
||||
canonical: nurgas
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
at_the_corner_of: &nurgal
|
||||
canonical: nurgal
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
intersection:
|
||||
default: *ja
|
||||
probability: 0.7
|
||||
alternatives:
|
||||
- alternative: *nurgas
|
||||
probability: 0.15
|
||||
- alternative: *nurgal
|
||||
probability: 0.15
|
||||
|
||||
between:
|
||||
canonical: vahel
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
parentheses_probability: 0.5
|
||||
|
||||
levels:
|
||||
floor: &korrusel
|
||||
canonical: korrusel
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
direction_probability: 0.9
|
||||
ordinal:
|
||||
direction: right
|
||||
numeric_probability: 0.4
|
||||
ordinal_probability: 0.6
|
||||
parter: &parter
|
||||
canonical: parter
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
kelder: &kelder
|
||||
canonical: kelder
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
standalone_probability: 1.0
|
||||
keldris: &keldris
|
||||
canonical: keldris
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
# e.g. 1 keldris
|
||||
numeric:
|
||||
direction: right
|
||||
direction_probability: 0.8
|
||||
# e.g. k1
|
||||
numeric_affix:
|
||||
affix: k
|
||||
direction: left
|
||||
# e.g. 1. keldris
|
||||
ordinal:
|
||||
direction: right
|
||||
number_abs_value: true
|
||||
number_min_abs_value: 1
|
||||
numeric_probability: 0.05
|
||||
numeric_affix_probability: 0.9
|
||||
ordinal_probability: 0.05
|
||||
aliases:
|
||||
"<-1":
|
||||
default: *kelder
|
||||
probability: 0.85
|
||||
alternatives:
|
||||
- alternative: *keldris
|
||||
probability: 0.15
|
||||
"-1":
|
||||
default: *kelder
|
||||
probability: 0.85
|
||||
alternatives:
|
||||
- alternative: *keldris
|
||||
probability: 0.1
|
||||
- alternative: *korrusel
|
||||
probability: 0.05
|
||||
"1":
|
||||
default: *parter
|
||||
probability: 0.5
|
||||
alternatives:
|
||||
- alternative: *korrusel
|
||||
probability: 0.5
|
||||
|
||||
numbering_starts_at: 1
|
||||
|
||||
alphanumeric:
|
||||
default: *korrusel
|
||||
numeric_probability: 0.99 # With this probability, pick an integer
|
||||
alpha_probability: 0.0098 # With this probability, pick a letter e.g. A
|
||||
numeric_plus_alpha_probability: 0.0001 # e.g. 2A
|
||||
alpha_plus_numeric_probability: 0.0001 # e.g. A2
|
||||
|
||||
|
||||
|
||||
|
||||
categories:
|
||||
near:
|
||||
default:
|
||||
canonical: lähedal
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
nearby:
|
||||
default:
|
||||
canonical: lähedal
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.7
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: siin lähedal
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.2
|
||||
- alternative:
|
||||
canonical: siinkandis
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.1
|
||||
|
||||
near_me:
|
||||
default:
|
||||
canonical: lähedal mulle
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
|
||||
|
||||
# Probabilities of each phrase
|
||||
near_probability: 0.7
|
||||
nearby_probability: 0.2
|
||||
near_me_probability: 0.1
|
||||
|
||||
directions:
|
||||
right: &paremal
|
||||
canonical: paremal
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: p
|
||||
direction: right
|
||||
whitespace_probability: 0.1
|
||||
numeric_probability: 0.8
|
||||
numeric_affix_probability: 0.2
|
||||
paremale: &paremale
|
||||
canonical: paremale
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: p
|
||||
direction: right
|
||||
whitespace_probability: 0.1
|
||||
numeric_probability: 0.8
|
||||
numeric_affix_probability: 0.2
|
||||
left: &vasakul
|
||||
canonical: vasakul
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: v
|
||||
direction: right
|
||||
whitespace_probability: 0.1
|
||||
numeric_probability: 0.8
|
||||
numeric_affix_probability: 0.2
|
||||
vasakule: &vasakule
|
||||
canonical: vasakule
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: v
|
||||
direction: right
|
||||
whitespace_probability: 0.1
|
||||
numeric_probability: 0.8
|
||||
numeric_affix_probability: 0.2
|
||||
alternatives:
|
||||
- alternative: *paremal
|
||||
probability: 0.25
|
||||
- alternative: *paremale
|
||||
probability: 0.25
|
||||
- alternative: *vasakul
|
||||
probability: 0.25
|
||||
- alternative: *vasakule
|
||||
probability: 0.25
|
||||
|
||||
cardinal_directions:
|
||||
east: &ida
|
||||
canonical: ida
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
west: &laas
|
||||
canonical: lääs
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
north: &pohi
|
||||
canonical: põhi
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
|
||||
south: &louna
|
||||
canonical: lõuna
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
|
||||
alternatives:
|
||||
- alternative: *pohi
|
||||
probability: 0.25
|
||||
- alternative: *ida
|
||||
probability: 0.25
|
||||
- alternative: *louna
|
||||
probability: 0.25
|
||||
- alternative: *laas
|
||||
probability: 0.25
|
||||
|
||||
|
||||
entrances:
|
||||
sissepaas: &sissepaas
|
||||
canonical: sissepääs
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
# sissepääs 1, sissepääs A, etc.
|
||||
alphanumeric: &entrance_alphanumeric
|
||||
default: *sissepaas
|
||||
numeric_probability: 0.1 # e.g. sissepääs 1
|
||||
alpha_probability: 0.85 # e.g. sissepääs A
|
||||
numeric_plus_alpha_probability: 0.025 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.025 # e.g. A1
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
staircases:
|
||||
trepikoda: &trepikoda
|
||||
canonical: trepikoda
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
alphanumeric: &staircase_alphanumeric
|
||||
default: *trepikoda
|
||||
numeric_probability: 0.75
|
||||
alpha_probability: 0.2
|
||||
numeric_plus_alpha_probability: 0.025
|
||||
alpha_plus_numeric_probability: 0.025
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
directional:
|
||||
direction: left
|
||||
direction_probability: 0.85
|
||||
modifier:
|
||||
alternatives:
|
||||
- alternative: *pohi
|
||||
- alternative: *louna
|
||||
- alternative: *ida
|
||||
- alternative: *laas
|
||||
|
||||
po_boxes:
|
||||
postboks: &abonementpostkast
|
||||
canonical: abonementpostkast
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2 # abonementpostkast #1234
|
||||
kast: &kast
|
||||
canonical: kast
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2 # Kast #1234
|
||||
alphanumeric:
|
||||
sample: false
|
||||
default: *abonementpostkast
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *kast
|
||||
probability: 0.1
|
||||
numeric_probability: 0.9 # 123
|
||||
alpha_probability: 0.05 # A
|
||||
numeric_plus_alpha_probability: 0.04 # 123G
|
||||
alpha_plus_numeric_probability: 0.01 # A123
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
digits:
|
||||
- length: 1
|
||||
probability: 0.05
|
||||
- length: 2
|
||||
probability: 0.1
|
||||
- length: 3
|
||||
probability: 0.2
|
||||
- length: 4
|
||||
probability: 0.5
|
||||
- length: 5
|
||||
probability: 0.1
|
||||
- length: 6
|
||||
probability: 0.05
|
||||
|
||||
units:
|
||||
korter: &korter
|
||||
canonical: korter
|
||||
abbreviated: k
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
abbreviated_probability: 0.1
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
null_phrase_probability: 0.3
|
||||
# e.g. korter number 4
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.05
|
||||
ruumi: &ruumi
|
||||
canonical: ruumi
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
alphanumeric: &unit_alphanumeric
|
||||
default: *korter
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *ruumi
|
||||
probability: 0.1
|
||||
numeric_probability: 1.0 # e.g. korter 1
|
||||
|
||||
# Separate random probability for adding directions like 2P, 2V, etc.
|
||||
add_direction: true
|
||||
add_direction_probability: 0.005
|
||||
|
||||
# Add directions for plain numbers
|
||||
add_direction_numeric: true
|
||||
# Add direction only e.g. Korter vasakule
|
||||
add_direction_standalone: true
|
||||
|
||||
# If there are 10 floors, create unit numbers like #301 or #1032
|
||||
use_floor_probability: 0.05
|
||||
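Each `default` / `alternatives` group in et.yaml appears to describe a probability distribution, for example the `"-1"` floor alias with 0.85 + 0.1 + 0.05. A small Python check like the one below could catch groups that do not add up. It is a sketch only: `probabilities_sum_to_one` is a hypothetical helper, and treating a group with a bare `default` (no `probability`, no `alternatives`) as valid is an assumption about how such groups are meant to be read.

```python
import math


def probabilities_sum_to_one(block, tolerance=1e-9):
    """Return True if default + alternatives probabilities total 1.0."""
    alternatives = block.get("alternatives", [])
    # Assumption: a bare `default` with nothing else is always chosen.
    if "probability" not in block and not alternatives:
        return True
    total = block.get("probability", 0.0)
    total += sum(alt["probability"] for alt in alternatives)
    # isclose avoids false alarms from floating-point rounding.
    return math.isclose(total, 1.0, abs_tol=tolerance)


# Hand-copied stand-in for the "-1" alias; anchor names replace the aliases.
basement_alias = {
    "default": "kelder",
    "probability": 0.85,
    "alternatives": [
        {"alternative": "keldris", "probability": 0.1},
        {"alternative": "korrusel", "probability": 0.05},
    ],
}

if __name__ == "__main__":
    print(probabilities_sum_to_one(basement_alias))
```

The same check could be applied to the parsed form of every language file before training data is generated, assuming the files are loaded into plain dictionaries first.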
375
resources/addresses/eu.yaml
Normal file
@@ -0,0 +1,375 @@
|
||||
# eu.yaml
|
||||
# -------
|
||||
# Basque language specification
|
||||
|
||||
components:
|
||||
level:
|
||||
# If no floor number is specified
|
||||
null_probability: 0.8
|
||||
alphanumeric_probability: 0.2
|
||||
|
||||
staircase:
|
||||
null_probability: 0.99
|
||||
alphanumeric_probability: 0.01
|
||||
|
||||
entrance:
|
||||
null_probability: 0.999
|
||||
alphanumeric_probability: 0.001
|
||||
|
||||
unit:
|
||||
# If no unit number is specified
|
||||
null_probability: 0.4
|
||||
alphanumeric_probability: 0.6
|
||||
|
||||
combinations:
|
||||
-
|
||||
components:
|
||||
- level
|
||||
- unit
|
||||
label: unit
|
||||
separators:
|
||||
- separator: "-"
|
||||
probability: 0.85
|
||||
- separator: "/"
|
||||
probability: 0.15
|
||||
probability: 0.7
|
||||
|
||||
|
||||
and:
|
||||
default: &eta
|
||||
canonical: eta
|
||||
abbreviated: "&"
|
||||
sample: true
|
||||
canonical_probability: 0.5
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.1
|
||||
|
||||
house_numbers:
|
||||
# zenbakirik gabe (zk.g) addresses
|
||||
no_number:
|
||||
default:
|
||||
canonical: zenbakirik gabe
|
||||
abbreviated: zk.g
|
||||
sample: true
|
||||
canonical_probability: 0.1
|
||||
abbreviated_probability: 0.6
|
||||
sample_probability: 0.3
|
||||
|
||||
no_number_probability: 0.1 # With this probability, use zenbakirik gabe if no house_number is specified
|
||||
|
||||
levels:
|
||||
floor: &solairua
|
||||
canonical: solairua
|
||||
abbreviated: sol
|
||||
sample: true
|
||||
canonical_probability: 0.5
|
||||
abbreviated_probability: 0.3
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
# e.g. 2. solairua
|
||||
ordinal:
|
||||
direction: right
|
||||
numeric_probability: 0.25
|
||||
ordinal_probability: 0.75
|
||||
# Ground floor
|
||||
beheko_solairua: &beheko_solairua
|
||||
canonical: beheko solairua
|
||||
abbreviated: beheko sol
|
||||
sample: true
|
||||
canonical_probability: 0.6
|
||||
abbreviated_probability: 0.3
|
||||
sample_probability: 0.1
|
||||
behe_solairua: &behe_solairua
|
||||
canonical: behe-solairua
|
||||
abbreviated: behe-sol
|
||||
sample: true
|
||||
canonical_probability: 0.6
|
||||
abbreviated_probability: 0.3
|
||||
sample_probability: 0.1
|
||||
aliases:
|
||||
"0":
|
||||
default: *beheko_solairua
|
||||
probability: 0.5
|
||||
alternatives:
|
||||
- alternative: *behe_solairua
|
||||
probability: 0.4
|
||||
- alternative: *solairua
|
||||
probability: 0.1
|
||||
|
||||
numbering_starts_at: 0
|
||||
|
||||
alphanumeric:
|
||||
default: *solairua
|
||||
numeric_probability: 0.99
|
||||
alpha_probability: 0.01
|
||||
|
||||
blocks:
|
||||
alphanumeric:
|
||||
default:
|
||||
canonical: blokea
|
||||
abbreviated: bl
|
||||
sample: true
|
||||
canonical_probability: 0.6
|
||||
abbreviated_probability: 0.2
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
ordinal:
|
||||
direction: right
|
||||
numeric_probability: 0.2
|
||||
ordinal_probability: 0.8
|
||||
|
||||
categories:
|
||||
near:
|
||||
default:
|
||||
canonical: gertu
|
||||
|
||||
nearby:
|
||||
default:
|
||||
canonical: gertuko
|
||||
probability: 0.7
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: hemen gertu
|
||||
probability: 0.2
|
||||
- alternative:
|
||||
canonical: hemen
|
||||
probability: 0.1
|
||||
near_me:
|
||||
default:
|
||||
canonical: me gertu
|
||||
|
||||
# Probabilities of each phrase
|
||||
near_probability: 0.7
|
||||
nearby_probability: 0.2
|
||||
near_me_probability: 0.1
|
||||
|
||||
cross_streets:
|
||||
and: *eta
|
||||
txoko: &txoko
|
||||
canonical: txoko
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
|
||||
intersection:
|
||||
default: *eta
|
||||
probability: 0.8
|
||||
alternatives:
|
||||
- alternative: *txoko
|
||||
probability: 0.2
|
||||
|
||||
between:
|
||||
canonical: arteko
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
parentheses_probability: 0.5
|
||||
|
||||
|
||||
po_boxes:
|
||||
posta_kutxa: &posta_kutxa
|
||||
canonical: posta-kutxa
|
||||
abbreviated: p.-ku
|
||||
sample: true
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.4
|
||||
numeric:
|
||||
direction: left
|
||||
numeric_probability: 1.0
|
||||
alphanumeric:
|
||||
sample: false
|
||||
default: *posta_kutxa
|
||||
numeric_probability: 0.9 # P.-Ku 123
|
||||
alpha_probability: 0.05 # P.-Ku A
|
||||
numeric_plus_alpha_probability: 0.04 # P.-Ku 123G
|
||||
alpha_plus_numeric_probability: 0.01 # P.-Ku A123
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
digits:
|
||||
- length: 1
|
||||
probability: 0.05
|
||||
- length: 2
|
||||
probability: 0.1
|
||||
- length: 3
|
||||
probability: 0.2
|
||||
- length: 4
|
||||
probability: 0.5
|
||||
- length: 5
|
||||
probability: 0.1
|
||||
- length: 6
|
||||
probability: 0.05
|
||||
|
||||
|
||||
postcodes:
|
||||
alphanumeric:
|
||||
default:
|
||||
canonical: posta-kodea
|
||||
abbreviated: p.-k
|
||||
sample: true
|
||||
canonical_probability: 0.01
|
||||
abbreviated_probability: 0.9
|
||||
sample_probability: 0.09
|
||||
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
numeric_affix:
|
||||
affix: p.-k.
|
||||
direction: left
|
||||
# null_probability means the chance of doing nothing e.g. just the postal code
|
||||
null_probability: 0.7
|
||||
numeric_probability: 0.18
|
||||
numeric_affix_probability: 0.12
|
||||
strict_numeric: true
|
||||
|
||||
directions:
|
||||
right: &eskuina
|
||||
canonical: eskuina
|
||||
abbreviated: esk
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: esk.
|
||||
direction: right
|
||||
whitespace_probability: 0.1
|
||||
numeric_probability: 0.9
|
||||
numeric_affix_probability: 0.1
|
||||
left: &ezkerkada
|
||||
canonical: ezkerkada
|
||||
abbreviated: ezk
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: ezk.
|
||||
direction: right
|
||||
whitespace_probability: 0.1
|
||||
numeric_probability: 0.9
|
||||
numeric_affix_probability: 0.1
|
||||
ezkerreko: &ezkerreko
|
||||
canonical: ezkerreko
|
||||
abbreviated: ezk.-ko
|
||||
sample: true
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.5
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
alternatives:
|
||||
- alternative: *eskuina
|
||||
probability: 0.5
|
||||
- alternative: *ezkerkada
|
||||
probability: 0.5
|
||||
|
||||
|
||||
entrances:
|
||||
sarrera: &sarrera
|
||||
canonical: sarrera
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
# Sarrera 1, Sarrera A, etc.
|
||||
alphanumeric:
|
||||
default: *sarrera
|
||||
numeric_probability: 0.1 # e.g. Sarrera 1
|
||||
alpha_probability: 0.85 # e.g. Sarrera A
|
||||
numeric_plus_alpha_probability: 0.025 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.025 # e.g. A1
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
directional:
|
||||
direction: left
|
||||
modifier:
|
||||
alternatives:
|
||||
- alternative: *eskuina
|
||||
- alternative: *ezkerreko
|
||||
|
||||
staircases:
|
||||
eskailera: &eskailera
|
||||
canonical: eskailera
|
||||
abbreviated: eskra
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
alphanumeric:
|
||||
# For alphanumerics, Eskra A, Eskra 1, etc.
|
||||
default: *eskailera
|
||||
numeric_probability: 0.6 # e.g. Eskra 1
|
||||
alpha_probability: 0.35 # e.g. Eskra A
|
||||
numeric_plus_alpha_probability: 0.025 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.025 # e.g. A1
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
directional:
|
||||
direction: left # e.g. Ezk.-ko Eskra
|
||||
modifier:
|
||||
alternatives:
|
||||
- alternative: *eskuina
|
||||
- alternative: *ezkerreko
|
||||
|
||||
units:
|
||||
flat: &apartamentu
|
||||
canonical: apartamentu
|
||||
abbreviated: aptu
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
# If it's just puerta B, it's often written as e.g. 3o B for "tercer piso puerta B"
|
||||
null_phrase_probability: 0.15
|
||||
ordinal:
|
||||
direction: right
|
||||
numeric_probability: 0.6
|
||||
ordinal_probability: 0.4
|
||||
|
||||
alphanumeric: &unit_alphanumeric
|
||||
default: *apartamentu
|
||||
|
||||
# Separate random probability for adding directions like 2. Ezk, 2 Esk, etc.
|
||||
add_direction: true
|
||||
add_direction_probability: 0.1
|
||||
add_direction_numeric: true # Only for numbers
|
||||
add_direction_standalone: true # A unit can be as simple as "D"
|
||||
|
||||
numeric_probability: 0.7 # e.g. 1
|
||||
numeric_plus_alpha_probability: 0.01 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.01 # e.g. A1
|
||||
alpha_probability: 0.28 # e.g. A
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
403
resources/addresses/fi.yaml
Normal file
@@ -0,0 +1,403 @@
|
||||
# fi.yaml
|
||||
# -------
|
||||
# Finnish language specification.
|
||||
|
||||
components:
|
||||
level:
|
||||
null_probability: 0.97
|
||||
alphanumeric_probability: 0.02
|
||||
standalone_probability: 0.01
|
||||
|
||||
staircase:
|
||||
null_probability: 0.9
|
||||
alphanumeric_probability: 0.1
|
||||
|
||||
entrance:
|
||||
null_probability: 0.999
|
||||
alphanumeric_probability: 0.001
|
||||
|
||||
unit:
|
||||
null_probability: 0.75
|
||||
alphanumeric_probability: 0.25
|
||||
|
||||
combinations:
|
||||
-
|
||||
components:
|
||||
- staircase
|
||||
- unit
|
||||
label: unit
|
||||
separators:
|
||||
- separator: " "
|
||||
probability: 0.8
|
||||
- separator: "-"
|
||||
probability: 0.1
|
||||
- separator: "/"
|
||||
probability: 0.05
|
||||
- separator: " - "
|
||||
probability: 0.05
|
||||
probability: 0.85
|
||||
|
||||
numbers:
|
||||
default: &numero
|
||||
canonical: numero
|
||||
abbreviated: nro
|
||||
sample: true
|
||||
# Probabilities
|
||||
canonical_probability: 0.1
|
||||
abbreviated_probability: 0.5
|
||||
sample_probability: 0.4
|
||||
sample_exclude:
|
||||
- "#"
|
||||
numeric:
|
||||
direction: left
|
||||
numeric_affix:
|
||||
affix: "#"
|
||||
direction: left
|
||||
|
||||
numeric_probability: 0.7
|
||||
numeric_affix_probability: 0.3
|
||||
|
||||
house_numbers:
|
||||
alphanumeric:
|
||||
default: *numero
|
||||
|
||||
alphanumeric_phrase_probability: 0.0001
|
||||
|
||||
|
||||
and:
|
||||
default: &ja
|
||||
canonical: ja
|
||||
abbreviated: "&"
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.75
|
||||
sample: true
|
||||
sample_probability: 0.05
|
||||
|
||||
cross_streets:
|
||||
and: *ja
|
||||
corner_of: &kulmassa
|
||||
canonical: kulmassa
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
intersection:
|
||||
default: *ja
|
||||
probability: 0.7
|
||||
alternatives:
|
||||
- alternative: *kulmassa
|
||||
probability: 0.3
|
||||
|
||||
between:
|
||||
canonical: välillä
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
parentheses_probability: 0.5
|
||||
|
||||
levels:
|
||||
floor: &kerros
|
||||
canonical: kerros
|
||||
abbreviated: krs
|
||||
sample: true
|
||||
canonical_probability: 0.6
|
||||
abbreviated_probability: 0.2
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
direction_probability: 0.9
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.8
|
||||
spellout_probability: 0.2
|
||||
numeric_probability: 0.4
|
||||
ordinal_probability: 0.6
|
||||
|
||||
numbering_starts_at: 1
|
||||
|
||||
alphanumeric:
|
||||
default: *kerros
|
||||
numeric_probability: 0.99 # With this probability, pick an integer
|
||||
alpha_probability: 0.0098 # With this probability, pick a letter e.g. A
|
||||
numeric_plus_alpha_probability: 0.0001 # e.g. 2A
|
||||
alpha_plus_numeric_probability: 0.0001 # e.g. A2
|
||||
|
||||
|
||||
|
||||
|
||||
categories:
|
||||
near:
|
||||
default:
|
||||
canonical: lähellä
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
nearby:
|
||||
default:
|
||||
canonical: lähistöllä
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.7
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: lähellä
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: tässä lähellä
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: täällä
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.1
|
||||
|
||||
near_me:
|
||||
default:
|
||||
canonical: lähellä minua
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
|
||||
|
||||
# Probabilities of each phrase
|
||||
near_probability: 0.7
|
||||
nearby_probability: 0.2
|
||||
near_me_probability: 0.1
|
||||
|
||||
directions:
|
||||
right: &oikea
|
||||
canonical: oikea
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: o
|
||||
direction: right
|
||||
whitespace_probability: 0.1
|
||||
numeric_probability: 0.8
|
||||
numeric_affix_probability: 0.2
|
||||
oikealla: &oikealla
|
||||
canonical: oikealla
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: o
|
||||
direction: right
|
||||
whitespace_probability: 0.1
|
||||
numeric_probability: 0.8
|
||||
numeric_affix_probability: 0.2
|
||||
left: &vasen
|
||||
canonical: vasen
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: v
|
||||
direction: right
|
||||
whitespace_probability: 0.1
|
||||
numeric_probability: 0.8
|
||||
numeric_affix_probability: 0.2
|
||||
vasemmalla: &vasemmalla
|
||||
canonical: vasemmalla
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: v
|
||||
direction: right
|
||||
whitespace_probability: 0.1
|
||||
numeric_probability: 0.8
|
||||
numeric_affix_probability: 0.2
|
||||
alternatives:
|
||||
- alternative: *oikea
|
||||
probability: 0.25
|
||||
- alternative: *oikealla
|
||||
probability: 0.25
|
||||
- alternative: *vasen
|
||||
probability: 0.25
|
||||
- alternative: *vasemmalla
|
||||
probability: 0.25
|
||||
|
||||
cardinal_directions:
|
||||
east: &itaan
|
||||
canonical: itään
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
west: &lansi
|
||||
canonical: länsi
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
north: &pohja
|
||||
canonical: pohja
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
|
||||
south: &etela
|
||||
canonical: etelä
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
|
||||
alternatives:
|
||||
- alternative: *pohja
|
||||
probability: 0.25
|
||||
- alternative: *itaan
|
||||
probability: 0.25
|
||||
- alternative: *etela
|
||||
probability: 0.25
|
||||
- alternative: *lansi
|
||||
probability: 0.25
|
||||
|
||||
|
||||
entrances:
|
||||
sissepaas: &sisaankaynti
|
||||
canonical: sisäänkäynti
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
# Sisäänkäynti 1, Sisäänkäynti A, etc.
|
||||
alphanumeric: &entrance_alphanumeric
|
||||
default: *sisaankaynti
|
||||
numeric_probability: 0.1 # e.g. Sisäänkäynti 1
|
||||
alpha_probability: 0.85 # e.g. Sisäänkäynti A
|
||||
numeric_plus_alpha_probability: 0.025 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.025 # e.g. A1
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
staircases:
|
||||
portaikko: &portaikko
|
||||
canonical: portaikko
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
alphanumeric: &staircase_alphanumeric
|
||||
default: *portaikko
|
||||
alpha_probability: 1.0
|
||||
|
||||
directional:
|
||||
direction: left
|
||||
direction_probability: 0.85
|
||||
modifier:
|
||||
alternatives:
|
||||
- alternative: *pohja
|
||||
- alternative: *etela
|
||||
- alternative: *itaan
|
||||
- alternative: *lansi
|
||||
|
||||
po_boxes:
|
||||
postilokero: &postilokero
|
||||
canonical: postilokero
|
||||
abbreviated: pl
|
||||
sample: true
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.6
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2 # PL #1234
|
||||
alphanumeric:
|
||||
sample: false
|
||||
default: *postilokero
|
||||
numeric_probability: 0.9 # 123
|
||||
alpha_probability: 0.05 # A
|
||||
numeric_plus_alpha_probability: 0.04 # 123G
|
||||
alpha_plus_numeric_probability: 0.01 # A123
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
digits:
|
||||
- length: 1
|
||||
probability: 0.05
|
||||
- length: 2
|
||||
probability: 0.1
|
||||
- length: 3
|
||||
probability: 0.2
|
||||
- length: 4
|
||||
probability: 0.5
|
||||
- length: 5
|
||||
probability: 0.1
|
||||
- length: 6
|
||||
probability: 0.05
|
||||
|
||||
units:
|
||||
asunto: &asunto
|
||||
canonical: asunto
|
||||
abbreviated: as
|
||||
sample: true
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.7
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
null_phrase_probability: 0.3
|
||||
# as nro 4
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.05
|
||||
ruumi: &huone
|
||||
canonical: huone
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
alphanumeric: &unit_alphanumeric
|
||||
default: *asunto
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *huone
|
||||
probability: 0.1
|
||||
numeric_probability: 1.0 # e.g. as 1
|
||||
|
||||
# Separate random probability for adding directions like 2O, 2V, etc.
|
||||
add_direction: true
|
||||
add_direction_probability: 0.005
|
||||
|
||||
# Add directions for plain numbers
|
||||
add_direction_numeric: true
|
||||
# Add a direction with no unit number, e.g. asunto followed only by a direction
|
||||
add_direction_standalone: true
|
||||
|
||||
# If there are 10 floors, create unit numbers like #301 or #1032
|
||||
use_floor_probability: 0.05
|
||||
951
resources/addresses/fr.yaml
Normal file
@@ -0,0 +1,951 @@
|
||||
# Note: default config is for France. Canadian, Swiss, Belgian, and other
|
||||
# conventions go in country overrides
|
||||
|
||||
components:
|
||||
level:
|
||||
# If no floor number is specified
|
||||
null_probability: 0.8
|
||||
alphanumeric_probability: 0.2
|
||||
|
||||
staircase:
|
||||
null_probability: 0.99
|
||||
alphanumeric_probability: 0.01
|
||||
|
||||
entrance:
|
||||
null_probability: 0.999
|
||||
alphanumeric_probability: 0.001
|
||||
|
||||
unit:
|
||||
# If no unit number is specified
|
||||
null_probability: 0.8
|
||||
alphanumeric_probability: 0.2
|
||||
|
||||
combinations:
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- unit
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: /
|
||||
probability: 0.8
|
||||
- separator: "-"
|
||||
probability: 0.1
|
||||
- separator: " - "
|
||||
probability: 0.1
|
||||
probability: 0.005
|
||||
|
||||
numbers:
|
||||
default: &numero
|
||||
canonical: numéro
|
||||
abbreviated: "nº"
|
||||
sample: true
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.5
|
||||
sample_probability: 0.3
|
||||
sample_exclude:
|
||||
- "#" # Used in numeric affix. Needs to be quoted, otherwise it's a comment
|
||||
numeric:
|
||||
direction: left
|
||||
numeric_affix:
|
||||
affix: "#"
|
||||
direction: left
|
||||
# Probabilities for numbers
|
||||
numeric_probability: 0.7
|
||||
numeric_affix_probability: 0.3
|
||||
|
||||
and:
|
||||
default: &and
|
||||
canonical: et
|
||||
abbreviated: "&"
|
||||
canonical_probability: 0.7
|
||||
abbreviated_probability: 0.25
|
||||
sample: true
|
||||
sample_probability: 0.05
|
||||
|
||||
house_numbers:
|
||||
# sans numéro (s/n) addresses
|
||||
no_number:
|
||||
canonical: sans numéro
|
||||
abbreviated: s/n
|
||||
sample: true
|
||||
canonical_probability: 0.1
|
||||
abbreviated_probability: 0.7
|
||||
sample_probability: 0.2
|
||||
|
||||
alphanumeric:
|
||||
default: *numero
|
||||
|
||||
alphanumeric_phrase_probability: 0.01
|
||||
no_number_probability: 0.05 # With this probability, use sans numéro if no house_number is specified
|
||||
|
||||
levels:
|
||||
floor: &etage
|
||||
canonical: étage
|
||||
abbreviated: ét
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
abbreviated_probability: 0.1
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.05
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.8
|
||||
spellout_probability: 0.2
|
||||
numeric_probability: 0.75
|
||||
ordinal_probability: 0.25
|
||||
niveau: &niveau
|
||||
canonical: niveau
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.05
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.8
|
||||
spellout_probability: 0.2
|
||||
numeric_probability: 0.75
|
||||
ordinal_probability: 0.25
|
||||
bel_etage: &bel_etage
|
||||
canonical: bel étage
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
etage_noble: &etage_noble
|
||||
canonical: étage noble
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
dernier_etage: &dernier_etage
|
||||
canonical: dernier étage
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
basement: &sous_sol
|
||||
canonical: sous-sol
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
ordinal:
|
||||
direction: right
|
||||
number_abs_value: true
|
||||
number_min_abs_value: 1
|
||||
standalone_probability: 0.99
|
||||
numeric_probability: 0.005
|
||||
ordinal_probability: 0.005
|
||||
sub_basement: &soubassement
|
||||
canonical: soubassement
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
ordinal:
|
||||
direction: right
|
||||
number_abs_value: true
|
||||
number_min_abs_value: 2
|
||||
number_subtract_abs_value: 1
|
||||
standalone_probability: 0.99
|
||||
numeric_probability: 0.005
|
||||
ordinal_probability: 0.005
|
||||
mezzanine: &entresol
|
||||
canonical: entresol
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
# Ground floor
|
||||
rez_de_chaussee: &rez_de_chaussee
|
||||
canonical: rez-de-chaussée
|
||||
abbreviated: rdc
|
||||
sample: true
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.3
|
||||
sample_probability: 0.3
|
||||
rez_de_chaussee_bas: &rez_de_chaussee_bas
|
||||
canonical: rez-de-chaussée bas
|
||||
abbreviated: rcb
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.3
|
||||
rez_de_chaussee_haut: &rez_de_chaussee_haut
|
||||
canonical: rez-de-chaussée haut
|
||||
abbreviated: rch
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.3
|
||||
parterre: &parterre
|
||||
canonical: parterre
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
rez_de_jardin: &rez_de_jardin
|
||||
canonical: rez-de-jardin
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
aliases:
|
||||
"<-1":
|
||||
default: *sous_sol
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative: *soubassement
|
||||
probability: 0.3995
|
||||
- alternative: *etage
|
||||
probability: 0.0005
|
||||
"-1":
|
||||
default: *sous_sol
|
||||
probability: 0.9995
|
||||
alternatives:
|
||||
- alternative: *etage
|
||||
probability: 0.0005
|
||||
half_floors:
|
||||
default: *entresol
|
||||
"0":
|
||||
default: *rez_de_chaussee
|
||||
probability: 0.74
|
||||
alternatives:
|
||||
- alternative: *rez_de_jardin
|
||||
probability: 0.01
|
||||
- alternative: *rez_de_chaussee_bas
|
||||
probability: 0.1
|
||||
- alternative: *rez_de_chaussee_haut
|
||||
probability: 0.1
|
||||
- alternative: *etage
|
||||
probability: 0.05
|
||||
"1":
|
||||
default: *etage
|
||||
probability: 0.8
|
||||
alternatives:
|
||||
- alternative: *bel_etage
|
||||
probability: 0.1
|
||||
- alternative: *etage_noble
|
||||
probability: 0.1
|
||||
top:
|
||||
default: *etage
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *dernier_etage
|
||||
probability: 0.1
|
||||
|
||||
alphanumeric:
|
||||
default: *etage
|
||||
probability: 0.95
|
||||
alternatives:
|
||||
- alternative: *niveau
|
||||
probability: 0.05
|
||||
numeric_probability: 0.99
|
||||
alpha_probability: 0.01
|
||||
|
||||
numbering_starts_at: 0
|
||||
|
||||
|
||||
cross_streets:
|
||||
# 26th & 6th Avenue
|
||||
and: *and
|
||||
# 26th @ Broadway
|
||||
a: &a
|
||||
canonical: à
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
|
||||
au: &au
|
||||
canonical: au
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
|
||||
corner_of: &langle_de
|
||||
canonical: l'angle de
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
|
||||
at_the_corner_of: &a_langle_de
|
||||
canonical: à l'angle de
|
||||
|
||||
intersection:
|
||||
default: *and
|
||||
probability: 0.7
|
||||
alternatives:
|
||||
- alternative: *a
|
||||
probability: 0.025
|
||||
- alternative: *au
|
||||
probability: 0.025
|
||||
- alternative: *langle_de
|
||||
probability: 0.15
|
||||
- alternative: *a_langle_de
|
||||
probability: 0.1
|
||||
|
||||
# 26th betw 5th Ave and 6th Ave
|
||||
between:
|
||||
canonical: entre
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
parentheses_probability: 0.5 # Probability of using parentheses e.g. (between 5th and 6th)
|
||||
|
||||
directions:
|
||||
right: &droit
|
||||
canonical: droit
|
||||
abbreviated: dr
|
||||
sample: true
|
||||
canonical_probability: 0.5
|
||||
abbreviated_probability: 0.2
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: d
|
||||
direction: right
|
||||
whitespace_probability: 0.1
|
||||
numeric_probability: 0.7
|
||||
numeric_affix_probability: 0.3
|
||||
left: &gauche
|
||||
canonical: gauche
|
||||
abbreviated: g
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: g
|
||||
direction: right
|
||||
whitespace_probability: 0.1
|
||||
numeric_probability: 0.4
|
||||
numeric_affix_probability: 0.6
|
||||
rear: &arriere
|
||||
canonical: arrière
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
front: &avant
|
||||
canonical: avant
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
alternatives:
|
||||
- alternative: *droit
|
||||
probability: 0.49
|
||||
- alternative: *gauche
|
||||
probability: 0.49
|
||||
- alternative: *arriere
|
||||
probability: 0.01
|
||||
- alternative: *avant
|
||||
probability: 0.01
|
||||
|
||||
anteroposterior:
|
||||
alternatives:
|
||||
- alternative: *avant
|
||||
probability: 0.5
|
||||
- alternative: *arriere
|
||||
probability: 0.5
|
||||
|
||||
lateral:
|
||||
alternatives:
|
||||
- alternative: *droit
|
||||
probability: 0.5
|
||||
- alternative: *gauche
|
||||
probability: 0.5
|
||||
|
||||
cardinal_directions:
|
||||
east: &est
|
||||
canonical: est
|
||||
abbreviated: e
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.6
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: e
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
west: &ouest
|
||||
canonical: ouest
|
||||
abbreviated: o
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.6
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: o
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
north: &nord
|
||||
canonical: nord
|
||||
abbreviated: n
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.6
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: n
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
south: &sud
|
||||
canonical: sud
|
||||
abbreviated: s
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.6
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: s
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
alternatives:
|
||||
- alternative: *nord
|
||||
probability: 0.25
|
||||
- alternative: *est
|
||||
probability: 0.25
|
||||
- alternative: *sud
|
||||
probability: 0.25
|
||||
- alternative: *ouest
|
||||
probability: 0.25
|
||||
|
||||
entrances:
|
||||
entrance: &entrance
|
||||
canonical: entrance
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
# Entrance 1, Entrance A, etc.
|
||||
alphanumeric: &entrance_alphanumeric
|
||||
default: *entrance
|
||||
numeric_probability: 0.1 # e.g. Entrance 1
|
||||
alpha_probability: 0.85 # e.g. Entrance A
|
||||
numeric_plus_alpha_probability: 0.025 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.025 # e.g. A1
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
directional:
|
||||
modifier:
|
||||
direction: right # e.g. Entrance Nord
|
||||
direction_probability: 0.95
|
||||
alternatives:
|
||||
- alternative: *nord
|
||||
- alternative: *sud
|
||||
- alternative: *est
|
||||
- alternative: *ouest
|
||||
- alternative: *droit
|
||||
- alternative: *gauche
|
||||
- alternative: *arriere
|
||||
- alternative: *avant
|
||||
|
||||
staircases:
|
||||
escalier: &escalier
|
||||
canonical: escalier
|
||||
abbreviated: esc
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
alphanumeric:
|
||||
# For alphanumerics, Escalier A, Esc 1, etc.
|
||||
default: *escalier
|
||||
numeric_probability: 0.6 # e.g. Escalier 1
|
||||
alpha_probability: 0.35 # e.g. Escalier A
|
||||
numeric_plus_alpha_probability: 0.025 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.025 # e.g. A1
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
directional:
|
||||
direction: right # e.g. Escalier Gauche
|
||||
direction_probability: 0.9
|
||||
modifier:
|
||||
alternatives:
|
||||
- alternative: *nord
|
||||
- alternative: *sud
|
||||
- alternative: *est
|
||||
- alternative: *ouest
|
||||
- alternative: *droit
|
||||
- alternative: *gauche
|
||||
- alternative: *arriere
|
||||
- alternative: *avant
|
||||
|
||||
|
||||
po_boxes:
|
||||
boite_postal: &boite_postal
|
||||
canonical: boîte postale
|
||||
abbreviated: bp
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.5
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2 # BP No 1234
|
||||
numeric_probability: 1.0
|
||||
case_postal: &case_postal
|
||||
canonical: case postale
|
||||
abbreviated: cp
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.5
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2 # CP No 1234
|
||||
numeric_probability: 1.0
|
||||
alphanumeric:
|
||||
sample: false
|
||||
default: *boite_postal
|
||||
numeric_probability: 0.9 # BP 123
|
||||
alpha_probability: 0.05 # BP A
|
||||
numeric_plus_alpha_probability: 0.04 # BP 123G
|
||||
alpha_plus_numeric_probability: 0.01 # BP A123
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
digits:
|
||||
- length: 1
|
||||
probability: 0.05
|
||||
- length: 2
|
||||
probability: 0.1
|
||||
- length: 3
|
||||
probability: 0.2
|
||||
- length: 4
|
||||
probability: 0.5
|
||||
- length: 5
|
||||
probability: 0.1
|
||||
- length: 6
|
||||
probability: 0.05
|
||||
|
||||
|
||||
units:
|
||||
flat: &appartement
|
||||
canonical: appartement
|
||||
abbreviated: app
|
||||
sample: true
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.5
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2
|
||||
unit: &unite
|
||||
canonical: unité
|
||||
abbreviated: u
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
abbreviated_probability: 0.1
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2
|
||||
suite: &suite
|
||||
canonical: suite
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.4
|
||||
office: &bureau
|
||||
canonical: bureau
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.3
|
||||
door: &porte
|
||||
canonical: porte
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
room: &salle
|
||||
canonical: salle
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2
|
||||
chambre: &chambre
|
||||
canonical: chambre
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2
|
||||
boite: &boite
|
||||
canonical: boîte
|
||||
abbreviated: bte
|
||||
sample: true
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2
|
||||
lot: &lotissement
|
||||
canonical: lotissement
|
||||
abbreviated: lot
|
||||
sample: true
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.5
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2
|
||||
parcelle: &parcelle
|
||||
canonical: parcelle
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2
|
||||
|
||||
allotments:
|
||||
lot:
|
||||
default: *lotissement
|
||||
numeric_probability: 0.8
|
||||
alphanumeric_probability: 0.1
|
||||
alpha_probability: 0.1
|
||||
parcel:
|
||||
default: *parcelle
|
||||
numeric_probability: 0.3
|
||||
alphanumeric_probability: 0.3
|
||||
alpha_probability: 0.4
|
||||
lot_probability: 0.9
|
||||
parcel_probability: 0.06
|
||||
lot_plus_parcel_probability: 0.02
|
||||
parcel_plus_lot_probability: 0.02
|
||||
|
||||
alphanumeric: &unit_alphanumeric
|
||||
default: *appartement
|
||||
probability: 0.85
|
||||
alternatives:
|
||||
# e.g. just plain #3 or No. 4
|
||||
- alternative: *numero
|
||||
probability: 0.05
|
||||
- alternative: *porte
|
||||
probability: 0.095
|
||||
- alternative: *chambre
|
||||
probability: 0.005
|
||||
numeric_probability: 0.9 # e.g. Appartement 1
|
||||
numeric_plus_alpha_probability: 0.03 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.03 # e.g. A1
|
||||
alpha_probability: 0.04 # e.g. Appartement A
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
# Separate random probability for adding directions like 2D, 2G, etc.
|
||||
add_direction: true
|
||||
add_direction_probability: 0.1
|
||||
|
||||
# Add directions for plain numbers
|
||||
add_direction_numeric: true
|
||||
# Add direction only e.g. Unité Gauche
|
||||
add_direction_standalone: true
|
||||
|
||||
# If there are 10 floors, create unit numbers like #301 or #1032
|
||||
use_floor_probability: 0.1
|
||||
|
||||
zones:
|
||||
residential: *unit_alphanumeric
|
||||
commercial:
|
||||
default: *bureau
|
||||
probability: 0.8
|
||||
alternatives:
|
||||
- alternative: *suite
|
||||
probability: 0.2
|
||||
|
||||
numeric_probability: 0.9 # e.g. Bureau 1
|
||||
numeric_plus_alpha_probability: 0.01 # e.g. Bureau 1A
|
||||
alpha_plus_numeric_probability: 0.01 # e.g. Bureau A1
|
||||
alpha_probability: 0.08 # e.g. Bureau A
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
industrial:
|
||||
default: *lotissement
|
||||
probability: 0.5
|
||||
alternatives:
|
||||
- alternative: *bureau
|
||||
probability: 0.3
|
||||
- alternative: *unite
|
||||
probability: 0.19
|
||||
- alternative: *parcelle
|
||||
probability: 0.01
|
||||
|
||||
numeric_probability: 0.9 # e.g. Lot 1
|
||||
numeric_plus_alpha_probability: 0.01 # e.g. Lot 1A
|
||||
alpha_plus_numeric_probability: 0.01 # e.g. Lot A1
|
||||
alpha_probability: 0.08 # e.g. Lot A
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
university:
|
||||
default: *salle
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *porte
|
||||
probability: 0.1
|
||||
|
||||
numeric_probability: 0.9 # e.g. Salle 1
|
||||
numeric_plus_alpha_probability: 0.01 # e.g. Salle 1A
|
||||
alpha_plus_numeric_probability: 0.01 # e.g. Salle A1
|
||||
alpha_probability: 0.08 # e.g. Salle A
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
categories:
|
||||
near:
|
||||
default:
|
||||
canonical: près de
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: à coté de
|
||||
probability: 0.05
|
||||
- alternative:
|
||||
canonical: proche de
|
||||
probability: 0.05
|
||||
- alternative:
|
||||
canonical: proches de
|
||||
probability: 0.05
|
||||
- alternative:
|
||||
canonical: a cote de
|
||||
probability: 0.05
|
||||
- alternative:
|
||||
canonical: pres de
|
||||
probability: 0.05
|
||||
- alternative:
|
||||
canonical: aux environs de
|
||||
probability: 0.05
|
||||
- alternative:
|
||||
canonical: à proximité de
|
||||
probability: 0.05
|
||||
- alternative:
|
||||
canonical: a proximite de
|
||||
probability: 0.05
|
||||
nearby:
|
||||
default:
|
||||
canonical: proche
|
||||
probability: 0.4
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: à coté
|
||||
probability: 0.05
|
||||
- alternative:
|
||||
canonical: a cote
|
||||
probability: 0.05
|
||||
- alternative:
|
||||
canonical: près d'ici
|
||||
probability: 0.05
|
||||
- alternative:
|
||||
canonical: près dici
|
||||
probability: 0.05
|
||||
- alternative:
|
||||
canonical: pres d'ici
|
||||
probability: 0.05
|
||||
- alternative:
|
||||
canonical: pres dici
|
||||
probability: 0.05
|
||||
- alternative:
|
||||
canonical: près de là
|
||||
probability: 0.05
|
||||
- alternative:
|
||||
canonical: pres de la
|
||||
probability: 0.05
|
||||
- alternative:
|
||||
canonical: par ici
|
||||
probability: 0.05
|
||||
- alternative:
|
||||
canonical: dans les alentours
|
||||
probability: 0.05
|
||||
- alternative:
|
||||
canonical: à proximité de là
|
||||
probability: 0.05
|
||||
- alternative:
|
||||
canonical: a proximite de la
|
||||
probability: 0.05
|
||||
near_me:
|
||||
default:
|
||||
canonical: proche de chez moi
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: près de moi
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: pres de moi
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: à proximité de moi
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: a proximite de moi
|
||||
probability: 0.1
|
||||
in:
|
||||
default:
|
||||
canonical: à
|
||||
probability: 0.7
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: en
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: a
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: dans
|
||||
probability: 0.1
|
||||
|
||||
# Probabilities of each phrase
|
||||
near_probability: 0.35
|
||||
nearby_probability: 0.2
|
||||
near_me_probability: 0.1
|
||||
in_probability: 0.35
|
||||
|
||||
countries:
|
||||
# Belgium
|
||||
be:
|
||||
units:
|
||||
alphanumeric:
|
||||
default: *boite
|
||||
probability: 0.75
|
||||
alternatives:
|
||||
- alternative: *appartement
|
||||
probability: 0.1
|
||||
# e.g. just plain #3 or No. 4
|
||||
- alternative: *numero
|
||||
probability: 0.05
|
||||
- alternative: *porte
|
||||
probability: 0.095
|
||||
- alternative: *chambre
|
||||
probability: 0.005
|
||||
# Canada
|
||||
ca:
|
||||
components:
|
||||
|
||||
unit:
|
||||
null_probability: 0.6
|
||||
alphanumeric_probability: 0.4
|
||||
combinations:
|
||||
-
|
||||
components:
|
||||
- unit
|
||||
- house_number
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: /
|
||||
probability: 0.04
|
||||
- separator: "-"
|
||||
probability: 0.95
|
||||
- separator: " - "
|
||||
probability: 0.01
|
||||
probability: 0.1
|
||||
levels:
|
||||
numbering_starts_at: 1
|
||||
aliases:
|
||||
"1":
|
||||
# Have to do this because etage is numeric
|
||||
# and has keys like "numeric_probability" which
|
||||
# we don't want to infect rez_de_chaussee when doing
|
||||
# a recursive merge
|
||||
default: *etage
|
||||
probability: 0.1
|
||||
alternatives:
|
||||
- alternative: *rez_de_chaussee
|
||||
probability: 0.8
|
||||
- alternative: *bel_etage
|
||||
probability: 0.05
|
||||
- alternative: *etage_noble
|
||||
probability: 0.05
|
||||
|
||||
units:
|
||||
alphanumeric:
|
||||
# More common to use in Canada, as in the US
|
||||
use_floor_probability: 0.35
|
||||
|
||||
po_boxes:
|
||||
alphanumeric:
|
||||
default: *case_postal
|
||||
# Switzerland
|
||||
ch:
|
||||
levels:
|
||||
aliases:
|
||||
"0":
|
||||
default: *parterre
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *rez_de_chaussee
|
||||
probability: 0.05
|
||||
- alternative: *etage
|
||||
probability: 0.05
|
||||
po_boxes:
|
||||
alphanumeric:
|
||||
default: *case_postal
|
||||
269
resources/addresses/he.yaml
Normal file
@@ -0,0 +1,269 @@
|
||||
# he.yaml
|
||||
# -------
|
||||
# Hebrew language specification
|
||||
|
||||
|
||||
alphabet: אבגדהוזחטיכךלמםנןסעפףצץקרשת
|
||||
alphabet_probability: 0.8
|
||||
|
||||
components:
|
||||
level:
|
||||
null_probability: 0.95
|
||||
alphanumeric_probability: 0.05
|
||||
|
||||
entrance:
|
||||
null_probability: 0.9
|
||||
alphanumeric_probability: 0.1
|
||||
|
||||
unit:
|
||||
null_probability: 0.6
|
||||
alphanumeric_probability: 0.4
|
||||
|
||||
|
||||
combinations:
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- entrance
|
||||
- unit
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: "/"
|
||||
probability: 0.95
|
||||
- separator: "-"
|
||||
probability: 0.05
|
||||
probability: 0.7
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- entrance
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: " "
|
||||
probability: 0.5
|
||||
- separator: ""
|
||||
probability: 0.2
|
||||
- separator: "/"
|
||||
probability: 0.1
|
||||
- separator: "-"
|
||||
probability: 0.1
|
||||
- separator: " - "
|
||||
probability: 0.1
|
||||
probability: 0.7
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- unit
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: "/"
|
||||
probability: 0.95
|
||||
- separator: "-"
|
||||
probability: 0.05
|
||||
probability: 0.1
|
||||
|
||||
levels:
|
||||
koma: &koma
|
||||
canonical: קומה
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
ordinal:
|
||||
direction: left
|
||||
numeric_probability: 0.4
|
||||
ordinal_probability: 0.6
|
||||
koma_latin: &koma_latin
|
||||
canonical: koma
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
ordinal:
|
||||
direction: left
|
||||
numeric_probability: 0.4
|
||||
ordinal_probability: 0.6
|
||||
|
||||
komat_karka: &komat_karka
|
||||
canonical: קומת קרקע
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
komat_karka_latin: &komat_karka_latin
|
||||
canonical: komát karká
|
||||
sample: true
|
||||
canonical_probability: 0.6
|
||||
sample_probability: 0.4
|
||||
|
||||
martef: &martef
|
||||
canonical: מרתף
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
ordinal:
|
||||
direction: left
|
||||
number_abs_value: true
|
||||
number_min_abs_value: 1
|
||||
standalone_probability: 0.985
|
||||
numeric_probability: 0.01
|
||||
ordinal_probability: 0.005
|
||||
martef_latin: &martef_latin
|
||||
canonical: martef
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
ordinal:
|
||||
direction: left
|
||||
number_abs_value: true
|
||||
number_min_abs_value: 1
|
||||
standalone_probability: 0.985
|
||||
numeric_probability: 0.01
|
||||
ordinal_probability: 0.005
|
||||
|
||||
aliases:
|
||||
"<-1":
|
||||
default: *martef
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *martef_latin
|
||||
probability: 0.1
|
||||
"-1":
|
||||
default: *martef
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *martef_latin
|
||||
probability: 0.1
|
||||
"0":
|
||||
default: *komat_karka
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *komat_karka_latin
|
||||
probability: 0.1
|
||||
|
||||
numbering_starts_at: 0
|
||||
|
||||
alphanumeric:
|
||||
default: *koma
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *koma_latin
|
||||
probability: 0.1
|
||||
numeric_probability: 0.99 # With this probability, pick an integer
|
||||
alpha_probability: 0.0098 # With this probability, pick a letter e.g. A
|
||||
numeric_plus_alpha_probability: 0.0001 # e.g. 2A
|
||||
alpha_plus_numeric_probability: 0.0001 # e.g. A2
|
||||
|
||||
entrances:
|
||||
knisa: &knisa
|
||||
canonical: כניסה
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
knisa_latin: &knisa_latin
|
||||
canonical: knisa
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
# e.g. knisa 1, knisa A, etc.
|
||||
alphanumeric:
|
||||
default: *knisa
|
||||
probability: 0.99
|
||||
alternatives:
|
||||
- alternative: *knisa_latin
|
||||
probability: 0.01
|
||||
numeric_probability: 0.1
|
||||
alpha_probability: 0.9
|
||||
|
||||
po_boxes:
|
||||
ta_doar: &ta_doar
|
||||
canonical: תיבת דואר
|
||||
abbreviated: ת.ד.
|
||||
sample: true
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
ta_doar_latin: &ta_doar_latin
|
||||
canonical: abonementnyy pochtovyy yashchik
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
alphanumeric:
|
||||
default: *ta_doar
|
||||
probability: 0.8
|
||||
alternatives:
|
||||
- alternative: *ta_doar_latin
|
||||
probability: 0.2
|
||||
numeric_probability: 0.9 # ta doar 123
|
||||
alpha_probability: 0.05 # ta doar A
|
||||
numeric_plus_alpha_probability: 0.04 # ta doar 123A
|
||||
alpha_plus_numeric_probability: 0.01 # ta doar A123
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
digits:
|
||||
- length: 1
|
||||
probability: 0.05
|
||||
- length: 2
|
||||
probability: 0.1
|
||||
- length: 3
|
||||
probability: 0.2
|
||||
- length: 4
|
||||
probability: 0.5
|
||||
- length: 5
|
||||
probability: 0.1
|
||||
- length: 6
|
||||
probability: 0.05
|
||||
|
||||
units:
|
||||
dira: &dira
|
||||
canonical: דירה
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
dira_latin: &dira_latin
|
||||
canonical: dira
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
alphanumeric: &unit_alphanumeric
|
||||
default: *dira
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *dira_latin
|
||||
probability: 0.1
|
||||
|
||||
numeric_probability: 0.9 # e.g. dira 1
|
||||
numeric_plus_alpha_probability: 0.03 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.03 # e.g. A1
|
||||
alpha_probability: 0.04 # e.g. dira A
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
# If there are 10 floors, create unit numbers like #301 or #1032
|
||||
use_floor_probability: 0.1
|
||||
586
resources/addresses/hr.yaml
Normal file
@@ -0,0 +1,586 @@
|
||||
# hr.yaml
|
||||
# -------
|
||||
# Croatian language specification
|
||||
|
||||
components:
|
||||
level:
|
||||
null_probability: 0.9
|
||||
alphanumeric_probability: 0.1
|
||||
|
||||
staircase:
|
||||
null_probability: 0.99
|
||||
alphanumeric_probability: 0.01
|
||||
|
||||
entrance:
|
||||
null_probability: 0.999
|
||||
alphanumeric_probability: 0.001
|
||||
|
||||
unit:
|
||||
null_probability: 0.7
|
||||
alphanumeric_probability: 0.3
|
||||
|
||||
combinations:
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- staircase
|
||||
- level
|
||||
- unit
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: "/"
|
||||
probability: 0.95
|
||||
- separator: "-"
|
||||
probability: 0.05
|
||||
probability: 0.005
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- level
|
||||
- unit
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: "/"
|
||||
probability: 0.95
|
||||
- separator: "-"
|
||||
probability: 0.05
|
||||
probability: 0.005
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- level
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: "/"
|
||||
probability: 0.95
|
||||
- separator: "-"
|
||||
probability: 0.05
|
||||
probability: 0.1
|
||||
# For unit types like 2/34
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- unit
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: "/"
|
||||
probability: 0.95
|
||||
- separator: "-"
|
||||
probability: 0.05
|
||||
probability: 0.005
|
||||
|
||||
|
||||
numbers:
|
||||
no_number:
|
||||
default:
|
||||
canonical: bez broja
|
||||
abbreviated: bb
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.3
|
||||
|
||||
default: &broj
|
||||
canonical: broj
|
||||
abbreviated: br
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.6
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
numeric_affix:
|
||||
affix: "br."
|
||||
whitespace_probability: 0.6
|
||||
direction: left
|
||||
numeric_probability: 0.4
|
||||
numeric_affix_probability: 0.6
|
||||
|
||||
alphanumeric_phrase_probability: 0.05
|
||||
no_number_probability: 0.05
|
||||
|
||||
|
||||
and:
|
||||
default: &i
|
||||
canonical: i
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
|
||||
|
||||
cross_streets:
|
||||
i: *i
|
||||
at: &na
|
||||
canonical: na
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
corner: &ugao
|
||||
canonical: ugao
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
corner_of: &uglu
|
||||
canonical: uglu
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
na_uglu: &na_uglu
|
||||
canonical: na uglu
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
intersection:
|
||||
default: *i
|
||||
probability: 0.65
|
||||
alternatives:
|
||||
- alternative: *na
|
||||
probability: 0.1
|
||||
- alternative: *uglu
|
||||
probability: 0.1
|
||||
- alternative: *na_uglu
|
||||
probability: 0.1
|
||||
- alternative: *ugao
|
||||
probability: 0.05
|
||||
|
||||
izmedu: &izmedu
|
||||
canonical: između
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
parentheses_probability: 0.5
|
||||
between:
|
||||
default: *izmedu
|
||||
|
||||
levels:
|
||||
kat: &kat
|
||||
canonical: kat
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
direction_probability: 0.9
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.3
|
||||
roman_numeral_probability: 0.7
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
numeric_probability: 0.4
|
||||
ordinal_probability: 0.6
|
||||
etaza: &etaza
|
||||
canonical: etaža
|
||||
abbreviated: et
|
||||
sample: true
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
direction_probability: 0.9
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.3
|
||||
roman_numeral_probability: 0.7
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
numeric_probability: 0.4
|
||||
ordinal_probability: 0.6
|
||||
prizemlje: &prizemlje
|
||||
canonical: prizemlje
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
parter: &parter
|
||||
canonical: parter
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
mezanino: &polukat
|
||||
canonical: polukat
|
||||
half_floors: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
sample: true
|
||||
# e.g. polukat 2
|
||||
numeric:
|
||||
direction: left
|
||||
# e.g. 2. polukat
|
||||
ordinal:
|
||||
direction: right
|
||||
numeric_probability: 0.1
|
||||
ordinal_probability: 0.2
|
||||
standalone_probability: 0.6
|
||||
podrum: &podrum
|
||||
canonical: podrum
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
# e.g. podrum 1
|
||||
numeric:
|
||||
direction: left
|
||||
direction_probability: 0.8
|
||||
# e.g. 1. podrum
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
standalone_probability: 0.99
|
||||
number_abs_value: true
|
||||
number_min_abs_value: 1
|
||||
numeric_probability: 0.005
|
||||
ordinal_probability: 0.005
|
||||
|
||||
aliases:
|
||||
"<-1":
|
||||
default: *podrum
|
||||
"-1":
|
||||
default: *podrum
|
||||
# Special token for half-floors
|
||||
half_floors:
|
||||
default: *polukat
|
||||
"0":
|
||||
default: *prizemlje
|
||||
probability: 0.5
|
||||
alternatives:
|
||||
- alternative: *parter
|
||||
probability: 0.4
|
||||
- alternative: *kat
|
||||
probability: 0.1
|
||||
|
||||
numbering_starts_at: 0
|
||||
|
||||
alphanumeric:
|
||||
default: *kat
|
||||
probability: 0.95
|
||||
alternatives:
|
||||
- alternative: *etaza
|
||||
probability: 0.05
|
||||
numeric_probability: 0.69 # With this probability, pick an integer
|
||||
roman_numeral_probability: 0.3 # Pick a Roman numeral for the actual value
|
||||
alpha_probability: 0.0098 # With this probability, pick a letter e.g. A
|
||||
numeric_plus_alpha_probability: 0.0001 # e.g. 2A
|
||||
alpha_plus_numeric_probability: 0.0001 # e.g. A2
|
||||
|
||||
|
||||
categories:
|
||||
near:
|
||||
default:
|
||||
canonical: u blizini
|
||||
nearby:
|
||||
default:
|
||||
canonical: u blizini
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: u blizini ovdje
|
||||
probability: 0.3
|
||||
- alternative:
|
||||
canonical: oko ovdje
|
||||
probability: 0.1
|
||||
|
||||
near_me:
|
||||
default:
|
||||
canonical: u blizini mene
|
||||
|
||||
# Don't worry about agreement
|
||||
in:
|
||||
default:
|
||||
canonical: u
|
||||
|
||||
# Probabilities of each phrase
|
||||
near_probability: 0.35
|
||||
nearby_probability: 0.2
|
||||
near_me_probability: 0.1
|
||||
in_probability: 0.35
|
||||
|
||||
directions:
|
||||
right: &desno
|
||||
canonical: desno
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
left: &lijevo
|
||||
canonical: lijevo
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
alternatives:
|
||||
- alternative: *desno
|
||||
probability: 0.5
|
||||
- alternative: *lijevo
|
||||
probability: 0.5
|
||||
|
||||
cardinal_directions:
|
||||
east: &istok
|
||||
canonical: istok
|
||||
abbreviated: i
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: i
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
west: &zapad
|
||||
canonical: zapad
|
||||
abbreviated: z
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: z
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
north: &sjever
|
||||
canonical: sjever
|
||||
abbreviated: s
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: s
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
south: &jug
|
||||
canonical: jug
|
||||
abbreviated: j
|
||||
sample: true
|
||||
canonical_probability: 0.75
|
||||
abbreviated_probability: 0.1
|
||||
sample_probability: 0.15
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: j
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
alternatives:
|
||||
- alternative: *sjever
|
||||
probability: 0.25
|
||||
- alternative: *istok
|
||||
probability: 0.23
|
||||
- alternative: *jug
|
||||
probability: 0.23
|
||||
- alternative: *zapad
|
||||
probability: 0.23
|
||||
|
||||
entrances:
|
||||
ulaz: &ulaz
|
||||
canonical: ulaz
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
# Ulaz 1, Ulaz A, etc.
|
||||
alphanumeric: &entrance_alphanumeric
|
||||
default: *ulaz
|
||||
numeric_probability: 0.1 # e.g. Ulaz 1
|
||||
alpha_probability: 0.85 # e.g. Ulaz A
|
||||
numeric_plus_alpha_probability: 0.025 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.025 # e.g. A1
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
|
||||
staircases:
|
||||
stubiste: &stubiste
|
||||
canonical: stubište
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
|
||||
alphanumeric: &staircase_alphanumeric
|
||||
default: *stubiste
|
||||
numeric_probability: 0.75
|
||||
alpha_probability: 0.2
|
||||
numeric_plus_alpha_probability: 0.025
|
||||
alpha_plus_numeric_probability: 0.025
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
directional:
|
||||
direction: right
|
||||
direction_probability: 0.85
|
||||
modifier:
|
||||
alternatives:
|
||||
- alternative: *desno
|
||||
probability: 0.2
|
||||
- alternative: *lijevo
|
||||
probability: 0.2
|
||||
- alternative: *sjever
|
||||
probability: 0.15
|
||||
- alternative: *jug
|
||||
probability: 0.15
|
||||
- alternative: *istok
|
||||
probability: 0.15
|
||||
- alternative: *zapad
|
||||
probability: 0.15
|
||||
|
||||
po_boxes:
|
||||
postanski_pretinac: &postanski_pretinac
|
||||
canonical: poštanski pretinac
|
||||
abbreviated: p.p
|
||||
sample: true
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.4
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2
|
||||
|
||||
alphanumeric:
|
||||
default: *postanski_pretinac
|
||||
numeric_probability: 0.9 # pp 123
|
||||
alpha_probability: 0.05 # p.p A
|
||||
numeric_plus_alpha_probability: 0.04 # pp 123G
|
||||
alpha_plus_numeric_probability: 0.01 # pp A123
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
digits:
|
||||
- length: 1
|
||||
probability: 0.05
|
||||
- length: 2
|
||||
probability: 0.1
|
||||
- length: 3
|
||||
probability: 0.2
|
||||
- length: 4
|
||||
probability: 0.5
|
||||
- length: 5
|
||||
probability: 0.1
|
||||
- length: 6
|
||||
probability: 0.05
|
||||
|
||||
units:
|
||||
stan: &stan
|
||||
canonical: stan
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
apartman: &apartman
|
||||
canonical: apartman
|
||||
abbreviated: ap
|
||||
sample: true
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.2
|
||||
sample_probability: 0.4
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
|
||||
soba: &soba
|
||||
canonical: soba
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
ured: &ured
|
||||
canonical: ured
|
||||
sample: true
|
||||
canonical_probability: 0.6
|
||||
sample_probability: 0.4
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
|
||||
alphanumeric: &unit_alphanumeric
|
||||
default: *stan
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative: *apartman
|
||||
probability: 0.3
|
||||
- alternative: *soba
|
||||
probability: 0.1
|
||||
numeric_probability: 0.9 # e.g. stan 1
|
||||
numeric_plus_alpha_probability: 0.03 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.03 # e.g. A1
|
||||
alpha_probability: 0.04 # e.g. stan A
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
# If there are 10 floors, create unit numbers like #301 or #1032
|
||||
use_floor_probability: 0.05
|
||||
|
||||
zones:
|
||||
commercial: &commercial_unit_types
|
||||
default: *soba
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative: *ured
|
||||
probability: 0.4
|
||||
numeric_probability: 0.95 # e.g. soba 1
|
||||
numeric_plus_alpha_probability: 0.01 # e.g. soba 1A
|
||||
alpha_plus_numeric_probability: 0.01 # e.g. soba A1
|
||||
alpha_probability: 0.03 # e.g. soba A
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
university:
|
||||
default: *soba
|
||||
numeric_probability: 0.95 # e.g. soba 1
|
||||
numeric_plus_alpha_probability: 0.01 # e.g. soba 1A
|
||||
alpha_plus_numeric_probability: 0.01 # e.g. soba A1
|
||||
alpha_probability: 0.03 # e.g. soba A
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
439
resources/addresses/hu.yaml
Normal file
@@ -0,0 +1,439 @@
|
||||
# hu.yaml
|
||||
# -------
|
||||
# Hungarian language specification.
|
||||
|
||||
components:
|
||||
level:
|
||||
null_probability: 0.75
|
||||
alphanumeric_probability: 0.2
|
||||
standalone_probability: 0.05
|
||||
|
||||
unit:
|
||||
null_probability: 0.75
|
||||
alphanumeric_probability: 0.25
|
||||
|
||||
combinations:
|
||||
-
|
||||
components:
|
||||
- level
|
||||
- unit
|
||||
label: unit
|
||||
separators:
|
||||
- separator: "/"
|
||||
probability: 0.55
|
||||
- separator: " "
|
||||
probability: 0.4
|
||||
- separator: "-"
|
||||
probability: 0.05
|
||||
probability: 0.8
|
||||
|
||||
|
||||
numbers:
|
||||
default: &szam
|
||||
canonical: szám
|
||||
sample: true
|
||||
# Probabilities
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
numeric_probability: 0.4
|
||||
numeric_affix_probability: 0.6
|
||||
|
||||
and:
|
||||
default: &es
|
||||
canonical: és
|
||||
abbreviated: "&"
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.75
|
||||
sample: true
|
||||
sample_probability: 0.05
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative: &es_a
|
||||
canonical: és a
|
||||
canonical_probability: 0.9
|
||||
sample: true
|
||||
sample_probability: 0.1
|
||||
probability: 0.2
|
||||
- alternative: &es_az
|
||||
canonical: és az
|
||||
canonical_probability: 0.9
|
||||
sample: true
|
||||
sample_probability: 0.1
|
||||
probability: 0.2
|
||||
|
||||
cross_streets:
|
||||
and: *es
|
||||
corner_of: &sarkan
|
||||
canonical: sarkán
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
intersection:
|
||||
default: *es
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative: *es_a
|
||||
probability: 0.1
|
||||
- alternative: *es_az
|
||||
probability: 0.1
|
||||
- alternative: *sarkan
|
||||
probability: 0.2
|
||||
|
||||
between:
|
||||
canonical: között
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
parentheses_probability: 0.5
|
||||
|
||||
levels:
|
||||
floor: &emelet
|
||||
canonical: emelet
|
||||
abbreviated: em
|
||||
sample: true
|
||||
canonical_probability: 0.1
|
||||
abbreviated_probability: 0.85
|
||||
sample_probability: 0.05
|
||||
numeric:
|
||||
direction: left
|
||||
direction_probability: 0.9
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.2
|
||||
roman_numeral_probability: 0.8
|
||||
numeric_probability: 0.1
|
||||
ordinal_probability: 0.9
|
||||
foldszint: &foldszint
|
||||
canonical: földszint
|
||||
abbreviated: fszt
|
||||
sample: true
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.6
|
||||
sample_probability: 0.2
|
||||
felemelet: &felemelet
|
||||
canonical: félemelet
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
magasfoldszint: &magasfoldszint
|
||||
canonical: magasföldszint
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
pince: &pince
|
||||
canonical: pince
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
ordinal:
|
||||
direction: right
|
||||
standalone_probability: 0.99
|
||||
number_abs_value: true
|
||||
number_min_abs_value: 1
|
||||
numeric_probability: 0.005
|
||||
ordinal_probability: 0.005
|
||||
alagsor: &alagsor
|
||||
canonical: alagsor
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
ordinal:
|
||||
direction: right
|
||||
standalone_probability: 0.99
|
||||
number_abs_value: true
|
||||
number_min_abs_value: 1
|
||||
numeric_probability: 0.005
|
||||
ordinal_probability: 0.005
|
||||
felszuteren: &felszuteren
|
||||
canonical: félszuterén
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
ordinal:
|
||||
direction: right
|
||||
standalone_probability: 0.99
|
||||
number_abs_value: true
|
||||
number_min_abs_value: 1
|
||||
numeric_probability: 0.005
|
||||
ordinal_probability: 0.005
|
||||
szuteren: &szuteren
|
||||
canonical: szuterén
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
ordinal:
|
||||
direction: right
|
||||
standalone_probability: 0.99
|
||||
number_abs_value: true
|
||||
number_min_abs_value: 1
|
||||
numeric_probability: 0.005
|
||||
ordinal_probability: 0.005
|
||||
aliases:
|
||||
"<-1":
|
||||
default: *alagsor
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative: *pince
|
||||
probability: 0.3
|
||||
- alternative: *szuteren
|
||||
probability: 0.1
|
||||
"-1":
|
||||
default: *alagsor
|
||||
probability: 0.5
|
||||
alternatives:
|
||||
- alternative: *pince
|
||||
probability: 0.3
|
||||
- alternative: *szuteren
|
||||
probability: 0.1
|
||||
- alternative: *felszuteren
|
||||
probability: 0.1
|
||||
|
||||
"0":
|
||||
default: *foldszint
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *emelet
|
||||
probability: 0.1
|
||||
|
||||
"1":
|
||||
default: *emelet
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *felemelet
|
||||
probability: 0.1
|
||||
|
||||
"2":
|
||||
default: *emelet
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *magasfoldszint
|
||||
probability: 0.1
|
||||
|
||||
numbering_starts_at: 0
|
||||
|
||||
alphanumeric:
|
||||
default: *emelet
|
||||
numeric_probability: 0.59 # With this probability, pick an integer
|
||||
roman_numeral_probability: 0.4 # Pick a Roman numeral for the actual value
|
||||
alpha_probability: 0.0098 # With this probability, pick a letter e.g. A
|
||||
numeric_plus_alpha_probability: 0.0001 # e.g. 2A
|
||||
alpha_plus_numeric_probability: 0.0001 # e.g. A2
|
||||
|
||||
categories:
|
||||
near:
|
||||
default:
|
||||
canonical: közelében
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
nearby:
|
||||
default:
|
||||
canonical: közelben
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
|
||||
near_me:
|
||||
default:
|
||||
canonical: közelemben
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
|
||||
# Probabilities of each phrase
|
||||
near_probability: 0.7
|
||||
nearby_probability: 0.2
|
||||
near_me_probability: 0.1
|
||||
|
||||
directions:
|
||||
right: &jobb
|
||||
canonical: jobb
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
left: &bal
|
||||
canonical: bal
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
alternatives:
|
||||
- alternative: *jobb
|
||||
probability: 0.5
|
||||
- alternative: *bal
|
||||
probability: 0.5
|
||||
|
||||
cardinal_directions:
|
||||
east: &kelet
|
||||
canonical: kelet
|
||||
abbreviated: k
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: k
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
west: &nyugat
|
||||
canonical: nyugat
|
||||
abbreviated: n
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: n
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
north: &eszak
|
||||
canonical: észak
|
||||
abbreviated: e
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: e
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
south: &del
|
||||
canonical: dél
|
||||
abbreviated: d
|
||||
sample: true
|
||||
canonical_probability: 0.75
|
||||
abbreviated_probability: 0.1
|
||||
sample_probability: 0.15
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: d
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
alternatives:
|
||||
- alternative: *eszak
|
||||
probability: 0.25
|
||||
- alternative: *kelet
|
||||
probability: 0.25
|
||||
- alternative: *del
|
||||
probability: 0.25
|
||||
- alternative: *nyugat
|
||||
probability: 0.25
|
||||
|
||||
|
||||
po_boxes:
|
||||
postafiok: &postafiok
|
||||
canonical: postafiók
|
||||
abbreviated: pf
|
||||
sample: true
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.7
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
alphanumeric:
|
||||
default: *postafiok
|
||||
numeric_probability: 0.9 # Pf 123
|
||||
alpha_probability: 0.05 # Pf A
|
||||
numeric_plus_alpha_probability: 0.04 # Pf 123G
|
||||
alpha_plus_numeric_probability: 0.01 # Pf A123
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
digits:
|
||||
- length: 1
|
||||
probability: 0.05
|
||||
- length: 2
|
||||
probability: 0.1
|
||||
- length: 3
|
||||
probability: 0.2
|
||||
- length: 4
|
||||
probability: 0.5
|
||||
- length: 5
|
||||
probability: 0.1
|
||||
- length: 6
|
||||
probability: 0.05
|
||||
|
||||
units:
|
||||
lakas: &lakas
|
||||
canonical: lakás
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
ordinal:
|
||||
direction: right
|
||||
numeric_probability: 0.3
|
||||
ordinal_probability: 0.7
|
||||
iroda: &iroda
|
||||
canonical: iroda
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
szoba: &szoba
|
||||
canonical: szoba
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
alphanumeric: &unit_alphanumeric
|
||||
default: *lakas
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *szoba
|
||||
probability: 0.1
|
||||
numeric_probability: 0.95 # e.g. lakás 1
|
||||
numeric_plus_alpha_probability: 0.005 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.005 # e.g. A1
|
||||
alpha_probability: 0.04 # e.g. lakás A
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
# If there are 10 floors, create unit numbers like #301 or #1032
|
||||
use_floor_probability: 0.2
|
||||
|
||||
zones:
|
||||
commercial: &commercial_unit_types
|
||||
default: *iroda
|
||||
numeric_probability: 0.95 # e.g. iroda 1
|
||||
numeric_plus_alpha_probability: 0.01 # e.g. iroda 1A
|
||||
alpha_plus_numeric_probability: 0.01 # e.g. iroda A1
|
||||
alpha_probability: 0.03 # e.g. iroda A
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
university: *commercial_unit_types
|
||||
459
resources/addresses/is.yaml
Normal file
@@ -0,0 +1,459 @@
|
||||
# is.yaml
|
||||
# -------
|
||||
# Icelandic language specification.
|
||||
|
||||
components:
|
||||
level:
|
||||
null_probability: 0.9
|
||||
alphanumeric_probability: 0.1
|
||||
|
||||
staircase:
|
||||
null_probability: 0.99
|
||||
alphanumeric_probability: 0.01
|
||||
|
||||
entrance:
|
||||
null_probability: 0.999
|
||||
alphanumeric_probability: 0.001
|
||||
|
||||
unit:
|
||||
null_probability: 0.75
|
||||
alphanumeric_probability: 0.25
|
||||
|
||||
combinations:
|
||||
-
|
||||
components:
|
||||
- level
|
||||
- unit
|
||||
label: unit
|
||||
separators:
|
||||
- separator: "-"
|
||||
probability: 0.9
|
||||
- separator: " - "
|
||||
probability: 0.1
|
||||
probability: 0.005
|
||||
-
|
||||
components:
|
||||
- entrance
|
||||
- unit
|
||||
label: unit
|
||||
separators:
|
||||
- separator: "-"
|
||||
probability: 0.9
|
||||
- separator: " - "
|
||||
probability: 0.1
|
||||
probability: 0.001
|
||||
|
||||
|
||||
numbers:
|
||||
default: &numer
|
||||
canonical: númer
|
||||
abbreviated: nr
|
||||
sample: true
|
||||
# Probabilities
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.5
|
||||
sample_probability: 0.2
|
||||
sample_exclude:
|
||||
- "#"
|
||||
numeric:
|
||||
direction: left
|
||||
numeric_affix:
|
||||
affix: "#"
|
||||
direction: left
|
||||
|
||||
numeric_probability: 0.4
|
||||
numeric_affix_probability: 0.6
|
||||
|
||||
|
||||
house_numbers:
|
||||
alphanumeric:
|
||||
default: *numer
|
||||
|
||||
alphanumeric_phrase_probability: 0.0001
|
||||
|
||||
|
||||
and:
|
||||
default: &og
|
||||
canonical: og
|
||||
abbreviated: "&"
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.75
|
||||
sample: true
|
||||
sample_probability: 0.05
|
||||
|
||||
cross_streets:
|
||||
and: *og
|
||||
corner_of: &horn_of
|
||||
canonical: horn af
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
at_the_corner_of: &a_horinu_a
|
||||
canonical: á horninu á
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
intersection:
|
||||
default: *og
|
||||
probability: 0.7
|
||||
alternatives:
|
||||
- alternative: *horn_of
|
||||
probability: 0.15
|
||||
- alternative: *a_horinu_a
|
||||
probability: 0.15
|
||||
|
||||
between:
|
||||
canonical: milli
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
parentheses_probability: 0.5
|
||||
|
||||
levels:
|
||||
floor: &haeo
|
||||
canonical: hæð
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: right
|
||||
direction_probability: 0.9
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.8
|
||||
spellout_probability: 0.2
|
||||
numeric_probability: 0.4
|
||||
ordinal_probability: 0.6
|
||||
|
||||
jarohaeo: &jarohaeo
|
||||
canonical: jarðhæð
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
sample_probability: 0.7
|
||||
kjallara: &kjallara
|
||||
canonical: kjallara
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
# e.g. 1 kjallara
|
||||
numeric:
|
||||
direction: right
|
||||
direction_probability: 0.8
|
||||
# e.g. k1
|
||||
numeric_affix:
|
||||
affix: k
|
||||
direction: left
|
||||
# e.g. 1. kjallara
|
||||
ordinal:
|
||||
direction: right
|
||||
standalone_probability: 0.985
|
||||
number_abs_value: true
|
||||
number_min_abs_value: 1
|
||||
numeric_probability: 0.005
|
||||
numeric_affix_probability: 0.005
|
||||
ordinal_probability: 0.005
|
||||
aliases:
|
||||
"<-1":
|
||||
default: *kjallara
|
||||
"-1":
|
||||
default: *kjallara
|
||||
"0":
|
||||
default: *jarohaeo
|
||||
|
||||
numbering_starts_at: 0
|
||||
|
||||
alphanumeric:
|
||||
default: *haeo
|
||||
numeric_probability: 0.99 # With this probability, pick an integer
|
||||
alpha_probability: 0.0098 # With this probability, pick a letter e.g. A
|
||||
numeric_plus_alpha_probability: 0.0001 # e.g. 2A
|
||||
alpha_plus_numeric_probability: 0.0001 # e.g. A2
|
||||
|
||||
categories:
|
||||
near:
|
||||
default:
|
||||
canonical: nálægt
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
|
||||
nearby:
|
||||
default:
|
||||
canonical: nálægt
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: nálægt hér
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.2
|
||||
- alternative:
|
||||
canonical: hérna
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: hér
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.1
|
||||
near_me:
|
||||
default:
|
||||
canonical: nálægt mér
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
|
||||
# Don't worry about agreement
|
||||
in:
|
||||
default:
|
||||
canonical: í
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
|
||||
# Probabilities of each phrase
|
||||
near_probability: 0.35
|
||||
nearby_probability: 0.2
|
||||
near_me_probability: 0.1
|
||||
in_probability: 0.35
|
||||
|
||||
|
||||
|
||||
directions:
|
||||
right: &til_haegri
|
||||
canonical: til hægri
|
||||
abbreviated: t.h
|
||||
sample: true
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.5
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: t.h
|
||||
direction: right
|
||||
whitespace_probability: 0.1
|
||||
numeric_probability: 0.8
|
||||
numeric_affix_probability: 0.2
|
||||
left: &til_vinstri
|
||||
canonical: til vinstri
|
||||
abbreviated: t.v
|
||||
sample: true
|
||||
canonical_probability: 0.1
|
||||
abbreviated_probability: 0.6
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: t.v
|
||||
direction: right
|
||||
whitespace_probability: 0.1
|
||||
numeric_probability: 0.8
|
||||
numeric_affix_probability: 0.2
|
||||
alternatives:
|
||||
- alternative: *til_haegri
|
||||
probability: 0.5
|
||||
- alternative: *til_vinstri
|
||||
probability: 0.5
|
||||
|
||||
cardinal_directions:
|
||||
east: &austur
|
||||
canonical: austur
|
||||
abbreviated: a
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: a
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
west: &vestur
|
||||
canonical: vestur
|
||||
abbreviated: v
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: v
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
north: &norour
|
||||
canonical: norður
|
||||
abbreviated: n
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: n
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
south: &suour
|
||||
canonical: suður
|
||||
abbreviated: s
|
||||
sample: true
|
||||
canonical_probability: 0.75
|
||||
abbreviated_probability: 0.1
|
||||
sample_probability: 0.15
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: s
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
alternatives:
|
||||
- alternative: *norour
|
||||
probability: 0.25
|
||||
- alternative: *austur
|
||||
probability: 0.25
|
||||
- alternative: *suour
|
||||
probability: 0.25
|
||||
- alternative: *vestur
|
||||
probability: 0.25
|
||||
|
||||
|
||||
entrances:
|
||||
inngangur: &inngangur
|
||||
canonical: inngangur
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
# Inngangur 1, Inngangur A, etc.
|
||||
alphanumeric: &entrance_alphanumeric
|
||||
default: *inngangur
|
||||
numeric_probability: 0.1 # e.g. Inngangur 1
|
||||
alpha_probability: 0.85 # e.g. Inngangur A
|
||||
numeric_plus_alpha_probability: 0.025 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.025 # e.g. A1
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
staircases:
|
||||
stiege: &stigi
|
||||
canonical: stigi
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
alphanumeric: &staircase_alphanumeric
|
||||
default: *stigi
|
||||
numeric_probability: 0.75
|
||||
alpha_probability: 0.2
|
||||
numeric_plus_alpha_probability: 0.025
|
||||
alpha_plus_numeric_probability: 0.025
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
directional:
|
||||
direction: left
|
||||
direction_probability: 0.85
|
||||
modifier:
|
||||
alternatives:
|
||||
- alternative: *norour
|
||||
- alternative: *suour
|
||||
- alternative: *austur
|
||||
- alternative: *vestur
|
||||
|
||||
po_boxes:
|
||||
postholf: &postholf
|
||||
canonical: pósthólf
|
||||
abbreviated: ph
|
||||
sample: true
|
||||
canonical_probability: 0.6
|
||||
abbreviated_probability: 0.2
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2 # Ph Nr 1234
|
||||
alphanumeric:
|
||||
sample: false
|
||||
default: *postholf
|
||||
numeric_probability: 0.9 # Ph 123
|
||||
alpha_probability: 0.05 # Ph A
|
||||
numeric_plus_alpha_probability: 0.04 # Ph 123G
|
||||
alpha_plus_numeric_probability: 0.01 # Ph A123
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
digits:
|
||||
- length: 1
|
||||
probability: 0.05
|
||||
- length: 2
|
||||
probability: 0.1
|
||||
- length: 3
|
||||
probability: 0.2
|
||||
- length: 4
|
||||
probability: 0.5
|
||||
- length: 5
|
||||
probability: 0.1
|
||||
- length: 6
|
||||
probability: 0.05
|
||||
|
||||
units:
|
||||
ibuo: &ibuo
|
||||
canonical: íbúð
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
null_phrase_probability: 0.5
|
||||
# íbúð númer 4
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.05
|
||||
|
||||
alphanumeric: &unit_alphanumeric
|
||||
default: *ibuo
|
||||
numeric_probability: 0.9 # e.g. íbúð 1
|
||||
numeric_plus_alpha_probability: 0.03 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.03 # e.g. A1
|
||||
alpha_probability: 0.04 # e.g. íbúð A
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
# Separate random probability for adding directions like 2R, 2L, etc.
|
||||
add_direction: true
|
||||
add_direction_probability: 0.1
|
||||
|
||||
# Add directions for plain numbers
|
||||
add_direction_numeric: true
|
||||
|
||||
# If there are 10 floors, create unit numbers like #301 or #1032
|
||||
use_floor_probability: 0.1
|
||||
673
resources/addresses/it.yaml
Normal file
@@ -0,0 +1,673 @@
|
||||
# it.yaml
|
||||
# -------
|
||||
# Italian language specification
|
||||
|
||||
components:
|
||||
level:
|
||||
# If no floor number is specified
|
||||
null_probability: 0.9
|
||||
alphanumeric_probability: 0.1
|
||||
|
||||
staircase:
|
||||
null_probability: 0.99
|
||||
alphanumeric_probability: 0.01
|
||||
|
||||
entrance:
|
||||
null_probability: 0.999
|
||||
alphanumeric_probability: 0.001
|
||||
|
||||
unit:
|
||||
# If no unit number is specified
|
||||
null_probability: 0.8
|
||||
alphanumeric_probability: 0.2
|
||||
|
||||
combinations:
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- unit
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: /
|
||||
probability: 1.0
|
||||
probability: 0.5
|
||||
|
||||
numbers:
|
||||
default: &numero
|
||||
canonical: numero
|
||||
abbreviated: "nº"
|
||||
sample: true
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.3
|
||||
sample_probability: 0.5
|
||||
numeric:
|
||||
direction: left
|
||||
numeric_affix:
|
||||
affix: "n."
|
||||
direction: left
|
||||
# Probabilities for numbers
|
||||
numeric_probability: 0.7
|
||||
numeric_affix_probability: 0.3
|
||||
|
||||
and:
|
||||
default: &e
|
||||
canonical: e
|
||||
abbreviated: "&"
|
||||
canonical_probability: 0.7
|
||||
abbreviated_probability: 0.25
|
||||
sample: true
|
||||
sample_probability: 0.05
|
||||
|
||||
house_numbers:
|
||||
# senza numero civico (snc) addresses
|
||||
no_number:
|
||||
canonical: senza numero civico
|
||||
abbreviated: snc
|
||||
sample: true
|
||||
canonical_probability: 0.1
|
||||
abbreviated_probability: 0.7
|
||||
sample_probability: 0.2
|
||||
|
||||
alphanumeric:
|
||||
default: *numero
|
||||
|
||||
alphanumeric_phrase_probability: 0.01
|
||||
no_number_probability: 0.05 # With this probability, use "senza numero civico" if no house_number is specified
|
||||
|
||||
levels:
|
||||
floor: &piano
|
||||
canonical: piano
|
||||
abbreviated: pº
|
||||
sample: true
|
||||
canonical_probability: 0.6
|
||||
abbreviated_probability: 0.15
|
||||
sample_probability: 0.25
|
||||
numeric:
|
||||
direction: left
|
||||
direction_probability: 0.95
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.05
|
||||
digits:
|
||||
ascii_probability: 0.9
|
||||
roman_numeral_probability: 0.1
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.5
|
||||
spellout_probability: 0.2
|
||||
roman_numeral_probability: 0.3
|
||||
numeric_probability: 0.55
|
||||
ordinal_probability: 0.45
|
||||
livello: &livello
|
||||
canonical: livello
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.05
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
numeric_probability: 0.75
|
||||
ordinal_probability: 0.25
|
||||
piano_nobile: &piano_nobile
|
||||
canonical: piano nobile
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
piano_terra: &piano_terra
|
||||
canonical: piano terra
|
||||
abbreviated: p.t
|
||||
sample: true
|
||||
canonical_probability: 0.5
|
||||
abbreviated_probability: 0.25
|
||||
sample_probability: 0.25
|
||||
basement: &seminterrato
|
||||
canonical: seminterrato
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
ordinal:
|
||||
direction: right
|
||||
number_abs_value: true
|
||||
number_min_abs_value: 1
|
||||
standalone_probability: 0.99
|
||||
numeric_probability: 0.005
|
||||
ordinal_probability: 0.005
|
||||
aliases:
|
||||
"<-1":
|
||||
default: *seminterrato
|
||||
probability: 0.995
|
||||
alternatives:
|
||||
- alternative: *piano
|
||||
probability: 0.005
|
||||
"-1":
|
||||
default: *seminterrato
|
||||
probability: 0.9995
|
||||
alternatives:
|
||||
- alternative: *piano
|
||||
probability: 0.0005
|
||||
"0":
|
||||
default: *piano_terra
|
||||
probability: 0.95
|
||||
alternatives:
|
||||
- alternative: *piano
|
||||
probability: 0.05
|
||||
"1":
|
||||
default: *piano
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *piano_nobile
|
||||
probability: 0.1
|
||||
|
||||
alphanumeric:
|
||||
default: *piano
|
||||
probability: 0.95
|
||||
alternatives:
|
||||
- alternative: *livello
|
||||
probability: 0.05
|
||||
numeric_probability: 0.99
|
||||
alpha_probability: 0.01
|
||||
|
||||
numbering_starts_at: 0
|
||||
|
||||
cross_streets:
|
||||
# 26th & 6th Avenue
|
||||
and: *e
|
||||
# 26th @ Broadway
|
||||
a: &a
|
||||
canonical: a
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
|
||||
corner_of: &angolo_di
|
||||
canonical: angolo di
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
|
||||
corner: &angolo
|
||||
canonical: angolo
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
|
||||
at_the_corner_of: &all_angolo_tra
|
||||
canonical: all'angolo tra
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
|
||||
intersection:
|
||||
default: *e
|
||||
probability: 0.7
|
||||
alternatives:
|
||||
- alternative: *a
|
||||
probability: 0.05
|
||||
- alternative: *angolo_di
|
||||
probability: 0.15
|
||||
- alternative: *all_angolo_tra
|
||||
probability: 0.1
|
||||
|
||||
# 26th betw 5th Ave and 6th Ave
|
||||
between:
|
||||
canonical: tra
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
parentheses_probability: 0.5 # Probability of using parentheses e.g. (between 5th and 6th)
|
||||
|
||||
categories:
|
||||
near:
|
||||
default:
|
||||
canonical: vicino a
|
||||
probability: 0.75
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: presso a
|
||||
probability: 0.25
|
||||
nearby:
|
||||
default:
|
||||
canonical: vicino
|
||||
probability: 0.7
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: qui vicino
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: nelle vicinanze
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: intorno a qui
|
||||
probability: 0.1
|
||||
|
||||
near_me:
|
||||
default:
|
||||
canonical: vicino a me
|
||||
|
||||
# Don't worry about agreement
|
||||
in:
|
||||
default:
|
||||
canonical: a
|
||||
probability: 0.7
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: ad
|
||||
probability: 0.15
|
||||
- alternative:
|
||||
canonical: in
|
||||
probability: 0.15
|
||||
|
||||
# Probabilities of each phrase
|
||||
near_probability: 0.35
|
||||
nearby_probability: 0.2
|
||||
near_me_probability: 0.1
|
||||
in_probability: 0.35
|
||||
|
||||
|
||||
directions:
|
||||
right: &destra
|
||||
canonical: destra
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: right
|
||||
left: &sinistra
|
||||
canonical: sinistra
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: right
|
||||
rear: &posteriore
|
||||
canonical: posteriore
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
front: &anteriore
|
||||
canonical: anteriore
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
alternatives:
|
||||
- alternative: *destra
|
||||
probability: 0.49
|
||||
- alternative: *sinistra
|
||||
probability: 0.49
|
||||
- alternative: *posteriore
|
||||
probability: 0.01
|
||||
- alternative: *anteriore
|
||||
probability: 0.01
|
||||
|
||||
anteroposterior:
|
||||
alternatives:
|
||||
- alternative: *anteriore
|
||||
probability: 0.5
|
||||
- alternative: *posteriore
|
||||
probability: 0.5
|
||||
|
||||
lateral:
|
||||
alternatives:
|
||||
- alternative: *destra
|
||||
probability: 0.5
|
||||
- alternative: *sinistra
|
||||
probability: 0.5
|
||||
|
||||
cardinal_directions:
|
||||
east: &est
|
||||
canonical: est
|
||||
abbreviated: e
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.6
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: e
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
west: &ovest
|
||||
canonical: ovest
|
||||
abbreviated: o
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.6
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: o
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
north: &nord
|
||||
canonical: nord
|
||||
abbreviated: n
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.6
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: n
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
south: &sud
|
||||
canonical: sud
|
||||
abbreviated: s
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.6
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: s
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
alternatives:
|
||||
- alternative: *nord
|
||||
probability: 0.25
|
||||
- alternative: *est
|
||||
probability: 0.25
|
||||
- alternative: *sud
|
||||
probability: 0.25
|
||||
- alternative: *ovest
|
||||
probability: 0.25
|
||||
|
||||
entrances:
|
||||
entrance: &ingresso
|
||||
canonical: ingresso
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
# Ingresso 1, Ingresso A, etc.
|
||||
alphanumeric: &entrance_alphanumeric
|
||||
default: *ingresso
|
||||
numeric_probability: 0.1 # e.g. Ingresso 1
|
||||
alpha_probability: 0.85 # e.g. Ingresso A
|
||||
numeric_plus_alpha_probability: 0.025 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.025 # e.g. A1
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
directional:
|
||||
modifier:
|
||||
direction: right # e.g. Ingresso Nord
|
||||
direction_probability: 0.95
|
||||
alternatives:
|
||||
- alternative: *nord
|
||||
- alternative: *sud
|
||||
- alternative: *est
|
||||
- alternative: *ovest
|
||||
- alternative: *destra
|
||||
- alternative: *sinistra
|
||||
- alternative: *posteriore
|
||||
- alternative: *anteriore
|
||||
|
||||
staircases:
|
||||
scala: &scala
|
||||
canonical: scala
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
alphanumeric:
|
||||
# For alphanumerics, Scala A, Scala 1, etc.
|
||||
default: *scala
|
||||
numeric_probability: 0.6 # e.g. Scala 1
|
||||
alpha_probability: 0.35 # e.g. Scala A
|
||||
numeric_plus_alpha_probability: 0.025 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.025 # e.g. A1
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
directional:
|
||||
direction: right # e.g. Scala Destra
|
||||
direction_probability: 0.9
|
||||
modifier:
|
||||
alternatives:
|
||||
- alternative: *nord
|
||||
- alternative: *sud
|
||||
- alternative: *est
|
||||
- alternative: *ovest
|
||||
- alternative: *destra
|
||||
- alternative: *sinistra
|
||||
- alternative: *posteriore
|
||||
- alternative: *anteriore
|
||||
|
||||
|
||||
po_boxes:
|
||||
casella_postale: &casella_postale
|
||||
canonical: casella postale
|
||||
abbreviated: cp
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.5
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2 # CP No 1234
|
||||
numeric_probability: 1.0
|
||||
alphanumeric:
|
||||
default: *casella_postale
|
||||
numeric_probability: 0.9 # CP 123
|
||||
alpha_probability: 0.05 # CP A
|
||||
numeric_plus_alpha_probability: 0.04 # CP 123G
|
||||
alpha_plus_numeric_probability: 0.01 # CP A123
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
digits:
|
||||
- length: 1
|
||||
probability: 0.05
|
||||
- length: 2
|
||||
probability: 0.1
|
||||
- length: 3
|
||||
probability: 0.2
|
||||
- length: 4
|
||||
probability: 0.5
|
||||
- length: 5
|
||||
probability: 0.1
|
||||
- length: 6
|
||||
probability: 0.05
|
||||
|
||||
|
||||
units:
|
||||
flat: &appartamento
|
||||
canonical: appartamento
|
||||
abbreviated: app
|
||||
sample: true
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.5
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2
|
||||
casa: &casa
|
||||
canonical: casa
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2
|
||||
unit: &unita
|
||||
canonical: unità
|
||||
abbreviated: u
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
abbreviated_probability: 0.1
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2
|
||||
office: &officina
|
||||
canonical: officina
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.3
|
||||
lotto: &lotto
|
||||
canonical: lotto
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
door: &porta
|
||||
canonical: porta
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
interno: &interno
|
||||
canonical: interno
|
||||
abbreviated: int
|
||||
sample: true
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
room: &sala
|
||||
canonical: sala
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2
|
||||
|
||||
alphanumeric: &unit_alphanumeric
|
||||
default: *appartamento
|
||||
probability: 0.75
|
||||
alternatives:
|
||||
- alternative: *interno
|
||||
probability: 0.1
|
||||
# e.g. just plain #3 or No. 4
|
||||
- alternative: *numero
|
||||
probability: 0.05
|
||||
- alternative: *casa
|
||||
probability: 0.05
|
||||
- alternative: *porta
|
||||
probability: 0.045
|
||||
- alternative: *sala
|
||||
probability: 0.005
|
||||
numeric_probability: 0.9 # e.g. Appartamento 1
|
||||
numeric_plus_alpha_probability: 0.03 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.03 # e.g. A1
|
||||
alpha_probability: 0.04 # e.g. Appartamento A
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
# Separate random probability for adding directions like 2D, 2S, etc.
|
||||
add_direction: true
|
||||
add_direction_probability: 0.1
|
||||
|
||||
# Add directions for plain numbers
|
||||
add_direction_numeric: true
|
||||
# Add direction only e.g. Unità Sinistra
|
||||
add_direction_standalone: true
|
||||
|
||||
# If there are 10 floors, create unit numbers like #301 or #1032
|
||||
use_floor_probability: 0.1
|
||||
|
||||
zones:
|
||||
residential: *unit_alphanumeric
|
||||
commercial:
|
||||
default: *officina
|
||||
probability: 0.8
|
||||
alternatives:
|
||||
- alternative: *sala
|
||||
probability: 0.2
|
||||
|
||||
numeric_probability: 0.9 # e.g. Officina 1
|
||||
numeric_plus_alpha_probability: 0.01 # e.g. Officina 1A
|
||||
alpha_plus_numeric_probability: 0.01 # e.g. Officina A1
|
||||
alpha_probability: 0.08 # e.g. Officina A
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
industrial:
|
||||
default: *lotto
|
||||
probability: 0.5
|
||||
alternatives:
|
||||
- alternative: *officina
|
||||
probability: 0.3
|
||||
- alternative: *unita
|
||||
probability: 0.2
|
||||
|
||||
numeric_probability: 0.9 # e.g. Lotto 1
|
||||
numeric_plus_alpha_probability: 0.01 # e.g. Lotto 1A
|
||||
alpha_plus_numeric_probability: 0.01 # e.g. Lotto A1
|
||||
alpha_probability: 0.08 # e.g. Lotto A
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
university:
|
||||
default: *sala
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *porta
|
||||
probability: 0.1
|
||||
|
||||
numeric_probability: 0.9 # e.g. Sala 1
|
||||
numeric_plus_alpha_probability: 0.01 # e.g. Sala 1A
|
||||
alpha_plus_numeric_probability: 0.01 # e.g. Sala A1
|
||||
alpha_probability: 0.08 # e.g. Sala A
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
161
resources/addresses/ja.yaml
Normal file
@@ -0,0 +1,161 @@
|
||||
# ja.yaml
|
||||
# -------
|
||||
# Japanese language specification
|
||||
|
||||
whitespace: false
|
||||
|
||||
components:
|
||||
level:
|
||||
null_probability: 0.95 # Probability of doing nothing if no floor number is specified
|
||||
alphanumeric_probability: 0.05
|
||||
|
||||
unit:
|
||||
# If no unit number is specified
|
||||
null_probability: 1.0
|
||||
conditional:
|
||||
- component: level
|
||||
probabilities:
|
||||
null_probability: 0.95
|
||||
alphanumeric_probability: 0.05
|
||||
- component: house_number
|
||||
probabilities:
|
||||
null_probability: 0.6
|
||||
alphanumeric_probability: 0.4
|
||||
|
||||
combinations:
|
||||
# Unit is just appended onto the house number
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- unit
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: "-"
|
||||
probability: 1.0
|
||||
probability: 1.0
|
||||
|
||||
numbers:
|
||||
default: &go
|
||||
canonical: 号
|
||||
numeric_affix:
|
||||
affix: 号
|
||||
direction: right
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 1.0
|
||||
|
||||
|
||||
blocks:
|
||||
alphanumeric:
|
||||
default: &ban
|
||||
canonical: 番
|
||||
numeric_affix:
|
||||
affix: 番
|
||||
direction: right
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 1.0
|
||||
probability: 0.85
|
||||
alternatives:
|
||||
- alternative: &banchi
|
||||
canonical: 番地
|
||||
numeric_affix:
|
||||
affix: 番地
|
||||
direction: right
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 1.0
|
||||
probability: 0.1
|
||||
- alternative: &banchi_no
|
||||
canonical: 番地の
|
||||
numeric_affix:
|
||||
affix: 番地の
|
||||
direction: right
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 1.0
|
||||
probability: 0.05
|
||||
numeric_probability: 1.0
|
||||
alphanumeric_phrase_probability: 0.4
|
||||
|
||||
house_numbers:
|
||||
alphanumeric:
|
||||
default: *go
|
||||
alphanumeric_phrase_probability: 0.4
|
||||
|
||||
levels:
|
||||
kai: &kai
|
||||
canonical: 階
|
||||
numeric_affix:
|
||||
affix: 階
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.3
|
||||
unicode_full_width_probability: 0.5
|
||||
spellout_probability: 0.2
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 1.0
|
||||
|
||||
numbering_starts_at: 1
|
||||
|
||||
alphanumeric:
|
||||
default: *kai
|
||||
numeric_probability: 1.0
|
||||
|
||||
po_boxes:
|
||||
shishobako: &shishobako
|
||||
canonical: 私書箱
|
||||
numeric_affix:
|
||||
affix: 私書箱
|
||||
direction: left
|
||||
digits:
|
||||
unicode_full_width_probability: 0.5
|
||||
spellout_probability: 0.2
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 1.0
|
||||
|
||||
alphanumeric:
|
||||
default: *shishobako
|
||||
numeric_probability: 1.0
|
||||
|
||||
digits:
|
||||
- length: 1
|
||||
probability: 0.05
|
||||
- length: 2
|
||||
probability: 0.1
|
||||
- length: 3
|
||||
probability: 0.2
|
||||
- length: 4
|
||||
probability: 0.5
|
||||
- length: 5
|
||||
probability: 0.1
|
||||
- length: 6
|
||||
probability: 0.05
|
||||
|
||||
metro_stations:
|
||||
alphanumeric:
|
||||
default: &eki
|
||||
canonical: 駅
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: 駅
|
||||
direction: right
|
||||
numeric_affix_probability: 1.0
|
||||
|
||||
alphanumeric_phrase_probability: 1.0
|
||||
|
||||
postcodes:
|
||||
alphanumeric:
|
||||
default:
|
||||
canonical: 〒
|
||||
numeric_affix:
|
||||
affix: 〒
|
||||
direction: left
|
||||
# null_probability means the chance of doing nothing e.g. just the postal code
|
||||
null_probability: 0.1
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 0.9
|
||||
|
||||
units:
|
||||
alphanumeric:
|
||||
numeric_probability: 1.0
|
||||
use_positive_numbers_probability: 1.0
|
||||
# If we have a floor number (from building:levels), use it
|
||||
use_floor_probability: 0.8
|
||||
180
resources/addresses/ja_rm.yaml
Normal file
@@ -0,0 +1,180 @@
|
||||
# ja_rm.yaml
|
||||
# ----------
|
||||
# Romaji (Romanized Japanese) language specification
|
||||
|
||||
components:
|
||||
level:
|
||||
null_probability: 0.95 # Probability of doing nothing if no floor number is specified
|
||||
alphanumeric_probability: 0.05
|
||||
|
||||
unit:
|
||||
# If no unit number is specified
|
||||
null_probability: 1.0
|
||||
conditional:
|
||||
- component: level
|
||||
probabilities:
|
||||
null_probability: 0.95
|
||||
alphanumeric_probability: 0.05
|
||||
- component: house_number
|
||||
probabilities:
|
||||
null_probability: 0.6
|
||||
alphanumeric_probability: 0.4
|
||||
|
||||
|
||||
combinations:
|
||||
# Unit is just appended onto the house number
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- unit
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: "-"
|
||||
probability: 1.0
|
||||
probability: 1.0
|
||||
|
||||
numbers:
|
||||
default: &go
|
||||
canonical: go
|
||||
numeric_affix:
|
||||
affix: -go
|
||||
upper_case: false
|
||||
direction: right
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 1.0
|
||||
|
||||
blocks:
|
||||
alphanumeric:
|
||||
default: &ban
|
||||
canonical: ban
|
||||
numeric_affix:
|
||||
affix: -ban
|
||||
upper_case: false
|
||||
direction: right
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 1.0
|
||||
probability: 0.85
|
||||
alternatives:
|
||||
- alternative: &banchi
|
||||
canonical: banchi
|
||||
numeric_affix:
|
||||
affix: -banchi
|
||||
upper_case: false
|
||||
direction: right
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 1.0
|
||||
probability: 0.1
|
||||
- alternative: &banchi_no
|
||||
canonical: banchi-no
|
||||
numeric_affix:
|
||||
affix: -banchi-no
|
||||
upper_case: false
|
||||
direction: right
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 1.0
|
||||
probability: 0.05
|
||||
numeric_probability: 1.0
|
||||
alphanumeric_phrase_probability: 0.4
|
||||
|
||||
house_numbers:
|
||||
alphanumeric:
|
||||
default: *go
|
||||
alphanumeric_phrase_probability: 0.4
|
||||
|
||||
levels:
|
||||
kai: &kai
|
||||
canonical: kai
|
||||
numeric_affix:
|
||||
affix: -kai
|
||||
upper_case: false
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.3
|
||||
unicode_full_width_probability: 0.5
|
||||
spellout_probability: 0.2
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 1.0
|
||||
gai: &gai
|
||||
canonical: gai
|
||||
numeric_affix:
|
||||
affix: -gai
|
||||
upper_case: false
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.3
|
||||
unicode_full_width_probability: 0.5
|
||||
spellout_probability: 0.2
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 1.0
|
||||
|
||||
|
||||
numbering_starts_at: 1
|
||||
|
||||
alphanumeric:
|
||||
default: *kai
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative: *gai
|
||||
probability: 0.4
|
||||
numeric_probability: 1.0
|
||||
|
||||
po_boxes:
|
||||
shishobako: &shishobako
|
||||
canonical: shishobako
|
||||
numeric:
|
||||
direction: left
|
||||
numeric_probability: 1.0
|
||||
|
||||
alphanumeric:
|
||||
default: *shishobako
|
||||
numeric_probability: 1.0
|
||||
|
||||
digits:
|
||||
- length: 1
|
||||
probability: 0.05
|
||||
- length: 2
|
||||
probability: 0.1
|
||||
- length: 3
|
||||
probability: 0.2
|
||||
- length: 4
|
||||
probability: 0.5
|
||||
- length: 5
|
||||
probability: 0.1
|
||||
- length: 6
|
||||
probability: 0.05
|
||||
|
||||
|
||||
metro_stations:
|
||||
alphanumeric:
|
||||
default: &eki
|
||||
canonical: eki
|
||||
numeric:
|
||||
direction: right
|
||||
title_case: false
|
||||
numeric_affix:
|
||||
affix: -eki
|
||||
title_case: false
|
||||
direction: right
|
||||
numeric_affix_probability: 1.0
|
||||
alphanumeric_phrase_probability: 1.0
|
||||
|
||||
|
||||
postcodes:
|
||||
alphanumeric:
|
||||
# This should still be the default in Romaji
|
||||
default:
|
||||
canonical: 〒
|
||||
numeric_affix:
|
||||
affix: 〒
|
||||
direction: left
|
||||
# null_probability means the chance of doing nothing e.g. just the postal code
|
||||
null_probability: 0.1
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 0.9
|
||||
|
||||
units:
|
||||
alphanumeric:
|
||||
numeric_probability: 1.0
|
||||
use_positive_numbers_probability: 1.0
|
||||
# If we have a floor number (from building:levels), use it
|
||||
use_floor_probability: 0.8
|
||||
122
resources/addresses/ko.yaml
Normal file
@@ -0,0 +1,122 @@
|
||||
# ko.yaml
|
||||
# -------
|
||||
# Korean language specification
|
||||
|
||||
whitespace: false
|
||||
|
||||
components:
|
||||
level:
|
||||
null_probability: 0.85 # Probability of doing nothing if no floor number is specified
|
||||
alphanumeric_probability: 0.15
|
||||
|
||||
unit:
|
||||
# If no unit number is specified
|
||||
null_probability: 0.6
|
||||
alphanumeric_probability: 0.4
|
||||
|
||||
numbers:
|
||||
combinations:
|
||||
# Unit is just appended onto the house number
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- unit
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: "-"
|
||||
probability: 1.0
|
||||
probability: 1.0
|
||||
|
||||
numbers:
|
||||
default: &ho
|
||||
canonical: 호
|
||||
numeric_affix:
|
||||
affix: 호
|
||||
direction: right
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 1.0
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: &ho_traditional
|
||||
canonical: 號
|
||||
numeric_affix:
|
||||
affix: 號
|
||||
direction: right
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 1.0
|
||||
probability: 0.1
|
||||
|
||||
levels:
|
||||
cheung: &cheung
|
||||
canonical: 층
|
||||
numeric_affix:
|
||||
affix: 층
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.3
|
||||
unicode_full_width_probability: 0.5
|
||||
spellout_probability: 0.2
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 1.0
|
||||
|
||||
numbering_starts_at: 1
|
||||
|
||||
alphanumeric:
|
||||
default: *cheung
|
||||
numeric_probability: 1.0
|
||||
|
||||
po_boxes:
|
||||
saseoham: &saseoham
|
||||
canonical: 사서함
|
||||
numeric_affix:
|
||||
affix: 사서함
|
||||
direction: left
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
unicode_full_width_probability: 0.1
|
||||
spellout_probability: 0.2
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 1.0
|
||||
|
||||
alphanumeric:
|
||||
default: *saseoham
|
||||
numeric_probability: 1.0
|
||||
|
||||
digits:
|
||||
- length: 1
|
||||
probability: 0.05
|
||||
- length: 2
|
||||
probability: 0.1
|
||||
- length: 3
|
||||
probability: 0.2
|
||||
- length: 4
|
||||
probability: 0.5
|
||||
- length: 5
|
||||
probability: 0.1
|
||||
- length: 6
|
||||
probability: 0.05
|
||||
|
||||
|
||||
postcodes:
|
||||
alphanumeric:
|
||||
default: &upyeon_beonho
|
||||
canonical: 우편번호
|
||||
numeric_affix:
|
||||
affix: 우편번호
|
||||
direction: left
|
||||
# null_probability means the chance of doing nothing e.g. just the postal code
|
||||
null_probability: 0.9
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 0.1
|
||||
|
||||
units:
|
||||
alphanumeric:
|
||||
default: *ho
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *ho_traditional
|
||||
probability: 0.1
|
||||
numeric_probability: 1.0
|
||||
use_positive_numbers_probability: 1.0
|
||||
# If we have a floor number (from building:levels), use it
|
||||
use_floor_probability: 0.8
|
||||
90
resources/addresses/ko_rm.yaml
Normal file
@@ -0,0 +1,90 @@
|
||||
# ko_rm.yaml
|
||||
# ----------
|
||||
# Romanized Korean language specification
|
||||
|
||||
whitespace: false
|
||||
|
||||
components:
|
||||
level:
|
||||
null_probability: 0.85 # Probability of doing nothing if no floor number is specified
|
||||
alphanumeric_probability: 0.15
|
||||
|
||||
unit:
|
||||
# If no unit number is specified
|
||||
null_probability: 0.6
|
||||
alphanumeric_probability: 0.4
|
||||
|
||||
numbers:
|
||||
combinations:
|
||||
# Unit is just appended onto the house number
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- unit
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: "-"
|
||||
probability: 1.0
|
||||
probability: 1.0
|
||||
|
||||
numbers:
|
||||
default: &ho
|
||||
canonical: ho
|
||||
numeric_affix:
|
||||
affix: -ho
|
||||
upper_case: false
|
||||
direction: right
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 1.0
|
||||
|
||||
levels:
|
||||
cheung: &cheung
|
||||
canonical: cheung
|
||||
numeric_affix:
|
||||
affix: -cheung
|
||||
upper_case: false
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.3
|
||||
unicode_full_width_probability: 0.5
|
||||
spellout_probability: 0.2
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 1.0
|
||||
|
||||
numbering_starts_at: 1
|
||||
|
||||
alphanumeric:
|
||||
default: *cheung
|
||||
numeric_probability: 1.0
|
||||
|
||||
po_boxes:
|
||||
saseoham: &saseoham
|
||||
canonical: saseoham
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
alphanumeric:
|
||||
default: *saseoham
|
||||
numeric_probability: 1.0
|
||||
|
||||
digits:
|
||||
- length: 1
|
||||
probability: 0.05
|
||||
- length: 2
|
||||
probability: 0.1
|
||||
- length: 3
|
||||
probability: 0.2
|
||||
- length: 4
|
||||
probability: 0.5
|
||||
- length: 5
|
||||
probability: 0.1
|
||||
- length: 6
|
||||
probability: 0.05
|
||||
|
||||
units:
|
||||
alphanumeric:
|
||||
default: *ho
|
||||
numeric_probability: 1.0
|
||||
use_positive_numbers_probability: 1.0
|
||||
# If we have a floor number (from building:levels), use it
|
||||
use_floor_probability: 0.8
|
||||
391
resources/addresses/lt.yaml
Normal file
@@ -0,0 +1,391 @@
|
||||
# lt.yaml
|
||||
# -------
|
||||
# Lithuanian language specification.
|
||||
|
||||
components:
|
||||
level:
|
||||
null_probability: 0.97
|
||||
alphanumeric_probability: 0.02
|
||||
standalone_probability: 0.01
|
||||
|
||||
staircase:
|
||||
null_probability: 0.99
|
||||
alphanumeric_probability: 0.01
|
||||
|
||||
entrance:
|
||||
null_probability: 0.999
|
||||
alphanumeric_probability: 0.001
|
||||
|
||||
unit:
|
||||
null_probability: 0.75
|
||||
alphanumeric_probability: 0.25
|
||||
|
||||
combinations:
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- unit
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: "-"
|
||||
probability: 0.95
|
||||
- separator: " - "
|
||||
probability: 0.05
|
||||
probability: 0.8
|
||||
|
||||
|
||||
numbers:
|
||||
default: &numeris
|
||||
canonical: numeris
|
||||
abbreviated: nr
|
||||
sample: true
|
||||
# Probabilities
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.5
|
||||
sample_probability: 0.2
|
||||
sample_exclude:
|
||||
- "#"
|
||||
numeric:
|
||||
direction: left
|
||||
numeric_affix:
|
||||
affix: "#"
|
||||
direction: left
|
||||
|
||||
numeric_probability: 0.4
|
||||
numeric_affix_probability: 0.6
|
||||
|
||||
|
||||
and:
|
||||
default: &ir
|
||||
canonical: ir
|
||||
abbreviated: "&"
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.75
|
||||
sample: true
|
||||
sample_probability: 0.05
|
||||
|
||||
|
||||
cross_streets:
|
||||
and: *ir
|
||||
corner_of: &kampelis
|
||||
canonical: kampelis
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
intersection:
|
||||
default: *ir
|
||||
probability: 0.7
|
||||
alternatives:
|
||||
- alternative: *kampelis
|
||||
probability: 0.3
|
||||
|
||||
between:
|
||||
canonical: nuo
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
parentheses_probability: 0.5
|
||||
|
||||
|
||||
levels:
|
||||
aukstas: &aukstas
|
||||
canonical: aukštas
|
||||
abbreviated: auk
|
||||
sample: true
|
||||
canonical_probability: 0.5
|
||||
abbreviated_probability: 0.3
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
direction_probability: 0.9
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.3
|
||||
roman_numeral_probability: 0.7
|
||||
numeric_probability: 0.2
|
||||
ordinal_probability: 0.8
|
||||
aukste: &aukste
|
||||
<<: *aukstas
|
||||
canonical: aukšte
|
||||
# Ground floor
|
||||
pirmas_aukstas: &pirmas_aukstas
|
||||
canonical: pirmas aukštas
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
rusys: &rusys
|
||||
canonical: rūsys
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
standalone_probability: 1.0
|
||||
number_abs_value: true
|
||||
number_min_abs_value: 1
|
||||
rusyje: &rusyje
|
||||
canonical: rūsyje
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
# e.g. rūsyje 1
|
||||
numeric:
|
||||
direction: left
|
||||
direction_probability: 0.8
|
||||
# e.g. r1
|
||||
numeric_affix:
|
||||
affix: r
|
||||
direction: left
|
||||
# e.g. 1. rūsyje
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
number_abs_value: true
|
||||
number_min_abs_value: 1
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.1
|
||||
ordinal_probability: 0.4
|
||||
aliases:
|
||||
"<-1":
|
||||
default: *rusyje
|
||||
"-1":
|
||||
default: *rusys
|
||||
"0": &ground_floor
|
||||
default: *pirmas_aukstas
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative: *aukste
|
||||
probability: 0.3
|
||||
- alternative: *aukstas
|
||||
probability: 0.1
|
||||
"1": *ground_floor
|
||||
|
||||
numbering_starts_at: 1
|
||||
|
||||
alphanumeric:
|
||||
default: *aukstas
|
||||
numeric_probability: 0.99 # With this probability, pick an integer
|
||||
alpha_probability: 0.0098 # With this probability, pick a letter e.g. A
|
||||
numeric_plus_alpha_probability: 0.0001 # e.g. 2A
|
||||
alpha_plus_numeric_probability: 0.0001 # e.g. A2
|
||||
|
||||
directions:
|
||||
right: &desineje
|
||||
canonical: dešinėje
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
left: &kaireje
|
||||
canonical: kairėje
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
alternatives:
|
||||
- alternative: *desineje
|
||||
probability: 0.5
|
||||
- alternative: *kaireje
|
||||
probability: 0.5
|
||||
|
||||
cardinal_directions:
|
||||
east: &rytai
|
||||
canonical: rytai
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
|
||||
west: &vakarai
|
||||
canonical: vakarai
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
|
||||
north: &siaure
|
||||
canonical: šiaurė
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
|
||||
south: &pietus
|
||||
canonical: pietūs
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
|
||||
alternatives:
|
||||
- alternative: *siaure
|
||||
probability: 0.25
|
||||
- alternative: *rytai
|
||||
probability: 0.25
|
||||
- alternative: *pietus
|
||||
probability: 0.25
|
||||
- alternative: *vakarai
|
||||
probability: 0.25
|
||||
|
||||
|
||||
entrances:
|
||||
iejimas: &iejimas
|
||||
canonical: įėjimas
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
# įėjimas 1, įėjimas A, etc.
|
||||
alphanumeric: &entrance_alphanumeric
|
||||
default: *iejimas
|
||||
numeric_probability: 0.1 # e.g. įėjimas 1
|
||||
alpha_probability: 0.85 # e.g. įėjimas A
|
||||
numeric_plus_alpha_probability: 0.025 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.025 # e.g. A1
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
staircases:
|
||||
laiptai: &laiptai
|
||||
canonical: laiptai
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
alphanumeric: &staircase_alphanumeric
|
||||
default: *laiptai
|
||||
numeric_probability: 0.75
|
||||
alpha_probability: 0.2
|
||||
numeric_plus_alpha_probability: 0.025
|
||||
alpha_plus_numeric_probability: 0.025
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
directional:
|
||||
direction: left
|
||||
direction_probability: 0.85
|
||||
modifier:
|
||||
alternatives:
|
||||
- alternative: *siaure
|
||||
- alternative: *rytai
|
||||
- alternative: *pietus
|
||||
- alternative: *vakarai
|
||||
|
||||
|
||||
po_boxes:
|
||||
pasto_dezute: &pasto_dezute
|
||||
canonical: pašto dėžutė
|
||||
abbreviated: p d
|
||||
sample: true
|
||||
canonical_probability: 0.1
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.5
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2 # pašto dėžutė 1234
|
||||
alphanumeric:
|
||||
default: *pasto_dezute
|
||||
numeric_probability: 0.95 # P. d. 123
|
||||
alpha_probability: 0.01 # pašto dėžutė A
|
||||
numeric_plus_alpha_probability: 0.03 # P. d. 123G
|
||||
alpha_plus_numeric_probability: 0.01 # P. d. A123
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
digits:
|
||||
- length: 1
|
||||
probability: 0.05
|
||||
- length: 2
|
||||
probability: 0.1
|
||||
- length: 3
|
||||
probability: 0.2
|
||||
- length: 4
|
||||
probability: 0.5
|
||||
- length: 5
|
||||
probability: 0.1
|
||||
- length: 6
|
||||
probability: 0.05
|
||||
|
||||
units:
|
||||
butas: &butas
|
||||
canonical: butas
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
biuro: &biuro
|
||||
canonical: biuro
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
kambarys: &kambarys
|
||||
canonical: kambarys
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
alphanumeric: &unit_alphanumeric
|
||||
default: *butas
|
||||
numeric_probability: 0.9 # e.g. butas 1
|
||||
numeric_plus_alpha_probability: 0.03 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.03 # e.g. A1
|
||||
alpha_probability: 0.04 # e.g. butas A
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
# If there are 10 floors, create unit numbers like #301 or #1032
|
||||
use_floor_probability: 0.01
|
||||
|
||||
zones:
|
||||
commercial: &commercial_unit_types
|
||||
default: *biuro
|
||||
numeric_probability: 0.95 # e.g. biuro 1
|
||||
numeric_plus_alpha_probability: 0.01 # e.g. biuro 1A
|
||||
alpha_plus_numeric_probability: 0.01 # e.g. biuro A1
|
||||
alpha_probability: 0.03 # e.g. biuro A
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
university:
|
||||
default: *kambarys
|
||||
numeric_probability: 0.95 # e.g. kambarys 1
|
||||
numeric_plus_alpha_probability: 0.01 # e.g. kambarys 1A
|
||||
alpha_plus_numeric_probability: 0.01 # e.g. kambarys A1
|
||||
alpha_probability: 0.03 # e.g. kambarys A
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
403
resources/addresses/lv.yaml
Normal file
@@ -0,0 +1,403 @@
|
||||
# lv.yaml
|
||||
# -------
|
||||
# Latvian language specification.
|
||||
|
||||
components:
|
||||
level:
|
||||
null_probability: 0.97
|
||||
alphanumeric_probability: 0.02
|
||||
standalone_probability: 0.01
|
||||
|
||||
staircase:
|
||||
null_probability: 0.99
|
||||
alphanumeric_probability: 0.01
|
||||
|
||||
entrance:
|
||||
null_probability: 0.999
|
||||
alphanumeric_probability: 0.001
|
||||
|
||||
unit:
|
||||
null_probability: 0.75
|
||||
alphanumeric_probability: 0.25
|
||||
|
||||
combinations:
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- unit
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: "-"
|
||||
probability: 0.95
|
||||
- separator: " - "
|
||||
probability: 0.05
|
||||
probability: 0.2
|
||||
|
||||
|
||||
numbers:
|
||||
default: &numurs
|
||||
canonical: numurs
|
||||
abbreviated: nr
|
||||
sample: true
|
||||
# Probabilities
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.5
|
||||
sample_probability: 0.2
|
||||
sample_exclude:
|
||||
- "#"
|
||||
numeric:
|
||||
direction: left
|
||||
numeric_affix:
|
||||
affix: "#"
|
||||
direction: left
|
||||
|
||||
numeric_probability: 0.4
|
||||
numeric_affix_probability: 0.6
|
||||
|
||||
|
||||
and:
|
||||
default: &un
|
||||
canonical: un
|
||||
abbreviated: "&"
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.75
|
||||
sample: true
|
||||
sample_probability: 0.05
|
||||
|
||||
|
||||
cross_streets:
|
||||
and: *un
|
||||
corner_of: &sturis
|
||||
canonical: stūris
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
at_the_corner_of: &sturi
|
||||
canonical: stūrī
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
intersection:
|
||||
default: *un
|
||||
probability: 0.7
|
||||
alternatives:
|
||||
- alternative: *sturi
|
||||
probability: 0.2
|
||||
- alternative: *sturis
|
||||
probability: 0.1
|
||||
|
||||
between:
|
||||
canonical: starp
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
parentheses_probability: 0.5
|
||||
|
||||
|
||||
levels:
|
||||
stavs: &stavs
|
||||
canonical: stāvs
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
direction_probability: 0.9
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
ordinal:
|
||||
direction: right
|
||||
whitespace_probability: 0.5 # sometimes written without a space, e.g. 2.stāvs
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
# Needs to be 1.0 so we don't get e.g. IIstāvs
|
||||
ordinal_suffix_probability: 1.0
|
||||
numeric_probability: 0.2
|
||||
ordinal_probability: 0.8
|
||||
|
||||
# Ground floor
|
||||
pirmais_stavs: &pirmais_stavs
|
||||
canonical: pirmais stāvs
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
pagrabs: &pagrabs
|
||||
canonical: pagrabs
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
standalone_probability: 1.0
|
||||
number_abs_value: true
|
||||
number_min_abs_value: 1
|
||||
pagraba: &pagraba
|
||||
canonical: pagraba
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
# e.g. pagraba 1
|
||||
numeric:
|
||||
direction: left
|
||||
direction_probability: 0.8
|
||||
# e.g. p1
|
||||
numeric_affix:
|
||||
affix: p
|
||||
direction: left
|
||||
# e.g. 1. pagraba
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
number_abs_value: true
|
||||
number_min_abs_value: 1
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.1
|
||||
ordinal_probability: 0.4
|
||||
aliases:
|
||||
"<-1":
|
||||
default: *pagraba
|
||||
"-1":
|
||||
default: *pagrabs
|
||||
"0": &ground_floor
|
||||
default: *pirmais_stavs
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative: *stavs
|
||||
probability: 0.4
|
||||
"1": *ground_floor
|
||||
|
||||
numbering_starts_at: 1
|
||||
|
||||
alphanumeric:
|
||||
default: *stavs
|
||||
numeric_probability: 0.99 # With this probability, pick an integer
|
||||
alpha_probability: 0.0098 # With this probability, pick a letter e.g. A
|
||||
numeric_plus_alpha_probability: 0.0001 # e.g. 2A
|
||||
alpha_plus_numeric_probability: 0.0001 # e.g. A2
|
||||
|
||||
directions:
|
||||
right: &pa_labi
|
||||
canonical: pa labi
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
left: &pa_kreisi
|
||||
canonical: pa kreisi
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
alternatives:
|
||||
- alternative: *pa_labi
|
||||
probability: 0.5
|
||||
- alternative: *pa_kreisi
|
||||
probability: 0.5
|
||||
|
||||
cardinal_directions:
|
||||
east: &austrumu
|
||||
canonical: austrumu
|
||||
abbreviated: a
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
abbreviated_probability: 0.05
|
||||
sample_probability: 0.15
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: a
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
west: &rietumu
|
||||
canonical: rietumu
|
||||
abbreviated: r
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
abbreviated_probability: 0.05
|
||||
sample_probability: 0.15
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: r
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
north: &ziemelu
|
||||
canonical: ziemeļu
|
||||
abbreviated: z
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
abbreviated_probability: 0.05
|
||||
sample_probability: 0.15
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: z
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
|
||||
south: &dienvidu
|
||||
canonical: dienvidu
|
||||
abbreviated: d
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
abbreviated_probability: 0.05
|
||||
sample_probability: 0.15
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: d
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
alternatives:
|
||||
- alternative: *ziemelu
|
||||
probability: 0.25
|
||||
- alternative: *dienvidu
|
||||
probability: 0.25
|
||||
- alternative: *austrumu
|
||||
probability: 0.25
|
||||
- alternative: *rietumu
|
||||
probability: 0.25
|
||||
|
||||
|
||||
entrances:
|
||||
ieeja: &ieeja
|
||||
canonical: ieeja
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
# ieeja 1, ieeja A, etc.
|
||||
alphanumeric: &entrance_alphanumeric
|
||||
default: *ieeja
|
||||
numeric_probability: 0.1 # e.g. ieeja 1
|
||||
alpha_probability: 0.85 # e.g. ieeja A
|
||||
numeric_plus_alpha_probability: 0.025 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.025 # e.g. A1
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
staircases:
|
||||
kapnu: &kapnu
|
||||
canonical: kāpņu
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
kapnu_telpa: &kapnu_telpa
|
||||
canonical: kāpņu telpa
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
alphanumeric: &staircase_alphanumeric
|
||||
default: *kapnu
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative: *kapnu_telpa
|
||||
probability: 0.4
|
||||
numeric_probability: 0.75
|
||||
alpha_probability: 0.2
|
||||
numeric_plus_alpha_probability: 0.025
|
||||
alpha_plus_numeric_probability: 0.025
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
directional:
|
||||
direction: left
|
||||
direction_probability: 0.85
|
||||
modifier:
|
||||
alternatives:
|
||||
- alternative: *pa_labi
|
||||
- alternative: *pa_kreisi
|
||||
- alternative: *ziemelu
|
||||
- alternative: *dienvidu
|
||||
- alternative: *austrumu
|
||||
- alternative: *rietumu
|
||||
|
||||
|
||||
units:
|
||||
dzivoklis: &dzivoklis
|
||||
canonical: dzīvoklis
|
||||
abbreviated: dz
|
||||
sample: true
|
||||
canonical_probability: 0.1
|
||||
abbreviated_probability: 0.8
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
birojs: &birojs
|
||||
canonical: birojs
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
istaba: &istaba
|
||||
canonical: istaba
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
alphanumeric: &unit_alphanumeric
|
||||
default: *dzivoklis
|
||||
numeric_probability: 0.9 # e.g. dz. 1
|
||||
numeric_plus_alpha_probability: 0.03 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.03 # e.g. A1
|
||||
alpha_probability: 0.04 # e.g. dz. A
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
# If there are 10 floors, create unit numbers like #301 or #1032
|
||||
use_floor_probability: 0.01
|
||||
|
||||
zones:
|
||||
commercial: &commercial_unit_types
|
||||
default: *birojs
|
||||
numeric_probability: 0.95 # e.g. birojs 1
|
||||
numeric_plus_alpha_probability: 0.01 # e.g. birojs 1A
|
||||
alpha_plus_numeric_probability: 0.01 # e.g. birojs A1
|
||||
alpha_probability: 0.03 # e.g. birojs A
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
university:
|
||||
default: *istaba
|
||||
numeric_probability: 0.95 # e.g. istaba 1
|
||||
numeric_plus_alpha_probability: 0.01 # e.g. istaba 1A
|
||||
alpha_plus_numeric_probability: 0.01 # e.g. istaba A1
|
||||
alpha_probability: 0.03 # e.g. istaba A
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
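Several blocks in these configs (for example `staircases.alphanumeric` in lv.yaml) pair a `default` phrase and its `probability` with a list of `alternatives`, each carrying its own `probability`. The following is a minimal sketch of how such a block could be sampled. It is an assumption for illustration only, not the project's actual sampling code: the function name `choose_phrase` and the plain-string phrases are hypothetical, and a default without an explicit `probability` is assumed to take whatever probability mass the alternatives leave over.

```python
import random


def choose_phrase(config, rng=random):
    """Pick the default or one alternative according to the configured weights.

    Hypothetical helper: `config` mirrors the default/probability/alternatives
    layout used in the YAML files, with plain strings standing in for phrases.
    """
    alternatives = config.get("alternatives", [])
    alt_total = sum(a["probability"] for a in alternatives)
    # Assumption: a default with no explicit probability gets the remaining mass.
    default_prob = config.get("probability", 1.0 - alt_total)

    candidates = [config["default"]] + [a["alternative"] for a in alternatives]
    weights = [default_prob] + [a["probability"] for a in alternatives]
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1.0")
    return rng.choices(candidates, weights=weights, k=1)[0]


# Values copied from the lv.yaml staircase block: kāpņu 0.6, kāpņu telpa 0.4.
staircase_alphanumeric = {
    "default": "kāpņu",
    "probability": 0.6,
    "alternatives": [{"alternative": "kāpņu telpa", "probability": 0.4}],
}

if __name__ == "__main__":
    print(choose_phrase(staircase_alphanumeric))
```

The sum check is a cheap way to catch a block whose weights drift away from 1.0 when an alternative is added or removed.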
563
resources/addresses/nb.yaml
Normal file
@@ -0,0 +1,563 @@
|
||||
# nb.yaml
|
||||
# -------
|
||||
# Norwegian language specification.
|
||||
|
||||
components:
|
||||
level:
|
||||
null_probability: 0.85
|
||||
alphanumeric_probability: 0.1
|
||||
standalone_probability: 0.05
|
||||
|
||||
staircase:
|
||||
null_probability: 0.99
|
||||
alphanumeric_probability: 0.01
|
||||
|
||||
entrance:
|
||||
null_probability: 0.999
|
||||
alphanumeric_probability: 0.001
|
||||
|
||||
unit:
|
||||
null_probability: 0.75
|
||||
alphanumeric_probability: 0.25
|
||||
|
||||
combinations:
|
||||
# Bolignummer
|
||||
-
|
||||
components:
|
||||
- level
|
||||
- unit
|
||||
label: unit
|
||||
zero_pad_digits: 2
|
||||
separators:
|
||||
- separator: ""
|
||||
probability: 1.0
|
||||
probability: 0.05
|
||||
|
||||
|
||||
numbers:
|
||||
default: &nummer
|
||||
canonical: nummer
|
||||
abbreviated: nr
|
||||
sample: true
|
||||
# Probabilities
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.5
|
||||
sample_probability: 0.2
|
||||
sample_exclude:
|
||||
- "#"
|
||||
numeric:
|
||||
direction: left
|
||||
numeric_affix:
|
||||
affix: "#"
|
||||
direction: left
|
||||
|
||||
numeric_probability: 0.4
|
||||
numeric_affix_probability: 0.6
|
||||
|
||||
|
||||
house_numbers:
|
||||
alphanumeric:
|
||||
default: *nummer
|
||||
|
||||
alphanumeric_phrase_probability: 0.0001
|
||||
|
||||
|
||||
and:
|
||||
default: &og
|
||||
canonical: og
|
||||
abbreviated: "&"
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.75
|
||||
sample: true
|
||||
sample_probability: 0.05
|
||||
|
||||
cross_streets:
|
||||
and: *og
|
||||
corner_of: &hjorne_av
|
||||
canonical: hjørne av
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
at_the_corner_of: &pa_hjornet_av
|
||||
canonical: på hjørnet av
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
intersection:
|
||||
default: *og
|
||||
probability: 0.7
|
||||
alternatives:
|
||||
- alternative: *hjorne_av
|
||||
probability: 0.15
|
||||
- alternative: *pa_hjornet_av
|
||||
probability: 0.15
|
||||
|
||||
between:
|
||||
canonical: mellom
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
parentheses_probability: 0.5
|
||||
|
||||
levels:
|
||||
floor: &etasje
|
||||
canonical: etasje
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
direction_probability: 0.9
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.8
|
||||
spellout_probability: 0.2
|
||||
numeric_probability: 0.4
|
||||
ordinal_probability: 0.6
|
||||
hovedetasje: &hovedetasje
|
||||
canonical: hovedetasje
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: h
|
||||
direction: left
|
||||
zero_pad: 2
|
||||
numeric_probability: 0.1
|
||||
numeric_affix_probability: 0.9
|
||||
underetasje: &underetasje
|
||||
canonical: underetasje
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: u
|
||||
direction: left
|
||||
zero_pad: 2
|
||||
number_abs_value: true
|
||||
number_min_abs_value: 1
|
||||
numeric_probability: 0.1
|
||||
numeric_affix_probability: 0.9
|
||||
loftsetasje: &loftsetasje
|
||||
canonical: loftsetasje
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: l
|
||||
direction: left
|
||||
zero_pad: 2
|
||||
numeric_probability: 0.1
|
||||
numeric_affix_probability: 0.9
|
||||
loft: &loft
|
||||
canonical: loft
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
kjeller: &kjeller
|
||||
canonical: kjeller
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
# e.g. 1 kjeller
|
||||
numeric:
|
||||
direction: right
|
||||
direction_probability: 0.8
|
||||
# e.g. k01
|
||||
numeric_affix:
|
||||
affix: k
|
||||
direction: left
|
||||
zero_pad: 2
|
||||
# e.g. 1. kjeller
|
||||
ordinal:
|
||||
direction: right
|
||||
standalone_probability: 0.9
|
||||
number_abs_value: true
|
||||
number_min_abs_value: 1
|
||||
numeric_probability: 0.005
|
||||
numeric_affix_probability: 0.09
|
||||
ordinal_probability: 0.005
|
||||
aliases:
|
||||
"<-1":
|
||||
default: *kjeller
|
||||
"-1":
|
||||
default: *kjeller
|
||||
probability: 0.85
|
||||
alternatives:
|
||||
- alternative: *etasje
|
||||
probability: 0.05
|
||||
- alternative: *underetasje
|
||||
probability: 0.1
|
||||
|
||||
"top":
|
||||
default: *etasje
|
||||
probability: 0.85
|
||||
alternatives:
|
||||
- alternative: *loftsetasje
|
||||
probability: 0.1
|
||||
- alternative: *loft
|
||||
probability: 0.05
|
||||
|
||||
numbering_starts_at: 1
|
||||
|
||||
alphanumeric:
|
||||
default: *etasje
|
||||
probability: 0.95
|
||||
alternatives:
|
||||
- alternative: *hovedetasje
|
||||
probability: 0.05
|
||||
numeric_probability: 0.99 # With this probability, pick an integer
|
||||
alpha_probability: 0.0098 # With this probability, pick a letter e.g. A
|
||||
numeric_plus_alpha_probability: 0.0001 # e.g. 2A
|
||||
alpha_plus_numeric_probability: 0.0001 # e.g. A2
|
||||
|
||||
|
||||
categories:
|
||||
near:
|
||||
default:
|
||||
canonical: i nærheten av
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: nær
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.1
|
||||
nearby:
|
||||
default:
|
||||
canonical: i nærheten
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.7
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: rundt her
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.2
|
||||
- alternative:
|
||||
canonical: nær
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.1
|
||||
near_me:
|
||||
default:
|
||||
canonical: nær meg
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: i nærheten av meg
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.4
|
||||
|
||||
in:
|
||||
default:
|
||||
canonical: i
|
||||
|
||||
|
||||
# Probabilities of each phrase
|
||||
near_probability: 0.35
|
||||
nearby_probability: 0.2
|
||||
near_me_probability: 0.1
|
||||
in_probability: 0.35
|
||||
|
||||
directions:
|
||||
right: &hoyre
|
||||
canonical: høyre
|
||||
sample: true
|
||||
canonical_probability: 0.1
|
||||
sample_probability: 0.9
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: h
|
||||
direction: right
|
||||
whitespace_probability: 0.1
|
||||
numeric_probability: 0.8
|
||||
numeric_affix_probability: 0.2
|
||||
left: &venstre
|
||||
canonical: venstre
|
||||
sample: true
|
||||
canonical_probability: 0.1
|
||||
sample_probability: 0.9
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: v
|
||||
direction: right
|
||||
whitespace_probability: 0.1
|
||||
numeric_probability: 0.8
|
||||
numeric_affix_probability: 0.2
|
||||
alternatives:
|
||||
- alternative: *hoyre
|
||||
probability: 0.5
|
||||
- alternative: *venstre
|
||||
probability: 0.5
|
||||
|
||||
|
||||
cardinal_directions:
|
||||
east: &ost
|
||||
canonical: øst
|
||||
abbreviated: ø
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: ø
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
west: &vest
|
||||
canonical: vest
|
||||
abbreviated: v
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: v
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
north: &nord
|
||||
canonical: nord
|
||||
abbreviated: n
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: n
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
south: &syd
|
||||
canonical: syd
|
||||
abbreviated: s
|
||||
sample: true
|
||||
canonical_probability: 0.75
|
||||
abbreviated_probability: 0.1
|
||||
sample_probability: 0.15
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: s
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
alternatives:
|
||||
- alternative: *nord
|
||||
probability: 0.25
|
||||
- alternative: *ost
|
||||
probability: 0.25
|
||||
- alternative: *syd
|
||||
probability: 0.25
|
||||
- alternative: *vest
|
||||
probability: 0.25
|
||||
|
||||
|
||||
entrances:
|
||||
inngang: &inngang
|
||||
canonical: inngang
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
# inngang 1, inngang A, etc.
|
||||
alphanumeric: &entrance_alphanumeric
|
||||
default: *inngang
|
||||
numeric_probability: 0.1 # e.g. inngang 1
|
||||
alpha_probability: 0.85 # e.g. inngang A
|
||||
numeric_plus_alpha_probability: 0.025 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.025 # e.g. A1
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
staircases:
|
||||
stiege: &stiege
|
||||
canonical: stiege
|
||||
abbreviated: stg
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
abbreviated_probability: 0.2
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
trapp: &trapp
|
||||
canonical: trapp
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
|
||||
alphanumeric: &staircase_alphanumeric
|
||||
default: *trapp
|
||||
probability: 0.8
|
||||
alternatives:
|
||||
- alternative: *stiege
|
||||
probability: 0.2
|
||||
numeric_probability: 0.75
|
||||
alpha_probability: 0.2
|
||||
numeric_plus_alpha_probability: 0.025
|
||||
alpha_plus_numeric_probability: 0.025
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
directional:
|
||||
direction: left
|
||||
direction_probability: 0.85
|
||||
modifier:
|
||||
alternatives:
|
||||
- alternative: *nord
|
||||
- alternative: *syd
|
||||
- alternative: *ost
|
||||
- alternative: *vest
|
||||
|
||||
po_boxes:
|
||||
postboks: &postboks
|
||||
canonical: postboks
|
||||
abbreviated: pb
|
||||
sample: true
|
||||
canonical_probability: 0.6
|
||||
abbreviated_probability: 0.2
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2 # Pb No 1234
|
||||
boks: &boks
|
||||
canonical: boks
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2 # Boks No 1234
|
||||
alphanumeric:
|
||||
sample: false
|
||||
default: *postboks
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *boks
|
||||
probability: 0.1
|
||||
numeric_probability: 0.9 # Pb 123
|
||||
alpha_probability: 0.05 # Pb A
|
||||
numeric_plus_alpha_probability: 0.04 # Pb 123G
|
||||
alpha_plus_numeric_probability: 0.01 # Pb A123
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
digits:
|
||||
- length: 1
|
||||
probability: 0.05
|
||||
- length: 2
|
||||
probability: 0.1
|
||||
- length: 3
|
||||
probability: 0.2
|
||||
- length: 4
|
||||
probability: 0.5
|
||||
- length: 5
|
||||
probability: 0.1
|
||||
- length: 6
|
||||
probability: 0.05
|
||||
|
||||
units:
|
||||
leilighet: &leilighet
|
||||
canonical: leilighet
|
||||
abbreviated: leil
|
||||
sample: true
|
||||
canonical_probability: 0.6
|
||||
abbreviated_probability: 0.1
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
null_phrase_probability: 0.3
|
||||
# leilighet nummer 4
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.05
|
||||
hus: &hus
|
||||
canonical: hus
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
vaerelse: &vaerelse
|
||||
canonical: værelse
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
alphanumeric: &unit_alphanumeric
|
||||
default: *leilighet
|
||||
probability: 0.8
|
||||
alternatives:
|
||||
- alternative: *hus
|
||||
probability: 0.1
|
||||
- alternative: *vaerelse
|
||||
probability: 0.1
|
||||
numeric_probability: 0.95 # e.g. leilighet 1
|
||||
alpha_probability: 0.05 # e.g. leil A
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
# Separate random probability for adding directions like 2H, 2V, etc.
|
||||
add_direction: true
|
||||
add_direction_probability: 0.005
|
||||
|
||||
# Add directions for plain numbers
|
||||
add_direction_numeric: true
|
||||
# Add direction only e.g. leilighet venstre
|
||||
add_direction_standalone: true
|
||||
|
||||
# If there are 10 floors, create unit numbers like #301 or #1032
|
||||
use_floor_probability: 0.2
|
||||
|
||||
# Use the actual floor phrase as long as the whole phrase is numeric
|
||||
# Has the effect of creating Bolignummer-style units
|
||||
use_floor_affix_unit_num_digits: 2
|
||||
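The nb.yaml settings `zero_pad: 2` on the floor affixes (`h`, `u`, `l`, `k`), `zero_pad_digits: 2` on the level + unit combination with an empty separator, and `use_floor_affix_unit_num_digits: 2` together describe Bolignummer-style units: a floor letter, a two-digit floor and a two-digit unit run together. The sketch below shows that composition under stated assumptions. The function name `bolignummer` and its parameters are hypothetical, and it is not the project's actual generator. It takes the absolute value of the floor, following `number_abs_value: true` on the below-ground phrases.

```python
def bolignummer(affix, floor, unit, num_digits=2):
    """Compose a Bolignummer-style unit string such as h0201.

    Hypothetical helper: `affix` is the floor letter from the config
    (h, u, l or k), `floor` and `unit` are integers, and `num_digits`
    mirrors the zero-padding width of 2 used in nb.yaml.
    """
    # Below-ground floors are stored as negative numbers; the config sets
    # number_abs_value, so only the magnitude is rendered.
    floor_part = str(abs(floor)).zfill(num_digits)
    unit_part = str(unit).zfill(num_digits)
    # Empty separator, as in the level + unit combination.
    return "{}{}{}".format(affix, floor_part, unit_part)


if __name__ == "__main__":
    print(bolignummer("h", 2, 1))   # main floor 2, unit 1
    print(bolignummer("u", -1, 3))  # lower floor 1, unit 3
```

The affix is left in lower case because that is how the config spells it (compare the `# e.g. k01` comment); whether output is upper-cased later is not something this chunk shows.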
572
resources/addresses/nl.yaml
Normal file
@@ -0,0 +1,572 @@
|
||||
# nl.yaml
|
||||
# -------
|
||||
# Note: base config covers Dutch as spoken in the Netherlands
|
||||
# Belgium overrides go in country configs
|
||||
|
||||
components:
|
||||
level:
|
||||
null_probability: 0.85
|
||||
alphanumeric_probability: 0.1
|
||||
standalone_probability: 0.05
|
||||
|
||||
staircase:
|
||||
null_probability: 0.99
|
||||
alphanumeric_probability: 0.01
|
||||
|
||||
entrance:
|
||||
null_probability: 0.999
|
||||
alphanumeric_probability: 0.001
|
||||
|
||||
unit:
|
||||
null_probability: 0.8
|
||||
alphanumeric_probability: 0.2
|
||||
|
||||
combinations:
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- unit
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: /
|
||||
probability: 0.9
|
||||
- separator: "-"
|
||||
probability: 0.1
|
||||
probability: 0.005
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- level
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: "-"
|
||||
probability: 0.9
|
||||
- separator: /
|
||||
probability: 0.1
|
||||
probability: 0.01
|
||||
|
||||
|
||||
and:
|
||||
default: &en
|
||||
canonical: en
|
||||
abbreviated: "&"
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.75
|
||||
sample: true
|
||||
sample_probability: 0.05
|
||||
|
||||
numbers:
|
||||
default: &nummer
|
||||
canonical: nummer
|
||||
abbreviated: nr
|
||||
sample: true
|
||||
# Probabilities
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.5
|
||||
sample_probability: 0.2
|
||||
sample_exclude:
|
||||
- "#"
|
||||
numeric:
|
||||
direction: left
|
||||
numeric_affix:
|
||||
affix: "#"
|
||||
direction: left
|
||||
|
||||
numeric_probability: 0.4
|
||||
numeric_affix_probability: 0.6
|
||||
|
||||
house_numbers:
|
||||
alphanumeric:
|
||||
default: *nummer
|
||||
alphanumeric_phrase_probability: 0.01
|
||||
|
||||
levels:
|
||||
verdieping: &verdieping
|
||||
canonical: verdieping
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
digits:
|
||||
ascii_probability: 0.8
|
||||
roman_numeral_probability: 0.2
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.5
|
||||
roman_numeral_probability: 0.3
|
||||
spellout_probability: 0.2
|
||||
numeric_probability: 0.7
|
||||
ordinal_probability: 0.3
|
||||
etage: &etage
|
||||
canonical: etage
|
||||
abbreviated: et
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
abbreviated_probability: 0.1
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
digits:
|
||||
ascii_probability: 0.8
|
||||
roman_numeral_probability: 0.2
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.5
|
||||
roman_numeral_probability: 0.3
|
||||
spellout_probability: 0.2
|
||||
numeric_probability: 0.7
|
||||
ordinal_probability: 0.3
|
||||
begane_grond: &begane_grond
|
||||
canonical: begane grond
|
||||
abbreviated: bg
|
||||
sample: true
|
||||
canonical_probability: 0.5
|
||||
abbreviated_probability: 0.2
|
||||
sample_probability: 0.3
|
||||
benedenverdieping: &benedenverdieping
|
||||
canonical: benedenverdieping
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
parterre: &parterre
|
||||
canonical: parterre
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
gelijkvloers: &gelijkvloers
|
||||
canonical: gelijkvloers
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
het_gelijkvloers: &het_gelijkvloers
|
||||
canonical: het gelijkvloers
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
aliases:
|
||||
"0":
|
||||
default: *begane_grond
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative: *benedenverdieping
|
||||
probability: 0.35
|
||||
- alternative: *parterre
|
||||
probability: 0.04
|
||||
- alternative: *het_gelijkvloers
|
||||
probability: 0.005
|
||||
- alternative: *gelijkvloers
|
||||
probability: 0.005
|
||||
alphanumeric:
|
||||
default: *verdieping
|
||||
probability: 0.99
|
||||
alternatives:
|
||||
- alternative: *etage
|
||||
probability: 0.01
|
||||
numeric_probability: 0.79 # With this probability, pick an integer
|
||||
roman_numeral_probability: 0.2 # Pick a Roman numeral for the actual value
|
||||
alpha_probability: 0.0098 # With this probability, pick a letter e.g. A
|
||||
numeric_plus_alpha_probability: 0.0001 # e.g. 2A
|
||||
alpha_plus_numeric_probability: 0.0001 # e.g. A2
|
||||
|
||||
categories:
|
||||
near:
|
||||
default:
|
||||
canonical: in de buurt van
|
||||
probability: 0.8
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: bij
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: nabij
|
||||
probability: 0.1
|
||||
nearby:
|
||||
default:
|
||||
canonical: in de buurt
|
||||
near_me:
|
||||
default:
|
||||
canonical: in de buurt van me
|
||||
|
||||
in:
|
||||
default:
|
||||
canonical: in
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: te
|
||||
probability: 0.4
|
||||
# Probabilities of each phrase
|
||||
near_probability: 0.35
|
||||
nearby_probability: 0.2
|
||||
near_me_probability: 0.1
|
||||
in_probability: 0.35
|
||||
|
||||
|
||||
|
||||
cross_streets:
|
||||
and: *en
|
||||
corner_of: &hoek_van
|
||||
canonical: hoek van
|
||||
at_the_corner_of: &op_de_hoek_van
|
||||
canonical: op de hoek van
|
||||
intersection:
|
||||
default: *en
|
||||
probability: 0.7
|
||||
alternatives:
|
||||
- alternative: *hoek_van
|
||||
probability: 0.15
|
||||
- alternative: *op_de_hoek_van
|
||||
probability: 0.15
|
||||
|
||||
between:
|
||||
canonical: tussen
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
parentheses_probability: 0.5
|
||||
|
||||
|
||||
entrances:
|
||||
ingang: &ingang
|
||||
canonical: ingang
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
# ingang 1, ingang A, etc.
|
||||
alphanumeric: &entrance_alphanumeric
|
||||
default: *ingang
|
||||
numeric_probability: 0.1 # e.g. ingang 1
|
||||
alpha_probability: 0.85 # e.g. ingang A
|
||||
numeric_plus_alpha_probability: 0.025 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.025 # e.g. A1
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
po_boxes:
|
||||
postbus: &postbus
|
||||
canonical: postbus
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2
|
||||
antwoordnummer: &antwoordnummer
|
||||
canonical: antwoordnummer
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2
|
||||
alphanumeric:
|
||||
sample: false
|
||||
default: *postbus
|
||||
probability: 0.8
|
||||
alternatives:
|
||||
- alternative: *antwoordnummer
|
||||
probability: 0.2
|
||||
numeric_probability: 0.9 # 123
|
||||
alpha_probability: 0.05 # A
|
||||
numeric_plus_alpha_probability: 0.04 # 123G
|
||||
alpha_plus_numeric_probability: 0.01 # A123
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
digits:
|
||||
- length: 1
|
||||
probability: 0.05
|
||||
- length: 2
|
||||
probability: 0.1
|
||||
- length: 3
|
||||
probability: 0.2
|
||||
- length: 4
|
||||
probability: 0.5
|
||||
- length: 5
|
||||
probability: 0.1
|
||||
- length: 6
|
||||
probability: 0.05
|
||||
|
||||
directions:
|
||||
right: &rechts
|
||||
canonical: rechts
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: r
|
||||
direction: right
|
||||
whitespace_probability: 0.1
|
||||
numeric_probability: 0.8
|
||||
numeric_affix_probability: 0.2
|
||||
left: &links
|
||||
canonical: links
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: l
|
||||
direction: right
|
||||
whitespace_probability: 0.1
|
||||
numeric_probability: 0.4
|
||||
numeric_affix_probability: 0.6
|
||||
alternatives:
|
||||
- alternative: *rechts
|
||||
probability: 0.5
|
||||
- alternative: *links
|
||||
probability: 0.5
|
||||
|
||||
|
||||
cardinal_directions:
|
||||
east: &oost
|
||||
canonical: oost
|
||||
abbreviated: o
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: o
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
oosten: &oosten
|
||||
<<: *oost
|
||||
canonical: oosten
|
||||
|
||||
oostelijke: &oostelijke
|
||||
canonical: oostelijke
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
|
||||
west: &west
|
||||
canonical: west
|
||||
abbreviated: w
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: w
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
westen: &westen
|
||||
<<: *west
|
||||
canonical: westen
|
||||
|
||||
westelijke: &westelijke
|
||||
canonical: westelijke
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
|
||||
north: &noord
|
||||
canonical: noord
|
||||
abbreviated: n
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: n
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
noorden: &noorden
|
||||
<<: *noord
|
||||
canonical: noorden
|
||||
|
||||
noordelijke: &noordelijke
|
||||
canonical: noordelijke
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
|
||||
south: &zuid
|
||||
canonical: zuid
|
||||
abbreviated: z
|
||||
sample: true
|
||||
canonical_probability: 0.75
|
||||
abbreviated_probability: 0.1
|
||||
sample_probability: 0.15
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: z
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
zuiden: &zuiden
|
||||
<<: *zuid
|
||||
canonical: zuiden
|
||||
|
||||
zuidelijke: &zuidelijke
|
||||
canonical: zuidelijke
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
|
||||
alternatives:
|
||||
- alternative: *noord
|
||||
probability: 0.25
|
||||
- alternative: *oost
|
||||
probability: 0.25
|
||||
- alternative: *zuid
|
||||
probability: 0.25
|
||||
- alternative: *west
|
||||
probability: 0.25
|
||||
|
||||
|
||||
staircases:
|
||||
stiege: &stiege
|
||||
canonical: stiege
|
||||
abbreviated: stg
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
abbreviated_probability: 0.2
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
trap: &trap
|
||||
canonical: trap
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
alphanumeric: &staircase_alphanumeric
|
||||
default: *trap
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative: *stiege
|
||||
probability: 0.4
|
||||
numeric_probability: 0.75
|
||||
alpha_probability: 0.2
|
||||
numeric_plus_alpha_probability: 0.025
|
||||
alpha_plus_numeric_probability: 0.025
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
units:
|
||||
appartement: &appartement
|
||||
canonical: appartement
|
||||
abbreviated: apt
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.5
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
kamer: &kamer
|
||||
canonical: kamer
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
alphanumeric: &unit_alphanumeric
|
||||
default: *appartement
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative: *kamer
|
||||
probability: 0.4
|
||||
numeric_probability: 0.9 # e.g. Apt 1
|
||||
numeric_plus_alpha_probability: 0.03 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.03 # e.g. A1
|
||||
alpha_probability: 0.04 # e.g. Apt A
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
# Separate random probability for adding directions like 2R, 2L, etc.
|
||||
add_direction: true
|
||||
add_direction_probability: 0.1
|
||||
|
||||
# Add directions for plain numbers
|
||||
add_direction_numeric: true
|
||||
# Add direction only e.g. Apt Rechts
|
||||
add_direction_standalone: true
|
||||
|
||||
# If there are 10 floors, create unit numbers like #301 or #1032
|
||||
use_floor_probability: 0.1
|
||||
|
||||
|
||||
countries:
|
||||
be:
|
||||
components:
|
||||
unit:
|
||||
null_probability: 0.65
|
||||
alphanumeric_probability: 0.35
|
||||
|
||||
levels:
|
||||
verdieping: &verdieping_flemish
|
||||
canonical: verdieping
|
||||
abbreviated: verdiep
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.5
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.8
|
||||
spellout_probability: 0.2
|
||||
numeric_probability: 0.7
|
||||
ordinal_probability: 0.3
|
||||
|
||||
aliases:
|
||||
"0":
|
||||
default: *het_gelijkvloers
|
||||
probability: 0.5
|
||||
alternatives:
|
||||
- alternative: *gelijkvloers
|
||||
probability: 0.5
|
||||
alphanumeric:
|
||||
default: *verdieping_flemish
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *etage
|
||||
probability: 0.1
|
||||
|
||||
units:
|
||||
bus: &bus
|
||||
canonical: bus
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
alphanumeric:
|
||||
default: *appartement
|
||||
probability: 0.1
|
||||
alternatives:
|
||||
- alternative: *bus
|
||||
probability: 0.7
|
||||
- alternative: *kamer
|
||||
probability: 0.2
|
||||
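Reading aid for the `default` / `alternatives` blocks in the file above: a minimal sketch of how such a block could be sampled. This is not libpostal's actual loader. `pick_phrase`, `UNIT_SPEC` and the dictionary layout are hypothetical names chosen for illustration, with weights copied from the Belgian `units.alphanumeric` entry (anchors replaced by their canonical strings).

```python
import random

# Hypothetical in-memory form of the Belgian `units.alphanumeric` block.
UNIT_SPEC = {
    "default": "appartement",
    "probability": 0.1,
    "alternatives": [
        {"alternative": "bus", "probability": 0.7},
        {"alternative": "kamer", "probability": 0.2},
    ],
}


def pick_phrase(spec, rng):
    """Draw the default or one of the alternatives according to the listed weights."""
    alternatives = spec.get("alternatives", [])
    options = [spec["default"]] + [a["alternative"] for a in alternatives]
    # Assumption: a default with no explicit probability gets weight 1.0,
    # so it is always chosen when no alternatives are listed.
    weights = [spec.get("probability", 1.0)] + [a["probability"] for a in alternatives]
    return rng.choices(options, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    counts = {}
    for _ in range(10000):
        phrase = pick_phrase(UNIT_SPEC, rng)
        counts[phrase] = counts.get(phrase, 0) + 1
    print(counts)
```

Under these weights "bus" should dominate the Belgian output, which matches the intent of overriding the Dutch default of `appartement` in the `be` section.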
509
resources/addresses/pl.yaml
Normal file
@@ -0,0 +1,509 @@
|
||||
# pl.yaml
|
||||
# -------
|
||||
# Polish language specification.
|
||||
|
||||
components:
|
||||
level:
|
||||
null_probability: 0.95
|
||||
alphanumeric_probability: 0.04
|
||||
standalone_probability: 0.01
|
||||
|
||||
staircase:
|
||||
null_probability: 0.99
|
||||
alphanumeric_probability: 0.01
|
||||
|
||||
entrance:
|
||||
null_probability: 0.999
|
||||
alphanumeric_probability: 0.001
|
||||
|
||||
unit:
|
||||
null_probability: 0.75
|
||||
alphanumeric_probability: 0.25
|
||||
|
||||
combinations:
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- unit
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: "/"
|
||||
probability: 0.9
|
||||
- separator: "-"
|
||||
probability: 0.05
|
||||
- separator: " - "
|
||||
probability: 0.05
|
||||
probability: 0.01
|
||||
|
||||
numbers:
|
||||
default: &numer
|
||||
canonical: numer
|
||||
abbreviated: nr
|
||||
sample: true
|
||||
# Probabilities
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.5
|
||||
sample_probability: 0.2
|
||||
sample_exclude:
|
||||
- "#"
|
||||
numeric:
|
||||
direction: left
|
||||
numeric_affix:
|
||||
affix: "#"
|
||||
direction: left
|
||||
|
||||
numeric_probability: 0.4
|
||||
numeric_affix_probability: 0.6
|
||||
|
||||
|
||||
house_numbers:
|
||||
dom: &dom
|
||||
canonical: dom
|
||||
abbreviated: d
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
abbreviated_probability: 0.1
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
alphanumeric:
|
||||
default: *numer
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative: *dom
|
||||
probability: 0.4
|
||||
|
||||
alphanumeric_phrase_probability: 0.0001
|
||||
|
||||
and:
|
||||
default: &i
|
||||
canonical: i
|
||||
abbreviated: "&"
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.75
|
||||
sample: true
|
||||
sample_probability: 0.05
|
||||
|
||||
cross_streets:
|
||||
and: *i
|
||||
at: &w
|
||||
canonical: w
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
corner_of: &rogu
|
||||
canonical: rogu
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
at_the_corner_of: &na_rogu
|
||||
canonical: na rogu
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
intersection:
|
||||
default: *i
|
||||
probability: 0.7
|
||||
alternatives:
|
||||
- alternative: *w
|
||||
probability: 0.1
|
||||
- alternative: *rogu
|
||||
probability: 0.1
|
||||
- alternative: *na_rogu
|
||||
probability: 0.1
|
||||
|
||||
between:
|
||||
canonical: pomiędzy
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
parentheses_probability: 0.5
|
||||
|
||||
levels:
|
||||
floor: &pietro
|
||||
canonical: piętro
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
direction_probability: 0.9
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.3
|
||||
roman_numeral_probability: 0.7
|
||||
ordinal_suffix_probability: 0.6
|
||||
numeric_probability: 0.4
|
||||
ordinal_probability: 0.6
|
||||
parter: &parter
|
||||
canonical: parter
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
suterena: &suterena
|
||||
canonical: suterena
|
||||
# e.g. suterena 1
|
||||
numeric:
|
||||
direction: left
|
||||
direction_probability: 0.8
|
||||
# e.g. s1
|
||||
numeric_affix:
|
||||
affix: s
|
||||
direction: left
|
||||
# e.g. 1. suterena
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
standalone_probability: 0.985
|
||||
number_abs_value: true
|
||||
number_min_abs_value: 1
|
||||
numeric_probability: 0.005
|
||||
numeric_affix_probability: 0.005
|
||||
ordinal_probability: 0.005
|
||||
aliases:
|
||||
"<-1":
|
||||
default: *suterena
|
||||
"-1":
|
||||
default: *suterena
|
||||
"0":
|
||||
default: *parter
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *pietro
|
||||
probability: 0.1
|
||||
|
||||
numbering_starts_at: 0
|
||||
|
||||
alphanumeric:
|
||||
default: *pietro
|
||||
numeric_probability: 0.99 # With this probability, pick an integer
|
||||
alpha_probability: 0.0098 # With this probability, pick a letter e.g. A
|
||||
numeric_plus_alpha_probability: 0.0001 # e.g. 2A
|
||||
alpha_plus_numeric_probability: 0.0001 # e.g. A2
|
||||
|
||||
categories:
|
||||
near:
|
||||
default:
|
||||
canonical: w pobliżu
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.7
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: blisko
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: koło
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
probability: 0.05
|
||||
- alternative:
|
||||
canonical: niedaleko
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.05
|
||||
- alternative:
|
||||
canonical: obok
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.05
|
||||
- alternative:
|
||||
canonical: przy
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.05
|
||||
nearby:
|
||||
default:
|
||||
canonical: w pobliżu
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: w pobliżu tutaj
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.2
|
||||
- alternative:
|
||||
canonical: wokół tutaj
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: blisko
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.1
|
||||
near_me:
|
||||
default:
|
||||
canonical: w pobliżu mnie
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
|
||||
# Don't worry about agreement
|
||||
in:
|
||||
default:
|
||||
canonical: w
|
||||
probability: 0.7
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: we
|
||||
probability: 0.3
|
||||
|
||||
# Probabilities of each phrase
|
||||
near_probability: 0.35
|
||||
nearby_probability: 0.2
|
||||
near_me_probability: 0.1
|
||||
in_probability: 0.35
|
||||
|
||||
|
||||
directions:
|
||||
right: &prawo
|
||||
canonical: prawo
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
left: &lewo
|
||||
canonical: lewo
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
alternatives:
|
||||
- alternative: *prawo
|
||||
probability: 0.5
|
||||
- alternative: *lewo
|
||||
probability: 0.5
|
||||
|
||||
cardinal_directions:
|
||||
east: &wschod
|
||||
canonical: wschód
|
||||
abbreviated: w
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: w
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
west: &zachod
|
||||
canonical: zachód
|
||||
abbreviated: z
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: z
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
north: &polnoc
|
||||
canonical: północ
|
||||
abbreviated: pn
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: pn
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
south: &poludnie
|
||||
canonical: południe
|
||||
abbreviated: pd
|
||||
sample: true
|
||||
canonical_probability: 0.75
|
||||
abbreviated_probability: 0.1
|
||||
sample_probability: 0.15
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: pd
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
alternatives:
|
||||
- alternative: *polnoc
|
||||
probability: 0.25
|
||||
- alternative: *wschod
|
||||
probability: 0.25
|
||||
- alternative: *poludnie
|
||||
probability: 0.25
|
||||
- alternative: *zachod
|
||||
probability: 0.25
|
||||
|
||||
|
||||
entrances:
|
||||
wejscie: &wejscie
|
||||
canonical: wejście
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
# Wejście 1, Wejście A, etc.
|
||||
alphanumeric: &entrance_alphanumeric
|
||||
default: *wejscie
|
||||
numeric_probability: 0.1 # e.g. Wejście 1
|
||||
alpha_probability: 0.85 # e.g. Wejście A
|
||||
numeric_plus_alpha_probability: 0.025 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.025 # e.g. A1
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
staircases:
|
||||
schody: &schody
|
||||
canonical: schody
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
alphanumeric: &staircase_alphanumeric
|
||||
default: *schody
|
||||
numeric_probability: 0.75
|
||||
alpha_probability: 0.2
|
||||
numeric_plus_alpha_probability: 0.025
|
||||
alpha_plus_numeric_probability: 0.025
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
directional:
|
||||
direction: left
|
||||
direction_probability: 0.85
|
||||
modifier:
|
||||
alternatives:
|
||||
- alternative: *polnoc
|
||||
- alternative: *poludnie
|
||||
- alternative: *wschod
|
||||
- alternative: *zachod
|
||||
|
||||
|
||||
po_boxes:
|
||||
skrytka_pocztowa: &skrytka_pocztowa
|
||||
canonical: skrytka pocztowa
|
||||
abbreviated: skr poczt
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.5
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2 # e.g. Skr Poczt Nr 1234
|
||||
alphanumeric:
|
||||
default: *skrytka_pocztowa
|
||||
numeric_probability: 0.9 # Skr Poczt 123
|
||||
alpha_probability: 0.05 # Skr Poczt A
|
||||
numeric_plus_alpha_probability: 0.04 # Skr Poczt 123G
|
||||
alpha_plus_numeric_probability: 0.01 # Skr Poczt A123
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
digits:
|
||||
- length: 1
|
||||
probability: 0.05
|
||||
- length: 2
|
||||
probability: 0.1
|
||||
- length: 3
|
||||
probability: 0.2
|
||||
- length: 4
|
||||
probability: 0.5
|
||||
- length: 5
|
||||
probability: 0.1
|
||||
- length: 6
|
||||
probability: 0.05
|
||||
|
||||
units:
|
||||
mieszkanie: &mieszkanie
|
||||
canonical: mieszkanie
|
||||
abbreviated: m
|
||||
sample: true
|
||||
canonical_probability: 0.05
|
||||
abbreviated_probability: 0.9
|
||||
sample_probability: 0.05
|
||||
numeric:
|
||||
direction: left
|
||||
pokoj: &pokoj
|
||||
canonical: pokój
|
||||
abbreviated: pok
|
||||
sample: true
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.5
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
alphanumeric: &unit_alphanumeric
|
||||
default: *mieszkanie
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *pokoj
|
||||
probability: 0.1
|
||||
numeric_probability: 0.9 # e.g. m. 1
|
||||
numeric_plus_alpha_probability: 0.03 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.03 # e.g. A1
|
||||
alpha_probability: 0.04 # e.g. m. A
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
# If there are 10 floors, create unit numbers like #301 or #1032
|
||||
use_floor_probability: 0.01
|
||||
|
||||
zones:
|
||||
commercial: &commercial_unit_types
|
||||
default: *pokoj
|
||||
numeric_probability: 0.95 # e.g. pokój 1
|
||||
numeric_plus_alpha_probability: 0.01 # e.g. pokój 1A
|
||||
alpha_plus_numeric_probability: 0.01 # e.g. pokój A1
|
||||
alpha_probability: 0.03 # e.g. pokój A
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
university: *commercial_unit_types
|
||||
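Reading aid for the `numeric_probability` / `alpha_probability` / `*_plus_*_probability` keys used throughout the file above: a rough sketch of how an identifier type could be drawn and rendered. It is not the project's implementation. `random_unit_id`, the `max_number` bound and the use of uppercase Latin letters are assumptions made for illustration; the weights are copied from the Polish `units.alphanumeric` block.

```python
import random
import string

# Weights copied from the Polish `units.alphanumeric` block.
UNIT_TYPES = {
    "numeric": 0.9,              # e.g. 1
    "numeric_plus_alpha": 0.03,  # e.g. 1A
    "alpha_plus_numeric": 0.03,  # e.g. A1
    "alpha": 0.04,               # e.g. A
}
WHITESPACE_PROBABILITY = 0.1  # chance of "1 A" instead of "1A"


def random_unit_id(rng, max_number=99):
    """Return an identifier such as '12', 'A', '12A' or 'A 12'."""
    kind = rng.choices(list(UNIT_TYPES), weights=list(UNIT_TYPES.values()), k=1)[0]
    number = str(rng.randint(1, max_number))   # assumed range, not from the config
    letter = rng.choice(string.ascii_uppercase)
    if kind == "numeric":
        return number
    if kind == "alpha":
        return letter
    # Combined forms occasionally get a space between the two parts.
    sep = " " if rng.random() < WHITESPACE_PROBABILITY else ""
    if kind == "numeric_plus_alpha":
        return number + sep + letter
    return letter + sep + number


if __name__ == "__main__":
    rng = random.Random(0)
    print([random_unit_id(rng) for _ in range(10)])
```

The four type weights are treated as one categorical distribution, while `whitespace_probability` is an independent draw that only applies to the two combined forms.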
1054
resources/addresses/pt.yaml
Normal file
File diff suppressed because it is too large
504
resources/addresses/ro.yaml
Normal file
@@ -0,0 +1,504 @@
|
||||
# ro.yaml
|
||||
# -------
|
||||
# Romanian language specification
|
||||
|
||||
components:
|
||||
level:
|
||||
# If no floor number is specified
|
||||
null_probability: 0.6
|
||||
alphanumeric_probability: 0.35
|
||||
standalone_probability: 0.05
|
||||
|
||||
staircase:
|
||||
null_probability: 0.95
|
||||
alphanumeric_probability: 0.05
|
||||
|
||||
entrance:
|
||||
null_probability: 0.999
|
||||
alphanumeric_probability: 0.001
|
||||
|
||||
unit:
|
||||
# If no unit number is specified
|
||||
null_probability: 0.3
|
||||
alphanumeric_probability: 0.65
|
||||
standalone_probability: 0.05
|
||||
|
||||
numbers:
|
||||
default: &numar
|
||||
canonical: număr
|
||||
abbreviated: nr
|
||||
sample: true
|
||||
canonical_probability: 0.1
|
||||
abbreviated_probability: 0.7
|
||||
sample_probability: 0.2
|
||||
sample_exclude:
|
||||
- "#"
|
||||
numeric:
|
||||
direction: left
|
||||
numeric_affix:
|
||||
affix: "#" # e.g. #3, #2F, etc.
|
||||
probability: 0.5
|
||||
alternatives:
|
||||
- alternative:
|
||||
direction: left # affix goes on the number's left
|
||||
|
||||
# Probabilities for numbers
|
||||
numeric_probability: 0.9
|
||||
numeric_affix_probability: 0.1
|
||||
|
||||
and:
|
||||
default: &si
|
||||
canonical: și
|
||||
abbreviated: "&"
|
||||
sample: true
|
||||
canonical_probability: 0.5
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.1
|
||||
|
||||
cross_streets:
|
||||
and: *si
|
||||
corner_of: &colt
|
||||
canonical: colț
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
at_the_corner_of: &la_coltul_de_pe
|
||||
canonical: la colțul de pe
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
intersection:
|
||||
default: *si
|
||||
probability: 0.7
|
||||
alternatives:
|
||||
- alternative: *colt
|
||||
probability: 0.2
|
||||
- alternative: *la_coltul_de_pe
|
||||
probability: 0.1
|
||||
|
||||
between:
|
||||
canonical: între
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
parentheses_probability: 0.5
|
||||
|
||||
|
||||
house_numbers:
|
||||
# fara numar (FN) addresses
|
||||
no_number:
|
||||
default:
|
||||
canonical: fără număr
|
||||
abbreviated: fn
|
||||
sample: true
|
||||
canonical_probability: 0.1
|
||||
abbreviated_probability: 0.7
|
||||
sample_probability: 0.2
|
||||
alphanumeric:
|
||||
default: *numar
|
||||
|
||||
alphanumeric_phrase_probability: 0.7
|
||||
no_number_probability: 0.1 # With this probability, use fara numar if no house_number is specified
|
||||
|
||||
|
||||
|
||||
levels:
|
||||
floor: &etaj
|
||||
canonical: etaj
|
||||
abbreviated: et
|
||||
sample: true
|
||||
canonical_probability: 0.5
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true # Occasionally add variation of "number", e.g. et. nr 2
|
||||
add_number_phrase_probability: 0.05
|
||||
digits:
|
||||
ascii_probability: 0.8
|
||||
roman_numeral_probability: 0.2
|
||||
# Ground floor
|
||||
parter: &parter
|
||||
canonical: parter
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
aliases:
|
||||
"0":
|
||||
default: *parter
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *etaj
|
||||
probability: 0.1
|
||||
|
||||
numbering_starts_at: 0
|
||||
|
||||
alphanumeric:
|
||||
default: *etaj
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.05
|
||||
numeric_probability: 0.99
|
||||
alpha_probability: 0.01
|
||||
|
||||
blocks:
|
||||
alphanumeric:
|
||||
default:
|
||||
canonical: bloc
|
||||
abbreviated: bl
|
||||
sample: true
|
||||
canonical_probability: 0.6
|
||||
abbreviated_probability: 0.2
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
|
||||
|
||||
categories:
|
||||
near:
|
||||
default:
|
||||
canonical: in apropiere de
|
||||
|
||||
nearby:
|
||||
default:
|
||||
canonical: în apropiere
|
||||
probability: 0.5
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: in apropiere
|
||||
probability: 0.2
|
||||
- alternative:
|
||||
canonical: aproape de aici
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: aici
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: în jurul aici
|
||||
probability: 0.05
|
||||
- alternative:
|
||||
canonical: in jurul aici
|
||||
probability: 0.05
|
||||
near_me:
|
||||
default:
|
||||
canonical: lângă mine
|
||||
probability: 0.7
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: langa mine
|
||||
probability: 0.3
|
||||
in:
|
||||
default:
|
||||
canonical: din
|
||||
# Probabilities of each phrase
|
||||
near_probability: 0.35
|
||||
nearby_probability: 0.2
|
||||
near_me_probability: 0.1
|
||||
in_probability: 0.35
|
||||
|
||||
directions:
|
||||
right: &dreapta
|
||||
canonical: dreapta
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: d
|
||||
direction: right
|
||||
whitespace_probability: 0.1
|
||||
numeric_probability: 0.4
|
||||
numeric_affix_probability: 0.6
|
||||
left: &stanga
|
||||
canonical: stânga
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: s
|
||||
direction: right
|
||||
whitespace_probability: 0.1
|
||||
numeric_probability: 0.4
|
||||
numeric_affix_probability: 0.6
|
||||
|
||||
alternatives:
|
||||
- alternative: *dreapta
|
||||
probability: 0.5
|
||||
- alternative: *stanga
|
||||
probability: 0.5
|
||||
|
||||
|
||||
cardinal_directions:
|
||||
east: &est
|
||||
canonical: est
|
||||
abbreviated: e
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.6
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: e
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
west: &vest
|
||||
canonical: vest
|
||||
abbreviated: v
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.6
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: v
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
north: &nord
|
||||
canonical: nord
|
||||
abbreviated: n
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.6
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: n
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
south: &sud
|
||||
canonical: sud
|
||||
abbreviated: s
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.6
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: s
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
alternatives:
|
||||
- alternative: *nord
|
||||
probability: 0.25
|
||||
- alternative: *est
|
||||
probability: 0.25
|
||||
- alternative: *sud
|
||||
probability: 0.25
|
||||
- alternative: *vest
|
||||
probability: 0.25
|
||||
|
||||
entrances:
|
||||
entrada: &intrare
|
||||
canonical: intrare
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
# Intrare 1, Intrare A, etc.
|
||||
alphanumeric:
|
||||
default: *intrare
|
||||
numeric_probability: 0.1 # e.g. Intrare 1
|
||||
alpha_probability: 0.85 # e.g. Intrare A
|
||||
numeric_plus_alpha_probability: 0.025 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.025 # e.g. A1
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
directional:
|
||||
modifier:
|
||||
alternatives:
|
||||
- alternative: *nord
|
||||
- alternative: *sud
|
||||
- alternative: *est
|
||||
- alternative: *vest
|
||||
- alternative: *dreapta
|
||||
- alternative: *stanga
|
||||
|
||||
staircases:
|
||||
scara: &scara
|
||||
canonical: scara
|
||||
abbreviated: sc
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
alphanumeric:
|
||||
# For alphanumerics, Scara A, Scara 1, etc.
|
||||
default: *scara
|
||||
numeric_probability: 0.35 # e.g. Scara 1
|
||||
alpha_probability: 0.6 # e.g. Scara A
|
||||
numeric_plus_alpha_probability: 0.025 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.025 # e.g. A1
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
directional:
|
||||
direction: right # e.g. Scara Nord
|
||||
direction_probability: 0.8
|
||||
modifier:
|
||||
alternatives:
|
||||
- alternative: *nord
|
||||
- alternative: *sud
|
||||
- alternative: *est
|
||||
- alternative: *vest
|
||||
- alternative: *dreapta
|
||||
- alternative: *stanga
|
||||
|
||||
po_boxes:
|
||||
casuta_postala: &casuta_postala
|
||||
canonical: căsuță poștală
|
||||
abbreviated: cp
|
||||
sample: true
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.3
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.4 # e.g. CP Nr 1234
|
||||
numeric_probability: 1.0
|
||||
alphanumeric:
|
||||
sample: false
|
||||
default: *casuta_postala
|
||||
numeric_probability: 0.9 # e.g. CP 123
|
||||
alpha_probability: 0.05 # e.g. CP A
|
||||
numeric_plus_alpha_probability: 0.04 # e.g. CP 123G
|
||||
alpha_plus_numeric_probability: 0.01 # e.g. CP A123
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
digits:
|
||||
- length: 1
|
||||
probability: 0.05
|
||||
- length: 2
|
||||
probability: 0.1
|
||||
- length: 3
|
||||
probability: 0.2
|
||||
- length: 4
|
||||
probability: 0.5
|
||||
- length: 5
|
||||
probability: 0.1
|
||||
- length: 6
|
||||
probability: 0.05
|
||||
|
||||
units:
|
||||
apartament: &apartament
|
||||
canonical: apartament
|
||||
abbreviated: ap
|
||||
sample: true
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.6
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
sala: &sala
|
||||
canonical: sală
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
birou: &birou
|
||||
canonical: birou
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
lotul: &lotul
|
||||
canonical: lotul
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
alphanumeric: &unit_alphanumeric
|
||||
default: *apartament
|
||||
probability: 0.9
|
||||
sample: true
|
||||
alternatives:
|
||||
- alternative: *sala
|
||||
probability: 0.1
|
||||
|
||||
# Separate random probability for adding directions like 2D, 2S, etc.
|
||||
add_direction: true
|
||||
add_direction_probability: 0.1
|
||||
add_direction_numeric: true # Only for numbers
|
||||
|
||||
numeric_probability: 0.9 # e.g. ap 1
|
||||
numeric_plus_alpha_probability: 0.01 # e.g. ap 1A
|
||||
alpha_plus_numeric_probability: 0.01 # e.g. ap A1
|
||||
alpha_probability: 0.08 # e.g. ap A
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
|
||||
zones:
|
||||
residential: *unit_alphanumeric
|
||||
commercial:
|
||||
default: *birou
|
||||
numeric_probability: 0.9 # e.g. Birou 1
|
||||
numeric_plus_alpha_probability: 0.01 # e.g. Birou 1A
|
||||
alpha_plus_numeric_probability: 0.01 # e.g. Birou A1
|
||||
alpha_probability: 0.08 # e.g. Birou A
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
industrial:
|
||||
default: *lotul
|
||||
probability: 0.5
|
||||
alternatives:
|
||||
- alternative: *birou
|
||||
probability: 0.3
|
||||
- alternative: *sala
|
||||
probability: 0.2
|
||||
|
||||
numeric_probability: 0.9 # e.g. Lotul 1
|
||||
numeric_plus_alpha_probability: 0.01 # e.g. Lotul 1A
|
||||
alpha_plus_numeric_probability: 0.01 # e.g. Lotul A1
|
||||
alpha_probability: 0.08 # e.g. Lotul A
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
university:
|
||||
default: *sala
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *birou
|
||||
probability: 0.1
|
||||
numeric_probability: 0.9 # e.g. Sala 1
|
||||
numeric_plus_alpha_probability: 0.01 # e.g. Sala 1A
|
||||
alpha_plus_numeric_probability: 0.01 # e.g. Sala A1
|
||||
alpha_probability: 0.08 # e.g. Sala A
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
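Review aid for the phrase entries in the file above: in every entry shown in this diff, `canonical_probability`, `abbreviated_probability` and `sample_probability` (whichever are present) add up to 1. The sketch below checks that property for a single entry. Treating it as a hard rule is an assumption drawn from the data rather than something the config states, and `phrase_probability_errors` is a hypothetical helper, not part of the project.

```python
import math

PHRASE_KEYS = (
    "canonical_probability",
    "abbreviated_probability",
    "sample_probability",
)


def phrase_probability_errors(name, phrase):
    """Return a list of problems found in one phrase entry (empty if none)."""
    present = [phrase[key] for key in PHRASE_KEYS if key in phrase]
    if not present:
        # Entries such as `canonical: din` carry no weights; nothing to check.
        return []
    total = sum(present)
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        return [f"{name}: probabilities sum to {total}, expected 1.0"]
    return []


if __name__ == "__main__":
    # Values copied from the Romanian `apartament` entry.
    apartament = {
        "canonical": "apartament",
        "abbreviated": "ap",
        "sample": True,
        "canonical_probability": 0.2,
        "abbreviated_probability": 0.6,
        "sample_probability": 0.2,
    }
    print(phrase_probability_errors("apartament", apartament))
```

A check of this kind only covers the three phrase-level weights; keys such as `add_direction_probability` are independent draws and are deliberately left out.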
1171
resources/addresses/ru.yaml
Normal file
File diff suppressed because it is too large
603
resources/addresses/sk.yaml
Normal file
@@ -0,0 +1,603 @@
|
||||
# sk.yaml
|
||||
# -------
|
||||
# Slovak language specification
|
||||
|
||||
components:
|
||||
level:
|
||||
null_probability: 0.95
|
||||
alphanumeric_probability: 0.04
|
||||
standalone_probability: 0.01
|
||||
|
||||
staircase:
|
||||
null_probability: 0.99
|
||||
alphanumeric_probability: 0.01
|
||||
|
||||
entrance:
|
||||
null_probability: 0.999
|
||||
alphanumeric_probability: 0.001
|
||||
|
||||
unit:
|
||||
null_probability: 0.9
|
||||
alphanumeric_probability: 0.1
|
||||
|
||||
# Note: no combinations because of the house numbering scheme
|
||||
|
||||
|
||||
numbers:
|
||||
default: &cislo
|
||||
canonical: číslo
|
||||
abbreviated: č
|
||||
sample: true
|
||||
# Probabilities
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.6
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
numeric_affix:
|
||||
affix: "č."
|
||||
direction: left
|
||||
numeric_probability: 0.4
|
||||
numeric_affix_probability: 0.6
|
||||
|
||||
|
||||
and:
|
||||
default: &a
|
||||
canonical: a
|
||||
abbreviated: "&"
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.75
|
||||
sample: true
|
||||
sample_probability: 0.05
|
||||
|
||||
conscription_numbers:
|
||||
alphanumeric:
|
||||
default:
|
||||
canonical: súpisné číslo
|
||||
abbreviated: s.č.
|
||||
canonical_probability: 0.05
|
||||
abbreviated_probability: 0.85
|
||||
sample: true
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
cross_streets:
|
||||
and: *a
|
||||
at: &na
|
||||
canonical: na
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
corner_of: &rohu
|
||||
canonical: rohu
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
corner: &roh
|
||||
canonical: roh
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
at_the_corner_of: &na_rohu
|
||||
canonical: na rohu
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
intersection:
|
||||
default: *a
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative: *na
|
||||
probability: 0.1
|
||||
- alternative: *roh
|
||||
probability: 0.1
|
||||
- alternative: *rohu
|
||||
probability: 0.1
|
||||
- alternative: *na_rohu
|
||||
probability: 0.1
|
||||
|
||||
between:
|
||||
canonical: medzi
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
parentheses_probability: 0.5
|
||||
|
||||
levels:
|
||||
floor: &poschodie
|
||||
canonical: poschodie
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
direction_probability: 0.9
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.3
|
||||
roman_numeral_probability: 0.7
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
numeric_probability: 0.4
|
||||
ordinal_probability: 0.6
|
||||
podlazie: &podlazie
|
||||
canonical: podlažie
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
direction_probability: 0.9
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.3
|
||||
roman_numeral_probability: 0.7
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
numeric_probability: 0.4
|
||||
ordinal_probability: 0.6
|
||||
|
||||
nadzemne_podlazie: &nadzemne_podlazie
|
||||
canonical: nadzemné podlažie
|
||||
abbreviated: np
|
||||
sample: true
|
||||
canonical_probability: 0.1
|
||||
abbreviated_probability: 0.8
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
direction_probability: 0.9
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
numeric_probability: 0.4
|
||||
ordinal_probability: 0.6
|
||||
etaz: &etaz
|
||||
canonical: etáž
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
direction_probability: 0.9
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
numeric_probability: 0.4
|
||||
ordinal_probability: 0.6
|
||||
prizemie: &prizemie
|
||||
canonical: prízemie
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
podzemne_podlazie: &podzemne_podlazie
|
||||
canonical: podzemné podlažie
|
||||
abbreviated: pp
|
||||
sample: true
|
||||
canonical_probability: 0.5
|
||||
abbreviated_probability: 0.2
|
||||
sample_probability: 0.3
|
||||
# e.g. podzemné podlažie 1
|
||||
numeric:
|
||||
direction: left
|
||||
direction_probability: 0.8
|
||||
# e.g. pp1
|
||||
numeric_affix:
|
||||
affix: pp
|
||||
direction: left
|
||||
# e.g. 1. podzemné podlažie
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
standalone_probability: 0.985
|
||||
number_abs_value: true
|
||||
number_min_abs_value: 1
|
||||
numeric_probability: 0.005
|
||||
numeric_affix_probability: 0.005
|
||||
ordinal_probability: 0.005
|
||||
aliases:
|
||||
"<-1":
|
||||
default: *podzemne_podlazie
|
||||
"-1":
|
||||
default: *podzemne_podlazie
|
||||
"0":
|
||||
default: *prizemie
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *poschodie
|
||||
probability: 0.05
|
||||
- alternative: *podlazie
|
||||
probability: 0.05
|
||||
|
||||
numbering_starts_at: 0
|
||||
|
||||
alphanumeric:
|
||||
default: *poschodie
|
||||
probability: 0.45
|
||||
alternatives:
|
||||
- alternative: *podlazie
|
||||
probability: 0.35
|
||||
- alternative: *nadzemne_podlazie
|
||||
probability: 0.19
|
||||
- alternative: *etaz
|
||||
probability: 0.01
|
||||
numeric_probability: 0.99 # With this probability, pick an integer
|
||||
alpha_probability: 0.0098 # With this probability, pick a letter e.g. A
|
||||
numeric_plus_alpha_probability: 0.0001 # e.g. 2A
|
||||
alpha_plus_numeric_probability: 0.0001 # e.g. A2
|
||||
|
||||
categories:
|
||||
near:
|
||||
default:
|
||||
canonical: v blízkosti
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.7
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: u
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.2
|
||||
- alternative:
|
||||
canonical: v okolí
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
probability: 0.05
|
||||
- alternative:
|
||||
canonical: okolo
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
probability: 0.05
|
||||
nearby:
|
||||
default:
|
||||
canonical: blízkosti
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.4
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: blízko
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.2
|
||||
- alternative:
|
||||
canonical: v blízkosti
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: tady blízkosti
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: tady
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.05
|
||||
- alternative:
|
||||
canonical: tu
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.05
|
||||
- alternative:
|
||||
canonical: v blízkosti tu
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.05
|
||||
- alternative:
|
||||
canonical: v okolí
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.05
|
||||
near_me:
|
||||
default:
|
||||
canonical: v blízkosti mne
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
|
||||
# Don't worry about agreement
|
||||
in:
|
||||
default:
|
||||
canonical: v
|
||||
probability: 0.7
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: vo
|
||||
probability: 0.3
|
||||
|
||||
# Probabilities of each phrase
|
||||
near_probability: 0.35
|
||||
nearby_probability: 0.2
|
||||
near_me_probability: 0.1
|
||||
in_probability: 0.35
|
||||
|
||||
directions:
|
||||
right: &prava
|
||||
canonical: pravá
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
left: &lava
|
||||
canonical: ľavá
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
alternatives:
|
||||
- alternative: *prava
|
||||
probability: 0.5
|
||||
- alternative: *lava
|
||||
probability: 0.5
|
||||
|
||||
cardinal_directions:
|
||||
east: &vychod
|
||||
canonical: východ
|
||||
abbreviated: v
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: v
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
west: &zapad
|
||||
canonical: západ
|
||||
abbreviated: z
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: z
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
north: &sever
|
||||
canonical: sever
|
||||
abbreviated: s
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: s
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
south: &juh
|
||||
canonical: juh
|
||||
abbreviated: j
|
||||
sample: true
|
||||
canonical_probability: 0.75
|
||||
abbreviated_probability: 0.1
|
||||
sample_probability: 0.15
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: j
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
alternatives:
|
||||
- alternative: *sever
|
||||
probability: 0.25
|
||||
- alternative: *vychod
|
||||
probability: 0.25
|
||||
- alternative: *juh
|
||||
probability: 0.25
|
||||
- alternative: *zapad
|
||||
probability: 0.25
|
||||
|
||||
entrances:
|
||||
vchod: &vchod
|
||||
canonical: vchod
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
# Vchod 1, Vchod A, etc.
|
||||
alphanumeric: &entrance_alphanumeric
|
||||
default: *vchod
|
||||
numeric_probability: 0.1 # e.g. Vchod 1
|
||||
alpha_probability: 0.85 # e.g. Vchod A
|
||||
numeric_plus_alpha_probability: 0.025 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.025 # e.g. A1
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
staircases:
|
||||
schodisko: &schodisko
|
||||
canonical: schodisko
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
alphanumeric: &staircase_alphanumeric
|
||||
default: *schodisko
|
||||
numeric_probability: 0.75
|
||||
alpha_probability: 0.2
|
||||
numeric_plus_alpha_probability: 0.025
|
||||
alpha_plus_numeric_probability: 0.025
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
directional:
|
||||
direction: left
|
||||
direction_probability: 0.85
|
||||
modifier:
|
||||
alternatives:
|
||||
- alternative: *sever
|
||||
- alternative: *juh
|
||||
- alternative: *vychod
|
||||
- alternative: *zapad
|
||||
|
||||
po_boxes:
|
||||
postova_priehradka: &postova_priehradka
|
||||
canonical: poštová priehradka
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2 # poštová priehradka 1234
|
||||
alphanumeric:
|
||||
default: *postova_priehradka
|
||||
numeric_probability: 0.9 # poštová priehradka 123
|
||||
alpha_probability: 0.05 # poštová priehradka A
|
||||
numeric_plus_alpha_probability: 0.04 # poštová priehradka 123G
|
||||
alpha_plus_numeric_probability: 0.01 # poštová priehradka A123
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
digits:
|
||||
- length: 1
|
||||
probability: 0.05
|
||||
- length: 2
|
||||
probability: 0.1
|
||||
- length: 3
|
||||
probability: 0.2
|
||||
- length: 4
|
||||
probability: 0.5
|
||||
- length: 5
|
||||
probability: 0.1
|
||||
- length: 6
|
||||
probability: 0.05
|
||||
|
||||
units:
|
||||
apartaman: &apartaman
|
||||
canonical: apartmán
|
||||
abbreviated: apt
|
||||
sample: true
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.5
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
izba: &izba
|
||||
canonical: izba
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
kancelaria: &kancelaria
|
||||
canonical: kancelária
|
||||
sample: true
|
||||
canonical_probability: 0.6
|
||||
sample_probability: 0.4
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
alphanumeric: &unit_alphanumeric
|
||||
default: *apartaman
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *izba
|
||||
probability: 0.1
|
||||
numeric_probability: 0.9 # e.g. apt. 1
|
||||
numeric_plus_alpha_probability: 0.03 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.03 # e.g. A1
|
||||
alpha_probability: 0.04 # e.g. apt. A
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
# If there are 10 floors, create unit numbers like #301 or #1032
|
||||
use_floor_probability: 0.01
|
||||
|
||||
zones:
|
||||
commercial: &commercial_unit_types
|
||||
default: *izba
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative: *kancelaria
|
||||
probability: 0.4
|
||||
numeric_probability: 0.95 # e.g. izba 1
|
||||
numeric_plus_alpha_probability: 0.01 # e.g. izba 1A
|
||||
alpha_plus_numeric_probability: 0.01 # e.g. izba A1
|
||||
alpha_probability: 0.03 # e.g. izba A
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
university:
|
||||
default: *izba
|
||||
numeric_probability: 0.95 # e.g. izba 1
|
||||
numeric_plus_alpha_probability: 0.01 # e.g. izba 1A
|
||||
alpha_plus_numeric_probability: 0.01 # e.g. izba A1
|
||||
alpha_probability: 0.03 # e.g. izba A
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
539
resources/addresses/sl.yaml
Normal file
@@ -0,0 +1,539 @@
|
||||
# sl.yaml
|
||||
# -------
|
||||
# Slovenian language specification
|
||||
|
||||
components:
|
||||
level:
|
||||
null_probability: 0.9
|
||||
alphanumeric_probability: 0.1
|
||||
|
||||
staircase:
|
||||
null_probability: 0.99
|
||||
alphanumeric_probability: 0.01
|
||||
|
||||
entrance:
|
||||
null_probability: 0.999
|
||||
alphanumeric_probability: 0.001
|
||||
|
||||
unit:
|
||||
null_probability: 0.7
|
||||
alphanumeric_probability: 0.3
|
||||
|
||||
combinations:
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- staircase
|
||||
- level
|
||||
- unit
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: "/"
|
||||
probability: 0.95
|
||||
- separator: "-"
|
||||
probability: 0.05
|
||||
probability: 0.005
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- level
|
||||
- unit
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: "/"
|
||||
probability: 0.95
|
||||
- separator: "-"
|
||||
probability: 0.05
|
||||
probability: 0.005
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- level
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: "/"
|
||||
probability: 0.95
|
||||
- separator: "-"
|
||||
probability: 0.05
|
||||
probability: 0.1
|
||||
# For unit types like 2/34
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- unit
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: "/"
|
||||
probability: 0.95
|
||||
- separator: "-"
|
||||
probability: 0.05
|
||||
probability: 0.005
|
||||
|
||||
|
||||
numbers:
|
||||
no_number:
|
||||
default:
|
||||
canonical: brez številke
|
||||
abbreviated: brez št
|
||||
sample: true
|
||||
canonical_probability: 0.5
|
||||
abbreviated_probability: 0.3
|
||||
sample_probability: 0.2
|
||||
|
||||
default: &stevilke
|
||||
canonical: številke
|
||||
abbreviated: št
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.6
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
numeric_affix:
|
||||
affix: "št."
|
||||
whitespace_probability: 0.6
|
||||
direction: left
|
||||
numeric_probability: 0.6
|
||||
numeric_affix_probability: 0.4
|
||||
|
||||
alphanumeric_phrase_probability: 0.05
|
||||
no_number_probability: 0.05
|
||||
|
||||
|
||||
and:
|
||||
default: &in
|
||||
canonical: in
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
|
||||
|
||||
cross_streets:
|
||||
i: *in
|
||||
at: &na
|
||||
canonical: na
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
corner: &vogalu
|
||||
canonical: vogalu
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
na_vogalu: &na_vogalu
|
||||
canonical: na vogalu
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
intersection:
|
||||
default: *in
|
||||
probability: 0.7
|
||||
alternatives:
|
||||
- alternative: *na
|
||||
probability: 0.1
|
||||
- alternative: *vogalu
|
||||
probability: 0.15
|
||||
- alternative: *na_vogalu
|
||||
probability: 0.05
|
||||
|
||||
med: &med
|
||||
canonical: med
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
parentheses_probability: 0.5
|
||||
between:
|
||||
default: *med
|
||||
|
||||
levels:
|
||||
nadstropje: &nadstropje
|
||||
canonical: nadstropje
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
direction_probability: 0.9
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.3
|
||||
roman_numeral_probability: 0.7
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
numeric_probability: 0.4
|
||||
ordinal_probability: 0.6
|
||||
|
||||
pritlicje: &pritlicje
|
||||
canonical: pritličje
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
parter: &parter
|
||||
canonical: parter
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
kleti: &kleti
|
||||
canonical: kleti
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
# e.g. kleti 1
|
||||
numeric:
|
||||
direction: left
|
||||
direction_probability: 0.8
|
||||
# e.g. 1. kleti
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
standalone_probability: 0.99
|
||||
number_abs_value: true
|
||||
number_min_abs_value: 1
|
||||
numeric_probability: 0.005
|
||||
ordinal_probability: 0.005
|
||||
|
||||
aliases:
|
||||
"<-1":
|
||||
default: *kleti
|
||||
"-1":
|
||||
default: *kleti
|
||||
"0":
|
||||
default: *pritlicje
|
||||
probability: 0.5
|
||||
alternatives:
|
||||
- alternative: *parter
|
||||
probability: 0.4
|
||||
- alternative: *nadstropje
|
||||
probability: 0.1
|
||||
|
||||
numbering_starts_at: 0
|
||||
|
||||
alphanumeric:
|
||||
default: *nadstropje
|
||||
numeric_probability: 0.69 # With this probability, pick an integer
|
||||
roman_numeral_probability: 0.3 # Pick a Roman numeral for the actual value
|
||||
alpha_probability: 0.0098 # With this probability, pick a letter e.g. A
|
||||
numeric_plus_alpha_probability: 0.0001 # e.g. 2A
|
||||
alpha_plus_numeric_probability: 0.0001 # e.g. A2
|
||||
|
||||
|
||||
categories:
|
||||
near:
|
||||
default:
|
||||
canonical: v bližini
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: pri
|
||||
probability: 0.4
|
||||
|
||||
nearby:
|
||||
default:
|
||||
canonical: v bližini
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.5
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: v bližini tukaj
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.3
|
||||
- alternative:
|
||||
canonical: okoli tukaj
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: tukaj
|
||||
probability: 0.1
|
||||
|
||||
near_me:
|
||||
default:
|
||||
canonical: blizu mene
|
||||
|
||||
# Don't worry about agreement
|
||||
in:
|
||||
default:
|
||||
canonical: v
|
||||
|
||||
# Probabilities of each phrase
|
||||
near_probability: 0.35
|
||||
nearby_probability: 0.2
|
||||
near_me_probability: 0.1
|
||||
in_probability: 0.35
|
||||
|
||||
directions:
|
||||
right: &prav
|
||||
canonical: prav
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
left: &levo
|
||||
canonical: levo
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
alternatives:
|
||||
- alternative: *prav
|
||||
probability: 0.5
|
||||
- alternative: *levo
|
||||
probability: 0.5
|
||||
|
||||
cardinal_directions:
|
||||
east: &vzhod
|
||||
canonical: vzhod
|
||||
abbreviated: v
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: v
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
west: &zahod
|
||||
canonical: zahod
|
||||
abbreviated: z
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: z
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
north: &sever
|
||||
canonical: sever
|
||||
abbreviated: s
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: s
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
south: &jug
|
||||
canonical: jug
|
||||
abbreviated: j
|
||||
sample: true
|
||||
canonical_probability: 0.75
|
||||
abbreviated_probability: 0.1
|
||||
sample_probability: 0.15
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: j
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
alternatives:
|
||||
- alternative: *sever
|
||||
probability: 0.25
|
||||
- alternative: *vzhod
|
||||
probability: 0.25
|
||||
- alternative: *jug
|
||||
probability: 0.25
|
||||
- alternative: *zahod
|
||||
probability: 0.25
|
||||
|
||||
entrances:
|
||||
vhod: &vhod
|
||||
canonical: vhod
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
# Vhod 1, Vhod A, etc.
|
||||
alphanumeric: &entrance_alphanumeric
|
||||
default: *vhod
|
||||
numeric_probability: 0.1 # e.g. Vhod 1
|
||||
alpha_probability: 0.85 # e.g. Vhod A
|
||||
numeric_plus_alpha_probability: 0.025 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.025 # e.g. A1
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
|
||||
staircases:
|
||||
stopnisce: &stopnisce
|
||||
canonical: stopnišče
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
|
||||
alphanumeric: &staircase_alphanumeric
|
||||
default: *stopnisce
|
||||
numeric_probability: 0.75
|
||||
alpha_probability: 0.2
|
||||
numeric_plus_alpha_probability: 0.025
|
||||
alpha_plus_numeric_probability: 0.025
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
directional:
|
||||
direction: right
|
||||
direction_probability: 0.85
|
||||
modifier:
|
||||
alternatives:
|
||||
- alternative: *prav
|
||||
probability: 0.2
|
||||
- alternative: *levo
|
||||
probability: 0.2
|
||||
- alternative: *sever
|
||||
probability: 0.15
|
||||
- alternative: *jug
|
||||
probability: 0.15
|
||||
- alternative: *vzhod
|
||||
probability: 0.15
|
||||
- alternative: *zahod
|
||||
probability: 0.15
|
||||
|
||||
po_boxes:
|
||||
postni_predal: &postni_predal
|
||||
canonical: poštni predal
|
||||
abbreviated: p.p
|
||||
sample: true
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.4
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2
|
||||
|
||||
alphanumeric:
|
||||
default: *postni_predal
|
||||
numeric_probability: 0.9 # p.p 123
|
||||
alpha_probability: 0.05 # p.p A
|
||||
numeric_plus_alpha_probability: 0.04 # p.p 123G
|
||||
alpha_plus_numeric_probability: 0.01 # p.p A123
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
digits:
|
||||
- length: 1
|
||||
probability: 0.05
|
||||
- length: 2
|
||||
probability: 0.1
|
||||
- length: 3
|
||||
probability: 0.2
|
||||
- length: 4
|
||||
probability: 0.5
|
||||
- length: 5
|
||||
probability: 0.1
|
||||
- length: 6
|
||||
probability: 0.05
|
||||
|
||||
units:
|
||||
stanovanje: &stanovanje
|
||||
canonical: stanovanje
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
|
||||
soba: &soba
|
||||
canonical: soba
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
urad: &urad
|
||||
canonical: urad
|
||||
sample: true
|
||||
canonical_probability: 0.6
|
||||
sample_probability: 0.4
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
|
||||
alphanumeric: &unit_alphanumeric
|
||||
default: *stanovanje
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *soba
|
||||
probability: 0.1
|
||||
numeric_probability: 0.9 # e.g. stanovanje 1
|
||||
numeric_plus_alpha_probability: 0.03 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.03 # e.g. A1
|
||||
alpha_probability: 0.04 # e.g. stanovanje A
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
# If there are 10 floors, create unit numbers like #301 or #1032
|
||||
use_floor_probability: 0.05
|
||||
|
||||
zones:
|
||||
commercial: &commercial_unit_types
|
||||
default: *soba
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative: *urad
|
||||
probability: 0.4
|
||||
numeric_probability: 0.95 # e.g. soba 1
|
||||
numeric_plus_alpha_probability: 0.01 # e.g. soba 1A
|
||||
alpha_plus_numeric_probability: 0.01 # e.g. soba A1
|
||||
alpha_probability: 0.03 # e.g. soba A
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
university:
|
||||
default: *soba
|
||||
numeric_probability: 0.95 # e.g. soba 1
|
||||
numeric_plus_alpha_probability: 0.01 # e.g. soba 1A
|
||||
alpha_plus_numeric_probability: 0.01 # e.g. soba A1
|
||||
alpha_probability: 0.03 # e.g. soba A
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
953
resources/addresses/sr.yaml
Normal file
@@ -0,0 +1,953 @@
|
||||
# sr.yaml
|
||||
# -------
|
||||
# Serbian language specification
|
||||
|
||||
alphabet: абвгдђежзијклљмнњопрстћуфхцчџш
|
||||
alphanumeric_probability: 0.7
|
||||
|
||||
components:
|
||||
level:
|
||||
null_probability: 0.8
|
||||
alphanumeric_probability: 0.2
|
||||
|
||||
staircase:
|
||||
null_probability: 0.99
|
||||
alphanumeric_probability: 0.01
|
||||
|
||||
entrance:
|
||||
null_probability: 0.999
|
||||
alphanumeric_probability: 0.001
|
||||
|
||||
unit:
|
||||
null_probability: 0.7
|
||||
alphanumeric_probability: 0.3
|
||||
|
||||
|
||||
combinations:
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- staircase
|
||||
- level
|
||||
- unit
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: "/"
|
||||
probability: 0.95
|
||||
- separator: "-"
|
||||
probability: 0.05
|
||||
probability: 0.005
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- level
|
||||
- unit
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: "/"
|
||||
probability: 0.95
|
||||
- separator: "-"
|
||||
probability: 0.05
|
||||
probability: 0.005
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- level
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: "/"
|
||||
probability: 0.95
|
||||
- separator: "-"
|
||||
probability: 0.05
|
||||
probability: 0.1
|
||||
# For unit types like 2/34
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- unit
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: "/"
|
||||
probability: 0.95
|
||||
- separator: "-"
|
||||
probability: 0.05
|
||||
probability: 0.005
|
||||
|
||||
|
||||
|
||||
numbers:
|
||||
default: &broj
|
||||
canonical: број
|
||||
abbreviated: бр
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.6
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
numeric_affix:
|
||||
affix: "бр."
|
||||
direction: left
|
||||
numeric_probability: 0.4
|
||||
numeric_affix_probability: 0.6
|
||||
alternatives:
|
||||
- alternative: &broj_latin
|
||||
canonical: broj
|
||||
abbreviated: br
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.6
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
numeric_affix:
|
||||
affix: "br."
|
||||
direction: left
|
||||
numeric_probability: 0.4
|
||||
numeric_affix_probability: 0.6
|
||||
|
||||
|
||||
and:
|
||||
default: &i
|
||||
canonical: и
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: &i_latin
|
||||
canonical: i
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.1
|
||||
|
||||
|
||||
cross_streets:
|
||||
i: *i
|
||||
i_latin: *i_latin
|
||||
at: &na
|
||||
canonical: на
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
na_latin: &na_latin
|
||||
canonical: na
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
corner: &ugao
|
||||
canonical: угао
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
ugao_latin: &ugao_latin
|
||||
canonical: ugao
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
na_uglu: &na_uglu
|
||||
canonical: на углу
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
na_uglu_latin: &na_uglu_latin
|
||||
canonical: na uglu
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
intersection:
|
||||
default: *i
|
||||
probability: 0.65
|
||||
alternatives:
|
||||
- alternative: *i_latin
|
||||
probability: 0.05
|
||||
- alternative: *na
|
||||
probability: 0.075
|
||||
- alternative: *na_latin
|
||||
probability: 0.025
|
||||
- alternative: *ugao
|
||||
probability: 0.1
|
||||
- alternative: *ugao_latin
|
||||
probability: 0.05
|
||||
- alternative: *na_uglu
|
||||
probability: 0.025
|
||||
- alternative: *na_uglu_latin
|
||||
probability: 0.025
|
||||
izmedu: &izmedu
|
||||
canonical: између
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
parentheses_probability: 0.5
|
||||
izmedu_latin: &izmedu_latin
|
||||
canonical: između
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
parentheses_probability: 0.5
|
||||
between:
|
||||
default: *izmedu
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *izmedu_latin
|
||||
probability: 0.1
|
||||
|
||||
levels:
|
||||
sprat: &sprat
|
||||
canonical: спрат
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
direction_probability: 0.9
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.3
|
||||
roman_numeral_probability: 0.7
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
numeric_probability: 0.4
|
||||
ordinal_probability: 0.6
|
||||
sprat_latin: &sprat_latin
|
||||
canonical: sprat
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
direction_probability: 0.9
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.3
|
||||
roman_numeral_probability: 0.7
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
numeric_probability: 0.4
|
||||
ordinal_probability: 0.6
|
||||
kat: &kat
|
||||
canonical: кат
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
direction_probability: 0.9
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.3
|
||||
roman_numeral_probability: 0.7
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
numeric_probability: 0.4
|
||||
ordinal_probability: 0.6
|
||||
kat_latin: &kat_latin
|
||||
canonical: kat
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
direction_probability: 0.9
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.3
|
||||
roman_numeral_probability: 0.7
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
numeric_probability: 0.4
|
||||
ordinal_probability: 0.6
|
||||
etaza: &etaza
|
||||
canonical: етажа
|
||||
abbreviated: ет
|
||||
sample: true
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
direction_probability: 0.9
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.3
|
||||
roman_numeral_probability: 0.7
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
numeric_probability: 0.4
|
||||
ordinal_probability: 0.6
|
||||
etaza_latin: &etaza_latin
|
||||
canonical: etaža
|
||||
abbreviated: et
|
||||
sample: true
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
direction_probability: 0.9
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.3
|
||||
roman_numeral_probability: 0.7
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
numeric_probability: 0.4
|
||||
ordinal_probability: 0.6
|
||||
prizemlje: &prizemlje
|
||||
canonical: приземље
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
prizemlje_latin: &prizemlje_latin
|
||||
canonical: prizemlje
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
parter: &parter
|
||||
canonical: партер
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
parter_latin: &parter_latin
|
||||
canonical: parter
|
||||
sample: true
|
||||
canonical_probability: 0.9
|
||||
sample_probability: 0.1
|
||||
|
||||
podrum: &podrum
|
||||
canonical: подрум
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
# e.g. подрум 1
|
||||
numeric:
|
||||
direction: left
|
||||
direction_probability: 0.8
|
||||
# e.g. 1. подрум
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
standalone_probability: 0.99
|
||||
number_abs_value: true
|
||||
number_min_abs_value: 1
|
||||
numeric_probability: 0.005
|
||||
ordinal_probability: 0.005
|
||||
podrum_latin: &podrum_latin
|
||||
canonical: podrum
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
# e.g. podrum 1
|
||||
numeric:
|
||||
direction: left
|
||||
direction_probability: 0.8
|
||||
# e.g. 1. podrum
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
standalone_probability: 0.99
|
||||
number_abs_value: true
|
||||
number_min_abs_value: 1
|
||||
numeric_probability: 0.005
|
||||
ordinal_probability: 0.005
|
||||
|
||||
aliases:
|
||||
"<-1":
|
||||
default: *podrum
|
||||
probability: 0.8
|
||||
alternatives:
|
||||
- alternative: *podrum_latin
|
||||
probability: 0.2
|
||||
"-1":
|
||||
default: *podrum
|
||||
probability: 0.8
|
||||
alternatives:
|
||||
- alternative: *podrum_latin
|
||||
probability: 0.2
|
||||
"0":
|
||||
default: *prizemlje
|
||||
probability: 0.45
|
||||
alternatives:
|
||||
- alternative: *prizemlje_latin
|
||||
probability: 0.05
|
||||
- alternative: *parter
|
||||
probability: 0.35
|
||||
- alternative: *parter_latin
|
||||
probability: 0.05
|
||||
- alternative: *sprat
|
||||
probability: 0.04
|
||||
- alternative: *sprat_latin
|
||||
probability: 0.01
|
||||
- alternative: *kat
|
||||
probability: 0.04
|
||||
- alternative: *kat_latin
|
||||
probability: 0.01
|
||||
|
||||
numbering_starts_at: 0
|
||||
|
||||
alphanumeric:
|
||||
default: *sprat
|
||||
probability: 0.65
|
||||
alternatives:
|
||||
- alternative: *sprat_latin
|
||||
probability: 0.1
|
||||
- alternative: *kat
|
||||
probability: 0.15
|
||||
- alternative: *kat_latin
|
||||
probability: 0.05
|
||||
- alternative: *etaza
|
||||
probability: 0.04
|
||||
- alternative: *etaza_latin
|
||||
probability: 0.01
|
||||
numeric_probability: 0.69 # With this probability, pick an integer
|
||||
roman_numeral_probability: 0.3 # Pick a Roman numeral for the actual value
|
||||
alpha_probability: 0.0098 # With this probability, pick a letter e.g. A
|
||||
numeric_plus_alpha_probability: 0.0001 # e.g. 2A
|
||||
alpha_plus_numeric_probability: 0.0001 # e.g. A2
|
||||
|
||||
directions:
|
||||
right: &desno
|
||||
canonical: десно
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
desno_latin: &desno_latin
|
||||
canonical: desno
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
left: &levo
|
||||
canonical: лево
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
levo_latin: &levo_latin
|
||||
canonical: levo
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
alternatives:
|
||||
- alternative: *desno
|
||||
probability: 0.45
|
||||
- alternative: *desno_latin
|
||||
probability: 0.05
|
||||
- alternative: *levo
|
||||
probability: 0.45
|
||||
- alternative: *levo_latin
|
||||
probability: 0.05
|
||||
|
||||
cardinal_directions:
|
||||
east: &istok
|
||||
canonical: исток
|
||||
abbreviated: и
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: и
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
istok_latin: &istok_latin
|
||||
canonical: istok
|
||||
abbreviated: i
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: i
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
west: &zapad
|
||||
canonical: запад
|
||||
abbreviated: з
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: з
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
zapad_latin: &zapad_latin
|
||||
canonical: zapad
|
||||
abbreviated: z
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: z
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
north: &sever
|
||||
canonical: север
|
||||
abbreviated: с
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: с
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
sever_latin: &sever_latin
|
||||
canonical: sever
|
||||
abbreviated: s
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: s
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
south: &jug
|
||||
canonical: југ
|
||||
abbreviated: ј
|
||||
sample: true
|
||||
canonical_probability: 0.75
|
||||
abbreviated_probability: 0.1
|
||||
sample_probability: 0.15
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: ј
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
jug_latin: &jug_latin
|
||||
canonical: jug
|
||||
abbreviated: j
|
||||
sample: true
|
||||
canonical_probability: 0.75
|
||||
abbreviated_probability: 0.1
|
||||
sample_probability: 0.15
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: j
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
alternatives:
|
||||
- alternative: *sever
|
||||
probability: 0.23
|
||||
- alternative: *sever_latin
|
||||
probability: 0.02
|
||||
- alternative: *istok
|
||||
probability: 0.23
|
||||
- alternative: *istok_latin
|
||||
probability: 0.02
|
||||
- alternative: *jug
|
||||
probability: 0.23
|
||||
- alternative: *jug_latin
|
||||
probability: 0.02
|
||||
- alternative: *zapad
|
||||
probability: 0.23
|
||||
- alternative: *zapad_latin
|
||||
probability: 0.02
|
||||
|
||||
entrances:
|
||||
ulaz: &ulaz
|
||||
canonical: улаз
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
ulaz_latin: &ulaz_latin
|
||||
canonical: ulaz
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
# Ulaz 1, Ulaz A, etc.
|
||||
alphanumeric: &entrance_alphanumeric
|
||||
default: *ulaz
|
||||
probability: 0.8
|
||||
alternatives:
|
||||
- alternative: *ulaz_latin
|
||||
probability: 0.2
|
||||
numeric_probability: 0.1 # e.g. Ulaz 1
|
||||
alpha_probability: 0.85 # e.g. Ulaz A
|
||||
numeric_plus_alpha_probability: 0.025 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.025 # e.g. A1
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
|
||||
|
||||
staircases:
|
||||
stepeniste: &stepeniste
|
||||
canonical: степениште
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
stepeniste_latin: &stepeniste_latin
|
||||
canonical: stepenište
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
|
||||
alphanumeric: &staircase_alphanumeric
|
||||
default: *stepeniste
|
||||
probability: 0.8
|
||||
alternatives:
|
||||
- alternative: *stepeniste_latin
|
||||
probability: 0.2
|
||||
numeric_probability: 0.75
|
||||
alpha_probability: 0.2
|
||||
numeric_plus_alpha_probability: 0.025
|
||||
alpha_plus_numeric_probability: 0.025
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
directional:
|
||||
direction: right
|
||||
direction_probability: 0.85
|
||||
modifier:
|
||||
alternatives:
|
||||
- alternative: *desno
|
||||
probability: 0.19
|
||||
- alternative: *desno_latin
|
||||
probability: 0.01
|
||||
- alternative: *levo
|
||||
probability: 0.19
|
||||
- alternative: *levo_latin
|
||||
probability: 0.01
|
||||
- alternative: *sever
|
||||
probability: 0.14
|
||||
- alternative: *sever_latin
|
||||
probability: 0.01
|
||||
- alternative: *jug
|
||||
probability: 0.14
|
||||
- alternative: *jug_latin
|
||||
probability: 0.01
|
||||
- alternative: *istok
|
||||
probability: 0.14
|
||||
- alternative: *istok_latin
|
||||
probability: 0.01
|
||||
- alternative: *zapad
|
||||
probability: 0.14
|
||||
- alternative: *zapad_latin
|
||||
probability: 0.01
|
||||
|
||||
po_boxes:
|
||||
postanski_fah: &postanski_fah
|
||||
canonical: поштански фах
|
||||
abbreviated: пф
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2 # поштански фах бр. 1234
|
||||
postanski_fah_latin: &postanski_fah_latin
|
||||
canonical: poštanski fah
|
||||
abbreviated: pf
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2 # poštanski fah br. 1234
|
||||
postanski_pretinac: &postanski_pretinac
|
||||
canonical: поштански претинац
|
||||
sample: true
|
||||
canonical_probability: 0.6
|
||||
sample_probability: 0.4
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2
|
||||
postanski_pretinac_latin: &postanski_pretinac_latin
|
||||
canonical: poštanski pretinac
|
||||
sample: true
|
||||
canonical_probability: 0.6
|
||||
sample_probability: 0.4
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2
|
||||
postanski_pregradak: &postanski_pregradak
|
||||
canonical: поштански преградак
|
||||
sample: true
|
||||
canonical_probability: 0.6
|
||||
sample_probability: 0.4
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2
|
||||
postanski_pregradak_latin: &postanski_pregradak_latin
|
||||
canonical: poštanski pregradak
|
||||
sample: true
|
||||
canonical_probability: 0.6
|
||||
sample_probability: 0.4
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2
|
||||
|
||||
alphanumeric:
|
||||
default: *postanski_fah
|
||||
probability: 0.7
|
||||
alternatives:
|
||||
- alternative: *postanski_fah_latin
|
||||
probability: 0.05
|
||||
- alternative: *postanski_pretinac
|
||||
probability: 0.1
|
||||
- alternative: *postanski_pretinac_latin
|
||||
probability: 0.05
|
||||
- alternative: *postanski_pregradak
|
||||
probability: 0.075
|
||||
- alternative: *postanski_pregradak_latin
|
||||
probability: 0.025
|
||||
numeric_probability: 0.9 # pf 123
|
||||
alpha_probability: 0.05 # pf A
|
||||
numeric_plus_alpha_probability: 0.04 # pf 123G
|
||||
alpha_plus_numeric_probability: 0.01 # pf A123
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
digits:
|
||||
- length: 1
|
||||
probability: 0.05
|
||||
- length: 2
|
||||
probability: 0.1
|
||||
- length: 3
|
||||
probability: 0.2
|
||||
- length: 4
|
||||
probability: 0.5
|
||||
- length: 5
|
||||
probability: 0.1
|
||||
- length: 6
|
||||
probability: 0.05
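Each phrase entry above carries a `canonical_probability`, an optional `abbreviated_probability`, and an optional `sample_probability`. A minimal sketch of a sanity check follows, assuming these form probabilities are meant to sum to 1 for every phrase. The helper names `form_probability_total` and `forms_sum_to_one` are hypothetical and not part of libpostal; the dictionary mirrors what a YAML loader would return for the `postanski_fah` entry.

```python
import math

# Keys that describe how often each surface form of a phrase is used.
FORM_KEYS = (
    "canonical_probability",
    "abbreviated_probability",
    "sample_probability",
)


def form_probability_total(phrase):
    """Sum whichever form probabilities the phrase defines."""
    return sum(phrase.get(key, 0.0) for key in FORM_KEYS)


def forms_sum_to_one(phrase, tol=1e-9):
    """True when the form probabilities add up to 1 within a small tolerance.

    Assumption: the three form probabilities are a distribution over forms.
    """
    return math.isclose(form_probability_total(phrase), 1.0, abs_tol=tol)


# Values copied from the postanski_fah entry.
postanski_fah = {
    "canonical": "поштански фах",
    "abbreviated": "пф",
    "sample": True,
    "canonical_probability": 0.3,
    "abbreviated_probability": 0.4,
    "sample_probability": 0.3,
}

print(forms_sum_to_one(postanski_fah))
```

Running a check like this over every anchored phrase in a language file would flag entries whose form probabilities drift away from 1 after an edit.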
|
||||
|
||||
units:
|
||||
stan: &stan
|
||||
canonical: стан
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
stan_latin: &stan_latin
|
||||
canonical: stan
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
apartman: &apartman
|
||||
canonical: апартман
|
||||
abbreviated: ап
|
||||
sample: true
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.2
|
||||
sample_probability: 0.4
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
|
||||
apartman_latin: &apartman_latin
|
||||
canonical: apartman
|
||||
abbreviated: ap
|
||||
sample: true
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.2
|
||||
sample_probability: 0.4
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
|
||||
soba: &soba
|
||||
canonical: соба
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
soba_latin: &soba_latin
|
||||
canonical: soba
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
kancelarija: &kancelarija
|
||||
canonical: канцеларија
|
||||
sample: true
|
||||
canonical_probability: 0.6
|
||||
sample_probability: 0.4
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
kancelarija_latin: &kancelarija_latin
|
||||
canonical: kancelarija
|
||||
sample: true
|
||||
canonical_probability: 0.6
|
||||
sample_probability: 0.4
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
|
||||
alphanumeric: &unit_alphanumeric
|
||||
default: *stan
|
||||
probability: 0.5
|
||||
alternatives:
|
||||
- alternative: *stan_latin
|
||||
probability: 0.1
|
||||
- alternative: *apartman
|
||||
probability: 0.2
|
||||
- alternative: *apartman_latin
|
||||
probability: 0.05
|
||||
- alternative: *soba
|
||||
probability: 0.1
|
||||
- alternative: *soba_latin
|
||||
probability: 0.05
|
||||
numeric_probability: 0.9 # e.g. stan. 1
|
||||
numeric_plus_alpha_probability: 0.03 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.03 # e.g. A1
|
||||
alpha_probability: 0.04 # e.g. stan A
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
# If there are 10 floors, create unit numbers like #301 or #1032
|
||||
use_floor_probability: 0.01
|
||||
|
||||
zones:
|
||||
commercial: &commercial_unit_types
|
||||
default: *soba
|
||||
probability: 0.55
|
||||
alternatives:
|
||||
- alternative: *soba_latin
|
||||
probability: 0.05
|
||||
- alternative: *kancelarija
|
||||
probability: 0.35
|
||||
- alternative: *kancelarija_latin
|
||||
probability: 0.05
|
||||
numeric_probability: 0.95 # e.g. soba 1
|
||||
numeric_plus_alpha_probability: 0.01 # e.g. soba 1A
|
||||
alpha_plus_numeric_probability: 0.01 # e.g. soba A1
|
||||
alpha_probability: 0.03 # e.g. soba A
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
university:
|
||||
default: *soba
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *soba_latin
|
||||
probability: 0.1
|
||||
numeric_probability: 0.95 # e.g. soba 1
|
||||
numeric_plus_alpha_probability: 0.01 # e.g. soba 1A
|
||||
alpha_plus_numeric_probability: 0.01 # e.g. soba A1
|
||||
alpha_probability: 0.03 # e.g. soba A
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
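The `default` / `probability` / `alternatives` blocks used throughout this file describe a weighted choice between phrases. The sketch below shows one way such a block could be sampled, assuming the block has already been loaded into plain Python dictionaries with YAML aliases resolved. `choose_phrase` is a hypothetical helper for illustration, not libpostal's own sampling code.

```python
import random

# Hypothetical in-memory form of the zones.university block above,
# with the *soba and *soba_latin aliases already resolved.
university_units = {
    "default": {"canonical": "соба"},
    "probability": 0.9,
    "alternatives": [
        {"alternative": {"canonical": "soba"}, "probability": 0.1},
    ],
}


def choose_phrase(block, rng=random):
    """Pick the default or one alternative, weighted by `probability`."""
    options = [block["default"]]
    # A block without alternatives may omit `probability`; treat it as 1.
    weights = [block.get("probability", 1.0)]
    for alt in block.get("alternatives", []):
        options.append(alt["alternative"])
        weights.append(alt["probability"])
    # random.choices draws one option according to the relative weights.
    return rng.choices(options, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded so repeated runs give the same draws
print(choose_phrase(university_units, rng)["canonical"])
```

Passing a seeded `random.Random` keeps generated training examples reproducible, which is convenient when comparing two versions of a language file.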
|
||||
795
resources/addresses/sv.yaml
Normal file
@@ -0,0 +1,795 @@
|
||||
# sv.yaml
|
||||
# -------
|
||||
# Swedish language specification.
|
||||
|
||||
components:
|
||||
level:
|
||||
null_probability: 0.85
|
||||
alphanumeric_probability: 0.1
|
||||
standalone_probability: 0.05
|
||||
|
||||
staircase:
|
||||
null_probability: 0.99
|
||||
alphanumeric_probability: 0.01
|
||||
|
||||
entrance:
|
||||
null_probability: 0.999
|
||||
alphanumeric_probability: 0.001
|
||||
|
||||
unit:
|
||||
null_probability: 0.75
|
||||
alphanumeric_probability: 0.25
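The `components` section gives, for each address component, the chance that it is left out (`null_probability`) or rendered in a given form. As a rough sketch, assuming the `*_probability` keys of one component form a distribution, a form can be drawn with a cumulative sum and a binary search. `choose_form` is a hypothetical name, and the dictionary copies the `level` values above.

```python
import bisect
import itertools
import random

SUFFIX = "_probability"

# Values copied from components.level above.
level = {
    "null_probability": 0.85,
    "alphanumeric_probability": 0.1,
    "standalone_probability": 0.05,
}


def choose_form(component, rng=random):
    """Return 'null', 'alphanumeric', ... drawn by the configured weights."""
    forms = [key[: -len(SUFFIX)] for key in component if key.endswith(SUFFIX)]
    weights = [component[form + SUFFIX] for form in forms]
    # Cumulative distribution, e.g. [0.85, 0.95, 1.0] for `level`.
    cdf = list(itertools.accumulate(weights))
    # Scale by the total so weights that do not sum exactly to 1 still work.
    draw = rng.random() * cdf[-1]
    index = min(bisect.bisect_right(cdf, draw), len(forms) - 1)
    return forms[index]


print(choose_form(level, random.Random(1)))
```

A result of `null` would mean the generated address omits the floor entirely, which is what happens most of the time with a weight of 0.85.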
|
||||
|
||||
numbers:
|
||||
default: &nummer
|
||||
canonical: nummer
|
||||
abbreviated: nr
|
||||
sample: true
|
||||
# Probabilities
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.5
|
||||
sample_probability: 0.2
|
||||
sample_exclude:
|
||||
- "#"
|
||||
numeric:
|
||||
direction: left
|
||||
numeric_affix:
|
||||
affix: "#"
|
||||
direction: left
|
||||
|
||||
numeric_probability: 0.4
|
||||
numeric_affix_probability: 0.6
|
||||
|
||||
|
||||
house_numbers:
|
||||
alphanumeric:
|
||||
default: *nummer
|
||||
|
||||
alphanumeric_phrase_probability: 0.0001
|
||||
|
||||
|
||||
and:
|
||||
default: &och
|
||||
canonical: och
|
||||
abbreviated: "&"
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.75
|
||||
sample: true
|
||||
sample_probability: 0.05
|
||||
|
||||
cross_streets:
|
||||
and: *och
|
||||
corner_of: &hornet_av
|
||||
canonical: hörnet av
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
at_the_corner_of: &i_hornet_av
|
||||
canonical: i hörnet av
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
intersection:
|
||||
default: *och
|
||||
probability: 0.7
|
||||
alternatives:
|
||||
- alternative: *hornet_av
|
||||
probability: 0.15
|
||||
- alternative: *i_hornet_av
|
||||
probability: 0.15
|
||||
|
||||
between:
|
||||
canonical: mellan
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
parentheses_probability: 0.5
|
||||
|
||||
|
||||
levels:
|
||||
vaningen: &vaningen
|
||||
canonical: våningen
|
||||
abbreviated: vån
|
||||
sample: true
|
||||
canonical_probability: 0.5
|
||||
abbreviated_probability: 0.3
|
||||
sample_probability: 0.2
|
||||
ordinal:
|
||||
direction: right
|
||||
numeric_probability: 0.0
|
||||
ordinal_probability: 1.0
|
||||
vaning: &vaning
|
||||
canonical: våning
|
||||
abbreviated: vån
|
||||
sample: true
|
||||
canonical_probability: 0.5
|
||||
abbreviated_probability: 0.3
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
direction_probability: 0.9
|
||||
digits:
|
||||
ascii_probability: 0.8
|
||||
spellout_probability: 0.2
|
||||
ordinal:
|
||||
direction: left
|
||||
digits:
|
||||
ascii_probability: 0.8
|
||||
spellout_probability: 0.2
|
||||
numeric_probability: 0.8
|
||||
ordinal_probability: 0.2
|
||||
plan: &plan
|
||||
canonical: plan
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
entreplan: &entreplan
|
||||
canonical: entréplan
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
trappa_upp: &trappa_upp
|
||||
canonical: trappa upp
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.8
|
||||
spellout_probability: 0.2
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.8
|
||||
spellout_probability: 0.2
|
||||
number_min_abs_value: 2
|
||||
number_max_abs_value: 2
|
||||
number_subtract_abs_value: 1
|
||||
numeric_probability: 0.8
|
||||
ordinal_probability: 0.2
|
||||
trappor_upp: &trappor_upp
|
||||
canonical: trappor upp
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.8
|
||||
spellout_probability: 0.2
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.8
|
||||
spellout_probability: 0.2
|
||||
number_min_abs_value: 3
|
||||
number_subtract_abs_value: 1
|
||||
numeric_probability: 0.8
|
||||
ordinal_probability: 0.2
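The `trappa upp` / `trappor upp` entries count flights of stairs, not floors. The sketch below spells out one reading of their settings, under these assumptions: `number_min_abs_value` and `number_max_abs_value` bound the floor numbers a phrase applies to, `number_subtract_abs_value` is subtracted before the number is rendered, and `direction: right` places the phrase to the right of the number. The `STAIR_PHRASES` table and `stairs_phrase` helper are hypothetical.

```python
# Settings copied from the trappa_upp and trappor_upp entries above.
# `max: None` stands for "no number_max_abs_value given".
STAIR_PHRASES = [
    {"canonical": "trappa upp", "min": 2, "max": 2, "subtract": 1},
    {"canonical": "trappor upp", "min": 3, "max": None, "subtract": 1},
]


def stairs_phrase(floor):
    """Render a floor number as a count of stair flights, or None if no phrase applies."""
    for entry in STAIR_PHRASES:
        above_min = floor >= entry["min"]
        below_max = entry["max"] is None or floor <= entry["max"]
        if above_min and below_max:
            flights = floor - entry["subtract"]
            # direction: right -> the phrase follows the number.
            return f"{flights} {entry['canonical']}"
    return None


for floor in (1, 2, 4):
    print(floor, stairs_phrase(floor))
```

Under this reading, floor 2 uses the singular form with the number 1, and floors from 3 up use the plural form with one less than the floor number, consistent with `numbering_starts_at: 1`.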
|
||||
trappa: &trappa
|
||||
canonical: trappa
|
||||
abbreviated: tr
|
||||
sample: true
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.6
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
digits:
|
||||
ascii_probability: 0.8
|
||||
spellout_probability: 0.2
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.8
|
||||
spellout_probability: 0.2
|
||||
number_min_abs_value: 2
|
||||
number_max_abs_value: 2
|
||||
number_subtract_abs_value: 1
|
||||
numeric_probability: 0.8
|
||||
ordinal_probability: 0.2
|
||||
trappor: &trappor
|
||||
canonical: trappor
|
||||
abbreviated: tr
|
||||
sample: true
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.6
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
digits:
|
||||
ascii_probability: 0.8
|
||||
spellout_probability: 0.2
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.8
|
||||
spellout_probability: 0.2
|
||||
number_min_abs_value: 3
|
||||
number_subtract_abs_value: 1
|
||||
numeric_probability: 0.8
|
||||
ordinal_probability: 0.2
|
||||
bottenvaning: &bottenvaning
|
||||
canonical: bottenvåning
|
||||
abbreviated: bv
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.5
|
||||
sample_probability: 0.2
|
||||
vindsvaningen: &vindsvaningen
|
||||
canonical: vindsvåningen
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
standalone_probability: 1.0
|
||||
vinds: &vinds
|
||||
canonical: vinds
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
standalone_probability: 1.0
|
||||
kallare: &kallare
|
||||
canonical: källare
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
# e.g. 1 källare
|
||||
numeric:
|
||||
direction: right
|
||||
direction_probability: 0.8
|
||||
# e.g. k1
|
||||
numeric_affix:
|
||||
affix: k
|
||||
direction: left
|
||||
# e.g. 1:a k
|
||||
ordinal:
|
||||
direction: right
|
||||
standalone_probability: 0.9
|
||||
number_abs_value: true
|
||||
number_min_abs_value: 1
|
||||
numeric_probability: 0.005
|
||||
numeric_affix_probability: 0.09
|
||||
ordinal_probability: 0.005
|
||||
aliases:
|
||||
"<-1":
|
||||
default: *kallare
|
||||
probability: 0.95
|
||||
alternatives:
|
||||
- alternative: *vaning
|
||||
probability: 0.025
|
||||
- alternative: *vaningen
|
||||
probability: 0.025
|
||||
"-1":
|
||||
default: *kallare
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *vaning
|
||||
probability: 0.05
|
||||
- alternative: *vaningen
|
||||
probability: 0.05
|
||||
"0":
|
||||
default: *bottenvaning
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative: *entreplan
|
||||
probability: 0.2
|
||||
- alternative: *vaningen
|
||||
probability: 0.1
|
||||
- alternative: *vaning
|
||||
probability: 0.1
|
||||
"1":
|
||||
default: *bottenvaning
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative: *entreplan
|
||||
probability: 0.2
|
||||
- alternative: *vaningen
|
||||
probability: 0.1
|
||||
- alternative: *vaning
|
||||
probability: 0.1
|
||||
"top":
|
||||
default: *vaningen
|
||||
probability: 0.35
|
||||
alternatives:
|
||||
- alternative: *vaning
|
||||
probability: 0.35
|
||||
- alternative: *trappor_upp
|
||||
probability: 0.1
|
||||
- alternative: *trappor
|
||||
probability: 0.1
|
||||
- alternative: *vindsvaningen
|
||||
probability: 0.05
|
||||
- alternative: *vinds
|
||||
probability: 0.05
|
||||
|
||||
numbering_starts_at: 1
|
||||
|
||||
alphanumeric:
|
||||
default: *vaningen
|
||||
probability: 0.25
|
||||
alternatives:
|
||||
- alternative: *vaning
|
||||
probability: 0.2
|
||||
- alternative: *plan
|
||||
probability: 0.05
|
||||
- alternative: *trappa_upp
|
||||
probability: 0.125
|
||||
- alternative: *trappa
|
||||
probability: 0.125
|
||||
- alternative: *trappor_upp
|
||||
probability: 0.125
|
||||
- alternative: *trappor
|
||||
probability: 0.125
|
||||
numeric_probability: 0.99 # With this probability, pick an integer
|
||||
alpha_probability: 0.0098 # With this probability, pick a letter e.g. A
|
||||
numeric_plus_alpha_probability: 0.0001 # e.g. 2A
|
||||
alpha_plus_numeric_probability: 0.0001 # e.g. A2
|
||||
|
||||
|
||||
categories:
|
||||
near:
|
||||
default:
|
||||
canonical: i närheten av
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.8
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: nära
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.2
|
||||
nearby:
|
||||
default:
|
||||
canonical: i närheten
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.4
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: runt här
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.2
|
||||
- alternative:
|
||||
canonical: nära här
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: nära här
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: nära
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.1
|
||||
- alternative:
|
||||
canonical: omkring här
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.1
|
||||
near_me:
|
||||
default:
|
||||
canonical: nära mig
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.8
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: i närheten av mig
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
probability: 0.2
|
||||
|
||||
in:
|
||||
default:
|
||||
canonical: i
|
||||
probability: 0.8
|
||||
alternatives:
|
||||
- alternative:
|
||||
canonical: på
|
||||
probability: 0.2
|
||||
|
||||
|
||||
# Probabilities of each phrase
|
||||
near_probability: 0.35
|
||||
nearby_probability: 0.2
|
||||
near_me_probability: 0.1
|
||||
in_probability: 0.35
|
||||
|
||||
|
||||
directions:
|
||||
right: &hoger
|
||||
canonical: höger
|
||||
sample: true
|
||||
canonical_probability: 0.1
|
||||
sample_probability: 0.9
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: h
|
||||
direction: right
|
||||
whitespace_probability: 0.1
|
||||
numeric_probability: 0.8
|
||||
numeric_affix_probability: 0.2
|
||||
left: &vanster
|
||||
canonical: vänster
|
||||
sample: true
|
||||
canonical_probability: 0.1
|
||||
sample_probability: 0.9
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: v
|
||||
direction: right
|
||||
whitespace_probability: 0.1
|
||||
numeric_probability: 0.8
|
||||
numeric_affix_probability: 0.2
|
||||
alternatives:
|
||||
- alternative: *hoger
|
||||
probability: 0.5
|
||||
- alternative: *vanster
|
||||
probability: 0.5
|
||||
|
||||
cardinal_directions:
|
||||
east: &ost
|
||||
canonical: öst
|
||||
abbreviated: ö
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: ö
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
eastern: &ostra
|
||||
canonical: östra
|
||||
abbreviated: ö:a
|
||||
canonical_probability: 0.9
|
||||
abbreviated_probability: 0.1
|
||||
numeric:
|
||||
direction: right
|
||||
|
||||
west: &vast
|
||||
canonical: väst
|
||||
abbreviated: v
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: v
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
western: &vastra
|
||||
canonical: västra
|
||||
abbreviated: v:a
|
||||
canonical_probability: 0.9
|
||||
abbreviated_probability: 0.1
|
||||
numeric:
|
||||
direction: right
|
||||
|
||||
north: &norr
|
||||
canonical: norr
|
||||
abbreviated: n
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: n
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
northern: &norra
|
||||
canonical: norra
|
||||
abbreviated: n:a
|
||||
canonical_probability: 0.9
|
||||
abbreviated_probability: 0.1
|
||||
|
||||
south: &sod
|
||||
canonical: söder
|
||||
abbreviated: s
|
||||
sample: true
|
||||
canonical_probability: 0.75
|
||||
abbreviated_probability: 0.1
|
||||
sample_probability: 0.15
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: s
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
southern: &sodra
|
||||
canonical: södra
|
||||
abbreviated: s:a
|
||||
canonical_probability: 0.9
|
||||
abbreviated_probability: 0.1
|
||||
|
||||
alternatives:
|
||||
- alternative: *norr
|
||||
probability: 0.25
|
||||
- alternative: *ost
|
||||
probability: 0.25
|
||||
- alternative: *sod
|
||||
probability: 0.25
|
||||
- alternative: *vast
|
||||
probability: 0.25
|
||||
|
||||
entrances:
|
||||
ingang: &ingang
|
||||
canonical: ingång
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
entre: &entre
|
||||
canonical: entré
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
# Ingång 1, Ingång A, etc.
|
||||
alphanumeric: &entrance_alphanumeric
|
||||
default: *ingang
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative: *entre
|
||||
probability: 0.4
|
||||
numeric_probability: 0.1 # e.g. Ingång 1
|
||||
alpha_probability: 0.85 # e.g. Ingång A
|
||||
numeric_plus_alpha_probability: 0.025 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.025 # e.g. A1
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
staircases:
|
||||
uppgang: &uppgang
|
||||
canonical: uppgång
|
||||
abbreviated: u
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
abbreviated_probability: 0.2
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
uppgang_hoger: &uppgang_hoger
|
||||
canonical: uppgång höger
|
||||
abbreviated: uh
|
||||
sample: true
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.5
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
uppgang_vanster: &uppgang_vanster
|
||||
canonical: uppgång vänster
|
||||
abbreviated: uv
|
||||
sample: true
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.5
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
alphanumeric: &staircase_alphanumeric
|
||||
default: *uppgang
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative: *uppgang_hoger
|
||||
probability: 0.2
|
||||
- alternative: *uppgang_vanster
|
||||
probability: 0.2
|
||||
numeric_probability: 0.75
|
||||
alpha_probability: 0.2
|
||||
numeric_plus_alpha_probability: 0.025
|
||||
alpha_plus_numeric_probability: 0.025
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
directional:
|
||||
direction: left
|
||||
direction_probability: 0.85
|
||||
modifier:
|
||||
alternatives:
|
||||
- alternative: *norr
|
||||
- alternative: *sod
|
||||
- alternative: *ost
|
||||
- alternative: *vast
|
||||
|
||||
po_boxes:
|
||||
box: &box
|
||||
canonical: box
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2 # Box No 1234
|
||||
postlada: &postlada
|
||||
canonical: postlåda
|
||||
abbreviated: pl
|
||||
sample: true
|
||||
canonical_probability: 0.6
|
||||
abbreviated_probability: 0.2
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2 # Pl No 1234
|
||||
|
||||
alphanumeric:
|
||||
sample: false
|
||||
default: *box
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *postlada
|
||||
probability: 0.1
|
||||
numeric_probability: 0.9 # Box 123
|
||||
alpha_probability: 0.05 # Box A
|
||||
numeric_plus_alpha_probability: 0.04 # Box 123G
|
||||
alpha_plus_numeric_probability: 0.01 # Box A123
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
digits:
|
||||
- length: 1
|
||||
probability: 0.05
|
||||
- length: 2
|
||||
probability: 0.1
|
||||
- length: 3
|
||||
probability: 0.2
|
||||
- length: 4
|
||||
probability: 0.1
|
||||
- length: 5
|
||||
probability: 0.5
|
||||
- length: 6
|
||||
probability: 0.05
|
||||
|
||||
|
||||
|
||||
units:
|
||||
lagenhet: &lagenhet
|
||||
canonical: lägenhet
|
||||
abbreviated: lgh
|
||||
sample: true
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.5
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
null_phrase_probability: 0.1
|
||||
# Lägenhet nummer 4
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.05
|
||||
bostad: &bostad
|
||||
canonical: bostad
|
||||
abbreviated: bst
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.5
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.05
|
||||
lagenhetsnummer: &lagenhetsnummer
|
||||
canonical: lägenhetsnummer
|
||||
abbreviated: lgh nr
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
hus: &hus
|
||||
canonical: hus
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
rum: &rum
|
||||
canonical: rum
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
numeric:
|
||||
direction: left
|
||||
alphanumeric: &unit_alphanumeric
|
||||
default: *lagenhet
|
||||
probability: 0.75
|
||||
alternatives:
|
||||
- alternative: *lagenhetsnummer
|
||||
probability: 0.05
|
||||
- alternative: *hus
|
||||
probability: 0.1
|
||||
- alternative: *rum
|
||||
probability: 0.1
|
||||
numeric_probability: 0.95 # e.g. Lägenhet 1
|
||||
alpha_probability: 0.05 # e.g. Lgh A
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
# Separate random probability for adding directions like 2H, 2V, etc.
|
||||
add_direction: true
|
||||
add_direction_probability: 0.005
|
||||
|
||||
# Add directions for plain numbers
|
||||
add_direction_numeric: true
|
||||
# Add direction only e.g. Lägenhet höger
|
||||
add_direction_standalone: true
|
||||
|
||||
# If there are 10 floors, create unit numbers like #301 or #1032
|
||||
use_floor_probability: 0.2
|
||||
|
||||
# Use the actual floor phrase as long as the whole phrase is numeric
|
||||
# Has the effect of creating lägenhetsnummer-style units
|
||||
use_floor_affix_unit_num_digits: 2
|
||||
|
||||
# In Swedish addresses, the ground level is 10, floors are 11, 12, ... basements are 9, 8, ...
|
||||
use_floor_ground_starts_at: 10
|
||||
# For single digit floors, use 09, 08, etc.
|
||||
use_floor_floor_num_digits: 2
|
||||
|
||||
|
||||
countries:
|
||||
# Swedish addresses in Finland
|
||||
fi:
|
||||
units:
|
||||
alphanumeric:
|
||||
default: *bostad
|
||||
probability: 1.0
|
||||
alternatives: []
|
||||
|
||||
add_direction: false
|
||||
add_direction_numeric: false
|
||||
add_direction_standalone: false
|
||||
|
||||
use_floor_probability: 0.1
|
||||
|
||||
use_floor_affix_unit_num_digits: 0
|
||||
|
||||
use_floor_ground_starts_at: 1
|
||||
use_floor_floor_num_digits: 2
|
||||
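The floor-based unit settings at the end of sv.yaml (`use_floor_ground_starts_at: 10`, `use_floor_floor_num_digits: 2`, `use_floor_affix_unit_num_digits: 2`) can be read as a recipe for four-digit Swedish apartment numbers. A minimal Python sketch of that reading follows. The function name `floor_based_unit` and the convention that floor 0 means ground level are assumptions for illustration, not libpostal's actual implementation.

```python
def floor_based_unit(floor, unit, ground_starts_at=10, floor_digits=2, unit_digits=2):
    """Build a unit number whose prefix encodes the floor.

    Hypothetical helper: floor 0 is ground level, negative floors are basements.
    The defaults mirror the sv.yaml values quoted above.
    """
    # Ground level maps to 10, the first floor up to 11, the first basement to 9
    floor_code = ground_starts_at + floor
    if floor_code < 0 or unit < 0:
        raise ValueError("floor or unit out of range for this scheme")
    # Zero-pad both parts, e.g. 9 -> "09" and 1 -> "01"
    return f"{floor_code:0{floor_digits}d}{unit:0{unit_digits}d}"


print(floor_based_unit(0, 1))   # ground floor, unit 1  -> 1001
print(floor_based_unit(2, 3))   # two floors up, unit 3 -> 1203
print(floor_based_unit(-1, 1))  # first basement, unit 1 -> 0901
```

Keeping the offset and digit widths as parameters matches how the `countries` override for `fi` changes the same keys rather than the logic.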
503
resources/addresses/tr.yaml
Normal file
@@ -0,0 +1,503 @@
|
||||
# tr.yaml
|
||||
# -------
|
||||
# Turkish language specification
|
||||
|
||||
components:
|
||||
level:
|
||||
null_probability: 0.9
|
||||
alphanumeric_probability: 0.1
|
||||
|
||||
staircase:
|
||||
null_probability: 0.99
|
||||
alphanumeric_probability: 0.01
|
||||
|
||||
entrance:
|
||||
null_probability: 0.999
|
||||
alphanumeric_probability: 0.001
|
||||
|
||||
unit:
|
||||
null_probability: 0.7
|
||||
alphanumeric_probability: 0.3
|
||||
|
||||
combinations:
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- staircase
|
||||
- level
|
||||
- unit
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: "/"
|
||||
probability: 0.95
|
||||
- separator: "-"
|
||||
probability: 0.05
|
||||
probability: 0.005
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- level
|
||||
- unit
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: "/"
|
||||
probability: 0.95
|
||||
- separator: "-"
|
||||
probability: 0.05
|
||||
probability: 0.005
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- level
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: "/"
|
||||
probability: 0.95
|
||||
- separator: "-"
|
||||
probability: 0.05
|
||||
probability: 0.1
|
||||
# For unit types like 2/34
|
||||
-
|
||||
components:
|
||||
- house_number
|
||||
- unit
|
||||
label: house_number
|
||||
separators:
|
||||
- separator: "/"
|
||||
probability: 0.95
|
||||
- separator: "-"
|
||||
probability: 0.05
|
||||
probability: 0.005
|
||||
|
||||
|
||||
numbers:
|
||||
|
||||
default: &numara
|
||||
canonical: numara
|
||||
abbreviated: "no:"
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.6
|
||||
sample_probability: 0.1
|
||||
numeric:
|
||||
direction: left
|
||||
numeric_affix:
|
||||
affix: "no:"
|
||||
whitespace_probability: 0.6
|
||||
direction: left
|
||||
numeric_probability: 0.4
|
||||
numeric_affix_probability: 0.6
|
||||
|
||||
alphanumeric_phrase_probability: 0.05
|
||||
no_number_probability: 0.05
|
||||
|
||||
|
||||
and:
|
||||
default: &ve
|
||||
canonical: ve
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
|
||||
|
||||
cross_streets:
|
||||
ve: *ve
|
||||
corner_of: &kose
|
||||
canonical: köşe
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
kosesinde: &kosesinde
|
||||
canonical: köşesinde
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
intersection:
|
||||
default: *ve
|
||||
probability: 0.8
|
||||
alternatives:
|
||||
- alternative: *kose
|
||||
probability: 0.1
|
||||
- alternative: *kosesinde
|
||||
probability: 0.1
|
||||
|
||||
arasinda: &arasinda
|
||||
canonical: arasında
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
parentheses_probability: 0.5
|
||||
between:
|
||||
default: *arasinda
|
||||
|
||||
levels:
|
||||
kat: &kat
|
||||
canonical: kat
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
direction_probability: 0.9
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.3
|
||||
roman_numeral_probability: 0.7
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
numeric_probability: 0.4
|
||||
ordinal_probability: 0.6
|
||||
|
||||
zemin_kat: &zemin_kat
|
||||
canonical: zemin kat
|
||||
abbreviated: zk
|
||||
sample: true
|
||||
canonical_probability: 0.3
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.3
|
||||
asma_kat: &asma_kat
|
||||
canonical: asma kat
|
||||
half_floors: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
sample: true
|
||||
# e.g. asma kat 2
|
||||
numeric:
|
||||
direction: left
|
||||
# e.g. 2. asma kat
|
||||
ordinal:
|
||||
direction: right
|
||||
numeric_probability: 0.1
|
||||
ordinal_probability: 0.2
|
||||
standalone_probability: 0.6
|
||||
bodrum: &bodrum
|
||||
canonical: bodrum
|
||||
sample: true
|
||||
canonical_probability: 0.7
|
||||
sample_probability: 0.3
|
||||
# e.g. bodrum 1
|
||||
numeric:
|
||||
direction: left
|
||||
direction_probability: 0.8
|
||||
# e.g. 1. bodrum
|
||||
ordinal:
|
||||
direction: right
|
||||
digits:
|
||||
ascii_probability: 0.7
|
||||
roman_numeral_probability: 0.3
|
||||
standalone_probability: 0.99
|
||||
number_abs_value: true
|
||||
number_min_abs_value: 1
|
||||
numeric_probability: 0.005
|
||||
ordinal_probability: 0.005
|
||||
|
||||
aliases:
|
||||
"<-1":
|
||||
default: *bodrum
|
||||
"-1":
|
||||
default: *bodrum
|
||||
# Special token for half-floors
|
||||
half_floors:
|
||||
default: *asma_kat
|
||||
"0":
|
||||
default: *zemin_kat
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *kat
|
||||
probability: 0.1
|
||||
|
||||
numbering_starts_at: 0
|
||||
|
||||
alphanumeric:
|
||||
default: *kat
|
||||
numeric_probability: 0.99 # With this probability, pick an integer
|
||||
alpha_probability: 0.0098 # With this probability, pick a letter e.g. A
|
||||
numeric_plus_alpha_probability: 0.0001 # e.g. 2A
|
||||
alpha_plus_numeric_probability: 0.0001 # e.g. A2
|
||||
|
||||
|
||||
directions:
|
||||
right: &sag
|
||||
canonical: sağ
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
left: &sol
|
||||
canonical: sol
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: right
|
||||
alternatives:
|
||||
- alternative: *sag
|
||||
probability: 0.5
|
||||
- alternative: *sol
|
||||
probability: 0.5
|
||||
|
||||
cardinal_directions:
|
||||
east: &dogu
|
||||
canonical: doğu
|
||||
abbreviated: d
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: d
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
west: &bati
|
||||
canonical: batı
|
||||
abbreviated: b
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: b
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
north: &kuzey
|
||||
canonical: kuzey
|
||||
abbreviated: k
|
||||
canonical_probability: 0.95
|
||||
abbreviated_probability: 0.05
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: k
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
south: &guney
|
||||
canonical: güney
|
||||
abbreviated: g
|
||||
sample: true
|
||||
canonical_probability: 0.75
|
||||
abbreviated_probability: 0.1
|
||||
sample_probability: 0.15
|
||||
numeric:
|
||||
direction: right
|
||||
numeric_affix:
|
||||
affix: g
|
||||
direction: right
|
||||
numeric_probability: 0.5
|
||||
numeric_affix_probability: 0.5
|
||||
|
||||
alternatives:
|
||||
- alternative: *kuzey
|
||||
probability: 0.25
|
||||
- alternative: *dogu
|
||||
probability: 0.25
- alternative: *guney
probability: 0.25
- alternative: *bati
probability: 0.25
|
||||
|
||||
entrances:
|
||||
giris: &giris
|
||||
canonical: giriş
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
# giriş 1, giriş A, etc.
|
||||
alphanumeric: &entrance_alphanumeric
|
||||
default: *giris
|
||||
numeric_probability: 0.1 # e.g. giriş 1
|
||||
alpha_probability: 0.85 # e.g. giriş A
|
||||
numeric_plus_alpha_probability: 0.025 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.025 # e.g. A1
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
|
||||
staircases:
|
||||
merdiven: &merdiven
|
||||
canonical: merdiven
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
|
||||
|
||||
alphanumeric: &staircase_alphanumeric
|
||||
default: *merdiven
|
||||
numeric_probability: 0.75
|
||||
alpha_probability: 0.2
|
||||
numeric_plus_alpha_probability: 0.025
|
||||
alpha_plus_numeric_probability: 0.025
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
directional:
|
||||
direction: right
|
||||
direction_probability: 0.85
|
||||
modifier:
|
||||
alternatives:
|
||||
- alternative: *sag
|
||||
probability: 0.2
|
||||
- alternative: *sol
|
||||
probability: 0.2
|
||||
- alternative: *kuzey
|
||||
probability: 0.15
|
||||
- alternative: *guney
|
||||
probability: 0.15
|
||||
- alternative: *dogu
|
||||
probability: 0.15
|
||||
- alternative: *bati
|
||||
probability: 0.15
|
||||
|
||||
po_boxes:
|
||||
posta_kutusu: &posta_kutusu
|
||||
canonical: posta kutusu
|
||||
abbreviated: pk
|
||||
sample: true
|
||||
canonical_probability: 0.2
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.4
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.2
|
||||
|
||||
alphanumeric:
|
||||
default: *posta_kutusu
|
||||
numeric_probability: 0.9 # pk 123
|
||||
alpha_probability: 0.05 # pk A
|
||||
numeric_plus_alpha_probability: 0.04 # pk 123G
|
||||
alpha_plus_numeric_probability: 0.01 # pk A123
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
digits:
|
||||
- length: 1
|
||||
probability: 0.05
|
||||
- length: 2
|
||||
probability: 0.1
|
||||
- length: 3
|
||||
probability: 0.2
|
||||
- length: 4
|
||||
probability: 0.5
|
||||
- length: 5
|
||||
probability: 0.1
|
||||
- length: 6
|
||||
probability: 0.05
|
||||
|
||||
|
||||
units:
|
||||
daire: &daire
|
||||
canonical: daire
|
||||
abbreviated: d
|
||||
sample: true
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.4
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
apartman: &apartman
|
||||
canonical: apartman
|
||||
abbreviated: apt
|
||||
sample: true
|
||||
canonical_probability: 0.4
|
||||
abbreviated_probability: 0.2
|
||||
sample_probability: 0.4
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
|
||||
oda: &oda
|
||||
canonical: oda
|
||||
sample: true
|
||||
canonical_probability: 0.8
|
||||
sample_probability: 0.2
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
ofis: &ofis
|
||||
canonical: ofis
|
||||
sample: true
|
||||
canonical_probability: 0.6
|
||||
sample_probability: 0.4
|
||||
numeric:
|
||||
direction: left
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.1
|
||||
|
||||
alphanumeric: &unit_alphanumeric
|
||||
default: *daire
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative: *apartman
|
||||
probability: 0.3
|
||||
- alternative: *oda
|
||||
probability: 0.1
|
||||
numeric_probability: 0.9 # e.g. d. 1
|
||||
numeric_plus_alpha_probability: 0.03 # e.g. 1A
|
||||
alpha_plus_numeric_probability: 0.03 # e.g. A1
|
||||
alpha_probability: 0.04 # e.g. daire A
|
||||
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
|
||||
# If there are 10 floors, create unit numbers like #301 or #1032
|
||||
use_floor_probability: 0.05
|
||||
|
||||
zones:
|
||||
commercial: &commercial_unit_types
|
||||
default: *oda
|
||||
probability: 0.6
|
||||
alternatives:
|
||||
- alternative: *ofis
|
||||
probability: 0.4
|
||||
numeric_probability: 0.95 # e.g. oda 1
|
||||
numeric_plus_alpha_probability: 0.01 # e.g. oda 1A
|
||||
alpha_plus_numeric_probability: 0.01 # e.g. oda A1
|
||||
alpha_probability: 0.03 # e.g. oda A
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
university:
|
||||
default: *oda
|
||||
numeric_probability: 0.95 # e.g. oda 1
|
||||
numeric_plus_alpha_probability: 0.01 # e.g. oda 1A
|
||||
alpha_plus_numeric_probability: 0.01 # e.g. oda A1
|
||||
alpha_probability: 0.03 # e.g. oda A
|
||||
alpha_plus_numeric:
|
||||
whitespace_probability: 0.1
|
||||
numeric_plus_alpha:
|
||||
whitespace_probability: 0.1
|
||||
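Most blocks in tr.yaml pair a `default` phrase with `alternatives`, each carrying a `probability`. One plausible reading is that these weights form a distribution that should sum to 1 and is sampled cumulatively. The Python sketch below illustrates that reading with the `units.alphanumeric` weights. `check_distribution`, `pick_phrase` and the `u` parameter are hypothetical names, and libpostal's real sampling code may differ.

```python
import random

# Weights copied from units.alphanumeric in tr.yaml
UNIT_PHRASES = [("daire", 0.6), ("apartman", 0.3), ("oda", 0.1)]


def check_distribution(options, eps=1e-9):
    """Return True if the weights sum to 1 within a small tolerance."""
    return abs(sum(p for _, p in options) - 1.0) < eps


def pick_phrase(options, u=None):
    """Pick a phrase by walking the cumulative distribution.

    u is a number in [0, 1); if omitted, one is drawn at random.
    """
    if not check_distribution(options):
        raise ValueError("probabilities must sum to 1")
    if u is None:
        u = random.random()
    cumulative = 0.0
    for phrase, p in options:
        cumulative += p
        if u < cumulative:
            return phrase
    return options[-1][0]  # guard against floating-point round-off


print(pick_phrase(UNIT_PHRASES, u=0.25))  # daire
print(pick_phrase(UNIT_PHRASES, u=0.75))  # apartman
print(pick_phrase(UNIT_PHRASES, u=0.95))  # oda
```

Passing `u` explicitly keeps the example deterministic. A check like `check_distribution` is also a cheap way to catch weight lists that do not total 1.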
1001
resources/addresses/uk.yaml
Normal file
File diff suppressed because it is too large
292
resources/addresses/zh.yaml
Normal file
@@ -0,0 +1,292 @@
|
||||
# zh.yaml
|
||||
# -------
|
||||
# Chinese language specification (default is mainland China, Hong Kong below)
|
||||
|
||||
whitespace: false
|
||||
|
||||
components:
|
||||
level:
|
||||
null_probability: 0.85 # Probability of doing nothing if no floor number is specified
|
||||
alphanumeric_probability: 0.15
|
||||
|
||||
unit:
|
||||
# If no unit number is specified
|
||||
null_probability: 0.6
|
||||
alphanumeric_probability: 0.4
|
||||
|
||||
numbers:
|
||||
default: &hao
|
||||
canonical: 号
|
||||
numeric_affix:
|
||||
affix: 号
|
||||
direction: right
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 1.0
|
||||
probability: 0.8
|
||||
alternatives:
|
||||
- alternative: &hao_traditional
|
||||
canonical: 號
|
||||
numeric_affix:
|
||||
affix: 號
|
||||
direction: right
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 1.0
|
||||
probability: 0.2
|
||||
|
||||
house_numbers:
|
||||
alphanumeric:
|
||||
default: *hao
|
||||
probability: 0.8
|
||||
alternatives:
|
||||
- alternative: *hao_traditional
|
||||
probability: 0.2
|
||||
alphanumeric_phrase_probability: 0.6
|
||||
|
||||
levels:
|
||||
lou: &lou
|
||||
canonical: 楼
|
||||
numeric_affix:
|
||||
affix: 楼
|
||||
direction: right
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.5
|
||||
digits:
|
||||
ascii_probability: 0.6
|
||||
unicode_full_width_probability: 0.1
|
||||
spellout_probability: 0.3
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 1.0
|
||||
lou_traditional: &lou_traditional
|
||||
canonical: 樓
|
||||
numeric_affix:
|
||||
affix: 樓
|
||||
direction: right
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.5
|
||||
digits:
|
||||
ascii_probability: 0.6
|
||||
unicode_full_width_probability: 0.1
|
||||
spellout_probability: 0.3
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 1.0
|
||||
ceng: &ceng
|
||||
canonical: 层
|
||||
numeric_affix:
|
||||
affix: 层
|
||||
direction: right
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.5
|
||||
digits:
|
||||
ascii_probability: 0.6
|
||||
unicode_full_width_probability: 0.1
|
||||
spellout_probability: 0.3
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 1.0
|
||||
ceng_traditional: &ceng_traditional
|
||||
canonical: 層
|
||||
numeric_affix:
|
||||
affix: 層
|
||||
direction: right
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.5
|
||||
digits:
|
||||
ascii_probability: 0.6
|
||||
unicode_full_width_probability: 0.1
|
||||
spellout_probability: 0.3
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 1.0
|
||||
|
||||
numbering_starts_at: 1
|
||||
|
||||
alphanumeric:
|
||||
default: *lou
|
||||
probability: 0.85
|
||||
alternatives:
|
||||
- alternative: *lou_traditional
|
||||
probability: 0.05
|
||||
- alternative: *ceng
|
||||
probability: 0.08
|
||||
- alternative: *ceng_traditional
|
||||
probability: 0.02
|
||||
numeric_probability: 1.0
|
||||
|
||||
po_boxes:
|
||||
youzheng_xinxiang: &youzheng_xinxiang
|
||||
canonical: 邮政信箱
|
||||
numeric_affix:
|
||||
affix: 邮政信箱
|
||||
direction: left
|
||||
digits:
|
||||
ascii_probability: 0.3
|
||||
unicode_full_width_probability: 0.5
|
||||
spellout_probability: 0.2
|
||||
use_number_phrase: true
|
||||
use_number_phrase_probability: 0.8
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 1.0
|
||||
youzheng_xinxiang_traditional: &youzheng_xinxiang_traditional
|
||||
canonical: 郵政信箱
|
||||
numeric_affix:
|
||||
affix: 郵政信箱
|
||||
direction: left
|
||||
digits:
|
||||
ascii_probability: 0.3
|
||||
unicode_full_width_probability: 0.5
|
||||
spellout_probability: 0.2
|
||||
use_number_phrase: true
|
||||
use_number_phrase_probability: 0.8
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 1.0
|
||||
|
||||
|
||||
alphanumeric:
|
||||
default: *youzheng_xinxiang
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: *youzheng_xinxiang_traditional
|
||||
probability: 0.1
|
||||
numeric_probability: 1.0
|
||||
|
||||
digits:
|
||||
- length: 1
|
||||
probability: 0.05
|
||||
- length: 2
|
||||
probability: 0.1
|
||||
- length: 3
|
||||
probability: 0.2
|
||||
- length: 4
|
||||
probability: 0.5
|
||||
- length: 5
|
||||
probability: 0.1
|
||||
- length: 6
|
||||
probability: 0.05
|
||||
|
||||
postcodes:
|
||||
alphanumeric:
|
||||
default: &youbian
|
||||
canonical: 邮编
|
||||
numeric_affix:
|
||||
affix: 邮编
|
||||
direction: left
|
||||
# null_probability means the chance of doing nothing e.g. just the postal code
|
||||
null_probability: 0.9
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 0.1
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: &youbian_traditional
|
||||
canonical: 郵編
|
||||
numeric_affix:
|
||||
affix: 郵編
|
||||
direction: left
|
||||
# null_probability means the chance of doing nothing e.g. just the postal code
|
||||
null_probability: 0.9
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 0.1
|
||||
probability: 0.1
|
||||
|
||||
units:
|
||||
shi: &shi
|
||||
canonical: 室
|
||||
numeric_affix:
|
||||
affix: 室
|
||||
direction: right
|
||||
add_number_phrase: true
|
||||
add_number_phrase_probability: 0.5
|
||||
digits:
|
||||
ascii_probability: 0.6
|
||||
unicode_full_width_probability: 0.1
|
||||
spellout_probability: 0.3
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 1.0
|
||||
|
||||
alphanumeric:
|
||||
default: *shi
|
||||
numeric_probability: 1.0
|
||||
use_positive_numbers_probability: 1.0
|
||||
# If we have a floor number (from building:levels), use it
|
||||
use_floor_probability: 0.8
|
||||
|
||||
|
||||
countries:
|
||||
# Hong Kong
|
||||
hk:
|
||||
components:
|
||||
# Floor number a little more common in Hong Kong than mainland China
|
||||
level:
|
||||
null_probability: 0.75
|
||||
alphanumeric_probability: 0.25
|
||||
|
||||
numbers: &numbers_prefer_traditional
|
||||
default: *hao_traditional
|
||||
probability: 0.7
|
||||
alternatives:
|
||||
- alternative: *hao
|
||||
probability: 0.3
|
||||
|
||||
house_numbers: &house_number_prefer_traditional
|
||||
alphanumeric:
|
||||
default: *hao_traditional
|
||||
probability: 0.7
|
||||
alternatives:
|
||||
- alternative: *hao
|
||||
probability: 0.3
|
||||
alphanumeric_phrase_probability: 0.6
|
||||
|
||||
levels: &levels_prefer_traditional
|
||||
alphanumeric:
|
||||
default: *lou_traditional
|
||||
probability: 0.75
|
||||
alternatives:
|
||||
- alternative: *lou
|
||||
probability: 0.15
|
||||
- alternative: *ceng_traditional
|
||||
probability: 0.06
|
||||
- alternative: *ceng
|
||||
probability: 0.04
|
||||
numeric_probability: 1.0
|
||||
|
||||
po_boxes: &po_boxes_prefer_traditional
|
||||
alphanumeric:
|
||||
default: *youzheng_xinxiang_traditional
|
||||
probability: 0.75
|
||||
alternatives:
|
||||
- alternative: *youzheng_xinxiang
|
||||
probability: 0.25
|
||||
numeric_probability: 1.0
|
||||
|
||||
|
||||
postcodes: &postcodes_prefer_traditional
|
||||
alphanumeric:
|
||||
default: *youbian_traditional
|
||||
probability: 0.75
|
||||
alternatives:
|
||||
- alternative: *youbian
|
||||
probability: 0.25
|
||||
|
||||
# Macau
|
||||
mo:
|
||||
numbers: *numbers_prefer_traditional
|
||||
house_numbers: *house_number_prefer_traditional
|
||||
levels: *levels_prefer_traditional
|
||||
po_boxes: *po_boxes_prefer_traditional
|
||||
postcodes: *postcodes_prefer_traditional
|
||||
|
||||
units:
|
||||
alphanumeric_probability:
|
||||
numeric_probability: 0.9
|
||||
alpha_probability: 0.1
|
||||
|
||||
|
||||
# Taiwan
|
||||
tw:
|
||||
numbers: *numbers_prefer_traditional
|
||||
house_numbers: *house_number_prefer_traditional
|
||||
levels: *levels_prefer_traditional
|
||||
po_boxes: *po_boxes_prefer_traditional
|
||||
postcodes: *postcodes_prefer_traditional
|
||||
|
||||
units:
|
||||
alphanumeric_probability:
|
||||
numeric_probability: 0.9
|
||||
alpha_probability: 0.1
|
||||
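In zh.yaml, phrases are attached to numbers as affixes with a `direction` and, because of `whitespace: false`, without a separating space. The `digits` blocks additionally allow full-width Unicode digits. The Python sketch below illustrates both ideas. `apply_affix` and `to_full_width` are hypothetical helper names, and treating `direction: right` as "the affix follows the number" is an assumption drawn from the entries in the file, not a confirmed description of libpostal's code.

```python
def to_full_width(number):
    """Map ASCII digits to their full-width forms (U+FF10..U+FF19)."""
    return "".join(
        chr(ord(ch) - ord("0") + 0xFF10) if "0" <= ch <= "9" else ch
        for ch in str(number)
    )


def apply_affix(number, affix, direction, whitespace=False):
    """Attach an affix to the left or right of a number."""
    if direction not in ("left", "right"):
        raise ValueError("direction must be 'left' or 'right'")
    sep = " " if whitespace else ""
    number = str(number)
    return affix + sep + number if direction == "left" else number + sep + affix


print(apply_affix(5, "楼", "right"))                        # 5楼
print(apply_affix(to_full_width(123), "邮政信箱", "left"))  # 邮政信箱１２３
```

The `whitespace` flag is kept as a parameter so the same helper covers languages that do separate the phrase from the number.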
153
resources/addresses/zh_pinyin.yaml
Normal file
@@ -0,0 +1,153 @@
|
||||
# zh_pinyin.yaml
|
||||
# --------------
|
||||
# Chinese (Pinyin)
|
||||
|
||||
whitespace: false
|
||||
|
||||
components:
|
||||
level:
|
||||
null_probability: 0.85 # Probability of doing nothing if no floor number is specified
|
||||
alphanumeric_probability: 0.15
|
||||
|
||||
unit:
|
||||
# If no unit number is specified
|
||||
null_probability: 0.6
|
||||
alphanumeric_probability: 0.4
|
||||
|
||||
numbers:
|
||||
default: &hao
|
||||
canonical: hao
|
||||
numeric_affix:
|
||||
affix: -hao
|
||||
upper_case: false
|
||||
direction: right
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 1.0
|
||||
|
||||
house_numbers:
|
||||
alphanumeric:
|
||||
default: *hao
|
||||
alphanumeric_phrase_probability: 0.6
|
||||
|
||||
levels:
|
||||
lou: &lou
|
||||
canonical: lóu
|
||||
numeric_affix:
|
||||
affix: -lóu
|
||||
upper_case: false
|
||||
direction: right
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 1.0
|
||||
lou_no_accent: &lou_no_accent
|
||||
canonical: lou
|
||||
numeric_affix:
|
||||
affix: -lou
|
||||
upper_case: false
|
||||
direction: right
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 1.0
|
||||
ceng: &ceng
|
||||
canonical: céng
|
||||
numeric_affix:
|
||||
affix: -céng
|
||||
upper_case: false
|
||||
direction: right
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 1.0
|
||||
ceng_no_accent: &ceng_no_accent
|
||||
canonical: ceng
|
||||
numeric_affix:
|
||||
affix: -ceng
|
||||
upper_case: false
|
||||
direction: right
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 1.0
|
||||
|
||||
numbering_starts_at: 1
|
||||
|
||||
alphanumeric:
|
||||
default: *lou
|
||||
probability: 0.85
|
||||
alternatives:
|
||||
- alternative: *lou_no_accent
|
||||
probability: 0.05
|
||||
- alternative: *ceng
|
||||
probability: 0.08
|
||||
- alternative: *ceng_no_accent
|
||||
probability: 0.02
|
||||
numeric_probability: 1.0
|
||||
|
||||
po_boxes:
|
||||
youzheng_xinxiang: &youzheng_xinxiang
|
||||
canonical: youzheng xinxiang
|
||||
numeric:
|
||||
direction: left
|
||||
numeric_probability: 1.0
|
||||
|
||||
alphanumeric:
|
||||
default: *youzheng_xinxiang
|
||||
numeric_probability: 1.0
|
||||
|
||||
digits:
|
||||
- length: 1
|
||||
probability: 0.05
|
||||
- length: 2
|
||||
probability: 0.1
|
||||
- length: 3
|
||||
probability: 0.2
|
||||
- length: 4
|
||||
probability: 0.5
|
||||
- length: 5
|
||||
probability: 0.1
|
||||
- length: 6
|
||||
probability: 0.05
|
||||
|
||||
postcodes:
|
||||
alphanumeric:
|
||||
default: &youbian
|
||||
canonical: yóubiān
|
||||
numeric:
|
||||
direction: left
|
||||
# null_probability means the chance of doing nothing e.g. just the postal code
|
||||
null_probability: 0.9
|
||||
numeric_probability: 0.1
|
||||
probability: 0.9
|
||||
alternatives:
|
||||
- alternative: &youbian_no_accent
|
||||
canonical: youbian
|
||||
numeric:
|
||||
direction: left
|
||||
# null_probability means the chance of doing nothing e.g. just the postal code
|
||||
null_probability: 0.9
|
||||
numeric_probability: 0.1
|
||||
probability: 0.1
|
||||
|
||||
units:
|
||||
shi: &shi
|
||||
canonical: shì
|
||||
numeric_affix:
|
||||
affix: -shì
|
||||
upper_case: false
|
||||
direction: right
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 1.0
|
||||
|
||||
shi_no_accent: &shi_no_accent
|
||||
canonical: shi
|
||||
numeric_affix:
|
||||
affix: -shi
|
||||
upper_case: false
|
||||
direction: right
|
||||
numeric_probability: 0.0
|
||||
numeric_affix_probability: 1.0
|
||||
|
||||
alphanumeric:
|
||||
default: *shi
|
||||
probability: 0.8
|
||||
alternatives:
|
||||
- alternative: *shi_no_accent
|
||||
probability: 0.2
|
||||
numeric_probability: 1.0
|
||||
use_positive_numbers_probability: 1.0
|
||||
# If we have a floor number (from building:levels), use it
|
||||
use_floor_probability: 0.8
|
||||
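The `default` / `alternatives` entries in zh_pinyin.yaml each carry a `probability`, and the values in a block sum to 1.0 (for `levels.alphanumeric`: 0.85 + 0.05 + 0.08 + 0.02). A minimal Python sketch of how such a block could be sampled follows. The names `LEVEL_PHRASES`, `pick_phrase` and `format_level` are hypothetical and are not taken from libpostal's scripts; appending the affix directly to the right of the number is an assumption based on `direction: right`.

```python
import random

# Hypothetical in-memory form of the levels.alphanumeric block:
# (canonical phrase, probability) for the default and each alternative.
LEVEL_PHRASES = [
    ("lóu", 0.85),   # default: *lou
    ("lou", 0.05),   # alternative: *lou_no_accent
    ("céng", 0.08),  # alternative: *ceng
    ("ceng", 0.02),  # alternative: *ceng_no_accent
]


def pick_phrase(choices, rng):
    """Return one phrase, chosen with probability equal to its weight."""
    total = sum(weight for _, weight in choices)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1.0, got {}".format(total))
    phrases = [phrase for phrase, _ in choices]
    weights = [weight for _, weight in choices]
    return rng.choices(phrases, weights=weights, k=1)[0]


def format_level(number, phrase):
    """Assumed affix handling: '-<phrase>' appended to the right of the number."""
    return "{}-{}".format(number, phrase)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    print(format_level(3, pick_phrase(LEVEL_PHRASES, rng)))
```

Validating the sum up front catches a config edit that changes one probability without rebalancing the others.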
2
resources/boundaries/geonames/ad.yaml
Normal file
@@ -0,0 +1,2 @@
admin_codes:
    admin1: state
3
resources/boundaries/geonames/ar.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    admin2: state_district
3
resources/boundaries/geonames/at.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    # admin2 is a mix of state_district and city, need to list specifically
3
resources/boundaries/geonames/au.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    # admin2 is a mix of state_district and city, need to list specifically
3
resources/boundaries/geonames/ax.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    admin2: city
3
resources/boundaries/geonames/bd.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    # unclear what admin2 is, maybe city
3
resources/boundaries/geonames/be.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    admin2: state_district
3
resources/boundaries/geonames/bg.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    admin2: city
3
resources/boundaries/geonames/br.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    admin2: city
3
resources/boundaries/geonames/ca.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    admin2: state_district
3
resources/boundaries/geonames/ch.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    admin2: state_district
4
resources/boundaries/geonames/cz.yaml
Normal file
@@ -0,0 +1,4 @@
admin_codes:
    # The GeoNames admin1 boundaries are admin_level=5 or 6 in OSM
    # However, they do appear to be states, might need to update Czech OSM config
    admin1: state_district
3
resources/boundaries/geonames/de.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    admin2: state_district
3
resources/boundaries/geonames/dk.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    # admin2 is a mix of city and island, need to list specifically
3
resources/boundaries/geonames/do.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    admin2: city
3
resources/boundaries/geonames/dz.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    admin2: state_district
3
resources/boundaries/geonames/es.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    admin2: state_district
5
resources/boundaries/geonames/fi.yaml
Normal file
@@ -0,0 +1,5 @@
admin_codes:
    # The GeoNames admin1 boundaries are admin_level=6 in OSM
    # However, they do appear to be states, might need to update Finnish OSM config
    admin1: state_district
    admin2: state_district
3
resources/boundaries/geonames/fo.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    admin2: city
3
resources/boundaries/geonames/fr.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    admin2: state_district
3
resources/boundaries/geonames/gb.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    admin2: state_district
3
resources/boundaries/geonames/gt.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    admin2: state_district
2
resources/boundaries/geonames/gu.yaml
Normal file
@@ -0,0 +1,2 @@
admin_codes:
    admin1: city
3
resources/boundaries/geonames/hr.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    # admin2 is a mix of city and city_district, need to list specifically
4
resources/boundaries/geonames/hu.yaml
Normal file
@@ -0,0 +1,4 @@
admin_codes:
    # The GeoNames admin1 boundaries are admin_level=6 in OSM
    # However, they do appear to be states, might need to update Hungary OSM config
    admin1: state_district
3
resources/boundaries/geonames/ie.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    admin2: state_district
2
resources/boundaries/geonames/im.yaml
Normal file
@@ -0,0 +1,2 @@
admin_codes:
    admin1: city
3
resources/boundaries/geonames/in.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    admin2: state_district
3
resources/boundaries/geonames/is.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    admin2: city
3
resources/boundaries/geonames/it.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    admin2: state_district
2
resources/boundaries/geonames/je.yaml
Normal file
@@ -0,0 +1,2 @@
admin_codes:
    admin1: state
3
resources/boundaries/geonames/jp.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    # admin2 is a mix of state_district and city, need to list specifically
2
resources/boundaries/geonames/li.yaml
Normal file
@@ -0,0 +1,2 @@
admin_codes:
    admin1: city
3
resources/boundaries/geonames/lk.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    admin2: state_district
3
resources/boundaries/geonames/lt.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    # admin2 is a mix of state_district and city, need to list specifically
4
resources/boundaries/geonames/lu.yaml
Normal file
@@ -0,0 +1,4 @@
admin_codes:
    # The admin1 names don't appear to exist in OSM, but would be states otherwise
    admin1: state
    admin2: state_district
2
resources/boundaries/geonames/md.yaml
Normal file
@@ -0,0 +1,2 @@
admin_codes:
    admin1: state_district
2
resources/boundaries/geonames/mp.yaml
Normal file
@@ -0,0 +1,2 @@
admin_codes:
    admin1: state_district
2
resources/boundaries/geonames/mt.yaml
Normal file
@@ -0,0 +1,2 @@
admin_codes:
    admin1: city
3
resources/boundaries/geonames/mx.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    admin2: state_district
3
resources/boundaries/geonames/my.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    admin2: state_district
3
resources/boundaries/geonames/nl.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    admin2: city
3
resources/boundaries/geonames/no.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    admin2: city
3
resources/boundaries/geonames/nz.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    admin2: state_district
3
resources/boundaries/geonames/ph.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: country_region
    # admin2 is a mix of state_district and city, need to list specifically
3
resources/boundaries/geonames/pk.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    admin2: state_district
3
resources/boundaries/geonames/pl.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    admin2: state_district
6
resources/boundaries/geonames/pr.yaml
Normal file
@@ -0,0 +1,6 @@
admin_codes:
    admin1: state_district
    # The notion of a "barrio" in the official sense in PR is not quite a
    # municipality, and has no current official purpose, but might be useful
    # to have the name + "barrio" version available in libpostal
    admin2: city
8
resources/boundaries/geonames/pt.yaml
Normal file
@@ -0,0 +1,8 @@
admin_codes:
    admin1: state_district
    admin2: city

overrides:
    id:
        "2593105": "state" # Madeira
        "3411865": "state" # Azores
4
resources/boundaries/geonames/ro.yaml
Normal file
@@ -0,0 +1,4 @@
admin_codes:
    admin1: state
    # These are mostly admin_level=6, which maps to city in OSM
    admin2: city
3
resources/boundaries/geonames/ru.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    admin2: state_district
3
resources/boundaries/geonames/se.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    admin2: city
22
resources/boundaries/geonames/si.yaml
Normal file
@@ -0,0 +1,22 @@
admin_codes:
    admin1: city

overrides:
    id:
        # Districts of Ljubljana (suburbs in OSM)
        "3196350": "suburb" # Opština Ljubljana-Vič-Rudnik
        "3196352": "suburb" # Opština [historical] Ljubljana-Šiška
        "3196355": "suburb" # Opština Ljubljana-Moste-Polje
        "3196356": "suburb" # Opština Ljubljana-Center
        "3196357": "suburb" # Opčina Ljubljana-Bežigrad
        "9794374": "suburb" # Črnuče District
        "9794375": "suburb" # Dravlje District
        "9794376": "suburb" # Golovec District
        "9794377": "suburb" # Jarše District
        "9794378": "suburb" # Posavje District
        "9794379": "suburb" # Rožnik District
        "9794380": "suburb" # Sostro District
        "9794381": "suburb" # Šentvid District
        "9794382": "suburb" # Šmarna Gora District
        "9794384": "suburb" # Trnovo District
        "9794386": "suburb" # Vič District
17
resources/boundaries/geonames/sk.yaml
Normal file
@@ -0,0 +1,17 @@
admin_codes:
    admin1: state
    # admin2 is a mix of state_district and city, need to list specifically
    admin2: state_district
overrides:
    id:
        # Districts of Bratislava
        "8986283": "city_district" # Okres Bratislava I
        "8986339": "city_district" # Okres Bratislava II
        "8986340": "city_district" # Okres Bratislava III
        "8986341": "city_district" # Okres Bratislava IV
        "8986342": "city_district" # Okres Bratislava V
        # Districts of Košice
        "8986335": "city_district" # Košice I
        "8986336": "city_district" # Košice II
        "8986337": "city_district" # Košice III
        "8986338": "city_district" # Košice IV
2
resources/boundaries/geonames/sm.yaml
Normal file
@@ -0,0 +1,2 @@
admin_codes:
    admin1: city
8
resources/boundaries/geonames/th.yaml
Normal file
@@ -0,0 +1,8 @@
admin_codes:
    admin1: state

overrides:
    id:
        # Bangkok the state is treated as a city
        # Note: we do this in OSM to get the boundary, so duplicate in GeoNames
        "1609348": "city"
3
resources/boundaries/geonames/tr.yaml
Normal file
@@ -0,0 +1,3 @@
admin_codes:
    admin1: state
    admin2: state_district
16
resources/boundaries/geonames/us.yaml
Normal file
@@ -0,0 +1,16 @@
admin_codes:
    admin1: state
    admin2: state_district

overrides:
    id:
        # Manhattan (Island)
        "8479493": "city_district"
        # Brooklyn
        "5110300": "city_district"
        # Bronx
        "5110266": "city_district"
        # Queens
        "5133266": "city_district"
        # Staten Island
        "5139568": "city_district"
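The GeoNames boundary files above pair a per-country `admin_codes` mapping with optional `overrides` keyed by GeoNames ID. A minimal Python sketch of how a lookup over one such file could work follows. It assumes that an ID override takes precedence over the admin-level default; that ordering is inferred from the key name and is not confirmed by this diff. `US_CONFIG` and `component_for` are hypothetical names, and the dictionary mirrors only part of us.yaml.

```python
# Hypothetical parsed form of resources/boundaries/geonames/us.yaml (abridged).
US_CONFIG = {
    "admin_codes": {"admin1": "state", "admin2": "state_district"},
    "overrides": {
        "id": {
            "5110300": "city_district",  # Brooklyn
            "5133266": "city_district",  # Queens
        }
    },
}


def component_for(config, geonames_id, admin_level):
    """Return the address component for a GeoNames record.

    Assumption: a per-ID override wins over the admin-level mapping.
    Returns None when neither applies (e.g. admin2 is left unmapped).
    """
    overrides = config.get("overrides", {}).get("id", {})
    override = overrides.get(str(geonames_id))  # IDs are quoted strings in the YAML
    if override is not None:
        return override
    return config.get("admin_codes", {}).get(admin_level)


if __name__ == "__main__":
    print(component_for(US_CONFIG, 5110300, "admin2"))
    print(component_for(US_CONFIG, 1, "admin1"))
```

Returning `None` for an unmapped level matches files such as at.yaml, where `admin2` is deliberately left out because it mixes component types.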