[docs] Adding descriptions of remaining dictionary types

Al
2016-03-28 17:40:43 -04:00
parent 46e16e1946
commit 8791efadf3


@@ -35,6 +35,7 @@ suffix can be either concatenated to the main token or separated
- **directionals.txt**: strings indicating directions (cardinal and
lower/central/upper, etc.)
- **entrance.txt**: strings indicating an entrance, usually just the word "entrance" and its appropriate abbreviations.
- **given_names.txt**: a dictionary of given names derived from Wikipedia has been provided in the special "all" language directory. Western given names are usually abbreviated to their first letter, but listing every possible expansion of an initial would create too many potential variations. Instead of trying to expand "J" to every possible J name, it may be better to abbreviate possible given names and add those abbreviated versions as alternate forms of the string for matching purposes.
- **house_number.txt**: strings that may be added as part of the house/building number (for languages like Spanish where it's common to say "No. 123" or "No. Ext. 123" for the house/building number instead of just "123" as in English).
- **level_types_basement.txt**: strings indicating a basement level.
- **level_types_mezzanine.txt**: strings indicating a mezzanine level.
@@ -43,6 +44,8 @@ lower/central/upper, etc.)
- **level_types_sub_basement.txt**: strings indicating a sub-basement level.
- **no_number.txt**: strings like "sin número" used for houses with no number.
- **nulls.txt**: strings meaning "not applicable", e.g. in spreadsheets or database fields that might have missing values.
- **organizations.txt**: e.g. common retail chains, organizational acronyms for government agencies, the United Nations, etc.
- **people.txt**: abbreviations for specific people, like MLK for Martin Luther King, Jr. or CDG for Charles de Gaulle.
- **personal_suffixes.txt**: post-nominal suffixes, usually generational, e.g. Junior/Senior in English or der Jüngere in German.
- **personal_titles.txt**: civilian, royal, clerical, and military titles e.g. "Saint", "General", etc.
- **place_names.txt**: strings found in names of places e.g. "theatre",
@@ -54,6 +57,7 @@ lower/central/upper, etc.)
which may be ignored in some contexts
- **street_types.txt**: words like "street", "road", "drive" which indicate
a thoroughfare and their respective abbreviations.
- **surnames.txt**: a dictionary of surnames derived from Wikipedia has been provided in the special "all" language directory. If a language has specific abbreviations for surnames, like "Mdez." for "Menéndez" in Spanish, add them to that language's dictionary.
- **synonyms.txt**: any miscellaneous synonyms/abbreviations e.g. "bros"
expands to "brothers", etc. These have no special meaning and will essentially
just be treated as string replacement.
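The two matching strategies described above, plain string replacement for synonyms.txt and abbreviating known given names into alternate forms for given_names.txt, can be sketched in Python. This is a minimal illustrative sketch, not the project's actual code: the line format (one entry per line, pipe-separated variants with the canonical form first), the lowercasing, the whitespace tokenization, and the function names `parse_dictionary`, `expand_synonyms`, and `given_name_alternates` are all hypothetical assumptions.

```python
# Hypothetical sketch. ASSUMPTION: each dictionary line lists pipe-separated
# variants with the canonical form first, e.g. "brothers|bros". This format
# is assumed for illustration only.

def parse_dictionary(lines):
    """Map every variant (lowercased) to its canonical form."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        variants = [v.strip().lower() for v in line.split("|")]
        canonical = variants[0]
        for variant in variants:
            mapping[variant] = canonical
    return mapping


def expand_synonyms(text, mapping):
    """Token-level string replacement with no special semantics (synonyms.txt)."""
    return " ".join(mapping.get(tok, tok) for tok in text.lower().split())


def given_name_alternates(full_name, given_names):
    """Abbreviate known given names instead of expanding initials.

    Returns the original string plus, if any token is a known given name,
    a version with each such name reduced to its first letter and a period,
    to be used as an alternate form for matching.
    """
    tokens = full_name.split()
    abbreviated = [
        tok[0] + "." if tok.lower() in given_names else tok
        for tok in tokens
    ]
    forms = [full_name]
    alternate = " ".join(abbreviated)
    if alternate != full_name:
        forms.append(alternate)
    return forms
```

For example, `expand_synonyms("Smith Bros Hardware", parse_dictionary(["brothers|bros"]))` yields `"smith brothers hardware"`, and `given_name_alternates("John Smith", {"john"})` yields `["John Smith", "J. Smith"]`. Abbreviating in this direction keeps the number of alternate forms proportional to the input, whereas expanding "J" would multiply it by every J name in the dictionary.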