2191 Commits

Author SHA1 Message Date
Al
efa75919e6 [dictionaries] numero sign in French 2016-07-21 17:04:57 -04:00
Al
ee71d94e85 [addresses] Adding Roman numerals to the Polish config for floor numbers 2016-07-21 17:04:57 -04:00
Al
11c6564783 [addresses] Russian address config 2016-07-21 17:04:57 -04:00
Al
7bc459f1a9 [dictionaries] Russian dictionaries to support address configs 2016-07-21 17:04:57 -04:00
Al
53052e6d25 [addresses] Polish address config and dictionary updates 2016-07-21 17:04:57 -04:00
Al
558d643042 [numex] Portuguese ordinals fix 2016-07-21 17:04:57 -04:00
Al
b15675f8cb [addresses/dictionaries] Adding rez-de-chaussée bas and rez-de-chaussée haut in French 2016-07-21 17:04:57 -04:00
Al
d89e9dcd04 [dictionaries] Variations on sin numero for Spanish 2016-07-21 17:04:57 -04:00
Al
ee27dc5ea1 [addresses/dictionaries] Updates to Portuguese configs, variations for Brasil 2016-07-21 17:04:57 -04:00
Al
8a5dd26dbf [numex] Adding method to do cardinal number spellout by hundreds e.g. twenty-three seventeen instead of two thousand three hundred seventeen 2016-07-21 17:04:57 -04:00
Al
eee68d1ca5 [numex] Ordinal spellout using the numex configs 2016-07-21 17:04:57 -04:00
Al
c628b9bee8 [dictionaries] English cross streets 2016-07-21 17:04:57 -04:00
Al
8383d5bb12 [numex] Adding numeric expression spellout in the Python geodata module for generating training data 2016-07-21 17:04:57 -04:00
Al
53ea1c139a [osm/addresses] using new is_numeric in AddressComponents expansion and removing venue names that are identical to the house number 2016-07-21 17:04:57 -04:00
Al
8926293063 [parser/cli] Using NFC normalization on the output in the parser client (closes #30). Optional command-line arg for parser output dir, useful for spot-checking different experiments 2016-07-21 17:04:57 -04:00
Al
44908ff95a [parser] No digit normalization in training data-derived parser phrases (for postcodes, etc.), phrases include the new island type, house number phrases if any are valid. Adjacent words are now full phrases if they are part of a multiword token like a city name. For hyphenated names like Carmel-by-the-Sea, adding a version to the phrase dictionary where the hyphens are replaced with spaces 2016-07-21 17:04:57 -04:00
Al
41ae742285 [fix] tokenized trie search when falling off the trie at the start of a valid phrase 2016-07-21 17:04:57 -04:00
Al
6e60b3bbda [fix] semicolon in #define 2016-07-21 17:04:57 -04:00
Al
0f76c8c631 [dictionaries] Portuguese abbreviations 2016-07-21 17:04:57 -04:00
Al
b8aba86471 [addresses] Implementing unit types which use concatenated floors with offsets for basement (e.g. Norway) 2016-07-21 17:04:57 -04:00
Al
c29d1ad947 [addresses] Implementing number_min_abs_value, number_max_abs_value outside of number_abs_value constraint 2016-07-21 17:04:57 -04:00
Al
589497cb16 [addresses] Adding Portuguese sub-building config 2016-07-21 17:04:57 -04:00
Al
2be41732f8 [dictionaries] Portuguese dictionaries to support sub-building config 2016-07-21 17:04:57 -04:00
Al
1bd62313f4 [dictionaries] Adding e/ to ambiguous in Spanish dictionaries 2016-07-21 17:04:57 -04:00
Al
6b7e4f8515 [dictionaries] Adding No to Germanic-language number synonyms 2016-07-21 17:04:57 -04:00
Al
619127e4b1 [fix] adding back staircase in Swedish sub-building config 2016-07-21 17:04:57 -04:00
Al
bc70a54b09 [addresses] Swedish address config 2016-07-21 17:04:57 -04:00
Al
b622315d0f [addresses] Lower probability of null phrase in Norwegian configs 2016-07-21 17:04:57 -04:00
Al
ac22f270bb [dictionaries] Swedish dictionaries to support sub-building config 2016-07-21 17:04:57 -04:00
Al
d8ddae362f [addresses] venstre in Norway rather than igjen 2016-07-21 17:04:57 -04:00
Al
cd9b33983a [addresses] Adding parterre for ground floor in Switzerland 2016-07-21 17:04:57 -04:00
Al
a61d9b1548 [dictionaries] adding phrases meaning 'near' or 'in' for Norwegian to the dictionaries 2016-07-21 17:04:57 -04:00
Al
541fe6c5ac [dictionaries] no standalone level types for Norway 2016-07-21 17:04:57 -04:00
Al
06fdf1c532 [fix] /underetasje/hovedetasje/ in Norwegian and translating category phrases from Danish 2016-07-21 17:04:57 -04:00
Al
0222049b88 [addresses] Danish level/unit and entrance/unit combinations 2016-07-21 17:04:57 -04:00
Al
03b9825390 [addresses/units] Adding special handling for floor phrase + unit concatenation in the unit field (handles bruksenhetsnummer/bolignummer-style addresses in Norway) 2016-07-21 17:04:57 -04:00
Al
9d7239d0ad [addresses] Adding null-phrase/null-phrase-alpha-only handling and zero padding to numbered components in sub-building configs 2016-07-21 17:04:57 -04:00
Al
420b169d48 [addresses] adding nb.yaml to valid configs 2016-07-21 17:04:57 -04:00
Al
d50495f609 [addresses] null_phrase_alpha_only for phrases like 3o B in Spain 2016-07-21 17:04:57 -04:00
Al
52db502929 [addresses] Norwegian address configs 2016-07-21 17:04:57 -04:00
Al
2831b70747 [dictionaries] Norwegian sub-building dictionaries 2016-07-21 17:04:57 -04:00
Al
b5d4dd6f37 [tokenization] Including full-width numbers in numeric tokens 2016-07-21 17:04:57 -04:00
Al
02d40c23a6 [numex] Norwegian ordinal indicators 2016-07-21 17:04:57 -04:00
Al
0136c88629 [addresses] Updates to Danish sub-building config 2016-07-21 17:04:57 -04:00
Al
5834f6b8ed [dictionaries] Updates to Danish sub-building dictionaries 2016-07-21 17:04:57 -04:00
Al
23736f2650 [fix] return None if there are no ordinal suffixes for a given language 2016-07-21 17:04:57 -04:00
Al
a6da72a831 [fix] addr:place= 2016-07-21 17:04:57 -04:00
Al
ca88ff7f73 [osm] Adding railway stations to venues/addresses data sets 2016-07-21 17:04:57 -04:00
Al
b22d30cb52 [addresses] Adding Danish config to parsed configs 2016-07-21 17:04:57 -04:00
Al
003c95f9eb [formatting] Adding Danish config to formatter and adjusting continental European template insertions 2016-07-21 17:04:57 -04:00