Commit Graph

2180 Commits

Author SHA1 Message Date
Al c628b9bee8 [dictionaries] English cross streets 2016-07-21 17:04:57 -04:00
Al 8383d5bb12 [numex] Adding numeric expression spellout in the Python geodata module for generating training data 2016-07-21 17:04:57 -04:00
Al 53ea1c139a [osm/addresses] using new is_numeric in AddressComponents expansion and removing venue names that are identical to the house number 2016-07-21 17:04:57 -04:00
Al 8926293063 [parser/cli] Using NFC normalization on the output in the parser client (closes #30). Optional command-line arg for parser output dir, useful for spot-checking different experiments 2016-07-21 17:04:57 -04:00
Al 44908ff95a [parser] No digit normalization in training data-derived parser phrases (for postcodes, etc.), phrases include the new island type, house number phrases if any are valid. Adjacent words are now full phrases if they are part of a multiword token like a city name. For hyphenated names like Carmel-by-the-Sea, adding a version to the phrase dictionary where the hyphens are replaced with spaces 2016-07-21 17:04:57 -04:00
Al 41ae742285 [fix] tokenized trie search when falling off the trie at the start of a valid phrase 2016-07-21 17:04:57 -04:00
Al 6e60b3bbda [fix] semicolon in #define 2016-07-21 17:04:57 -04:00
Al 0f76c8c631 [dictionaries] Portuguese abbreviations 2016-07-21 17:04:57 -04:00
Al b8aba86471 [addresses] Implementing unit types which use concatenated floors with offsets for basement (e.g. Norway) 2016-07-21 17:04:57 -04:00
Al c29d1ad947 [addresses] Implementing number_min_abs_value, number_max_abs_value outside of number_abs_value constraint 2016-07-21 17:04:57 -04:00
Al 589497cb16 [addresses] Adding Portuguese sub-building config 2016-07-21 17:04:57 -04:00
Al 2be41732f8 [dictionaries] Portuguese dictionaries to support sub-building config 2016-07-21 17:04:57 -04:00
Al 1bd62313f4 [dictionaries] Adding e/ to ambiguous in Spanish dictionaries 2016-07-21 17:04:57 -04:00
Al 6b7e4f8515 [dictionaries] Adding No to Germanic-language number synonyms 2016-07-21 17:04:57 -04:00
Al 619127e4b1 [fix] adding back staircase in Swedish sub-building config 2016-07-21 17:04:57 -04:00
Al bc70a54b09 [addresses] Swedish address config 2016-07-21 17:04:57 -04:00
Al b622315d0f [addresses] Lower probability of null phrase in Norwegian configs 2016-07-21 17:04:57 -04:00
Al ac22f270bb [dictionaries] Swedish dictionaries to support sub-building config 2016-07-21 17:04:57 -04:00
Al d8ddae362f [addresses] venstre in Norway rather than igjen 2016-07-21 17:04:57 -04:00
Al cd9b33983a [addresses] Adding parterre for ground floor in Switzerland 2016-07-21 17:04:57 -04:00
Al a61d9b1548 [dictionaries] adding phrases meaning 'near' or 'in' for Norwegian to the dictionaries 2016-07-21 17:04:57 -04:00
Al 541fe6c5ac [dictionaries] no standalone level types for Norway 2016-07-21 17:04:57 -04:00
Al 06fdf1c532 [fix] /underetasje/hovedetasje/ in Norwegian and translating category phrases from Danish 2016-07-21 17:04:57 -04:00
Al 0222049b88 [addresses] Danish level/unit and entrance/unit combinations 2016-07-21 17:04:57 -04:00
Al 03b9825390 [addresses/units] Adding special handling for floor phrase + unit concatenation in the unit field (handles bruksenhetsnummer/bolignummer-style addresses in Norway) 2016-07-21 17:04:57 -04:00
Al 9d7239d0ad [addresses] Adding null-phrase/null-phrase-alpha-only handling and zero padding to numbered components in sub-building configs 2016-07-21 17:04:57 -04:00
Al 420b169d48 [addresses] adding nb.yaml to valid configs 2016-07-21 17:04:57 -04:00
Al d50495f609 [addresses] null_phrase_alpha_only for phrases like 3o B in Spain 2016-07-21 17:04:57 -04:00
Al 52db502929 [addresses] Norwegian address configs 2016-07-21 17:04:57 -04:00
Al 2831b70747 [dictionaries] Norwegian sub-building dictionaries 2016-07-21 17:04:57 -04:00
Al b5d4dd6f37 [tokenization] Including full-width numbers in numeric tokens 2016-07-21 17:04:57 -04:00
Al 02d40c23a6 [numex] Norwegian ordinal indicators 2016-07-21 17:04:57 -04:00
Al 0136c88629 [addresses] Updates to Danish sub-building config 2016-07-21 17:04:57 -04:00
Al 5834f6b8ed [dictionaries] Updates to Danish sub-building dictionaries 2016-07-21 17:04:57 -04:00
Al 23736f2650 [fix] return None if there are no ordinal suffixes for a given language 2016-07-21 17:04:57 -04:00
Al a6da72a831 [fix] addr:place= 2016-07-21 17:04:57 -04:00
Al ca88ff7f73 [osm] Adding railway stations to venues/addresses data sets 2016-07-21 17:04:57 -04:00
Al b22d30cb52 [addresses] Adding Danish config to parsed configs 2016-07-21 17:04:57 -04:00
Al 003c95f9eb [formatting] Adding Danish config to formatter and adjusting continental European template insertions 2016-07-21 17:04:57 -04:00
Al b8ae1ad61d [addresses] Danish address config 2016-07-21 17:04:57 -04:00
Al 6f5b0e16a1 [dictionaries] Danish sub-building dictionaries 2016-07-21 17:04:57 -04:00
Al 1d09060012 [fix] adjusting a few probabilities for German 2016-07-21 17:04:57 -04:00
Al 6861c09caa [addresses/dictionaries] Adding Catalan address config 2016-07-21 17:04:57 -04:00
Al 4fa8c2aa8e [addresses] Dutch cross streets 2016-07-21 17:04:57 -04:00
Al 6e4ca716df [fix] Adding sampling for French intersections 2016-07-21 17:04:57 -04:00
Al 38e17bd1b2 [fix] adding sampling to Spanish intersections 2016-07-21 17:04:57 -04:00
Al 72e647902d [fix] name 2016-07-21 17:04:57 -04:00
Al 03be909a60 [fix] name 2016-07-21 17:04:57 -04:00
Al 45e069be6a [dictionaries] Adding suite to Spanish dictionaries, used sometimes in Latin America, removing entre from stopwords as it's part of the intersections dictionary 2016-07-21 17:04:57 -04:00
Al 127883facc [addresses] Spanish intersections, suite 2016-07-21 17:04:57 -04:00