2177 Commits

Author SHA1 Message Date
Al
8926293063 [parser/cli] Using NFC normalization on the output in the parser client (closes #30). Optional command-line arg for parser output dir, useful for spot-checking different experiments 2016-07-21 17:04:57 -04:00
Al
44908ff95a [parser] No digit normalization in training data-derived parser phrases (for postcodes, etc.), phrases include the new island type, house number phrases if any are valid. Adjacent words are now full phrases if they are part of a multiword token like a city name. For hyphenated names like Carmel-by-the-Sea, adding a version to the phrase dictionary where the hyphens are replaced with spaces 2016-07-21 17:04:57 -04:00
Al
41ae742285 [fix] tokenized trie search when falling off the trie at the start of a valid phrase 2016-07-21 17:04:57 -04:00
Al
6e60b3bbda [fix] semicolon in #define 2016-07-21 17:04:57 -04:00
Al
0f76c8c631 [dictionaries] Portuguese abbreviations 2016-07-21 17:04:57 -04:00
Al
b8aba86471 [addresses] Implementing unit types which use concatenated floors with offsets for basement (e.g. Norway) 2016-07-21 17:04:57 -04:00
Al
c29d1ad947 [addresses] Implementing number_min_abs_value, number_max_abs_value outside of number_abs_value constraint 2016-07-21 17:04:57 -04:00
Al
589497cb16 [addresses] Adding Portuguese sub-building config 2016-07-21 17:04:57 -04:00
Al
2be41732f8 [dictionaries] Portuguese dictionaries to support sub-building config 2016-07-21 17:04:57 -04:00
Al
1bd62313f4 [dictionaries] Adding e/ to ambiguous in Spanish dictionaries 2016-07-21 17:04:57 -04:00
Al
6b7e4f8515 [dictionaries] Adding No to Germanic-language number synonyms 2016-07-21 17:04:57 -04:00
Al
619127e4b1 [fix] adding back staircase in Swedish sub-building config 2016-07-21 17:04:57 -04:00
Al
bc70a54b09 [addresses] Swedish address config 2016-07-21 17:04:57 -04:00
Al
b622315d0f [addresses] Lower probability of null phrase in Norwegian configs 2016-07-21 17:04:57 -04:00
Al
ac22f270bb [dictionaries] Swedish dictionaries to support sub-building config 2016-07-21 17:04:57 -04:00
Al
d8ddae362f [addresses] venstre in Norway rather than igjen 2016-07-21 17:04:57 -04:00
Al
cd9b33983a [addresses] Adding parterre for ground floor in Switzerland 2016-07-21 17:04:57 -04:00
Al
a61d9b1548 [dictionaries] adding phrases meaning 'near' or 'in' for Norwegian to the dictionaries 2016-07-21 17:04:57 -04:00
Al
541fe6c5ac [dictionaries] no standalone level types for Norway 2016-07-21 17:04:57 -04:00
Al
06fdf1c532 [fix] /underetasje/hovedetasje/ in Norwegian and translating category phrases from Danish 2016-07-21 17:04:57 -04:00
Al
0222049b88 [addresses] Danish level/unit and entrance/unit combinations 2016-07-21 17:04:57 -04:00
Al
03b9825390 [addresses/units] Adding special handling for floor phrase + unit concatenation in the unit field (handles bruksenhetsnummer/bolignummer-style addresses in Norway) 2016-07-21 17:04:57 -04:00
Al
9d7239d0ad [addresses] Adding null-phrase/null-phrase-alpha-only handling and zero padding to numbered components in sub-building configs 2016-07-21 17:04:57 -04:00
Al
420b169d48 [addresses] adding nb.yaml to valid configs 2016-07-21 17:04:57 -04:00
Al
d50495f609 [addresses] null_phrase_alpha_only for phrases like 3o B in Spain 2016-07-21 17:04:57 -04:00
Al
52db502929 [addresses] Norwegian address configs 2016-07-21 17:04:57 -04:00
Al
2831b70747 [dictionaries] Norwegian sub-building dictionaries 2016-07-21 17:04:57 -04:00
Al
b5d4dd6f37 [tokenization] Including full-width numbers in numeric tokens 2016-07-21 17:04:57 -04:00
Al
02d40c23a6 [numex] Norwegian ordinal indicators 2016-07-21 17:04:57 -04:00
Al
0136c88629 [addresses] Updates to Danish sub-building config 2016-07-21 17:04:57 -04:00
Al
5834f6b8ed [dictionaries] Updates to Danish sub-building dictionaries 2016-07-21 17:04:57 -04:00
Al
23736f2650 [fix] return None if there are no ordinal suffixes for a given language 2016-07-21 17:04:57 -04:00
Al
a6da72a831 [fix] addr:place= 2016-07-21 17:04:57 -04:00
Al
ca88ff7f73 [osm] Adding railway stations to venues/addresses data sets 2016-07-21 17:04:57 -04:00
Al
b22d30cb52 [addresses] Adding Danish config to parsed configs 2016-07-21 17:04:57 -04:00
Al
003c95f9eb [formatting] Adding Danish config to formatter and adjusting continental European template insertions 2016-07-21 17:04:57 -04:00
Al
b8ae1ad61d [addresses] Danish address config 2016-07-21 17:04:57 -04:00
Al
6f5b0e16a1 [dictionaries] Danish sub-building dictionaries 2016-07-21 17:04:57 -04:00
Al
1d09060012 [fix] adjusting a few probabilities for German 2016-07-21 17:04:57 -04:00
Al
6861c09caa [addresses/dictionaries] Adding Catalan address config 2016-07-21 17:04:57 -04:00
Al
4fa8c2aa8e [addresses] Dutch cross streets 2016-07-21 17:04:57 -04:00
Al
6e4ca716df [fix] Adding sampling for French intersections 2016-07-21 17:04:57 -04:00
Al
38e17bd1b2 [fix] adding sampling to Spanish intersections 2016-07-21 17:04:57 -04:00
Al
72e647902d [fix] name 2016-07-21 17:04:57 -04:00
Al
03be909a60 [fix] name 2016-07-21 17:04:57 -04:00
Al
45e069be6a [dictionaries] Adding suite to Spanish dictionaries, used sometimes in Latin America, removing entre from stopwords as it's part of the intersections dictionary 2016-07-21 17:04:57 -04:00
Al
127883facc [addresses] Spanish intersections, suite 2016-07-21 17:04:57 -04:00
Al
14f08e5991 [formatting] Adding aliases in formatting config, so e.g. most of the Francophone world shares France's config without needing to be the case for every French address (e.g. Belgium), generic config for continental Europe, etc. 2016-07-21 17:04:57 -04:00
Al
75e9d94684 [dictionaries] Adding case postale to French dictionaries 2016-07-21 17:04:57 -04:00
Al
ad7ef082a5 [dictionaries] extended Dutch dictionaries 2016-07-21 17:04:57 -04:00