
2187 Commits

Author SHA1 Message Date
Al 9c43a6fdf8 [dictionaries] English cross streets 2016-06-24 16:12:33 -04:00
Al e2a9a57269 [numex] Adding numeric expression spellout in the Python geodata module for generating training data 2016-06-24 16:10:36 -04:00
Al cf2ed2b299 [osm/addresses] using new is_numeric in AddressComponents expansion and removing venue names that are identical to the house number 2016-06-23 13:59:40 -04:00
Al 106dfa80c3 [parser/cli] Using NFC normalization on the output in the parser client (closes #30). Optional command-line arg for parser output dir, useful for spot-checking different experiments 2016-06-22 11:56:35 -04:00
Al e19bc86c5a [parser] No digit normalization in training data-derived parser phrases (for postcodes, etc.), phrases include the new island type, house number phrases if any are valid. Adjacent words are now full phrases if they are part of a multiword token like a city name. For hyphenated names like Carmel-by-the-Sea, adding a version to the phrase dictionary where the hyphens are replaced with spaces 2016-06-22 11:50:42 -04:00
Al 3ff2f726d0 [fix] tokenized trie search when falling off the trie at the start of a valid phrase 2016-06-21 15:48:47 -04:00
Al 935a31df07 [fix] semicolon in #define 2016-06-21 15:16:14 -04:00
Al b90239206f [dictionaries] Portuguese abbreviations 2016-06-16 19:18:02 +02:00
Al 082dbe6dd2 [addresses] Implementing unit types which use concatenated floors with offsets for basement (e.g. Norway) 2016-06-16 01:45:43 +02:00
Al 1f08cce1a7 [addresses] Implementing number_min_abs_value, number_max_abs_value outside of number_abs_value constraint 2016-06-16 01:44:12 +02:00
Al c76e7ab776 [addresses] Adding Portuguese sub-building config 2016-06-16 01:43:03 +02:00
Al 68db871a33 [dictionaries] Portuguese dictionaries to support sub-building config 2016-06-16 01:42:21 +02:00
Al cb0c913c34 [dictionaries] Adding e/ to ambiguous in Spanish dictionaries 2016-06-16 01:41:54 +02:00
Al f22fcb7932 [dictionaries] Adding No to Germanic-language number synonyms 2016-06-16 01:41:06 +02:00
Al a576e32371 [fix] adding back staircase in Swedish sub-building config 2016-06-15 23:39:16 +02:00
Al 5d7fabaa19 [addresses] Swedish address config 2016-06-15 16:34:37 +02:00
Al d680f400d5 [addresses] Lower probability of null phrase in Norwegian configs 2016-06-15 16:30:42 +02:00
Al 2854621f2e [dictionaries] Swedish dictionaries to support sub-building config 2016-06-15 16:29:26 +02:00
Al 8d31acbe17 [addresses] venstre in Norway rather than igjen 2016-06-15 14:22:25 +02:00
Al 145786a4f3 [addresses] Adding parterre for ground floor in Switzerland 2016-06-15 14:20:42 +02:00
Al 715686954a [dictionaries] adding phrases meaning 'near' or 'in' for Norwegian to the dictionaries 2016-06-15 03:13:35 +02:00
Al 4e9f9fbac0 [dictionaries] no standalone level types for Norway 2016-06-15 03:12:54 +02:00
Al 8e27dd2554 [fix] /underetasje/hovedetasje/ in Norwegian and translating category phrases from Danish 2016-06-15 03:12:24 +02:00
Al b07a594f79 [addresses] Danish level/unit and entrance/unit combinations 2016-06-15 02:55:25 +02:00
Al ccd1d4825c [addresses/units] Adding special handling for floor phrase + unit concatenation in the unit field (handles bruksenhetsnummer/bolignummer-style addresses in Norway) 2016-06-14 22:02:14 +02:00
Al f02d393b90 [addresses] Adding null-phrase/null-phrase-alpha-only handling and zero padding to numbered components in sub-building configs 2016-06-14 21:53:43 +02:00
Al e6ac8062d8 [addresses] adding nb.yaml to valid configs 2016-06-14 21:52:11 +02:00
Al 4e192d0c2a [addresses] null_phrase_alpha_only for phrases like 3o B in Spain 2016-06-14 21:51:47 +02:00
Al 699b882a31 [addresses] Norwegian address configs 2016-06-14 21:36:32 +02:00
Al 2c0bdd9afe [dictionaries] Norwegian sub-building dictionaries 2016-06-14 21:35:23 +02:00
Al eb1b410d63 [tokenization] Including full-width numbers in numeric tokens 2016-06-14 01:28:25 +02:00
Al faf7ccbddd [numex] Norwegian ordinal indicators 2016-06-13 16:46:50 +02:00
Al e79ef340ba [addresses] Updates to Danish sub-building config 2016-06-13 16:46:25 +02:00
Al 3557a2313c [dictionaries] Updates to Danish sub-building dictionaries 2016-06-13 16:45:45 +02:00
Al e1cb8b4bbb [fix] return None if there are no ordinal suffixes for a given language 2016-06-13 16:17:26 +02:00
Al 1f7186d9f2 [fix] addr:place= 2016-06-09 16:17:21 +02:00
Al e0306b2147 [osm] Adding railway stations to venues/addresses data sets 2016-06-09 14:59:37 +02:00
Al 89c09fb8aa [addresses] Adding Danish config to parsed configs 2016-06-07 18:04:24 -04:00
Al 95842a0a8d [formatting] Adding Danish config to formatter and adjusting continental European template insertions 2016-06-07 18:03:41 -04:00
Al 085ba945e2 [addresses] Danish address config 2016-06-07 18:03:08 -04:00
Al 30e1114e6e [dictionaries] Danish sub-building dictionaries 2016-06-07 18:01:30 -04:00
Al 135d50827d [fix] adjusting a few probabilities for German 2016-06-07 17:58:22 -04:00
Al 8854c372ac [addresses/dictionaries] Adding Catalan address config 2016-06-02 21:06:29 -04:00
Al d947af8152 [addresses] Dutch cross streets 2016-06-02 12:26:12 -04:00
Al a5ae40f7ee [fix] Adding sampling for French intersections 2016-06-02 12:22:25 -04:00
Al 18c25fd4fc [fix] adding sampling to Spanish intersections 2016-06-02 12:21:18 -04:00
Al 3b0712ef41 [fix] name 2016-06-02 12:17:40 -04:00
Al 24b84dd503 [fix] name 2016-06-02 03:05:31 -04:00
Al 2958cbfacb [dictionaries] Adding suite to Spanish dictionaries, used sometimes in Latin America, removing entre from stopwords as it's part of the intersections dictionary 2016-06-02 00:31:40 -04:00
Al a05eb0fd51 [addresses] Spanish intersections, suite 2016-06-02 00:26:11 -04:00