4429 Commits

Author SHA1 Message Date
Al
1780c5e053 [fix] moving enum 2016-12-31 13:01:57 -05:00
Al
475aa3dbfa [strings] fixing and simplifying string tree iterator. This version is inspired by Python's itertools.product (itertoolsmodule.c has so many goodies) 2016-12-31 03:22:27 -05:00
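A minimal sketch of the kind of iteration this commit refers to: an odometer-style walk over one alternative per token, in the same order as Python's itertools.product, with a done() check that is true only when no paths are left. The class and function names here are hypothetical and written for illustration; they are not the string_tree_iterator code from the repository.

```python
from typing import List, Sequence


class ProductIterator:
    """Odometer-style iterator over one choice per position (hypothetical name)."""

    def __init__(self, alternatives: Sequence[Sequence[str]]):
        self.alternatives = alternatives
        self.indices = [0] * len(alternatives)
        # There are no paths at all if any position has zero alternatives.
        self.exhausted = any(len(a) == 0 for a in alternatives)

    def done(self) -> bool:
        # True once every path has been visited.
        return self.exhausted

    def current(self) -> List[str]:
        return [a[i] for a, i in zip(self.alternatives, self.indices)]

    def advance(self) -> None:
        # Increment the rightmost position first, carrying leftward,
        # which is the ordering itertools.product uses.
        for pos in reversed(range(len(self.indices))):
            self.indices[pos] += 1
            if self.indices[pos] < len(self.alternatives[pos]):
                return
            self.indices[pos] = 0
        # Every position wrapped around: nothing left to traverse.
        self.exhausted = True


def all_paths(alternatives: Sequence[Sequence[str]]) -> List[str]:
    it = ProductIterator(alternatives)
    paths = []
    while not it.done():
        paths.append(" ".join(it.current()))
        it.advance()
    return paths
```

For example, all_paths([["n", "north"], ["st", "street"]]) visits "n st", "n street", "north st", "north street" in that order.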
Al
261ec3888a [strings] header changes for new utf8 lower/upper functions 2016-12-31 03:20:43 -05:00
Al
58b063b632 [strings] making string_tree_iterator_done more meaningful (returns true if the iterator has no paths left to traverse) 2016-12-31 00:54:36 -05:00
Al
8978000320 [strings] adding latest utf8proc, new functions for utf8_lower (instead of case folding) and utf8_upper, and a utf8_is_whitespace that takes things like tabs into account 2016-12-31 00:52:12 -05:00
Al
db16e656ca [parser/cli] adding .print_features option in address_parser client for debugging 2016-12-31 00:20:35 -05:00
Al
bdb51a244e [phrases] fix case in trie search when searching for tokens in a string tail. If we're on the last token in a sequence and the token matches the tail, check that the tail is complete, and if so return the match before exiting the loop. Affects multiword phrases that tend to appear toward the end of a sequence (long country names like "United States of America", etc.) 2016-12-29 16:17:09 -05:00
Al
2d077699e6 [places] adding is_in property to the set of tags for the places index. This may allow us to make more granular exceptions for node-based places that are actually suburbs but classified as {hamlet, village, locality, town}, etc. if the is_in contains a city that's also a boundary or nearby point 2016-12-29 14:04:13 -05:00
Al
cad57b94b2 [boundaries] mapping place=hamlet to suburb for all of Malaysia. place=village becomes suburb as well in the urban core 2016-12-29 14:01:57 -05:00
Al
21a2a7419a [addresses] only add village as city component if no city can be found in the area 2016-12-29 13:41:05 -05:00
Al
8080e16791 [openaddresses] adding Joinville, Brazil and adding OSM boundaries for Brazilian address data sets 2016-12-29 13:27:49 -05:00
Al
0b6947840c [dictionaries] removing Belarusian place_names.txt 2016-12-29 03:24:57 -05:00
Al
05732f6718 [build] Makefile changes for new parser feature extraction 2016-12-29 02:39:29 -05:00
Al
091167ed3c [api] remove geodb from libpostal.c 2016-12-29 02:35:43 -05:00
Al
acd953ce51 [parser] first pass at new parser feature extraction
- removing geodb phrases
- use Latin-ASCII-simple transliteration (no umlauts, etc.)
- no digit normalization for admin component phrases and postcodes
- tag = START + word, special feature for first word in the sequence
- add the new admin boundary categories
- for hyphenated non-phrase words, add each sub-word
- for rare and unknown words, add ngram features of 3-6 characters with
  underscores to indicate beginnings and endings (similar to language
  classifier features)
- defines notion of "rare words" (known words with a frequency <= n where
  n > the unknown word threshold), so known words can share
  statistical strength with artificial and real unknown words
2016-12-29 02:17:35 -05:00
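The character n-gram bullet in the commit above can be illustrated roughly as follows. This is a sketch under assumptions: the function name is hypothetical, and the exact underscore convention (a leading "_" when the n-gram starts the word, a trailing "_" when it ends the word) is one plausible reading of the message, not the parser's actual feature format.

```python
from typing import List


def char_ngram_features(word: str, min_n: int = 3, max_n: int = 6) -> List[str]:
    """Character n-grams of length min_n..max_n for a rare or unknown word.

    Assumed convention: "_" is prepended when the n-gram begins the word
    and appended when it ends the word.
    """
    features = []
    for n in range(min_n, max_n + 1):
        for start in range(len(word) - n + 1):
            gram = word[start:start + n]
            if start == 0:
                gram = "_" + gram
            if start + n == len(word):
                gram = gram + "_"
            features.append(gram)
    return features
```

With these assumptions, char_ngram_features("main") produces "_mai", "ain_" and "_main_"; words shorter than three characters produce no n-gram features.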
Al
e62101b8bf [parser] remove geodb from address_parser_test, sort confusion matrix 2016-12-29 02:14:40 -05:00
Al
174529e8d0 [parser] remove geodb and fix small memory leak in address_parser_train 2016-12-29 02:12:06 -05:00
Al
bde5fdfaad [merge] merging in master 2016-12-29 02:00:31 -05:00
Al
646d96e13e Merge remote-tracking branch 'origin/master' into parser-data 2016-12-29 01:58:38 -05:00
Al
a26a01ece3 [openaddresses] adding SEMCOG counties, MI 2016-12-28 19:37:44 -05:00
Al
22b4a215f4 [places] additional form for West Indies 2016-12-28 17:58:32 -05:00
Al
f58ebbdf7f [fix] var name 2016-12-28 14:37:00 -05:00
Al
7ee44a584b [fix] genitive case for Russian/Ukrainian toponyms, not locative (#125) 2016-12-28 14:34:28 -05:00
Al
e6e4b28e43 [addresses] making the город/г. prefix apply to the Russian language rather than the country 2016-12-28 13:26:19 -05:00
Al
f995fdf9d2 [fix] default None 2016-12-28 05:09:15 -05:00
Al
3dc6a69bf5 [openaddresses] adding locative names in OpenAddresses as well, which contains some Ukraine data sets 2016-12-28 04:59:55 -05:00
Al
91013fe296 [fix] moving checks inside the add_locatives function, fixing float cast 2016-12-28 04:59:27 -05:00
Al
6f009fb8a6 [addresses] adding pymorphy2 for converting Russian and Ukrainian place names (sticking with state and state_district for the moment) to the locative case as mentioned in #125 2016-12-28 04:48:32 -05:00
Al
e91907a21b [boundaries] actually, the urban okrugs/districts seem to function more like neighborhoods in St Petersburg and Moscow, calling the raions city_district and the okrugs suburb 2016-12-28 01:36:11 -05:00
Travis
6c35eb9e65 [auto][ci skip] Adding data files from Travis build #186 2016-12-28 06:29:35 +00:00
Al
a86d6d5528 [merge] merging in master 2016-12-28 01:11:04 -05:00
Al Barrentine
47c3b0091b Merge pull request #147 from Komzpa/patch-1
Remove place names that are not place names (RU, BE)
2016-12-28 01:08:48 -05:00
Al
e23951a90f [dictionaries] new Ukrainian place names dictionary from http://wiki.openstreetmap.org/wiki/Nominatim/Special_Phrases/UK 2016-12-28 01:08:01 -05:00
Al
0bcaf816c4 [dictionaries] new Russian place names dictionary from http://wiki.openstreetmap.org/wiki/Nominatim/Special_Phrases/RU 2016-12-28 01:07:35 -05:00
Al
561d195be4 [fix] add global_overrides_last=True for federal cities in Russia 2016-12-28 00:49:13 -05:00
Al
4ce8f414ef [boundaries] adding Moscow and St Petersburg as cities despite technically having "state" boundaries 2016-12-28 00:25:20 -05:00
Al
12c7bed275 [fix] /exceptions/overrides/ 2016-12-28 00:16:22 -05:00
Al
1afe97b508 [fix] /containing/contained_by/ 2016-12-28 00:04:18 -05:00
Al
66eda96b75 [boundaries] admin_level=8 is city_district in Moscow and St Petersburg 2016-12-27 23:59:14 -05:00
Al
4344c5fdf3 [formatting] adding non-zero invert probabilities to all the former Soviet states. Other template insertions can still apply afterward for #125 2016-12-27 23:25:49 -05:00
Al
25e966411d [formatting] adding the ability to invert the address template (line by line, preserving order within each line) with certain probabilities 2016-12-27 23:25:49 -05:00
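A rough sketch of what line-by-line template inversion with a probability could look like. The function name, the probability parameter and the sample template are assumptions made for illustration, not the repository's formatter code: the order of the lines is reversed while each line's internal order is left untouched.

```python
import random


def maybe_invert_template(template: str, invert_probability: float,
                          rng: random.Random) -> str:
    """Reverse the order of the template's lines with the given probability.

    The content of each line is kept as-is, so components that share a
    line stay in their original relative order.
    """
    if rng.random() < invert_probability:
        return "\n".join(reversed(template.split("\n")))
    return template


# Hypothetical three-line template, used only for illustration.
EXAMPLE_TEMPLATE = (
    "{{{house_number}}} {{{road}}}\n"
    "{{{city}}} {{{postcode}}}\n"
    "{{{country}}}"
)
```

Calling maybe_invert_template(EXAMPLE_TEMPLATE, 1.0, random.Random(0)) puts the country line first and the street line last; a probability of 0.0 returns the template unchanged.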
Al
1c17f1f2e2 [names/ru] adding г. (город) prefix to Russian city names 50% of the time in various forms per #125 2016-12-27 23:25:41 -05:00
Al
165056ccd8 [names] adding configurable prefix/suffix additions for boundary names 2016-12-27 20:32:23 -05:00
Travis
dc528affd5 [auto][ci skip] Adding data files from Travis build #184 2016-12-27 23:45:40 +00:00
Al Barrentine
2a42ea016b Merge pull request #148 from Komzpa/patch-2
Ukrainian place names that are actually whatever
2016-12-27 18:35:48 -05:00
Darafei Praliaskouski
e514778645 Ukrainian place names that are actually whatever 2016-12-27 15:21:05 +03:00
Darafei Praliaskouski
dba8c28e6a Remove Russian place names that are actually street names 2016-12-27 13:28:23 +03:00
Darafei Praliaskouski
38a6618e40 Remove Belarusian place names that are not place names
These all are parts of streets.
2016-12-27 13:26:30 +03:00
Al
80a9c1b308 [addresses] move country-specific cleanups to before reverse geocoding as those deal with the user-specified components 2016-12-27 04:19:57 -05:00
Al
d9c28ec160 [names] adding regional council and regional municipality to suffixes 2016-12-27 03:45:09 -05:00