Commit Graph

539 Commits

Author SHA1 Message Date
Al
12a688df36 [osm] Splitting out generic amenities like ATM, fuel, restrooms, etc. so they can be used in category queries. Adding subdivision polygons, postcode polygons, building polygons, adding a few types of place keys to venues data set 2016-07-21 17:04:57 -04:00
Al
fc689222da [osm] adding civil boundaries (e.g. postal areas in Dublin), fixing output files 2016-07-21 17:04:57 -04:00
Al
492b6ee235 [categories] Using TSV files instead of YAML for category queries, easier to edit 2016-07-21 17:04:57 -04:00
Al
f3a9f4a257 [fix] removing init_gazetteers, doing it at the module level 2016-07-21 17:04:57 -04:00
Al
0162194dbc [dictionaries] Adding dictionary type enums to the generator script 2016-07-21 17:04:57 -04:00
Al
5d19aacb25 [categories] Keeping keys sorted in generated YAML files, ignoring Interlingua queries 2016-07-21 17:04:57 -04:00
Al
e65711f6fa [fix] changing extension to .yaml 2016-07-21 17:04:57 -04:00
Al
a1a4c5ff7c [build] Moving dependencies for scripts into a requirements.txt 2016-07-21 17:04:57 -04:00
Al
3cd32584c6 [categories] Moving category configs to YAML files 2016-07-21 17:04:57 -04:00
Al
2b4a9f0962 [osm] Splitting category queries data into several files (amenities, buildings, natural features, waterways) 2016-07-21 17:04:57 -04:00
Al
b25682e761 [polygons/zones] Adding a polygon reader for OSM zones (named residential/commercial/industrial/military areas) which are closed ways and can be used in addresses e.g. in office parks, larger housing complexes, etc. 2016-07-21 17:04:57 -04:00
Al
9ee1f56a0f [categories] documentation of Nominatim special phrases scraper 2016-07-21 17:04:57 -04:00
Al
ac18e383bd [osm] Building OSM file for deriving category queries, zone data for including the names of residential, commercial and industrial areas in the parser. Named landuse and historic features are considered valid places/venues. 2016-07-21 17:04:57 -04:00
Al
af73bb300d [fix] Adding islands to admin borders 2016-07-21 17:04:57 -04:00
Al
5e2d9f371e [numex] Moving numex script to a different subpackage, adding function for creating ordinals 2016-07-21 17:04:57 -04:00
Al
e6b59980e7 [categories] Scraper for Nominatim Special Phrases, translated into a number of languages 2016-07-21 17:04:57 -04:00
Al
1bc92d6995 [fix] output path in numex.py 2016-03-29 11:25:36 -04:00
Al
2a2d1738a3 [fix] path for running numex.py 2016-03-29 11:15:24 -04:00
Al
7696179843 [osm] Removing generic amenities like ATMs, parking, restrooms, etc. from addresses but keeping them in venues to support generic queries 2016-03-14 01:07:03 -04:00
Al
18e2c7519e [fix] Absolute dir check in generating expansion data files 2016-03-13 23:23:46 -04:00
Al
c5498c6c0c [osm] Incorporating airports, and only including certain values for tourism= and leisure= since not all are physical place types, adding building= to addresses 2016-03-12 15:02:31 -05:00
Al Barrentine
942e5df1b9 Merge pull request #40 from thatdatabaseguy/master: Including landmarks + more venues in OSM training data 2016-03-11 16:47:11 -05:00
Al
7a24ced43c [fix] longitude validation 2016-03-11 16:35:33 -05:00
Al
99f452c7b1 [geo] Validate lat/lon in latlon_to_decimal 2016-03-11 16:18:31 -05:00
Al
a2f186a0ee [geo] Adding lat/lon validation functions for the training scripts 2016-03-11 14:09:10 -05:00
Al
f7d6943994 [fix] no comma in download_quattroshapes filenames 2016-03-10 23:40:54 -05:00
Al
a71fa7bd8d [osm] tourism= keys should only be included in some cases. Listing everything on taginfo with >= 100 uses 2016-03-10 14:17:38 -05:00
Al
d43fe201ff [osm] No longer requiring street name in OSM planet addresses. Adding leisure and tourism keys to capture things like parks, squares, etc. Adding place=locality for neighborhoods. 2016-03-09 18:19:33 -05:00
Al
1003832b9c [fix] README should not be included in building address dictionaries 2016-03-09 11:18:19 -05:00
Al
08085ee08b [languages][ci skip] Checking in script to extract address phrases in various languages using frequent itemsets 2016-03-08 14:35:20 -05:00
Al
a483fd5d42 [fix][ci skip] pip installing some light requirements when the dictionaries/numex files change. Only building transliteration if the data file changed (the CLDR files are not in-repo so will be built offline) 2016-03-04 16:17:05 -05:00
Al
52ebc9fc46 [fix] Paths relative to the current file in address_dictionaries.py so it can be run from anywhere 2016-02-24 13:10:44 -05:00
Al
393fd7e0f3 [build] Using env var for data dir in geodata build script 2016-02-08 01:11:42 -05:00
Al
b4dcb83e10 [fix] sets of potential languages in case phrase matches multiple dictionaries 2016-01-24 17:57:12 -05:00
Al
b713d102d1 [languages] using whole phrase len, not first token, in disambiguation. Using single unambiguous observed default language or unambiguous observed language 2016-01-24 17:43:14 -05:00
Al
b3e730d83f [languages] If there's a single default language, assume ambiguous abbreviations are the default 2016-01-24 17:15:02 -05:00
Al
fffaeecfc6 [languages] Only count regional defaults when returning languages 2016-01-24 16:35:14 -05:00
Al
f8a0463aa0 [languages] Language disambiguation treats the national languages as non-default 2016-01-24 15:10:04 -05:00
Al
f04360732c [languages] Single character cannot be sufficient to disambiguate with multiple languages (Avenue A for example) 2016-01-24 03:17:21 -05:00
Al
00ce71223f [osm] Using the default probabilities for abbreviations in ways training data 2016-01-24 00:53:41 -05:00
Al
bab7a0f961 [osm] splitting streets (way names) on semicolons 2016-01-24 00:42:25 -05:00
Al
3485738c2b [fix] regional languages in French Canada 2016-01-24 00:20:34 -05:00
Al
7646adfc0f [osm] Adding abbreviated street names in addition to the originals 2016-01-23 23:23:58 -05:00
Al
67130383ce [fix] converting semicolons to commas in OSM house numbers and picking one at random 2016-01-23 23:16:19 -05:00
Al
1bb797f783 [fix] spacing in phrases 2016-01-23 21:59:49 -05:00
Al
3a8c3dfcf6 [fix] spacing in phrases at end of string 2016-01-23 21:51:40 -05:00
Al
78450bfad9 [fix] Spaces in abbreviation 2016-01-23 21:36:20 -05:00
Al
308ceb5a5f [fix] convert UTF8 slices back to unicode before using with the Python trie 2016-01-23 20:20:23 -05:00
Al
5eb6bb309b [fix] Only adding whitespace back into tokenized strings during abbreviation if it existed in the original string 2016-01-23 20:09:45 -05:00
Al
d61207e95a [fix] var name 2016-01-23 18:01:02 -05:00