Al | 5e2d9f371e | [numex] Moving numex script to a different subpackage, adding function for creating ordinals | 2016-07-21 17:04:57 -04:00
Al | e6b59980e7 | [categories] Scraper for Nominatim Special Phrases, translated into a number of languages | 2016-07-21 17:04:57 -04:00
Al | 1bc92d6995 | [fix] output path in numex.py | 2016-03-29 11:25:36 -04:00
Al | 2a2d1738a3 | [fix] path for running numex.py | 2016-03-29 11:15:24 -04:00
Al | 7696179843 | [osm] Removing generic amenities like ATMs, parking, restrooms, etc. from addresses but keeping them in venues to support generic queries | 2016-03-14 01:07:03 -04:00
Al | 18e2c7519e | [fix] Absolute dir check in generating expansion data files | 2016-03-13 23:23:46 -04:00
Al | c5498c6c0c | [osm] Incorporating airports, and only including certain values for tourism= and leisure= since not all are physical place types, adding building= to addresses | 2016-03-12 15:02:31 -05:00
Al Barrentine | 942e5df1b9 | Merge pull request #40 from thatdatabaseguy/master: Including landmarks + more venues in OSM training data | 2016-03-11 16:47:11 -05:00
Al | 7a24ced43c | [fix] longitude validation | 2016-03-11 16:35:33 -05:00
Al | 99f452c7b1 | [geo] Validate lat/lon in latlon_to_decimal | 2016-03-11 16:18:31 -05:00
Al | a2f186a0ee | [geo] Adding lat/lon validation functions for the training scripts | 2016-03-11 14:09:10 -05:00
Al | f7d6943994 | [fix] no comma in download_quattroshapes filenames | 2016-03-10 23:40:54 -05:00
Al | a71fa7bd8d | [osm] tourism= keys should only be included in some cases. Listing everything on taginfo with >= 100 uses | 2016-03-10 14:17:38 -05:00
Al | d43fe201ff | [osm] No longer requiring street name in OSM planet addresses. Adding leisure and tourism keys to capture things like parks, squares, etc. Adding place=locality for neighborhoods. | 2016-03-09 18:19:33 -05:00
Al | 1003832b9c | [fix] README should not be included in building address dictionaries | 2016-03-09 11:18:19 -05:00
Al | 08085ee08b | [languages][ci skip] Checking in script to extract address phrases in various languages using frequent itemsets | 2016-03-08 14:35:20 -05:00
Al | a483fd5d42 | [fix][ci skip] pip installing some light requirements when the dictionaries/numex files change. Only building transliteration if the data file changed (the CLDR files are not in-repo so will be built offline) | 2016-03-04 16:17:05 -05:00
Al | 52ebc9fc46 | [fix] Paths relative to the current file in address_dictionaries.py so it can be run from anywhere | 2016-02-24 13:10:44 -05:00
Al | 393fd7e0f3 | [build] Using env var for data dir in geodata build script | 2016-02-08 01:11:42 -05:00
Al | b4dcb83e10 | [fix] sets of potential languages in case phrase matches multiple dictionaries | 2016-01-24 17:57:12 -05:00
Al | b713d102d1 | [languages] using whole phrase len, not first token, in disambiguation. Using single unambiguous observed default language or unambiguous observed language | 2016-01-24 17:43:14 -05:00
Al | b3e730d83f | [languages] If there's a single default language, assume ambiguous abbreviations are the default | 2016-01-24 17:15:02 -05:00
Al | fffaeecfc6 | [languages] Only count regional defaults when returning languages | 2016-01-24 16:35:14 -05:00
Al | f8a0463aa0 | [languages] Language disambiguation treats the national languages as non-default | 2016-01-24 15:10:04 -05:00
Al | f04360732c | [languages] Single character cannot be sufficient to disambiguate with multiple languages (Avenue A for example) | 2016-01-24 03:17:21 -05:00
Al | 00ce71223f | [osm] Using the default probabilities for abbreviations in ways training data | 2016-01-24 00:53:41 -05:00
Al | bab7a0f961 | [osm] splitting streets (way names) on semicolons | 2016-01-24 00:42:25 -05:00
Al | 3485738c2b | [fix] regional languages in French Canada | 2016-01-24 00:20:34 -05:00
Al | 7646adfc0f | [osm] Adding abbreviated street names in addition to the originals | 2016-01-23 23:23:58 -05:00
Al | 67130383ce | [fix] converting semicolons to commas in OSM house numbers and picking one at random | 2016-01-23 23:16:19 -05:00
Al | 1bb797f783 | [fix] spacing in phrases | 2016-01-23 21:59:49 -05:00
Al | 3a8c3dfcf6 | [fix] spacing in phrases at end of string | 2016-01-23 21:51:40 -05:00
Al | 78450bfad9 | [fix] Spaces in abbreviation | 2016-01-23 21:36:20 -05:00
Al | 308ceb5a5f | [fix] convert UTF8 slices back to unicode before using with the Python trie | 2016-01-23 20:20:23 -05:00
Al | 5eb6bb309b | [fix] Only adding whitespace back into tokenized strings during abbreviation if it existed in the original string | 2016-01-23 20:09:45 -05:00
Al | d61207e95a | [fix] var name | 2016-01-23 18:01:02 -05:00
Al | e44cba1d06 | [fix] geonames db not required in OSM training data | 2016-01-23 17:59:55 -05:00
Al | 4f03711e60 | [osm] Adding abbreviated training examples to ways language training data | 2016-01-23 14:10:47 -05:00
Al | c9fb4ee69d | [osm/formatting] Dropping state more often than not, except in the US and Canada where those fields are more commonly used | 2016-01-22 17:58:24 -05:00
Al | ea9bb3f2d5 | [fix] Abbreviation probabilities should only apply once, not once per dictionary. Also fixing issues where some of the abbreviations were doubled | 2016-01-22 15:48:21 -05:00
Al | f9f6558e06 | [fix] simple whitespace field splits for the limited format training data (used for language classification) | 2016-01-22 04:34:42 -05:00
Al | cd1db7b288 | [fix] Making sure rare components are dropped first, adding state and country back in | 2016-01-22 04:17:19 -05:00
Al | adc3a00264 | [fix] var name | 2016-01-22 04:10:16 -05:00
Al | 261beffa36 | [fix] Actually better to remove country and state from rare components and let them use the standard dropout probabilities | 2016-01-22 04:00:45 -05:00
Al | a6cc3d0114 | [fix] Adding state to the more frequently dropped components | 2016-01-22 03:56:38 -05:00
Al | bca3dae004 | [fix] state full name probabilities for limited vs. full formatted OSM training sets | 2016-01-22 03:54:20 -05:00
Al | d1cf253092 | [osm/formatting] Higher probability of dropout for rare components like counties, etc. | 2016-01-22 03:39:35 -05:00
Al | 9dd965a6fa | [fix] removing gazetteer configuration from disambiguation module | 2016-01-22 03:18:18 -05:00
Al | b22646ee30 | [mv] Moving gazetteers into their own module | 2016-01-22 03:15:56 -05:00
Al | 5a68e7aeef | [fix] import | 2016-01-22 03:00:43 -05:00