[osm] Some countries in OSM, such as Lebanon, list the same address in two languages (French/English). That is an unreasonable task for a linear classifier, so run disambiguation in those cases.
@@ -113,7 +113,7 @@ AMBIGUOUS_LANGUAGE = 'xxx'
 
 
 def disambiguate_language(text, languages):
-    valid_languages = OrderedDict([(l['lang'], l['default']) for l in languages])
+    valid_languages = OrderedDict(languages)
     tokens = tokenize(safe_decode(text).replace(u'-', u' ').lower())
 
     current_lang = None
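The change above suggests that callers now pass `languages` as a sequence of pairs rather than a list of dicts. The sketch below illustrates that reading. It assumes each entry pairs a language code with a default flag, as the `'lang'` and `'default'` keys in the removed line imply. The sample values and variable names are hypothetical and do not come from the repository.

```python
from collections import OrderedDict

# Hypothetical sample data (not from the repository): each entry pairs a
# language code with a flag marking it as a default language.
languages_as_dicts = [
    {'lang': 'fr', 'default': True},
    {'lang': 'en', 'default': False},
]
languages_as_pairs = [('fr', True), ('en', False)]

# Old form: the function unpacked each dict into a (lang, default) pair itself.
old_style = OrderedDict([(l['lang'], l['default']) for l in languages_as_dicts])

# New form: the caller supplies the pairs directly.
new_style = OrderedDict(languages_as_pairs)

# Both forms build the same ordered mapping of language code to default flag.
print(old_style == new_style)
print(list(new_style.items()))
```

Under this reading, the lookup structure inside `disambiguate_language` is unchanged and only the expected shape of the `languages` argument differs, so existing callers would need to pass pairs.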