Al
|
fa7b855ecb
|
[languages] Earlier exit on finding ambiguous script spans
|
2015-08-24 03:07:57 -04:00 |
|
Al
|
90f333b16c
|
[languages] Adding English non-default dictionaries to a number of countries where English can be found in OSM
|
2015-08-24 02:49:49 -04:00 |
|
Al
|
e1d336716c
|
[languages] Non-default language canonicals, more test cases
|
2015-08-24 02:21:53 -04:00 |
|
Al
|
c1ce91abbf
|
[languages] Better handling of non-default language canonicals in default language text
|
2015-08-24 01:26:17 -04:00 |
|
Al
|
96d7b990b5
|
[fix] .items()
|
2015-08-23 23:39:30 -04:00 |
|
Al
|
9f6f4feea1
|
[dictionaries/languages] Adding English gazetteers for Bahrain, pas abbreviation for paseo
|
2015-08-23 23:32:34 -04:00 |
|
Al
|
84e0982cbc
|
[languages] Allow stopwords to help disambiguate if they can, otherwise ignore them
|
2015-08-23 23:04:17 -04:00 |
|
Al
|
d14be57e73
|
[dictionaries] Adding exit as an English street type
|
2015-08-23 22:51:22 -04:00 |
|
Al
|
7053c6b60b
|
[fix] language disambiguation
|
2015-08-23 22:50:27 -04:00 |
|
Al
|
e26776a5e9
|
[dictionaries] Occitan stopwords for disambiguating from French
|
2015-08-23 16:35:46 -04:00 |
|
Al
|
f6d84531bc
|
[languages] If a non-Latin script in a string would prohibit the found language, return ambiguous. Adding some test cases for sanity checking the labeling
|
2015-08-23 16:34:26 -04:00 |
|
Al
|
b8e4c19146
|
[mv] Moving the get regional/country languages logic out of language polygons
|
2015-08-23 14:25:33 -04:00 |
|
Al
|
43178747f8
|
[languages] Using stopwords only to account for how ambiguous a phrase is, not for disambiguation
|
2015-08-23 04:28:44 -04:00 |
|
Al
|
d8763e9d6c
|
[languages] Adding non-canonicals only for streets, prefixes and suffixes. Better handling of default languages, abbreviations and ambiguity
|
2015-08-23 03:42:24 -04:00 |
|
Al
|
9c176961ff
|
[dictionaries] Norwegian street types from the suffix dictionary
|
2015-08-23 02:32:44 -04:00 |
|
Al
|
122a81b610
|
[languages] non-default languages can still be labeled from > 1 char abbreviations if there's no evidence of other languages in the string. Adding Python version of get_string_script from the C lib
|
2015-08-23 02:26:06 -04:00 |
|
Al
|
a419dad630
|
[languages] Adding canonical back in to language disambiguation (for prefixes/suffixes too), using non-canonicals/abbreviations in non-default languages if there are no other abbreviations found, adding in stopwords dictionaries
|
2015-08-23 00:43:37 -04:00 |
|
Al
|
a7d9cc1782
|
[fix] No longer using abbreviations for default languages, can be stopwords, etc.
|
2015-08-22 23:34:15 -04:00 |
|
Al
|
0701bb6f08
|
[fix] import
|
2015-08-22 23:19:43 -04:00 |
|
Al
|
723058886a
|
[languages] Disambiguation uses language defaults, unicode normalized canonicals are treated as canonicals
|
2015-08-22 23:18:09 -04:00 |
|
Al
|
6231e17f2b
|
[languages] Disambiguation in language labeling better handles default languages and only uses canonical forms for non-default languages
|
2015-08-22 20:26:39 -04:00 |
|
Al
|
bf829f7cb6
|
[polygons] Adding a main to generate language polygons
|
2015-08-22 17:45:04 -04:00 |
|
Al
|
5c15c4a99f
|
[languages] Adding non-default Spanish and French gazetteers to the US, and giving the country of Jersey shared English/French defaults instead of just English
|
2015-08-22 15:21:04 -04:00 |
|
Al
|
e70c2453ee
|
[fix] import
|
2015-08-22 15:04:30 -04:00 |
|
Al
|
3902715258
|
[osm] Some countries like Lebanon in OSM will list the same address under two languages (French/English), which creates an unreasonable task for a linear classifier, so running disambiguation in those cases
|
2015-08-22 14:11:49 -04:00 |
|
Al
|
f6e521e3f3
|
[geonames] Adding covering index to geonames DB
|
2015-08-22 13:54:25 -04:00 |
|
Al
|
bd31dc99f2
|
[mv] csv_utils
|
2015-08-22 13:53:44 -04:00 |
|
Al
|
cc43409b72
|
[languages] Adding English gazetteers to many countries where the default language is Arabic but the road signs may be in English
|
2015-08-22 13:42:31 -04:00 |
|
Al
|
c5a9c392d4
|
[languages] Refactoring street_types_gazetteer a bit so dictionaries are configurable
|
2015-08-21 09:23:05 -04:00 |
|
Al
|
baa60aab65
|
[fix] language disambiguation module
|
2015-08-21 08:03:20 -04:00 |
|
Al
|
4976be64e5
|
[fix] var name
|
2015-08-21 08:02:26 -04:00 |
|
Al
|
8e56568cab
|
[fix] typo
|
2015-08-21 08:01:49 -04:00 |
|
Al
|
ca6d802a43
|
[languages] Moving language id methods into a separate package
|
2015-08-21 08:00:56 -04:00 |
|
Al
|
9d2f7e4bd1
|
[fix] var name
|
2015-08-18 16:20:12 -04:00 |
|
Al
|
0528d1b578
|
[osm] OSM untagged formatted addresses try to use language namespaced tags
|
2015-08-18 16:18:27 -04:00 |
|
Al
|
330002197a
|
[fix] via in English is a stopword, not a street type
|
2015-08-18 16:00:48 -04:00 |
|
Al
|
c09cb4dd82
|
[osm] OSM untagged formatted addresses now use the new language labeling scheme
|
2015-08-18 15:13:10 -04:00 |
|
Al
|
3daba2ddcd
|
[fix] removing debug print
|
2015-08-18 13:22:48 -04:00 |
|
Al
|
089a197155
|
[dictionaries] Updates to Galician and Catalan where they overlap with Spanish
|
2015-08-18 13:14:21 -04:00 |
|
Al
|
faf3435ffc
|
[fix] English dictionaries
|
2015-08-18 12:40:09 -04:00 |
|
Al
|
9183ba4e01
|
[dictionaries] Accented Gran Via for Catalan
|
2015-08-18 12:39:40 -04:00 |
|
Al
|
07b43e524e
|
[dictionaries] A few more Catalan terms that are the same as in Spanish
|
2015-08-18 12:23:11 -04:00 |
|
Al
|
ffe76f0403
|
[languages/osm] Checking for existence of separable prefix/suffix in the given dictionaries
|
2015-08-18 12:10:06 -04:00 |
|
Al
|
3b55b51ef1
|
[fix] English dictionary
|
2015-08-18 11:34:18 -04:00 |
|
Al
|
0e00625dbd
|
[languages/osm] Adding a primitive phrase dictionary to the OSM training data construction script and a few heuristics to help disambiguate in the case of small local language groups that may not be specified with name:lang tags e.g. Occitan, Catalan, Basque, Galician, etc. Also throwing away ambiguous multilanguage names
|
2015-08-18 11:12:27 -04:00 |
|
Al
|
fb7f2999e5
|
[dictionaries] Moving a few terms in German dictionaries
|
2015-08-18 11:06:53 -04:00 |
|
Al
|
c5d14e9c4d
|
[dictionaries] A few new terms in Dutch dictionaries to help distinguish from German
|
2015-08-18 11:06:10 -04:00 |
|
Al
|
4d115fdd88
|
[dictionaries] Better categorization of French dictionaries
|
2015-08-18 11:05:39 -04:00 |
|
Al
|
0f883a8872
|
[dictionaries] A few English dictionary terms that came up in language detection tests
|
2015-08-18 11:04:53 -04:00 |
|
Al
|
db7ffa7cab
|
[dictionaries] Updating Catalan dictionaries with place types to help distinguish from Spanish
|
2015-08-18 11:03:44 -04:00 |
|