Al | 122a81b610 | [languages] non-default languages can still be labeled from > 1 char abbreviations if there's no evidence of other languages in the string. Adding Python version of get_string_script from the C lib | 2015-08-23 02:26:06 -04:00 |
Al | a419dad630 | [languages] Adding canonical back in to language disambiguation (for prefixes/suffixes too), using non-canonicals/abbreviations in non-default languages if there are no other abbreviations found, adding in stopwords dictionaries | 2015-08-23 00:43:37 -04:00 |
Al | a7d9cc1782 | [fix] No longer using abbreviations for default languages, can be stopwords, etc. | 2015-08-22 23:34:15 -04:00 |
Al | 0701bb6f08 | [fix] import | 2015-08-22 23:19:43 -04:00 |
Al | 723058886a | [languages] Disambiguation uses language defaults, unicode normalized canonicals are treated as canonicals | 2015-08-22 23:18:09 -04:00 |
Al | 6231e17f2b | [languages] Disambiguation in language labeling better handles default languages and only uses canonical forms for non-default languages | 2015-08-22 20:26:39 -04:00 |
Al | bf829f7cb6 | [polygons] Adding a main to generate language polygons | 2015-08-22 17:45:04 -04:00 |
Al | e70c2453ee | [fix] import | 2015-08-22 15:04:30 -04:00 |
Al | 3902715258 | [osm] Some countries like Lebanon in OSM will list the same address under two languages (French/English), which creates an unreasonable task for a linear classifier, so running disambiguation in those cases | 2015-08-22 14:11:49 -04:00 |
Al | f6e521e3f3 | [geonames] Adding covering index to geonames DB | 2015-08-22 13:54:25 -04:00 |
Al | bd31dc99f2 | [mv] csv_utils | 2015-08-22 13:53:44 -04:00 |
Al | c5a9c392d4 | [languages] Refactoring street_types_gazetteer a bit so dictionaries are configurable | 2015-08-21 09:23:05 -04:00 |
Al | baa60aab65 | [fix] language disambiguation module | 2015-08-21 08:03:20 -04:00 |
Al | 4976be64e5 | [fix] var name | 2015-08-21 08:02:26 -04:00 |
Al | 8e56568cab | [fix] typo | 2015-08-21 08:01:49 -04:00 |
Al | ca6d802a43 | [languages] Moving language id methods into a separate package | 2015-08-21 08:00:56 -04:00 |
Al | 9d2f7e4bd1 | [fix] var name | 2015-08-18 16:20:12 -04:00 |
Al | 0528d1b578 | [osm] OSM untagged formatted addresses try to use language namespaced tags | 2015-08-18 16:18:27 -04:00 |
Al | c09cb4dd82 | [osm] OSM untagged formatted addresses now use the new language labeling scheme | 2015-08-18 15:13:10 -04:00 |
Al | 3daba2ddcd | [fix] removing debug print | 2015-08-18 13:22:48 -04:00 |
Al | ffe76f0403 | [languages/osm] Checking for existence of separable prefix/suffix in the given dictionaries | 2015-08-18 12:10:06 -04:00 |
Al | 0e00625dbd | [languages/osm] Adding a primitive phrase dictionary to the OSM training data construction script and a few heuristics to help disambiguate in the case of small local language groups that may not be specified with name:lang tags e.g. Occitan, Catalan, Basque, Galician, etc. Also throwing away ambiguous multilanguage names | 2015-08-18 11:12:27 -04:00 |
Al | b72d9af7dc | [fix] items | 2015-08-18 04:17:34 -04:00 |
Al | f3bb3c8356 | [fix] getter | 2015-08-18 04:13:19 -04:00 |
Al | ebd5e96bd7 | [fix] name | 2015-08-18 04:05:04 -04:00 |
Al | b5be1e8df5 | [fix] var name | 2015-08-18 03:56:23 -04:00 |
Al | e84f932042 | [fix] language polys | 2015-08-18 03:51:30 -04:00 |
Al | bada7fd13b | [polygons] Changes to languages polygons to support new regional language handling | 2015-08-18 03:27:11 -04:00 |
Al | d97c725bbc | [languages] Allowing specification of multiple regional languages | 2015-08-18 03:18:52 -04:00 |
Al | 89071ea21a | [osm] Omitting country in limited address data set (often abbreviated, doesn't convey language as well) | 2015-08-15 03:25:45 -04:00 |
Al | c505260912 | [fix] var name | 2015-08-15 02:47:31 -04:00 |
Al | 548ce79b99 | [fix] street addresses by language | 2015-08-15 02:44:04 -04:00 |
Al | 74a751ce0a | [osm] Adding a new OSM training data option for writing out full formatted addresses without place names | 2015-08-15 02:39:49 -04:00 |
Al | 05b8f555d5 | [fix] language polygon index | 2015-08-14 21:22:15 -04:00 |
Al | 0e92abd53e | [osm] Adding building tag to venues training set construction | 2015-08-14 21:07:07 -04:00 |
Al | 191c0e3ce5 | [languages] Changing Bonaire's default road sign language to Papiamento to help distinguish from Dutch | 2015-08-14 21:06:16 -04:00 |
Al | cad1f95bbb | [osm] Making minimal_only the default in formatted addresses, expanding list of acceptable combinations of address fields | 2015-08-14 10:21:17 -04:00 |
Al | 1e936ac9dc | [fix] road+house_number as minimal keys for formatting addresses | 2015-08-14 04:09:51 -04:00 |
Al | 83bbd67c9c | [fix] param | 2015-08-14 00:57:17 -04:00 |
Al | e993ddcb51 | [fix] splitter | 2015-08-14 00:54:06 -04:00 |
Al | dc2766ae5d | [fix] __init__ | 2015-08-14 00:49:06 -04:00 |
Al | 62c67aa970 | [osm] Using pipe splitter for address components | 2015-08-14 00:45:49 -04:00 |
Al | 2bd763be03 | [osm] Prefer amenity tag, skip if the building tag is simply building=yes | 2015-08-13 21:16:34 -04:00 |
Al | c844d0484a | [fix] carriage returns | 2015-08-13 21:07:12 -04:00 |
Al | ef14aa2b7e | [osm] Replacing escape chars at write time as there's no quoting, adding building key to venue training data | 2015-08-13 19:30:44 -04:00 |
Al | 9125f07af0 | [polygons] Separating out simplify polygon into a method in RTree index | 2015-08-13 18:43:35 -04:00 |
Al | 46f2c68a69 | [osm] Using tsv_no_quote writers in all OSM training data files | 2015-08-13 18:40:41 -04:00 |
Al | 88d63c85d2 | [utils] no-quote CSV dialect | 2015-08-13 18:26:51 -04:00 |
Al | 03febc7e20 | [scripts] Better script code aliasing | 2015-08-13 18:25:55 -04:00 |
Al | b54ff95ecc | [mv] csv_utils | 2015-08-13 18:19:54 -04:00 |