| Author | Commit | Message | Date |
|---|---|---|---|
| Al | ca25b48687 | [fix] Not writing empty fields in formatted addresses | 2015-09-22 08:13:55 -04:00 |
| Al | 134cf616d6 | [osm] Using street for language disambiguation in training data | 2015-09-21 04:09:15 -04:00 |
| Al | 84cf21df88 | [osm] Separating address formatter into its own module, adding some documentation of the various training sets with examples | 2015-09-20 20:05:46 -04:00 |
| Al | 6731395ca0 | [osm] Separating tagged from untagged output | 2015-09-19 14:11:47 -04:00 |
| Al | b85fe50fad | [osm] Training data for toponyms only cares about valid languages for name field | 2015-09-08 16:38:05 -07:00 |
| Al | e566063343 | [osm] Doing an all-to-nodes conversion and an additional filter on the borders data set | 2015-09-08 09:18:08 -07:00 |
| Al | 8525529968 | [osm] Not requiring qualified name tags to process OSM toponyms | 2015-09-06 21:03:01 -07:00 |
| Al | df20e2cbc0 | [osm] Including toponyms in the training data for countries where the unqualified place names can be assumed to be examples of a given language | 2015-09-04 14:13:33 -04:00 |
| Al | 17fcfa8b59 | [fix] adding house to ignore keys rather than aliasing it | 2015-09-04 12:40:08 -04:00 |
| Al | d64a27bc57 | [osm] Converting relations to nodes in borders training data | 2015-09-04 12:32:25 -04:00 |
| Al | 168b7f59da | [fix] default indices in strip_component | 2015-09-04 12:29:47 -04:00 |
| Al | 64db63e3eb | [osm] Removing house tag | 2015-09-04 12:23:47 -04:00 |
| Al | 4ebdca0ea7 | [fix] var | 2015-09-03 21:01:20 -04:00 |
| Al | 8345afbcd0 | [fix] exclude country toponyms where the default languages is well represented | 2015-09-03 20:56:58 -04:00 |
| Al | 20bb191624 | [fix] chaining | 2015-09-03 20:52:00 -04:00 |
| Al | e7cf5000fe | [fix] Exclude polygons with > 1 regional language | 2015-09-03 20:48:04 -04:00 |
| Al | 9a9530c1b9 | [fix] unqualified names | 2015-09-03 20:37:22 -04:00 |
| Al | a5fdd911d8 | [fix] only use name key for default names | 2015-09-03 20:35:08 -04:00 |
| Al | d8e1432533 | [osm] Adding unqualified names in single-language countries | 2015-09-03 20:31:49 -04:00 |
| Al | b15d2d70aa | [fix] top language | 2015-09-03 20:09:46 -04:00 |
| Al | 44bf94a158 | [osm] Better borders training data set (only need the metadata, not the polygons) | 2015-09-03 20:09:03 -04:00 |
| Al | 55af9b0a0c | [fix] OSM address tagged training data formatting | 2015-09-03 18:35:19 -04:00 |
| Al | c6bfc0e021 | [osm] Postponing punctuation stripping until after address template rendering | 2015-09-03 18:13:41 -04:00 |
| Al | d54fb25e45 | [osm] don't bother with the R-tree check if there are no name:* tags in border data set | 2015-09-03 17:54:40 -04:00 |
| Al | 33af61095b | [fix] var | 2015-09-03 17:49:52 -04:00 |
| Al | 294101ad80 | [osm] Treating components that are all punctuation as blank in address parsing (e.g. a single comma) | 2015-09-03 17:46:57 -04:00 |
| Al | e1e5c16637 | [osm] Not adding unqualified name tags to toponym data set, throwing out a few cases of language ambiguity | 2015-09-03 16:50:30 -04:00 |
| Al | 040a26a6f2 | [fix] import | 2015-09-03 13:54:23 -04:00 |
| Al | 7787427c58 | [fix] typo | 2015-09-03 13:53:18 -04:00 |
| Al | 23633e95dd | [osm] Only adding country default language toponyms to training data | 2015-09-03 13:44:41 -04:00 |
| Al | 11c01f64d2 | [osm] OrderedDict of attrs in OSM training data | 2015-09-03 11:11:18 -04:00 |
| Al | 27eb4e4aed | [osm] Adding a toponym language training set using planet-borders.osm (all admin borders) | 2015-09-03 10:19:11 -04:00 |
| Al | db57855c95 | [osm] Switching formatter repo to the OpenVenues fork, with fixes and several dozen new countries added | 2015-09-03 10:06:54 -04:00 |
| Al | a2ec8001b0 | [osm] Removing postal code keys in formatted language training data | 2015-08-24 14:08:36 -04:00 |
| Al | e70c2453ee | [fix] import | 2015-08-22 15:04:30 -04:00 |
| Al | 3902715258 | [osm] Some countries like Lebanon in OSM will list the same address under two languages (French/English), which creates an unreasonable task for a linear classifier, so running disambiguation in those cases | 2015-08-22 14:11:49 -04:00 |
| Al | 4976be64e5 | [fix] var name | 2015-08-21 08:02:26 -04:00 |
| Al | 8e56568cab | [fix] typo | 2015-08-21 08:01:49 -04:00 |
| Al | ca6d802a43 | [languages] Moving language id methods into a separate package | 2015-08-21 08:00:56 -04:00 |
| Al | 9d2f7e4bd1 | [fix] var name | 2015-08-18 16:20:12 -04:00 |
| Al | 0528d1b578 | [osm] OSM untagged formatted addresses try to use language namespaced tags | 2015-08-18 16:18:27 -04:00 |
| Al | c09cb4dd82 | [osm] OSM untagged formatted addresses now use the new language labeling scheme | 2015-08-18 15:13:10 -04:00 |
| Al | 3daba2ddcd | [fix] removing debug print | 2015-08-18 13:22:48 -04:00 |
| Al | ffe76f0403 | [languages/osm] Checking for existence of separable prefix/suffix in the given dictionaries | 2015-08-18 12:10:06 -04:00 |
| Al | 0e00625dbd | [languages/osm] Adding a primitive phrase dictionary to the OSM training data construction script and a few heuristics to help disambiguate in the case of small local language groups that may not be specified with name:lang tags e.g. Occitan, Catalan, Basque, Galician, etc. Also throwing away ambiguous multilanguage names | 2015-08-18 11:12:27 -04:00 |
| Al | 89071ea21a | [osm] Omitting country in limited address data set (often abbreviated, doesn't convey language as well) | 2015-08-15 03:25:45 -04:00 |
| Al | c505260912 | [fix] var name | 2015-08-15 02:47:31 -04:00 |
| Al | 548ce79b99 | [fix] street addresses by language | 2015-08-15 02:44:04 -04:00 |
| Al | 74a751ce0a | [osm] Adding a new OSM training data option for writing out full formatted addresses without place names | 2015-08-15 02:39:49 -04:00 |
| Al | 0e92abd53e | [osm] Adding building tag to venues training set construction | 2015-08-14 21:07:07 -04:00 |