Al | 747de1944b | [fix] Accounting for unknown scripts in disambiguation | 2015-09-21 18:05:28 -04:00
Al | 134cf616d6 | [osm] Using street for language disambiguation in training data | 2015-09-21 04:09:15 -04:00
Al | 84cf21df88 | [osm] Separating address formatter into its own module, adding some documentation of the various training sets with examples | 2015-09-20 20:05:46 -04:00
Al | 6731395ca0 | [osm] Separating tagged from untagged output | 2015-09-19 14:11:47 -04:00
Al | 35f1c02caf | [polygons] Reducing simplify tolerance for language polys now that regional languages are handled separately | 2015-09-10 12:44:13 -07:00
Al | 440a8158b6 | [polygons] Adding in country languages for regional polygons without a default language | 2015-09-10 12:34:26 -07:00
Al | fca7f21b1d | [polygons] Making simplify_tolerance and preserve_topology for polygon simplification configurable per class | 2015-09-10 11:06:18 -07:00
Al | b85fe50fad | [osm] Training data for toponyms only cares about valid languages for name field | 2015-09-08 16:38:05 -07:00
Al | e566063343 | [osm] Doing an all-to-nodes conversion and an additional filter on the borders data set | 2015-09-08 09:18:08 -07:00
Al | 8525529968 | [osm] Not requiring qualified name tags to process OSM toponyms | 2015-09-06 21:03:01 -07:00
Al | df20e2cbc0 | [osm] Including toponyms in the training data for countries where the unqualified place names can be assumed to be examples of a given language | 2015-09-04 14:13:33 -04:00
Al | 17fcfa8b59 | [fix] adding house to ignore keys rather than aliasing it | 2015-09-04 12:40:08 -04:00
Al | d64a27bc57 | [osm] Converting relations to nodes in borders training data | 2015-09-04 12:32:25 -04:00
Al | 168b7f59da | [fix] default indices in strip_component | 2015-09-04 12:29:47 -04:00
Al | 64db63e3eb | [osm] Removing house tag | 2015-09-04 12:23:47 -04:00
Al | 6a20ce5e85 | [language_id] Adding formatted addresses and toponyms to language training data | 2015-09-04 01:46:49 -04:00
Al | 4ebdca0ea7 | [fix] var | 2015-09-03 21:01:20 -04:00
Al | 8345afbcd0 | [fix] exclude country toponyms where the default language is well represented | 2015-09-03 20:56:58 -04:00
Al | 20bb191624 | [fix] chaining | 2015-09-03 20:52:00 -04:00
Al | e7cf5000fe | [fix] Exclude polygons with > 1 regional language | 2015-09-03 20:48:04 -04:00
Al | 9a9530c1b9 | [fix] unqualified names | 2015-09-03 20:37:22 -04:00
Al | a5fdd911d8 | [fix] only use name key for default names | 2015-09-03 20:35:08 -04:00
Al | d8e1432533 | [osm] Adding unqualified names in single-language countries | 2015-09-03 20:31:49 -04:00
Al | b15d2d70aa | [fix] top language | 2015-09-03 20:09:46 -04:00
Al | 44bf94a158 | [osm] Better borders training data set (only need the metadata, not the polygons) | 2015-09-03 20:09:03 -04:00
Al | 55af9b0a0c | [fix] OSM address tagged training data formatting | 2015-09-03 18:35:19 -04:00
Al | c6bfc0e021 | [osm] Postponing punctuation stripping until after address template rendering | 2015-09-03 18:13:41 -04:00
Al | d54fb25e45 | [osm] don't bother with the R-tree check if there are no name:* tags in border data set | 2015-09-03 17:54:40 -04:00
Al | 33af61095b | [fix] var | 2015-09-03 17:49:52 -04:00
Al | 294101ad80 | [osm] Treating components that are all punctuation as blank in address parsing (e.g. a single comma) | 2015-09-03 17:46:57 -04:00
Al | e1e5c16637 | [osm] Not adding unqualified name tags to toponym data set, throwing out a few cases of language ambiguity | 2015-09-03 16:50:30 -04:00
Al | 040a26a6f2 | [fix] import | 2015-09-03 13:54:23 -04:00
Al | 7787427c58 | [fix] typo | 2015-09-03 13:53:18 -04:00
Al | 23633e95dd | [osm] Only adding country default language toponyms to training data | 2015-09-03 13:44:41 -04:00
Al | 11c01f64d2 | [osm] OrderedDict of attrs in OSM training data | 2015-09-03 11:11:18 -04:00
Al | 27eb4e4aed | [osm] Adding a toponym language training set using planet-borders.osm (all admin borders) | 2015-09-03 10:19:11 -04:00
Al | db57855c95 | [osm] Switching formatter repo to the OpenVenues fork, with fixes and several dozen new countries added | 2015-09-03 10:06:54 -04:00
Al | a916668f28 | [i18n] Local file for ISO 15924 | 2015-09-01 23:58:36 -04:00
Al | a2ec8001b0 | [osm] Removing postal code keys in formatted language training data | 2015-08-24 14:08:36 -04:00
Al | 8bbcb60aee | [languages] Moving search_suffix and search_prefix into methods | 2015-08-24 14:04:36 -04:00
Al | c68f56e61d | [fix] paths | 2015-08-24 12:58:27 -04:00
Al | d620cb6fc3 | [fix] Calculating splits in Python rather than bash | 2015-08-24 12:47:51 -04:00
Al | c754d275af | [fix] str | 2015-08-24 12:24:55 -04:00
Al | 96cb289b79 | [languages] Script to create language training/cross-validation/test data splits | 2015-08-24 12:18:23 -04:00
Al | fa7b855ecb | [languages] Earlier exit on finding ambiguous script spans | 2015-08-24 03:07:57 -04:00
Al | e1d336716c | [languages] Non-default language canonicals, more test cases | 2015-08-24 02:21:53 -04:00
Al | c1ce91abbf | [languages] Better handling of non-default language canonicals in default language text | 2015-08-24 01:26:17 -04:00
Al | 96d7b990b5 | [fix] .items() | 2015-08-23 23:39:30 -04:00
Al | 84e0982cbc | [languages] Allow stopwords to help disambiguate if they can, otherwise ignore them | 2015-08-23 23:04:17 -04:00
Al | 7053c6b60b | [fix] language disambiguation | 2015-08-23 22:50:27 -04:00