Al
|
c6bfc0e021
|
[osm] Postponing punctuation stripping until after address template rendering
|
2015-09-03 18:13:41 -04:00 |
|
Al
|
d54fb25e45
|
[osm] don't bother with the R-tree check if there are no name:* tags in border data set
|
2015-09-03 17:54:40 -04:00 |
|
Al
|
33af61095b
|
[fix] var
|
2015-09-03 17:49:52 -04:00 |
|
Al
|
294101ad80
|
[osm] Treating components that are all punctuation as blank in address parsing (e.g. a single comma)
|
2015-09-03 17:46:57 -04:00 |
|
Al
|
e1e5c16637
|
[osm] Not adding unqualified name tags to toponym data set, throwing out a few cases of language ambiguity
|
2015-09-03 16:50:30 -04:00 |
|
Al
|
040a26a6f2
|
[fix] import
|
2015-09-03 13:54:23 -04:00 |
|
Al
|
7787427c58
|
[fix] typo
|
2015-09-03 13:53:18 -04:00 |
|
Al
|
23633e95dd
|
[osm] Only adding country default language toponyms to training data
|
2015-09-03 13:44:41 -04:00 |
|
Al
|
11c01f64d2
|
[osm] OrderedDict of attrs in OSM training data
|
2015-09-03 11:11:18 -04:00 |
|
Al
|
27eb4e4aed
|
[osm] Adding a toponym language training set using planet-borders.osm (all admin borders)
|
2015-09-03 10:19:11 -04:00 |
|
Al
|
db57855c95
|
[osm] Switching formatter repo to the OpenVenues fork, with fixes and several dozen new countries added
|
2015-09-03 10:06:54 -04:00 |
|
Al
|
a916668f28
|
[i18n] Local file for ISO 15924
|
2015-09-01 23:58:36 -04:00 |
|
Al
|
ee4d73c65d
|
[math] sparse matrix I/O methods
|
2015-09-01 00:29:11 -04:00 |
|
Al
|
a8f6617294
|
[phrases] Adding num_keys attribute to trie
|
2015-08-31 21:41:34 -04:00 |
|
Al
|
aac5b37e76
|
[fix] Removing default dirent include
|
2015-08-31 21:38:29 -04:00 |
|
Al
|
bb50c7ea2c
|
[math] Adding sigmoid and softmax functions
|
2015-08-31 21:04:21 -04:00 |
|
Al
|
a090a22bca
|
[math] Adding compressed sparse row (CSR) format sparse matrix, designed for dynamic construction, just the methods needed for logistic regression for now i.e. no sparse dot products
|
2015-08-31 16:42:41 -04:00 |
|
Al
|
0f617454d3
|
[math] Dense matrices
|
2015-08-31 14:57:11 -04:00 |
|
Al
|
0ee72b8dfb
|
[math] can only use memset for *_array_new_zeros
|
2015-08-31 14:44:43 -04:00 |
|
Al
|
c566eaecf1
|
[dictionaries] Rebuilding address expansion data and uploading new files to S3
|
2015-08-31 14:33:28 -04:00 |
|
Al
|
789150ae33
|
[math] Using regular C arrays instead of vectors for vector_math.h
|
2015-08-30 02:41:31 -04:00 |
|
Al
|
07b0bed602
|
[math] Only float vectors have *_array_log, *_array_exp, etc.
|
2015-08-26 17:58:07 -04:00 |
|
Al
|
a2ec8001b0
|
[osm] Removing postal code keys in formatted language training data
|
2015-08-24 14:08:36 -04:00 |
|
Al
|
8bbcb60aee
|
[languages] Moving search_suffix and search_prefix into methods
|
2015-08-24 14:04:36 -04:00 |
|
Al
|
c68f56e61d
|
[fix] paths
|
2015-08-24 12:58:27 -04:00 |
|
Al
|
d620cb6fc3
|
[fix] Calculating splits in Python rather than bash
|
2015-08-24 12:47:51 -04:00 |
|
Al
|
c754d275af
|
[fix] str
|
2015-08-24 12:24:55 -04:00 |
|
Al
|
96cb289b79
|
[languages] Script to create language training/cross-validation/test data splits
|
2015-08-24 12:18:23 -04:00 |
|
Al
|
fa7b855ecb
|
[languages] Earlier exit on finding ambiguous script spans
|
2015-08-24 03:07:57 -04:00 |
|
Al
|
90f333b16c
|
[languages] Adding English non-default dictionaries to a number of countries where English can be found in OSM
|
2015-08-24 02:49:49 -04:00 |
|
Al
|
e1d336716c
|
[languages] Non-default language canonicals, more test cases
|
2015-08-24 02:21:53 -04:00 |
|
Al
|
c1ce91abbf
|
[languages] Better handling of non-default language canonicals in default language text
|
2015-08-24 01:26:17 -04:00 |
|
Al
|
96d7b990b5
|
[fix] .items()
|
2015-08-23 23:39:30 -04:00 |
|
Al
|
9f6f4feea1
|
[dictionaries/languages] Adding English gazetteers for Bahrain, pas abbreviation for paseo
|
2015-08-23 23:32:34 -04:00 |
|
Al
|
84e0982cbc
|
[languages] Allow stopwords to help disambiguate if they can, otherwise ignore them
|
2015-08-23 23:04:17 -04:00 |
|
Al
|
d14be57e73
|
[dictionaries] Adding exit as an English street type
|
2015-08-23 22:51:22 -04:00 |
|
Al
|
7053c6b60b
|
[fix] language disambiguation
|
2015-08-23 22:50:27 -04:00 |
|
Al
|
e26776a5e9
|
[dictionaries] Occitan stopwords for disambiguating from French
|
2015-08-23 16:35:46 -04:00 |
|
Al
|
f6d84531bc
|
[languages] If a non-Latin script in a string would prohibit the found language, return ambiguous. Adding some test cases for sanity checking the labeling
|
2015-08-23 16:34:26 -04:00 |
|
Al
|
b8e4c19146
|
[mv] Moving the get regional/country languages logic out of language polygons
|
2015-08-23 14:25:33 -04:00 |
|
Al
|
43178747f8
|
[languages] Using stopwords only to account for how ambiguous a phrase is, not for disambiguation
|
2015-08-23 04:28:44 -04:00 |
|
Al
|
d8763e9d6c
|
[languages] Adding non-canonicals only for streets, prefixes and suffixes. Better handling of default languages, abbreviations and ambiguity
|
2015-08-23 03:42:24 -04:00 |
|
Al
|
9c176961ff
|
[dictionaries] Norwegian street types from the suffix dictionary
|
2015-08-23 02:32:44 -04:00 |
|
Al
|
122a81b610
|
[languages] non-default languages can still be labeled from > 1 char abbreviations if there's no evidence of other languages in the string. Adding Python version of get_string_script from the C lib
|
2015-08-23 02:26:06 -04:00 |
|
Al
|
a419dad630
|
[languages] Adding canonical back in to language disambiguation (for prefixes/suffixes too), using non-canonicals/abbreviations in non-default languages if there are no other abbreviations found, adding in stopwords dictionaries
|
2015-08-23 00:43:37 -04:00 |
|
Al
|
a7d9cc1782
|
[fix] No longer using abbreviations for default languages, can be stopwords, etc.
|
2015-08-22 23:34:15 -04:00 |
|
Al
|
0701bb6f08
|
[fix] import
|
2015-08-22 23:19:43 -04:00 |
|
Al
|
723058886a
|
[languages] Disambiguation uses language defaults, unicode normalized canonicals are treated as canonicals
|
2015-08-22 23:18:09 -04:00 |
|
Al
|
6231e17f2b
|
[languages] Disambiguation in language labeling better handles default languages and only uses canonical forms for non-default languages
|
2015-08-22 20:26:39 -04:00 |
|
Al
|
bf829f7cb6
|
[polygons] Adding a main to generate language polygons
|
2015-08-22 17:45:04 -04:00 |
|