Al
|
71c51f2e45
|
[language_classification] Making directory optional on language_classifier client/test program
|
2016-01-27 03:18:53 -05:00 |
|
Al
|
c770468d03
|
[expansion] Regenerated address_expansion_data.c
|
2016-01-27 03:17:59 -05:00 |
|
Al
|
36f52d9707
|
[fix] Removing feature printing
|
2016-01-26 15:34:56 -05:00 |
|
Al
|
239f8adec6
|
[docs] README updates now that the Python repo is separate
|
2016-01-26 02:40:07 -05:00 |
|
Al
|
cffc7e1034
|
[rm] Removing Python bindings from this project, moving to https://github.com/openvenues/pypostal
|
2016-01-26 02:17:23 -05:00 |
|
Al
|
5077462754
|
[fix] temporary files for language classifier training
|
2016-01-26 01:42:21 -05:00 |
|
Al
|
426edccbf8
|
[language_classification] Simple accuracy-based test program for language classifier.
|
2016-01-26 01:29:56 -05:00 |
|
Al
|
9abbf42bf4
|
[language_classifier] Command-line client for language classification
|
2016-01-26 01:20:59 -05:00 |
|
Al
|
314b65e192
|
[build] Adding shuffle.c to language_classifier_train
|
2016-01-26 01:18:35 -05:00 |
|
Al
|
ababb8f2d0
|
[fix] sign comparison in regularized gradient computation for logistic regression
|
2016-01-26 01:16:16 -05:00 |
|
Al
|
ae2b839f17
|
[build] Adding language classifier train/test/cli programs to the build
|
2016-01-26 00:09:07 -05:00 |
|
Al
|
299998d8b5
|
[languages] Making Basque the only default in the Basque region.
|
2016-01-24 19:35:03 -05:00 |
|
Al
|
b4dcb83e10
|
[fix] sets of potential languages in case phrase matches multiple dictionaries
|
2016-01-24 17:57:12 -05:00 |
|
Al
|
b713d102d1
|
[languages] using whole phrase len, not first token, in disambiguation. Using single unambiguous observed default language or unambiguous observed language
|
2016-01-24 17:43:14 -05:00 |
|
Al
|
b3e730d83f
|
[languages] If there's a single default language, assume ambiguous abbreviations are the default
|
2016-01-24 17:15:02 -05:00 |
|
Al
|
fffaeecfc6
|
[languages] Only count regional defaults when returning languages
|
2016-01-24 16:35:14 -05:00 |
|
Al
|
b735c79326
|
[languages] Adding Spanish in as a secondary default in Spain to supplement regional language defaults so we're more careful in disambiguation
|
2016-01-24 16:34:23 -05:00 |
|
Al
|
f8a0463aa0
|
[languages] Language disambiguation treats the national languages as non-default
|
2016-01-24 15:10:04 -05:00 |
|
Al
|
87aff60a7e
|
[dictionaries] Gulch
|
2016-01-24 03:23:40 -05:00 |
|
Al
|
f04360732c
|
[languages] Single character cannot be sufficient to disambiguate with multiple languages (Avenue A for example)
|
2016-01-24 03:17:21 -05:00 |
|
Al
|
cb914ae85b
|
[dictionaries] Adding a few terms to English dictionaries for automated disambiguation in the US/Canada
|
2016-01-24 03:15:10 -05:00 |
|
Al
|
00ce71223f
|
[osm] Using the default probabilities for abbreviations in ways training data
|
2016-01-24 00:53:41 -05:00 |
|
Al
|
bab7a0f961
|
[osm] splitting streets (way names) on semicolons
|
2016-01-24 00:42:25 -05:00 |
|
Al
|
3485738c2b
|
[fix] regional languages in French Canada
|
2016-01-24 00:20:34 -05:00 |
|
Al
|
7646adfc0f
|
[osm] Adding abbreviated street names in addition to the originals
|
2016-01-23 23:23:58 -05:00 |
|
Al
|
67130383ce
|
[fix] converting semicolons to commas in OSM house numbers and picking one at random
|
2016-01-23 23:16:19 -05:00 |
|
Al
|
1bb797f783
|
[fix] spacing in phrases
|
2016-01-23 21:59:49 -05:00 |
|
Al
|
3a8c3dfcf6
|
[fix] spacing in phrases at end of string
|
2016-01-23 21:51:40 -05:00 |
|
Al
|
78450bfad9
|
[fix] Spaces in abbreviation
|
2016-01-23 21:36:20 -05:00 |
|
Al
|
308ceb5a5f
|
[fix] convert UTF8 slices back to unicode before using with the Python trie
|
2016-01-23 20:20:23 -05:00 |
|
Al
|
5eb6bb309b
|
[fix] Only adding whitespace back into tokenized strings during abbreviation if it existed in the original string
|
2016-01-23 20:09:45 -05:00 |
|
Al
|
d61207e95a
|
[fix] var name
|
2016-01-23 18:01:02 -05:00 |
|
Al
|
e44cba1d06
|
[fix] geonames db not required in OSM training data
|
2016-01-23 17:59:55 -05:00 |
|
Al
|
4f03711e60
|
[osm] Adding abbreviated training examples to ways language training data
|
2016-01-23 14:10:47 -05:00 |
|
Al
|
c9fb4ee69d
|
[osm/formatting] Dropping state more often than not, except in the US and Canada where those fields are more commonly used
|
2016-01-22 17:58:24 -05:00 |
|
Al
|
ea9bb3f2d5
|
[fix] Abbreviation probabilities should only apply once, not once per dictionary. Also fixing issues where some of the abbreviations were doubled
|
2016-01-22 15:48:21 -05:00 |
|
Al
|
f9f6558e06
|
[fix] simple whitespace field splits for the limited format training data (used for language classification)
|
2016-01-22 04:34:42 -05:00 |
|
Al
|
cd1db7b288
|
[fix] Making sure rare components are dropped first, adding state and country back in
|
2016-01-22 04:17:19 -05:00 |
|
Al
|
adc3a00264
|
[fix] var name
|
2016-01-22 04:10:16 -05:00 |
|
Al
|
261beffa36
|
[fix] Actually better to remove country and state from rare components and let them use the standard dropout probabilities
|
2016-01-22 04:00:45 -05:00 |
|
Al
|
a6cc3d0114
|
[fix] Adding state to the more frequently dropped components
|
2016-01-22 03:56:38 -05:00 |
|
Al
|
bca3dae004
|
[fix] state full name probabilities for limited vs. full formatted OSM training sets
|
2016-01-22 03:54:20 -05:00 |
|
Al
|
d1cf253092
|
[osm/formatting] Higher probability of dropout for rare components like counties, etc.
|
2016-01-22 03:39:35 -05:00 |
|
Al
|
9dd965a6fa
|
[fix] removing gazetteer configuration from disambiguation module
|
2016-01-22 03:18:18 -05:00 |
|
Al
|
b22646ee30
|
[mv] Moving gazetteers into their own module
|
2016-01-22 03:15:56 -05:00 |
|
Al
|
5a68e7aeef
|
[fix] import
|
2016-01-22 03:00:43 -05:00 |
|
Al
|
6ac72576bc
|
[osm/formatting] Randomly abbreviating street names and venue names using all the available libpostal dictionaries. Refactoring OSM formatting into separate methods which can be individually tested. Adding override for special phrases like UK
|
2016-01-22 02:56:39 -05:00 |
|
Al
|
f4995d4f0f
|
[languages] Adding several different types of dictionaries for name expansion/abbreviation in OSM
|
2016-01-22 00:51:32 -05:00 |
|
Al
|
89aa039692
|
[dictionaries] Adding some Italian month abbreviations
|
2016-01-21 15:12:46 -05:00 |
|
Al
|
26cbb1eb8d
|
[languages] Fixing multiple expansions in the same dictionary for Python trie, adding length for prefixes/suffixes
|
2016-01-21 04:29:14 -05:00 |
|