Al
|
beec43fe15
|
[expansion] regenerating expansion data
|
2015-12-08 15:28:54 -05:00 |
|
Al
|
35db855819
|
[fix] canonical index in address expansion data, should be -1 for all canonical phrases
|
2015-12-08 15:09:51 -05:00 |
|
Al
|
e1ea2ac704
|
[expansion] Toponym dictionaries can apply to street names and place names
|
2015-12-08 02:10:22 -05:00 |
|
Al
|
bfc517ae42
|
[fix] Belgium districts
|
2015-12-07 22:11:11 -05:00 |
|
Al
|
cbe5cd7429
|
[expansion] The ambiguous expansions dictionary shouldn't add to the component bitset
|
2015-12-07 20:36:56 -05:00 |
|
Al
|
d35f519629
|
[expansion] Fixing case where non-ideographic tokens like # can potentially be concatenated with surrounding tokens and should be normalized with whitespace in between
|
2015-12-07 19:18:46 -05:00 |
|
Al
|
f5739dd42b
|
[math] Signatures for array_exp and array_log
|
2015-12-07 18:10:04 -05:00 |
|
Al
|
0d8d396108
|
[expansion] Fixing cases like ML King where a global (all languages) expansion subsumes the specific language expansion (like English)
|
2015-12-07 18:09:25 -05:00 |
|
Al
|
9bab70909d
|
[numex] Always adding a version of the string without Roman numeral expansion since many times those tokens can be ambiguous
|
2015-12-07 14:29:18 -05:00 |
|
Al
|
f8a3081d0f
|
[fix] city name in OSM formatting
|
2015-12-07 02:33:12 -05:00 |
|
Al
|
a066ee9aad
|
[math] Only reallocate on matrix_resize if needed
|
2015-12-07 01:20:16 -05:00 |
|
Al
|
cfd0dc69f2
|
[parsing] Using the entire phrase as the ith word
|
2015-12-07 01:19:38 -05:00 |
|
Al
|
8186e2606e
|
[dictionaries] Regenerating address expansion data file
|
2015-12-06 16:56:27 -05:00 |
|
Al
|
4dba0c54e4
|
[dictionaries] Adding state abbreviations for US, CA and AU into dictionaries
|
2015-12-06 16:47:36 -05:00 |
|
Al
|
b25a738000
|
[osm] Doing more deduping in the OSM training data to avoid confusing the parser when city, state, district all have the same name
|
2015-12-06 16:14:02 -05:00 |
|
Al
|
44f7fd0844
|
[math] Matrix resize
|
2015-12-06 03:20:03 -05:00 |
|
Al
|
dd8f8b4d7b
|
[fix] prefix/suffix regexes
|
2015-12-05 18:41:22 -05:00 |
|
Al
|
5fcb6d2c30
|
[fix] typo
|
2015-12-05 16:23:58 -05:00 |
|
Al
|
3a7ba0288f
|
[fix] .get
|
2015-12-05 16:13:15 -05:00 |
|
Al
|
c92a6de477
|
[fix] name
|
2015-12-05 15:49:50 -05:00 |
|
Al
|
2a4210f93f
|
[osm] Stripping standard city prefixes/suffixes e.g. Township of
|
2015-12-05 15:42:22 -05:00 |
|
Al
|
596c5ffdd3
|
[fix] Tokenized trie search
|
2015-12-05 15:21:52 -05:00 |
|
Al
|
24208c209f
|
[parsing] Adding a training-data-derived index of complete phrases from suburb up to country. Only adding bias and word features for non-phrases, using UNKNOWN_WORD and UNKNOWN_NUMERIC for infrequent tokens (not meeting minimum vocab count threshold).
|
2015-12-05 14:34:19 -05:00 |
|
Al
|
f41158b8b3
|
[osm] Avoid using the alternate name (e.g. Brooklyn instead of Kings County) when it is the same as city
|
2015-12-05 14:21:07 -05:00 |
|
Al
|
7c26317903
|
[fix] osm components
|
2015-12-03 19:30:15 -05:00 |
|
Al
|
42a8890652
|
[osm] Only removing local language city if there are prior components from OSM
|
2015-12-03 19:11:03 -05:00 |
|
Al
|
ab0a4e622d
|
[formatting] Switching back over to OpenCageData
|
2015-12-03 18:03:21 -05:00 |
|
Al
|
5af95ee613
|
[osm] Adding GeoNames abbreviated city names in a small percentage of cases to get variations like NYC, BK, SF, etc. in the training data
|
2015-12-03 18:00:05 -05:00 |
|
Al
|
25e89bcc41
|
[fix] tokenized trie search edge case where tail is stored on the space node
|
2015-12-03 12:25:21 -05:00 |
|
Al
|
218361f43f
|
[osm] Removing multilinestring boundaries from OSM polygon index (often partial boundaries e.g. France-Germany)
|
2015-12-03 00:51:09 -05:00 |
|
Al
|
43287db90a
|
[normalization/phrases] Fixing a bug which occurs with an already-separated elision
|
2015-12-02 16:04:39 -05:00 |
|
Al
|
87c04b4d37
|
[fix] path in setup.py
|
2015-12-02 14:22:11 -05:00 |
|
Al
|
09a3e2ab64
|
[fix] pip install command
|
2015-12-02 13:43:57 -05:00 |
|
Al
|
746b5d0f34
|
[fix] transliterate using string_equals
|
2015-12-02 13:09:43 -05:00 |
|
Al
|
d0aaff1482
|
[utils] string_equals with NULL check
|
2015-12-01 13:12:08 -05:00 |
|
Al
|
f322ae0a1c
|
[build] adding shuffle.c to Makefile rule
|
2015-12-01 11:28:33 -05:00 |
|
Al
|
b94264b745
|
[parser] Forgot to add shuffle.h/.c
|
2015-12-01 11:25:28 -05:00 |
|
Al
|
116fe857db
|
[parser] gshuf (Mac equivalent of shuf) is quite a bit slower than shuf, so removing it. Need to train on Linux unless a better alternative is found for shuffling large files on Mac
|
2015-12-01 11:24:44 -05:00 |
|
Al
|
8484d4fffd
|
[fix] venue names should be removed probabilistically in the training data, giving neighborhoods a slightly better chance of being included
|
2015-11-30 23:28:12 -05:00 |
|
Al
|
6ef40c1769
|
[fix] dupe checking
|
2015-11-30 18:43:11 -05:00 |
|
Al
|
af170de019
|
[fix] Smaller probabilities on adding neighborhoods and admin polygons, eliminating duplicates on the row level
|
2015-11-30 18:35:31 -05:00 |
|
Al
|
621fd79002
|
[fix] var
|
2015-11-30 18:20:26 -05:00 |
|
Al
|
b430fb7657
|
[osm/formatting] Adding pick random name logic to neighborhoods as well, getting rid of drop probabilities as they're covered elsewhere, adding several forms of venue names to the training data
|
2015-11-30 18:10:18 -05:00 |
|
Al
|
d4b6450f19
|
[formatting] Not applying template replacements from address formatting by default
|
2015-11-30 16:11:13 -05:00 |
|
Al
|
839a12b212
|
[osm/formatting] Changing drop probabilities and doing it in random order
|
2015-11-30 15:27:35 -05:00 |
|
Al
|
5f13041140
|
[parsing/build] Makefile changes for address parser
|
2015-11-30 14:51:43 -05:00 |
|
Al
|
4ca911baf8
|
[parsing] Adding a command-line client (with history) to test address parsing
|
2015-11-30 14:51:01 -05:00 |
|
Al
|
89677d94a3
|
[parsing] Initial commit of the address parser, training/testing, feature function, I/O
|
2015-11-30 14:48:13 -05:00 |
|
Al
|
e62eb1e697
|
[math] Matrix file I/O
|
2015-11-30 12:53:18 -05:00 |
|
Al
|
5682c347ac
|
[fix] close file handle
|
2015-11-30 12:51:13 -05:00 |
|