Al | 182d60b623 | [fix] removing include | 2017-02-23 22:45:03 -05:00
Al | 6a079e86b3 | [fix] using size_t instead of int in address_parser/address_parser_train | 2017-02-20 19:22:13 -08:00
Al | 8ea5405c20 | [parser] using separate arrays for features requiring tag history and making the tagger responsible for those features, so the feature function does not require passing in prev and prev2 explicitly (i.e. no need to run the feature function multiple times when using global best-sequence prediction) | 2017-02-19 14:21:58 -08:00
Al | ba0ccc82a3 | [fix] var name in address_parser_train | 2017-02-15 22:22:33 -05:00
Al | ff245d74f8 | [parser] building an index of postal codes and their valid admin contexts (city, state, country, etc.) during training, e.g. "11216" => ["brooklyn", "ny"]. Postal code phrases like CP in Spanish are removed when constructing the index. | 2017-02-10 00:50:48 -05:00
Al | 174529e8d0 | [parser] remove geodb and fix small memory leak in address_parser_train | 2016-12-29 02:12:06 -05:00
Al | 4677874610 | [parser] stripping postal codes of phrases like CP (in Spanish) before adding them to the gazetteers, whether the phrase is concatenated with the code or a separate token. Adding a command-line argument for the number of iterations | 2016-11-30 15:58:03 -08:00
Al | 1b09b7f2e5 | [fix] Adding country_region to address_parser_train | 2016-07-28 16:18:32 -04:00
Al | 44908ff95a | [parser] No digit normalization in parser phrases derived from training data (for postcodes, etc.). Phrases include the new island type, plus house number phrases if any are valid. Adjacent words are now full phrases if they are part of a multiword token like a city name. For hyphenated names like Carmel-by-the-Sea, adding a version to the phrase dictionary where the hyphens are replaced with spaces | 2016-07-21 17:04:57 -04:00
Al | 16501aba17 | [fix] Need to load transliteration module for Latin-ASCII normalization | 2016-07-21 17:04:57 -04:00
Al | 6ef7c90278 | [fix] using string_equals, handles NULLs | 2016-01-05 14:08:10 -05:00
Al | 24208c209f | [parsing] Adding a training-data-derived index of complete phrases from suburb up to country. Only adding bias and word features for non-phrases, using UNKNOWN_WORD and UNKNOWN_NUMERIC for infrequent tokens (those not meeting the minimum vocab count threshold). | 2015-12-05 14:34:19 -05:00
Al | 116fe857db | [parser] gshuf (Mac equivalent of shuf) is quite a bit slower than shuf, so removing it. Need to train on Linux unless a better alternative is found for shuffling large files on Mac | 2015-12-01 11:24:44 -05:00
Al | 89677d94a3 | [parsing] Initial commit of the address parser, training/testing, feature function, I/O | 2015-11-30 14:48:13 -05:00
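
The postal code index described in ff245d74f8 and the phrase stripping from 4677874610 can be pictured with a rough Python sketch. This is not the project's C code: the function names, the regular expression, and the sample records are all hypothetical, and only the "11216" => ["brooklyn", "ny"] mapping and the CP marker come from the commit messages. The sketch also assumes the marker always appears at the start of the postal code field.

```python
import re
from collections import defaultdict

# Hypothetical pattern for a leading postal-code marker. Only "CP" is named
# in the commit messages; accepting "C.P." as well is an assumption.
POSTAL_PHRASE = re.compile(r"^(?:c\.?\s*p\.?)\s*", re.IGNORECASE)


def strip_postal_phrase(postal_code):
    """Drop a leading marker such as "CP", whether it is a separate token
    ("CP 06600") or concatenated with the code ("CP06600")."""
    return POSTAL_PHRASE.sub("", postal_code.strip())


def build_postal_code_index(examples):
    """Map each stripped postal code to the lowercased admin names seen with it."""
    index = defaultdict(set)
    for postal_code, admin_names in examples:
        key = strip_postal_phrase(postal_code)
        if key:  # skip fields that contained nothing but the marker
            index[key].update(name.lower() for name in admin_names)
    return index


# Made-up training records: (postal code field, admin components).
examples = [
    ("11216", ["Brooklyn", "NY"]),
    ("CP 06600", ["Ciudad de México"]),
    ("cp06600", ["CDMX"]),
]
index = build_postal_code_index(examples)
```

Both spellings of the Mexican postal code end up under the same key, which is the point of stripping the marker before indexing. A real implementation would need to avoid stripping codes that legitimately begin with those letters; this sketch does not attempt that.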