Al
0ec2e57afa
[fix] adding yaml to requirements-simple.txt for CI
2017-04-05 08:33:39 -04:00
Al
64fae1e241
[fix] /AC_CONFIG_MACRO_DIRS/AC_CONFIG_MACRO_DIR/
2017-04-05 08:27:44 -04:00
Al
2b3fb196a1
[build] add pkg-config to packages in Travis config, remove libsnappy-dev
2017-04-05 08:24:26 -04:00
Al
8cef3c4eb9
[docs] new parser GIF, featuring addresses relevant to current events
2017-04-05 07:21:48 -04:00
Al
aaae1e055e
[docs] fix spacing
2017-04-05 02:03:39 -04:00
Al
9c7eac61eb
[docs] merge README from master, move bindings below examples
2017-04-05 02:02:59 -04:00
Al
8ec6e546f5
[test] adding more tests from the demo
2017-04-04 20:52:28 -04:00
Al
22443e31cc
[parser] removing special commands other than .exit from address_parser_cli
2017-04-04 20:49:37 -04:00
Al
8742574257
[parser] storing address_parser_context on the parser struct itself so it doesn't have to be allocated every time
2017-04-04 20:40:55 -04:00
Al
67157fbd98
[docs] moving blog post to first paragraph
2017-04-03 21:04:37 -04:00
Al
b8f65d0a06
[docs] aesthetic README changes
2017-04-03 18:18:02 -04:00
Al
f746c6eec6
[openaddresses] Sampson and Yadkin counties, NC, and Union County, SC
2017-04-03 18:08:55 -04:00
Al
bca449e653
[openaddresses] Rowan County, NC
2017-04-03 17:57:03 -04:00
Al
6102fd3459
[openaddresses] Carteret County, NC
2017-04-03 16:55:21 -04:00
Al
342740c3a6
[openaddresses] Bladen County, NC
2017-04-03 16:53:43 -04:00
Al
7c67ca6edb
[openaddresses] Beaufort County, NC
2017-04-03 16:52:15 -04:00
Al
680a2e6357
[openaddresses] city of Ruidoso, NM
2017-04-03 16:50:27 -04:00
Al
921e635b7a
[openaddresses] add Caddo Parish, LA
2017-04-03 16:48:30 -04:00
Al
e0dc0c9b86
[openaddresses] add DeSoto County, FL
2017-04-03 16:45:56 -04:00
Al
20adc591a8
[openaddresses] adding OSM boundaries to Clear Creek County, CO as new data set doesn't list city
2017-04-03 16:38:53 -04:00
Al
4b16b5bccd
[docs] README fixes
2017-04-03 16:35:48 -04:00
Al
97ffdbaee0
[openaddresses] removing Lawrence County, SD. Covered by new statewide and has some weird addresses
2017-04-03 16:16:52 -04:00
Al
e4290a489f
[openaddresses] Fall River County, SD
2017-04-03 16:15:21 -04:00
Al
c3a6445290
[docs] README updates for 1.0 release, adding training data section
2017-04-03 15:59:01 -04:00
Al
65a0d82bda
[openaddresses] moving Buenos Aires, adding Boulder County, CO
2017-04-03 13:08:34 -04:00
Al
eff7a7a27a
[optimization] moving regularization methods to their own module
2017-04-03 00:16:30 -04:00
Al
957aa0c0c9
[utils] cartesian product iterator for grid search during model selection
2017-04-03 00:15:31 -04:00
Al
4a72afc712
[build] Makefile changes for new language_classifier_train
2017-04-02 23:55:31 -04:00
Al
378a11c88f
[fix] expansion array destroy API in libpostal expand program
2017-04-02 23:55:04 -04:00
Al
c5e2f89ee9
[fix] declaring is_common_script function as static
2017-04-02 23:53:21 -04:00
Al
5dfdd4b7eb
[language_classification] Runtime language classifier can now use dense or sparse weights, with a different header signature for the sparse version (using old signature for the dense version, so backward-compatible)
2017-04-02 23:51:54 -04:00
Al
835d851310
[log] log the offending line if token count does not match in language_classifier_io
2017-04-02 23:47:07 -04:00
Al
964ac15e51
[language_classification] adding options to language_classifier_train for using SGD with {L2, L1} regularization or FTRL-Proximal using both.
1. Creates sparse matrix for L1 SGD and FTRL
2. Uses the one standard-error rule during cross-validation.
Parameters within one standard error of the lowest-cost solution
are preferred if they are better regularized.
3. Pulls weights matrix for only the features that occurred
in a given batch. In the case of FTRL, this needs to be computed
on each batch, so the sparsity helps here.
2017-04-02 23:46:14 -04:00
Al
58661c9f27
[languages] adding replace_hyphens and split_alpha_from_numeric in language classifier input normalization
2017-04-02 23:32:24 -04:00
Al
e4ed759f0d
[math] using new matrix methods in softmax
2017-04-02 23:29:52 -04:00
Al
3aab15a0a0
[math] adding mean, variance and standard deviation to generic vector functions
2017-04-02 23:29:15 -04:00
Al
3cb513a8f2
[utils] hash_get is no longer a string-only function, can be used for generic hashtables
2017-04-02 23:28:17 -04:00
Al
95e39ad91c
[utils] removing default chunk size from address_parser_train
2017-04-02 23:26:51 -04:00
Al
a4431dbb27
[classification] removing regularization update from gradient computation in logistic regression, as that's now handled by the optimizer
2017-04-02 14:32:14 -04:00
Al
64c049730a
[classification] flexible logistic regression trainer that can handle either SGD (with either L1 or L2) or FTRL as optimizers
2017-04-02 14:30:14 -04:00
Al
cf88bc7f65
[optimization] implemented Google's FTRL-Proximal, adapted for the multiclass/multinomial case. It is L1 and L2 regularized, and should encourage sparsity with the L1 penalty while being robust to collinearity of features due to the L2 penalty. Ref: https://research.google.com/pubs/archive/41159.pdf
2017-04-02 14:28:25 -04:00
Al
ed05aaabb1
[utils] adding default chunk size to shuffle.h
2017-04-02 13:51:45 -04:00
Al
96e1ca5e89
[utils] sparse_matrix_add_unique_columns_alias, adds the actual column indices to hashtable/array and aliases those in the table from 1 to N (where N is the number of unique columns in this batch). This way it's compatible with smaller matrices of batch weights.
2017-04-02 13:48:46 -04:00
Al
a2563a4dcd
[optimization] new sgd_trainer struct to manage weights in stochastic gradient descent. Allows L1 or L2 regularization and cumulative penalties instead of exponential decay. SGD using L1 regularization encourages sparsity and can produce a sparse matrix after training rather than a dense one
2017-04-02 13:44:59 -04:00
Al
19fe084974
[utils] adding non-branching sign functions
2017-04-02 13:41:57 -04:00
Al
74a281e332
[dictionaries] more abbreviations for MLK
2017-04-01 00:54:14 -04:00
Al
7f30fb8e38
[openaddresses] add OSM boundaries to King, NC
2017-03-31 21:13:32 -04:00
Al
b52f137b5d
[openaddresses] adding units to Chelan County, WA, adding Island County, WA
2017-03-31 18:08:43 -04:00
Al
6ec4c1fdc9
[openaddresses] adding units to city of Columbia, MO
2017-03-31 17:44:04 -04:00
Al
f349607412
[openaddresses] adding units in Boone County, MO
2017-03-31 17:27:35 -04:00