Iestyn Pryce | ecd07b18c1 | Fix log_* formats which expect size_t but receive uint32_t. | 2017-05-19 22:31:56 +01:00
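The fix above is the classic printf format/argument mismatch: a "%zu" conversion expects a size_t, so passing a uint32_t is undefined behaviour on platforms where the two types differ. A minimal illustration of the kind of change involved (the log_* functions are the project's; log_counts and its arguments here are hypothetical):

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative only: match each conversion specifier to its argument type. */
    void log_counts(uint32_t n_features, size_t n_rows)
    {
        /* Wrong: "%zu" paired with a uint32_t argument.               */
        /* printf("features: %zu, rows: %zu\n", n_features, n_rows);   */

        /* Fixed: use the matching specifier, or cast explicitly.      */
        printf("features: %" PRIu32 ", rows: %zu\n", n_features, n_rows);
        printf("features: %zu, rows: %zu\n", (size_t)n_features, n_rows);
    }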
Al | caebf4e2c9 | [classification] correcting cost functions in SGD and FTRL for use in parameter sweeps | 2017-04-05 14:18:13 -04:00
Al | a2563a4dcd | [optimization] new sgd_trainer struct to manage weights in stochastic gradient descent; allows L1 or L2 regularization, with cumulative penalties instead of exponential decay. SGD using L1 regularization encourages sparsity and can produce a sparse matrix after training rather than a dense one. | 2017-04-02 13:44:59 -04:00
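For the cumulative-penalty idea mentioned in this commit, here is a minimal sketch, assuming a design along the lines of the cumulative L1 penalty usually attributed to Tsuruoka et al.; the struct layout and names (sgd_trainer fields, apply_l1_penalty) are guesses for illustration, not the repository's actual code. The point of the clipping step is that weights driven to zero stay exactly zero, which is what allows the trained model to be stored as a sparse matrix.

    #include <stddef.h>

    typedef enum { REG_NONE, REG_L1, REG_L2 } reg_type;

    typedef struct {
        double  *w;        /* dense weight vector while training              */
        double  *q;        /* per-weight L1 penalty actually applied so far   */
        double   u;        /* running sum of eta_t * lambda over all steps    */
        size_t   n;        /* number of weights                               */
        double   lambda;   /* regularization strength                         */
        double   eta0;     /* base learning rate                              */
        reg_type reg;      /* which penalty to apply                          */
        unsigned long t;   /* number of updates performed                     */
    } sgd_trainer;

    /* Cumulative L1 penalty (clipping) for one weight, applied when the
     * corresponding feature is seen.  Weights clipped to zero stay exactly
     * zero, so the final model can be exported sparsely. */
    static void apply_l1_penalty(sgd_trainer *tr, size_t i)
    {
        double z = tr->w[i];
        if (tr->w[i] > 0.0) {
            tr->w[i] -= tr->u + tr->q[i];
            if (tr->w[i] < 0.0) tr->w[i] = 0.0;
        } else if (tr->w[i] < 0.0) {
            tr->w[i] += tr->u - tr->q[i];
            if (tr->w[i] > 0.0) tr->w[i] = 0.0;
        }
        tr->q[i] += tr->w[i] - z;   /* record the penalty actually applied */
    }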
Al | 46cd725c13 | [math] Generic dense matrix implementation using BLAS calls for matrix-matrix multiplication if available | 2016-08-06 00:40:01 -04:00
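A minimal sketch of the pattern this commit describes: a row-major dense matrix whose multiply dispatches to cblas_dgemm when a BLAS is available and falls back to a plain triple loop otherwise. The type and names here (dense_matrix, dense_matrix_mul, the HAVE_CBLAS guard) are illustrative assumptions, not the repository's API.

    #include <stddef.h>
    #ifdef HAVE_CBLAS
    #include <cblas.h>
    #endif

    typedef struct {
        size_t rows, cols;
        double *data;              /* row-major, rows * cols entries */
    } dense_matrix;

    /* C = A * B; assumes C is already allocated with a->rows x b->cols entries
     * and that a->cols == b->rows. */
    void dense_matrix_mul(const dense_matrix *a, const dense_matrix *b, dense_matrix *c)
    {
    #ifdef HAVE_CBLAS
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    (int)a->rows, (int)b->cols, (int)a->cols,
                    1.0, a->data, (int)a->cols,
                    b->data, (int)b->cols,
                    0.0, c->data, (int)b->cols);
    #else
        for (size_t i = 0; i < a->rows; i++)
            for (size_t j = 0; j < b->cols; j++) {
                double sum = 0.0;
                for (size_t k = 0; k < a->cols; k++)
                    sum += a->data[i * a->cols + k] * b->data[k * b->cols + j];
                c->data[i * b->cols + j] = sum;
            }
    #endif
    }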
Al | 7d727fc8f0 | [optimization] Using adapted learning rate in stochastic gradient descent (if lambda > 0) | 2016-01-17 20:59:47 -05:00
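The adapted rate here, together with the "gain schedule a la Leon Bottou" commit further down, refers to the decaying gain Bottou recommends for L2-regularized SGD, gamma_t = eta0 / (1 + lambda * eta0 * t); with lambda == 0 the rate stays constant, which is why the adaptation only applies when lambda > 0. A one-function sketch (the name sgd_gain is hypothetical):

    /* Decaying gain for L2-regularized SGD: gamma_t = eta0 / (1 + lambda * eta0 * t). */
    static double sgd_gain(double eta0, double lambda, unsigned long t)
    {
        if (lambda > 0.0)
            return eta0 / (1.0 + lambda * eta0 * (double)t);
        return eta0;   /* no regularization: keep the base rate */
    }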
Al | 622dc354e7 | [optimization] Adding learning rate to lazy sparse update in stochastic gradient descent | 2016-01-12 11:04:16 -05:00
Al | 7cc201dec3 | [optimization] Moving gamma_t calculation to the header in SGD | 2016-01-11 16:40:50 -05:00
Al | b85e454a58 | [fix] var | 2016-01-09 03:43:53 -05:00
Al | 62017fd33d | [optimization] Using sparse updates in stochastic gradient descent: the update is decomposed into the gradient of the loss function (zero for features not observed in the current batch) and the gradient of the regularization term. The derivative of the regularization term in L2-regularized models is equivalent to an exponential decay function, so before computing the gradient for the current batch we bring the weights up to date only for the features observed in that batch, and update only those values. | 2016-01-09 03:37:31 -05:00
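A minimal sketch of the lazy/sparse step this commit (and the "adding learning rate to lazy sparse update" commit above) describes, with hypothetical names and a toy sparse feature type assumed for illustration; it treats the gain as roughly constant over the skipped steps, which is a simplification.

    #include <math.h>
    #include <stddef.h>

    typedef struct { size_t index; double value; } feature;

    /* One lazy SGD step for a linear model with L2 regularization.
     * The regularization part of the update is w_i <- (1 - eta * lambda) * w_i,
     * i.e. an exponential decay, so features absent from the current batch are
     * left untouched and merely catch up on the missed decay the next time
     * they are seen. */
    void sgd_sparse_step(double *w, unsigned long *last_step,
                         const feature *x, size_t nnz,
                         double grad_loss,   /* dLoss/dscore for this example */
                         double eta, double lambda, unsigned long t)
    {
        for (size_t k = 0; k < nnz; k++) {
            size_t i = x[k].index;

            /* Catch up the exponential decay for the steps this feature missed. */
            w[i] *= pow(1.0 - eta * lambda, (double)(t - last_step[i]));
            last_step[i] = t;

            /* Gradient of the loss is zero for unobserved features, so only
             * the observed ones receive this term. */
            w[i] -= eta * grad_loss * x[k].value;
        }
    }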
Al | 8b70529711 | [optimization] Stochastic gradient descent with gain schedule a la Leon Bottou | 2016-01-08 00:54:17 -05:00