Al | 7d727fc8f0 | [optimization] Using adapted learning rate in stochastic gradient descent (if lambda > 0) | 2016-01-17 20:59:47 -05:00
Al | 622dc354e7 | [optimization] Adding learning rate to lazy sparse update in stochastic gradient descent | 2016-01-12 11:04:16 -05:00
Al | 7cc201dec3 | [optimization] Moving gamma_t calculation to the header in SGD | 2016-01-11 16:40:50 -05:00
Al | b85e454a58 | [fix] var | 2016-01-09 03:43:53 -05:00
Al | 62017fd33d | [optimization] Using sparse updates in stochastic gradient descent. Decomposing the updates into the gradient of the loss function (zero for features not observed in the current batch) and the gradient of the regularization term. The derivative of the regularization term in L2-regularized models is equivalent to an exponential decay function. Before computing the gradient for the current batch, we bring the weights up to date only for the features observed in that batch, and update only those values. | 2016-01-09 03:37:31 -05:00
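Commit 62017fd33d describes the lazy sparse update in enough detail to sketch the idea. The C++ below is a minimal illustration, not the repository's code; the squared-error loss, the constant learning rate gamma, and the names SparseSGD, w, last, and step are assumptions not stated in the commit. The point it illustrates: the L2 term only shrinks each weight by the factor (1 - gamma * lambda) per step, so weights of absent features can be caught up with a single power of that factor the next time their feature appears, and each step touches only the observed features.

```cpp
#include <cmath>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical sketch of a lazy sparse SGD update with L2 regularization.
struct SparseSGD {
    std::vector<double> w;     // weight vector
    std::vector<long>   last;  // step at which each weight was last fully updated
    double gamma  = 0.1;       // learning rate (assumed constant here)
    double lambda = 1e-4;      // L2 regularization strength
    long   t      = 0;         // step counter

    explicit SparseSGD(std::size_t dim) : w(dim, 0.0), last(dim, 0) {}

    // One SGD step on a sparse example (feature index -> value) with a
    // squared-error loss; only the observed features are touched.
    void step(const std::unordered_map<std::size_t, double>& x, double y) {
        ++t;  // index of the current step
        double pred = 0.0;
        for (const auto& [i, v] : x) {
            // Lazy L2 decay: apply the shrinkage w[i] missed on the steps
            // where its feature was absent (steps last[i]+1 .. t-1).
            w[i] *= std::pow(1.0 - gamma * lambda, double(t - 1 - last[i]));
            pred += w[i] * v;
        }
        const double err = pred - y;  // gradient of 0.5*(pred - y)^2 w.r.t. pred
        for (const auto& [i, v] : x) {
            // Full update for the current step: one L2 decay + loss gradient.
            w[i] = (1.0 - gamma * lambda) * w[i] - gamma * err * v;
            last[i] = t;
        }
    }
};
```

The later commit 622dc354e7 appears to fold the per-step learning rate into this lazy update; the sketch keeps gamma constant for brevity.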
Al | 8b70529711 | [optimization] Stochastic gradient descent with gain schedule a la Leon Bottou | 2016-01-08 00:54:17 -05:00
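Commit 8b70529711 names the gain schedule "a la Leon Bottou" without giving a formula, and commit 7cc201dec3 refers to a gamma_t calculation. One common form Bottou recommends for L2-regularized SGD is gamma_t = gamma_0 / (1 + gamma_0 * lambda * t); the sketch below assumes that form and the hyperparameter names gamma0 and lambda, which may differ from the repository's code.

```cpp
#include <cstddef>

// Hypothetical gain schedule in the style Bottou recommends for
// L2-regularized SGD; the exact formula used in this repository is not
// shown in the commit messages.
inline double gamma_t(double gamma0, double lambda, std::size_t t) {
    return gamma0 / (1.0 + gamma0 * lambda * static_cast<double>(t));
}
```

With lambda equal to zero this gain never decays, which would be consistent with the "(if lambda > 0)" caveat in commit 7d727fc8f0.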