[optimization] Using sparse updates in stochastic gradient descent. Decomposing each update into the gradient of the loss function (which is zero for features not observed in the current batch) and the gradient of the regularization term. For L2-regularized models, applying the regularization gradient on its own is equivalent to exponentially decaying the weights. Before computing the gradient for the current batch, we bring the weights up to date only for the features observed in that batch, and update only those values.
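
A minimal sketch of the lazy-decay idea, written in C for consistency with the diff below. The names here (lazy_regularize, weights, last_updated, batch_cols) are illustrative assumptions rather than this repository's actual API; only the technique follows the commit message.

#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Full SGD update for weight j at step t:
 *     w_j <- w_j - eta * (dL/dw_j + lambda * w_j)
 * For a feature absent from the batch, dL/dw_j = 0, so each missed step
 * just multiplies w_j by (1 - eta * lambda): exponential decay. We can
 * therefore skip those weights entirely and apply the accumulated decay
 * factor (1 - eta * lambda)^(steps missed) the next time the feature
 * appears in a batch. */
static void lazy_regularize(double *weights, uint64_t *last_updated,
                            const uint32_t *batch_cols, size_t n_batch_cols,
                            uint64_t step, double eta, double lambda) {
    double decay = 1.0 - eta * lambda;
    for (size_t i = 0; i < n_batch_cols; i++) {
        uint32_t col = batch_cols[i];
        uint64_t missed = step - last_updated[col];
        if (missed > 0) {
            /* Catch up on all the decay this weight missed while its
             * feature was not observed in any batch. */
            weights[col] *= pow(decay, (double)missed);
            last_updated[col] = step;
        }
    }
    /* The loss gradient for the current batch is then applied to these
     * same columns only, keeping the whole update sparse. */
}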

Al
2016-01-09 03:12:54 -05:00
parent aa22db11b2
commit 62017fd33d
3 changed files with 120 additions and 8 deletions


@@ -138,7 +138,7 @@ static bool logistic_regression_gradient_params(matrix_t *theta, matrix_t *gradi
}
// If the vector last_updated was provided, update only the relevant columns in x
// Update only the relevant columns in x
if (regularize && x_cols != NULL) {
size_t batch_rows = x_cols->n;
uint32_t *cols = x_cols->a;