[parser] learning a sparser averaged perceptron model for the parser using the following method:

- store a vector of update counts for each feature in the model
- when the model updates after making a mistake, increment the update counters for the observed features in that example
- after the model is finished training, keep only the features that participated in a minimum number of updates

This method is described in greater detail in this paper from Yoav Goldberg: https://www.cs.bgu.ac.il/~yoavg/publications/acl2011sparse.pdf

The authors there report a 4x size reduction at only a trivial cost in terms of accuracy. So far the trials on libpostal indicate roughly the same, though at lower training set sizes the accuracy cost is greater.

This method is more effective than simple feature pruning, as feature pruning methods are usually based on the frequency of the feature in the training set, and infrequent features can still be important. However, the perceptron's early iterations make many updates on irrelevant features simply because the weights for the more relevant features aren't tuned yet. The number of updates a feature participates in can be seen as a measure of its relevance to classifying examples.

This commit introduces a --min-features option to address_parser_train (default=5). It can effectively be turned off by using "--min-features 0" or "--min-features 1".
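The three steps above can be sketched in C roughly as follows. This is a minimal illustration, not the code in this commit: sparse_trainer_t, record_update, prune_features and NUM_FEATURES are hypothetical names, and dense fixed-size arrays stand in for the khash-based weight maps the real trainer uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model size; the real trainer grows its feature set dynamically. */
#define NUM_FEATURES 4

/* Hypothetical, simplified stand-in for the trainer state. */
typedef struct {
    double weights[NUM_FEATURES];         /* one weight per feature (single class, for brevity) */
    uint64_t update_counts[NUM_FEATURES]; /* number of updates each feature took part in */
    uint64_t min_updates;                 /* pruning threshold */
} sparse_trainer_t;

/* Step 2: on a misprediction, increment the counter of every feature
   observed in the offending example. */
static void record_update(sparse_trainer_t *t, const uint32_t *feature_ids, size_t n) {
    assert(t != NULL && feature_ids != NULL);
    for (size_t i = 0; i < n; i++) {
        assert(feature_ids[i] < NUM_FEATURES);
        t->update_counts[feature_ids[i]]++;
    }
}

/* Step 3: after training, zero out the weights of features that took part in
   fewer than min_updates updates. Returns the number of features kept. */
static size_t prune_features(sparse_trainer_t *t) {
    assert(t != NULL);
    size_t kept = 0;
    for (uint32_t f = 0; f < NUM_FEATURES; f++) {
        if (t->update_counts[f] >= t->min_updates) {
            kept++;
        } else {
            t->weights[f] = 0.0;
        }
    }
    return kept;
}
```

In this sketch a threshold of 0 keeps every feature, which mirrors the "turned off" setting described above. A real implementation would drop the pruned entries from the model rather than just zeroing them, since the size reduction is the point.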
2017-03-06 21:56:10 -05:00
parent 5c1c1ae0f2
commit 95015990ab
3 changed files with 104 additions and 18 deletions
--- a/src/averaged_perceptron_trainer.h
+++ b/src/averaged_perceptron_trainer.h
@@ -60,15 +60,17 @@ typedef struct averaged_perceptron_trainer {
    uint64_t num_updates;
    uint64_t num_errors;
    uint32_t iterations;
+    uint64_t min_updates;
    khash_t(str_uint32) *features;
    khash_t(str_uint32) *classes;
    cstring_array *class_strings;
    // {feature_id => {class_id => class_weight_t}}
    khash_t(feature_class_weights) *weights;
+    uint64_array *update_counts;
    double_array *scores;
 } averaged_perceptron_trainer_t;

-averaged_perceptron_trainer_t *averaged_perceptron_trainer_new(void);
+averaged_perceptron_trainer_t *averaged_perceptron_trainer_new(uint64_t min_updates);

 uint32_t averaged_perceptron_trainer_predict(averaged_perceptron_trainer_t *self, cstring_array *features);