libpostal

Author	SHA1	Message	Date
Al	95015990ab	[parser] learning a sparser averaged perceptron model for the parser using the following method: - store a vector of update counts for each feature in the model - when the model updates after making a mistake, increment the update counters for the observed features in that example - after the model is finished training, keep only the features that participated in a minimum number of updates. This method is described in greater detail in this paper from Yoav Goldberg: https://www.cs.bgu.ac.il/~yoavg/publications/acl2011sparse.pdf The authors there report a 4x size reduction at only a trivial cost in terms of accuracy. So far the trials on libpostal indicate roughly the same, though at lower training set sizes the accuracy cost is greater. This method is more effective than simple feature pruning, as feature pruning methods are usually based on the frequency of the feature in the training set, and infrequent features can still be important. However, the perceptron's early iterations make many updates on irrelevant features simply because the weights for the more relevant features aren't tuned yet. The number of updates a feature participates in can be seen as a measure of its relevance to classifying examples. This commit introduces --min-features option to address_parser_train (default=5), so it can effectively be turned off by using "--min-features 0" or "--min-features 1".	2017-03-06 22:28:33 -05:00
Al	5c1c1ae0f2	[parser] moving tagger function pointer definition to a separate header so it can be used for other models	2017-03-06 21:42:06 -05:00
Al	8ea5405c20	[parser] using separate arrays for features requiring tag history and making the tagger responsible for those features so the feature function does not require passing in prev and prev2 explicitly (i.e. don't need to run the feature function multiple times if using global best-sequence prediction)	2017-02-19 14:21:58 -08:00
Al	22668945cb	[mv] Moving trie_new_from_hash to a module	2016-01-05 16:43:17 -05:00
Al	d3040036ec	[fix] moving separator definitions	2015-11-28 13:53:13 -05:00
Al	8ca22247f9	[fix] labels in averaged perceptron trainer	2015-09-29 13:07:07 -04:00
Al	8a86f7ec64	[parser] Adding context struct to feature function	2015-09-17 05:48:00 -04:00
Al	9de3029dd3	[parser] Averaged perceptron training does full examples (greedily). During training, features are a hashtable, sorted and converted to a trie during finalize	2015-09-14 17:38:45 -04:00
Al	6a5b01b51b	[parser] Averaged perceptron training	2015-09-10 10:26:24 -07:00

9 Commits
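
The update-count pruning described in commit 95015990ab can be illustrated with a small Python sketch. This is not libpostal's C implementation: the function name train_sparse_perceptron, the min_updates parameter (standing in for the --min-features threshold), the (features, label) input format, and the naive per-step averaging are all assumptions made for illustration.

```python
from collections import defaultdict


def train_sparse_perceptron(examples, classes, epochs=5, min_updates=5):
    """Train a multiclass averaged perceptron, then drop rarely-updated features.

    examples: list of (features, label) pairs; features is a list of strings.
    Returns {feature: {class: averaged_weight}} for the surviving features.
    """
    weights = defaultdict(lambda: defaultdict(float))  # feature -> class -> weight
    totals = defaultdict(lambda: defaultdict(float))   # running sums for averaging
    update_counts = defaultdict(int)                   # feature -> updates seen
    steps = 0

    def predict(features):
        scores = {c: 0.0 for c in classes}
        for f in features:
            for c, w in weights.get(f, {}).items():
                scores[c] += w
        # Ties go to the class listed first in `classes`.
        return max(classes, key=lambda c: scores[c])

    for _ in range(epochs):
        for features, label in examples:
            guess = predict(features)
            if guess != label:
                # Mistake-driven update: every observed feature takes part,
                # so its update counter is incremented.
                for f in features:
                    weights[f][label] += 1.0
                    weights[f][guess] -= 1.0
                    update_counts[f] += 1
            steps += 1
            # Naive averaging: accumulate the current weights after each step.
            # (Hypothetical simplification; real trainers usually do this lazily.)
            for f, per_class in weights.items():
                for c, w in per_class.items():
                    totals[f][c] += w

    # Keep only the features that participated in at least min_updates updates.
    return {
        f: {c: total / steps for c, total in per_class.items()}
        for f, per_class in totals.items()
        if update_counts[f] >= min_updates
    }
```

Setting min_updates to 0 or 1 keeps every feature that was ever updated, which mirrors how the commit message describes turning the pruning off.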