libpostal

Author	SHA1	Message	Date
Al	1f1dbe25e1	[test] adding a number of user-contributed test cases from Moz in #21. Almost all are working under the CRF parser trained on 10% of the data. There are a few problematic ones in the UK still that have been omitted here. We currently don't correctly format the training data for the locality + postal town pattern, which are both considered "city" by libpostal, and thus one will usually get lumped in with the road or something like that. There may also be some utility in modelling comma usage (training data has commas, but they're ignored by the parser both at train and run time; might be useful to train on them but drop them out randomly so the parser doesn't become too dependent on having them)	2017-03-21 03:08:09 -04:00
Al	b8a12e0517	[test] adding parser test cases in 22 countries. These may change, and I'm generally against putting every obscure test case in the world in here. It's better to measure accuracy in aggregate statistics instead of individual test cases (i.e. if a particular change to the parser improves overall performance but fails one test case, should we accept the improvement?) The thought here is: these represent parses that are used in documentation/examples, as well as most of those that have been brought up in GitHub issues from the initial release, and we want these specific tests to work from build to build. If a model fails one of these test cases, it shouldn't get pushed to our users.	2017-03-20 00:58:52 -04:00
Al	37cfe8ab3b	[test] Adding automated parser tests to the C library	2016-02-17 17:19:10 -05:00

3 Commits
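
The comma idea floated in 1f1dbe25e1 (keep commas in the training data but drop them out at random) could look roughly like the sketch below. This is not libpostal code: the function name `drop_commas`, the `drop_prob` parameter, and the token-list representation are all assumptions made for illustration.

```python
import random


def drop_commas(tokens, drop_prob=0.5, rng=None):
    """Return a copy of `tokens` with each "," removed independently.

    Hypothetical helper, not part of libpostal. Each comma token is dropped
    with probability `drop_prob`; all other tokens are always kept.
    """
    rng = rng or random.Random()
    # rng.random() is in [0.0, 1.0), so drop_prob=0.0 keeps every comma
    # and drop_prob=1.0 removes every comma.
    return [t for t in tokens if t != "," or rng.random() >= drop_prob]


if __name__ == "__main__":
    example = ["781", "Franklin", "Ave", ",", "Brooklyn", ",", "NY"]
    # Seeded only so repeated runs of this demo are reproducible.
    print(drop_commas(example, drop_prob=0.5, rng=random.Random(0)))
```

Applying something like this per training example would let a model see the same address both with and without commas, which is the dependence the commit message is worried about.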