Commit Graph

7 Commits

Author SHA1 Message Date
Dino Kovač
6064bc6c06 Use NEON on ARM hardware via sse2neon.h
The autoconf changes were adapted from:
https://github.com/glennrp/libpng/blob/libpng16/configure.ac
2022-06-16 15:49:01 +02:00
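
(For context, the usual sse2neon approach is to keep writing against the
SSE2 intrinsics and swap in sse2neon.h on ARM, typically behind a guard
defined when NEON is detected. A rough sketch of that common pattern,
not necessarily the exact wiring in this commit:)

    /* Illustrative only; the real guard and configure.ac logic in this
     * commit may differ. */
    #if defined(__ARM_NEON) || defined(__ARM_NEON__)
    #include "sse2neon.h"   /* maps _mm_* SSE2 intrinsics onto NEON */
    #else
    #include <emmintrin.h>  /* native SSE2 on x86/x86_64 */
    #endif
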
Kyrill Alyoshin
9fcf066e38 #511. Fixed C compilation errors for the latest versions of Mac OS X 2020-08-26 15:28:54 -04:00
Al
7218ca1316 [openaddresses] adding Chesterfield, SC 2017-03-19 16:10:29 -04:00
Al
f4a9e9d673 [fix] don't compare a double to 0 2017-03-15 14:59:33 -04:00
Al
6cf113b1df [fix] handle case of T = 0 in Viterbi decoding 2017-03-12 22:55:52 -04:00
Al
a6eaf5ebc5 [fix] restore an optimization that was taken out while debugging. There's no need to repeatedly update the backpointer array in Viterbi to store an argmax when a stack variable will work. Because that's in the quadratic section of the algorithm (quadratic only in L, the number of labels, which is small), even this small change can make a pretty sizeable difference. CRF training speed is now roughly on par with the greedy model 2017-03-11 02:31:52 -05:00
Al
f9a9dc2224 [parser/crf] adding the beginnings of a linear-chain Conditional Random Field
implementation for the address parser.

One of the main issues with the greedy averaged perceptron tagger
currently used in libpostal is that it predicts left-to-right and
commits to its answers, i.e. it doesn't revise its previous predictions.
The model can use its own previous predictions to classify the current
word, but effectively it makes the best local decision it can and never
looks back (the YOLO approach to parsing).

This can be problematic in a multilingual setting like libpostal,
since the order of address components is language/country dependent.
It would be preferable to have a model that scores whole
_sequences_ instead of individual tagging decisions.

That's exactly what a Conditional Random Field (CRF) does. Instead of
modeling P(y_i | x_i, y_{i-1}), we're modeling P(y | x), where y is the
whole sequence of labels and x is the whole sequence of features. CRFs
achieve state-of-the-art results on many tasks (or are a component of
the state-of-the-art model; LSTM-CRFs have been an interesting direction
along these lines).
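
To make that concrete, the linear-chain factorization being described
looks roughly like this (my notation, and a sketch of the standard form
rather than the exact objective implemented in this commit; y_0 is a
conventional start label):

    \mathrm{score}(x, y) = \sum_{t=1}^{T} \left[ \psi_{\mathrm{state}}(x, y_t, t) + \psi_{\mathrm{trans}}(y_{t-1}, y_t) \right]

    P(y \mid x) = \frac{\exp(\mathrm{score}(x, y))}{\sum_{y'} \exp(\mathrm{score}(x, y'))}

The normalizer sums over all L^T candidate label sequences, whereas the
greedy tagger takes the argmax of P(y_t | x, y_{t-1}) at each position
and never revisits it.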

The crf_context module borrows heavily from the version in CRFSuite
(https://github.com/chokkan/crfsuite), though it uses libpostal's data
structures and allows for "state-transition features." CRFSuite has
state features like "word=the" and transition features like
"prev tag=house", but no notion of a feature that incorporates both
local and transition information, e.g. "word=the and prev tag=house".
These types of features are useful in our setting, where there are many
languages and it might not make as much sense to simply have a weight
for "house_number => road", because that depends heavily on the country.
This implementation introduces a T x L^2 matrix for those
state-transition scores.
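
As a concrete sketch of what those three kinds of scores look like (the
flat-array layout and names here are mine, for illustration; the real
crf_context layout may differ): a state score per (position, label), a
transition score per (previous label, label), and a state-transition
score per (position, previous label, label), i.e. the T x L^2 matrix
just mentioned. The unnormalized score of a label sequence is then just
a sum over the relevant entries:

    #include <stddef.h>

    /* Unnormalized score of a label sequence y[0..T-1], given three score
     * tables. The layout is illustrative, not libpostal's actual layout:
     *   state[t * L + y]                  state features, e.g. "word=the"
     *   trans[y_prev * L + y]             label bigram (y_prev, y)
     *   state_trans[(t * L + y_prev) * L + y]
     *                                     e.g. "word=the and prev tag=house"
     */
    static double sequence_score(const double *state, const double *trans,
                                 const double *state_trans,
                                 const size_t *y, size_t T, size_t L) {
        double score = 0.0;
        for (size_t t = 0; t < T; t++) {
            score += state[t * L + y[t]];
            if (t > 0) {
                score += trans[y[t - 1] * L + y[t]];
                score += state_trans[(t * L + y[t - 1]) * L + y[t]];
            }
        }
        return score;
    }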

For linear-chain CRFs, the Viterbi algorithm is used to compute the
most probable sequence. There are versions of Viterbi for computing the
N most probable sequences as well, which may come in handy later. The
module can also compute marginal probabilities of a sequence (though
that would need to wait until a gradient-based learning method that
produces well-calibrated probabilities is implemented).
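
For completeness, here is roughly what Viterbi decoding looks like over
those same score tables (a sketch under the illustrative layout above,
not the actual code in crf_context):

    #include <float.h>
    #include <stddef.h>

    /* Viterbi decoding: fills best_path[0..T-1] with the highest-scoring
     * label sequence. delta and backptr are caller-provided scratch
     * buffers of size T * L. Same illustrative score layout as above. */
    static void viterbi_decode(const double *state, const double *trans,
                               const double *state_trans,
                               size_t T, size_t L,
                               double *delta, size_t *backptr,
                               size_t *best_path) {
        for (size_t j = 0; j < L; j++)
            delta[j] = state[j];            /* position 0: state scores only */

        for (size_t t = 1; t < T; t++) {
            for (size_t j = 0; j < L; j++) {
                double best = -DBL_MAX;
                size_t best_i = 0;
                /* quadratic in L: best previous label for label j at t */
                for (size_t i = 0; i < L; i++) {
                    double s = delta[(t - 1) * L + i]
                             + trans[i * L + j]
                             + state_trans[(t * L + i) * L + j];
                    if (s > best) { best = s; best_i = i; }
                }
                delta[t * L + j] = best + state[t * L + j];
                backptr[t * L + j] = best_i;    /* written once per (t, j) */
            }
        }

        /* best label at the last position, then follow the backpointers */
        size_t last = 0;
        for (size_t j = 1; j < L; j++)
            if (delta[(T - 1) * L + j] > delta[(T - 1) * L + last]) last = j;
        best_path[T - 1] = last;
        for (size_t t = T - 1; t > 0; t--)
            best_path[t - 1] = backptr[t * L + best_path[t]];
    }

The whole thing is O(T * L^2); the inner loop over previous labels is
the quadratic-in-L part.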

The cool thing architecturally about keeping crf_context as a separate
module is that the weights can be learned through any method we want.
As long as the state scores, state-transition scores, and transition
scores are populated on the context struct, we have everything we need
to run Viterbi inference and the like, without really caring about which
training algorithm was used to optimize the weights, what the features
are, or how they're stored.
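
To illustrate that decoupling, the contract between training and
inference is essentially just "populate the score tables"; something
like the following (hypothetical names, not the actual crf_context API)
is all a decoder ever needs to see:

    #include <stddef.h>

    /* Hypothetical interface sketch; not the actual crf_context API. */
    typedef struct {
        size_t T;             /* sequence length */
        size_t L;             /* number of labels */
        double *state;        /* T x L state scores */
        double *trans;        /* L x L transition scores */
        double *state_trans;  /* T x L x L state-transition scores */
    } crf_scores_t;

    /* Any trainer (averaged perceptron updates, SGD on the CRF
     * log-likelihood, ...) only has to fill these in; Viterbi, N-best,
     * and marginal computations just read from them. */
    void crf_decode(const crf_scores_t *scores, size_t *best_path);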

So far the results have been very encouraging. While a linear-chain CRF
is slower to train, and it will likely add several days to the training
process, it's still reasonably fast at runtime and not all that slow at
training time. In unscientific tests on a busy MacBook Pro, training has
so far been chunking through ~3k addresses/sec, which is only about half
the speed of the greedy tagger (I haven't benchmarked the runtime
difference, but anecdotally it's hardly noticeable). Libpostal training
runs considerably faster on Linux with gcc, so 3k might be a little low.
I'd also guess that re-computing features every iteration puts a ceiling
on the performance of the greedy tagger, so the differences might be
more pronounced if features were pre-computed (a possible optimization).
2017-03-10 01:10:22 -05:00