tommy/libpostal
5,190 Commits, 2 Branches, 0 Tags
999de2bf6a634fd2cb71437c4c18e0fdb65bfa40
Commit Graph
3 Commits
Author
SHA1
Message
Date
Al
434bbd4dc2
[fix] removing unused vars
2017-12-30 02:31:43 -05:00
Al
f1e6886536
[similarity/dedupe] adding options for acronym alignments and address phrase matches in Soft-TFIDF. Acronym alignments give higher similarity to "NYU" vs. "New York University", whereas phrase matches match known phrases that share the same canonical form, such as "Cty Rd" vs. "C.R." vs. "County Road", within the Soft-TFIDF similarity calculation.
2017-12-29 02:39:49 -05:00
Al
b90c3dab4b
[similarity/dedupe] adding Soft-TFIDF implementation with several different fallback qualifiers for the max-sim function (Damerau-Levenshtein and libpostal's new bucketed affine gap method for detecting abbreviations), but keeping Jaro-Winkler as the secondary similarity function in the final distance metric. Overall this should result in higher similarity values when one of the tokens does not quite meet the secondary threshold in terms of Jaro-Winkler but matches on one of the other criteria.
2017-12-28 04:34:46 -05:00
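
The commits above describe Soft-TFIDF with Jaro-Winkler as the secondary similarity. As a rough illustration of that idea only, here is a minimal Python sketch of textbook Soft-TFIDF. It is not libpostal's C implementation: the function names (`jaro`, `jaro_winkler`, `tfidf_vector`, `soft_tfidf`), the `theta` threshold of 0.9, the `log(tf + 1) * idf` weighting, and the default IDF of 1.0 are all assumptions made for this example. The Damerau-Levenshtein, affine gap, acronym alignment, and phrase match qualifiers mentioned in the commit messages are deliberately left out.

```python
import math
from collections import Counter


def jaro(s, t):
    """Plain Jaro similarity in [0, 1]."""
    if s == t:
        return 1.0
    ls, lt = len(s), len(t)
    if ls == 0 or lt == 0:
        return 0.0
    window = max(max(ls, lt) // 2 - 1, 0)
    s_matched = [False] * ls
    t_matched = [False] * lt
    matches = 0
    for i, ch in enumerate(s):
        for j in range(max(0, i - window), min(lt, i + window + 1)):
            if not t_matched[j] and t[j] == ch:
                s_matched[i] = t_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(ls):
        if s_matched[i]:
            while not t_matched[k]:
                k += 1
            if s[i] != t[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / ls + matches / lt
            + (matches - transpositions) / matches) / 3.0


def jaro_winkler(s, t, p=0.1):
    """Jaro similarity boosted by a shared prefix of up to 4 characters."""
    j = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:4], t[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1.0 - j)


def tfidf_vector(tokens, idf):
    """L2-normalized log(tf + 1) * idf weights; unknown tokens get idf 1.0."""
    counts = Counter(tokens)
    weights = {w: math.log(c + 1.0) * idf.get(w, 1.0) for w, c in counts.items()}
    norm = math.sqrt(sum(x * x for x in weights.values()))
    return {w: x / norm for w, x in weights.items()} if norm else {}


def soft_tfidf(s_tokens, t_tokens, idf, theta=0.9):
    """Sum weight products over tokens whose best match clears theta."""
    vs = tfidf_vector(s_tokens, idf)
    vt = tfidf_vector(t_tokens, idf)
    total = 0.0
    for w, w_weight in vs.items():
        best, best_sim = None, 0.0
        for v in vt:
            sim = jaro_winkler(w, v)
            if sim > best_sim:
                best, best_sim = v, sim
        if best is not None and best_sim >= theta:
            total += w_weight * vt[best] * best_sim
    return total
```

With an empty IDF table, `soft_tfidf(["main", "street"], ["main", "stret"], {})` scores well above the 0.5 that exact-match cosine similarity would give, because "street" and "stret" clear the Jaro-Winkler threshold. The fallback qualifiers in the commits would extend the `best_sim >= theta` test with additional ways for a token pair to qualify.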