tommy/libpostal
5,343 Commits, 2 Branches, 0 Tags
a11f33fb3d84b0a41334e10ce8a5583f514177fd
Commit Graph
2 Commits
Author
SHA1
Message
Date
Al
26124ee72f
[near_dupes] exposing name_word_hashes directly in the API
2022-03-25 14:04:26 -04:00
Al
acfdb50d7c
[dedupe] adding near-dupe hashing function, which can be thought of as the blocking function in record linkage or as a form of locality-sensitive hashing in general document deduping. The goal is that if two addresses/names are the same, they share at least one hash. These hashes can also be used as keys in an inverted index (DB, ES, hashtable, etc.). Uses double metaphone for name words in Latin script; for other scripts, uses each individual token, plus sequences of two tokens for ideographic scripts (e.g. Chinese, Japanese, Korean).
2017-12-24 02:47:45 -05:00
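The blocking idea described in the commit message above can be sketched in a few lines of Python. This is an illustration only, not libpostal's implementation or API: every function name here is hypothetical, `crude_phonetic_key` is a deliberately simplified stand-in for double metaphone (it is not that algorithm), and treating each ideographic character as one token is an assumption made for the example. The real hashing presumably also draws on other address components, which this sketch ignores.

```python
import unicodedata
from collections import defaultdict


def is_ideographic(ch):
    # Rough check by Unicode character name (an assumption for this sketch).
    name = unicodedata.name(ch, "")
    return name.startswith(("CJK UNIFIED IDEOGRAPH", "HIRAGANA", "KATAKANA", "HANGUL"))


def crude_phonetic_key(word):
    # Simplified stand-in for double metaphone, NOT the real algorithm:
    # strip accents, keep the first letter, drop later vowels and H/W/Y,
    # and collapse repeated consonants.
    decomposed = unicodedata.normalize("NFKD", word)
    letters = [c for c in decomposed.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    key = [letters[0]]
    for c in letters[1:]:
        if c in "AEIOUYHW":
            continue
        if c != key[-1]:
            key.append(c)
    return "".join(key)


def name_word_hashes(name):
    # Hypothetical helper: returns the set of blocking keys for one name.
    hashes = set()
    for token in name.split():
        if any(is_ideographic(c) for c in token):
            # Ideographic text: single characters plus adjacent pairs.
            chars = [c for c in token if is_ideographic(c)]
            hashes.update(chars)
            hashes.update(a + b for a, b in zip(chars, chars[1:]))
        else:
            # Latin script gets a phonetic key; other scripts fall back
            # to the lowercased token itself.
            key = crude_phonetic_key(token)
            hashes.add(key if key else token.lower())
    return hashes


def build_index(records):
    # Inverted index: hash -> set of record ids that produced it.
    index = defaultdict(set)
    for rec_id, name in records.items():
        for h in name_word_hashes(name):
            index[h].add(rec_id)
    return index


def candidates(index, name):
    # Records sharing at least one hash with the query name.
    found = set()
    for h in name_word_hashes(name):
        found |= index.get(h, set())
    return found


records = {1: "Smith Café", 2: "Jones Bakery", 3: "東京駅"}
index = build_index(records)
print(candidates(index, "Smyth Cafe"))
print(candidates(index, "東京"))
```

Records that share no hash are never compared, which is what makes this a blocking function: the more expensive pairwise similarity check only needs to run on the candidate set returned by the index lookup.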