Al | 2b4e7073c2 | [similarity] adding a multi-word alignment algorithm for streets and names like "de la cruz" vs. "dela cruz" or "Oceanwalk Ter" vs. "Ocean Walk Ter" | 2018-02-23 01:22:12 -05:00 |
Al | fbf88aee88 | [similarity] adding possible abbreviation functions to header, making everything const char * | 2017-11-12 04:48:26 -05:00 |
Al | 751873e56b | [similarity] a *NEW* sequence alignment algorithm which builds on Smith-Waterman-Gotoh with affine gap penalties. Like Smith-Waterman, it performs a local alignment, and like the cost-only version of Gotoh's improvement, it needs O(mn) time and O(m) space (where m is the length of the longer string). However, this version of the algorithm stores and returns a breakdown of the number and specific types of edits it makes (matches, mismatches, gap opens, gap extensions, and transpositions) rather than rolling them up into a single cost, and without needing to return/compute the full alignment as in Needleman-Wunsch or Hirschberg's variant | 2017-11-11 03:07:39 -05:00 |
Al | bc9f11d6e3 | [similarity] exposing unicode versions of Damerau-Levenshtein and Jaro-Winkler distances | 2017-10-28 02:45:48 -04:00 |
Al | 4ccc2a9e9f | [fix] making string args const in string_similarity module | 2017-10-21 02:45:22 -04:00 |
Al | bd477976d1 | [similarity] string similarity measures for Damerau-Levenshtein and Jaro-Winkler distances. Both operate on unicode points internally for lengths, etc. instead of byte strings and the Levenshtein distance uses only one array instead of needing to store the full matrix of transitions. | 2017-10-19 04:51:33 -04:00 |
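
The earliest commit above mentions computing the Levenshtein distance with a single array rather than the full matrix. The sketch below illustrates that general idea only and is not the code from these commits. The function name `levenshtein_one_row` is hypothetical, it compares bytes rather than unicode code points, and it computes plain Levenshtein distance without the transpositions that Damerau-Levenshtein adds.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (an assumption, not the module's actual API):
 * plain Levenshtein distance over bytes, keeping one row of the
 * dynamic-programming matrix instead of all (m+1) x (n+1) cells.
 * Returns (size_t)-1 if the row cannot be allocated. */
size_t levenshtein_one_row(const char *s1, const char *s2) {
    assert(s1 != NULL && s2 != NULL);

    size_t m = strlen(s1);
    size_t n = strlen(s2);

    size_t *row = malloc((n + 1) * sizeof(size_t));
    if (row == NULL) return (size_t)-1;

    /* Row 0: turning the empty prefix of s1 into s2[0..j) takes j inserts. */
    for (size_t j = 0; j <= n; j++) row[j] = j;

    for (size_t i = 1; i <= m; i++) {
        size_t diag = row[0];   /* value of cell (i-1, 0) */
        row[0] = i;             /* cell (i, 0): i deletions */

        for (size_t j = 1; j <= n; j++) {
            size_t up = row[j]; /* cell (i-1, j), not yet overwritten */
            size_t cost = (s1[i - 1] == s2[j - 1]) ? 0 : 1;

            size_t best = diag + cost;                        /* match or substitute */
            if (up + 1 < best) best = up + 1;                 /* delete from s1 */
            if (row[j - 1] + 1 < best) best = row[j - 1] + 1; /* insert into s1 */

            row[j] = best;
            diag = up;          /* becomes cell (i-1, j) for the next column */
        }
    }

    size_t distance = row[n];
    free(row);
    return distance;
}
```

With this sketch, `levenshtein_one_row("dela cruz", "de la cruz")` evaluates to 1 (one inserted space). Memory use is proportional to the length of the second string; passing the shorter string as `s2` keeps the row as small as possible.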