[dedupe] For strict abbreviations (defined as sharing a prefix and a suffix, and containing only matches and gaps according to the subtotaling affine gap measure), use the greater of the two scores. This accounts for cases where the abbreviated version may have a much higher weight in one string than the non-abbreviated version does in the other. The same applies to acronym alignments.
Making sure there's a common prefix in regular abbreviation detection.
Capping the Soft-TFIDF similarity at 1.0.
@@ -39,6 +39,8 @@ typedef struct soft_tfidf_options {
 size_t damerau_levenshtein_max;
 size_t damerau_levenshtein_min_length;
 bool possible_affine_gap_abbreviations;
+size_t strict_abbreviation_min_length;
+double strict_abbreviation_sim;
 } soft_tfidf_options_t;
 
 soft_tfidf_options_t soft_tfidf_default_options(void);