[dedupe] Add a function to the acronyms module to detect existing/known acronyms such as MS for middle school and HS for high school. Forms like MS have to be defined in the dictionaries specifically, but any acronym written like M.S. will be detected as such by the tokenizer.
@@ -721,6 +721,10 @@ ssize_t string_next_hyphen_index(char *str, size_t len) {
     return -1;
 }
 
+inline bool string_contains(char *str, char *sub) {
+    return str != NULL && sub != NULL && strstr(str, sub) != NULL;
+}
+
 inline bool string_contains_hyphen_len(char *str, size_t len) {
     return string_next_hyphen_index(str, len) >= 0;
 }
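The helper in this hunk is a NULL-safe wrapper around `strstr`. The following is a minimal standalone sketch of the same check, not the project's actual code: it assumes `const char *` parameters and `static inline` linkage, neither of which appears in the commit, and the `string_contains_demo` function is hypothetical and exists only to illustrate the behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Sketch of the helper from the hunk. Assumptions (not in the commit):
 * const-qualified parameters and static linkage. */
static inline bool string_contains(const char *str, const char *sub) {
    /* Guard both pointers first: passing NULL to strstr is undefined. */
    return str != NULL && sub != NULL && strstr(str, sub) != NULL;
}

/* Hypothetical demo, for illustration only. */
static void string_contains_demo(void) {
    assert(string_contains("M.S.", "."));         /* dotted acronym form */
    assert(!string_contains("MS", "."));          /* bare form has no period */
    assert(!string_contains(NULL, "."));          /* NULL haystack is safe */
    assert(!string_contains("MS", NULL));         /* NULL needle is safe */
    assert(string_contains("middle school", "")); /* strstr matches an empty needle */
}
```

One design note: in C99 and later, a plain `inline` definition with no matching `extern` declaration does not emit an external definition, so a call that the compiler chooses not to inline may fail to link, depending on the compiler mode and optimization level. `static inline` in the sketch avoids that question; whether the project handles it elsewhere is not visible in this hunk.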