[normalization] Adding NORMALIZE_STRING_SIMPLE_LATIN_ASCII option so parser can normalize punctuation and HTML entities, etc. without touching the alphanumeric parts of the original input
This commit is contained in:
@@ -46,6 +46,7 @@ As well as normalizations for individual string tokens:
|
||||
#define NORMALIZE_STRING_TRIM 1 << 5
|
||||
#define NORMALIZE_STRING_REPLACE_HYPHENS 1 << 6
|
||||
#define NORMALIZE_STRING_COMPOSE 1 << 7
|
||||
#define NORMALIZE_STRING_SIMPLE_LATIN_ASCII 1 << 8
|
||||
|
||||
#define NORMALIZE_TOKEN_REPLACE_HYPHENS 1 << 0
|
||||
#define NORMALIZE_TOKEN_DELETE_HYPHENS 1 << 1
|
||||
|
||||
Reference in New Issue
Block a user