[normalization] Add NORMALIZE_STRING_SIMPLE_LATIN_ASCII option so the parser can normalize punctuation, HTML entities, etc. without touching the alphanumeric parts of the original input

Al
2016-08-21 19:45:32 -04:00
parent 8b9702b43d
commit 58851a9088
2 changed files with 39 additions and 9 deletions

@@ -46,6 +46,7 @@ As well as normalizations for individual string tokens:
 #define NORMALIZE_STRING_TRIM 1 << 5
 #define NORMALIZE_STRING_REPLACE_HYPHENS 1 << 6
 #define NORMALIZE_STRING_COMPOSE 1 << 7
+#define NORMALIZE_STRING_SIMPLE_LATIN_ASCII 1 << 8
 #define NORMALIZE_TOKEN_REPLACE_HYPHENS 1 << 0
 #define NORMALIZE_TOKEN_DELETE_HYPHENS 1 << 1
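
Because each option in the hunk above occupies its own bit, a caller can OR several of them into one options word and test for a flag with a bitwise AND. The sketch below illustrates that with the string-level flags only. The helper names `has_option` and `example_parser_options` are hypothetical and not part of the library, and the particular combination of flags is an assumption for illustration, not the set the parser actually uses. The macros are redefined locally with added parentheses so they stay safe inside arithmetic expressions; the README lists them without parentheses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the README excerpt. The parentheses are an addition
 * in this sketch: without them, `FLAG + 1` would expand to `1 << 5 + 1`,
 * which C parses as `1 << 6`. */
#define NORMALIZE_STRING_TRIM (1 << 5)
#define NORMALIZE_STRING_REPLACE_HYPHENS (1 << 6)
#define NORMALIZE_STRING_COMPOSE (1 << 7)
#define NORMALIZE_STRING_SIMPLE_LATIN_ASCII (1 << 8)

/* Hypothetical helper: true if `flag` is set in `options`. */
static inline bool has_option(uint64_t options, uint64_t flag) {
    return (options & flag) != 0;
}

/* Hypothetical option set: simple Latin-ASCII normalization plus trimming
 * and composition. The combination is assumed for illustration only. */
static inline uint64_t example_parser_options(void) {
    return NORMALIZE_STRING_SIMPLE_LATIN_ASCII
         | NORMALIZE_STRING_TRIM
         | NORMALIZE_STRING_COMPOSE;
}

/* Small self-check showing how the helpers fit together. */
static inline void example_self_check(void) {
    uint64_t options = example_parser_options();
    assert(has_option(options, NORMALIZE_STRING_SIMPLE_LATIN_ASCII));
    assert(!has_option(options, NORMALIZE_STRING_REPLACE_HYPHENS));
}
```

Since the new flag takes bit 8, it does not collide with the existing string options on bits 5 through 7, so existing option words keep their meaning.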