[numex] Add a form of normalization that strips ordinal suffixes, so that {96th, Ninety-sixth} => 96. This is an additional form of normalization: the form in which the suffixes are kept is still produced. One case that is still not handled is a Roman-numeral ordinal such as "IXe Arrondissement".
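To illustrate the simplest case described in the message (96th => 96), here is a minimal sketch. It is not the numex implementation from this commit: the function name `strip_english_ordinal_suffix` is hypothetical, it only covers ASCII digits followed by an English suffix, and it does not attempt spelled-out forms such as "Ninety-sixth". Like the unhandled case noted above, "IXe" passes through unchanged.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (an assumption, not part of the commit).
 * Copies `token` into `out`, dropping a trailing English ordinal suffix
 * (st, nd, rd, th) when it directly follows a run of ASCII digits.
 * Returns true if a suffix was recognized and dropped; any other token
 * is copied unchanged. Output is truncated to fit `out_size`. */
bool strip_english_ordinal_suffix(const char *token, char *out, size_t out_size) {
    if (out_size == 0) return false;

    size_t len = strlen(token);

    /* Count the leading ASCII digits. */
    size_t digits = 0;
    while (digits < len && isdigit((unsigned char)token[digits])) digits++;

    /* A suffix is recognized only if exactly two characters follow the digits. */
    bool has_suffix = false;
    if (digits > 0 && len == digits + 2) {
        char a = (char)tolower((unsigned char)token[digits]);
        char b = (char)tolower((unsigned char)token[digits + 1]);
        has_suffix = (a == 's' && b == 't') || (a == 'n' && b == 'd') ||
                     (a == 'r' && b == 'd') || (a == 't' && b == 'h');
    }

    /* Keep only the digits when a suffix was found, otherwise the whole token. */
    size_t keep = has_suffix ? digits : len;
    if (keep >= out_size) keep = out_size - 1;
    memcpy(out, token, keep);
    out[keep] = '\0';
    return has_suffix;
}
```

Because the commit describes this as an additional normalization, a caller would presumably emit both the original token and the stripped one rather than replacing one with the other.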
@@ -33,6 +33,7 @@ As well as normalizations for individual string tokens:
#include "string_utils.h"
#include "utf8proc/utf8proc.h"
#include "unicode_scripts.h"
#include "numex.h"
#include "transliterate.h"
#include "trie.h"
#include "tokens.h"
@@ -47,6 +48,7 @@ As well as normalizations for individual string tokens:
#define NORMALIZE_STRING_REPLACE_HYPHENS 1 << 6
#define NORMALIZE_STRING_COMPOSE 1 << 7
#define NORMALIZE_STRING_SIMPLE_LATIN_ASCII 1 << 8
#define NORMALIZE_STRING_REPLACE_NUMEX 1 << 9

#define NORMALIZE_TOKEN_REPLACE_HYPHENS 1 << 0
#define NORMALIZE_TOKEN_DELETE_HYPHENS 1 << 1
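The flags in the hunk above are single-bit values, so they are presumably combined with bitwise OR and tested with bitwise AND. The sketch below shows that pattern using three of the values from the hunk. The helper names `has_option_sketch` and `example_string_options` and the `uint64_t` option type are assumptions for illustration, not taken from the source, and the macros are parenthesized here so they stay safe inside larger expressions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the hunk above; the parentheses are added in this sketch. */
#define NORMALIZE_STRING_COMPOSE (1 << 7)
#define NORMALIZE_STRING_SIMPLE_LATIN_ASCII (1 << 8)
#define NORMALIZE_STRING_REPLACE_NUMEX (1 << 9)

/* Hypothetical helper: true if every bit of `flag` is set in `options`. */
bool has_option_sketch(uint64_t options, uint64_t flag) {
    return (options & flag) == flag;
}

/* Hypothetical example: request composition plus numex replacement. */
uint64_t example_string_options(void) {
    return NORMALIZE_STRING_COMPOSE | NORMALIZE_STRING_REPLACE_NUMEX;
}
```

With these definitions, `has_option_sketch(example_string_options(), NORMALIZE_STRING_REPLACE_NUMEX)` is true, while the same check for `NORMALIZE_STRING_SIMPLE_LATIN_ASCII` is false.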