Commit Graph

16 Commits

Author SHA1 Message Date
Al 400ea589ef [normalize] add NORMALIZE_STRING_SIMPLE_LATIN_ASCII option to pynormalize 2017-01-02 02:08:54 -05:00
Al 80ee34cc3a [text] adding normalization with whitespace 2016-12-10 17:50:53 -05:00
Al c0a468d7e8 [normalization] adding a normalize_token function and some token options for deleting periods 2016-12-09 17:46:26 -05:00
Al dfa5c8e0a6 [abbreviations] Adding ability to abbreviate within hyphenated phrases e.g. Sint-Maarten => St.-Maarten 2016-08-24 18:50:24 -04:00
Al 97a2436ad7 [tokenization] Adding two more sets to token_types for punctuation and non-alphanumerics 2016-08-02 16:24:01 -04:00
Al 75d9c31395 [text] Adding NORMALIZE_STRING_COMPOSE constant in pynormalize.c 2016-07-24 03:37:43 -04:00
Al 7b3f4e9175 [text] Adding utils.py for is_numeric/is_numeric_strict 2016-07-24 03:37:11 -04:00
Al b9ee3be806 [phrases] Using simple string encoding/decoding for default serialize/deserialize in PhraseFilter base class 2016-07-21 17:04:57 -04:00
Al 771a360a85 [phrases] Using safe_encode/safe_decode as default trie serializer/deserializer 2016-07-21 17:04:57 -04:00
Al 4a2d266230 [phrases] adding __init__ to base PhraseFilter 2016-07-21 17:04:57 -04:00
Al ee1aa564c4 [normalization] normalize tokens should not replace digits by default 2016-07-21 17:04:57 -04:00
Al 1fd4fbb7a2 [normalization] Adding default token options for numbers so we split alpha from numeric tokens and don't normalize digits 2016-07-21 17:04:57 -04:00
Al d5dc34ec1d [gazetteers] moving PHRASE to a token type 2016-07-21 17:04:57 -04:00
Al 2e15db06dd [text] making normalize_string directly callable from Python geodata 2016-01-21 02:07:46 -05:00
Al fa32eacdd1 [phrases] Adding Python phrase filter from address_normalizer until a Python wrapper around libpostal's trie_search is available 2016-01-17 15:45:02 -05:00
Al 58e53cab1c [scripts] Adding the tokenize/normalize wrappers directly into the internal geodata package so pypostal can be maintained in an independent repo 2016-01-12 13:29:31 -05:00