Commit Graph

13 Commits

Author  SHA1  Message  Date
Al  3401045b4f  [fix] changing labels in Python normalize, adding a NULL check  2015-12-14 14:59:57 -05:00
Al  cbeb08f1d1  [python/normalize] importing options from the C module  2015-10-30 12:34:07 -04:00
Al  e7f783477f  [python/normalize] Adding remove parentheses options in Python normalize (would require compiling with the scanner to do it from C, but could switch)  2015-10-30 01:27:16 -04:00
Al  cee9da05d6  [fix] using tokenize_raw API  2015-10-28 21:37:44 -04:00
Al  9a92a1154d  [python] Making normalized_tokens return token classes as well, mimicking the tokenize API  2015-10-27 17:07:50 -04:00
Al  9f6e1387a0  [fix] Error condition in Python tokenize  2015-10-27 13:33:28 -04:00
Al  40918812e2  [normalize] Adding hyphen elimination as a string option (changes tokenization)  2015-10-27 13:32:47 -04:00
Al  f6b6a17335  [python/normalization] Adding Python bindings to the normalize module for use in OSM polygon matching  2015-10-26 18:07:53 -04:00
Al  8a188903b3  [python] Using tuples in pytokenize instead of list, pre-allocating  2015-10-26 18:04:13 -04:00
Al  4f784060a3  [python] Adding word_token_types  2015-10-25 18:33:09 -04:00
Al  236737eab3  [tokenization/osm] Using utf8 encoded version of string for tokens in python tokenizer  2015-09-21 17:27:43 -04:00
Al  5b2fd0be50  [fix] pytokenize compilation on Ubuntu/gcc  2015-09-21 03:24:14 -04:00
Al  5485ea2197  [python] Adding initial pypostal bindings for tokenize so we can remove address_normalizer dependency. Not tested on Python 3.  2015-09-20 14:59:39 -04:00