Al | 2d1c24a6e9 | [tokenization] Adding url, email, US/international phone numbers, a separate type for ideographic numbers, more general quotes, paren types | 2015-03-24 16:43:53 -04:00
Al | d2ceb5f418 | [fix] removing struct definition from scanner.re for future generation of scanner.c | 2015-03-17 19:46:40 -04:00
Al | f794ef7222 | [tokenization] Exposing some of the scanner's methods in header for use in the Python scanner so it can avoid the additional allocation | 2015-03-17 18:38:30 -04:00
Al | a446290829 | [fix] IDEOGRAM class name | 2015-03-11 17:33:53 -04:00
Al | 94805fb1a7 | [tokenization] Better scanner support for ideographic languages (Chinese, Japanese, Korean, etc.) with an IDEOGRAM token class in the scanner so we know when we're dealing with those languages vs. other random characters | 2015-03-11 17:29:37 -04:00
Al | 0689f936c9 | [tokenization] scanner/tokenizer (generated with re2c) | 2015-03-03 12:35:22 -05:00