tommy/libpostal
72 Commits 2 Branches 0 Tags
d50d7d182e220dcbfbf0e0ddf272ee8baa329fb6
Commit Graph

6 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Al | 79fd7a8ded | [tokenization/trie] simpler url regex reduces the scanner file size, accounting for a few more variations in word tokens, making trie suffix search use iteration instead of malloc'ing a new string | 2015-04-05 16:33:14 -04:00 |
| Al | 2d1c24a6e9 | [tokenization] Adding url, email, US/international phone numbers, a separate type for ideographic numbers, more general quotes, paren types | 2015-03-24 16:43:53 -04:00 |
| Al | f794ef7222 | [tokenization] Exposing some of the scanner's methods in header for use in the Python scanner so it can avoid the additional allocation | 2015-03-17 18:38:30 -04:00 |
| Al | a446290829 | [fix] IDEOGRAM class name | 2015-03-11 17:33:53 -04:00 |
| Al | 94805fb1a7 | [tokenization] Better scanner support for ideographic languages (Chinese, Japanese, Korean, etc.) with an IDEOGRAM token class in the scanner so we know when we're dealing with those languages vs. other random characters | 2015-03-11 17:29:37 -04:00 |
| Al | 0689f936c9 | [tokenization] scanner/tokenizer (generated with re2c) | 2015-03-03 12:35:22 -05:00 |