tommy/libpostal
400ea589efa1e3329f7e28d541b70374e57982e5
libpostal/scripts/geodata/i18n
Al
77efcb3f89
[fix] only accept language suffixes that are valid scripts or transliterations of CJK languages. Set language to language suffix so Romaji forms get used, etc.
2016-12-24 17:17:09 -05:00
__init__.py
[tokenization] Script to generate TR-29 ranges for re2c scanner
2015-04-14 15:50:50 -04:00
cldr_languages.py
[fix] cldr languages dir
2015-08-11 20:04:25 -04:00
download_cldr.py
[fix] ensure CLDR dir
2015-08-11 20:04:42 -04:00
google.py
[i18n/postcodes] Fetching postcode regexes from the data source used by Google's libaddressinput, caches requests for the length of the running program (e.g. generating parser data, so the regexes will get updated over time).
2016-07-26 17:42:50 -04:00
languages.py
[osm] adding admin1 ids to the OSM country rtree
2016-10-04 23:12:15 -04:00
normalize.py
[fix] import
2015-08-22 23:19:43 -04:00
scanner.py
[cldr] simple Python scanner for creating dynamic scanners for CLDR rule parsing
2015-04-14 15:49:24 -04:00
transliteration_rules.py
[transliteration] Adding language-specific transliterators for handling umlauts in German + special transliterations in the Nordic languages. It may still result in some wrong transliterations if the language classifier is wrong, but generally it's accurate enough that its predictions can be relied upon. Also adding a Latin-ASCII-Simple transform which only does the punctuation portion of Latin-ASCII so it won't change anything substantial about the input string.
2016-08-20 18:17:46 -04:00
unicode_data.py
[unicode] Downloading latest UnicodeData.txt instead of using builtin Python module (out of date) e.g. for getting unicode codepoint categories
2015-09-25 23:59:38 -04:00
unicode_paths.py
[mv] Moving all repo data files to a resources dir, data is only for runtime files
2015-07-21 18:11:36 -04:00
unicode_properties.py
[fix] only accept language suffixes that are valid scripts or transliterations of CJK languages. Set language to language suffix so Romaji forms get used, etc.
2016-12-24 17:17:09 -05:00
word_breaks.py
[unicode] Wide version of word breaks
2015-09-22 18:55:33 -04:00