libpostal

Author	SHA1	Message	Date
Al	6081df0cd1	[osm] adding admin1 ids to the OSM country rtree	2016-10-04 23:12:15 -04:00
Al	cb4408fea8	[transliteration] Adding language-specific transliterators for handling umlauts in German + special transliterations in the Nordic languages. It may still result in some wrong transliterations if the language classifier is wrong, but generally it's accurate enough that its predictions can be relied upon. Also adding a Latin-ASCII-Simple transform which only does the punctuation portion of Latin-ASCII so it won't change anything substantial about the input string.	2016-08-20 18:17:46 -04:00
Al	93586c2592	[fix] aliasing all_languages	2016-08-18 02:24:59 -04:00
Al	1ef57ee7d2	[i18n/postcodes] Fetching postcode regexes from the data source used by Google's libaddressinput, caches requests for the length of the running program (e.g. generating parser data, so the regexes will get updated over time).	2016-07-26 17:42:50 -04:00
Al	cdf8829942	[fix] no longer requiring argv for unicode_properties script	2016-07-21 17:04:57 -04:00
Al	6703da8fc3	[fix] languages and disambiguation do initialization by default	2016-07-21 17:04:57 -04:00
Al	c506649252	[fix] languages_intialized	2016-07-21 17:04:57 -04:00
Al	5e2d9f371e	[numex] Moving numex script to a different subpackage, adding function for creating ordinals	2016-07-21 17:04:57 -04:00
Al	1bc92d6995	[fix] output path in numex.py	2016-03-29 11:25:36 -04:00
Al	2a2d1738a3	[fix] path for running numex.py	2016-03-29 11:15:24 -04:00
Al	da62ff309e	[transliteration] Fixing Malayalam script	2016-01-17 22:15:56 -05:00
Al	8030b235e6	[languages] Changing the definition in script languages so only languages that appear on street signs will be used	2016-01-17 22:03:41 -05:00
Al	e9e05bb929	[transliteration] Distinguishing between variables with numbers and backreferences in transliteration rules	2015-12-23 13:07:44 -05:00
Al	e55ff54be1	[fix] Adding Korean-Latin-BGN to excluded transliterators	2015-12-21 16:24:50 -05:00
Al	682c316775	[transliteration] Removing Korean-Latin-BGN, not a great transliterator and AFAICT, ICU doesn't use it either	2015-12-21 12:45:45 -05:00
Al	ccf509edb1	[fix] update to control characters for generating the transliteration rules	2015-12-20 15:40:38 -05:00
Al	b2a944830a	[transliteration] Making sure the Python script to generate transliteration data works on the new CLDR format	2015-12-19 00:34:30 -05:00
Al	7f5cf89e84	[transliteration] Not escaping right side transliteration rules	2015-10-27 12:24:38 -04:00
Al	7dfbcce9ec	[languages] options for get_country_languages	2015-09-30 04:09:07 -04:00
Al	5417b4e602	[unicode] Downloading latest UnicodeData.txt instead of using builtin Python module (out of date) e.g. for getting unicode codepoint categories	2015-09-25 23:59:38 -04:00
Al	abfb1d4a60	[transliteration] Wide char support in transliteration data generator	2015-09-23 03:56:12 -04:00
Al	13bcc35523	[unicode] Allowing wide chars in unicode properties	2015-09-23 00:34:07 -04:00
Al	b4593b6f88	[unicode/tokenization] Using new character classes including wide chars in scanner	2015-09-23 00:33:14 -04:00
Al	a76831df7a	[unicode] Wide version of word breaks	2015-09-22 18:55:33 -04:00
Al	a916668f28	[i18n] Local file for ISO 15924	2015-09-01 23:58:36 -04:00
Al	b8e4c19146	[mv] Moving the get regional/country languages logic out of language polygons	2015-08-23 14:25:33 -04:00
Al	122a81b610	[languages] non-default languages can still be labeled from > 1 char abbreviations if there's no evidence of other languages in the string. Adding Python version of get_string_script from the C lib	2015-08-23 02:26:06 -04:00
Al	0701bb6f08	[fix] import	2015-08-22 23:19:43 -04:00
Al	d97c725bbc	[languages] Allowing specification of multiple regional languages	2015-08-18 03:18:52 -04:00
Al	03febc7e20	[scripts] Better script code aliasing	2015-08-13 18:25:55 -04:00
Al	b54ff95ecc	[mv] csv_utils	2015-08-13 18:19:54 -04:00
Al	cf70615850	[transliteration] Doing HTML escapes first in Latin-ASCII transliteration as they may need to be resolved further in subsequent steps	2015-08-11 23:10:55 -04:00
Al	51addec5f2	[fix] check for local CLDR in unicode properties	2015-08-11 20:23:48 -04:00
Al	882e4c2ab8	[fix] ensure CLDR dir	2015-08-11 20:04:42 -04:00
Al	48566bf097	[fix] cldr languages dir	2015-08-11 20:04:25 -04:00
Al	dd391eabe5	[numex] Separating rules from keys for Linux gcc compilation	2015-08-09 01:00:57 -04:00
Al	1d39916aaa	[fix] Fixing warnings in unicode script data	2015-08-02 21:30:54 -06:00
Al	87566bb6a5	[numex] Adding validation checks for numex JSON	2015-07-24 15:22:07 -04:00
Al	64a63fdf51	[mv] Moving all repo data files to a resources dir, data is only for runtime files	2015-07-21 18:11:36 -04:00
Al	076c07e21f	[fix] Add minor languages to the language set	2015-07-16 00:58:58 -04:00
Al	95a6845a85	[i18n] Adding regional languages as valid country languages	2015-07-08 14:54:00 -04:00
Al	a580ed0b1b	[transliteration] Adding numeric HTML escapes e.g. '&'	2015-06-29 15:02:34 -04:00
Al	8fb6a28e9c	[fix] using empty string instead of NULL for script languages so we can use fixed length arrays	2015-06-23 15:20:09 -05:00
Al	b21c3a3a2f	[transliteration] using different struct in script data header file	2015-06-22 22:06:16 -05:00
Al	c2b4744f55	[transliteration] Using a data file instead of a header for transliteration scripts	2015-06-21 05:37:56 -05:00
Al	84b9a6ff33	[transliteration] Adding Hangul-Latin and Jamo-Latin back into the mix with a restricted filter. Reversing all previous contexts by character group	2015-06-17 23:42:31 -04:00
Al	f04fad0e93	[i18n] Generating Hangul syllable classes	2015-06-16 12:50:48 -04:00
Al	67bd9f1a31	[i18n] Adding languages.py	2015-06-15 17:48:47 -04:00
Al	fc735bb5c3	[numex] Adding a whole words only option on numex languages e.g. for Latin so we don't match an initial D with 500	2015-06-12 16:09:45 -04:00
Al	2d098fdab6	[numex] Adding ordinal_indicator rule type for CJK ordinals	2015-06-04 11:24:13 -04:00


98 Commits