libpostal

Author	SHA1	Message	Date
Al	1bc92d6995	[fix] output path in numex.py	2016-03-29 11:25:36 -04:00
Al	2a2d1738a3	[fix] path for running numex.py	2016-03-29 11:15:24 -04:00
Al	da62ff309e	[transliteration] Fixing Malayalam script	2016-01-17 22:15:56 -05:00
Al	8030b235e6	[languages] Changing the definition in script languages so only languages that appear on street signs will be used	2016-01-17 22:03:41 -05:00
Al	e9e05bb929	[transliteration] Distinguishing between variables with numbers and backreferences in transliteration rules	2015-12-23 13:07:44 -05:00
Al	e55ff54be1	[fix] Adding Korean-Latin-BGN to excluded transliterators	2015-12-21 16:24:50 -05:00
Al	682c316775	[transliteration] Removing Korean-Latin-BGN, not a great transliterator and AFAICT, ICU doesn't use it either	2015-12-21 12:45:45 -05:00
Al	ccf509edb1	[fix] update to control characters for generating the transliteration rules	2015-12-20 15:40:38 -05:00
Al	b2a944830a	[transliteration] Making sure the Python script to generate transliteration data works on the new CLDR format	2015-12-19 00:34:30 -05:00
Al	7f5cf89e84	[transliteration] Not escaping right side transliteration rules	2015-10-27 12:24:38 -04:00
Al	7dfbcce9ec	[languages] options for get_country_languages	2015-09-30 04:09:07 -04:00
Al	5417b4e602	[unicode] Downloading latest UnicodeData.txt instead of using builtin Python module (out of date) e.g. for getting unicode codepoint categories	2015-09-25 23:59:38 -04:00
Al	abfb1d4a60	[transliteration] Wide char support in transliteration data generator	2015-09-23 03:56:12 -04:00
Al	13bcc35523	[unicode] Allowing wide chars in unicode properties	2015-09-23 00:34:07 -04:00
Al	b4593b6f88	[unicode/tokenization] Using new character classes including wide chars in scanner	2015-09-23 00:33:14 -04:00
Al	a76831df7a	[unicode] Wide version of word breaks	2015-09-22 18:55:33 -04:00
Al	a916668f28	[i18n] Local file for ISO 15924	2015-09-01 23:58:36 -04:00
Al	b8e4c19146	[mv] Moving the get regional/country languages logic out of language polygons	2015-08-23 14:25:33 -04:00
Al	122a81b610	[languages] non-default languages can still be labeled from > 1 char abbreviations if there's no evidence of other languages in the string. Adding Python version of get_string_script from the C lib	2015-08-23 02:26:06 -04:00
Al	0701bb6f08	[fix] import	2015-08-22 23:19:43 -04:00
Al	d97c725bbc	[languages] Allowing specification of multiple regional languages	2015-08-18 03:18:52 -04:00
Al	03febc7e20	[scripts] Better script code aliasing	2015-08-13 18:25:55 -04:00
Al	b54ff95ecc	[mv] csv_utils	2015-08-13 18:19:54 -04:00
Al	cf70615850	[transliteration] Doing HTML escapes first in Latin-ASCII transliteration as they may need to be resolved further in subsequent steps	2015-08-11 23:10:55 -04:00
Al	51addec5f2	[fix] check for local CLDR in unicode properties	2015-08-11 20:23:48 -04:00
Al	882e4c2ab8	[fix] ensure CLDR dir	2015-08-11 20:04:42 -04:00
Al	48566bf097	[fix] cldr languages dir	2015-08-11 20:04:25 -04:00
Al	dd391eabe5	[numex] Separating rules from keys for Linux gcc compilation	2015-08-09 01:00:57 -04:00
Al	1d39916aaa	[fix] Fixing warnings in unicode script data	2015-08-02 21:30:54 -06:00
Al	87566bb6a5	[numex] Adding validation checks for numex JSON	2015-07-24 15:22:07 -04:00
Al	64a63fdf51	[mv] Moving all repo data files to a resources dir, data is only for runtime files	2015-07-21 18:11:36 -04:00
Al	076c07e21f	[fix] Add minor languages to the language set	2015-07-16 00:58:58 -04:00
Al	95a6845a85	[i18n] Adding regional languages as valid country languages	2015-07-08 14:54:00 -04:00
Al	a580ed0b1b	[transliteration] Adding numeric HTML escapes e.g. '&'	2015-06-29 15:02:34 -04:00
Al	8fb6a28e9c	[fix] using empty string instead of NULL for script languages so we can use fixed length arrays	2015-06-23 15:20:09 -05:00
Al	b21c3a3a2f	[transliteration] using different struct in script data header file	2015-06-22 22:06:16 -05:00
Al	c2b4744f55	[transliteration] Using a data file instead of a header for transliteration scripts	2015-06-21 05:37:56 -05:00
Al	84b9a6ff33	[transliteration] Adding Hangul-Latin and Jamo-Latin back into the mix with a restricted filter. Reversing all previous contexts by character group	2015-06-17 23:42:31 -04:00
Al	f04fad0e93	[i18n] Generating Hangul syllable classes	2015-06-16 12:50:48 -04:00
Al	67bd9f1a31	[i18n] Adding languages.py	2015-06-15 17:48:47 -04:00
Al	fc735bb5c3	[numex] Adding a whole words only option on numex languages e.g. for Latin so we don't match an initial D with 500	2015-06-12 16:09:45 -04:00
Al	2d098fdab6	[numex] Adding ordinal_indicator rule type for CJK ordinals	2015-06-04 11:24:13 -04:00
Al	4c49f63caf	[numex] Adding categories to numex for plurals, etc. Ordinal indicators support multiple variants (primer in Spanish can be written as 1er or 1r for instance) and longer suffixes e.g. for tracking 1=>1st but 11=>11th	2015-06-04 03:09:39 -04:00
Al	b2fe9d4db0	[transliteration] Adding uppercase umlauts and Scandinavian a-ring	2015-06-03 22:55:45 -04:00
Al	2ea21dfffb	[fix] constants	2015-06-02 13:44:25 -04:00
Al	208366af98	[fix] removing stopwords index	2015-06-02 12:43:48 -04:00
Al	9d0d83bc14	[numex] adding stopword rules with the regular numex rules	2015-06-02 12:37:22 -04:00
Al	4ad978f22c	[numex] Using the new representation for generated data	2015-06-02 12:28:07 -04:00
Al	2dc870b3da	[numex] Python script to generate numex data	2015-06-02 10:15:02 -04:00
Al	6b3d434c31	[fix] removing unnecessary definition	2015-06-01 17:13:57 -04:00

90 Commits