Commit Graph

620 Commits

Author SHA1 Message Date
Al
66a71ab70d [normalize] Need to do a Latin-ASCII transliteration even if the string is entirely ASCII since it may contain HTML escapes 2015-08-11 23:36:08 -04:00
Al
87b275fcab [transliteration] Regenerating transliteration data file 2015-08-11 23:11:17 -04:00
Al
cf70615850 [transliteration] Doing HTML escapes first in Latin-ASCII transliteration as they may need to be resolved further in subsequent steps 2015-08-11 23:10:55 -04:00
Al
9712e0fa87 [fix] phrase start in transliteration 2015-08-11 23:09:49 -04:00
Al
562a7c243d [phrases] Fixing tail searches in trie_get_prefix* 2015-08-11 23:08:21 -04:00
Al
51addec5f2 [fix] check for local CLDR in unicode properties 2015-08-11 20:23:48 -04:00
Al
882e4c2ab8 [fix] ensure CLDR dir 2015-08-11 20:04:42 -04:00
Al
48566bf097 [fix] cldr languages dir 2015-08-11 20:04:25 -04:00
Al
e98a822661 [build] Order-only dependencies for downloading data files, rm-ing the tarball when done extracting 2015-08-11 12:59:37 -04:00
Al
0028c2bc53 [build] Fixing tarball uploading 2015-08-11 03:18:35 -04:00
Al
f21b767696 [build] Adding tarball back to pkgdata 2015-08-10 18:44:40 -04:00
Al
c29cf5ac9a [api] Better handling of strings with multiple scripts and strings that use more than one transliterator. Reducing complexity/allocations 2015-08-10 17:51:41 -04:00
Al
4bc6adf669 [normalize] Adding the original script as an alternative in transliteration mode as well 2015-08-10 17:48:48 -04:00
Al
a13e5117b5 [utils] string_tree_num_strings method 2015-08-10 17:46:37 -04:00
Al
219947722d [cli] delete_word_hyphens as a default option 2015-08-10 16:19:54 -04:00
Al
78a80dd86e [api] Add separable or inseparable non-canonical string affixes (e.g. foobg. => fooburg, foostrasse => foostraße|foo straße, l'ensemble => l' ensemble, etc.) in expand_address 2015-08-10 16:19:03 -04:00
Al
de5d6945b5 [expansion] Adding search_address_dictionaries_prefix/suffix for concatenated prefixes/suffixes e.g. in Germanic languages. Adding a flag to the address_expansion struct and trie value to denote separability, adding prefix/suffix keys during dictionary creation 2015-08-10 16:15:01 -04:00
Al
0f77ca1213 [normalize] Adding a char_array version of normalize token 2015-08-10 16:11:34 -04:00
Al
064b6b5898 [utils] char_array_append_reversed for adding reversed strings without a malloc 2015-08-10 16:10:05 -04:00
Al
dab181a4d7 [fix] Only the exact TRIE_PREFIX_CHAR/TRIE_SUFFIX_CHAR characters are disallowed as keys 2015-08-10 16:09:10 -04:00
Al
e511eede74 [phrases] Prefix/suffix trie search using the new characters, fixing length of matched prefixes/suffixes and exiting early on falling off the trie 2015-08-10 16:02:38 -04:00
Al
51572d6575 [phrases] Changing prefix/suffix chars so both are control characters and neither is the NUL-byte. Modifying transliteration special characters accordingly 2015-08-10 16:01:22 -04:00
Al
11a9881988 [phrases] adding _from_index_get_prefix_char/_from_index_get_suffix_char methods 2015-08-09 03:41:20 -04:00
Al
2eb67ad850 [phrases] trie_search_prefixes/trie_search_suffixes now take a length param 2015-08-09 02:01:37 -04:00
Al
bbaa302e2e [fix] NUMEX_STOPWORD_RULE define 2015-08-09 01:03:23 -04:00
Al
5383640c14 [fix] cast 2015-08-09 01:01:11 -04:00
Al
dd391eabe5 [numex] Separating rules from keys for Linux gcc compilation 2015-08-09 01:00:57 -04:00
Al
e346b831cb [build] public-read permissions when uploading to S3 2015-08-09 00:17:04 -04:00
Al
ad584671c4 [build] Not compiling with -Werror for now 2015-08-09 00:02:41 -04:00
Al
f170f70727 [build] Link to math library 2015-08-09 00:01:44 -04:00
Al
423e2c86c7 [build] builder programs are now in noinst_PROGRAMS, Makefile target to upload data tarball to S3 (with proper credentials) 2015-08-08 23:29:34 -04:00
Al
a5ce1f12dd [fix] stdint header in address expansion rule generation script 2015-08-08 23:28:11 -04:00
Al
ee982cd872 [dictionaries] Removing dictionaries/all/personal_suffixes, can add to languages as needed 2015-08-08 23:13:09 -04:00
Al
5acf7a4f3e [phrases] resetting node position when continuation falls off the trie 2015-08-08 22:18:05 -04:00
Al
a77c8e1321 [build] Adding bootstrap.sh script and removing configure from version control 2015-08-08 21:22:11 -04:00
Al
cd0f95f9e2 [fix] making transliteration path relative to data dir 2015-08-08 21:06:02 -04:00
Al
2ba0e814ad [build] better autoconf checks for time and dirent headers 2015-08-08 21:02:03 -04:00
Al
d0679450e3 [config] Including Autoconf config.h in internal config 2015-08-08 20:50:23 -04:00
Al
5df9e123af [numex] Fix to whole_tokens_only numeric expression parsing where numex was pushing a number onto the stack on encountering a new rule context even though the token was not completely parsed 2015-08-08 20:49:54 -04:00
Al
53f54d6454 [fix] removing comment 2015-08-08 20:23:49 -04:00
Al
2106a6cfe4 [build] Adding command-line test and bench programs 2015-08-08 19:44:50 -04:00
Al
5aa2e99b92 [fix] data dir for tar extraction 2015-08-08 19:42:37 -04:00
Al
54aa6fe7df [build] Fixing runtime check/save of last updated file for package data tarball 2015-08-08 17:16:03 -04:00
Al
f38a53601b [rm] Better not to keep that file in the repo 2015-08-08 02:41:54 -04:00
Al
770f44198c [build] Adding default file to track last updated date 2015-08-08 02:30:42 -04:00
Al
c0c21b81f2 [build] Adding generated configure script 2015-08-07 17:35:44 -04:00
Al
a197d04b1a [fix] float comparison 2015-08-07 17:28:21 -04:00
Al
f161f68d53 [build] Changes to Makefile.am to build on Debian/Ubuntu, fixing downloading of the data tarball for Mac and Linux 2015-08-07 17:27:34 -04:00
Al
9b69d1f67a [fix] Removing C++ checks from all but the main API functions 2015-08-07 17:15:39 -04:00
Al
359a1efb03 [fix] Adding stdint.h include to most of the header files for portability 2015-08-07 02:43:44 -04:00