Al
|
ff75c5cc50
|
[normalize] Adding normalize_string_languages method which can use additional transliterators
|
2015-12-31 03:50:36 -05:00 |
|
Al
|
7906f5542d
|
[dictionaries] ulitsa is the proper transliteration for Russian
|
2015-12-31 03:49:51 -05:00 |
|
Al
|
9335d26fbd
|
[fix] spacing
|
2015-12-31 02:26:28 -05:00 |
|
Al
|
7bd1336b3b
|
[fix] Freeing languages in Python
|
2015-12-31 01:46:04 -05:00 |
|
Al
|
cc89b768d8
|
[dictionaries] New Japanese abbreviations from the OSM wiki
|
2015-12-31 01:32:42 -05:00 |
|
Al
|
ffe9c2a971
|
[dictionaries] Santi/SS in Italian
|
2015-12-31 01:32:21 -05:00 |
|
Al
|
ecfdbc3ec2
|
[dictionaries] New German toponym abbreviations from the OSM wiki
|
2015-12-31 01:32:00 -05:00 |
|
Al
|
a6f7924f12
|
[dictionaries] Adding service road to English
|
2015-12-31 01:31:27 -05:00 |
|
Al
|
684c238ca0
|
[dictionaries] Adding no to English ambiguous
|
2015-12-31 01:31:01 -05:00 |
|
Al
|
1b0567a881
|
[fix] Ubuntu build
|
2015-12-28 17:19:50 -05:00 |
|
Al
|
77ccd975c4
|
[fix] #endif
|
2015-12-28 17:03:12 -05:00 |
|
Al
|
d0b5985cb7
|
[build] Adding /usr/local/lib and /usr/local/include to sparkey build
|
2015-12-28 16:56:10 -05:00 |
|
Al
|
508459a9f9
|
[build] Adding -L/usr/local/lib to LDFLAGS before searching for snappy
|
2015-12-28 16:54:13 -05:00 |
|
Al
|
d6362ba0fc
|
[docs] Fleshing out parser description, correcting city name in Russian address
|
2015-12-28 15:46:56 -05:00 |
|
Al
|
45b5e2dd6f
|
[fix] array_zero
|
2015-12-28 01:24:27 -05:00 |
|
Al
|
fb4c984f15
|
[math] sparse_matrix_new_shape
|
2015-12-28 01:20:23 -05:00 |
|
Al
|
72ad01cbc3
|
[features] Using a str=>double hashtable for feature counts
|
2015-12-28 01:18:49 -05:00 |
|
Al
|
e4dba2297d
|
[mv] Moving token type checking to header
|
2015-12-28 01:17:33 -05:00 |
|
Al
|
0fa1c2389c
|
[fix] Leak in expanding strings that have a separable prefix and suffix. Other than that, ran through 78 million expansions with no discernible memory issues
|
2015-12-26 17:19:59 -05:00 |
|
Al
|
deeb8f007e
|
[fix] Check for result.len > 0 in false start continuation numex parsing, plus additional safety check during replacement
|
2015-12-24 02:26:53 -05:00 |
|
Al
|
507dd631f8
|
[build] Adding json_encode.c to the address parser client sources
|
2015-12-23 19:37:28 -05:00 |
|
Al
|
5e6d24ff7e
|
[unicode] Upgrading to latest utf8proc from JuliaLang (Unicode 8)
|
2015-12-23 19:33:09 -05:00 |
|
Al
|
3fbb3c587a
|
[fix] using a char_array instead of copying the string in normalize_string
|
2015-12-23 19:21:54 -05:00 |
|
Al
|
2eea999692
|
[fix] Fixing false start continuations in numex parsing
|
2015-12-23 19:19:14 -05:00 |
|
Al
|
850d82de6e
|
[fix] In trie search, moving fall-off and tail checks inside the inner character loop, adding tail position as a separate variable from offset in the string
|
2015-12-23 19:16:43 -05:00 |
|
Al
|
19173d3a6e
|
[transliteration] In set match checks, use the current index, not current index - char_len
|
2015-12-23 13:12:30 -05:00 |
|
Al
|
e9e05bb929
|
[transliteration] Distinguishing between variables with numbers and backreferences in transliteration rules
|
2015-12-23 13:07:44 -05:00 |
|
Al
|
aaa1fc0387
|
[fix] Stepping through codepoints first then through chars in trie_search_prefixes_from_index (used in transliteration and numex)
|
2015-12-23 01:58:39 -05:00 |
|
Al
|
baa8e3cc3f
|
[fix] Compare the remaining part of the current UTF-8 character using simple string comparison, since it may be in the middle of a valid UTF-8 character
|
2015-12-21 20:34:15 -05:00 |
|
Al
|
57040b8733
|
[docs] README fixes
|
2015-12-21 17:45:55 -05:00 |
|
Al
|
ceda863e9f
|
[fix] Encode strings as JSON in address parser cli
|
2015-12-21 17:45:09 -05:00 |
|
Al
|
e55ff54be1
|
[fix] Adding Korean-Latin-BGN to excluded transliterators
|
2015-12-21 16:24:50 -05:00 |
|
Al
|
c7fb7f685d
|
[transliteration] Fixing group replacement in transliteration in the case of multiple groups, not adding to phrase length when checking context
|
2015-12-21 16:06:04 -05:00 |
|
Al
|
682c316775
|
[transliteration] Removing Korean-Latin-BGN, not a great transliterator and, AFAICT, ICU doesn't use it either
|
2015-12-21 12:45:45 -05:00 |
|
Al
|
ab124465e6
|
[fix] regenerating transliteration data
|
2015-12-20 15:41:42 -05:00 |
|
Al
|
ccf509edb1
|
[fix] update to control characters for generating the transliteration rules
|
2015-12-20 15:40:38 -05:00 |
|
Al
|
5439f4679f
|
[fix] Special tokens like emails/urls/phone numbers bypass normalization
|
2015-12-20 03:07:36 -05:00 |
|
Al
|
cf2a0efa11
|
[fix] Prefixes and suffixes that are the same length as the original token should be handled as regular expansions
|
2015-12-19 17:29:26 -05:00 |
|
Al
|
aaecd7961a
|
[fix] Options out of order
|
2015-12-19 15:05:50 -05:00 |
|
Al
|
48cb2b5c7b
|
[api] Node was complaining about non-trivial designated initializers (probably the bit fields), so converting to old-school initializer
|
2015-12-19 02:34:31 -05:00 |
|
Al
|
97906c86a8
|
[fix] Strip punctuation in final output in cases where there are no expansions
|
2015-12-19 02:10:41 -05:00 |
|
Al
|
4497c4501e
|
[fix] do not add a token if prefix/suffix expansions are inseparable and canonical
|
2015-12-19 01:36:02 -05:00 |
|
Al
|
f8da44e8b0
|
[fix] Making a copy even on pure Latin-script transliteration since string_trim modifies in-place, occasionally causes issues
|
2015-12-19 01:31:56 -05:00 |
|
Al
|
39e83961ef
|
[fix] Bug in suffix expansion affecting inseparable suffixes like burg as well as ordinal suffixes like first=>1st
|
2015-12-19 01:30:08 -05:00 |
|
Al
|
b2a944830a
|
[transliteration] Making sure the Python script to generate transliteration data works on the new CLDR format
|
2015-12-19 00:34:30 -05:00 |
|
Al
|
b4a8a69226
|
[expansion] Fixing extra space on prefix/suffix expansions
|
2015-12-18 20:28:59 -05:00 |
|
Al
|
df47dad817
|
[fix] Partial matches, ultimate misses in concatenated suffixes
|
2015-12-18 17:37:06 -05:00 |
|
Al
|
66073c17d5
|
[fix] Handling case of concatenated suffixes like straße when they stand alone
|
2015-12-18 17:17:35 -05:00 |
|
Al
|
b71755bf7f
|
[fix] Moving Python bindings up-front in the README
|
2015-12-17 14:28:36 -05:00 |
|
Al
|
31ed88bf6a
|
[api] Adding a --json option to expand cli
|
2015-12-17 13:46:55 -05:00 |
|