Al
|
aaa1fc0387
|
[fix] Stepping through codepoints first then through chars in trie_search_prefixes_from_index (used in transliteration and numex)
|
2015-12-23 01:58:39 -05:00 |
|
Al
|
baa8e3cc3f
|
[fix] Compare the remaining part of the current UTF-8 character using simple string comparison, since it may be in the middle of a valid UTF-8 character
|
2015-12-21 20:34:15 -05:00 |
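The idea behind this fix can be sketched in C. This is a minimal illustration, not libpostal's code, and the helper names `utf8_continuation_bytes` and `utf8_tail_matches` are hypothetical. When a search position lands inside a multi-byte character, the bytes that remain are UTF-8 continuation bytes (bit pattern `10xxxxxx`). They cannot be decoded as a code point on their own, so the sketch counts them and compares them raw with `memcmp`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the UTF-8 continuation bytes (10xxxxxx) at the
   start of s, i.e. the tail of a character whose lead byte came earlier. */
static size_t utf8_continuation_bytes(const unsigned char *s, size_t len) {
    size_t n = 0;
    while (n < len && (s[n] & 0xC0) == 0x80) {
        n++;
    }
    return n;
}

/* Hypothetical helper: true if key begins with the same character tail that
   str begins with. The tail is not a valid code point by itself, so it is
   compared as raw bytes rather than decoded. */
static bool utf8_tail_matches(const char *str, size_t str_len,
                              const char *key, size_t key_len) {
    size_t tail = utf8_continuation_bytes((const unsigned char *)str, str_len);
    if (tail == 0 || tail > key_len) {
        return false;  /* not mid-character, or key too short */
    }
    return memcmp(str, key, tail) == 0;
}
```

For example, the euro sign is encoded as `E2 82 AC`, so a position just after the lead byte sees the two-byte tail `82 AC`.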
|
Al
|
ceda863e9f
|
[fix] Encode strings as JSON in address parser cli
|
2015-12-21 17:45:09 -05:00 |
|
Al
|
e55ff54be1
|
[fix] Adding Korean-Latin-BGN to excluded transliterators
|
2015-12-21 16:24:50 -05:00 |
|
Al
|
c7fb7f685d
|
[transliteration] Fixing group replacement in transliteration in the case of multiple groups, not adding to phrase length when checking context
|
2015-12-21 16:06:04 -05:00 |
|
Al
|
ab124465e6
|
[fix] regenerating transliteration data
|
2015-12-20 15:41:42 -05:00 |
|
Al
|
5439f4679f
|
[fix] Special tokens like emails/urls/phone numbers bypass normalization
|
2015-12-20 03:07:36 -05:00 |
|
Al
|
cf2a0efa11
|
[fix] Prefixes and suffixes that are the same length as the original token should be handled as regular expansions
|
2015-12-19 17:29:26 -05:00 |
|
Al
|
aaecd7961a
|
[fix] Options out of order
|
2015-12-19 15:05:50 -05:00 |
|
Al
|
48cb2b5c7b
|
[api] Node was complaining about non-trivial designated initializers (probably the bit fields), so converting to old-school initializer
|
2015-12-19 02:34:31 -05:00 |
|
Al
|
97906c86a8
|
[fix] Strip punctuation in final output in cases where there are no expansions
|
2015-12-19 02:10:41 -05:00 |
|
Al
|
4497c4501e
|
[fix] do not add a token if prefix/suffix expansions are inseparable and canonical
|
2015-12-19 01:36:02 -05:00 |
|
Al
|
f8da44e8b0
|
[fix] Making a copy even on pure Latin-script transliteration since string_trim modifies in-place, which occasionally causes issues
|
2015-12-19 01:31:56 -05:00 |
|
Al
|
39e83961ef
|
[fix] Bug in suffix expansion affecting inseparable suffixes like burg as well as ordinal suffixes like first=>1st
|
2015-12-19 01:30:08 -05:00 |
|
Al
|
b4a8a69226
|
[expansion] Fixing extra space on prefix/suffix expansions
|
2015-12-18 20:28:59 -05:00 |
|
Al
|
df47dad817
|
[fix] Partial matches, ultimate misses in concatenated suffixes
|
2015-12-18 17:37:06 -05:00 |
|
Al
|
66073c17d5
|
[fix] Handling case of concatenated suffixes like straße when they stand alone
|
2015-12-18 17:17:35 -05:00 |
|
Al
|
31ed88bf6a
|
[api] Adding a --json option to expand cli
|
2015-12-17 13:46:55 -05:00 |
|
Al
|
41ea105bb4
|
[api] Simple JSON encoding for strings, UTF-8 rather than Unicode
|
2015-12-17 12:25:05 -05:00 |
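A minimal C sketch of what "UTF-8 rather than Unicode" JSON string encoding can look like. It assumes the phrase means escaping only what JSON requires (`"`, `\` and control characters below 0x20) and copying non-ASCII bytes through unchanged, instead of rewriting them as `\uXXXX` escapes. `json_encode_utf8_string` is a hypothetical name, not libpostal's function.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: wrap a UTF-8 string in double quotes, escaping only
   the characters JSON requires. Bytes >= 0x80 are copied as-is, so the
   output stays UTF-8. The caller frees the result. */
char *json_encode_utf8_string(const char *str) {
    size_t len = strlen(str);
    /* Worst case: every byte becomes a 6-byte \u00XX escape, plus 2 quotes and NUL. */
    char *out = malloc(len * 6 + 3);
    if (out == NULL) return NULL;
    char *p = out;
    *p++ = '"';
    for (const unsigned char *s = (const unsigned char *)str; *s; s++) {
        switch (*s) {
            case '"':  *p++ = '\\'; *p++ = '"';  break;
            case '\\': *p++ = '\\'; *p++ = '\\'; break;
            case '\n': *p++ = '\\'; *p++ = 'n';  break;
            case '\r': *p++ = '\\'; *p++ = 'r';  break;
            case '\t': *p++ = '\\'; *p++ = 't';  break;
            default:
                if (*s < 0x20) {
                    /* Other control characters must be escaped numerically. */
                    p += sprintf(p, "\\u%04x", (unsigned int)*s);
                } else {
                    *p++ = (char)*s;  /* ASCII and raw UTF-8 bytes pass through */
                }
        }
    }
    *p++ = '"';
    *p = '\0';
    return out;
}
```

With this approach, `straße` is emitted as `"straße"` rather than `"stra\u00dfe"`.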
|
Al
|
af78614f62
|
[fix] Print usage info on -h/--help to libpostal cli
|
2015-12-16 22:21:13 -05:00 |
|
Al
|
e0c0ed2d04
|
[numex] Return true if numex table already loaded
|
2015-12-15 14:28:40 -05:00 |
|
Al
|
b9bf5c629e
|
[fix] Moving address_parser_response_destroy into libpostal so caller can free
|
2015-12-15 00:52:24 -05:00 |
|
Al
|
b59c830ba6
|
[fix] warning about size_t
|
2015-12-14 18:17:09 -05:00 |
|
Al
|
406f9c533d
|
[api] Separating parser setup/teardown into two separate methods
|
2015-12-14 18:15:57 -05:00 |
|
Al
|
43b212a09b
|
[fix] size_t in benchmark script
|
2015-12-14 14:57:11 -05:00 |
|
Al
|
dc03c83bb2
|
[math] Adding an aligned memory allocator for vectors to help with vectorization/SIMD
|
2015-12-14 14:56:38 -05:00 |
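One portable way to build an aligned allocator is to over-allocate with `malloc` and round the pointer up to the requested boundary. The sketch below assumes that approach, which may differ from libpostal's (it could use `posix_memalign` or similar). `aligned_malloc` and `aligned_free` are hypothetical names. Aligned buffers let SIMD code use aligned vector loads and stores, for example 32-byte alignment for AVX.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: return `size` bytes whose address is a multiple of
   `alignment` (a power of two). The raw malloc pointer is stored just before
   the aligned block so aligned_free can release it. */
void *aligned_malloc(size_t size, size_t alignment) {
    if (alignment == 0 || (alignment & (alignment - 1)) != 0) {
        return NULL;  /* alignment must be a power of two */
    }
    if (alignment < sizeof(void *)) {
        alignment = sizeof(void *);  /* keep the stored pointer itself aligned */
    }
    void *raw = malloc(size + alignment - 1 + sizeof(void *));
    if (raw == NULL) return NULL;
    /* Leave room for the stashed pointer, then round up to the boundary. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

/* Free a block returned by aligned_malloc (never pass it to plain free). */
void aligned_free(void *ptr) {
    if (ptr != NULL) {
        free(((void **)ptr)[-1]);
    }
}
```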
|
Al
|
bd1e8ecaf8
|
[fix] default address parser dir
|
2015-12-12 12:55:37 -05:00 |
|
Al
|
2950358697
|
[build] address_parser client now links to libpostal, adding address_parser to download script with an "all" option
|
2015-12-12 12:49:50 -05:00 |
|
Al
|
88836e56e1
|
[api] Adding parse_address implementation to the libpostal API. GeoDB and address parser are now required. Stripping punctuation from the normalized output
|
2015-12-12 12:47:44 -05:00 |
|
Al
|
bce6ba2595
|
[fix] typedef
|
2015-12-12 11:58:41 -05:00 |
|
Al
|
a8d6cc4053
|
[api] Moving parse_address definition into libpostal.h
|
2015-12-12 03:55:31 -05:00 |
|
Al
|
fe4c528f26
|
[parser] Using different char_array for each of the potential phrases as token i
|
2015-12-12 03:23:26 -05:00 |
|
Al
|
e6303f70f3
|
[fix] removing printf
|
2015-12-11 02:53:22 -05:00 |
|
Al
|
671dd4a5d2
|
[parser] Fixing possible invalid writes in training for values beginning with a separator
|
2015-12-11 02:05:05 -05:00 |
|
Al
|
743b74aea5
|
[parser] Simplifying args in address_parser_data_set_tokenize_line
|
2015-12-10 18:48:23 -05:00 |
|
Al
|
88b8023ac8
|
[fix] Bug in address parser feature extraction, can hold onto the wrong pointer
|
2015-12-10 18:42:28 -05:00 |
|
Al
|
3de59506ae
|
[parser] Internal separators for parsing purposes include open/close parens, at sign, semicolon, etc. Ignore stray colons not internal to a word (as in Swedish abbreviations)
|
2015-12-10 18:08:51 -05:00 |
|
Al
|
71d6d3c5e1
|
[utils] Removing kvec and using similar implementation with pointers that can be passed around
|
2015-12-10 17:52:23 -05:00 |
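A minimal sketch of a kvec-like growable array that is handled through a pointer. klib's kvec declares vectors as plain structs (`n`, `m`, `a`) that are manipulated by macros. Here the struct lives on the heap, so a single pointer can be passed between functions. The `double_array` type, its functions, and the `DEFAULT_ARRAY_SIZE` constant (echoing the "default small size" commit) are illustrative assumptions, not libpostal's exact API.

```c
#include <stdbool.h>
#include <stdlib.h>

#define DEFAULT_ARRAY_SIZE 8  /* assumed small default capacity */

/* Hypothetical growable array of doubles, always used via a pointer. */
typedef struct {
    size_t n;   /* number of elements in use */
    size_t m;   /* allocated capacity */
    double *a;  /* element storage */
} double_array;

double_array *double_array_new(void) {
    double_array *array = malloc(sizeof(double_array));
    if (array == NULL) return NULL;
    array->a = malloc(DEFAULT_ARRAY_SIZE * sizeof(double));
    if (array->a == NULL) {
        free(array);
        return NULL;
    }
    array->n = 0;
    array->m = DEFAULT_ARRAY_SIZE;
    return array;
}

/* Append a value, doubling the capacity when full. Returns false on OOM. */
bool double_array_push(double_array *array, double value) {
    if (array->n == array->m) {
        size_t new_m = array->m * 2;
        double *a = realloc(array->a, new_m * sizeof(double));
        if (a == NULL) return false;
        array->a = a;
        array->m = new_m;
    }
    array->a[array->n++] = value;
    return true;
}

void double_array_destroy(double_array *array) {
    if (array == NULL) return;
    free(array->a);
    free(array);
}
```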
|
Al
|
ab205eff96
|
[utils] Adding a default small size to all arrays based on a look at malloc/realloc usage
|
2015-12-09 19:46:09 -05:00 |
|
Al
|
f252869671
|
[dictionaries] adding ste to English dictionaries
|
2015-12-08 22:29:52 -05:00 |
|
Al
|
fe37286bcf
|
[fix] Fixes to matrix methods
|
2015-12-08 17:33:38 -05:00 |
|
Al
|
d9d53ce17e
|
[math] Matrix method updates
|
2015-12-08 15:39:52 -05:00 |
|
Al
|
48ee665e71
|
[scripts] Benchmark script using default options
|
2015-12-08 15:38:44 -05:00 |
|
Al
|
2fcc72ae07
|
[fix] multitoken canonical strings
|
2015-12-08 15:38:04 -05:00 |
|
Al
|
a857138d95
|
[api] Adding place name expansions by default
|
2015-12-08 15:31:36 -05:00 |
|
Al
|
beec43fe15
|
[expansion] regenerating expansion data
|
2015-12-08 15:28:54 -05:00 |
|
Al
|
e1ea2ac704
|
[expansion] Toponym dictionaries can apply to street names and place names
|
2015-12-08 02:10:22 -05:00 |
|
Al
|
cbe5cd7429
|
[expansion] The ambiguous expansions dictionary shouldn't add to the component bitset
|
2015-12-07 20:36:56 -05:00 |
|
Al
|
d35f519629
|
[expansion] Fixing case where non-ideographic tokens like # can potentially be concatenated with surrounding tokens and should be normalized with whitespace in between
|
2015-12-07 19:18:46 -05:00 |
|
Al
|
f5739dd42b
|
[math] Signatures for array_exp and array_log
|
2015-12-07 18:10:04 -05:00 |
|