Al
|
f181c04e7a
|
[expansion] Expansion rule structs and Python script to generate rules from the dictionaries tree. Note that a canonical_index of -1 indicates that a given phrase is itself the canonical form (saves space)
|
2015-07-16 02:49:53 -04:00 |
|
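The `canonical_index` convention in the commit above can be pictured with a minimal Python sketch. Only the meaning of `-1` comes from the commit message; the `ExpansionRule` class, the `canonicals` list and the `resolve` helper are hypothetical names chosen for illustration and are not taken from the repository.

```python
from dataclasses import dataclass

# Sentinel described in the commit message: the phrase is itself canonical.
CANONICAL = -1


@dataclass
class ExpansionRule:
    # Hypothetical layout, for illustration only.
    phrase: str
    canonical_index: int  # index into a shared list of canonicals, or -1


def resolve(rule, canonicals):
    """Return the canonical string that a rule expands to."""
    if rule.canonical_index == CANONICAL:
        # No lookup needed: the phrase is its own canonical form.
        return rule.phrase
    return canonicals[rule.canonical_index]


canonicals = ["street"]
rules = [ExpansionRule("street", CANONICAL), ExpansionRule("st", 0)]
resolved = [resolve(rule, canonicals) for rule in rules]
```

Under these assumptions both rules resolve to `street`, and the rule for the canonical phrase never has to point at another entry.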
Al
|
a8b2fb5b90
|
[tokenization] Regenerating scanner file
|
2015-07-14 18:16:24 -04:00 |
|
Al
|
43293d0ae3
|
[tokenization] Fixing tokenization when mid-number characters appear in the middle of a word+numeric sequence, e.g. Zigor,2 should be 3 separate tokens. Sequences like 35,37,39 are still treated as a single token for the moment.
|
2015-07-14 18:15:58 -04:00 |
|
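The behaviour described in the commit above can be approximated with a short regular-expression sketch. This is not the project's generated scanner; the pattern and the `tokenize` helper are assumptions made for illustration, covering only the two cases the message mentions.

```python
import re

# Hypothetical token pattern, alternatives tried left to right:
#   1. digits, optionally joined by '.' or ',' when more digits follow (35,37,39)
#   2. a run of letters (Zigor)
#   3. any other single non-space character (a stray comma)
TOKEN_RE = re.compile(r"\d+(?:[.,]\d+)*|[^\W\d_]+|\S")


def tokenize(text):
    """Split text into tokens using the sketch pattern above."""
    return TOKEN_RE.findall(text)


word_then_number = tokenize("Zigor,2")
number_list = tokenize("35,37,39")
```

Here a comma only joins a numeric token when it has digits on both sides, so `Zigor,2` splits into a word, a comma and a number, while `35,37,39` stays whole.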
Al
|
a9967ec9bd
|
[numex] Regenerating numex file
|
2015-07-13 01:16:39 -04:00 |
|
Al
|
86fe289320
|
[numex] Re-generated numex data file
|
2015-07-13 00:56:48 -04:00 |
|
Al
|
fbef0a15fe
|
[geodb] Adding sparkey dependency
|
2015-07-09 15:26:11 -04:00 |
|
Al
|
4f1b4756d0
|
[geodb] Adding builder program (requires 11GB disk space and ~4GB RAM to build, but only ~300MB RAM to use after building)
|
2015-07-09 15:25:29 -04:00 |
|
Al
|
8889a5c0c3
|
[geodb] GeoDB memory allocation and I/O
|
2015-07-09 15:01:06 -04:00 |
|
Al
|
2d5641892a
|
[config] lower Bloom filter error rate
|
2015-07-09 14:59:23 -04:00 |
|
Al
|
20c6436e6d
|
[geodisambig] Return success if admin1/admin2 IDs are 0
|
2015-07-09 04:19:49 -04:00 |
|
Al
|
20303ad94f
|
[geohash] Adding bounds checks from python-geohash
|
2015-07-09 04:13:53 -04:00 |
|
Al
|
722904ce59
|
[fix] geoname_clear needs to clear feature code as well
|
2015-07-09 03:08:52 -04:00 |
|
Al
|
14500f8c7e
|
[config] Adding GeoDB default bloom filter size and error rate
|
2015-07-08 20:50:52 -04:00 |
|
Al
|
0e2a0aa56d
|
[geodisambig] adding new methods to header
|
2015-07-08 19:05:08 -04:00 |
|
Al
|
ce54a2146b
|
[fix] geo disambiguation features
|
2015-07-08 19:03:39 -04:00 |
|
Al
|
fc32a66d95
|
[fix] geonames I/O
|
2015-07-08 19:02:45 -04:00 |
|
Al
|
8c02073b54
|
[geonames] Adding country_geonames_id to both geoname and postal code structs
|
2015-07-08 18:44:21 -04:00 |
|
Al
|
9af0b0ab65
|
[geodisambig] adding a few more features to geonames disambiguation
|
2015-07-08 18:43:28 -04:00 |
|
Al
|
742079cc6a
|
[geonames] Re-generating postal/geonames fields headers
|
2015-07-08 17:02:59 -04:00 |
|
Al
|
b76f9e47d1
|
[utils] max string size for int8_t and int16_t
|
2015-07-08 16:46:12 -04:00 |
|
Al
|
c0a5607f5e
|
[fix] Adding NUM_BOUNDARY_TYPES for enumeration purposes
|
2015-07-08 16:43:57 -04:00 |
|
Al
|
24835fd088
|
[geonames] namespace specificity
|
2015-07-07 03:38:48 -04:00 |
|
Al
|
af1a5f6213
|
[trie] trie_set_data_node method
|
2015-07-07 03:38:17 -04:00 |
|
Al
|
53908ac604
|
[config] Adding geonames dir as a separate #define
|
2015-07-06 17:09:02 -04:00 |
|
Al
|
c4fd48e7f7
|
[config] geodb dir
|
2015-07-06 16:55:11 -04:00 |
|
Al
|
e7a3987656
|
[geodisambig] renaming module
|
2015-07-06 16:53:53 -04:00 |
|
Al
|
d7f73e62f1
|
[utils] Adding cstring_array_clear method
|
2015-07-06 12:48:26 -04:00 |
|
Al
|
0df816fd31
|
[geodisambig] Helper methods to add features for a given geoname/postal_code
|
2015-07-06 12:41:10 -04:00 |
|
Al
|
6ff91fef6b
|
[normalization] adding a normalize_string_latin method
|
2015-07-05 23:38:01 -04:00 |
|
Al
|
a08d59c277
|
[fix] NFD normalization should be the default in normalize.c, not NFKD. NFKD does some unwanted things like converting superscripts, and the Latin-ASCII transliterator does a better, more thorough job while staying faithful to the original string
|
2015-07-05 15:28:07 -04:00 |
|
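The difference the commit above refers to is easy to see with Python's standard `unicodedata` module. This is only an illustration of the two Unicode normalization forms, not the code in normalize.c; the sample string is made up.

```python
import unicodedata

text = "caf\u00e9 10\u00b2"  # "café 10²"

nfd = unicodedata.normalize("NFD", text)
nfkd = unicodedata.normalize("NFKD", text)

# NFD only splits é into "e" plus a combining acute accent (U+0301);
# the superscript two (U+00B2) is left untouched.
# NFKD applies compatibility mappings as well, so the superscript
# becomes a plain "2" and "10²" turns into "102".
superscript_kept_by_nfd = "\u00b2" in nfd
superscript_kept_by_nfkd = "\u00b2" in nfkd
```

With NFKD the string silently changes from ten squared to one hundred and two, which is the kind of unwanted conversion the message describes.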
Al
|
47ed2e58fd
|
[geodisambig] feature functions for GeoNames disambiguation
|
2015-07-04 10:35:56 -04:00 |
|
Al
|
20a8b9611d
|
[fix] Removing feature length variables from geonames.c
|
2015-07-04 10:33:08 -04:00 |
|
Al
|
3f07cc6c71
|
[geohash] Modified geohash implementation (based on python-geohash) with no mallocs
|
2015-07-04 01:30:30 -04:00 |
|
Al
|
4fd4fa7dca
|
[fix] moving int string size constants to string_utils.h
|
2015-07-02 17:50:09 -04:00 |
|
Al
|
055e6d8905
|
[fix] typo in constant
|
2015-07-02 16:12:24 -04:00 |
|
Al
|
e273caac22
|
[geonames] generated postal code TSV fields
|
2015-07-02 16:00:06 -04:00 |
|
Al
|
fd28ee27bf
|
[geonames] generated geonames TSV fields
|
2015-07-02 15:59:54 -04:00 |
|
Al
|
6cfbab9969
|
[normalization] string normalization module for tokens and full strings
|
2015-07-01 14:52:28 -04:00 |
|
Al
|
46e51ae91e
|
[transliterate] no need to strdup transliterator names if they are lowercased, breaking on NUL byte
|
2015-07-01 14:51:22 -04:00 |
|
Al
|
b58877ec6c
|
[utils] string_is_lower/string_is_upper method
|
2015-07-01 14:49:22 -04:00 |
|
Al
|
d0db015667
|
[geodisambig] Adding new fields to geonames struct, plus I/O
|
2015-07-01 13:02:00 -04:00 |
|
Al
|
af56c3cd09
|
[config] constants
|
2015-07-01 13:01:22 -04:00 |
|
Al
|
fa643f7a3a
|
[utf8] Moving language length constant
|
2015-06-30 19:17:20 -04:00 |
|
Al
|
8d64c9301e
|
[transliteration] Re-generating transliteration data file
|
2015-06-29 15:03:59 -04:00 |
|
Al
|
3279b31b09
|
[tokenization] Adding an acronym token type for things like U.N. so we can delete internal periods on those tokens
|
2015-06-29 03:00:46 -04:00 |
|
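A rough sketch of why a dedicated acronym token type helps, as described in the commit above: once a token is known to be an acronym, its periods can be dropped without touching abbreviations or decimals. The pattern and both helper names are hypothetical and not the project's scanner rules, and this sketch removes every period in the token, including the trailing one.

```python
import re

# Hypothetical acronym shape: two or more single letters, each followed by a period.
ACRONYM_RE = re.compile(r"(?:[^\W\d_]\.){2,}")


def is_acronym(token):
    """True if the whole token looks like U.N. or U.S.A."""
    return ACRONYM_RE.fullmatch(token) is not None


def strip_acronym_periods(token):
    """Delete periods only from tokens classified as acronyms."""
    return token.replace(".", "") if is_acronym(token) else token


examples = [strip_acronym_periods(t) for t in ["U.N.", "St.", "3.5"]]
```

Only `U.N.` is rewritten; `St.` and `3.5` do not match the acronym shape and pass through unchanged.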
Al
|
47efce4b7e
|
[transliteration] Stopping set check loop on empty transition
|
2015-06-28 20:46:23 -04:00 |
|
Al
|
cc0401a8d1
|
[utf8] Adding a boolean struct member for string_script_t return values, set to true if the string is ASCII (no transliteration needed, should be frequent for English addresses)
|
2015-06-28 19:37:58 -04:00 |
|
Al
|
f0bf7e750c
|
[transliteration] Fixing edge case in transliteration where a naked character fails context matching but the set-wrapped version matches
|
2015-06-28 15:19:19 -04:00 |
|
Al
|
a5dacf3d2b
|
[utils] Adding method to get a particular token alternative from a string tree
|
2015-06-28 15:15:29 -04:00 |
|
Al
|
246237c1f1
|
[transliteration] Adding a get_transliteration_table() to foreach_transliterator macro since it lives in the header
|
2015-06-28 15:14:49 -04:00 |
|