Al
|
0df816fd31
|
[geodisambig] Helper methods to add features for a given geoname/postal_code
|
2015-07-06 12:41:10 -04:00 |
|
Al
|
0c5e741bb6
|
[geonames] Adding LC_ALL environment variable for utf8 sorting
|
2015-07-06 00:39:23 -04:00 |
|
Al
|
6ff91fef6b
|
[normalization] adding a normalize_string_latin method
|
2015-07-05 23:38:01 -04:00 |
|
Al
|
acd5d07d17
|
[geonames] Storing NFD normalized names and sorting case-insensitive in order to group everything with the same normalized name together
|
2015-07-05 15:56:46 -04:00 |
|
Al
|
a08d59c277
|
[fix] NFD normalization should be the default in normalize.c, not NFKD, as NFKD does some unwanted things like converting superscripts and the Latin-ASCII transliterator does a better, more thorough job while staying faithful to the original string
|
2015-07-05 15:28:07 -04:00 |
|
Al
|
47ed2e58fd
|
[geodisambig] feature functions for GeoNames disambiguation
|
2015-07-04 10:35:56 -04:00 |
|
Al
|
20a8b9611d
|
[fix] Removing feature length variables from geonames.c
|
2015-07-04 10:33:08 -04:00 |
|
Al
|
3f07cc6c71
|
[geohash] Modified geohash implementation (based on python-geohash) with no mallocs
|
2015-07-04 01:30:30 -04:00 |
|
Al
|
f825dcb939
|
[geonames] Fixing admin table DDL
|
2015-07-03 05:54:41 -04:00 |
|
Al
|
4fd4fa7dca
|
[fix] moving int string size constants to string_utils.h
|
2015-07-02 17:50:09 -04:00 |
|
Al
|
055e6d8905
|
[fix] typo in constant
|
2015-07-02 16:12:24 -04:00 |
|
Al
|
e273caac22
|
[geonames] generated postal code TSV fields
|
2015-07-02 16:00:06 -04:00 |
|
Al
|
fd28ee27bf
|
[geonames] generated geonames TSV fields
|
2015-07-02 15:59:54 -04:00 |
|
Al
|
86b23ecca3
|
[fix] field name
|
2015-07-02 15:59:11 -04:00 |
|
Al
|
6cfbab9969
|
[normalization] string normalization module for tokens and full strings
|
2015-07-01 14:52:28 -04:00 |
|
Al
|
46e51ae91e
|
[transliterate] no need to strdup transliterator names if they are lowercased, breaking on NUL byte
|
2015-07-01 14:51:22 -04:00 |
|
Al
|
b58877ec6c
|
[utils] string_is_lower/string_is_upper method
|
2015-07-01 14:49:22 -04:00 |
|
Al
|
58c6ff104a
|
[fix] Russian feminine ordinals
|
2015-07-01 13:57:42 -04:00 |
|
Al
|
d0db015667
|
[geodisambig] Adding new fields to geonames struct, plus I/O
|
2015-07-01 13:02:00 -04:00 |
|
Al
|
af56c3cd09
|
[config] constants
|
2015-07-01 13:01:22 -04:00 |
|
Al
|
fa643f7a3a
|
[utf8] Moving language length constant
|
2015-06-30 19:17:20 -04:00 |
|
Al
|
071d6bb392
|
[geodisambig] Adding presence of a Wikipedia link to the GeoNames output (an unqualified entry for the name in Wikipedia usually indicates a primary meaning). Ranking ambiguous entries for each term so that the top entry should be selected if no further information is available
|
2015-06-30 18:00:07 -04:00 |
|
Al
|
8d64c9301e
|
[transliteration] Re-generating transliteration data file
|
2015-06-29 15:03:59 -04:00 |
|
Al
|
a580ed0b1b
|
[transliteration] Adding numeric HTML escapes e.g. '&'
|
2015-06-29 15:02:34 -04:00 |
|
Al
|
3279b31b09
|
[tokenization] Adding an acronym token type for things like U.N. so we can delete internal periods on those tokens
|
2015-06-29 03:00:46 -04:00 |
|
Al
|
47efce4b7e
|
[transliteration] Stopping set check loop on empty transition
|
2015-06-28 20:46:23 -04:00 |
|
Al
|
cc0401a8d1
|
[utf8] Adding a boolean struct member for string_script_t return values, set to true if the string is ASCII (no transliteration needed, should be frequent for English addresses)
|
2015-06-28 19:37:58 -04:00 |
|
Al
|
f0bf7e750c
|
[transliteration] Fixing edge case in transliteration where a naked character fails context matching but the set-wrapped version matches
|
2015-06-28 15:19:19 -04:00 |
|
Al
|
a5dacf3d2b
|
[utils] Adding method to get a particular token alternative from a string tree
|
2015-06-28 15:15:29 -04:00 |
|
Al
|
246237c1f1
|
[transliteration] Adding a get_transliteration_table() to foreach_transliterator macro since it lives in the header
|
2015-06-28 15:14:49 -04:00 |
|
Al
|
0f3bcaf49c
|
[dictionaries] Flatter hierarchy for dictionaries
|
2015-06-26 13:14:14 -04:00 |
|
Al
|
7c161ee5b6
|
[numex] Regenerating numex data file
|
2015-06-26 12:36:40 -04:00 |
|
Al
|
d21f8135f3
|
[numex] Adding full stop ordinal indicators to German, Danish and Polish
|
2015-06-26 12:35:53 -04:00 |
|
Al
|
6a8ab48662
|
[numex] Adding method to get ordinal suffixes, using single representation
|
2015-06-25 17:28:06 -04:00 |
|
Al
|
9337bf9aea
|
[phrases] trie_search_suffixes uses the NUL-byte prefix by default but the _from_index version can start from another node. fixing single character suffixes
|
2015-06-25 17:24:19 -04:00 |
|
Al
|
82e85732c4
|
[fix] Setting codepoint in utf8proc_iterate_reversed
|
2015-06-25 17:20:55 -04:00 |
|
Al
|
4fbcb72368
|
[fix] utf8proc option
|
2015-06-25 10:07:37 -04:00 |
|
Al
|
c376bcef3d
|
[utils] get_string_script returns a struct rather than modifying a pointer for the length
|
2015-06-25 10:06:38 -04:00 |
|
Al
|
bcee9832b3
|
[utils] cstring_array_get_token=>cstring_array_get_string
|
2015-06-25 10:05:35 -04:00 |
|
Al
|
2b69c185fa
|
[tokenization] Adding a tokenizer method for appending to an existing tokens array (e.g. can stop/start tokenizing on a script change)
|
2015-06-25 10:03:34 -04:00 |
|
Al
|
581cf406a6
|
[utf8] Adding length argument to string_script function
|
2015-06-24 13:39:09 -05:00 |
|
Al
|
5e71a9d805
|
[utf8] Adding method to get the script of a string and the length of the span (rolls Common script up with the previous script)
|
2015-06-24 13:29:40 -05:00 |
|
Al
|
85348e1178
|
[fix] enum value conflicted with existing name
|
2015-06-23 15:38:59 -05:00 |
|
Al
|
077e7fd5e2
|
[transliteration] Adding script/language lookups and I/O
|
2015-06-23 15:35:52 -05:00 |
|
Al
|
423d9ca7b7
|
[transliteration] table builder adds script/language rules
|
2015-06-23 15:35:16 -05:00 |
|
Al
|
c3143e5291
|
[transliteration] Adding structs/header stuff for transliterator lookup by script/language
|
2015-06-23 15:34:38 -05:00 |
|
Al
|
8fb6a28e9c
|
[fix] using empty string instead of NULL for script languages so we can use fixed length arrays
|
2015-06-23 15:20:09 -05:00 |
|
Al
|
f2d03a7937
|
[fix] renaming structure
|
2015-06-23 02:12:24 -05:00 |
|
Al
|
7dd772de0f
|
[fix] implementation of cstring_array_split
|
2015-06-23 02:11:24 -05:00 |
|
Al
|
d4cae97fd3
|
[transliteration] regenerated scripts data file
|
2015-06-23 02:10:10 -05:00 |
|