Commit Graph

335 Commits

Author SHA1 Message Date
Al
4a2be72350 [geonames] Adding language priorities for sorting (official language names, canonical names, abbreviations, historical) 2015-07-08 16:42:42 -04:00
Al
95a6845a85 [i18n] Adding regional languages as valid country languages 2015-07-08 14:54:00 -04:00
Al
400c23cb5a [fix] tabs 2015-07-08 14:53:16 -04:00
Al
ef1ecb97f7 [geonames] Adding geonames_id for countries in places/postal codes. For postal codes, sorting desc by country population (10013 is a postal code in Italy but will default to US with no other information) 2015-07-08 13:30:57 -04:00
Al
6cc677ac0b [geonames] Adding defaults to schema and another index on country code 2015-07-08 13:16:01 -04:00
Al
24835fd088 [geonames] namespace specificity 2015-07-07 03:38:48 -04:00
Al
af1a5f6213 [trie] trie_set_data_node method 2015-07-07 03:38:17 -04:00
Al
53908ac604 [config] Adding geonames dir as a separate #define 2015-07-06 17:09:02 -04:00
Al
c4fd48e7f7 [config] geodb dir 2015-07-06 16:55:11 -04:00
Al
e7a3987656 [geodisambig] renaming module 2015-07-06 16:53:53 -04:00
Al
d7f73e62f1 [utils] Adding cstring_array_clear method 2015-07-06 12:48:26 -04:00
Al
0df816fd31 [geodisambig] Helper methods to add features for a given geoname/postal_code 2015-07-06 12:41:10 -04:00
Al
0c5e741bb6 [geonames] Adding LC_ALL environment variable for utf8 sorting 2015-07-06 00:39:23 -04:00
Al
6ff91fef6b [normalization] adding a normalize_string_latin method 2015-07-05 23:38:01 -04:00
Al
acd5d07d17 [geonames] Storing NFD normalized names and sorting case-insensitive in order to group everything with the same normalized name together 2015-07-05 15:56:46 -04:00
Al
a08d59c277 [fix] NFD normalization should be the default in normalize.c, not NFKD, as NFKD does some unwanted things like converting superscripts and the Latin-ASCII transliterator does a better, more thorough job while staying faithful to the original string 2015-07-05 15:28:07 -04:00
Al
47ed2e58fd [geodisambig] feature functions for GeoNames disambiguation 2015-07-04 10:35:56 -04:00
Al
20a8b9611d [fix] Removing feature length variables from geonames.c 2015-07-04 10:33:08 -04:00
Al
3f07cc6c71 [geohash] Modified geohash implementation (based on python-geohash) with no mallocs 2015-07-04 01:30:30 -04:00
Al
f825dcb939 [geonames] Fixing admin table DDL 2015-07-03 05:54:41 -04:00
Al
4fd4fa7dca [fix] moving int string size constants to string_utils.h 2015-07-02 17:50:09 -04:00
Al
055e6d8905 [fix] typo in constant 2015-07-02 16:12:24 -04:00
Al
e273caac22 [geonames] generated postal code TSV fields 2015-07-02 16:00:06 -04:00
Al
fd28ee27bf [geonames] generated geonames TSV fields 2015-07-02 15:59:54 -04:00
Al
86b23ecca3 [fix] field name 2015-07-02 15:59:11 -04:00
Al
6cfbab9969 [normalization] string normalization module for tokens and full strings 2015-07-01 14:52:28 -04:00
Al
46e51ae91e [transliterate] no need to strdup transliterator names if they are lowercased, breaking on NUL byte 2015-07-01 14:51:22 -04:00
Al
b58877ec6c [utils] string_is_lower/string_is_upper method 2015-07-01 14:49:22 -04:00
Al
58c6ff104a [fix] Russian feminine ordinals 2015-07-01 13:57:42 -04:00
Al
d0db015667 [geodisambig] Adding new fields to geonames struct, plus I/O 2015-07-01 13:02:00 -04:00
Al
af56c3cd09 [config] constants 2015-07-01 13:01:22 -04:00
Al
fa643f7a3a [utf8] Moving language length constant 2015-06-30 19:17:20 -04:00
Al
071d6bb392 [geodisambig] Adding presence of a Wikipedia link to the GeoNames output (an unqualified entry for the name in Wikipedia usually indicates a primary meaning). Ranking ambiguous entries for each term so that the top entry should be selected if no further information is available 2015-06-30 18:00:07 -04:00
Al
8d64c9301e [transliteration] Re-generating transliteration data file 2015-06-29 15:03:59 -04:00
Al
a580ed0b1b [transliteration] Adding numeric HTML escapes e.g. '&' 2015-06-29 15:02:34 -04:00
Al
3279b31b09 [tokenization] Adding an acronym token type for things like U.N. so we can delete internal periods on those tokens 2015-06-29 03:00:46 -04:00
Al
47efce4b7e [transliteration] Stopping set check loop on empty transition 2015-06-28 20:46:23 -04:00
Al
cc0401a8d1 [utf8] Adding a boolean struct member for string_script_t return values, set to true if the string is ASCII (no transliteration needed, should be frequent for English addresses) 2015-06-28 19:37:58 -04:00
Al
f0bf7e750c [transliteration] Fixing edge case in transliteration where a naked character fails context matching but the set-wrapped version matches 2015-06-28 15:19:19 -04:00
Al
a5dacf3d2b [utils] Adding method to get a particular token alternative from a string tree 2015-06-28 15:15:29 -04:00
Al
246237c1f1 [transliteration] Adding a get_transliteration_table() to foreach_transliterator macro since it lives in the header 2015-06-28 15:14:49 -04:00
Al
0f3bcaf49c [dictionaries] Flatter hierarchy for dictionaries 2015-06-26 13:14:14 -04:00
Al
7c161ee5b6 [numex] Regenerating numex data file 2015-06-26 12:36:40 -04:00
Al
d21f8135f3 [numex] Adding full stop ordinal indicators to German, Danish and Polish 2015-06-26 12:35:53 -04:00
Al
6a8ab48662 [numex] Adding method to get ordinal suffixes, using single representation 2015-06-25 17:28:06 -04:00
Al
9337bf9aea [phrases] trie_search_suffixes uses the NUL-byte prefix by default but the _from_index version can start from another node. fixing single character suffixes 2015-06-25 17:24:19 -04:00
Al
82e85732c4 [fix] Setting codepoint in utf8proc_iterate_reversed 2015-06-25 17:20:55 -04:00
Al
4fbcb72368 [fix] utf8proc option 2015-06-25 10:07:37 -04:00
Al
c376bcef3d [utils] get_string_script returns a struct rather than modifying a pointer for the length 2015-06-25 10:06:38 -04:00
Al
bcee9832b3 [utils] cstring_array_get_token=>cstring_array_get_string 2015-06-25 10:05:35 -04:00
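
Note on a08d59c277 (NFD vs. NFKD): the commit message says NFKD "does some unwanted things like converting superscripts". The sketch below illustrates that difference using Python's standard unicodedata module rather than the repository's own C code in normalize.c. The helper names nfd and nfkd and the sample string are assumptions made up for this illustration, not identifiers from the project.

```python
import unicodedata


def nfd(text: str) -> str:
    """Canonical decomposition: splits accented letters into base + combining mark."""
    return unicodedata.normalize("NFD", text)


def nfkd(text: str) -> str:
    """Compatibility decomposition: also rewrites characters such as superscripts."""
    return unicodedata.normalize("NFKD", text)


# Hypothetical sample: an accented letter plus a superscript two (U+00B2).
sample = "Caf\u00e9 10\u00b2"

# NFD keeps the superscript and only decomposes the accented letter.
print([hex(ord(ch)) for ch in nfd(sample)])

# NFKD additionally turns the superscript two into a plain ASCII digit 2.
print([hex(ord(ch)) for ch in nfkd(sample)])
```

Under NFD the superscript survives and only the accented letter is split, so the normalized name stays closer to the original string, which matches the rationale given in the commit message.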