libpostal

Author	SHA1	Message	Date
Al	71ffdf9cbc	[expansion] tokenized version of search_address_dictionaries	2015-07-25 13:50:53 -04:00
Al	ee96dab93c	[fix] unnecessary headers	2015-07-25 13:49:42 -04:00
Al	e549e76806	[utils] string_tree_iterator_foreach_token	2015-07-25 13:49:02 -04:00
Al	2adaf475c2	[utils] cstring_array (contiguous) to array of malloc'd strings	2015-07-25 12:14:01 -04:00
Al	e9277d7339	[utils] vector extend method	2015-07-25 01:33:45 -04:00
Al	9fb1eae877	[expansion] Regenerating address data file	2015-07-24 16:09:22 -04:00
Al	351c7c8c2e	[expansion] Add concatenated suffixes to the suffix keyspace of the address dictionary trie and concatenated prefixes and elisions to the prefix keyspace	2015-07-24 16:02:47 -04:00
Al	90a91cadd0	[search] Modifying trie_search_prefixes to use the new key schema	2015-07-24 15:59:49 -04:00
Al	bb7688d8d1	[phrases] trie_add_prefix method and a schema for prefix keys, e.g. elisions in French and Italian, separable prefixes like Hinter in German, etc.	2015-07-24 15:56:19 -04:00
Al	359cd62e20	[numex] Adding a replace_numeric_expressions method (returns NULL if no replacements were made), fixing lengths in situations where two unrelated numbers are joined by a stopword e.g. in the phrase "one and one" the "and" acts as a delimiter vs a phrase where the stopword acts as a joiner like "one hundred and twenty"	2015-07-24 15:31:05 -04:00
Al	12959aa483	[numex] Re-generating numex data	2015-07-24 15:24:03 -04:00
Al	5239c365d0	[docs] Adding some documentation for normalize.h options	2015-07-24 15:23:25 -04:00
Al	96538469dd	[utils] Adding a cstring_array_foreach macro	2015-07-23 15:57:12 -04:00
Al	27af28eacf	[expansion] Changes to address_expansion struct to allow for multiple dictionaries per record. Only adding unique canonical strings to the string array	2015-07-22 20:35:29 -04:00
Al	454be89121	[expansion] generated header and data files	2015-07-22 20:31:54 -04:00
Al	0a9e92f11f	[expansion] Adding both key (for membership tests) and language-prefixed key to address dictionary	2015-07-22 17:21:09 -04:00
Al	09004aa5f1	[expansion] Constant for the "all" dictionary	2015-07-22 17:18:19 -04:00
Al	f61d993157	[expansion] removing the self param from address_dictionary methods, adding search_address_dictionaries method which searches a string for phrases in a particular language	2015-07-22 03:51:28 -04:00
Al	3da4b5d8c2	[numex] New numex generated data file	2015-07-22 02:24:16 -04:00
Al	ba8ff2b0c6	[expansion] Language prefixed keys	2015-07-22 02:16:22 -04:00
Al	157727d249	[fix] method name, strlen and fclose	2015-07-22 02:15:45 -04:00
Al	a38b924c5d	[fix] add_token_alternatives	2015-07-21 17:26:59 -04:00
Al	71be52275d	[tokenization] Adding a version of tokenize which keeps whitespace tokens	2015-07-21 17:26:20 -04:00
Al	5d21cb1604	[expansion] Address dictionary builder	2015-07-21 16:46:57 -04:00
Al	6eccde0df8	[fix] trie_set_data_at_index	2015-07-21 16:46:38 -04:00
Al	c798876b3d	[expansion] Address dictionary allocation, I/O, get/set	2015-07-21 16:46:15 -04:00
Al	3509b203f8	[gazetteers] Moving data out of the header file	2015-07-21 16:06:49 -04:00
Al	179918917a	[fix] header guard and include	2015-07-21 15:38:45 -04:00
Al	f99a90d64e	[expansion] Generated data file for address expansions	2015-07-21 15:38:10 -04:00
Al	68a6d8ee33	[fix] return NULL from transliterator_read on failure	2015-07-21 00:58:01 -04:00
Al	9360ff2c4b	[geodb] geodb_builder using new trie_get/set_data_at_index methods	2015-07-20 16:53:48 -04:00
Al	9374745140	[fix] var name and placement	2015-07-20 16:53:19 -04:00
Al	9f697e0256	[transliteration] transliterate now using the new trie_get_data_at_index API	2015-07-20 16:47:56 -04:00
Al	7f96726e82	[phrases] Adding trie_get_data/trie_set_data + at_index methods	2015-07-20 16:39:58 -04:00
Al	b9771921fc	[fix] Path joins in geodb_builder use new char_array methods	2015-07-20 16:31:43 -04:00
Al	d55d505329	[phrases] trie_get_data and trie_set_data interface for simpler dictionary-style trie get/set	2015-07-20 16:29:48 -04:00
Al	1d7247d7e1	[polygons] Adding Belgium regional languages	2015-07-17 00:53:25 -04:00
Al	5f2be3022b	[expansion] dictionary_type_t enum instead of uint64_t	2015-07-16 03:49:37 -04:00
Al	f713c53993	[utils] Adding an option to char_array_add_joined to strip separators for path manipulation	2015-07-16 03:49:00 -04:00
Al	f181c04e7a	[expansion] expansion rule structs and Python script to generate rules from dictionaries tree. Note that a canonical_index of -1 indicates that a given phrase is the canonical (saves space)	2015-07-16 02:49:53 -04:00
Al	a8b2fb5b90	[tokenization] Regenerating scanner file	2015-07-14 18:16:24 -04:00
Al	43293d0ae3	[tokenization] Fixing a tokenization where mid-number characters appear in the middle of a word+numeric sequence e.g. Zigor,2 should be 3 separate tokens. Sequences like 35,37,39 are still treated as a single token for the moment.	2015-07-14 18:15:58 -04:00
Al	a9967ec9bd	[numex] Regenerating numex file	2015-07-13 01:16:39 -04:00
Al	86fe289320	[numex] Re-generated numex data file	2015-07-13 00:56:48 -04:00
Al	fbef0a15fe	[geodb] Adding sparkey dependency	2015-07-09 15:26:11 -04:00
Al	4f1b4756d0	[geodb] Adding builder program (requires 11GB disk space and ~4GB RAM to build, but only ~300MB RAM to use after building)	2015-07-09 15:25:29 -04:00
Al	8889a5c0c3	[geodb] GeoDB memory allocation and I/O	2015-07-09 15:01:06 -04:00
Al	2d5641892a	[config] lower Bloom filter error rate	2015-07-09 14:59:23 -04:00
Al	20c6436e6d	[geodisambig] Return success if admin1/admin2 IDs are 0	2015-07-09 04:19:49 -04:00
Al	20303ad94f	[geohash] Adding bounds checks from python-geohash	2015-07-09 04:13:53 -04:00


285 Commits