Commit Graph

  • bbaa302e2e [fix] NUMEX_STOPWORD_RULE define Al 2015-08-09 01:03:23 -04:00
  • 5383640c14 [fix] cast Al 2015-08-09 01:01:11 -04:00
  • dd391eabe5 [numex] Separating rules from keys for Linux gcc compilation Al 2015-08-09 01:00:57 -04:00
  • e346b831cb [build] public-read permissions when uploading to S3 Al 2015-08-09 00:17:04 -04:00
  • ad584671c4 [build] Not compiling with -Werror for now Al 2015-08-09 00:02:41 -04:00
  • f170f70727 [build] Link to math library Al 2015-08-09 00:01:44 -04:00
  • 423e2c86c7 [build] builder programs are now in noinst_PROGRAMS, Makefile target to upload data tarball to S3 (with proper credentials) Al 2015-08-08 23:29:34 -04:00
  • a5ce1f12dd [fix] stdint header in address expansion rule generation script Al 2015-08-08 23:28:11 -04:00
  • ee982cd872 [dictionaries] Removing dictionaries/all/personal_suffixes, can add to languages as needed Al 2015-08-08 23:13:02 -04:00
  • 5acf7a4f3e [phrases] resetting node position when continuation falls off the trie Al 2015-08-08 22:17:58 -04:00
  • a77c8e1321 [build] Adding bootstrap.sh script and removing configure from version control Al 2015-08-08 21:22:11 -04:00
  • cd0f95f9e2 [fix] making transliteration path relative to data dir Al 2015-08-08 21:05:10 -04:00
  • 2ba0e814ad [build] better autoconf checks for time and dirent headers Al 2015-08-08 21:01:51 -04:00
  • d0679450e3 [config] Including Autoconf config.h in internal config Al 2015-08-08 20:50:23 -04:00
  • 5df9e123af [numex] Fix to whole_tokens_only numeric expression parsing where numex was pushing a number onto the stack even on encountering a new rule context even though the token was not completely parsed Al 2015-08-08 20:49:54 -04:00
  • 53f54d6454 [fix] removing comment Al 2015-08-08 20:23:49 -04:00
  • 2106a6cfe4 [build] Adding command-line test and bench programs Al 2015-08-08 19:44:50 -04:00
  • 5aa2e99b92 [fix] data dir for tar extraction Al 2015-08-08 19:42:37 -04:00
  • 54aa6fe7df [build] Fixing runtime check/save of last updated file for package data tarball Al 2015-08-08 17:15:57 -04:00
  • f38a53601b [rm] Better not to keep that file in the repo Al 2015-08-08 02:41:54 -04:00
  • 770f44198c [build] Adding default file to track last updated date Al 2015-08-08 02:30:42 -04:00
  • c0c21b81f2 [build] Adding generated configure script Al 2015-08-07 17:35:40 -04:00
  • a197d04b1a [fix] float comparison Al 2015-08-07 17:28:15 -04:00
  • f161f68d53 [build] Changes to Makefile.am to build on Debian/Ubuntu, fixing downloading of the data tarball for Mac and Linux Al 2015-08-07 17:27:34 -04:00
  • 9b69d1f67a [fix] Removing C++ checks from all but the main API functions Al 2015-08-07 16:30:07 -04:00
  • 359a1efb03 [fix] Adding stdint.h include to most of the header files for portability Al 2015-08-07 02:43:27 -04:00
  • 0738a57caa [fix] restoring ctype.h include Al 2015-08-07 01:52:01 -04:00
  • 06d2e916a1 [fix] includes, matters on GCC/Linux Al 2015-08-07 01:51:34 -04:00
  • ae9825b9f9 [build] Fixing data dir download in Automake file Al 2015-08-07 01:51:06 -04:00
  • d7ebcd046e [fix] includes Al 2015-08-07 01:00:26 -04:00
  • f246c2ee95 [api] Adding address component constants to libpostal.h, returning char ** instead of a cstring_array to simplify API/dependencies Al 2015-08-06 17:52:54 -04:00
  • 61d586fa1d [config] config.h=>libpostal_config.h so as not to conflict with autoconf Al 2015-08-06 17:49:35 -04:00
  • 2bedb695a2 [build] adding Automake file in src, including rule to download data dir tarball Al 2015-08-06 17:48:37 -04:00
  • 4b9f11eca5 [build] Main Automake file and modified version of Sparkey's Automake file Al 2015-08-06 02:14:33 -04:00
  • fe078cff66 [build] Adding Autoconf file Al 2015-08-06 02:13:43 -04:00
  • 1d39916aaa [fix] Fixing warnings in unicode script data Al 2015-08-02 21:30:54 -06:00
  • 770ce4256f [expansion] Re-generating address expansion data file Al 2015-08-02 21:30:19 -06:00
  • 90cde298dd [dictionaries] condensed forms of sin numero in various languages Al 2015-08-02 21:19:55 -06:00
  • 753c6efb1d [api] Initial libpostal API, combining string normalization, transliteration, numex and address dictionaries Al 2015-08-02 14:38:10 -06:00
  • b27030e39f [fix] tokenized trie search was skipping tokens in some cases Al 2015-08-02 14:36:21 -06:00
  • 3178eda501 [utils] string_contains_hyphen method Al 2015-08-02 14:35:18 -06:00
  • 46141a6c36 [normalize] Adding an option when normalizing tokens to split tokens of the form [\w]+[\.\-]?[\d]+ for cases like I35, CR123, R-66, RN.7, etc. where the alpha component is an expansion Al 2015-08-02 14:34:32 -06:00
  • f10dd49c58 [expansion] NULL_CANONICAL_INDEX constant Al 2015-08-01 23:59:16 -06:00
  • 6bf563ca89 [dictionaries] Italian abbreviations for strada Al 2015-07-28 19:15:30 -04:00
  • fe4789a665 [fix] compiler warnings Al 2015-07-28 19:14:00 -04:00
  • 551904d202 [normalize] cstring_array instead of string_tree for token-based normalization Al 2015-07-28 19:09:50 -04:00
  • 90d4da9e72 [geodb] Adding an is_canonical bit field to geodb trie values Al 2015-07-28 19:08:24 -04:00
  • 9bc902f575 [numex] LATIN_LANGUAGE_CODE constant for Roman numeral normalization Al 2015-07-28 18:12:09 -04:00
  • df1410da8c [numex] Fixing numex parsing for lone stopwords and certain prefix matches that were getting mistakenly converted e.g. settembre => 7mbre Al 2015-07-28 18:11:19 -04:00
  • a16f0dabcb [numex] Fixing hyphen-initial numeric phrases that end the string Al 2015-07-28 03:28:44 -04:00
  • 3dc6115a4e [dictionaries] Updates to English and Spanish dictionaries on looking through a data set of real test addresses Al 2015-07-27 16:42:06 -04:00
  • 0f5b69c06b [fix] transition to SEARCH_STATE_NO_MATCH in trie_search_tokens_from_index on a return to the start node Al 2015-07-27 16:35:18 -04:00
  • 243f327928 [fix] NULL check Al 2015-07-27 16:31:55 -04:00
  • 7aee159c0c [utils] string_tree_num_tokens Al 2015-07-27 12:36:34 -04:00
  • b812d90c59 [fix] specifying numex dir with cross-platform PATH_SEPARATOR Al 2015-07-27 12:36:06 -04:00
  • 7ff9a6054d [geodb] trim strings in geodb builder Al 2015-07-27 02:37:20 -04:00
  • 053b987d58 [normalize] adding an option for string trimming in normalize Al 2015-07-27 01:59:14 -04:00
  • b94526a27b [utils] Making string_trim handle all kinds of UTF-8 whitespace/separators Al 2015-07-27 01:55:46 -04:00
  • eab4c554d6 [numex] Regenerating numex data file Al 2015-07-27 01:53:13 -04:00
  • 0ab1434f20 [numex] Making all languages except the ideographic writing systems (CJK) whole_tokens_only for numex. Otherwise non-number prefixes may accidentally get converted into numbers. May add some more options around this in the future. Al 2015-07-27 01:52:41 -04:00
  • d2539f5b57 [numex] Fixing case of hyphen/space-initial phrases in numex, as well as whole token only languages with ordinals Al 2015-07-27 01:44:33 -04:00
  • 8ff4ace63b [phrases] Allowing trie_search to process tokenized input with or without whitespace, and to handle ideographic characters correctly Al 2015-07-26 23:41:57 -04:00
  • 38b10b9dd0 [fix] Clearing paths before reuse in geodb_builder Al 2015-07-26 23:36:34 -04:00
  • 93042761ac [fix] warnings in string_utils.c Al 2015-07-26 23:36:03 -04:00
  • 50ee95ff7d [geodb] Adding a msgpack'd list of ids for naked string keys in geodb builder Al 2015-07-25 18:42:13 -04:00
  • a67ec44a08 [utils] cstring_array_terminate, moving msgpack_utils to separate file Al 2015-07-25 18:39:57 -04:00
  • 42f6be7434 [fix] county road Al 2015-07-25 14:19:38 -04:00
  • 2ff8c0fd1e [transliteration] fixing length-based transliteration Al 2015-07-25 13:53:28 -04:00
  • 71ffdf9cbc [expansion] tokenized version of search_address_dictionaries Al 2015-07-25 13:50:53 -04:00
  • ee96dab93c [fix] unnecessary headers Al 2015-07-25 13:49:42 -04:00
  • e549e76806 [utils] string_tree_iterator_foreach_token Al 2015-07-25 13:49:02 -04:00
  • 2adaf475c2 [utils] cstring_array (contiguous) to array of malloc'd strings Al 2015-07-25 12:13:50 -04:00
  • e9277d7339 [utils] vector extend method Al 2015-07-25 01:33:45 -04:00
  • cdb9afddd3 [fix] address training data carriage returns Al 2015-07-25 00:35:27 -04:00
  • 9fb1eae877 [expansion] Regenerating address data file Al 2015-07-24 16:09:22 -04:00
  • cff72a0cb3 [dictionaries] Adding a few versions of the phrase "centro commercial" in French, Spanish and Italian after a review of addresses in those languages Al 2015-07-24 16:04:43 -04:00
  • 351c7c8c2e [expansion] Add concatenated suffixes to the suffix keyspace of the address dictionary trie and concatenated prefixes and elisions to the prefix keyspace Al 2015-07-24 16:02:47 -04:00
  • 90a91cadd0 [search] Modifying trie_search_prefixes to use the new key schema Al 2015-07-24 15:59:49 -04:00
  • bb7688d8d1 [phrases] trie_add_prefix method and a schema for prefix keys, e.g. elisions in French and Italian, separable prefixes like Hinter in German, etc. Al 2015-07-24 15:56:04 -04:00
  • 359cd62e20 [numex] Adding a replace_numeric_expressions method (returns NULL if no replacements were made), fixing lengths in situations where two unrelated numbers are joined by a stopword e.g. in the phrase "one and one" the "and" acts as a delimiter vs a phrase where the stopword acts as a joiner like "one hundred and twenty" Al 2015-07-24 15:30:58 -04:00
  • 12959aa483 [numex] Re-generating numex data Al 2015-07-24 15:24:03 -04:00
  • 5239c365d0 [docs] Adding some documentation for normalize.h options Al 2015-07-24 15:23:18 -04:00
  • caf714f06f [fix] typo and frivolous key Al 2015-07-24 15:22:57 -04:00
  • 87566bb6a5 [numex] Adding validation checks for numex JSON Al 2015-07-24 15:21:52 -04:00
  • 96538469dd [utils] Adding a cstring_array_foreach macro Al 2015-07-23 15:57:12 -04:00
  • 27af28eacf [expansion] Changes to address_expansion struct to allow for multiple dictionaries per record. Only adding unique canonical strings to the string array Al 2015-07-22 20:35:29 -04:00
  • 454be89121 [expansion] generated header and data files Al 2015-07-22 20:31:54 -04:00
  • b27af13f8a [expansion] Adding an array of dictionaries to each (phrase, canonical) pair Al 2015-07-22 20:24:14 -04:00
  • 0a9e92f11f [expansion] Adding both key (for membership tests) and language-prefixed key to address dictionary Al 2015-07-22 17:21:09 -04:00
  • 09004aa5f1 [expansion] Constant for the "all" dictionary Al 2015-07-22 17:17:55 -04:00
  • f61d993157 [expansion] removing the self param from address_dictionary methods, adding search_address_dictionaries method which searches a string for phrases in a particular language Al 2015-07-22 03:51:14 -04:00
  • 3da4b5d8c2 [numex] New numex generated data file Al 2015-07-22 02:24:16 -04:00
  • ba8ff2b0c6 [expansion] Language prefixed keys Al 2015-07-22 02:16:22 -04:00
  • 157727d249 [fix] method name, strlen and fclose Al 2015-07-22 02:15:45 -04:00
  • 64a63fdf51 [mv] Moving all repo data files to a resources dir, data is only for runtime files Al 2015-07-21 18:10:39 -04:00
  • a38b924c5d [fix] add_token_alternatives Al 2015-07-21 17:26:59 -04:00
  • 71be52275d [tokenization] Adding a version of tokenize which keeps whitespace tokens Al 2015-07-21 17:26:20 -04:00
  • 5d21cb1604 [expansion] Address dictionary builder Al 2015-07-21 16:46:57 -04:00
  • 6eccde0df8 [fix] trie_set_data_at_index Al 2015-07-21 16:46:38 -04:00
  • c798876b3d [expansion] Address dictionary allocation, I/O, get/set Al 2015-07-21 16:46:15 -04:00