Commit Graph

  • 2940cc15b8 [fix] tokenized string destroy frees original string Al 2015-09-19 01:40:41 -04:00
  • 2b13871341 [constants] max country code length Al 2015-09-19 01:39:58 -04:00
  • 0396823772 [fix] geodb path separator Al 2015-09-19 01:39:31 -04:00
  • 17cfdb0625 [fix] adding char_array_append_* methods to header Al 2015-09-18 13:19:42 -04:00
  • f2f7db92ff [fix] phrases Al 2015-09-18 13:19:18 -04:00
  • b74e92adad [fix] include Al 2015-09-18 13:18:49 -04:00
  • 2a869894d9 [fix] geodb Al 2015-09-18 13:18:26 -04:00
  • 9e9131bda0 [parser] Averaged perceptron tagger Al 2015-09-17 05:51:24 -04:00
  • 8a86f7ec64 [parser] Adding context struct to feature function Al 2015-09-17 05:48:00 -04:00
  • 87ed7d9a0f [geodb] Adding trie search methods for finding geodb phrases Al 2015-09-16 22:11:10 -04:00
  • e62c75b9c6 [phrases] Adding _with_phrases versions of address dictionary methods for pre-allocated phrases Al 2015-09-16 21:24:23 -04:00
  • 23103a21d4 [phrases] Adding with_phrases versions of trie search methods for pre-allocated phrases Al 2015-09-16 21:23:34 -04:00
  • d5ec005787 [transliteration] Similar init method for transliteration Al 2015-09-16 21:14:02 -04:00
  • b11362ab98 [numex] using module init method for building, otherwise passing NULL path uses the default path Al 2015-09-16 21:13:05 -04:00
  • 3cba2e8df3 [api] Using default setup methods for submodules in libpostal setup Al 2015-09-15 14:01:33 -04:00
  • e122824448 [expansion] Adding the ability to search address dictionary phrases with a NULL language, will return phrases in any language Al 2015-09-15 13:24:53 -04:00
  • c47ff1b113 [utils] Adding source string to tokenized_string struct Al 2015-09-15 13:21:51 -04:00
  • b2f690b6f6 [api] Error logging if modules can't be found Al 2015-09-15 13:21:11 -04:00
  • 9de3029dd3 [parser] Averaged perceptron training does full examples (greedily). During training, features are a hashtable, sorted and converted to a trie during finalize Al 2015-09-14 17:38:45 -04:00
  • a5b5f80b04 [fix] new_copy Al 2015-09-14 16:50:23 -04:00
  • 3ea6358f77 [fix] vector zeros allocation Al 2015-09-14 16:50:08 -04:00
  • c21f61b9b4 [parser] Default address parser path Al 2015-09-11 15:05:07 -07:00
  • 32c180528f [tokens] Adding a copy_tokens option for tokenized_string Al 2015-09-11 15:03:09 -07:00
  • 9ce658b7a3 [collections] Adding string_array for an array of char pointers Al 2015-09-10 16:34:16 -07:00
  • 35b9122a1a [utils] inlining a few functions Al 2015-09-10 16:33:54 -07:00
  • 35f1c02caf [polygons] Reducing simplify tolerance for language polys now that regional languages are handled separately Al 2015-09-10 12:44:13 -07:00
  • 440a8158b6 [polygons] Adding in country languages for regional polygons without a default language Al 2015-09-10 12:34:26 -07:00
  • 22c16b43cf [languages] Italian is also the regional default in Valle D'Aosta and Trentino-Alto Adige Al 2015-09-10 11:09:13 -07:00
  • fca7f21b1d [polygons] Making simplify_tolerance and preserve_topology for polygon simplification configurable per class Al 2015-09-10 11:06:18 -07:00
  • 6a5b01b51b [parser] Averaged perceptron training Al 2015-09-10 10:25:52 -07:00
  • 0ddf50cb5f [utils] add to feature array with printf syntax Al 2015-09-10 10:24:51 -07:00
  • b3f89a207a [utils] Version of string_split for single character delimiters which modifies the input string directly rather than creating (essentially) a copy Al 2015-09-09 17:41:23 -07:00
  • c1da2fa94b [dictionaries] Adding 'Rang' to French dictionaries Al 2015-09-09 17:21:26 -07:00
  • b85fe50fad [osm] Training data for toponyms only cares about valid languages for name field Al 2015-09-08 16:38:05 -07:00
  • 607a607b71 [doc] documentation fix for averaged perceptron Al 2015-09-08 16:37:23 -07:00
  • c80d8b8067 [parsing] Averaged perceptron model data structure for storing the finalized, averaged, sparse weights Al 2015-09-08 12:02:15 -07:00
  • 8d642b45b9 [fix] trie was returning early on add_at_index and not incrementing the num_keys Al 2015-09-08 11:41:46 -07:00
  • e566063343 [osm] Doing an all-to-nodes conversion and an additional filter on the borders data set Al 2015-09-08 09:18:05 -07:00
  • ae7e30634b [features] Adding counter/bag-of-words representation of features Al 2015-09-08 00:17:26 -07:00
  • 49d389b9d8 [refactor] changing names in int-valued hash tables Al 2015-09-08 00:15:14 -07:00
  • 2fffd76af8 [fix] typo Al 2015-09-07 23:58:34 -07:00
  • aa454c4430 [fix] removing char_array_copy from header Al 2015-09-07 23:58:05 -07:00
  • 3fd6552b44 [fix] void not void * in vector *_copy Al 2015-09-07 23:57:44 -07:00
  • cddffdb65f [math] Adding column and row sums to sparse matrices Al 2015-09-07 00:34:00 -07:00
  • 8525529968 [osm] Not requiring qualified name tags to process OSM toponyms Al 2015-09-06 21:03:01 -07:00
  • 9d2ca08fc2 [utils] Adding _copy and _new_copy methods to vectors (the former copies data to a pre-allocated vector, the latter allocates a new vector) Al 2015-09-06 20:46:29 -07:00
  • 49fe504201 [math] Matrix get value at row, column index Al 2015-09-06 12:37:10 -07:00
  • ec3ab7234a [utils] Adding index to cstring_array_foreach, similar to Python's enumerate Al 2015-09-04 19:34:00 -04:00
  • df20e2cbc0 [osm] Including toponyms in the training data for countries where the unqualified place names can be assumed to be examples of a given language Al 2015-09-04 14:13:26 -04:00
  • 17fcfa8b59 [fix] adding house to ignore keys rather than aliasing it Al 2015-09-04 12:40:08 -04:00
  • d64a27bc57 [osm] Converting relations to nodes in borders training data Al 2015-09-04 12:32:25 -04:00
  • 168b7f59da [fix] default indices in strip_component Al 2015-09-04 12:29:47 -04:00
  • 64db63e3eb [osm] Removing house tag Al 2015-09-04 12:23:47 -04:00
  • 6a20ce5e85 [language_id] Adding formatted addresses and toponyms to language training data Al 2015-09-04 01:46:49 -04:00
  • 4ebdca0ea7 [fix] var Al 2015-09-03 21:01:20 -04:00
  • 8345afbcd0 [fix] exclude country toponyms where the default language is well represented Al 2015-09-03 20:56:58 -04:00
  • 20bb191624 [fix] chaining Al 2015-09-03 20:52:00 -04:00
  • e7cf5000fe [fix] Exclude polygons with > 1 regional language Al 2015-09-03 20:48:04 -04:00
  • 9a9530c1b9 [fix] unqualified names Al 2015-09-03 20:37:19 -04:00
  • a5fdd911d8 [fix] only use name key for default names Al 2015-09-03 20:35:08 -04:00
  • d8e1432533 [osm] Adding unqualified names in single-language countries Al 2015-09-03 20:31:49 -04:00
  • d13d4d7d28 [dictionaries] Adding English gazetteers as non-default to Georgia Al 2015-09-03 20:25:38 -04:00
  • b15d2d70aa [fix] top language Al 2015-09-03 20:09:46 -04:00
  • 44bf94a158 [osm] Better borders training data set (only need the metadata, not the polygons) Al 2015-09-03 20:09:03 -04:00
  • 55af9b0a0c [fix] OSM address tagged training data formatting Al 2015-09-03 18:35:19 -04:00
  • c6bfc0e021 [osm] Postponing punctuation stripping until after address template rendering Al 2015-09-03 18:13:37 -04:00
  • d54fb25e45 [osm] don't bother with the R-tree check if there are no name:* tags in border data set Al 2015-09-03 17:54:35 -04:00
  • 33af61095b [fix] var Al 2015-09-03 17:49:52 -04:00
  • 294101ad80 [osm] Treating components that are all punctuation as blank in address parsing (e.g. a single comma) Al 2015-09-03 17:46:57 -04:00
  • e1e5c16637 [osm] Not adding unqualified name tags to toponym data set, throwing out a few cases of language ambiguity Al 2015-09-03 16:50:30 -04:00
  • 040a26a6f2 [fix] import Al 2015-09-03 13:54:23 -04:00
  • 7787427c58 [fix] typo Al 2015-09-03 13:53:18 -04:00
  • 23633e95dd [osm] Only adding country default language toponyms to training data Al 2015-09-03 13:44:41 -04:00
  • 11c01f64d2 [osm] OrderedDict of attrs in OSM training data Al 2015-09-03 11:11:18 -04:00
  • 27eb4e4aed [osm] Adding a toponym language training set using planet-borders.osm (all admin borders) Al 2015-09-03 10:19:11 -04:00
  • db57855c95 [osm] Switching formatter repo to the OpenVenues fork, with fixes and several dozen new countries added Al 2015-09-03 10:06:54 -04:00
  • a916668f28 [i18n] Local file for ISO 15924 Al 2015-09-01 23:58:36 -04:00
  • ee4d73c65d [math] sparse matrix I/O methods Al 2015-09-01 00:29:11 -04:00
  • a8f6617294 [phrases] Adding num_keys attribute to trie Al 2015-08-31 21:41:34 -04:00
  • aac5b37e76 [fix] Removing default dirent include Al 2015-08-31 21:38:29 -04:00
  • bb50c7ea2c [math] Adding sigmoid and softmax functions Al 2015-08-31 20:27:58 -04:00
  • a090a22bca [math] Adding compressed sparse row (CSR) format sparse matrix, designed for dynamic construction, just the methods needed for logistic regression for now i.e. no sparse dot products Al 2015-08-31 15:52:28 -04:00
  • 0f617454d3 [math] Dense matrices Al 2015-08-31 14:57:11 -04:00
  • 0ee72b8dfb [math] can only use memset for *_array_new_zeros Al 2015-08-31 14:44:43 -04:00
  • c566eaecf1 [dictionaries] Rebuilding address expansion data and uploading new files to S3 Al 2015-08-31 14:33:28 -04:00
  • 789150ae33 [math] Using regular C arrays instead of vectors for vector_math.h Al 2015-08-30 02:41:31 -04:00
  • 07b0bed602 [math] Only float vectors have *_array_log, *_array_exp, etc. Al 2015-08-26 17:58:01 -04:00
  • a2ec8001b0 [osm] Removing postal code keys in formatted language training data Al 2015-08-24 14:08:36 -04:00
  • 8bbcb60aee [languages] Moving search_suffix and search_prefix into methods Al 2015-08-24 14:04:28 -04:00
  • c68f56e61d [fix] paths Al 2015-08-24 12:58:27 -04:00
  • d620cb6fc3 [fix] Calculating splits in Python rather than bash Al 2015-08-24 12:47:51 -04:00
  • c754d275af [fix] str Al 2015-08-24 12:24:55 -04:00
  • 96cb289b79 [languages] Script to create language training/cross-validation/test data splits Al 2015-08-24 12:18:23 -04:00
  • fa7b855ecb [languages] Earlier exit on finding ambiguous script spans Al 2015-08-24 03:07:45 -04:00
  • 90f333b16c [languages] Adding English non-default dictionaries to a number of countries where English can be found in OSM Al 2015-08-24 02:49:49 -04:00
  • e1d336716c [languages] Non-default language canonicals, more test cases Al 2015-08-24 02:21:53 -04:00
  • c1ce91abbf [languages] Better handling of non-default language canonicals in default language text Al 2015-08-24 01:26:01 -04:00
  • 96d7b990b5 [fix] .items() Al 2015-08-23 23:39:30 -04:00
  • 9f6f4feea1 [dictionaries/languages] Adding English gazetteers for Bahrain, pas abbreviation for paseo Al 2015-08-23 23:32:34 -04:00
  • 84e0982cbc [languages] Allow stopwords to help disambiguate if they can, otherwise ignore them Al 2015-08-23 23:04:17 -04:00