Commit Graph

  • 094a5bf5f4 [dictionaries] adding Jnr and Snr forms for generational suffixes Al 2015-10-28 00:00:11 -04:00
  • c2d112f4fc [fix] compile flags in Makefile.am Al 2015-10-27 19:01:37 -04:00
  • 9a92a1154d [python] Making normalized_tokens return token classes as well, mimicking the tokenize API Al 2015-10-27 14:13:49 -04:00
  • 7f5f056105 [python] don't need -O0 any more for normalization extension Al 2015-10-27 13:34:00 -04:00
  • 9f6e1387a0 [fix] Error condition in Python tokenize Al 2015-10-27 13:33:28 -04:00
  • 6aaa08c220 [fix] Usage on libpostal_data script Al 2015-10-27 13:33:03 -04:00
  • 40918812e2 [normalize] Adding hyphen elimination as a string option (changes tokenization) Al 2015-10-27 13:32:47 -04:00
  • 3fe2365234 [fix] signed size_t in trie_set_tail Al 2015-10-27 13:21:26 -04:00
  • ad59ba7a7b [fix] Re-generating transliteration tables Al 2015-10-27 12:28:08 -04:00
  • 7f5cf89e84 [transliteration] Not escaping right side transliteration rules Al 2015-10-27 12:24:38 -04:00
  • 1a1d74785c [fix] Compiler warnings for casts/printf Al 2015-10-26 18:52:18 -04:00
  • 6b456025b4 [fix] warnings in klib/ksort.h Al 2015-10-26 18:50:22 -04:00
  • 3b3513ffe3 [fix] warnings in collections.h/vector_math.h Al 2015-10-26 18:49:58 -04:00
  • 83c6a87ab1 [build] substitution for use of LIBPOSTAL_DATA_DIR in Makefile.am Al 2015-10-26 18:35:23 -04:00
  • f6b6a17335 [python/normalization] Adding Python bindings to the normalize module for use in OSM polygon matching Al 2015-10-26 18:07:37 -04:00
  • a319c1f6a0 [build] defining LIBPOSTAL_DATA_DIR in Autoconf rather than Automake, becomes part of config.h Al 2015-10-26 18:06:01 -04:00
  • 8a188903b3 [python] Using tuples in pytokenize instead of list, pre-allocating Al 2015-10-26 18:04:13 -04:00
  • 309d41a652 [math] adding matrix_zero method Al 2015-10-25 21:38:59 -04:00
  • e3b80534ba [polygons] Adding methods for fixing polygons in base RTreePolygonIndex, moving the current polygon index to an instance variable and adding ability to import from a general GeoJSON-like structure instead of just shapefiles Al 2015-10-25 18:36:12 -04:00
  • 4f784060a3 [python] Adding word_token_types Al 2015-10-25 18:33:09 -04:00
  • 2d4b3a6e2f [parser/formatting] Appending suburb between the road line and any subsequent lines for all bottom-up address formats. Effectively inserts neighborhoods into our version without making the OpenCage formats overly verbose. Also fixing post-format replace with group capture Al 2015-10-24 01:29:21 -04:00
  • da53d7ebac [osm] Adding an OSM neighborhoods/suburbs data set for matching with Quattroshapes boundaries, updating definitions for admin boundaries Al 2015-10-22 11:37:06 -04:00
  • 6478e65a06 [osm] Moving Wikipedia title normalization to osm.extract Al 2015-10-22 11:35:38 -04:00
  • ff3a3c2201 [fix] disambiguation tokenizer to pypostal Al 2015-10-21 16:35:55 -04:00
  • 6f6d04966b [fix] role in OSM polygon extraction Al 2015-10-21 16:35:25 -04:00
  • 336bfe32ca [osm/formatter] Switching back to OpenCageData repo Al 2015-10-21 16:34:24 -04:00
  • d0aa3b9109 [polygons/osm] alt_name Al 2015-10-20 09:27:55 -04:00
  • 218eae548c [fix] logger Al 2015-10-20 06:43:51 -04:00
  • 1e8e592e0b [fix] import Al 2015-10-19 23:30:12 -04:00
  • e5129957f8 [osm/polygons] Add relation id to OSM reverse geocoder Al 2015-10-19 18:00:45 -04:00
  • 5187e6073a [fix] admin boundary imports Al 2015-10-19 17:14:48 -04:00
  • 0d213e426a [fix] logger Al 2015-10-19 15:55:19 -04:00
  • f5bd9b8371 [polygons/osm] logging during reverse geocoder construction Al 2015-10-19 15:54:07 -04:00
  • 8609ccbb1d [polygons/osm] lon, lat Al 2015-10-19 15:40:43 -04:00
  • ef94f1b712 [doc] Adding some comments to fetch_osm_address_data.sh Al 2015-10-19 15:39:31 -04:00
  • 83295b1b34 [polygons/osm] Adding in-memory OSM reverse geocoder for all admin boundaries Al 2015-10-18 22:29:52 -04:00
  • 4a3994c65e [polygons/osm] Construct polygons from OSM relations using a number of space-saving optimizations in order to process planet in a reasonable amount of memory. Builds a graph of connected ways such that forming polygons is equivalent to finding strongly connected components. Al 2015-10-18 20:53:38 -04:00
  • b44a72588f [polygons/osm] Connecting OSM polygons from their constituent ways is an instance of finding strongly connected components in a graph, adding implementation Al 2015-10-18 18:23:27 -04:00
  • a2ad829d52 [math] matrix scalar arithmetic Al 2015-10-16 16:26:27 -04:00
  • ade0e2dc1f [osm] Adding final .osm file variable for borders output Al 2015-10-16 00:46:40 -04:00
  • b5f8b696bf [osm] Moving parse_osm to a separate module, adding option to list dependencies Al 2015-10-15 12:08:32 -04:00
  • ca629e295d [osm] Adding admin boundaries filter in OSM data Al 2015-10-15 12:06:08 -04:00
  • 153c8c9cc4 [coordinates] Better handling for float DMS coordinates Al 2015-10-14 15:10:58 -04:00
  • e584745061 [formatting] Adding STATE_DISTRICT to formatter for things like counties Al 2015-10-14 15:10:18 -04:00
  • efba7987b5 [coordinates] sticking latlon_to_decimal in its own module, adding missing methods Al 2015-10-14 13:38:09 -04:00
  • 9d9568b8c8 [polygons] Adding quattroshapes files to download script Al 2015-10-14 01:15:27 -04:00
  • f5bf7b8f2d [fix] ordering and reverse geocoder fields Al 2015-10-13 22:26:30 -04:00
  • e7b1040a47 [polygons] Admin level constants, using transformed name as the sort key Al 2015-10-13 21:20:58 -04:00
  • d151445dc3 [fix] class var Al 2015-10-13 20:46:12 -04:00
  • 4e34575ed2 [fix] name Al 2015-10-13 20:45:31 -04:00
  • 16e7046f7c [polygons] Aliasing names for various polygons types Al 2015-10-13 20:44:59 -04:00
  • cc853345fb [fix] json.loads Al 2015-10-13 19:58:03 -04:00
  • 9ee2a7a474 [polygons] Saving line-delimited GeoJSON to reduce memory consumption when loading Al 2015-10-13 19:14:35 -04:00
  • 646ad64af8 [fix] A few Quattroshapes fixes Al 2015-10-13 17:52:21 -04:00
  • e6fc405eb9 [fix] conversion errors Al 2015-10-13 12:47:50 -04:00
  • ec8f06155d [fix] set Al 2015-10-13 11:44:53 -04:00
  • 2a7bf82951 [fix] filename Al 2015-10-13 11:43:48 -04:00
  • b90bf19133 [fix] include properties Al 2015-10-13 10:58:42 -04:00
  • 5b1447684d [fix] import Al 2015-10-13 10:56:51 -04:00
  • 28d1c471a7 [polygons] Property transforms/validation in Quattroshapes reverse geocoder Al 2015-10-13 10:55:17 -04:00
  • 09beae845e [fix] missing fields in local admin polygons Al 2015-10-13 03:15:25 -04:00
  • ea3fa2f09f [fix] neighborhoods sort order Al 2015-10-12 16:17:48 -05:00
  • 689edf2cbc [fix] init_languages not needed for reverse geocoder Al 2015-10-12 15:43:54 -05:00
  • 0fd65c0dea [fix] include properties Al 2015-10-12 15:41:53 -05:00
  • 20567bf9a3 [polygons] Adding full quattroshapes-backed reverse geocoder to add to OSM training data Al 2015-10-12 15:37:17 -05:00
  • 1b2642fe58 [polygons] Adding ability to specify include properties by filename Al 2015-10-12 15:36:24 -05:00
  • 080ccf0ddd [fix] logging warnings in transliterate Al 2015-10-12 13:50:42 -05:00
  • baef090793 [logging] Wrapping logging statements in a do while (0) so the compiler always at least sees debug code Al 2015-10-12 13:42:10 -05:00
  • b88f237d82 [build] Adding separate Makefile target for downloading geodb Al 2015-10-11 22:26:07 -05:00
  • 588cf1df86 [build] Changing options to libpostal_data script to allow downloading geodb, uploaded first version to S3 Al 2015-10-11 22:25:37 -05:00
  • 39d3af20cf [build] Checking for shuf/gshuf Al 2015-10-11 11:13:53 -05:00
  • 372e952cd3 [geodb] Adding some logging to geodb Al 2015-10-11 01:00:08 -05:00
  • cb334b9fb1 [geodisambig] Shaving a few hundred more megabytes off of the geodb by only adding a single geohash prefix and not indexing the neighbors (query can use its neighbors) Al 2015-10-11 00:45:26 -05:00
  • 2394f817e4 [phrases] Fixing fallback at the end of a string in trie search Al 2015-10-11 00:13:13 -05:00
  • 29bc0fd11e [build] Makefile changes for the new geodb Al 2015-10-09 15:54:44 -04:00
  • a6fbd48bec [geodb] geodb builder changes to support the new, more compact geodb Al 2015-10-09 15:53:56 -04:00
  • bf596b9184 [utils] integer string sizes Al 2015-10-09 15:40:47 -04:00
  • 4dad121334 [fix] Initializing booleans in postal code constructor Al 2015-10-09 15:40:28 -04:00
  • 44da2e446b [geodb] Additional filenames and struct members in geodb.h Al 2015-10-09 15:37:10 -04:00
  • 67d128c386 [graph] graph_load and graph_save Al 2015-10-09 15:36:14 -04:00
  • 9fe2250521 [geodb] Using a trie for geo disambiguation features rather than the sparkey hashtable, sparkey simply contains the ids or code/country pairs in the case of postal codes Al 2015-10-09 15:35:50 -04:00
  • cd6a0ab90b [geodb] Prefixing features with name for geo disambiguation (better trie compression) and removing the longer geohash prefix features Al 2015-10-09 15:15:58 -04:00
  • 77c4bb10c6 [utils] Adding kh_foreach_key Al 2015-10-09 11:51:32 -04:00
  • 151161cab3 [fix] Raising error in geonames output if a country cannot be localized Al 2015-10-07 03:45:56 -04:00
  • 1917816b80 [countries] Not relying on pycountry alpha 2 codes for localized country names as it doesn't contain Kosovo which was causing problems Al 2015-10-07 03:44:47 -04:00
  • 1e98932b82 [fix] setting array->n after reading in both graph and sparse_matrix implementations Al 2015-10-06 19:28:28 -04:00
  • 5a231fb709 [graph] Builder for graphs not constructed in vertex-sorted order Al 2015-10-06 19:03:10 -04:00
  • 4984352eda [graph] Simple sparse graph implementation, essentially a sparse matrix with no values array Al 2015-10-06 18:58:18 -04:00
  • 3084fc929b [geodb] Was missing country boundary type in GeoDB causing some misses in parsing Al 2015-10-06 16:01:18 -04:00
  • 5af6dc77d1 [dictionaries] Adding a few additional abbreviated names of political leaders that come up, a missing abbreviation Al 2015-10-06 15:09:50 -04:00
  • 5f03bc9369 [fix] Unit dictionaries apply to ADDRESS_UNIT component Al 2015-10-06 12:04:24 -04:00
  • 91f4e477ad [fix] typo Al 2015-10-06 12:04:07 -04:00
  • 0eb9ef5bdf [tokenization] Regenerating scanner.c Al 2015-10-05 01:41:48 -04:00
  • 50a36cc595 [parser] using trie_new_from_hash instead of an inline implementation in averaged perceptron training Al 2015-10-04 18:31:16 -04:00
  • ff8986a287 [phrases] trie_new_from_hash compresses a {str: uint32_t} hashtable into a trie in sorted order Al 2015-10-04 18:28:17 -04:00
  • 55a5a79b4b [tokenization] tokenized string with source Al 2015-10-04 18:27:04 -04:00
  • aa39c45b87 [tokenization] skipping control characters in tokenization, comes up in OSM surprisingly Al 2015-10-04 18:25:44 -04:00
  • d6480d2902 [utils] Adding ksort for strings by default in collections.h Al 2015-10-04 18:23:42 -04:00
  • db63e6dbc3 [fix] making ksort methods static Al 2015-10-04 18:23:09 -04:00
  • ed51fce291 [fix] Safe to assume Bokmål for Norwegian street addresses Al 2015-10-04 11:19:43 -04:00
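
Note on 4a3994c65e / b44a72588f: those commits describe connecting OSM ways into polygons as a strongly connected components problem. The sketch below illustrates that idea only and is not the code from those commits. It assumes each way is reduced to its first and last node ids, that ways sharing an endpoint are linked in both directions, and that Tarjan's algorithm is the SCC method. The names strongly_connected_components and group_ways, and the input layout, are hypothetical.

```python
from collections import defaultdict


def strongly_connected_components(graph):
    """Tarjan's algorithm. graph maps node -> iterable of successor nodes."""
    index = {}        # discovery order of each visited node
    lowlink = {}      # lowest discovery index reachable from the node
    stack = []
    on_stack = set()
    components = []
    counter = [0]

    def visit(v):
        index[v] = lowlink[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        # v is the root of a component: pop the stack down to v
        if lowlink[v] == index[v]:
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(component)

    for v in list(graph):
        if v not in index:
            visit(v)
    return components


def group_ways(ways):
    """Group way ids whose endpoints fall in the same component.

    ways: dict of way_id -> (first_node_id, last_node_id)  (assumed layout)
    """
    graph = defaultdict(set)
    for first, last in ways.values():
        # link endpoints in both directions so connected ways share a component
        graph[first].add(last)
        graph[last].add(first)
    groups = []
    for component in strongly_connected_components(graph):
        groups.append(sorted(w for w, (first, _) in ways.items() if first in component))
    return sorted(groups)
```

For example, group_ways({1: (10, 11), 2: (11, 12), 3: (12, 10), 4: (20, 21), 5: (21, 20)}) puts ways 1, 2 and 3 in one group and ways 4 and 5 in another. The recursive visit is fine for small inputs; planet-scale data would need an iterative traversal to avoid hitting Python's recursion limit.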