094a5bf5f4[dictionaries] adding Jnr and Snr forms for generational suffixes
Al
2015-10-28 00:00:11 -04:00
c2d112f4fc[fix] compile flags in Makefile.am
Al
2015-10-27 19:01:37 -04:00
9a92a1154d[python] Making normalized_tokens return token classes as well, mimicking the tokenize API
Al
2015-10-27 14:13:49 -04:00
7f5f056105[python] don't need -O0 any more for normalization extension
Al
2015-10-27 13:34:00 -04:00
9f6e1387a0[fix] Error condition in Python tokenize
Al
2015-10-27 13:33:28 -04:00
6aaa08c220[fix] Usage on libpostal_data script
Al
2015-10-27 13:33:03 -04:00
40918812e2[normalize] Adding hyphen elimination as a string option (changes tokenization)
Al
2015-10-27 13:32:47 -04:00
3fe2365234[fix] signed size_t in trie_set_tail
Al
2015-10-27 13:21:26 -04:00
ad59ba7a7b[fix] Re-generating transliteration tables
Al
2015-10-27 12:28:08 -04:00
7f5cf89e84[transliteration] Not escaping right side transliteration rules
Al
2015-10-27 12:24:38 -04:00
1a1d74785c[fix] Compiler warnings for casts/printf
Al
2015-10-26 18:52:18 -04:00
6b456025b4[fix] warnings in klib/ksort.h
Al
2015-10-26 18:50:22 -04:00
3b3513ffe3[fix] warnings in collections.h/vector_math.h
Al
2015-10-26 18:49:58 -04:00
83c6a87ab1[build] substitution for use of LIBPOSTAL_DATA_DIR in Makefile.am
Al
2015-10-26 18:35:23 -04:00
f6b6a17335[python/normalization] Adding Python bindings to the normalize module for use in OSM polygon matching
Al
2015-10-26 18:07:37 -04:00
a319c1f6a0[build] defining LIBPOSTAL_DATA_DIR in Autoconf rather than Automake, becomes part of config.h
Al
2015-10-26 18:06:01 -04:00
8a188903b3[python] Using tuples in pytokenize instead of list, pre-allocating
Al
2015-10-26 18:04:13 -04:00
309d41a652[math] adding matrix_zero method
Al
2015-10-25 21:38:59 -04:00
e3b80534ba[polygons] Adding methods for fixing polygons in base RTreePolygonIndex, moving the current polygon index to an instance variable and adding ability to import from a general GeoJSON-like structure instead of just shapefiles
Al
2015-10-25 18:36:12 -04:00
4f784060a3[python] Adding word_token_types
Al
2015-10-25 18:33:09 -04:00
2d4b3a6e2f[parser/formatting] Appending suburb between the road line and any subsequent lines for all bottom-up address formats. Effectively inserts neighborhoods into our version without making the OpenCage formats overly verbose. Also fixing post-format replace with group capture
Al
2015-10-24 01:29:21 -04:00
da53d7ebac[osm] Adding an OSM neighborhoods/suburbs data set for matching with Quattroshapes boundaries, updating definitions for admin boundaries
Al
2015-10-22 11:37:06 -04:00
6478e65a06[osm] Moving Wikipedia title normalization to osm.extract
Al
2015-10-22 11:35:38 -04:00
ff3a3c2201[fix] disambiguation tokenizer to pypostal
Al
2015-10-21 16:35:55 -04:00
6f6d04966b[fix] role in OSM polygon extraction
Al
2015-10-21 16:35:25 -04:00
336bfe32ca[osm/formatter] Switching back to OpenCageData repo
Al
2015-10-21 16:34:24 -04:00
d0aa3b9109[polygons/osm] alt_name
Al
2015-10-20 09:27:55 -04:00
218eae548c[fix] logger
Al
2015-10-20 06:43:51 -04:00
1e8e592e0b[fix] import
Al
2015-10-19 23:30:12 -04:00
e5129957f8[osm/polygons] Add relation id to OSM reverse geocoder
Al
2015-10-19 18:00:45 -04:00
5187e6073a[fix] admin boundary imports
Al
2015-10-19 17:14:48 -04:00
0d213e426a[fix] logger
Al
2015-10-19 15:55:19 -04:00
f5bd9b8371[polygons/osm] logging during reverse geocoder construction
Al
2015-10-19 15:54:07 -04:00
8609ccbb1d[polygons/osm] lon, lat
Al
2015-10-19 15:40:43 -04:00
ef94f1b712[doc] Adding some comments to fetch_osm_address_data.sh
Al
2015-10-19 15:39:31 -04:00
83295b1b34[polygons/osm] Adding in-memory OSM reverse geocoder for all admin boundaries
Al
2015-10-18 22:29:52 -04:00
4a3994c65e[polygons/osm] Construct polygons from OSM relations using a number of space-saving optimizations in order to process planet in a reasonable amount of memory. Builds a graph of connected ways such that forming polygons is equivalent to finding strongly connected components.
Al
2015-10-18 20:53:38 -04:00
b44a72588f[polygons/osm] Connecting OSM polygons from their constituent ways is an instance of finding strongly connected components in a graph, adding implementation
Al
2015-10-18 18:23:27 -04:00
a2ad829d52[math] matrix scalar arithmetic
Al
2015-10-16 16:26:27 -04:00
ade0e2dc1f[osm] Adding final .osm file variable for borders output
Al
2015-10-16 00:46:40 -04:00
b5f8b696bf[osm] Moving parse_osm to a separate module, adding option to list dependencies
Al
2015-10-15 12:08:32 -04:00
ca629e295d[osm] Adding admin boundaries filter in OSM data
Al
2015-10-15 12:06:08 -04:00
153c8c9cc4[coordinates] Better handling for float DMS coordinates
Al
2015-10-14 15:10:58 -04:00
e584745061[formatting] Adding STATE_DISTRICT to formatter for things like counties
Al
2015-10-14 15:10:18 -04:00
efba7987b5[coordinates] sticking latlon_to_decimal in its own module, adding missing methods
Al
2015-10-14 13:38:09 -04:00
9d9568b8c8[polygons] Adding quattroshapes files to download script
Al
2015-10-14 01:15:27 -04:00
f5bf7b8f2d[fix] ordering and reverse geocoder fields
Al
2015-10-13 22:26:30 -04:00
e7b1040a47[polygons] Admin level constants, using transformed name as the sort key
Al
2015-10-13 21:20:58 -04:00
d151445dc3[fix] class var
Al
2015-10-13 20:46:12 -04:00
4e34575ed2[fix] name
Al
2015-10-13 20:45:31 -04:00
16e7046f7c[polygons] Aliasing names for various polygons types
Al
2015-10-13 20:44:59 -04:00
cc853345fb[fix] json.loads
Al
2015-10-13 19:58:03 -04:00
9ee2a7a474[polygons] Saving line-delimited GeoJSON to reduce memory consumption when loading
Al
2015-10-13 19:14:35 -04:00
646ad64af8[fix] A few Quattroshapes fixes
Al
2015-10-13 17:52:21 -04:00
e6fc405eb9[fix] conversion errors
Al
2015-10-13 12:47:50 -04:00
2a7bf82951[fix] filename
Al
2015-10-13 11:43:48 -04:00
b90bf19133[fix] include properties
Al
2015-10-13 10:58:42 -04:00
5b1447684d[fix] import
Al
2015-10-13 10:56:51 -04:00
28d1c471a7[polygons] Property transforms/validation in Quattroshapes reverse geocoder
Al
2015-10-13 10:55:17 -04:00
09beae845e[fix] missing fields in local admin polygons
Al
2015-10-13 03:15:25 -04:00
ea3fa2f09f[fix] neighborhoods sort order
Al
2015-10-12 16:17:48 -05:00
689edf2cbc[fix] init_languages not needed for reverse geocoder
Al
2015-10-12 15:43:54 -05:00
0fd65c0dea[fix] include properties
Al
2015-10-12 15:41:53 -05:00
20567bf9a3[polygons] Adding full quattroshapes-backed reverse geocoder to add to OSM training data
Al
2015-10-12 15:37:17 -05:00
1b2642fe58[polygons] Adding ability to specify include properties by filename
Al
2015-10-12 15:36:24 -05:00
080ccf0ddd[fix] logging warnings in transliterate
Al
2015-10-12 13:50:42 -05:00
baef090793[logging] Wrapping logging statements in a do while (0) so the compiler always at least sees debug code
Al
2015-10-12 13:42:10 -05:00
b88f237d82[build] Adding separate Makefile target for downloading geodb
Al
2015-10-11 22:26:07 -05:00
588cf1df86[build] Changing options to libpostal_data script to allow downloading geodb, uploaded first version to S3
Al
2015-10-11 22:25:37 -05:00
39d3af20cf[build] Checking for shuf/gshuf
Al
2015-10-11 11:13:53 -05:00
372e952cd3[geodb] Adding some logging to geodb
Al
2015-10-11 01:00:08 -05:00
cb334b9fb1[geodisambig] Shaving a few hundred more megabytes off of the geodb by only adding a single geohash prefix and not indexing the neighbors (query can use its neighbors)
Al
2015-10-11 00:45:26 -05:00
2394f817e4[phrases] Fixing fallback at the end of a string in trie search
Al
2015-10-11 00:13:13 -05:00
29bc0fd11e[build] Makefile changes for the new geodb
Al
2015-10-09 15:54:44 -04:00
a6fbd48bec[geodb] geodb builder changes to support the new, more compact geodb
Al
2015-10-09 15:53:56 -04:00
bf596b9184[utils] integer string sizes
Al
2015-10-09 15:40:47 -04:00
4dad121334[fix] Initializing booleans in postal code constructor
Al
2015-10-09 15:40:28 -04:00
44da2e446b[geodb] Additional filenames and struct members in geodb.h
Al
2015-10-09 15:37:10 -04:00
67d128c386[graph] graph_load and graph_save
Al
2015-10-09 15:36:14 -04:00
9fe2250521[geodb] Using a trie for geo disambiguation features rather than the sparkey hashtable, sparkey simply contains the ids or code/country pairs in the case of postal codes
Al
2015-10-09 15:35:50 -04:00
cd6a0ab90b[geodb] Prefixing features with name for geo disambiguation (better trie compression) and removing the longer geohash prefix features
Al
2015-10-09 15:15:58 -04:00
77c4bb10c6[utils] Adding kh_foreach_key
Al
2015-10-09 11:51:32 -04:00
151161cab3[fix] Raising error in geonames output if a country cannot be localized
Al
2015-10-07 03:45:56 -04:00
1917816b80[countries] Not relying on pycountry alpha 2 codes for localized country names as it doesn't contain Kosovo which was causing problems
Al
2015-10-07 03:44:47 -04:00
1e98932b82[fix] setting array->n after reading in both graph and sparse_matrix implementations
Al
2015-10-06 19:28:28 -04:00
5a231fb709[graph] Builder for graphs not constructed in vertex-sorted order
Al
2015-10-06 19:03:10 -04:00
4984352eda[graph] Simple sparse graph implementation, essentially a sparse matrix with no values array
Al
2015-10-06 18:58:18 -04:00
3084fc929b[geodb] Was missing country boundary type in GeoDB causing some misses in parsing
Al
2015-10-06 16:01:18 -04:00
5af6dc77d1[dictionaries] Adding a few additional abbreviated names of political leaders that come up, a missing abbreviation
Al
2015-10-06 15:09:50 -04:00
5f03bc9369[fix] Unit dictionaries apply to ADDRESS_UNIT component
Al
2015-10-06 12:04:24 -04:00
91f4e477ad[fix] typo
Al
2015-10-06 12:04:07 -04:00
0eb9ef5bdf[tokenization] Regenerating scanner.c
Al
2015-10-05 01:41:48 -04:00
50a36cc595[parser] using trie_new_from_hash instead of an inline implementation in averaged perceptron training
Al
2015-10-04 18:31:16 -04:00
ff8986a287[phrases] trie_new_from_hash compresses a {str: uint32_t} hashtable into a trie in sorted order
Al
2015-10-04 18:28:17 -04:00
55a5a79b4b[tokenization] tokenized string with source
Al
2015-10-04 18:27:04 -04:00
aa39c45b87[tokenization] skipping control characters in tokenization, comes up in OSM surprisingly
Al
2015-10-04 18:25:44 -04:00
d6480d2902[utils] Adding ksort for strings by default in collections.h
Al
2015-10-04 18:23:42 -04:00
db63e6dbc3[fix] making ksort methods static
Al
2015-10-04 18:23:09 -04:00
ed51fce291[fix] Safe to assume Bokmål for Norwegian street addresses
Al
2015-10-04 11:19:43 -04:00