1007 Commits

Author SHA1 Message Date
Al
cbeb08f1d1 [python/normalize] importing options from the C module 2015-10-30 12:34:07 -04:00
Al
cccc3e9cf5 [similarity] Using Soft-TFIDF for approximate name matching. Soft-TFIDF is a hybrid string distance metric which balances local token similarities (using Jaro-Winkler similarity by default) allowing for slight spelling errors with global TFIDF statistics so that very frequent words don't affect the score as much 2015-10-30 02:48:16 -04:00
Al
e7f783477f [python/normalize] Adding remove parentheses options in Python normalize (would require compiling with the scanner to do it from C, but could switch) 2015-10-30 01:27:16 -04:00
Al
5076c0409b [similarity] Adding an in-memory IDF index for weighted similarities 2015-10-29 13:33:01 -04:00
Al
1c543a5271 [osm/formatting] Adding is_in tags to the address formatter as they're common in OSM, aliasing addr:district to state_district instead of suburb 2015-10-29 12:30:56 -04:00
Al
c7df3fcb3a [osm] Adding a list of various OSM name tags obtained from Nominatim 2015-10-29 11:44:56 -04:00
Al
cee9da05d6 [fix] using tokenize_raw API 2015-10-28 21:37:44 -04:00
Al
bbd10e97bd [fix] imports 2015-10-28 21:32:09 -04:00
Al
110451d6d6 [polygons] Polygon area calculations 2015-10-28 21:19:35 -04:00
Al
e946e63222 [polygons] Changing language polygon index to use new index_polygon method 2015-10-28 21:18:27 -04:00
Al
5fdbb7e832 [polygons] Adding a geohash polygon index which selects a prefix size based on the area of the polygon's bounding box 2015-10-28 21:17:33 -04:00
Al
094a5bf5f4 [dictionaries] adding Jnr and Snr forms for generational suffixes 2015-10-28 00:00:34 -04:00
Al
c2d112f4fc [fix] compile flags in Makefile.am 2015-10-27 19:01:37 -04:00
Al
9a92a1154d [python] Making normalized_tokens return token classes as well, mimicking the tokenize API 2015-10-27 17:07:50 -04:00
Al
7f5f056105 [python] don't need -O0 any more for normalization extension 2015-10-27 13:34:00 -04:00
Al
9f6e1387a0 [fix] Error condition in Python tokenize 2015-10-27 13:33:28 -04:00
Al
6aaa08c220 [fix] Usage on libpostal_data script 2015-10-27 13:33:03 -04:00
Al
40918812e2 [normalize] Adding hyphen elimination as a string option (changes tokenization) 2015-10-27 13:32:47 -04:00
Al
3fe2365234 [fix] signed size_t in trie_set_tail 2015-10-27 13:21:26 -04:00
Al
ad59ba7a7b [fix] Re-generating transliteration tables 2015-10-27 12:28:08 -04:00
Al
7f5cf89e84 [transliteration] Not escaping right side transliteration rules 2015-10-27 12:24:38 -04:00
Al
1a1d74785c [fix] Compiler warnings for casts/printf 2015-10-26 18:52:18 -04:00
Al
6b456025b4 [fix] warnings in klib/ksort.h 2015-10-26 18:50:22 -04:00
Al
3b3513ffe3 [fix] warnings in collections.h/vector_math.h 2015-10-26 18:49:58 -04:00
Al
83c6a87ab1 [build] substitution for use of LIBPOSTAL_DATA_DIR in Makefile.am 2015-10-26 18:47:07 -04:00
Al
f6b6a17335 [python/normalization] Adding Python bindings to the normalize module for use in OSM polygon matching 2015-10-26 18:07:53 -04:00
Al
a319c1f6a0 [build] defining LIBPOSTAL_DATA_DIR in Autoconf rather than Automake, becomes part of config.h 2015-10-26 18:06:05 -04:00
Al
8a188903b3 [python] Using tuples in pytokenize instead of list, pre-allocating 2015-10-26 18:04:13 -04:00
Al
309d41a652 [math] adding matrix_zero method 2015-10-25 21:38:59 -04:00
Al
e3b80534ba [polygons] Adding methods for fixing polygons in base RTreePolygonIndex, moving the current polygon index to an instance variable and adding ability to import from a general GeoJSON-like structure instead of just shapefiles 2015-10-25 18:36:12 -04:00
Al
4f784060a3 [python] Adding word_token_types 2015-10-25 18:33:09 -04:00
Al
2d4b3a6e2f [parser/formatting] Appending suburb between the road line and any subsequent lines for all bottom-up address formats. Effectively inserts neighborhoods into our version without making the OpenCage formats overly verbose. Also fixing post-format replace with group capture 2015-10-24 01:31:35 -04:00
Al
da53d7ebac [osm] Adding an OSM neighborhoods/suburbs data set for matching with Quattroshapes boundaries, updating definitions for admin boundaries 2015-10-22 11:37:11 -04:00
Al
6478e65a06 [osm] Moving Wikipedia title normalization to osm.extract 2015-10-22 11:35:38 -04:00
Al
ff3a3c2201 [fix] disambiguation tokenizer to pypostal 2015-10-21 16:35:55 -04:00
Al
6f6d04966b [fix] role in OSM polygon extraction 2015-10-21 16:35:25 -04:00
Al
336bfe32ca [osm/formatter] Switching back to OpenCageData repo 2015-10-21 16:34:24 -04:00
Al
d0aa3b9109 [polygons/osm] alt_name 2015-10-20 09:27:55 -04:00
Al
218eae548c [fix] logger 2015-10-20 06:43:51 -04:00
Al
1e8e592e0b [fix] import 2015-10-19 23:30:12 -04:00
Al
e5129957f8 [osm/polygons] Add relation id to OSM reverse geocoder 2015-10-19 18:00:45 -04:00
Al
5187e6073a [fix] admin boundary imports 2015-10-19 17:14:48 -04:00
Al
0d213e426a [fix] logger 2015-10-19 15:55:19 -04:00
Al
f5bd9b8371 [polygons/osm] logging during reverse geocoder construction 2015-10-19 15:54:07 -04:00
Al
8609ccbb1d [polygons/osm] lon, lat 2015-10-19 15:40:43 -04:00
Al
ef94f1b712 [doc] Adding some comments to fetch_osm_address_data.sh 2015-10-19 15:39:31 -04:00
Al
83295b1b34 [polygons/osm] Adding in-memory OSM reverse geocoder for all admin boundaries 2015-10-19 15:38:23 -04:00
Al
4a3994c65e [polygons/osm] Construct polygons from OSM relations using a number of space-saving optimizations in order to process planet in a reasonable amount of memory. Builds a graph of connected ways such that forming polygons is equivalent to finding strongly connected components. 2015-10-18 20:53:49 -04:00
Al
b44a72588f [polygons/osm] Connecting OSM polygons from their constituent ways is an instance of finding strongly connected components in a graph, adding implementation 2015-10-18 18:23:27 -04:00
Al
a2ad829d52 [math] matrix scalar arithmetic 2015-10-16 16:26:27 -04:00