Commit Graph

4 Commits

Author SHA1 Message Date
Al 34fe7ec305 [expand] adding a few of the address phrase checks to the expand header 2017-12-30 02:34:06 -05:00
Al 152761fcbc [expand] adding improvements to root expansions (using possible phrase roots even if they're abbreviated e.g. "E Ctr St", adding special valid components check for root expansions beyond what's stored in the build address dictionaries), removing spaces before checking unique strings, only splitting numeric from alpha in the case of non-ordinals, using cstring_array internally and char ** in the public API 2017-12-25 01:37:42 -05:00
Al 3f7abd5b24 [expand] adding a method that allows hash/equality comparisons of addresses like "100 Main" with "100 S Main St." or units like "Apt 101" vs. "#101". Instead of expanding the phrase abbreviations, this version tries its best to delete all but the root words in a string for a specific component. It's probably not perfect, but does handle a number of edge cases related to pre/post directionals in English e.g. "E St" will have a root word of simply "E", "Avenue E" => "E", etc. Also handles a variety of cases where the phrase could be a thoroughfare type but is really a root word such as "Park Pl" or the famous "Avenue Rd". This can be used for near dupe hashing to catch possible dupes for later analysis. Note that it will normalize "St Marks Pl" and "St Marks Ave" to the same thing, which is sometimes warranted (if the user typed the wrong thoroughfare), but can also be reconciled at deduping time. 2017-12-17 15:48:11 -05:00
Al 8968a6c966 [expand] moving expand to its own module so the internal methods can be exposed, calling from libpostal.c 2017-12-08 16:26:13 -05:00
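
The root-word idea described in commit 3f7abd5b24 can be illustrated with a small toy sketch. This is not the code from the commit and does not use the library's dictionaries: the function name `root_key` and the two tiny word lists are hypothetical, chosen only to reproduce the English examples quoted in the commit message.

```python
import re

# Hypothetical, deliberately tiny word lists (assumptions for illustration only).
DIRECTIONALS = {"n", "s", "e", "w", "ne", "nw", "se", "sw",
                "north", "south", "east", "west"}
STREET_TYPES = {"st", "street", "ave", "avenue", "rd", "road",
                "pl", "place", "blvd", "boulevard"}


def root_key(address: str) -> str:
    """Reduce a street address to a rough 'root word' key for near-dupe hashing."""
    # Lowercase, drop apostrophes, and keep only alphanumeric tokens.
    tokens = re.findall(r"[a-z0-9]+", address.lower().replace("'", ""))

    # Set aside a leading house number so it survives unchanged.
    number = []
    while len(tokens) > 1 and tokens[0].isdigit():
        number.append(tokens.pop(0))

    # Drop at most one thoroughfare type: prefer the suffix ("Park Pl"),
    # otherwise the prefix ("Avenue E"). Never drop the only remaining token.
    if len(tokens) > 1 and tokens[-1] in STREET_TYPES:
        tokens.pop()
    elif len(tokens) > 1 and tokens[0] in STREET_TYPES:
        tokens.pop(0)

    # Drop pre/post directionals, but only while another token remains,
    # so that "E St" keeps "e" as its root word.
    while len(tokens) > 1 and tokens[0] in DIRECTIONALS:
        tokens.pop(0)
    while len(tokens) > 1 and tokens[-1] in DIRECTIONALS:
        tokens.pop()

    return " ".join(number + tokens)
```

With these rules, "100 Main" and "100 S Main St." produce the same key, and "St Marks Pl" and "St Marks Ave" collapse to the same key as well, which is the trade-off the commit message notes can be reconciled at deduping time.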