Commit Graph

1201 Commits

Author SHA1 Message Date
Al 6cccc3ee46 [fix] README addition 2015-12-15 16:07:21 -05:00
Al d1833a8f8f [docs] Updating README with parsing info/examples 2015-12-15 16:00:58 -05:00
Al 83ba053373 [build] Removing setup.py fanciness. Install the C library first, then run setup.py or pip install 2015-12-15 14:31:58 -05:00
Al e0c0ed2d04 [numex] Return true if numex table already loaded 2015-12-15 14:28:40 -05:00
Al 7e04017851 [fix] default for libdir 2015-12-15 12:21:49 -05:00
Al 40641209ee [build] Build shared lib in site-packages 2015-12-15 12:19:40 -05:00
Al 04430f1a8e [fix] var 2015-12-15 10:51:56 -05:00
Al d8f731b672 [build] setup.py include/library dirs 2015-12-15 10:50:57 -05:00
Al faf8b00596 [python] libpostal includes 2015-12-15 02:56:02 -05:00
Al d2426d3777 [build] build_ext 2015-12-15 02:31:48 -05:00
Al cb648b63da [build] Adding include and library dirs based on autoconf prefix 2015-12-15 02:21:15 -05:00
Al 7cf48acd20 [fix] standard headers in new extensions 2015-12-15 01:18:33 -05:00
Al bec43750d5 [build] bumping Python version 2015-12-15 00:58:11 -05:00
Al 33fdb912b6 [build] setup.py changes for parser extension 2015-12-15 00:56:53 -05:00
Al c40ab06dd6 [python] Forgot expand.py 2015-12-15 00:56:34 -05:00
Al 842ef4526b [python] Adding address parser Python API 2015-12-15 00:55:41 -05:00
Al b9bf5c629e [fix] Moving address_parser_response_destroy into libpostal so caller can free 2015-12-15 00:52:24 -05:00
Al ab3ba249d7 [python/build] Modified install command for setup.py allowing --datadir and --prefix to be passed in. If there's a virtualenv active and nothing else is specified, install libpostal and its data files there by default 2015-12-14 18:21:21 -05:00
Al 7af0e2d967 [python] Adding Python bindings to the expand API 2015-12-14 18:18:16 -05:00
Al b59c830ba6 [fix] warning about size_t 2015-12-14 18:17:09 -05:00
Al 406f9c533d [api] Separating parser setup/teardown into two separate methods 2015-12-14 18:15:57 -05:00
Al 0f52f97621 [fix] Python 3 version of tokenize/normalize 2015-12-14 18:14:57 -05:00
Al 3401045b4f [fix] changing labels in Python normalize, adding a NULL check 2015-12-14 14:59:57 -05:00
Al 43b212a09b [fix] size_t in benchmark script 2015-12-14 14:57:11 -05:00
Al dc03c83bb2 [math] Adding an aligned memory allocator for vectors to help with vectorization/SIMD 2015-12-14 14:56:38 -05:00
Al bd1e8ecaf8 [fix] default address parser dir 2015-12-12 12:55:37 -05:00
Al 2950358697 [build] address_parser client now links to libpostal, adding address_parser to download script with an "all" option 2015-12-12 12:49:50 -05:00
Al 88836e56e1 [api] Adding parse_address implementation to the libpostal API. GeoDB and address parser are now required. Stripping punctuation from the normalized output 2015-12-12 12:47:44 -05:00
Al bce6ba2595 [fix] typedef 2015-12-12 11:58:41 -05:00
Al a8d6cc4053 [api] Moving parse_address definition into libpostal.h 2015-12-12 03:55:31 -05:00
Al fe4c528f26 [parser] Using different char_array for each of the potential phrases as token i 2015-12-12 03:23:26 -05:00
Al e6303f70f3 [fix] removing printf 2015-12-11 02:53:22 -05:00
Al 671dd4a5d2 [parser] Fixing possible invalid writes in training for values beginning with a separator 2015-12-11 02:05:05 -05:00
Al 743b74aea5 [parser] Simplifying args in address_parser_data_set_tokenize_line 2015-12-10 18:48:23 -05:00
Al 1d288954d7 [osm] Fixing an issue in the training data with house numbers in OSM (seen mostly in Uruguay) where a comma separated list of house numbers is entered. 2015-12-10 18:46:28 -05:00
Al 88b8023ac8 [fix] Bug in address parser feature extraction, can hold onto the wrong pointer 2015-12-10 18:42:28 -05:00
Al 3de59506ae [parser] Internal separators for parsing purposes include open/close parens, at sign, semicolon, etc. Ignore stray colons not internal to a word (as in Swedish abbreviations) 2015-12-10 18:08:51 -05:00
Al 71d6d3c5e1 [utils] Removing kvec and using similar implementation with pointers that can be passed around 2015-12-10 17:52:23 -05:00
Al ab205eff96 [utils] Adding a default small size to all arrays based on a look at malloc/realloc usage 2015-12-09 19:46:09 -05:00
Al 779298360c [osm] In cases with more than one official language and where the address language can be determined, use it for looking up language-specific OSM polygons 2015-12-09 01:00:59 -05:00
Al aeb72d7d26 [osm] Randomly select up to n components for state_district OSM boundaries. For all other fields select one name at random 2015-12-09 00:20:20 -05:00
Al 2c254ebc5e [fix] Belgium cities again 2015-12-08 23:09:28 -05:00
Al f252869671 [dictionaries] adding ste to English dictionaries 2015-12-08 22:29:52 -05:00
Al 69a469d9d3 [osm] Choosing a language at random in countries with multilingual addresses for the parser training data so we get some monolingual examples 2015-12-08 20:38:32 -05:00
Al fe37286bcf [fix] Fixes to matrix methods 2015-12-08 17:33:38 -05:00
Al d9d53ce17e [math] Matrix method updates 2015-12-08 15:39:52 -05:00
Al 48ee665e71 [scripts] Benchmark script using default options 2015-12-08 15:38:44 -05:00
Al 2fcc72ae07 [fix] multitoken canonical strings 2015-12-08 15:38:04 -05:00
Al a857138d95 [api] Adding place name expansions by default 2015-12-08 15:31:36 -05:00
Al beec43fe15 [expansion] regenerating expansion data 2015-12-08 15:28:54 -05:00