4743 Commits

Author SHA1 Message Date
Al
e3cff74908 [openaddresses] add Tillamook County, OR 2017-02-28 11:57:43 -05:00
Al
a7813dda16 [openaddresses] adding Clayton County, GA 2017-02-27 12:53:49 -05:00
Al
f507f2bb3e [addresses] fix for Colombian house number formatting if the second regex group is not found 2017-02-25 23:24:06 -05:00
Al
64d0783e73 [addresses] Chinese and Colombian house number regex changes 2017-02-25 23:19:12 -05:00
Al
7d699c52b8 [openaddresses] add Chinese name for Wuhan, OSM uses Chinese / English for the name 2017-02-25 22:27:55 -05:00
Al
68afed1658 [fix] typo 2017-02-25 17:52:20 -05:00
Al
fdb07d7898 [openaddresses] add Laval, QC 2017-02-25 17:23:33 -05:00
Al
c744edce12 [openaddresses] add Moore and Montgomery counties, TX 2017-02-25 14:21:36 -05:00
Al
49fe1db613 [openaddresses] adding Vernon County, MO 2017-02-24 16:31:31 -05:00
Al
d4de170c94 [openaddresses] adding city of Monroe, MI 2017-02-24 13:57:57 -05:00
Al
d0679294bf [openaddresses] adding positional args so OpenAddresses ingestion can be run only for specific countries, subdirs, or individual files. 2017-02-24 03:40:09 -05:00
Al
e39d4d2f00 [parser] check for non-null prev/prev2 before creating tag-based features 2017-02-24 02:57:16 -05:00
Al
182d60b623 [fix] removing include 2017-02-23 22:45:03 -05:00
Al
6097eacfef [fix] ignore fields in Kauai containing \n 2017-02-23 16:34:34 -05:00
Al
033e8dbb58 [openaddresses] adding Kauai and some component additions for Maui 2017-02-23 16:26:50 -05:00
Al
fa7446deb6 [fix] district field for Wuhan data set 2017-02-23 02:15:55 -05:00
Al
f006bba345 [openaddresses] adding city of Medellín, Colombia 2017-02-22 19:01:26 -08:00
Al
2d59450a51 [openaddresses] adding new Oregon counties 2017-02-22 09:59:20 -08:00
Al
79c2429bba [addresses] strip phrases like "# 123" off of English street names if they follow a thoroughfare/post-directional phrase whose expansion does not contain highway/route 2017-02-22 09:51:43 -08:00
Al
de05292b66 [openaddresses] Del Norte County, CA 2017-02-21 01:19:46 -08:00
Al
93768b7ba5 [openaddresses] Eaton County and Tecumseh, MI 2017-02-21 01:17:54 -08:00
Al
08c6831729 [openaddresses] LBC 2017-02-21 01:12:50 -08:00
Al
a2fcac4909 [openaddresses] city of Flower Mound, TX 2017-02-21 01:09:06 -08:00
Al
1d705e80da [openaddresses] adding new BC district data sets 2017-02-21 01:07:47 -08:00
Al
6a079e86b3 [fix] using size_t instead of int in address_parser/address_parser_train 2017-02-20 19:22:13 -08:00
Al
8ea5405c20 [parser] using separate arrays for features requiring tag history and making the tagger responsible for those features so the feature function does not require passing in prev and prev2 explicitly (i.e. don't need to run the feature function multiple times if using global best-sequence prediction) 2017-02-19 14:21:58 -08:00
Al
ae85e3c0a0 [openaddresses] adding Warren County, OH 2017-02-19 14:03:24 -08:00
Al
715520f681 [parser] using new zeros API in averaged_perceptron.c 2017-02-19 14:02:54 -08:00
Al
5444b722cb [addresses] do not exclude # from sampling in Spanish 2017-02-18 12:04:09 -08:00
Al
f76faafd8c [openaddresses] adding a few house number phrases as well in Colombia 2017-02-18 12:03:02 -08:00
Al
adfdc06d14 [addresses] using the number dictionary for abbreviations in house number phrases as well 2017-02-18 12:00:27 -08:00
Al
7cab675809 [openaddresses] adding random formatting to Colombian house numbers that match the {calle}-{building number} format 2017-02-18 11:28:47 -08:00
Al
146412f4f8 [openaddresses] adding country-specific validators and doing no validation on house numbers in Colombia 2017-02-18 11:04:02 -08:00
Al
0e10aa6f46 [openaddresses] adding OSM boundaries for Stearns County, MN 2017-02-18 10:18:09 -08:00
Al
5a31513092 [openaddresses] Adding city of Sioux Falls, SD 2017-02-18 10:13:56 -08:00
Al
64e62cac32 [openaddresses] adding Bogotá, Colombia 2017-02-18 10:13:31 -08:00
Al
4f128579d6 [openaddresses] adding Commerce City, CO and creating an alias for the simple unit regex for reuse 2017-02-17 14:07:00 -05:00
Al
b88487f633 [utils] string_replace_char does single byte/character replacement, new string_replace to do full string replacement, again using char_array for safety, string_replace_with_array function for memory reuse 2017-02-17 13:58:51 -05:00
Al
da856ea5c3 [parser] adding phrase features for category, unit, level, entrance, staircase, and po_box phrases from the libpostal dictionaries, excluding phrases which match the toponyms dictionary (e.g. US states that can also be found in street/venue names, useful for expansion but not here), if the current token is part of both an address dictionary phrase and a component phrase derived from the training data, use the longer of the two, or both if they are the same length 2017-02-17 03:00:48 -05:00
Al
5b616dfb57 [addresses] allowing neighborhood components to be passed in 2017-02-17 02:11:56 -05:00
Al
e7d8577ad7 [openaddresses] add city of San Luis Obispo 2017-02-16 16:00:23 -05:00
Al
d6281648dc [openaddresses] add Cumberland County, NC 2017-02-16 14:49:00 -05:00
Al
1631c25ad0 [openaddresses] add city of O'Fallon, IL 2017-02-16 14:48:40 -05:00
Al
4c4147f465 [openaddresses] add city of Scottsdale, AZ 2017-02-16 14:48:17 -05:00
Al
df76cde1e7 [openaddresses] adding Pickens County, SC 2017-02-16 03:34:49 -05:00
Al
c380b3e91b [parser] phrase search with address dictionaries should not use the language given at training time since it's not currently available at runtime (without pulling in the language classifier, which may be warranted at some point, especially if the model can be made smaller/sparser) 2017-02-15 22:32:30 -05:00
Al
a3e51db32d [api] include some of the new components in default address_components for the libpostal expansion API 2017-02-15 22:29:22 -05:00
Al
32fb483e96 [gazetteers] adding ADDRESS_PO_BOX component 2017-02-15 22:23:28 -05:00
Al
ba0ccc82a3 [fix] var name in address_parser_train 2017-02-15 22:22:33 -05:00
Al
0196fe8736 [utils] fixing key_type in hash_get, adding int64_double map 2017-02-15 22:20:36 -05:00