Commit Graph

4777 Commits

Author SHA1 Message Date
Al 5113a1bc32 [utils] tracking keys added in trie construction from hash 2017-03-06 15:28:26 -05:00
Al dd4f3eb84c [parser] simpler feature names for the state-transition features 2017-03-06 15:25:10 -05:00
Al 39fa8ff1a5 [parser] counting num classes in address parser init for models where it is needed a priori 2017-03-06 15:17:52 -05:00
Al 5f19e63cbe [parser] more logging in init 2017-03-06 15:11:39 -05:00
Al 4d2f77b3f3 [openaddresses] add city of Alexandria, LA 2017-03-06 14:30:25 -05:00
Al bb922e4ce4 [parser] adding log message 2017-03-06 12:25:22 -05:00
Al b97de96ab4 [parser] fixing chunked shuffle, making awk splitting work on Mac 2017-03-05 15:06:02 -05:00
Al 0e49fc580a [parser] uint64_t chunk size, no warning if gshuf is available 2017-03-05 14:50:47 -05:00
Al d99f83b84a [openaddresses] add unit phrases in Cape Girardeau, MO 2017-03-05 04:00:41 -05:00
Al d1bcced706 [openaddresses] adding some of the new Mississippi sources and city of Cape Girardeau, MO 2017-03-05 03:59:07 -05:00
Al 5d73aa1295 [fix] don't write formatted addresses in the ways-only data set unless the formatter returns non-None value 2017-03-05 03:50:00 -05:00
Al b76b7b8527 [parser] adding chunked shuffle as a C function (writes each line to one of n random files, runs shuf on each file and concatenates the result). Adding a version which allows specifying a specific chunk size, and using a 2GB limit for address parser training. Allowing gshuf again for Mac as it seems the only problem there was not having enough memory when testing on a Mac laptop. The new limited-memory version should be fast enough. 2017-03-05 02:15:11 -05:00
Al ba4052c9ba [openaddresses] add Muskogee, OK 2017-03-03 14:57:36 -05:00
Al 2704708f47 [openaddresses] add Orange County, NY 2017-03-03 14:27:05 -05:00
Al da62fb62ba [openaddresses] adding Polk County, NC 2017-03-03 13:45:58 -05:00
Al ce21635b00 [openaddresses] adding city of Salina, KS 2017-03-03 13:45:25 -05:00
Al b4437848c4 [fix] override_country_dir 2017-03-02 14:31:53 -05:00
Al 69351cad98 [openaddresses] add Tippecanoe County, IN 2017-03-02 13:36:22 -05:00
Al 6b8b6982aa [addresses] more classmethods 2017-03-02 04:23:09 -05:00
Al f7c8a63093 [addresses] making most of the methods on AddressComponents classmethods if possible so they can be accessed easily for sources not using OSM polygon lookup, etc. 2017-03-01 15:51:56 -05:00
Al 702901608b [openaddresses_uk] adding OpenAddresses UK as a data set. No lat/lons but it does have addresses, cities and postcodes 2017-03-01 15:44:25 -05:00
Al 375f7b1684 [addresses] making postcode before {suburb,city} more likely in the UK for #39 2017-03-01 15:43:26 -05:00
Al a5d8700df3 [openaddresses] use override_country_dir config option in OA address formatter 2017-03-01 13:52:07 -05:00
Al 0890c712e2 [openaddresses] adding override_country_dir and country codes for Puerto Rico and French dependencies 2017-03-01 13:48:04 -05:00
Al c80b771f94 [openaddresses] add override_country_dir in Puerto Rico 2017-03-01 13:45:44 -05:00
Al dbc5d6b866 [openaddresses] remove OSM boundaries from East Peoria 2017-03-01 13:45:12 -05:00
Al 0d4c08d536 [openaddresses] ignore unit containing Fl in DeKalb county 2017-03-01 02:54:37 -05:00
Al 45e71a21bb [openaddresses] adding Kalamaria, Thessaloniki, Greece 2017-03-01 01:11:46 -05:00
Al 26f5c403d3 [openaddresses] add Henry County, GA 2017-02-28 23:01:28 -05:00
Al f6e9cbf8a0 [openaddresses] adding Gwinnett County, GA 2017-02-28 22:52:11 -05:00
Al b9424c6c69 [openaddresses] adding Cobb County, GA 2017-02-28 22:49:23 -05:00
Al 357af3d465 [openaddresses] adding unit to Fayette County, GA and adding a field map for the cities + no OSM boundaries 2017-02-28 22:43:19 -05:00
Al c71fe9afbf [openaddresses] adding DeKalb County, GA 2017-02-28 18:53:51 -05:00
Al 412dd65d87 [openaddresses] adding Fayette County, GA 2017-02-28 18:45:42 -05:00
Al e3cff74908 [openaddresses] add Tillamook County, OR 2017-02-28 11:57:43 -05:00
Al a7813dda16 [openaddresses] adding Clayton County, GA 2017-02-27 12:53:49 -05:00
Al f507f2bb3e [addresses] fix for Colombian house number formatting if the second regex group is not found 2017-02-25 23:24:06 -05:00
Al 64d0783e73 [addresses] Chinese and Colombian house number regex changes 2017-02-25 23:19:12 -05:00
Al 7d699c52b8 [openaddresses] add Chinese name for Wuhan, OSM uses Chinese / English for the name 2017-02-25 22:27:55 -05:00
Al 68afed1658 [fix] typo 2017-02-25 17:52:20 -05:00
Al fdb07d7898 [openaddresses] add Laval, QC 2017-02-25 17:23:33 -05:00
Al c744edce12 [openaddresses] add Moore and Montgomerey counties, TX 2017-02-25 14:21:36 -05:00
Al 49fe1db613 [openaddresses] adding Vernon County, MO 2017-02-24 16:31:31 -05:00
Al d4de170c94 [openaddresses] adding city of Monroe, MI 2017-02-24 13:57:57 -05:00
Al d0679294bf [openaddresses] adding positional args so OpenAddresses ingestion can be run only for specific countries, subdirs, or individual files. 2017-02-24 03:40:09 -05:00
Al e39d4d2f00 [parser] check for non-null prev/prev2 before creating tag-based features 2017-02-24 02:57:16 -05:00
Al 182d60b623 [fix] removing include 2017-02-23 22:45:03 -05:00
Al 6097eacfef [fix] ignore fields in Kauai containing \n 2017-02-23 16:34:34 -05:00
Al 033e8dbb58 [openaddresses] adding Kauai and some component additions for Maui 2017-02-23 16:26:50 -05:00
Al fa7446deb6 [fix] district field for Wuhan data set 2017-02-23 02:15:55 -05:00
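
Note on commit b76b7b8527: its message describes the chunked shuffle as writing each line to one of n random files, shuffling each file, and concatenating the results. The following is a minimal Python sketch of that idea, not the C function from the commit. The name `chunked_shuffle` and its parameters are hypothetical, and `random.shuffle` stands in for the external `shuf` call, so only one chunk is held in memory at a time.

```python
import os
import random
import tempfile


def chunked_shuffle(input_path, output_path, num_chunks, seed=None):
    """Shuffle the lines of a file while holding only one chunk in memory.

    Hypothetical helper: an illustration of the approach described in the
    commit message, not the project's actual implementation.
    """
    rng = random.Random(seed)
    with tempfile.TemporaryDirectory() as tmp_dir:
        chunk_paths = [
            os.path.join(tmp_dir, f"chunk_{i}.txt") for i in range(num_chunks)
        ]
        chunk_files = [open(p, "w", encoding="utf-8") for p in chunk_paths]
        try:
            # Step 1: write each input line to a randomly chosen chunk file.
            with open(input_path, encoding="utf-8") as src:
                for line in src:
                    if not line.endswith("\n"):
                        line += "\n"  # keep a final unterminated line separate
                    rng.choice(chunk_files).write(line)
        finally:
            for f in chunk_files:
                f.close()

        # Steps 2 and 3: shuffle each chunk in memory, then concatenate.
        with open(output_path, "w", encoding="utf-8") as out:
            for path in chunk_paths:
                with open(path, encoding="utf-8") as chunk:
                    lines = chunk.readlines()
                rng.shuffle(lines)  # stands in for running shuf on the file
                out.writelines(lines)
```

In this sketch the number of chunks is passed in directly. The commit also mentions a variant that takes a chunk size instead, which is not shown here.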