Commit Graph

4985 Commits

Author SHA1 Message Date
Al
754f22c79a [parser] moving feature printing to averaged perceptron tagger, taking advantage of trie prefix-sharing in feature incorporating previous tags 2017-03-06 20:32:50 -05:00
Al
839a13577d [parser] fixing affix-related valgrind errors in address parser features 2017-03-06 20:28:42 -05:00
Al
c3581557a1 [parser] counting classes instead of keeping a set 2017-03-06 20:05:01 -05:00
Al
a5283cb313 [fix] trie_new_from_hash 2017-03-06 15:57:42 -05:00
Al
23ed916f09 [openaddresses] adding Hattiesburg, MS 2017-03-06 15:45:23 -05:00
Al
90cb4d904d [openaddresses] adding Longueuil, QC, Canada 2017-03-06 15:43:51 -05:00
Al
5113a1bc32 [utils] tracking keys added in trie construction from hash 2017-03-06 15:28:26 -05:00
Al
dd4f3eb84c [parser] simpler feature names for the state-transition features 2017-03-06 15:25:10 -05:00
Al
39fa8ff1a5 [parser] counting num classes in address parser init for models where it is needed a priori 2017-03-06 15:17:52 -05:00
Al
5f19e63cbe [parser] more logging in init 2017-03-06 15:11:39 -05:00
Al
4d2f77b3f3 [openaddresses] add city of Alexandria, LA 2017-03-06 14:30:25 -05:00
Al
bb922e4ce4 [parser] adding log message 2017-03-06 12:25:22 -05:00
Al
b97de96ab4 [parser] fixing chunked shuffle, making awk splitting work on Mac 2017-03-05 15:06:02 -05:00
Al
0e49fc580a [parser] uint64_t chunk size, no warning if gshuf is available 2017-03-05 14:50:47 -05:00
Al
d99f83b84a [openaddresses] add unit phrases in Cape Girardeau, MO 2017-03-05 04:00:41 -05:00
Al
d1bcced706 [openaddresses] adding some of the new Mississippi sources and city of Cape Girardeau, MO 2017-03-05 03:59:07 -05:00
Al
5d73aa1295 [fix] don't write formatted addresses in the ways-only data set unless the formatter returns non-None value 2017-03-05 03:50:00 -05:00
Al
b76b7b8527 [parser] adding chunked shuffle as a C function (writes each line to one of n random files, runs shuf on each file and concatenates the result). Adding a version which allows specifying a specific chunk size, and using a 2GB limit for address parser training. Allowing gshuf again for Mac as it seems the only problem there was not having enough memory when testing on a Mac laptop. The new limited-memory version should be fast enough. 2017-03-05 02:15:11 -05:00
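
The chunked shuffle described in b76b7b8527 can be illustrated with a minimal sketch. This is not the repository's C function: it is a Python approximation under stated assumptions. The function name `chunked_shuffle` and its parameters are hypothetical, and `random.shuffle` stands in for running `shuf` on each chunk file. Each input line is written to one of n randomly chosen temporary files, each file is shuffled on its own, and the results are concatenated, so only one chunk needs to fit in memory at a time.

```python
import os
import random
import tempfile


def chunked_shuffle(in_path, out_path, num_chunks, seed=None):
    """Shuffle the lines of in_path into out_path, one chunk in memory at a time.

    Hypothetical helper for illustration; names and signature are assumptions.
    """
    rng = random.Random(seed)
    with tempfile.TemporaryDirectory() as tmp_dir:
        chunk_paths = [os.path.join(tmp_dir, "chunk_%d" % i) for i in range(num_chunks)]
        chunks = [open(p, "w", encoding="utf-8") for p in chunk_paths]
        try:
            # Pass 1: send each line to one of the n chunk files at random.
            with open(in_path, encoding="utf-8") as src:
                for line in src:
                    if not line.endswith("\n"):
                        line += "\n"  # keep a final unterminated line separate
                    rng.choice(chunks).write(line)
        finally:
            for f in chunks:
                f.close()

        # Pass 2: shuffle each chunk independently and concatenate the results.
        with open(out_path, "w", encoding="utf-8") as dst:
            for p in chunk_paths:
                with open(p, encoding="utf-8") as f:
                    lines = f.readlines()
                rng.shuffle(lines)  # stands in for running shuf on the chunk file
                dst.writelines(lines)
```

For the variant that takes a chunk size (the commit mentions a 2GB limit for address parser training), one plausible approach, again an assumption, is to derive `num_chunks` by dividing the input file size by the chunk size and rounding up.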
Al
ba4052c9ba [openaddresses] add Muskogee, OK 2017-03-03 14:57:36 -05:00
Al
2704708f47 [openaddresses] add Orange County, NY 2017-03-03 14:27:05 -05:00
Al
da62fb62ba [openaddresses] adding Polk County, NC 2017-03-03 13:45:58 -05:00
Al
ce21635b00 [openaddresses] adding city of Salina, KS 2017-03-03 13:45:25 -05:00
Al
b4437848c4 [fix] override_country_dir 2017-03-02 14:31:53 -05:00
Al
69351cad98 [openaddresses] add Tippecanoe County, IN 2017-03-02 13:36:22 -05:00
Al
6b8b6982aa [addresses] more classmethods 2017-03-02 04:23:09 -05:00
Al
f7c8a63093 [addresses] making most of the methods on AddressComponents classmethods if possible so they can be accessed easily for sources not using OSM polygon lookup, etc. 2017-03-01 15:51:56 -05:00
Al
702901608b [openaddresses_uk] adding OpenAddresses UK as a data set. No lat/lons but it does have addresses, cities and postcodes 2017-03-01 15:44:25 -05:00
Al
375f7b1684 [addresses] making postcode before {suburb,city} more likely in the UK for #39 2017-03-01 15:43:26 -05:00
Al
a5d8700df3 [openaddresses] use override_country_dir config option in OA address formatter 2017-03-01 13:52:07 -05:00
Al
0890c712e2 [openaddresses] adding override_country_dir and country codes for Puerto Rico and French dependencies 2017-03-01 13:48:04 -05:00
Al
c80b771f94 [openaddresses] add override_country_dir in Puerto Rico 2017-03-01 13:45:44 -05:00
Al
dbc5d6b866 [openaddresses] remove OSM boundaries from East Peoria 2017-03-01 13:45:12 -05:00
Al
0d4c08d536 [openaddresses] ignore unit containing Fl in DeKalb county 2017-03-01 02:54:37 -05:00
Al
45e71a21bb [openaddresses] adding Kalamaria, Thessaloniki, Greece 2017-03-01 01:11:46 -05:00
Al
26f5c403d3 [openaddresses] add Henry County, GA 2017-02-28 23:01:28 -05:00
Al
f6e9cbf8a0 [openaddresses] adding Gwinnett County, GA 2017-02-28 22:52:11 -05:00
Al
b9424c6c69 [openaddresses] adding Cobb County, GA 2017-02-28 22:49:23 -05:00
Al
357af3d465 [openaddresses] adding unit to Fayette County, GA and adding a field map for the cities + no OSM boundaries 2017-02-28 22:43:19 -05:00
Al
c71fe9afbf [openaddresses] adding DeKalb County, GA 2017-02-28 18:53:51 -05:00
Al
412dd65d87 [openaddresses] adding Fayette County, GA 2017-02-28 18:45:42 -05:00
Al
e3cff74908 [openaddresses] add Tillamook County, OR 2017-02-28 11:57:43 -05:00
Al
a7813dda16 [openaddresses] adding Clayton County, GA 2017-02-27 12:53:49 -05:00
Al
f507f2bb3e [addresses] fix for Colombian house number formatting if the second regex group is not found 2017-02-25 23:24:06 -05:00
Al
64d0783e73 [addresses] Chinese and Colombian house number regex changes 2017-02-25 23:19:12 -05:00
Al
7d699c52b8 [openaddresses] add Chinese name for Wuhan, OSM uses Chinese / English for the name 2017-02-25 22:27:55 -05:00
Al
68afed1658 [fix] typo 2017-02-25 17:52:20 -05:00
Al
fdb07d7898 [openaddresses] add Laval, QC 2017-02-25 17:23:33 -05:00
Al
c744edce12 [openaddresses] add Moore and Montgomery counties, TX 2017-02-25 14:21:36 -05:00
Al
49fe1db613 [openaddresses] adding Vernon County, MO 2017-02-24 16:31:31 -05:00
Al
d4de170c94 [openaddresses] adding city of Monroe, MI 2017-02-24 13:57:57 -05:00