Al
|
8ea5405c20
|
[parser] using separate arrays for features requiring tag history and making the tagger responsible for those features so the feature function does not require passing in prev and prev2 explicitly (i.e. don't need to run the feature function multiple times if using global best-sequence prediction)
|
2017-02-19 14:21:58 -08:00 |
|
Al
|
ae85e3c0a0
|
[openaddresses] adding Warren County, OH
|
2017-02-19 14:03:24 -08:00 |
|
Al
|
715520f681
|
[parser] using new zeros API in averaged_perceptron.c
|
2017-02-19 14:02:54 -08:00 |
|
Al
|
5444b722cb
|
[addresses] do not exclude # from sampling in Spanish
|
2017-02-18 12:04:09 -08:00 |
|
Al
|
f76faafd8c
|
[openaddresses] adding a few house number phrases as well in Colombia
|
2017-02-18 12:03:02 -08:00 |
|
Al
|
adfdc06d14
|
[addresses] using the number dictionary for abbreviations in house number phrases as well
|
2017-02-18 12:00:27 -08:00 |
|
Al
|
7cab675809
|
[openaddresses] adding random formatting to Colombian house numbers that match the {calle}-{building number} format
|
2017-02-18 11:28:47 -08:00 |
|
Al
|
146412f4f8
|
[openaddresses] adding country-specific validators and doing no validation on house numbers in Colombia
|
2017-02-18 11:04:02 -08:00 |
|
Al
|
0e10aa6f46
|
[openaddresses] adding OSM boundaries for Stearns County, MN
|
2017-02-18 10:18:09 -08:00 |
|
Al
|
5a31513092
|
[openaddresses] Adding city of Sioux Falls, SD
|
2017-02-18 10:13:56 -08:00 |
|
Al
|
64e62cac32
|
[openaddresses] adding Bogotá, Colombia
|
2017-02-18 10:13:31 -08:00 |
|
Al
|
4f128579d6
|
[openaddresses] adding Commerce City, CO and creating an alias for the simple unit regex for reuse
|
2017-02-17 14:07:00 -05:00 |
|
Al
|
b88487f633
|
[utils] string_replace_char does single byte/character replacement; new string_replace does full string replacement, again using char_array for safety; string_replace_with_array function for memory reuse
|
2017-02-17 13:58:51 -05:00 |
|
Al
|
da856ea5c3
|
[parser] adding phrase features for category, unit, level, entrance, staircase, and po_box phrases from the libpostal dictionaries, excluding phrases which match the toponyms dictionary (e.g. US states that can also be found in street/venue names, useful for expansion but not here). If the current token is part of both an address dictionary phrase and a component phrase derived from the training data, use the longer of the two, or both if they are the same length
|
2017-02-17 03:00:48 -05:00 |
|
Al
|
5b616dfb57
|
[addresses] allowing neighborhood components to be passed in
|
2017-02-17 02:11:56 -05:00 |
|
Al
|
e7d8577ad7
|
[openaddresses] add city of San Luis Obispo
|
2017-02-16 16:00:23 -05:00 |
|
Al
|
d6281648dc
|
[openaddresses] add Cumberland County, NC
|
2017-02-16 14:49:00 -05:00 |
|
Al
|
1631c25ad0
|
[openaddresses] add city of O'Fallon, IL
|
2017-02-16 14:48:40 -05:00 |
|
Al
|
4c4147f465
|
[openaddresses] add city of Scottsdale, AZ
|
2017-02-16 14:48:17 -05:00 |
|
Al
|
df76cde1e7
|
[openaddresses] adding Pickens County, SC
|
2017-02-16 03:34:49 -05:00 |
|
Al
|
c380b3e91b
|
[parser] phrase search with address dictionaries should not use the language given at training time since it's not currently available at runtime (without pulling in the language classifier, which may be warranted at some point, especially if the model can be made smaller/sparser)
|
2017-02-15 22:32:30 -05:00 |
|
Al
|
a3e51db32d
|
[api] include some of the new components in default address_components for the libpostal expansion API
|
2017-02-15 22:29:22 -05:00 |
|
Al
|
32fb483e96
|
[gazetteers] adding ADDRESS_PO_BOX component
|
2017-02-15 22:23:28 -05:00 |
|
Al
|
ba0ccc82a3
|
[fix] var name in address_parser_train
|
2017-02-15 22:22:33 -05:00 |
|
Al
|
0196fe8736
|
[utils] fixing key_type in hash_get, adding int64_double map
|
2017-02-15 22:20:36 -05:00 |
|
Al
|
be6f48f109
|
[fix] that didn't work, set log level to CRITICAL
|
2017-02-15 14:06:57 -05:00 |
|
Al
|
26bf617a06
|
[fix] prevent Shapely from logging to console
|
2017-02-15 14:00:51 -05:00 |
|
Al
|
a0b508caf6
|
[transliteration] adding no-args option for transliteration_rules script
|
2017-02-15 13:22:33 -05:00 |
|
Al
|
8abfa766fd
|
[fix] paren
|
2017-02-15 02:26:18 -05:00 |
|
Al
|
06003dfbb0
|
[fix] lower probability of name:prefix
|
2017-02-14 18:57:31 -05:00 |
|
Al
|
92b34f6af4
|
[fix] var name
|
2017-02-14 18:53:53 -05:00 |
|
Al
|
ca79342636
|
[fix] config
|
2017-02-14 18:50:51 -05:00 |
|
Al
|
8eafc5730b
|
[parser] adding long-context features which help classify the first token in the string by finding the relative positions of a) the first numeric token and b) the first street-level phrase like "Ave" or "Calle"
|
2017-02-14 18:42:51 -05:00 |
|
Al
|
08976c772e
|
[neighborhoods] base parser config changes for new prefix/first_match options
|
2017-02-14 18:19:15 -05:00 |
|
Al
|
64673c2875
|
[neighborhoods] add neighborhoods that are not the top match occasionally
|
2017-02-14 18:17:48 -05:00 |
|
Al
|
b99e31ca17
|
[neighborhoods] add name:prefix in admin boundaries and neighborhoods (used often in e.g. Germany), use alternative/language keys as well
|
2017-02-14 18:07:13 -05:00 |
|
Al
|
614479aee1
|
[neighborhoods] don't add point with same name as existing OSM polygon
|
2017-02-14 16:40:08 -05:00 |
|
Al
|
854ac853a9
|
[fix] OSM neighborhoods index check
|
2017-02-14 03:53:22 -05:00 |
|
Al
|
a416a314fa
|
[fix] var
|
2017-02-14 03:38:53 -05:00 |
|
Al
|
56f68e4399
|
[phrases] fixing trie suffix search
|
2017-02-14 03:36:29 -05:00 |
|
Al
|
5bbc0e15d7
|
[fix] poly.context
|
2017-02-14 03:09:02 -05:00 |
|
Al
|
1ee0f1fe0d
|
[osm] mapping admin_level=11 to suburb in Germany, admin_level=10 to suburb in Berlin
|
2017-02-14 02:34:47 -05:00 |
|
Al
|
b9c24867d7
|
[fix] set component to suburb on OSM neighborhoods index
|
2017-02-14 02:19:53 -05:00 |
|
Al
|
072a838fde
|
[neighborhoods] components are now pre-calculated by CTH index
|
2017-02-14 02:04:07 -05:00 |
|
Al
|
c91a0bdb91
|
[fix] rm
|
2017-02-14 01:56:24 -05:00 |
|
Al
|
949c10ab22
|
[fix] remove print
|
2017-02-14 01:53:54 -05:00 |
|
Al
|
738bd7b525
|
[neighborhoods] logging, moving OSM/CTH before Quattroshapes for easier testing
|
2017-02-14 01:52:59 -05:00 |
|
Al
|
67f69ce6ce
|
[fix] move
|
2017-02-14 01:51:23 -05:00 |
|
Al
|
6d580f4c87
|
[osm] neighborhood polygon reader
|
2017-02-14 01:50:04 -05:00 |
|
Al
|
6c68d446a0
|
[neighborhoods] adding ClickThatHood config to whitelist/specify what kind of polygon is specified in each file. Adding OSM neighborhoods (ways/relations where place=neighbourhood to reduce ambiguity) as the highest priority, followed by CTH/OSM, CTH, Quattro/OSM, Quattro
|
2017-02-14 01:48:43 -05:00 |
|