Commit Graph

11 Commits

Author SHA1 Message Date
Al
44908ff95a [parser] No digit normalization in training data-derived parser phrases (for postcodes, etc.), phrases include the new island type, house number phrases if any are valid. Adjacent words are now full phrases if they are part of a multiword token like a city name. For hyphenated names like Carmel-by-the-Sea, adding a version to the phrase dictionary where the hyphens are replaced with spaces 2016-07-21 17:04:57 -04:00
Al
0a8f46bdc3 [parser] Using new geonames designations in parser features 2016-07-21 17:04:57 -04:00
Al
1b94727871 [fix] Check that parser is loaded in parse_address, log and return NULL instead of segfaulting 2016-03-21 18:04:26 -04:00
Al
d4143c1685 [parsing] Adding an optimization to the parser API where, if the entire input is a single known geographic phrase like New York, it returns the most likely label from the training data. That way e.g. a search for 'Florida' doesn't get tagged as 'house.' This doesn't affect training, only prediction. 2016-01-15 20:07:21 -05:00
Al
b9bf5c629e [fix] Moving address_parser_response_destroy into libpostal so caller can free 2015-12-15 00:52:24 -05:00
Al
fe4c528f26 [parser] Using different char_array for each of the potential phrases as token i 2015-12-12 03:23:26 -05:00
Al
e6303f70f3 [fix] removing printf 2015-12-11 02:53:22 -05:00
Al
88b8023ac8 [fix] Bug in address parser feature extraction, can hold onto the wrong pointer 2015-12-10 18:42:28 -05:00
Al
cfd0dc69f2 [parsing] Using the entire phrase as the ith word 2015-12-07 01:19:38 -05:00
Al
24208c209f [parsing] Adding a training data derived index of complete phrases from suburb up to country. Only adding bias and word features for non phrases, using UNKNOWN_WORD and UNKNOWN_NUMERIC for infrequent tokens (not meeting minimum vocab count threshold). 2015-12-05 14:34:19 -05:00
Al
89677d94a3 [parsing] Initial commit of the address parser, training/testing, feature function, I/O 2015-11-30 14:48:13 -05:00