[parsing] Add an optimization to the parser API: if the entire input is a single known geographic phrase such as "New York", return the most likely label for that phrase from the training data. That way, e.g., a search for 'Florida' doesn't get tagged as 'house'. This affects only prediction, not training.
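
The whole-input shortcut described above could be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the table `known_phrases`, the struct `phrase_label_t`, and the functions `normalize_phrase` and `whole_phrase_label` are hypothetical names, and the table entries are made-up examples standing in for phrase/label statistics that the real parser takes from its training data. Only the label strings match the `ADDRESS_PARSER_LABEL_*` defines added in the diff.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Label strings, matching the defines added in this commit. */
#define ADDRESS_PARSER_LABEL_CITY "city"
#define ADDRESS_PARSER_LABEL_STATE "state"
#define ADDRESS_PARSER_LABEL_COUNTRY "country"

/* Hypothetical table: each known phrase (lowercase, trimmed) paired with
 * the label assumed to be most frequent for it in the training data. */
typedef struct {
    const char *phrase;
    const char *label;
} phrase_label_t;

static const phrase_label_t known_phrases[] = {
    {"florida", ADDRESS_PARSER_LABEL_STATE},
    {"new york", ADDRESS_PARSER_LABEL_CITY},
    {"france", ADDRESS_PARSER_LABEL_COUNTRY},
};

/* Copy input into out, trimmed of leading/trailing whitespace and
 * lowercased (ASCII only). Internal whitespace is left untouched.
 * Returns 1 on success, 0 if the trimmed input does not fit in out. */
static int normalize_phrase(const char *input, char *out, size_t out_size) {
    assert(out_size > 0);
    while (*input != '\0' && isspace((unsigned char)*input)) {
        input++;
    }
    size_t len = strlen(input);
    while (len > 0 && isspace((unsigned char)input[len - 1])) {
        len--;
    }
    if (len >= out_size) {
        return 0;
    }
    for (size_t i = 0; i < len; i++) {
        out[i] = (char)tolower((unsigned char)input[i]);
    }
    out[len] = '\0';
    return 1;
}

/* Return the stored label if the ENTIRE input is one known phrase,
 * otherwise NULL, meaning the caller should fall back to the regular
 * token-by-token prediction. */
const char *whole_phrase_label(const char *input) {
    char normalized[128];
    if (input == NULL ||
        !normalize_phrase(input, normalized, sizeof(normalized))) {
        return NULL;
    }
    size_t n = sizeof(known_phrases) / sizeof(known_phrases[0]);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(normalized, known_phrases[i].phrase) == 0) {
            return known_phrases[i].label;
        }
    }
    return NULL;
}
```

In this sketch, `whole_phrase_label("Florida")` yields the `state` label, while `whole_phrase_label("123 Florida Ave")` yields `NULL` because the phrase is only part of the input, so normal prediction would run. A linear scan is used purely for brevity; a real phrase dictionary would more likely use a hash table or trie.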

Al
2016-01-15 20:07:21 -05:00
parent 24b4a680c3
commit d4143c1685
2 changed files with 58 additions and 3 deletions


@@ -96,6 +96,17 @@ typedef enum {
NUM_ADDRESS_PARSER_TYPES
} address_parser_components;
#define ADDRESS_PARSER_LABEL_HOUSE "house"
#define ADDRESS_PARSER_LABEL_HOUSE_NUMBER "house_number"
#define ADDRESS_PARSER_LABEL_ROAD "road"
#define ADDRESS_PARSER_LABEL_SUBURB "suburb"
#define ADDRESS_PARSER_LABEL_CITY_DISTRICT "city_district"
#define ADDRESS_PARSER_LABEL_CITY "city"
#define ADDRESS_PARSER_LABEL_STATE_DISTRICT "state_district"
#define ADDRESS_PARSER_LABEL_STATE "state"
#define ADDRESS_PARSER_LABEL_POSTAL_CODE "postal_code"
#define ADDRESS_PARSER_LABEL_COUNTRY "country"
typedef union address_parser_types {
uint32_t value;
struct {