libpostal

Author	SHA1	Message	Date
Al	8562c7a5cb	[unicode] Adding wide char support for language disambiguation (comes up in venue names), despite the likelihood of running on a narrow Python build. Rolling back common script chars at a script break, so in the case of e.g. Cyrillic name (Latin name), the segmentation is done at the space before the paren.	2015-09-23 00:37:59 -04:00
Al	19e5457a0f	[unicode] Regenerated unicode scripts data file, using simple integers instead of repeating the enum types for succinctness	2015-09-23 00:36:29 -04:00
Al	4ad3fac627	[unicode] Regenerated unicode script types (ignore extraneous scripts, they're not used, just reside in the upper unicode planes)	2015-09-23 00:35:08 -04:00
Al	13bcc35523	[unicode] Allowing wide chars in unicode properties	2015-09-23 00:34:07 -04:00
Al	f13e9fad90	[tokenization] Regenerated scanner.c	2015-09-23 00:33:27 -04:00
Al	b4593b6f88	[unicode/tokenization] Using new character classes including wide chars in scanner	2015-09-23 00:33:14 -04:00
Al	a76831df7a	[unicode] Wide version of word breaks	2015-09-22 18:55:33 -04:00
Al	25917cfb17	[fix] scripts	2015-09-22 15:15:30 -04:00
Al	b405a53fe1	[fix] chars out of range in get_string_script Python version	2015-09-22 08:14:27 -04:00
Al	ca25b48687	[fix] Not writing empty fields in formatted addresses	2015-09-22 08:13:55 -04:00
Al	747de1944b	[fix] Accounting for unknown scripts in disambiguation	2015-09-21 18:05:28 -04:00
Al	3ac89d7ed9	[setup] fixing packaging	2015-09-21 17:31:15 -04:00
Al	236737eab3	[tokenization/osm] Using utf8 encoded version of string for tokens in python tokenizer	2015-09-21 17:27:43 -04:00
Al	134cf616d6	[osm] Using street for language disambiguation in training data	2015-09-21 04:09:15 -04:00
Al	ccac4a5a90	[fix] package directory	2015-09-21 04:01:36 -04:00
Al	5f912ddcd3	[fix] std=c99	2015-09-21 03:25:32 -04:00
Al	5b2fd0be50	[fix] pytokenize compilation on Ubuntu/gcc	2015-09-21 03:24:14 -04:00
Al	cffa5a4a20	[fix] stdint include	2015-09-20 20:10:47 -04:00
Al	25b3338600	[setup] setup.py for pypostal so it can be installed from the Github url	2015-09-20 20:07:59 -04:00
Al	84cf21df88	[osm] Separating address formatter into its own module, adding some documentation of the various training sets with examples	2015-09-20 20:05:46 -04:00
Al	5485ea2197	[python] Adding initial pypostal bindings for tokenize so we can remove address_normalizer dependency. Not tested on Python 3.	2015-09-20 14:59:39 -04:00
Al	3fab0f984f	[fix] fixing some compiler warnings, using type-specific abs functions for vector_math	2015-09-19 16:11:09 -04:00
Al	6731395ca0	[osm] Separating tagged from untagged output	2015-09-19 14:11:47 -04:00
Al	2940cc15b8	[fix] tokenized string destroy frees original string	2015-09-19 01:40:41 -04:00
Al	2b13871341	[constants] max country code length	2015-09-19 01:39:58 -04:00
Al	0396823772	[fix] geodb path separator	2015-09-19 01:39:31 -04:00
Al	17cfdb0625	[fix] adding char_array_append_* methods to header	2015-09-18 13:19:42 -04:00
Al	f2f7db92ff	[fix] phrases	2015-09-18 13:19:18 -04:00
Al	b74e92adad	[fix] include	2015-09-18 13:18:49 -04:00
Al	2a869894d9	[fix] geodb	2015-09-18 13:18:26 -04:00
Al	9e9131bda0	[parser] Averaged perceptron tagger	2015-09-17 05:51:24 -04:00
Al	8a86f7ec64	[parser] Adding context struct to feature function	2015-09-17 05:48:00 -04:00
Al	87ed7d9a0f	[geodb] Adding trie search methods for finding geodb phrases	2015-09-16 22:11:10 -04:00
Al	e62c75b9c6	[phrases] Adding _with_phrases versions of address dictionary methods for pre-allocated phrases	2015-09-16 21:24:28 -04:00
Al	23103a21d4	[phrases] Adding with_phrases versions of trie search methods for pre-allocated phrases	2015-09-16 21:23:34 -04:00
Al	d5ec005787	[transliteration] Similar init method for transliteration	2015-09-16 21:14:02 -04:00
Al	b11362ab98	[numex] using module init method for building, otherwise passing NULL path uses the default path	2015-09-16 21:13:05 -04:00
Al	3cba2e8df3	[api] Using default setup methods for submodules in libpostal setup	2015-09-15 14:01:33 -04:00
Al	e122824448	[expansion] Adding the ability to search address dictionary phrases with a NULL language, will return phrases in any language	2015-09-15 14:00:26 -04:00
Al	c47ff1b113	[utils] Adding source string to tokenized_string struct	2015-09-15 13:21:51 -04:00
Al	b2f690b6f6	[api] Error logging if modules can't be found	2015-09-15 13:21:15 -04:00
Al	9de3029dd3	[parser] Averaged perceptron training does full examples (greedily). During training, features are a hashtable, sorted and converted to a trie during finalize	2015-09-14 17:38:45 -04:00
Al	a5b5f80b04	[fix] new_copy	2015-09-14 16:50:23 -04:00
Al	3ea6358f77	[fix] vector zeros allocation	2015-09-14 16:50:08 -04:00
Al	c21f61b9b4	[parser] Default address parser path	2015-09-11 15:05:38 -07:00
Al	32c180528f	[tokens] Adding a copy_tokens option for tokenized_string	2015-09-11 15:04:29 -07:00
Al	9ce658b7a3	[collections] Adding string_array for an array of char pointers	2015-09-10 16:34:16 -07:00
Al	35b9122a1a	[utils] inlining a few functions	2015-09-10 16:33:54 -07:00
Al	35f1c02caf	[polygons] Reducing simplify tolerance for language polys now that regional languages are handled separately	2015-09-10 12:44:13 -07:00
Al	440a8158b6	[polygons] Adding in country languages for regional polygons without a default language	2015-09-10 12:34:26 -07:00

819 Commits