[docs] adding note about the newly-trained language classifier trained with FTRL-Proximal (now 1/10th the size), which keeps its high accuracy while maintaining a sparse solution. This commit will trigger a build with the freshly uploaded model.

Al
2017-04-06 11:43:54 -04:00
parent 5a96be5d5c
commit 5605ba3185
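
The commit message credits FTRL-Proximal with keeping the model sparse without hurting accuracy. A minimal sketch of the per-coordinate update from the paper linked in the diff is given here for orientation. It is not the project's implementation: it handles binary rather than multinomial logistic regression, and the class name, feature names, and hyperparameter values are illustrative assumptions.

```python
import math


class FTRLProximal:
    """Per-coordinate FTRL-Proximal for binary logistic regression (sketch)."""

    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        # alpha/beta set the per-coordinate learning rate; l1/l2 are the
        # regularization strengths. All four values are illustrative.
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = {}  # accumulated adjusted gradients, per feature
        self.n = {}  # accumulated squared gradients, per feature

    def weight(self, i):
        """Weights are derived lazily from z and n; small |z| gives exactly 0."""
        z = self.z.get(i, 0.0)
        if abs(z) <= self.l1:
            return 0.0  # the L1 threshold is what makes the solution sparse
        n = self.n.get(i, 0.0)
        sign = 1.0 if z > 0 else -1.0
        denom = (self.beta + math.sqrt(n)) / self.alpha + self.l2
        return -(z - sign * self.l1) / denom

    def predict(self, x):
        """x maps feature name -> value; returns P(y = 1 | x)."""
        s = sum(self.weight(i) * v for i, v in x.items())
        s = max(min(s, 35.0), -35.0)  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-s))

    def update(self, x, y):
        """One online step on example (x, y) with y in {0, 1}."""
        p = self.predict(x)
        for i, v in x.items():
            g = (p - y) * v  # gradient of the log loss for coordinate i
            n_i = self.n.get(i, 0.0)
            w_i = self.weight(i)  # uses the state from before this step
            sigma = (math.sqrt(n_i + g * g) - math.sqrt(n_i)) / self.alpha
            self.z[i] = self.z.get(i, 0.0) + g - sigma * w_i
            self.n[i] = n_i + g * g
        return p


# Toy usage: "a" signals the positive class, "b" the negative class,
# and "rare" is seen only once, so its weight stays at exactly zero.
model = FTRLProximal()
model.update({"a": 1.0, "rare": 1.0}, 1)
for _ in range(200):
    model.update({"a": 1.0}, 1)
    model.update({"b": 1.0}, 0)
sparse_weights = {i: model.weight(i) for i in ("a", "b", "rare")}
```

In this sketch a feature only gets a non-zero weight once its accumulated gradient exceeds the L1 threshold, which is the mechanism that lets rarely seen features drop out of the stored model.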

@@ -433,8 +433,8 @@ tagged traning examples for every inhabited country in the world. Many types of
 are performed to make the training data resemble real messy geocoder input as closely as possible.
 - **Language classification**: multinomial logistic regression
-trained on all of OpenStreetMap ways, addr:* tags, toponyms and formatted
-addresses. Labels are derived using point-in-polygon tests in Quattroshapes
+trained (using the [FTRL-Proximal](https://research.google.com/pubs/archive/41159.pdf) method to induce sparsity) on all of OpenStreetMap ways, addr:* tags, toponyms and formatted
+addresses. Labels are derived using point-in-polygon tests for both OSM countries
 and official/regional languages for countries and admin 1 boundaries
 respectively. So, for example, Spanish is the default language in Spain but
 in different regions e.g. Catalunya, Galicia, the Basque region, the respective