Dataworks Summit: Text Classification with R, Apache Solr and D3.js

Dataworks Summit Europe 2017, April 5 to 6, ICM Munich
Talk on April 5 at 11:30am in room 11

From April 5 to 6, Hortonworks hosts the Dataworks Summit Europe 2017 in Munich. Stephanie Fischer and Christian Winkler from mgm will be on site and give a talk in the “Apache Spark and Data Science” track. Their talk, “Classifying Unstructured Text – A Hybrid Deterministic/ML Approach”, provides a practical introduction to the automatic classification of text as it is produced every day in social media and on content-driven web portals. Starting from example visualizations, they demonstrate how text classification works with machine learning and how training sets can be extended deterministically. All examples use freely available, pre-categorized data, which invites attendees to run their own experiments. The tools used are R, Apache Solr and D3.js, along with other Apache tools for natural language processing and machine learning.