Analysis and Categorization of a Large Volume of Search Queries Using Big Data Technologies
The talk took place on 09.10.2014, from 10:50 to 11:10
As part of the event "Big data - veľké očakávania. A realita?"
Classifying short texts into a predefined hierarchy of categories is a challenge. The need to categorize short texts arises in multiple domains: keywords and queries in online advertising, improving search results, analyzing tweets or messages in social networks, etc. Our session will share the strategies and successes we have had in categorizing large volumes of queries and keywords using open document collections (Wikipedia, DBPedia, Freebase), Hadoop, and Solr. Our approach reuses the knowledge encoded in Wikipedia articles to build a classification model. We will describe the dataflow we built at Magnetic to process large sets of queries and keywords and categorize them at scale.
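
To make the idea of reusing category-labeled articles as a classification model more concrete, here is a minimal sketch in Python. It is not the dataflow built at Magnetic, which the abstract does not detail. It assumes a tiny hypothetical in-memory corpus (`ARTICLES`) in place of Wikipedia, a hand-rolled TF-IDF inverted index in place of Solr, and a simple "sum the scores of the top matching articles per category" rule. All names, categories, and parameters (`build_index`, `classify`, `top_k`) are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for category-labeled Wikipedia articles.
ARTICLES = [
    ("Sports", "football is a team sport played with a ball between two teams"),
    ("Sports", "tennis is a racket sport played on a court with a ball"),
    ("Technology", "a laptop is a portable personal computer with a screen and keyboard"),
    ("Technology", "a smartphone is a mobile phone with a touch screen and apps"),
    ("Travel", "a hotel provides paid lodging for travelers on a short term basis"),
    ("Travel", "an airline provides air transport flights for travelers and freight"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a real analyzer would do much more)."""
    return text.lower().split()


def build_index(articles):
    """Build an inverted index: term -> {article id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, (_, text) in enumerate(articles):
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def classify(query, articles, index, top_k=3):
    """Rank categories for a short query.

    Articles are scored by TF-IDF overlap with the query; the scores of the
    top_k articles are then summed per category.
    """
    n_docs = len(articles)
    doc_scores = Counter()
    for term in set(tokenize(query)):
        postings = index.get(term, {})
        if not postings:
            continue  # term never seen in the corpus
        idf = math.log(1 + n_docs / len(postings))
        for doc_id, tf in postings.items():
            doc_scores[doc_id] += tf * idf

    category_scores = Counter()
    for doc_id, score in doc_scores.most_common(top_k):
        category_scores[articles[doc_id][0]] += score
    return category_scores.most_common()


INDEX = build_index(ARTICLES)

if __name__ == "__main__":
    for q in ["cheap hotel for travelers", "laptop keyboard", "tennis ball"]:
        print(q, "->", classify(q, ARTICLES, INDEX))
```

In a setup like the one the abstract mentions, the inverted index and scoring would presumably be delegated to a search engine such as Solr, and the per-query loop would be run in bulk (for example as a Hadoop job). The aggregation rule used here is only one of several plausible choices.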