
University of Grenoble
CNRS, LIG
Grenoble, France
Date
July 25 (Tuesday)
Duration
09:00 - 12:00 (Half-day)
Abstract
Application traces contain valuable information for developers. They can be analyzed to extract workflows, discover usage patterns, or characterize anomalies. The typical toolbox of a data scientist comprises clustering, pattern mining, and classification algorithms to perform this analysis. To extract more fine-grained and reliable information, it is often necessary to process increasingly large amounts of data, to the point that many existing implementations struggle to obtain results. In this tutorial, we will see how Apache Spark, an open-source platform for big data processing, can be used to alleviate this issue. Developing an application with Spark allows the analyst to perform small-scale analysis on a laptop, and then deploy the same code on clusters or clouds to benefit from more processing power when dealing with a large-scale dataset. We will start by presenting Spark’s architecture and computing paradigm. Then, we will describe several analysis scenarios on real application traces and walk through the code of the applications used to mine these traces.
About the Speaker
Vincent Leroy holds an associate professor position at the University
of Grenoble with a research chair from CNRS. He is a permanent member of the
Scalable Information Discovery and Exploitation (SLIDE) research group.
He earned a PhD in large-scale distributed systems for social applications
from Inria Rennes, France, in 2010. From 2010 to 2012, he worked on distributed
search engines at Yahoo! Research Labs in Barcelona, Spain. Vincent’s
research interests lie at the intersection of distributed systems and
large-scale data management. Currently, he is working on the design of
algorithms and architectures to efficiently perform large-scale data
mining.
