Author: Matteo Riondato (Two Sigma)
Presented at: IEEE MIT Undergraduate Technology Research Conference, Massachusetts Institute of Technology, Cambridge, MA
Abstract: Obtaining actionable insights from large datasets requires the use methods that must be, at once, fast, scalable, and statistically sound. This is the field of study of algorithmic data science, a discipline at the border of computer science and statistics. In this talk I outline the fundamental questions that motivate research in this area, present a general framework to formulate many problems in this field, introduce the challenges in balancing theoretical and statistical correctness with practical efficiency, and I show how sampling-based algorithms are extremely effective at striking the correct balance in many situations, giving examples from social network analysis and pattern mining. I will conclude with some research directions and areas for future explorations.