At Two Sigma, we look for inspiration wherever we can, and academic papers are some of the richest sources out there. We love papers so much that we’re a Platinum Sponsor of the New York chapter of Papers We Love.

In addition to sponsoring, multiple Two Sigma employees have presented some of their favorite papers at Papers We Love meetups. Matt Adereth, one of our quantitative engineers, recently presented at the San Francisco chapter on a paper entitled *A Scalable Bootstrap for Massive Data*.

A key ingredient of our methodology is using statistics to understand the systems we build and run, but few people consider the quality of those statistics. In statistics, we want quantitative estimates of properties of some (often unknown) distribution, given only a small set of observed data. We compute such an estimate from the data using an estimator, and it is important to understand the quality of that estimator. Measures of estimator quality are statistics about statistics: they give insight into the accuracy and precision of our results.
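To make the idea concrete, here is a minimal sketch in Python (names and parameters are illustrative, not from the talk): the sample median serves as an estimator of a distribution's median, and the point estimate alone tells us nothing about how much it would vary across samples.

```python
import random
import statistics

# Hypothetical example: we observe a small sample drawn from some
# unknown distribution and use the sample median as our estimator.
rng = random.Random(0)
observed = [rng.gauss(10.0, 2.0) for _ in range(50)]

estimate = statistics.median(observed)

# The estimate by itself carries no notion of quality: a different
# sample of 50 points would have produced a different median, and
# nothing here tells us how large that variation might be.
print(estimate)
```

Quantifying that sample-to-sample variation is exactly what measures of estimator quality, such as the standard error, are for.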

The classical formulas for computing these quality measures involve complex mathematics that works only for a select few statistics on a select few distributions. With the advent of computers, however, it turned out that simple algorithms using only basic algebra can compute them for any statistic on any distribution!

In this talk, Matt walks through the history of these solutions, from the earliest algorithms of the 1940s to modern distributed approaches that can run on “big data.”