TRIÈST: Counting Triangles in Edge Streams

By Two Sigma on August 15, 2016
The Pregolya River in Kaliningrad (formerly Königsberg), seven of whose bridges inspired 18th-century Swiss mathematician Leonhard Euler to formalize the discipline now known as graph theory.

TRIÈST is a suite of sampling-based, one-pass algorithms for approximate triangle counting from fully-dynamic edge streams.

research article co-authored by a Two Sigma scientist was recently accepted as a full paper—and won Best Student Paper Award (Research Track)—at the 22nd ACM International Conference on Knowledge Discovery and Data Mining (ACM KDD2016). The conference is the main venue for novel research in data mining and data science.

TRIÈST, the suite of algorithms introduced in the paper, is used for approximate triangle counting in large graphs from fully-dynamic edge streams.

The value of high-quality approximations

Graphs are a natural representation for many human and artificial phenomena. Since available graph datasets can be extremely large, a fast and exact computation of characteristic quantities of such datasets is often impractical or infeasible. Additionally, these datasets are by nature very dynamic, with edges and nodes constantly being added and removed. Such datasets can then be seen as a stream of edge additions and deletions.

Characteristic quantities in these graphs are intrinsically volatile, so there is limited added value in computing them exactly. As a practical matter, the goal is rather to keep track, at all times, of a high-quality approximation of these quantities. For efficiency, the algorithms performing this task should aim to exploit the available memory space as much as possible, and they should require only one pass over the stream.

Particularly useful characteristic quantities are the graph-global and vertex-local numbers of triangles in the graph. These quantities are fundamental primitives with many applications, including community detection, topic mining, span and anomaly detection, and protein interaction network analysis.

A brief introduction to TRIÈST

Together with co-authors Lorenzo De Stefani and Eli Upfal from Brown University, and Alessandro Epasto from Google, Two Sigma Labs research scientist Matteo Riondato has developed TRIÈST, a suite of sampling-based, one-pass algorithms for approximate triangle counting from fully-dynamic edge streams. TRIÈST works by storing in main memory a small, fixed-size subset of the edges seen on the stream (i.e., a sample), and it uses this sample to update the triangle estimates continuously.

The estimations computed by TRIÈST are extremely accurate and fast to compute: TRIÈST can process up to 100,000 edges per second, and the estimation error is within 20% of the true value (a nine-fold reduction compared to the previous state of the art).

The views expressed above are not necessarily the views of Two Sigma Investments, LP or any of its affiliates (collectively, “Two Sigma”).  The information presented above is only for informational and educational purposes and is not an offer to sell or the solicitation of an offer to buy any securities or other instruments. Additionally, the above information is not intended to provide, and should not be relied upon for investment, accounting, legal or tax advice. Two Sigma makes no representations, express or implied, regarding the accuracy or completeness of this information, and the reader accepts all risks in relying on the above information for any purpose whatsoever. Click here for other important disclaimers and disclosures.