Centrality Measures on Big Graphs: Exact, Approximated, and Distributed Algorithms

By Two Sigma on June 4, 2016
A Picture of servers with amazing cable management

Comparing big graph centrality measures, approximation algorithm quality guarantees, and the trade-offs and scalability behaviors of distributed algorithms.

Relationships between entities, such as friendships among people and similarities between objects, can naturally be represented as a graph (or network), where the objects are the nodes (or vertices) of the graph, and edges connect the nodes that are related. Social networks, both online and offline, are a great example of graphs, as are the Web—with its interlinked webpages, and the Internet –as a collection of machines and routers.

Identifying the “important” nodes or edges in a graph is a fundamental task in network analysis, with many diverse applications in fields such as economics, biology, security, and sociology. Several measures of importance, known as centrality indices, have been proposed over the years, formalizing the concept of importance in different ways.

Measuring centrality

Centrality measures rely on graph properties to quantify importance.  For example, “betweenness centrality,” one of the most commonly used centrality indices, counts the fraction of shortest paths going through a node, while the “closeness centrality” of a node is the average sum of the inverse of the distance to other nodes. Even PageRank, Google’s original algorithm for ranking web pages, is a centrality measure.

With the proliferation of huge networks with millions of nodes and billions of edges, the importance of having scalable algorithms for computing centrality indices has become more and more important. A number of contributions have been recently proposed, ranging from heuristics that perform extremely well in practice, to approximation algorithms offering strong probabilistic guarantees, to scalable algorithms for the MapReduce platform.

Two Sigma Labs’ Matteo Riondato, together with coauthors Francesco Bonchi (ISI Foundation, Turin, Italy) and Gianmarco De Francisci Morales (Qatar Computing Research Institute, Doha, Qatar), presented the tutorial Centrality Measures on Big Graphs: Exact, Approximated, and Distributed Algorithms at the 25th World Wide Web Conference, the main scientific event related to the Web, which took place in Montreal, Quebec, in April.

This tutorial presents, in a unified framework, some of the many measures of centrality. It also discusses the algorithms to compute them, both in an exact and in an approximate way, both in-memory and in a distributed fashion in MapReduce. The goal of this unified presentation is to ease the comparison between different measures of centrality, the different quality guarantees offered by approximation algorithms, and the different trade-offs and scalability behaviors characterizing distributed algorithms.

Outline and slides are available on the tutorial mini-website.

The views expressed above are not necessarily the views of Two Sigma Investments, LP or any of its affiliates (collectively, “Two Sigma”).  The information presented above is only for informational and educational purposes and is not an offer to sell or the solicitation of an offer to buy any securities or other instruments. Additionally, the above information is not intended to provide, and should not be relied upon for investment, accounting, legal or tax advice. Two Sigma makes no representations, express or implied, regarding the accuracy or completeness of this information, and the reader accepts all risks in relying on the above information for any purpose whatsoever. Click here for other important disclaimers and disclosures.