Relationships between entities, such as friendships among people and similarities between objects, can naturally be represented as a graph (or network), where the objects are the nodes (or vertices) of the graph and edges connect nodes that are related. Social networks, both online and offline, are a prime example of graphs, as are the Web, with its interlinked webpages, and the Internet, a collection of interconnected machines and routers.
Identifying the “important” nodes or edges in a graph is a fundamental task in network analysis, with many diverse applications in fields such as economics, biology, security, and sociology. Several measures of importance, known as centrality indices, have been proposed over the years, formalizing the concept of importance in different ways.
Measuring centrality
Centrality measures rely on graph properties to quantify importance. For example, “betweenness centrality,” one of the most commonly used centrality indices, measures the fraction of shortest paths in the graph that pass through a node, while the “closeness centrality” of a node is the inverse of its average distance to all other nodes. Even PageRank, Google’s original algorithm for ranking web pages, is a centrality measure.
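As a concrete illustration (separate from the tutorial itself), the short sketch below computes these three indices on a small, made-up friendship graph using the Python networkx library; the graph and variable names are purely hypothetical.

```python
# Illustrative sketch: computing the centrality indices mentioned above
# on a toy graph with networkx (the graph itself is hypothetical).
import networkx as nx

# A small undirected "friendship" graph.
G = nx.Graph()
G.add_edges_from([
    ("alice", "bob"), ("bob", "carol"), ("carol", "dave"),
    ("bob", "dave"), ("dave", "erin"),
])

# Betweenness: fraction of shortest paths that pass through each node.
betweenness = nx.betweenness_centrality(G)

# Closeness: inverse of the average shortest-path distance to the other nodes.
closeness = nx.closeness_centrality(G)

# PageRank: score derived from a random walk with restarts over the graph.
pagerank = nx.pagerank(G)

for node in G.nodes():
    print(f"{node}: betweenness={betweenness[node]:.3f}, "
          f"closeness={closeness[node]:.3f}, pagerank={pagerank[node]:.3f}")
```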
With the proliferation of huge networks with millions of nodes and billions of edges, scalable algorithms for computing centrality indices have become increasingly important. A number of contributions have been proposed recently, ranging from heuristics that perform extremely well in practice, to approximation algorithms offering strong probabilistic guarantees, to scalable algorithms for the MapReduce platform.
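To give a flavor of how sampling can trade accuracy for speed, here is a hedged sketch of a generic pivot-sampling estimator for closeness centrality: shortest-path distances are computed only from a random sample of source nodes rather than from every node. The function name and scaling choices are assumptions for illustration, not the specific algorithms covered in the tutorial, and the sketch assumes an unweighted, connected graph.

```python
# Hedged sketch: estimate closeness centrality by running a BFS from only
# k randomly chosen "pivot" nodes instead of from every node.
# Assumes an unweighted, connected networkx graph; names are illustrative.
import random
import networkx as nx

def approximate_closeness(G, k, seed=0):
    rng = random.Random(seed)
    pivots = rng.sample(list(G.nodes()), k)
    n = G.number_of_nodes()
    dist_sums = {v: 0 for v in G.nodes()}
    for p in pivots:
        # Distances from this pivot to every node (single BFS).
        for v, d in nx.single_source_shortest_path_length(G, p).items():
            dist_sums[v] += d
    # Scale the sampled distance sum up to an estimate over all n nodes,
    # then invert: closeness(v) ~= (n - 1) / (estimated total distance from v).
    return {
        v: (n - 1) * k / (n * s) if s > 0 else 0.0
        for v, s in dist_sums.items()
    }

# Larger k gives a more accurate estimate at higher cost, e.g.:
# scores = approximate_closeness(G, k=100)
```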
Two Sigma Labs’ Matteo Riondato, together with coauthors Francesco Bonchi (ISI Foundation, Turin, Italy) and Gianmarco De Francisci Morales (Qatar Computing Research Institute, Doha, Qatar), presented the tutorial Centrality Measures on Big Graphs: Exact, Approximated, and Distributed Algorithms at the 25th World Wide Web Conference, the main scientific event related to the Web, which took place in Montreal, Quebec, in April.
This tutorial presents, in a unified framework, some of the many measures of centrality. It also discusses algorithms to compute them, both exactly and approximately, and both in memory and in a distributed fashion on MapReduce. The goal of this unified presentation is to ease the comparison between the different measures of centrality, the different quality guarantees offered by approximation algorithms, and the different trade-offs and scalability behaviors characterizing distributed algorithms.
The outline and slides are available on the tutorial mini-website.