Faculty Candidate Seminar
How Accurate are our Maps of the Internet? The Bias of Traceroute Sampling
Add to Google Calendar
Assoc. Prof. Cris Moore is from the University of New Mexico and the Santa Fe Institute
A great deal of effort has been spent measuring the topology of the Internet. However, it was recently argued by Lakhina et al. that sampling based on traceroutes from a small number of sources introduces a fundamental bias in the observed degree distribution. My Ph.D. student Aaron Clauset and I were able to prove this bias analytically: even if a graph is completely random, traceroute sampling gives the illusion of a power-law degree distribution. For graphs whose degree distributions really do have power-law tails P(k) ~ k^{-alpha}, traceroute sampling can significantly underestimate the value of the exponent alpha, unless we use a large number of sources. In joint work with Dimitris Achlioptas and David Kempe, we extended these results to more general graphs by answering the following mathematical question: if I have a random graph with a given degree distribution and I grow a breadth-first spanning tree on it, what is the likely degree distribution of that tree? I will discuss several analytic techniques we found useful, including differential equations for processes on random graphs.