Theory Seminar

Max-stable Random Sketches: Estimation of Distances, Norms and Dominance Norms

Stillian Stoev
SHARE:

Consider a non-negative signal f(i), defined over a very large domain 1 <= i <= N. In many applications involving streaming data (e.g. Internet traffic monitoring, on-line transactions, telephone call monitoring, large data bases, etc.), one can rapidly accumulate large amounts of data. It is not feasible to store and/or search such large sets of data, and yet one needs at least some approximate answers to queries on the data. Random sketches have become an important tool in obtaining efficiently (in small time and space) approximate answers to queries on large data sets.

We introduce a novel type of non-linear random sketches called max-stable sketches. That is, as the signal f(i) is observed, we keep only

          
Ej(f) = max{ f(i)Zj(i), 1<= i <=N },
for j=1,…,K, where Zj(i) are KN independent random variables with standard alpha-Frechet distribution and where K << N. We show by using the max-stability property of the Frechet distribution that one can estimate the L-alpha-norm of the signal f by using max-stable sketches. This can be done for an arbitrary alpha > 0. We also show that certain distances can be approximated efficiently. As a by product, we show that one can recover large values of the signal exactly, with high probability, which may be of independent interest.

Max-stable sketches, it turns out, are well suited to dealing with
dominance norms. We show how a result of Cormode and Muthukrishnan on the approximation of the norm of the maxima of several large data sets may be improved.

Stilian Stoev graduated from Boston University in 2005. He is currently an Assistant Professor in the Department of Statistics at the University of Michigan. His interests are in probability theory and statistics, focusing on long-range dependence, stable infinite variance processes, self-similar processes and their applications.

Sponsored by

Stillian Stoev