Dissertation Defense
Streaming Sketching: Mathematical Theory and Practical Algorithms
This event is free and open to the publicAdd to Google Calendar

Hybrid Event: 3725 BBB / Zoom
Abstract: Streaming sketches are data structures that can approximately answer queries of a data stream, whose size is much smaller than the stream length or the domain size. This thesis studies a wide range of topics on streaming sketching including approximate counting, cardinality estimation, tractability of queries, universality of sketches, sequential sketches, as well as sampling. The investigation leads to progress in both theoretical understanding of streaming sketches as well as the development of practical algorithms.
For the theoretical direction I bring new tools to help understanding of streaming computation with suitable simplifying assumptions. I connect streaming sketches with the theory of Lévy processes. The Lévy-Khintchine representation theorem shades new light on the tractability of queries. It also implies the way that fresh randomness and indexed randomness should be mixed for obtaining streaming $G$-samplers with precisely correct sampling probabilities. I also construct a new universal sketch whose error is determined by the harmonic structure of a query. Furthermore, I study cardinality sketches through two information measures—Shannon’s entropy and Fisher information. A natural goodness metric of such sketches can be defined based on these two measures.