CSE Seminar

Coding Theory in the Big Data Era

Mahdi CheraghchiAssistant ProfessorImperial College London

The pioneering work of Shannon in 1948 introduced information and coding theory as a mathematical tool for studying communications systems. However, in recent decades, the theory has evolved into an indispensable tool in theoretical computer science and for studying virtually all aspects of information processing. In this talk, I will highlight recent progress in both combinatorial and algorithmic aspects.

The deletion channel is a classical model of point-to-point communication in which a random subsequence of a transmitted sequence is delivered to the receiver. This model has been a subject of study in information theory since the 1960s, originally for the design of communications systems in the presence of synchronization errors. However, over the past decade, the problem has garnered major interest within the theoretical computer science community due to its rich combinatorial structure (in particular, its connection with the Edit Distance) and significance in emerging DNA storage applications. On the other hand, determining the information capacity of the deletion channel is generally regarded as one of the most challenging open problems in information theory. Until recently, the known nontrivial bounds on the capacity for positive deletion probabilities relied on extensive computer search that reveals little about the mathematical structure of the problem.

I will give an overview of the recent successful and fully analytic attempts (starting [Che., STOC'18, J. ACM]) towards an eventual characterization of the deletion capacity. Rather unexpectedly, this line of work suggests that understanding the deletion channel may require a better understanding of basic models for fiber optics communication, also among the most challenging open problems in information theory since the 1960s. In fact, the exact same techniques have been used to significantly advance the state of the art on both fronts. This showcases an interplay between techniques from different areas.

I will continue the discussion of how the role of coding theory has evolved in the big data era by giving an example in algorithm design. I will present a recent algorithm for deterministically estimating the Fourier transform of high-dimensional data up to exponentially faster than the classical FFT method [Che. and Indyk, SODA'16, T. ALG]. This demonstrates how error-correcting codes crucially work "under the hood" as a main technical tool for the design of algorithms for big data.

This talk is aimed at a general audience; no technical background in information and coding theory will be assumed.
Mahdi Cheraghchi is a Lecturer (US equivalent: Assistant Professor) of Computing at Imperial College London, UK. Before joining Imperial in 2015, he held post-doctoral appointments at UC Berkeley (the Simons Institute for the Theory of Computing), MIT (with Piotr Indyk), CMU (with Venkatesan Guruswami), the University of Texas at Austin (with David Zuckerman), as well as a visiting engineer position at Qualcomm (with Michael Luby). He obtained his Ph.D. in 2010 from EPFL (with Amin Shokrollahi), where he received the Patrick Denantes Memorial Prize for outstanding doctoral thesis in the School of Computer and Communication Sciences. Mahdi is broadly interested in all theoretical aspects of computer science, especially the role of information and coding theory in cryptography, complexity, algorithms, and high-dimensional geometry. He is a senior member of the ACM and IEEE.

Sponsored by