Theory Seminar

Population Recovery in polynomial time

Mike Saks, Professor, Rutgers

The population recovery problem is an appealing, idealized problem of learning in the presence of noise that was proposed in a 2012 paper of Dvir, Rao, Wigderson and Yehudayoff (DRWY). In this problem we have an unknown distribution D on binary strings of length n, and our goal is to estimate the probability D(s) of a particular string s to within some small additive error. We observe samples drawn from the distribution, but the catch is that each sample is randomly corrupted: for each sample, a random process selects each coordinate independently with probability 1-p, for some fixed p in (0,1). In the lossy version of the problem, each selected bit is replaced by "?"; in the noisy version, each selected bit is replaced by a random bit.
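To make the sampling model concrete, here is a minimal Python sketch of the two corruption processes. It assumes the distribution is given explicitly as a dictionary mapping strings to probabilities; the function names `sample_from`, `lossy`, and `noisy` are hypothetical and only illustrate how samples are generated, not any of the estimation algorithms discussed in the talk.

```python
import random


def sample_from(dist, rng):
    """Draw one string from a distribution given as {string: probability}.

    (Hypothetical helper; the explicit dictionary form is an assumption.)
    """
    strings = list(dist)
    weights = [dist[x] for x in strings]
    return rng.choices(strings, weights=weights, k=1)[0]


def lossy(x, p, rng):
    """Lossy channel: each coordinate is selected independently with
    probability 1 - p and replaced by '?'; unselected bits are kept."""
    return "".join("?" if rng.random() < 1 - p else bit for bit in x)


def noisy(x, p, rng):
    """Noisy channel: each coordinate is selected independently with
    probability 1 - p and replaced by a uniformly random bit."""
    return "".join(rng.choice("01") if rng.random() < 1 - p else bit for bit in x)


if __name__ == "__main__":
    rng = random.Random(0)
    D = {"0011": 0.5, "1010": 0.3, "1111": 0.2}  # toy distribution on n = 4 bits
    print([lossy(sample_from(D, rng), 0.3, rng) for _ in range(5)])
    print([noisy(sample_from(D, rng), 0.3, rng) for _ in range(5)])
```

Note that a lossy sample never contradicts the string it came from (every revealed bit is correct), whereas a noisy sample gives no indication of which bits were replaced; this is one intuitive reason the noisy version is harder.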

DRWY asked whether for each fixed p>0, there is an algorithm that estimates D(s) (in either the lossy or noisy version) in time polynomial in n and 1/b (where b is the allowed additive error), possibly under some reasonable assumptions on the distribution.

For the lossy version, this was shown in a paper by Ankur Moitra and myself, with no assumptions on the distribution. For the (harder) noisy version, it was shown in recent joint work with Anindya De and Sijian Tang, under the assumption that there is a known lower bound on the probability assigned to any string in the support of the distribution.

The solution of the noisy version builds on the lossy version and on previous work of DRWY, of Wigderson and Yehudayoff, and of Lovett and Zhang. It involves a number of techniques: linear programming duality, complex analysis, and discrete Fourier analysis. In this talk I'll survey this work and give some hints of the proof.
