Theory Seminar
Efficient Density Evaluation for Smooth Kernels
Given a kernel function k and a dataset P (of points from R^d), the kernel density function of P at a point q from R^d is equal to KDF_P(q) := 1/|P| sum_{p in P} k(p,q). Kernel density evaluation has numerous applications, in scientific computing, statistics, computer vision, machine learning and other fields. In all of them it is necessary to evaluate KDF_P(q) quickly, often for many inputs q and large point-sets P.
In this paper we present a collection of algorithms for efficient KDF evaluation under the assumptions that the kernel k is "smooth" , i.e., the value changes at most polynomially with the distance. This assumption is satisfied by several well-studied kernels, including the (generalized) t-student kernel and rational quadratic kernel. For smooth kernels, we give a data structure that, after O(dn log(Phi n)/eps^2) preprocessing, estimates KDF_P(q) up to a factor of 1+eps in O(d log (Phi n)/eps^2) time, where Phi is the aspect ratio. The log(Phi n) term can be further replaced by log n under an additional decay condition on k, which is satisfied by the aforementioned examples.
We further extend the results in two ways. First, we use low-distortion embeddings to extend the results to kernels defined for spaces other than l_2. The key feature of this reduction is that the distortion of the embedding affects only the running time of the algorithm, not the accuracy of the estimation. As a result, we obtain (1+eps)-approximate estimation algorithms for kernels over other l_p norms, Earth-Mover Distance, and other metric spaces. Second, for smooth kernels that are decreasing with distance, we present a general reduction from density estimation to approximate near neighbor in the underlying space. This allows us to construct algorithms for general doubling metrics, as well as alternative algorithms for l_p norms and other spaces.
Joint work with Moses Charikar, Piotr Indyk and Paris Siminelakis.