Faculty Candidate Seminar

Large-Scale Visual Recognition Powered by Big Data

Jia Deng
Post Doc
Stanford

Having machines recognize everything in our visual world is one of the
grand challenges of computer vision. This entails building a system
capable of distinguishing tens of thousands, if not millions, of
fine-grained visual classes across a wide range of domains (for
example, distinguishing different breeds of terriers or different
Toyota models). The past several decades of computer vision research
have mostly focused on recognizing a few dozen basic-level object
categories (such as distinguishing dogs from cars). However, the
problem of visual recognition is necessarily large-scale, and
algorithms must tackle an entirely new level of scale and complexity
in both visual and semantic space. The key challenges include
harvesting data, incorporating domain knowledge that enables
fine-grained distinction, and handling the large, richly structured
output space.

In this talk I will present my research that takes a big data approach
to scaling up recognition. I will start with an overview of the
ImageNet project, which harvests big visual data — tens of millions
of images for tens of thousands of visual classes — through
large-scale crowdsourcing. Next, I will demonstrate how to recognize
fine-grained, subordinate categories via a human-machine
collaboration framework that couples a new image representation with a
computer game that collects novel forms of data from the crowd. Third,
I will explore ways to tackle the large label space, one with tens of
thousands of visual categories organized in a large taxonomy. Here I
will present a provably optimal approach to optimizing the trade-offs
between accuracy and specificity, which leads to a reliable
recognition engine operating on 10K+ object categories. Finally, I
will discuss future directions that hold promise for unleashing the
full power of big data toward large-scale, real-world computer vision.

Jia Deng is a postdoctoral scholar in the Vision Lab at Stanford
University. He received his PhD in Computer Science from Princeton
University in 2012. His research centers on harvesting,
understanding, and harnessing big visual data. He has built datasets
and tools used by more than 1,000 researchers worldwide, and his work
has appeared in the popular press, including the New York Times. He has been
the lead student organizer of the ImageNet Large Scale Visual
Recognition Challenges since 2010. He is also the lead organizer of
the first BigVision workshop at NIPS 2012.
