Scaling Up Object Recognition: What Have We Done? And Where Are We Going?
A cornerstone ability of an advanced intelligent system is to recognize hundreds of thousands of objects, the building blocks of our visual world. To achieve this goal, researchers need millions or even billions of images annotated with accurate labels for training and benchmarking algorithms. Until only a few years ago, limited by labor and funding, computer vision scientists worked with datasets consisting of only thousands of images annotated across a few dozen object classes. In 2008, our lab started a project called ImageNet, aiming to build the largest annotated image dataset in computer vision research. Our goal was to put together a dataset of tens of millions of images annotated with object classes found in the English dictionary (about twenty thousand of them!), an impossible mission using the traditional approach of hiring subjects on university campuses. Instead, we used crowdsourcing to recruit tens of thousands of online workers through the Amazon Mechanical Turk platform to help us label more than half a billion images. In the first part of this talk, we will give a brief account of our experience with this exciting and emerging technology and of what we have done to build the largest image dataset in our research community (Deng et al. CVPR 2009, ECCV 2010). In the second part, we explore a new paradigm we call "human-machine collaboration" for developing large-scale and fine-grained object recognition algorithms. We claim that it is critical to inject human knowledge when tackling the problem of object recognition. We will show recent work on using ontological knowledge for object recognition (Deng et al. CVPR 2012), as well as unpublished work on using a cleverly designed crowd game in a human-machine collaboration framework for fine-grained object recognition.
Prof. Fei-Fei Li is an associate professor in the Computer Science Department at Stanford University. Her main research interest is in vision, particularly high-level visual recognition. In computer vision, Fei-Fei's interests span from object and natural scene categorization to human activity categorization in both videos and still images. In human vision, she has studied the interaction of attention with natural scene and object recognition, and the decoding of human brain fMRI activity involved in natural scene categorization using pattern recognition algorithms. Fei-Fei graduated from Princeton University in 1999 with a physics degree. She received her PhD in electrical engineering from the California Institute of Technology in 2005. From 2005 to August 2009, Fei-Fei was an assistant professor, first in the Electrical and Computer Engineering Department at the University of Illinois Urbana-Champaign and then in the Computer Science Department at Princeton University. Fei-Fei is a recipient of a Microsoft Research New Faculty award, the Alfred Sloan Fellowship, a number of Google Research Awards, an NSF CAREER award, and an IEEE CVPR 2010 Best Paper Honorable Mention, and is a winner of a number of international visual computing competitions (AAAI-SVRC 2007, PASCAL VOC 2011). (Fei-Fei publishes using the name L. Fei-Fei.)