AI Seminar

How should a robot perceive the world?

Ashutosh Saxena
Assistant Professor
Cornell University

In order for a robot to perform tasks in human environments, it first needs to figure out "what" to perceive. While for some robotic tasks (such as an object-finding robot) this is relatively straightforward (e.g., infer the object labels from RGB-D data), many other robotic tasks require a robot to be more creative about what to perceive. For example, for a robot to arrange a disorganized room, it would need to perceive human preferences about the usage of objects as well as the low-level manipulation strategies. In this talk, I will illustrate the issues surrounding "what to perceive" through a few examples.

The key to figuring out "how" to perceive lies in being able to model the underlying "structure" in the problem. I propose that for reasoning about human environments, it is the humans that are the true underlying structure in the problem. This is true not only for tasks that involve humans explicitly (such as human activity detection), but also for tasks in which a human was never observed! In this talk, I will present learning algorithms that model such underlying structure in the problem.

Finally, I will present several robotic applications to answer "why" a robot should perceive the human environment. These applications range from single-image-based aerial vehicle navigation to personal robots performing tasks such as unloading items from a dishwasher, loading a fridge, arranging a disorganized room, and performing assistive tasks in response to human activities.

Ashutosh Saxena is an assistant professor in the computer science department at Cornell University. His research interests include machine learning, robotic perception and computer vision. He received his MS in 2006 and Ph.D. in 2009 from Stanford University, and his B.Tech. in 2004 from the Indian Institute of Technology (IIT) Kanpur. He was a recipient of the National Talent Scholar award in India and a Google Faculty award in 2011. He was also named an Alfred P. Sloan Research Fellow in 2011 and a Microsoft Faculty Fellow in 2012.

Ashutosh has developed Make3D, an algorithm that converts a single photograph into a 3D model. Tens of thousands of users have used this technology to convert their pictures to 3D. He has also developed algorithms that enable robots (such as STAIR and POLAR) to perform household chores such as unloading items from a dishwasher and placing items in a fridge. His work has received a substantial amount of attention in the popular press, including the front page of the New York Times, BBC, ABC, New Scientist, Discovery Science, and Wired Magazine. He has won best paper awards at 3DRR and IEEE ACE, and was named a co-chair of the IEEE technical committee on robot learning.
