Computer Vision Seminar
Towards Joint Understanding of Images and Language
Numerous real-world tasks can benefit from practical systems that can identify objects in scenes based on language and understand language grounded in visual context. This presentation will focus on my group's recent work on developing systems for jointly modeling images and language. I will talk about neural models for learning cross-modal embeddings for text-to-image and image-to-text search, and about the challenging task of grounding, or localizing, textual mentions of entities in an image. Finally, I will discuss applications of our models to automatic image description and visual question answering.
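As a rough illustration of the cross-modal retrieval setting mentioned above: such systems typically map images and sentences into a shared embedding space and train with a ranking objective so that matching image/caption pairs score higher than mismatched ones. The sketch below is a minimal, hypothetical version of such a triplet ranking loss in NumPy; it is not the speaker's actual model, and the function names and margin value are illustrative assumptions.

```python
import numpy as np

def l2_normalize(x):
    # Project embeddings onto the unit sphere so the dot product
    # below is cosine similarity.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def triplet_ranking_loss(img, txt_pos, txt_neg, margin=0.2):
    # Hinge loss that pulls a matching image/caption pair together and
    # pushes a mismatched caption at least `margin` further away.
    # (Illustrative sketch only; margin=0.2 is an arbitrary choice.)
    img, txt_pos, txt_neg = map(l2_normalize, (img, txt_pos, txt_neg))
    s_pos = np.sum(img * txt_pos, axis=-1)  # similarity to matching caption
    s_neg = np.sum(img * txt_neg, axis=-1)  # similarity to mismatched caption
    return np.maximum(0.0, margin - s_pos + s_neg).mean()
```

In practice the two embedding branches are neural networks (e.g. a CNN for images and a recurrent or embedding-based encoder for text) trained jointly, and the same loss is usually applied in both retrieval directions.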
Svetlana Lazebnik is an Associate Professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign. Her research interests include visual object recognition, scene understanding, and machine learning for big visual data. She received her Ph.D. in 2006 at U of I and was an Assistant Professor at UNC Chapel Hill from 2007 to 2012 before returning to U of I. She is the recipient of an NSF CAREER award, a Microsoft Research Faculty Fellowship, and an Alfred P. Sloan Research Fellowship. In 2016, she received the Longuet-Higgins Prize for a CVPR 2006 paper with significant impact on computer vision research. She serves as an Associate Editor for the International Journal of Computer Vision and the IEEE Transactions on Pattern Analysis and Machine Intelligence, and has served as a Program Chair for ECCV 2012 and Workshops Chair for CVPR 2016.