Computer Vision Seminar
Image Synthesis for Self-Supervised Visual Representation Learning
In recent years, deep convolutional networks have proven extremely adept at discriminative labeling tasks. Not only do such networks solve the direct task, but they also learn an effective, general representation of the visual world. We explore the use of deep networks for image generation, or synthesis. Generation is challenging: it is difficult to characterize the perceptual quality of an image, and often there is more than one "correct" answer. However, we show that networks can indeed perform the graphics task of image generation, and in doing so learn a representation of the visual world, even without the need for hand-curated labels.
We propose BicycleGAN, a general system for image-to-image translation problems, with the specific aim of capturing the multimodal nature of the output space. We study image colorization in greater detail and develop both automatic and user-guided approaches. Moreover, colorization, and cross-channel prediction in general, is a simple but powerful pretext task for self-supervised representation learning. We demonstrate strong transfer to high-level semantic tasks, such as image classification, and to low-level human perceptual similarity judgments. For the latter, we collect a large-scale dataset of human perceptual similarity judgments and find that our method outperforms traditional metrics such as PSNR and SSIM. We also find that many unsupervised and self-supervised representations transfer strongly, performing comparably to fully-supervised methods.
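To make the cross-channel prediction idea concrete, the sketch below trains a toy network to predict the ab color channels of a Lab image from its L (lightness) channel. This is only a minimal illustration assuming PyTorch and scikit-image are available; the tiny architecture, the regression loss, and the hyperparameters are placeholders and not the method presented in the talk (which, among other differences, uses a classification loss over quantized ab bins).

```python
# Minimal sketch: colorization as a cross-channel prediction pretext task.
# Illustrative only; network, loss, and sizes are placeholder choices.
import numpy as np
import torch
import torch.nn as nn
from skimage import color

def rgb_to_pretext_pair(rgb):
    """Split an RGB image (H, W, 3), float in [0, 1], into input L and target ab channels."""
    lab = color.rgb2lab(rgb)
    L = torch.tensor(lab[..., :1], dtype=torch.float32).permute(2, 0, 1) / 100.0   # lightness, roughly [0, 1]
    ab = torch.tensor(lab[..., 1:], dtype=torch.float32).permute(2, 0, 1) / 110.0  # color, roughly [-1, 1]
    return L.unsqueeze(0), ab.unsqueeze(0)

# A deliberately small encoder-decoder; the learned features are what would
# transfer to downstream tasks such as classification.
net = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 2, 3, padding=1), nn.Tanh(),
)
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)

rgb = np.random.rand(64, 64, 3)                 # stand-in for a real training image
L, ab = rgb_to_pretext_pair(rgb)
pred_ab = net(L)                                # predict color from lightness alone
loss = nn.functional.mse_loss(pred_ab, ab)      # simple regression objective for this sketch
loss.backward()
optimizer.step()
```

In this setup no human labels are needed: the "label" (the ab channels) is carved out of the image itself, which is what makes cross-channel prediction a self-supervised pretext task.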
Richard Zhang is a research scientist at Adobe Research, San Francisco. He recently obtained his PhD in EECS at UC Berkeley, advised by Professor Alexei A. Efros. His research interests are in computer vision, deep learning, machine learning, and graphics. He graduated summa cum laude with BS and MEng degrees from Cornell University in ECE in 2010. He is a recipient of the 2017 Adobe Research Fellowship.