William Gould Dow Distinguished Lecture

Why do large AI models display new and complex skills?

Sanjeev Arora
Charles C. Fitzmorris Professor of Computer Science
Princeton University
WHERE:
1670 Beyster Building

The Department of Electrical Engineering and Computer Science, CSE Division, is pleased to announce the 20th William Gould Dow Distinguished Lecture.

Why do large AI models display new and complex skills?

Abstract: At the heart of today’s AI advances is an interesting phenomenon: as large language models (LLMs) are scaled up (at great expense!), they display new skills and capabilities. This phenomenon is poorly understood, and a mechanistic explanation via mathematical analysis of training appears difficult. The talk describes a new conceptual framework (joint work with Anirudh Goyal) that formalizes “complex skills” and proves that the emergence of such complex skills is implied by the famous (and empirically discovered) scaling laws for LLMs (Kaplan et al., 2020; Hoffmann et al., 2022). A notable prediction of our theory is that scaling up language models can allow them to acquire the capability to combine up to k skills while solving tasks, despite never having seen any example in the training corpus that involved this skill combination. This prediction was recently verified experimentally via a new SKILL-MIX evaluation (Yu et al., 2023) for GPT-4. We also discuss the relevance of such evaluations to current discussions of AI capabilities and AI safety.

Biography: Sanjeev Arora is the Charles C. Fitzmorris Professor of Computer Science at Princeton University and Director of Princeton Language and Intelligence. He has received the ACM Doctoral Dissertation Award (1995), the Packard Fellowship (1997), the Simons Investigator Award (2012), the Gödel Prize (2001 and 2010), the Fulkerson Prize (2012), and the ACM Prize in Computing (2011). He is a member of the National Academy of Sciences and a Fellow of the ACM and the AAAS.