Interactive Systems Seminar

Learning Syntax and Semantics for Machine Translation

Professor David Chiang, Research Assistant Professor, USC Department of Computer Science

Machine translation, or automatic translation of human languages, is one of the oldest problems in computer science, dating back to the 1950s. Broadly, two approaches to the problem have been taken: one that relies on knowledge of linguistic structure and meaning, and another that relies on statistics from large amounts of data. For years, these two approaches seemed at odds with each other, but recent developments have brought great progress towards building translation systems according to the maxim, “linguistics tells us what to count, and statistics tells us how to count it” (Joshi).

I will give an overview of three such developments from our research group. The first is the introduction of formal grammars (namely, synchronous context-free grammars) to model the syntax of human languages, first successfully demonstrated by my system Hiero. The second is ongoing work to incorporate knowledge of formal semantics. I will describe the formalism we are currently working with (synchronous hyperedge replacement grammars) and the efficient algorithms we have developed for processing semantic structures. Finally, I will discuss initial results on learning word meanings using neural networks, and prospects for learning them across languages.

David Chiang is a Research Assistant Professor in the USC Department of Computer Science and a Project Leader at the USC Information Sciences Institute. He earned his PhD from the University of Pennsylvania in 2004. His research focuses on computational models for learning human languages, particularly how to translate from one language to another. His work on applying formal grammars and machine learning to translation has been recognized with two best paper awards (at ACL 2005 and NAACL HLT 2009) and has transformed the field of machine translation. He has received research grants from DARPA, NSF, and Google, has served on the executive board of NAACL and the editorial board of Computational Linguistics, and is currently on the editorial board of Transactions of the ACL.
