Natural Language Processing Seminar
Toward Learning by Reading using a New Semantics that Merges Propositional and Distributional Information
Computer systems that educate themselves by reading text have been a longstanding dream of AI. Despite progress in NLP on Information Extraction and Text Mining, no NLP system to date tries to represent the entirety of a single document in depth. One of the main obstacles is the inadequacy of semantic representations of semantic content: the actual meaning of the symbols used in semantic propositions. The traditional extensional and intensional models of semantics are difficult to flesh out in practice, and no large-scale models exist. Recent developments in so-called Distributional Semantics, based either on word co-occurrence statistics or on neural encodings thereof, offer exciting new possibilities that are being actively explored. However, these approaches are not true semantics either, because they fail to meet certain requirements. In this talk I outline one way to combine traditional symbolic logic-based proposition-style semantics (of the kind used in older NLP) with Distributional Semantics. Our core resource is the PropStore, a single lexico-semantic "lexicon" that can be used for a variety of tasks. I describe how to define and build such a lexicon and how to use its contents for various NLP tasks. I also describe experiments on composing its contents to form larger representation units. A serious problem is data sparsity (the PropStore is only about 2% full, despite containing most of Wikipedia and much of Gigaword), and I describe our current efforts to condense its representations into latent dimensions. Using the PropStore as a kind of background knowledge model, one can address learning by reading in a new way.
Eduard Hovy is a professor at the Language Technologies Institute in the School of Computer Science at Carnegie Mellon University. He holds adjunct professorships at universities in the US, China, and Canada, and is co-Director of Research for the DHS Center for Command, Control, and Interoperability Data Analytics, a distributed cooperation of 17 universities. Dr. Hovy completed a Ph.D. in Computer Science (Artificial Intelligence) at Yale University in 1987, and was awarded honorary doctorates from the National Distance Education University (UNED) in Madrid in 2013 and the University of Antwerp in 2015. He is one of the initial 17 Fellows of the Association for Computational Linguistics (ACL). From 1989 to 2012 he directed the Human Language Technology Group at the Information Sciences Institute of the University of Southern California. Dr. Hovy's research addresses several areas in Natural Language Processing, including machine reading of text, question answering, information extraction, automated text summarization, the semi-automated construction of large lexicons and ontologies, and machine translation. His contributions include the co-development of the ROUGE text summarization evaluation method, the BLANC coreference evaluation method, the Omega ontology, the Webclopedia QA Typology, the FEMTI machine translation evaluation classification, the DAP text harvesting method, the OntoNotes corpus, and a model of Structured Distributional Semantics. Dr. Hovy is the author or co-editor of six books and over 350 technical articles and is a popular invited speaker. In 2001 Dr. Hovy served as President of the ACL, in 2001–03 as President of the International Association of Machine Translation (IAMT), and in 2010–11 as President of the Digital Government Society. Dr. Hovy regularly co-teaches courses and serves on Advisory Boards for institutes and funding organizations in Germany, Italy, the Netherlands, and the USA.