Systems Seminar - CSE

Building and Interacting with Domain-Specific Knowledge Bases

Yunyao Li
Master Inventor, Research Manager, and Research Staff Member
IBM Research - Almaden

The ability to build and interact with large-scale domain-specific knowledge bases is the foundation for many cognitive systems. We use an ontology-driven approach to create and consume such knowledge bases. In them, domain knowledge is captured by (1) the logical schema, constraints, and domain vocabulary of the application; (2) the models and algorithms that populate instances of that schema; and (3) the data needed to build and maintain those models and algorithms. Creating such knowledge bases involves well-known building blocks: natural language processing, entity resolution, data transformation, and so on. It is critical that the models and algorithms implementing these building blocks be transparent and optimizable for efficient execution. In this talk, I will give an overview of our platform for creating and interacting with large-scale domain-specific knowledge bases. I will describe the design of domain-specific languages (DSLs) with specialized constructs that serve as target languages for learning the models and algorithms that populate knowledge bases, as well as the generation of training data to scale up that learning. I will also briefly present our domain-specific natural language understanding and interaction technology for querying knowledge bases. If time permits, I will demonstrate an instantiation of the platform in the financial domain, where we construct a knowledge base of company fundamentals, such as financial metrics and key personnel, from public regulatory data (SEC, FDIC).
Yunyao Li is a Master Inventor, Research Manager, and Research Staff Member with IBM Research - Almaden, where she manages the Scalable Natural Language Processing group. She is also a member of the IBM Academy of Technology. Her expertise lies in the interdisciplinary areas of databases, natural language processing, human-computer interaction, and information retrieval. She is particularly interested in designing, developing, and analyzing large-scale systems usable by a wide spectrum of users. Her current focus is on creating and interacting with large-scale domain-specific knowledge bases. She is a founding member of SystemT, a state-of-the-art information extraction engine that currently powers 8+ IBM products as well as numerous research projects and customer engagements. She is also a founding member of Gumshoe, a novel enterprise search engine that has powered IBM intranet search since 2010. Her contributions to these projects have resulted in 25+ research publications and have been recognized with multiple prestigious IBM internal awards.

She received her PhD in Computer Science and Engineering in 2007 and dual master's degrees in Computer Science & Engineering and Information in 2003, all from the University of Michigan, Ann Arbor. She attended Tsinghua University in Beijing, China, graduating in 2000 with dual undergraduate degrees in Automation and Economics.

Faculty Host

Professor H.V. Jagadish