Dissertation Defense

Keywords at Work: Investigating Keyword Extraction in Social Media Applications

Shibamouli Lahiri

This dissertation examines a long-standing problem in Natural Language Processing (NLP), keyword extraction, from a new angle. We investigate how keyword extraction can be formulated on social media data, such as emails, product reviews, student discussions, and student statements of purpose. We design novel graph-based features for supervised and unsupervised keyword extraction from emails, and successfully use the resulting system to uncover patterns in a new dataset of student statements of purpose. Furthermore, we apply the system, with new features, to the problem of "usage expression" extraction from product reviews, where we obtain interesting insights. Applied to student discussions, the system uncovers new and exciting patterns.

While each of the above problems is conceptually distinct, they share two key elements: keywords and social data. Social data can be messy, hard to interpret, and not easily amenable to existing NLP resources. We show that our system is robust enough in the face of such challenges to discover useful and important patterns. We also show that the problem definition of keyword extraction itself can be expanded to accommodate new and challenging research questions and datasets.

Sponsored by

Rada Mihalcea