Keyphrase Extraction in Citation Networks: How Do Citation Contexts Help?
Keyphrase extraction is the problem of automatically extracting descriptive phrases or concepts from documents. A document's keyphrases act as a concise summary of it and have been used successfully in many applications, such as query formulation, document clustering, classification, recommendation, indexing, and summarization. Previous approaches to keyphrase extraction have generally used the textual content of the target document or a local neighborhood of textually similar documents. We posit that, in a scholarly domain, other informative neighborhoods exist beyond a document's textual content and its textually similar neighbors, and that these neighborhoods have the potential to improve keyphrase extraction. Research papers are not isolated. Rather, they are highly interconnected in giant citation networks, in which papers cite or are cited by other papers in citation contexts, i.e., short text segments surrounding a citation's mention. These contexts are not arbitrary; they serve as brief summaries of the cited paper. We exploit citation context information for keyphrase extraction and show substantial improvements in performance over strong baselines in both supervised and unsupervised settings.
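The abstract does not describe the method itself, so the following is only a rough illustration of the general idea, not the speaker's algorithm. It is a minimal sketch assuming a TextRank-style approach (a word co-occurrence graph scored with PageRank), in which each citation context contributes extra edge weight to the graph built from the target paper. All names here (`extract_keyphrases`, `context_weight`, `window`, the tiny `STOPWORDS` list) are hypothetical choices made for this example.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list; a real system would use a
# fuller list and part-of-speech filtering.
STOPWORDS = {
    "a", "an", "and", "are", "as", "by", "for", "from", "in", "is", "of",
    "on", "or", "that", "the", "this", "to", "we", "with",
}


def tokenize(text):
    """Lowercase and split into alphabetic tokens (punctuation is dropped)."""
    return re.findall(r"[a-z]+", text.lower())


def build_graph(weighted_texts, window=2):
    """Undirected co-occurrence graph over non-stopword tokens.

    Stopwords are removed first; then every pair of words inside a sliding
    window of `window` words gets the text's weight added to its edge.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for text, weight in weighted_texts:
        words = [w for w in tokenize(text) if w not in STOPWORDS]
        for i, u in enumerate(words):
            for v in words[i + 1 : i + window]:
                if u != v:
                    graph[u][v] += weight
                    graph[v][u] += weight
    return graph


def weighted_pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank using normalized edge weights."""
    nodes = list(graph)
    if not nodes:
        return {}
    n = len(nodes)
    out_weight = {u: sum(graph[u].values()) for u in nodes}
    score = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        score = {
            u: (1 - damping) / n
            + damping * sum(score[v] * graph[v][u] / out_weight[v] for v in graph[u])
            for u in nodes
        }
    return score


def candidate_phrases(text, max_len=3):
    """Maximal runs of consecutive non-stopword tokens, at most max_len long."""
    phrases, run = set(), []
    for tok in tokenize(text) + [None]:  # None acts as a final delimiter
        if tok is None or tok in STOPWORDS:
            if 0 < len(run) <= max_len:
                phrases.add(" ".join(run))
            run = []
        else:
            run.append(tok)
    return phrases


def extract_keyphrases(document, citation_contexts, context_weight=1.0, top_k=5):
    """Rank the document's candidate phrases by summed word scores.

    Citation contexts only reweight the word graph; candidates still come
    from the target document itself.
    """
    weighted_texts = [(document, 1.0)]
    weighted_texts += [(ctx, context_weight) for ctx in citation_contexts]
    word_score = weighted_pagerank(build_graph(weighted_texts))

    def phrase_score(phrase):
        return sum(word_score.get(w, 0.0) for w in phrase.split())

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(candidate_phrases(document), key=lambda p: (-phrase_score(p), p))
    return ranked[:top_k]


if __name__ == "__main__":
    doc = "citation networks and keyphrase extraction"
    contexts = ["keyphrase extraction", "keyphrase extraction methods"]
    print(extract_keyphrases(doc, contexts, context_weight=2.0))
    # prints ['keyphrase extraction', 'citation networks']
```

In this sketch, the contexts that cite the paper push "keyphrase extraction" ahead of "citation networks", but they cannot introduce a phrase the paper itself lacks (such as "methods"). Restricting candidates to the target document is an assumption of the example; a real system might also draw candidates from the contexts or learn the context weight in a supervised model.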
Cornelia Caragea is an Assistant Professor at the University of North Texas, where she directs the Machine Learning group. Her research interests lie at the intersection of machine learning, information retrieval, and natural language processing, with applications to scholarly digital libraries. She has published research papers in prestigious venues such as AAAI, IJCAI, WWW, and ICDM. Cornelia has reviewed for many journals, including Nature, ACM Transactions on Intelligent Systems and Technology, and IEEE Transactions on Knowledge and Data Engineering; has served on several NSF panels; and has been a program committee member for top conferences such as ACL, IJCAI, COLING, and CIKM. She has also helped organize several workshops at conferences such as AAAI, IEEE BigData, and CIKM. Cornelia earned a Bachelor of Science degree in Computer Science and Mathematics from the University of Bucharest and a Ph.D. in Computer Science from Iowa State University. Prior to joining UNT in Fall 2012, she was a postdoctoral researcher at the Pennsylvania State University. Her appointment reflects one of UNT's distinctive approaches to faculty hiring: she is part of the Knowledge Discovery from Digital Information research cluster.