Dissertation Defense

Interactive Machine Learning with Applications in Health Informatics

Yue Wang

Recent years have witnessed unprecedented growth in health data, including millions of biomedical publications, electronic health records, and health forum posts. Information retrieval and machine learning techniques are powerful tools for unlocking the knowledge in these data, yet they need to be guided by human experts. Unlike data in other domains, health data requires highly specialized expertise to label and analyze, and the time of medical experts is extremely limited. How can we mine big health data with little expert effort? In this thesis, I develop state-of-the-art interactive machine learning algorithms that bring together human intelligence and machine intelligence in health data mining tasks. By making efficient use of human experts' domain knowledge, we can achieve high-quality solutions with minimal effort.

I will first introduce a high-recall retrieval algorithm that helps users efficiently harvest not just one but as many relevant documents as possible from a search index. This is a common need in medical search, legal search, and literature review. I will then propose two interactive learning algorithms that leverage human experts' domain knowledge to combat the "cold start" problem in classical active learning, with applications in clinical natural language processing. Finally, I will show a general optimization framework that connects many existing interactive learning algorithms and inspires the design of new ones.

Sponsored by

Qiaozhu Mei