Distinguished Lecture | Women in Computing | Alumni
Detecting structure and patterns in big biomedical data
This event is free and open to the public.
Smita Krishnaswamy is the 2019 College of Engineering Alumni Merit Award Winner for CSE.
Bio: Smita Krishnaswamy (PhD CSE 2008) is an Assistant Professor in the Department of Genetics at the Yale School of Medicine and in the Department of Computer Science. She is also affiliated with the Yale Center for Biomedical Data Science, Yale Cancer Center, and Program in Applied Mathematics.
While a student at Michigan, Smita’s research focused on algorithms for automated synthesis and probabilistic verification of nanoscale logic circuits. Her dissertation, “Design, Analysis and Test of Logic Circuits Under Uncertainty,” won the 2009 Outstanding Dissertation Award in the area of “New directions in circuit and system test” from the European Design and Automation Association (EDAA).
After Michigan, Smita spent two years at IBM’s TJ Watson Research Center as a researcher in the systems division where she worked on automated bug finding and error correction in logic. It was at this time that she realized her work could translate to the domain of genetics research.
Talk abstract: High-throughput, high-dimensional data has become ubiquitous in the biomedical and health sciences as a result of breakthroughs in measurement technologies like single cell RNA-sequencing, as well as vast improvements in health record data collection and storage. While these large datasets containing millions of cellular or patient observations hold great potential for understanding the generative state space of the data, as well as drivers of differentiation, disease, and progression, they also pose new challenges in terms of noise, missing data, measurement artifacts, and the so-called “curse of dimensionality.” In this talk, I will cover a unifying theme in my research that has helped tackle these problems: manifold learning and the associated manifold assumption. The manifold assumption in the data analysis context refers to the idea that while the ambient measurement space is high dimensional and noisy, the intrinsic state space lies in lower-dimensional, smoothly varying patches that are locally Euclidean, called manifolds. In my work, I learn the data manifold using two types of techniques: graph signal processing and deep learning. Manifold learning provides a powerful structure for algorithmic approaches to denoise the data, visualize the data, and understand progressions, clusters, and other regulatory patterns, as well as correct for batch effects to unify data. I will cover several applications of this principle via specific projects, including:
1) MAGIC: a manifold denoising algorithm that low-pass filters data features on a data graph (much as audio and video signals are denoised), for denoising and recovery of cellular data.
2) PHATE: a general visualization and dimensionality reduction technique that offers an alternative to tSNE in that it preserves local and global structures, clusters, and progressions using an information-theoretic distance between diffusion probabilities.
3) MELD: an analysis technique for comparing two or more experiments measuring the same underlying system (e.g., cells from the same type of tissue) that produces a continuously varying likelihood score throughout the manifold to indicate whether each position in the state space is enriched in one of the specific conditions. This technique is useful for pulling out subtle differences in response between different drug treatments or experimental conditions in large datasets.
4) SAUCIE (Sparse AutoEncoders for Clustering Imputation and Embedding): our highly scalable neural network architecture that simultaneously performs denoising, batch normalization, clustering, and visualization via custom regularizations on different hidden layers.
Finally, I will preview ongoing work in neural network architectures for predicting dynamics and other biological tasks.
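For readers unfamiliar with the idea in item 1), low-pass filtering on a data graph can be illustrated with a minimal sketch. This is not the MAGIC implementation; it only shows the general principle of averaging each observation with its graph neighbors through a few steps of diffusion. The function name `diffuse_denoise`, the Gaussian kernel, the bandwidth `sigma`, the number of diffusion steps `t`, and the synthetic noisy circle are all assumptions chosen for illustration.

```python
import numpy as np

def diffuse_denoise(X, sigma=1.0, t=3):
    """Smooth the rows of X with t steps of diffusion on a Gaussian-kernel graph.

    Illustrative only: kernel, sigma, and t are assumed choices, not MAGIC's.
    """
    # Pairwise squared Euclidean distances between observations (rows of X).
    sq_norms = np.sum(X ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)  # clip tiny negatives caused by round-off

    # Gaussian affinities: nearby observations get weights close to 1.
    K = np.exp(-d2 / (2.0 * sigma ** 2))

    # Row-normalize so each row is a probability distribution over neighbors.
    P = K / K.sum(axis=1, keepdims=True)

    # Applying P repeatedly averages each observation with its neighbors,
    # which damps high-frequency variation over the graph.
    return np.linalg.matrix_power(P, t) @ X

# Synthetic example (assumed data): points on a unit circle plus noise.
rng = np.random.default_rng(0)
angles = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
clean = np.column_stack([np.cos(angles), np.sin(angles)])
noisy = clean + 0.1 * rng.normal(size=clean.shape)
denoised = diffuse_denoise(noisy, sigma=0.2, t=3)
```

Because each output row is a weighted average of input rows, the result always stays within the range of the input data. Larger values of `sigma` or `t` smooth more aggressively and can blur real structure, so in practice these would need to be tuned to the dataset.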
Smita’s current research focuses on developing unsupervised machine learning methods to denoise, impute, visualize and extract structure, patterns, and relationships from big, high-throughput, high-dimensional biomedical data. Her methods have been applied to a variety of datasets from many systems including embryoid body differentiation, zebrafish development, the epithelial-to-mesenchymal transition in breast cancer, lung cancer immunotherapy, infectious disease data, gut microbiome data, and patient data.