Systems Seminar - CSE
Rethinking the Database for the Data Science Era
The relational DBMS was developed to address the core needs of the enterprise: it enforces a rigid schema to ensure data consistency, offers query optimization for flexibility, and supports high-throughput transactions or bigger-than-memory analytics. Over time the database community has pushed into many new frontiers, extending out to Web data and beyond, but most people still think of a DBMS as enabling roughly this same set of core capabilities.
In the era of data science, the Web, and the cloud (where machine learning techniques, data in files, and robust MapReduce-style distributed compute platforms are all the rage), the question is whether the "common core" data needs have fundamentally changed. I will describe our (still evolving) answer to this question: the "new DBMS" should focus on iteratively improving and integrating unstructured data to make it amenable to analysis, on making the data more useful through annotation and provenance, and on enabling interaction among users, algorithms, and the broader community. I will also describe our experiences in trying to foster community-scale data science in the neuroscience domain with these techniques.
Zachary Ives is a Professor of Computer and Information Science at the University of Pennsylvania, as well as the Associate Dean for Master's and Professional Programs at Penn's School of Engineering and Applied Science. He is also a co-founder of Blackfynn, Inc., a startup focused on enabling life sciences research and discovery through data integration. His research interests include data integration and sharing, managing "big data," and data provenance. He is a recipient of the NSF CAREER award, and an alumnus of the DARPA Information Science and Technology advisory panel. He is a co-author of the textbook Principles of Data Integration, has received an ICDE 2013 ten-year Most Influential Paper award, and has been an Associate Editor for Proceedings of the VLDB Endowment and a Program Co-Chair for the ACM SIGMOD conference.