The database group’s research is focused on building the data management infrastructure for the twenty-first century, with particular emphasis on issues surrounding Big Data, including stream processing, approximate query answering, text mining, data integration, information extraction, and data sharing. We place a strong emphasis on database usability. Our approach is to understand, at a fundamental level, what it is about the data model and representation that makes data hard to use and query. In addition, we have a very strong data science effort, with particular emphasis on the effective integration and efficient querying of materials and biological data.
The growth of Web services, sensor networks, and high-capacity storage devices has led to an explosion in the quantity and diversity of data suitable for data mining. Statistical techniques are critical for data mining, but they are not the only important component. Our work also includes novel applications, software infrastructure for large-scale analytics, privacy preservation while mining data, and new methods and interfaces that extend human capabilities to find patterns in graphs and other data. Results so far include effective prediction of cardiac and epileptic events in medical patients, large-scale data extraction from the Web, large efficiency gains in the Hadoop and Spark frameworks, and efficient exploration of large-scale real-world networks, including social, communication, and brain networks.