Dissertation Defense

Leveraging Data Semantics for Relational Data Management Tasks

Junjie Xing, Ph.D. Candidate
WHERE:
3901 Beyster Building

Hybrid Event: 3901 BBB / Zoom

Abstract: In an era of rapidly growing data, efficient and intelligent relational data management is essential for generating actionable insights and automating decision-making. A key factor driving advancements in this domain is the use of data semantics, which captures the deeper meaning and context of data, extending beyond traditional heuristic and syntactic approaches. By leveraging data semantics, we can improve insight generation, data integration, and other essential relational data management tasks.

This dissertation explores how advanced data semantics can address several key challenges in relational data management. First, we investigate methods to capture user-defined semantics for assessing the interestingness of data insights, moving beyond traditional developer-defined measures of interestingness. Second, we leverage the enhanced natural language understanding capabilities of large language models (LLMs) to generate fine-grained column semantics for relational data and introduce the concept of “aggregate-related table search”, which captures table semantics across varying aggregation levels. Finally, we propose a self-training framework for LLM fine-tuning on table-related tasks, incorporating table task semantics by generating and validating training data to improve model performance in tasks such as NL2SQL and schema matching.

Through these contributions, this dissertation aims to advance relational data management by embedding a deeper understanding of different aspects of data semantics into core processes, ultimately improving both the performance and efficiency of data management tasks.

Organizer

CSE Graduate Programs Office

Faculty Host

Prof. H.V. Jagadish