Dissertation Defense

Reducing End-User Burden in Everyday Data Organization

Li Eric Qian

As digital data permeates every aspect of our daily life, end-users find it
appealing to organize their everyday data electronically. In fact, people are
already used to managing their personal data such as contact books and calendars
on electronic devices. Meanwhile, the desire to organize more information
electronically keeps growing. Rather than recording shopping lists
and recipes on notes stuck to the refrigerator, a household would prefer to store
this information on their smartphones to bring to the supermarket. As online
structured data sources such as Freebase and BigTable flourish, end-users
would also like to leverage these sources to create their own data collections
such as favorite movie libraries and travel wishlists.
However, there is a major barrier to end-users organizing their everyday data
electronically. The user has to first design a database according to
her original data, and then continuously integrate new data sources into the
database. This process involves various cognitive and operational burdens.
First, when designing her data collection, the user bears the burden of abstracting
her mental model of her real-life data into a reasonable database schema
design. Second, when incorporating external data sources, she must
understand the external data semantics and transform
the data from those sources into her own collection. Third, if the
user wants to filter the data, she must understand and specify the
selection condition. Finally, when existing sources are updated or additional
sources are added, she must understand these updates and fuse
them into her data collection.
This dissertation introduces several approaches that reduce
these burdens for end-users organizing their everyday data. To ease the birth pains of
creating new databases, the dissertation proposes a system with a direct manipulation
interface and user-friendly operators that let the end-user easily design
and evolve her data schema. To facilitate incorporation of external data
sources, a sample-driven schema mapping approach is introduced with a direct
manipulation interface. Using this approach, the user simply provides
sample instances in her collection, and the system automatically deduces the
desired schema mapping from the external sources to her own collection. In
a similar vein, the dissertation proposes an approach that helps the user derive selection
conditions from examples. Finally, to help the user fuse source
data updates into her own collection, the dissertation proposes a technique
that automatically updates the user's data collection in response to external source
changes by performing efficient incremental information integration.

Sponsored by

H V Jagadish