Dissertation Defense

Steerable AI-powered Art-making Tools

John Chung
Ph.D. Candidate
WHERE:
3941 Beyster Building

Hybrid Event: Zoom (Passcode: 1BibR4)

Abstract: Artificial Intelligence (AI) and Machine Learning (ML) algorithms are pushing the boundaries of art-making. They broaden both the kinds of artifacts artists can create and the mechanisms for creating them. AI algorithms can also make the art creation process easier and more accessible. For example, prompt-based image generation models offer a novel way of producing images from text prompts, allowing even non-experts in visual art to generate images. However, steering the behaviors of AI models toward diverse artistic visions often requires a technical understanding of these models, which not all art-making users have. In this dissertation, I expand the use of AI models in human art-making by connecting novel and simple steering interactions to these models. In the first part of the dissertation, I study users and existing creativity support tools to identify how we should design steering interactions for AI-powered tools. From these studies, I found that 1) steerability needs to support iterative use, 2) AI models can adapt their functions by modeling users' languages, preferences, styles, and values from steering interactions, and 3) steering interactions can ease complex specification through intuitive expressions.

Based on the first two studies, I introduce three AI-powered art-making creativity support tools. I designed the first tool, Artinter, to support human-human communication around art commissions; it shows how modeling users' subjective and ambiguous language can help them search for and generate reference artifacts. TaleBrush is a human-AI story co-creation system powered by generative language models; it allows users to intuitively and iteratively steer these models by visually sketching the protagonist's fortune. Finally, PromptPaint is a text-to-image generation tool that expands steering interactions beyond prompting: it adopts visual interactions resembling how we use paint media (e.g., oil paint) to support intuitive and iterative steering. Through these tools, my research demonstrates how AI-powered art-making tools can become more usable and useful through iterative and intuitive steering and through adaptive learning from past steering interactions.

Organizer

CSE Graduate Programs Office

Faculty Host

Prof. Eytan Adar