AI Seminar

AI Lab Idea Snippets

Alvaro Vega Hidalgo, University of Michigan
Mahzad Khoshlessan, University of Michigan
Siyang Liu, University of Michigan
Martin Ziqiao Ma, University of Michigan
Trenton Chang, University of Michigan
Jie Ruan, University of Michigan
WHERE:
3725 Beyster Building

Speakers (alphabetical order)

Trenton Chang (Advisor: Jenna Wiens)

Title

Measuring the Steerability of Large Language Models

Abstract

While LLMs have demonstrated impressive capabilities across many tasks, it remains unclear how precisely LLMs can be steered toward the goals of their users. Understanding LLM steerability is key to deploying LLMs in safety-critical domains such as healthcare, where they show potential for tasks such as clinical note generation and summarization, among others. We propose a quantitative framework for defining and evaluating steerability and highlight preliminary results on the “un-steerability” of current LLMs.
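
As one illustration of what a quantitative steerability measure could look like, the sketch below scores how closely the shift a steering prompt actually produces in measurable text attributes matches the shift the user requested. This is a hypothetical example, not the speaker's framework; the attribute choices and function names are assumptions.

```python
# Hypothetical steerability-style score: compares the requested attribute shift
# (goal - base) with the observed shift (steered - base) via cosine similarity.
# Illustrative only; not the framework presented in the talk.
import numpy as np

def steerability_score(base_attrs, steered_attrs, goal_attrs):
    """All arguments are vectors of measurable text attributes
    (e.g., length, reading level, sentiment), chosen for illustration."""
    requested = np.asarray(goal_attrs) - np.asarray(base_attrs)
    observed = np.asarray(steered_attrs) - np.asarray(base_attrs)
    denom = np.linalg.norm(requested) * np.linalg.norm(observed)
    if denom == 0:
        return 0.0  # no requested or no observed movement
    return float(np.dot(requested, observed) / denom)

# Example: the user asked for a shorter, more formal summary; the model moved
# only part of the way in that direction.
print(steerability_score(base_attrs=[0.2, 0.5],
                         steered_attrs=[0.35, 0.45],
                         goal_attrs=[0.8, 0.3]))
```

Under this toy scoring, a value near 1 would mean the model moved in the requested direction, while a value near 0 or below would indicate the steering prompt had little or an opposite effect.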

Alvaro Vega Hidalgo (Advisor: Rada Mihalcea)

Title

Euphonia App: Open Source AI Computational Bioacoustics for Community-Based Wildlife Conservation

Abstract

Using AI to analyze forest sounds enables new approaches to investigating behavioral, ecological, and evolutionary processes related to animal communication, as well as tracing the trajectories of sustainability stakeholders. The Euphonia App is a data annotation platform that enables active learning in bioacoustics projects by supporting collaboration between local experts and global biodiversity-monitoring efforts.

Mahzad Khoshlessan

Title

Towards Dynamics-Consistent Manifold Metrics in Pose Estimation: A Spherical Double Pendulum Case Study

Abstract

We introduce a novel framework for evaluating 3D pose estimation models using geometric manifolds derived from a simplified dynamical system, the 3D two-link pendulum, as an analog for human body kinematics. By assessing the deviation of predicted poses from this manifold, we aim to measure model performance more accurately, revealing shortcomings in existing metrics that fail to capture physical plausibility. This approach provides critical insights into model inaccuracies, paving the way for more resilient and accurate pose estimation frameworks.
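
To give a concrete sense of the idea, the sketch below measures how far a predicted pose strays from the constraint manifold of a two-link spherical pendulum, taking fixed link lengths as the simplest constraint defining that manifold. This is a hedged illustration under stated assumptions, not the speaker's implementation; the link lengths and function names are made up.

```python
# Hypothetical manifold-deviation check: a two-link spherical pendulum with
# fixed link lengths confines its joints to a manifold, so one simple
# deviation measure is how far predicted link lengths stray from the true ones.
import numpy as np

L1, L2 = 0.30, 0.25  # assumed link lengths in meters (illustrative values)

def manifold_deviation(pivot, elbow, tip):
    """Root-mean-square violation of the fixed-link-length constraints
    for one predicted pose given as three 3D points."""
    d1 = np.linalg.norm(np.asarray(elbow) - np.asarray(pivot)) - L1
    d2 = np.linalg.norm(np.asarray(tip) - np.asarray(elbow)) - L2
    return float(np.sqrt((d1**2 + d2**2) / 2))

# A pose whose second link is slightly too long incurs a nonzero deviation,
# even if a joint-position error metric would still rate it as accurate.
print(manifold_deviation([0, 0, 0], [0.30, 0, 0], [0.30, 0.27, 0]))
```

A per-joint position error could remain small for such a pose while the physical constraint is violated, which is the kind of gap in existing metrics the abstract points to.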

Siyang Liu (Advisor: Rada Mihalcea)

Title

The Generation Gap: Exploring Age Bias in the Underlying Value Systems of Large Language Models

Abstract

The project explores how closely the values expressed by large language models align with those of specific age groups, leveraging data from the World Values Survey across thirteen value categories.
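
One hypothetical way such an alignment could be checked is sketched below: aggregate an LLM's survey responses into per-category scores and report which age group's average profile they sit closest to. The toy data, category count, and distance measure here are assumptions for illustration, not the project's actual pipeline.

```python
# Hypothetical sketch: compare an LLM's value-category scores with the average
# profiles of human age groups and report the nearest group.
import numpy as np

# Toy per-age-group means over value categories (the project draws on thirteen
# World Values Survey categories; three made-up ones are shown here).
age_group_means = {
    "18-29": np.array([0.7, 0.4, 0.6]),
    "30-49": np.array([0.6, 0.5, 0.5]),
    "50+":   np.array([0.4, 0.6, 0.3]),
}

def closest_age_group(llm_scores):
    """Return the age group whose mean value profile is nearest (Euclidean)
    to the LLM's category scores."""
    llm_scores = np.asarray(llm_scores)
    return min(age_group_means,
               key=lambda g: np.linalg.norm(age_group_means[g] - llm_scores))

print(closest_age_group([0.68, 0.42, 0.58]))  # -> "18-29" with this toy data
```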

Martin Ziqiao Ma (Advisor: Joyce Chai)

Title

Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference under Ambiguities

Abstract

In situated communication, ambiguities naturally arise from the chosen reference system, with varying valid interpretations of the same spatial expression depending on the selected frame of reference (FoR). While spatial language understanding and reasoning in vision-language models (VLMs) are receiving increasing attention, the potential ambiguities, along with the commonsense and consistency of spatial reasoning, remain largely under-explored. We present the COnsistent Multilingual Frame Of Reference Test (COMFORT), an evaluation protocol designed to systematically assess VLMs' spatial reasoning abilities. We demonstrate that VLMs align with English conventions in spatial language understanding when resolving ambiguities. However, they (1) are still far from achieving robustness and consistency, (2) lack the flexibility to accommodate multiple coordinate systems, and (3) fail to adhere to cultural conventions in cross-lingual tests, as English tends to overshadow other languages. With a growing effort to align vision-language models with human cognition, we highlight the ambiguous nature of spatial language and call for increased attention to cross-cultural diversity in spatial reasoning.

Jie Ruan (Advisor: Lu Wang)

Title

Towards Reliable NLG Evaluation

Abstract

Evaluating Natural Language Generation (NLG) remains a challenging task. With the advent of LLMs, there has been growing interest in using LLMs as evaluators (LLM-as-a-judge). However, current LLM-as-a-judge methods tend to align more closely with non-expert evaluations and lack comprehensive multi-domain assessment. Our work focuses on evaluating evaluation methods at the expert level across diverse domains.

Organizer

AI Lab

Faculty Host

Rada Mihalcea, Janice M. Jenkins Collegiate Professor of Computer Science and Engineering, University of Michigan