Yeda Song
Pronounced 'yeh-dah' (like 'Ye' in 'Yes')

Hi, I'm a first-year Ph.D. student in Computer Science and Engineering at the University of Michigan. I am fortunate to be advised by Prof. Honglak Lee and to work alongside wonderful groupmates Jacob Sansom, Muhammad Khalifa, Tiange Luo, Violet Fu, Yiwei Lyu, Anthony Liu (Graduated 🎉), and Yunseok Jang (Graduated 🎉). Before joining UMich, I earned my M.S. in Artificial Intelligence from Seoul National University, where I had the privilege of being advised by Prof. Gunhee Kim in the Vision & Learning Laboratory.

My research interests lie in multimodal agents and reinforcement learning, with the goal of building real-world agents. Specifically, I am interested in leveraging and developing vision-language models to establish an effective cycle of perception, reasoning, decision-making, and grounding. Currently, I am working on scalable methods for constructing datasets from videos to support the development of generalizable computer-using agents (CUAs).

(Last updated: Feb 10, 2025)

Email  /  CV  /  Google Scholar  /  Twitter  /  LinkedIn  /  GitHub

Publications
Mobile OS Task Procedure Extraction from YouTube
Yunseok Jang*, Yeda Song*, Sungryull Sohn, Lajanugen Logeswaran, Tiange Luo, Honglak Lee
NeurIPS Workshop, 2024
CODE / arXiv

We introduce MOTIFY, a method for predicting scene transitions and actions from mobile operating system (OS) task videos. It extracts task sequences from YouTube videos without manual annotation, outperforming baselines on Android and iOS tasks and enabling scalable mobile agent development.

Compositional Conservatism: A Transductive Approach in Offline Reinforcement Learning
Yeda Song*, Dongwook Lee*, Gunhee Kim
ICLR, 2024
CODE / arXiv

We propose COCOA, a novel approach for addressing distributional shifts in offline RL (batch RL). COCOA encourages conservatism within the compositional input space of both the policy and Q-function, independently of the commonly employed behavioral conservatism.

MPChat: Towards Multimodal Persona-Grounded Conversation
Jaewoo Ahn, Yeda Song, Sangdoo Yun, Gunhee Kim
ACL, 2023
CODE / arXiv

We construct MPChat, a multimodal persona-grounded dialogue dataset accompanied by entailment labels. The multimodal persona consists of image-text pairs that represent one's episodic memories. Through three benchmark tasks, we show that the visual modality plays a crucial role in MPChat.

Education
University of Michigan
Ph.D. student in Computer Science & Engineering
Aug. 2024 - Present
Seoul National University
M.S. in Artificial Intelligence
Mar. 2022 - Feb. 2024
Seoul National University
B.S. in Statistics
B.S. in Artificial Intelligence
Mar. 2017 - Feb. 2022
Hong Kong University of Science and Technology
Exchange Student
Fall 2019
Seoul Science High School
Mar. 2014 - Feb. 2017
Work Experiences
Anomaly Analysis Lab, Alchera Inc.
Machine Learning Researcher
Jun. 2021 - Aug. 2021
Multiscale Methods in Statistics Lab, Seoul National University
Research Intern
Mar. 2021 - Jun. 2021
Bioinformatics and Biostatistics Lab, Seoul National University
Research Intern
Jan. 2020 - Jun. 2020
Honors and Awards
AI Fellowship Mar. 2022 - Feb. 2024
Presidential Science Scholarship Mar. 2017 - Feb. 2021
Hanseong Nobel Scholarship (Sector: Mathematics) Mar. 2015 - Feb. 2017



This website is built with Jon Barron's template.