Yuchi Wang (王宇驰)
Publications
Conference Paper
[NAACL 2024] LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?
We explore the previously untapped advantages of diffusion models over autoregressive (AR) methods in image-to-text generation. Through the meticulous design of a latent-based diffusion model tailored for captioning, we achieve performance comparable to strong AR baselines.
Yuchi Wang, Shuhuai Ren, Rundong Gao, Linli Yao, Qingyan Guo, Kaikai An, Jianhong Bai, Xu Sun
PDF
Cite
Code
[ICLR 2024] GAIA: Zero-shot Talking Avatar Generation
An internal project at Microsoft. We designed a codec that disentangles each frame of a talking-face video into motion and appearance representations, then curated a large-scale, high-quality dataset to train our diffusion-based GAIA model. The results demonstrate remarkable naturalness and scalability.
Tianyu He, Junliang Guo, Runyi Yu, Yuchi Wang, Jialiang Zhu, Kaikai An, Leyi Li, Xu Tan, Chunyu Wang, Han Hu, HsiangTao Wu, Sheng Zhao, Jiang Bian
PDF
Cite
Demo Page (Anonymous version)
[FMDM@NeurIPS 2023] Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond
We find that powerful multimodal LLMs such as GPT4-Vision make end-to-end embodied decision making more feasible than ever. Moreover, we propose a new benchmark, PCA-EVAL, and a multi-agent cooperation framework, HOLMES, for evaluation.
Liang Chen, Yichi Zhang, Shuhuai Ren, Haozhe Zhao, Zefan Cai, Yuchi Wang, Peiyi Wang, Tianyu Liu, Baobao Chang
PDF
Cite
Code
Dataset