Yuchi Wang (王宇驰)
Publications
[arXiv] RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction
We propose RICO, a framework that refines image captions through visual reconstruction. Extensive experiments demonstrate that the approach significantly improves both caption accuracy and completeness (a toy sketch of the refinement loop follows this entry).
Yuchi Wang, Yishuo Cai, Shuhuai Ren, Sihan Yang, Linli Yao, Yuanxin Liu, Yuanxing Zhang, Pengfei Wan, Xu Sun
PDF · Cite · Code
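Under one plausible reading of "visual reconstruction" (regenerate an image from the current caption, compare it with the original, and revise), the refinement loop might look like the sketch below. Every model call (generate_caption, revise_caption, reconstruct_image, image_similarity) is a hypothetical stub supplied by the caller, not the released RICO code.

from typing import Callable

def refine_caption(
    image,
    generate_caption: Callable,   # stub: image -> caption (e.g., an MLLM)
    revise_caption: Callable,     # stub: (caption, image, reconstruction) -> caption
    reconstruct_image: Callable,  # stub: caption -> image (a text-to-image model)
    image_similarity: Callable,   # stub: (image, image) -> score in [0, 1]
    max_rounds: int = 3,
    threshold: float = 0.9,
):
    """Hypothetical sketch: refine a caption until the image regenerated
    from it is similar enough to the original image."""
    caption = generate_caption(image)
    for _ in range(max_rounds):
        reconstruction = reconstruct_image(caption)
        if image_similarity(image, reconstruction) >= threshold:
            break  # the caption already explains the image well
        # Otherwise, ask the captioner to fix errors and add missing details.
        caption = revise_caption(caption, image, reconstruction)
    return caption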
[CVPR 2025] VidTwin: Video VAE with Decoupled Structure and Dynamics
We propose VidTwin, a compact video autoencoder that decouples video into two distinct latent spaces: Structure latent vectors, which capture overall content and global movement, and Dynamics latent vectors, which represent fine-grained details and rapid movements (a toy two-branch encoder sketch follows this entry).
Yuchi Wang, Junliang Guo, Xinyi Xie, Tianyu He, Xu Sun, Jiang Bian
PDF · Cite · Code · Demo Page
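As a toy illustration of the decoupling, the sketch below encodes a clip into a slow, global Structure code and a per-frame Dynamics code. The layer choices and latent shapes are assumptions made for the example, not the VidTwin architecture.

import torch
import torch.nn as nn

class TwoBranchVideoEncoder(nn.Module):
    """Toy two-branch encoder: one shared backbone, two pooling heads."""

    def __init__(self, channels: int = 3, dim: int = 64):
        super().__init__()
        self.backbone = nn.Conv3d(channels, dim, kernel_size=3, padding=1)
        # Structure branch: pool away the time axis -> one slow, global code.
        self.structure_head = nn.AdaptiveAvgPool3d((1, 8, 8))
        # Dynamics branch: keep the time axis -> a low-res code per frame.
        self.dynamics_head = nn.AdaptiveAvgPool3d((None, 2, 2))

    def forward(self, video: torch.Tensor):
        # video: (batch, channels, time, height, width)
        feats = torch.relu(self.backbone(video))
        structure = self.structure_head(feats)  # (B, dim, 1, 8, 8)
        dynamics = self.dynamics_head(feats)    # (B, dim, T, 2, 2)
        return structure, dynamics

if __name__ == "__main__":
    clip = torch.randn(2, 3, 16, 64, 64)  # two 16-frame toy clips
    s, d = TwoBranchVideoEncoder()(clip)
    print(s.shape, d.shape)  # the two latents have deliberately different shapes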
[ACL 2025] Rethinking Semantic Parsing for Large Language Models: Enhancing LLM Performance with Semantic Hints
We propose SENSE, a novel prompting approach that embeds semantic hints within the prompt. Experiments show that SENSE consistently improves LLM performance across a range of tasks, highlighting the value of integrating semantic information into prompts (an illustrative prompt template follows this entry).
Kaikai An, Shuzheng Si, Helan Hu, Haozhe Zhao, Yuchi Wang, Qingyan Guo, Baobao Chang
Cite
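As an illustration of embedding a semantic hint in a prompt, a SENSE-style template could be assembled as below; the hint wording and template layout are assumptions for the example, not the paper's exact prompt.

def build_sense_prompt(sentence: str, task_instruction: str) -> str:
    # Hypothetical semantic hint: nudge the model toward the sentence's
    # predicate-argument structure before it answers.
    semantic_hint = (
        "Before answering, briefly consider the sentence's semantic "
        "structure: who does what to whom, and under what conditions."
    )
    return f"{task_instruction}\n\n{semantic_hint}\n\nInput: {sentence}\nAnswer:"

print(build_sense_prompt(
    "The committee that the senator criticized approved the bill.",
    "Identify the subject of the main clause.",
))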
[arXiv] Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement
We present MyTalk, which edits the lip movements in a talking video to match a given speech signal while preserving the speaker's identity and visual details.
Runyi Yu, Tianyu He, Ailing Zeng, Yuchi Wang, Junliang Guo, Xu Tan, Chang Liu, Jie Chen, Jiang Bian
PDF · Cite · Demo Page
[AAAI 2025] InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation
We present InstructAvatar, a novel text-guided approach for generating emotionally expressive 2D avatars, offering fine-grained control, improved interactivity, and better generalization in the resulting videos.
Yuchi Wang, Junliang Guo, Jianhong Bai, Runyi Yu, Tianyu He, Xu Tan, Xu Sun, Jiang Bian
PDF · Cite · Code · Demo Page
[arXiv] UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing
We present UniEdit, a tuning-free framework that supports both video motion and appearance editing by harnessing a pre-trained text-to-video generator within an inversion-then-generation pipeline (a schematic sketch follows this entry).
Jianhong Bai, Tianyu He, Yuchi Wang, Junliang Guo, Haoji Hu, Zuozhu Liu, Jiang Bian
PDF · Cite · Code · Demo Page
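The inversion-then-generation idea can be summarized schematically: invert the source video to a latent noise trajectory under the source prompt, then regenerate from that latent under the edited prompt, with no weights updated. Both model calls below are hypothetical stubs, not the UniEdit implementation.

from typing import Callable

def edit_video(
    source_video,
    source_prompt: str,
    edit_prompt: str,
    invert: Callable,    # stub: (video, prompt) -> latent noise (e.g., DDIM inversion)
    generate: Callable,  # stub: (latent, prompt) -> video, via a pre-trained T2V model
):
    """Tuning-free editing: only the prompt changes between the two passes."""
    latent = invert(source_video, source_prompt)  # recover the noise trajectory
    return generate(latent, edit_prompt)          # re-synthesize under the edit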
[Findings of ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain
We present PCA-Bench, a multimodal decision-making benchmark that evaluates the integrated capabilities of Multimodal Large Language Models (MLLMs) along a perception-cognition-action chain (a toy scoring harness follows this entry).
Liang Chen, Yichi Zhang, Shuhuai Ren, Haozhe Zhao, Zefan Cai, Yuchi Wang, Peiyi Wang, Xiangdi Meng, Tianyu Liu, Baobao Chang
PDF · Cite · Code · Dataset
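One plausible shape for a perception-cognition-action scoring harness is sketched below: each example is scored at all three stages and per-stage accuracy is reported. The field names and exact-match rule are illustrative assumptions, not the PCA-Bench protocol.

from dataclasses import dataclass

@dataclass
class Example:
    perception_gt: str  # what the model should perceive
    cognition_gt: str   # the reasoning it should produce
    action_gt: str      # the decision it should take

def evaluate(examples, predict):
    """predict(example) -> (perception, cognition, action) strings (stub)."""
    stages = ("perception", "cognition", "action")
    correct = {s: 0 for s in stages}
    for ex in examples:
        pred = dict(zip(stages, predict(ex)))
        gts = {"perception": ex.perception_gt,
               "cognition": ex.cognition_gt,
               "action": ex.action_gt}
        for s in stages:
            correct[s] += int(pred[s].strip() == gts[s].strip())
    # Per-stage accuracy, guarded against an empty example list.
    return {s: correct[s] / max(len(examples), 1) for s in stages}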
LLMs as Trustworthy Financial Advisors: Rationalizing Multimodal Stock Movement Prediction with Chain-of-Thought
Yi Liu, Yuchi Wang, Lei Li, Shicheng Li, Ruihan Bao, Keiko Harimoto, Xu Sun
Cite