Publications

(2024). [arXiv] Rethinking Semantic Parsing for Large Language Models: Enhancing LLM Performance with Semantic Hints. arXiv.

(2024). [arXiv] Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement. arXiv.

(2024). [arXiv] InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation. arXiv.

(2024). [NAACL 2024] LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation? NAACL 2024.

(2024). [arXiv] UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing. arXiv.

(2024). [Findings of ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain. Findings of ACL 2024.

(2023). [ICLR 2024] GAIA: Zero-shot Talking Avatar Generation. ICLR 2024.

(2023). [FMDM@NeurIPS 2023] Towards End-to-End Embodied Decision Making via Multi-Modal Large Language Model: Explorations with GPT4-Vision and Beyond. FMDM@NeurIPS 2023.

(2023). [arXiv] LLMs as Trustworthy Financial Advisors: Rationalizing Multimodal Stock Movement Prediction with Chain-of-Thought. arXiv.
