Ex-Omni: Enabling 3D Facial Animation Generation
for Omni-modal Large Language Models
Abstract
Omni-modal large language models (OLLMs) aim to unify multimodal understanding and generation, yet coupling speech with 3D facial animation remains largely unexplored despite its importance for natural interaction. A key challenge arises from the representation mismatch between the discrete, token-level semantic reasoning of LLMs and the dense, fine-grained temporal dynamics required for 3D facial motion, which makes direct modeling difficult to optimize under limited data. We propose Expressive Omni (Ex-Omni), an open-source omni-modal framework that augments OLLMs with speech-accompanied 3D facial animation. Ex-Omni reduces learning difficulty by decoupling semantic reasoning from temporal generation, leveraging speech units as temporal scaffolding and a unified token-as-query gated fusion (TQGF) mechanism for controlled semantic injection. We further introduce InstructEx, a dataset designed to facilitate augmenting OLLMs with speech-accompanied 3D facial animation. Extensive experiments demonstrate that Ex-Omni performs competitively against existing open-source OLLMs while enabling stable generation of aligned speech and facial animation.
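The exact TQGF design is not specified on this page. The sketch below illustrates one possible reading of a token-as-query gated fusion layer, in which speech-unit frames (the temporal scaffold) attend over the LLM's semantic tokens and a learned gate controls how much semantic information is injected per frame. All dimensions, module choices, and the query/key assignment are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class TokenAsQueryGatedFusion(nn.Module):
    """Minimal sketch of a gated fusion layer in the spirit of TQGF.

    Assumed reading: speech-unit features provide the dense temporal
    scaffold, LLM semantic tokens provide discrete token-level meaning,
    and a sigmoid gate controls how much semantics is injected per frame.
    """

    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.norm = nn.LayerNorm(dim)

    def forward(self, speech_units: torch.Tensor, semantic_tokens: torch.Tensor) -> torch.Tensor:
        # speech_units:    (B, T_sp, dim)  dense temporal scaffold from speech units
        # semantic_tokens: (B, T_sem, dim) token-level semantics from the LLM
        attended, _ = self.cross_attn(
            query=speech_units, key=semantic_tokens, value=semantic_tokens
        )
        # Gate decides, per frame, how much semantic context to inject.
        g = self.gate(torch.cat([speech_units, attended], dim=-1))
        fused = self.norm(speech_units + g * attended)
        return fused  # (B, T_sp, dim), consumed by the facial-motion decoder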
Interactive Demo
Note: The demo is deployed on an NVIDIA H20 GPU. The video is rendered using PyTorch3D based on the blendshape coefficients generated by Ex-Omni. The rendering latency is due to the video synthesis process and does not reflect the model's inference time.
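For reference, the sketch below shows how blendshape coefficients can be turned into a rendered frame with PyTorch3D. The neutral template, face topology, blendshape basis, and coefficient count are placeholders (zero tensors with illustrative shapes), not the project's actual assets or rendering configuration.

import torch
from pytorch3d.structures import Meshes
from pytorch3d.renderer import (
    look_at_view_transform, FoVPerspectiveCameras, RasterizationSettings,
    MeshRasterizer, MeshRenderer, SoftPhongShader, PointLights, TexturesVertex,
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder assets: neutral face template (V, 3), triangle faces (F, 3),
# and a blendshape basis (K, V, 3) of per-vertex offsets.
template = torch.zeros(5023, 3, device=device)
faces = torch.zeros(9976, 3, dtype=torch.int64, device=device)
basis = torch.zeros(52, 5023, 3, device=device)

# One frame of blendshape coefficients (K,) produced by the model.
coeffs = torch.zeros(52, device=device)

# Linear blendshape model: verts = template + sum_k w_k * B_k.
verts = template + torch.einsum("k,kvc->vc", coeffs, basis)

# Standard PyTorch3D mesh-rendering pipeline.
R, T = look_at_view_transform(dist=1.0, elev=0.0, azim=0.0)
cameras = FoVPerspectiveCameras(device=device, R=R, T=T)
raster_settings = RasterizationSettings(image_size=512, blur_radius=0.0, faces_per_pixel=1)
lights = PointLights(device=device, location=[[0.0, 0.0, 2.0]])
renderer = MeshRenderer(
    rasterizer=MeshRasterizer(cameras=cameras, raster_settings=raster_settings),
    shader=SoftPhongShader(device=device, cameras=cameras, lights=lights),
)

textures = TexturesVertex(verts_features=torch.ones_like(verts)[None])
mesh = Meshes(verts=[verts], faces=[faces], textures=textures)
image = renderer(mesh)  # (1, H, W, 4) RGBA frame; repeat per frame and encode to video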
Rendered Demo
BibTeX
@article{zhang2026exomni,
  title={Ex-Omni: Enabling 3D Facial Animation Generation for Omni-modal Large Language Models},
  author={Zhang, Haoyu and Li, Zhipeng and Guo, Yiwen and Yu, Tianshu},
  journal={arXiv preprint arXiv:2602.07106},
  year={2026},
  url={https://arxiv.org/abs/2602.07106}
}