
FUSION: Full-Body Unified Motion Prior for Body and Hands via Diffusion

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026

Enes Duran1,2, Nikos Athanasiou1, Muhammed Kocabas1,3, Michael J. Black1,3, Omid Taheri1

1Max Planck Institute for Intelligent Systems
2Eberhard Karls University of Tübingen
3Meshcapade GmbH

Paper Code Poster 

Abstract

Hands are central to interacting with our surroundings and to conveying gestures, making their inclusion essential for full-body motion synthesis. Despite this, existing human motion synthesis methods fall short: some ignore hand motions entirely, while others generate full-body motions only for narrowly scoped tasks under highly constrained settings. A key obstacle is the lack of large-scale datasets that jointly capture diverse full-body motion with detailed hand articulation. While some datasets capture both, they are limited in scale and diversity. Conversely, large-scale datasets typically focus either on body motion without hands or on hand motions without the body. To overcome this, we curate and unify existing hand motion datasets with large-scale body motion data to generate full-body sequences that capture both hand and body motion. We then propose the first diffusion-based unconditional full-body motion prior, FUSION, which jointly models body and hand motion. Despite using a pose-based motion representation, FUSION surpasses state-of-the-art skeletal control models on the keypoint-tracking task on the HumanML3D dataset and achieves superior motion naturalness. Beyond standard benchmarks, we demonstrate that FUSION can go beyond typical uses of motion priors through two applications: (1) generating detailed full-body motion, including fingers, during interaction given the motion of an object, and (2) generating self-interaction motions using an LLM to transform natural language cues into actionable motion constraints. For these applications, we develop an optimization pipeline that refines the latent space of our diffusion model to generate task-specific motions. Experiments on these tasks highlight precise control over hand motion while maintaining plausible full-body coordination.

Video

Qualitative Results

Keypoint Tracking 


Self-Interaction


Human-Object Interaction


Citation

@InProceedings{Duran_2026_CVPR,
  author    = {Duran, Enes and Athanasiou, Nikos and Kocabas, Muhammed and Black, Michael J. and Taheri, Omid},
  title     = {{FUSION}: Full-Body Unified Motion Prior for Body and Hands via Diffusion},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {xx},
  year      = {2026},
  pages     = {xx}
}
© 2026 Max-Planck-Gesellschaft · Imprint · Privacy Policy · License