DexFlyWheel: A Scalable and Self-improving Data Generation Framework for Dexterous Manipulation

Anonymous Authors
Anonymous Institution
Under Review

Highlights

  • DexFlyWheel is the first framework to continuously generate diverse and high-quality dexterous manipulation data across different objects, environments, and spatial settings through a self-improving data flywheel.
  • Starting from a single human demonstration per task, DexFlyWheel scales up to a 500.0× increase in new trajectories and a 214.7× increase in new scenarios, demonstrating scalable data generation while significantly reducing human effort.
  • DexFlyWheel exhibits strong generalizability on unseen scenarios in simulation and transfers successfully to a real dual-arm robot system.

Motivation

Motivation 1

Existing methods are inherently constrained by their source human demonstrations and suffer from limited diversity when handling different objects. This motivates us to rethink the role of human demonstrations in data generation pipelines. We observe that manipulating different objects induces only minor changes in the manipulation trajectories. This implies that human demonstrations provide a strong behavioral prior for data generation. Imitation learning (IL) uses this prior to produce human-like trajectories. Residual reinforcement learning (res-RL) then refines the prior to handle diverse object scenarios.
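As a rough illustration of how a residual policy can refine a behavioral prior, the sketch below adds a small, bounded correction to the action proposed by a base (IL) policy. The page does not specify how DexFlyWheel composes the two actions, so the additive form, the `scale` factor, and the clipping range are all assumptions, and `residual_action` is a hypothetical helper name.

```python
from typing import List, Sequence


def residual_action(
    base_action: Sequence[float],
    residual: Sequence[float],
    scale: float = 0.1,   # assumed: keeps the correction small relative to the prior
    low: float = -1.0,    # assumed normalized action bounds
    high: float = 1.0,
) -> List[float]:
    """Combine a base-policy action with a scaled residual correction.

    Assumption: additive composition a = clip(a_base + scale * a_res),
    a common residual-RL formulation, not confirmed by the source.
    """
    if len(base_action) != len(residual):
        raise ValueError("base_action and residual must have the same length")
    return [
        min(high, max(low, b + scale * r))
        for b, r in zip(base_action, residual)
    ]


# Example: the residual nudges each joint command slightly away from the prior.
corrected = residual_action([0.5, -0.2], [1.0, -1.0])
```

Because the correction is scaled down, the combined action stays close to the human-like trajectory, which matches the observation that new objects require only minor trajectory changes.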

Motivation 2

To achieve generalizable dexterous manipulation in the open world, robots need a mechanism for continuous data collection and policy performance improvement. This inspires us to build a data flywheel that enables continuous self-improvement, where better policies generate more diverse data, and enhanced data leads to more generalizable policies.

Data Generation Pipeline

From One Demonstration to Diverse Scenarios

Method

Overview of DexFlyWheel. DexFlyWheel is a scalable data generation framework that starts with a warm-up stage on seed demonstrations and then iteratively expands data diversity through a self-improving cycle. Each iteration integrates imitation learning, residual reinforcement learning, rollout trajectory collection, and data augmentation into a closed-loop pipeline.
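The closed loop described above can be sketched as a short driver function. This is a minimal toy illustration, not the authors' implementation: the `Trajectory` record, `run_flywheel`, and the three `toy_*` stand-ins are hypothetical names, and the "policies" here are simple lookup functions rather than learned models. Only the ordering of the stages (warm-up, then IL, residual RL, rollout collection, and augmentation per iteration) is taken from the text.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Trajectory:
    scenario: str  # hypothetical scenario id (object / environment / pose)
    source: str    # "seed", "rollout", or "augmented"


# Toy policy type: maps a scenario id to whether the rollout succeeds.
Policy = Callable[[str], bool]


def run_flywheel(
    seed: Sequence[Trajectory],
    scenario_batches: Sequence[Sequence[str]],
    train_il: Callable[[List[Trajectory]], Policy],
    train_residual: Callable[[Policy, Sequence[str]], Policy],
    augment: Callable[[Sequence[Trajectory]], List[Trajectory]],
) -> Tuple[List[Trajectory], Policy]:
    """Run one flywheel pass per batch of target scenarios."""
    # Warm-up: expand the seed demonstration(s) with augmentation.
    dataset = list(seed) + augment(seed)
    policy: Policy = train_il(dataset)
    for scenarios in scenario_batches:
        base = train_il(dataset)                   # imitation learning
        policy = train_residual(base, scenarios)   # residual RL refinement
        # Keep only successful rollouts as new data.
        rollouts = [Trajectory(s, "rollout") for s in scenarios if policy(s)]
        dataset += rollouts + augment(rollouts)    # data augmentation
    return dataset, policy


# --- Toy stand-ins (assumptions, for illustration only) ---
def toy_train_il(dataset: List[Trajectory]) -> Policy:
    seen = {t.scenario for t in dataset}
    return lambda s: s in seen  # succeeds only on scenarios present in the data


def toy_train_residual(base: Policy, scenarios: Sequence[str]) -> Policy:
    covered = set(scenarios)
    return lambda s: base(s) or s in covered  # extends the base to new scenarios


def toy_augment(trajs: Sequence[Trajectory]) -> List[Trajectory]:
    return [Trajectory(t.scenario + "+aug", "augmented") for t in trajs]


seed = [Trajectory("cup", "seed")]
batches = [["cup", "bowl"], ["bowl", "bottle"], ["bottle", "can"]]
dataset, policy = run_flywheel(
    seed, batches, toy_train_il, toy_train_residual, toy_augment
)
```

In this toy run, each of the three iterations adds trajectories for scenarios the previous dataset did not cover, which mirrors the flywheel idea: a better policy yields more diverse data, and more diverse data yields a more general policy.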

Experimental Setups

  • To validate the effectiveness of our framework across diverse manipulation scenarios, we designed four challenging tasks encompassing both single-arm and dual-arm dexterous manipulation: grasp, pour, lift, and handover. These tasks cover a wide range of manipulation complexities and coordination demands.
  • For each task, we collected only one human demonstration through a VR-based teleoperation system.
  • We conducted three iterations of data generation within our framework for each task.

Simulation Results

Task Demonstrations

Grasp Task

Pour Task

Lift Task

Handover Task

Iterative Data Generation Process

Through three iterations of self-improvement, DexFlyWheel achieves:

  • Scaling data diversity: 214.7× total scenarios with 20× more object variations
  • Improvement in policy generalization: from 16.5% to 81.9% success rate on unseen scenario configurations
  • Enhanced policy performance through residual RL refinement: from 48.5% to 82.6% success rate

Iteration Results

Performance Scaling with Diverse Datasets Generated by DexFlyWheel

With DexFlyWheel, we achieve simultaneous scaling of both data diversity and policy performance. This indicates the effectiveness of DexFlyWheel in generating diverse datasets, which contribute to enhanced policy generalization.

Performance Results

Comparison with Baselines

Baseline Comparison Table

DexFlyWheel consistently achieves higher success rates than replay-based methods. Compared to the Human Demo baseline, which uses 20 teleoperated trajectories per task, DexFlyWheel achieves a much higher average success rate (81.9% vs. 13.4%) while requiring only a single human demonstration per task, significantly reducing human effort.

Compared to DexMimicGen, our method demonstrates significant advantages in handling diverse objects and executing dynamic tasks, showcasing superior robustness across various manipulation scenarios.

Data Generation Efficiency of DexFlyWheel

DexFlyWheel significantly improves data generation efficiency by reducing the time and effort required for collecting successful demonstrations. Compared to prior methods like human teleoperation and replay-based pipelines, DexFlyWheel achieves faster per-trajectory generation and higher success rates across tasks, enabling robust and cost-effective dataset creation.

Data Generation Efficiency

Real Robot Results

We successfully deployed policies trained on DexFlyWheel-generated data on a dual-arm dexterous robot system.

Contributions

  • We propose DexFlyWheel, a scalable and generalizable data generation paradigm for dexterous manipulation. DexFlyWheel continuously and autonomously generates diverse, high-quality data through a self-improving data flywheel mechanism, with minimal human effort.
  • We empirically demonstrate the data flywheel effect in dexterous manipulation across four challenging tasks. Starting from a single human demonstration per task, DexFlyWheel leads to a 500.0× increase in new trajectories and a 214.7× increase in new scenario configurations, significantly outperforming replay-only pipelines in scaling diverse data.
  • We validate the effectiveness of our generated data in training generalizable policies, achieving an average 81.9% success rate on challenging test sets and significantly outperforming policies trained on data from data generation baselines. Furthermore, we transfer our policies to a real-world dual-arm robot system via a digital twin, achieving a 78.3% success rate on the lift task and 63.3% on the handover task.