Optimizing behaviors for dexterous manipulation has been a longstanding challenge in robotics, with a variety of methods from model-based control to model-free reinforcement learning having been previously explored in literature. Perhaps one of the most powerful techniques to learn complex manipulation strategies is imitation learning. However, collecting and learning from demonstrations in dexterous manipulation is quite challenging. The complex, high-dimensional action-space involved with multi-finger control often leads to poor sample efficiency of learning-based methods. In this work, we propose `Dexterous Imitation Made Easy' (DIME) a new imitation learning framework for dexterous manipulation. DIME only requires a single RGB camera to observe a human operator and teleoperate our robotic hand. Once demonstrations are collected, DIME employs standard imitation learning methods to train dexterous manipulation policies. On both simulation and real robot benchmarks we demonstrate that DIME can be used to solve complex, in-hand manipulation tasks such as `flipping', `spinning', and `rotating' objects with the Allegro hand. Our framework along with pre-collected demonstrations will be made publicly available on this webpage.
DIME consists of two phases: demonstration collection, which is performed in real-time with visual feedback, and demonstration-based policy learning, which can learn to solve dexterous tasks from a limited number of demonstrations.
Unlike previous work, which uses one or multiple depth cameras, we collect demonstrations in real-time from a single RGB camera. We use MediaPipe's hand detector and re-target the human fingertip positions into desired fingertip positions for the AllegroHand. The low-level controller uses inverse kinematics and a PD control to reach the desired locations in 3D space.
We find a simple nearest neighbors-based immitation approach (INN) is sufficient to solve our tasks. Here we visualize a learned policy for each of the 3 tasks we consider.