A robot operating in a household environment will encounter a wide range of objects. In this paper, we present a method to generalize object manipulation skills, acquired from a limited number of demonstrations, to novel objects from new shape categories. Our approach, Local Neural Descriptor Fields (L-NDF), uses neural descriptors defined on the local geometry of an object to transfer manipulation demonstrations to novel objects at test time, leveraging local geometry shared across objects. We illustrate the efficacy of our approach in manipulating novel objects in novel poses, both in simulation and in the real world.
We present a method for performing tasks involving spatial relations between novel object instances, initialized in arbitrary poses, directly from point cloud observations. We overcome the key technical challenge of determining task-relevant local coordinate frames from a few demonstrations by developing an optimization method based on Neural Descriptor Fields (NDFs) and a single annotated 3D keypoint. An energy-based learning scheme that models the joint configuration of the objects satisfying a desired relational task further improves performance. The method is tested on three multi-object rearrangement tasks in simulation and on a real robot.
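As a rough illustration of why task-relevant local coordinate frames are useful for relational tasks, the sketch below computes the rigid transform that brings a frame attached to one object onto a frame attached to another. It is a minimal sketch under stated assumptions, not the method described above: the frames are given as hypothetical 4x4 homogeneous poses in a shared world frame, and the helper names (`make_pose`, `rot_z`, `aligning_transform`) are invented for this example.

```python
import numpy as np

def make_pose(rotation, translation):
    """Build a 4x4 homogeneous pose from a 3x3 rotation and a 3-vector."""
    pose = np.eye(4)
    pose[:3, :3] = rotation
    pose[:3, 3] = translation
    return pose

def rot_z(angle):
    """Rotation by `angle` radians about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def aligning_transform(frame_static, frame_moving):
    """Rigid transform that moves `frame_moving` onto `frame_static`.

    Both inputs are 4x4 poses expressed in the same world frame
    (an assumption of this sketch).
    """
    return frame_static @ np.linalg.inv(frame_moving)

# Hypothetical task frames on a static object (A) and a movable object (B).
frame_a = make_pose(rot_z(0.4), [0.5, 0.1, 0.2])
frame_b = make_pose(rot_z(-1.1), [-0.3, 0.6, 0.0])

T = aligning_transform(frame_a, frame_b)
moved_frame_b = T @ frame_b  # coincides with frame_a after the move
```

Applying `T` to every point of the movable object would place it so that the two task frames coincide, which is one simple way to express a desired spatial relation once the frames are known.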
We develop an object-level online change detection approach that is robust to partially overlapping observations and noisy localization results. Utilizing the shape completion capability and SE(3)-equivariance of Neural Descriptor Fields (NDFs), we represent objects with compact shape codes that encode full object shapes from partial observations. By associating objects via shape code similarity and comparing the local object-neighbor spatial layout, our proposed approach demonstrates robustness to low observation overlap and localization noise. We conduct experiments on both synthetic and real-world sequences and achieve improved change detection results compared to multiple baseline methods.
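The association step can be pictured with a small sketch. The code below matches objects between two observations by the cosine similarity of their shape codes, greedily and with a similarity threshold. Everything here is an assumption made for illustration: the codes are random vectors rather than outputs of a trained network, the function name `associate` and the threshold value are hypothetical, and the neighbor-layout comparison mentioned above is not modeled.

```python
import numpy as np

def associate(codes_prev, codes_curr, min_similarity=0.8):
    """Greedily match objects by cosine similarity of their shape codes.

    Returns (matches, unmatched_prev, unmatched_curr), where `matches`
    maps an index in `codes_prev` to an index in `codes_curr`.
    """
    a = codes_prev / np.linalg.norm(codes_prev, axis=1, keepdims=True)
    b = codes_curr / np.linalg.norm(codes_curr, axis=1, keepdims=True)
    sim = a @ b.T  # pairwise cosine similarities

    matches, used = {}, set()
    # Visit candidate pairs from most to least similar.
    rows, cols = np.unravel_index(np.argsort(-sim, axis=None), sim.shape)
    for i, j in zip(rows.tolist(), cols.tolist()):
        if sim[i, j] < min_similarity:
            break  # remaining pairs are even less similar
        if i in matches or j in used:
            continue
        matches[i] = j
        used.add(j)

    unmatched_prev = [i for i in range(len(a)) if i not in matches]
    unmatched_curr = [j for j in range(len(b)) if j not in used]
    return matches, unmatched_prev, unmatched_curr

rng = np.random.default_rng(0)
codes_prev = rng.normal(size=(4, 64))  # four objects seen earlier
# Later observation: objects 2, 0, 3 re-observed with small noise, plus one new object.
reobserved = codes_prev[[2, 0, 3]] + 0.05 * rng.normal(size=(3, 64))
codes_curr = np.vstack([reobserved, rng.normal(size=(1, 64))])

matches, unmatched_prev, unmatched_curr = associate(codes_prev, codes_curr)
```

In this toy setup, objects left unmatched on either side are only candidates for "removed" or "added" changes; a fuller pipeline would presumably confirm them with spatial information.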
Neural Descriptor Fields (NDFs) condition on object 3D point clouds and map continuous 3D coordinates to spatial descriptors. NDFs have the key properties of encoding category-level correspondence across shapes and being equivariant to rigid 3D transformations. They can represent both points and oriented local coordinate frames in the vicinity of the point cloud, and allow recovering corresponding points/frames across shapes via nearest-neighbor search in descriptor space (performed via continuous energy optimization). We show that NDFs facilitate efficient few-shot learning from demonstration for pick-and-place manipulation tasks.
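The idea of recovering a corresponding point by searching descriptor space can be illustrated with a toy example. The sketch below replaces the learned network with a hand-made descriptor (the sorted distances from a query point to the cloud), which stays the same when the query and the cloud undergo the same rigid transform. It also replaces continuous energy optimization with a search over a finite set of candidate points. Both substitutions, and all function names, are assumptions made to keep the example small; this is not the NDF implementation.

```python
import numpy as np

def toy_descriptor(query, cloud):
    """Sorted distances from `query` to every point of `cloud`.

    Hypothetical stand-in for a learned descriptor field: the value does
    not change when query and cloud are moved by the same rigid transform.
    """
    return np.sort(np.linalg.norm(cloud - query, axis=1))

def match_point(demo_query, demo_cloud, new_cloud, candidates):
    """Pick the candidate whose descriptor is closest to the demo descriptor."""
    target = toy_descriptor(demo_query, demo_cloud)
    # "Energy" = descriptor distance; lower means a better correspondence.
    energies = np.array([
        np.linalg.norm(toy_descriptor(c, new_cloud) - target)
        for c in candidates
    ])
    return candidates[int(np.argmin(energies))], energies

def random_rotation(rng):
    """Random proper rotation matrix obtained from a QR decomposition."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1  # flip one axis to get determinant +1
    return q

rng = np.random.default_rng(0)
demo_cloud = rng.normal(size=(64, 3))      # object in the demonstration
demo_query = np.array([0.3, -0.2, 0.5])    # e.g. a demonstrated contact point

# The same object observed in a new, unknown pose.
R, t = random_rotation(rng), np.array([1.0, 2.0, -0.5])
new_cloud = demo_cloud @ R.T + t
true_point = demo_query @ R.T + t

# Candidate set: random points near the object plus the true correspondence.
candidates = np.vstack([rng.uniform(-3, 3, size=(200, 3)) + t, true_point])
best_point, energies = match_point(demo_query, demo_cloud, new_cloud, candidates)
```

Here the search recovers the transformed query point without ever estimating the pose of the object. A learned field would additionally support correspondence across different shapes of the same category, which this hand-made descriptor does not attempt.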