Bringing Machine Learning Based Perception to MoveIt Pro
During my four-month internship at PickNik Robotics, I had the opportunity to delve into machine learning (ML) and its impact on perception in robotics. As a PhD candidate specializing in deformable object manipulation, I was excited to contribute to integrating ML-based perception into MoveIt Pro. My active involvement in MoveIt and other open-source projects at PickNik made the internship an ideal fit, allowing me to explore the potential of ML-based perception pipelines in autonomous robot manipulation.
Machine learning is quickly becoming the de facto tool of choice for perception in robotics. It is being used to identify objects in images, predict their motion, detect defects, and even generate motion plans or controls to be executed directly by the robot.
MoveIt Pro already has a number of Objectives and Behaviors for manipulating objects. Some of these require the user to specify which object to manipulate by clicking on it. In contrast, my focus was on using machine learning to detect and estimate the pose of objects in the scene so that the Objectives can execute fully autonomously. These capabilities are useful in applications where a user or operator is not always present, and they can also serve as an initial guess for an operator to confirm or modify before execution.
We published a tutorial that describes how the system works and serves as an example for MoveIt Pro users who want to integrate their own ML-based perception pipelines. In brief, this involved wrapping my Python-based perception pipeline in a ROS 2 service, which can then be called from a custom Behavior in MoveIt Pro.
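To give a sense of what that wrapping looks like, here is a minimal sketch of a ROS 2 service node around a Python perception pipeline. The `SegmentImage` service type and the `run_segmentation()` helper are hypothetical stand-ins for your own interface and model code; the tutorial covers the details for MoveIt Pro itself.

```python
# Minimal sketch: expose a Python perception pipeline as a ROS 2 service so a
# custom MoveIt Pro Behavior can call it. The SegmentImage service type and
# run_segmentation() are hypothetical stand-ins for your own interface/model.
import rclpy
from rclpy.node import Node

from my_perception_msgs.srv import SegmentImage  # hypothetical: color image in, instance masks out


def run_segmentation(image_msg):
    """Placeholder for the actual ML inference (e.g., a Mask R-CNN forward pass)."""
    raise NotImplementedError


class SegmentationService(Node):
    def __init__(self):
        super().__init__('segmentation_service')
        # The custom Behavior in MoveIt Pro sends a request with the latest
        # color image and waits for the predicted instance masks.
        self.create_service(SegmentImage, 'segment_image', self.on_request)

    def on_request(self, request, response):
        response.masks = run_segmentation(request.image)
        return response


def main():
    rclpy.init()
    rclpy.spin(SegmentationService())
    rclpy.shutdown()


if __name__ == '__main__':
    main()
```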
As a proof of concept, we have so far developed fully autonomous Objectives for opening cabinet and lever-handle doors, pushing buttons, and grasping cubes. These tasks were chosen because of our work with NASA on space station robotics, through a NASA SBIR grant. Below is an example of a Universal Robots UR5e detecting and then pushing a handicap door access switch.
So how does it actually work? First, we use a Mask R-CNN model trained on a small hand-labeled dataset of images to identify the objects in the color image. Next, we use the masks predicted by the network to segment those objects in the point cloud. From there, we compute 3D properties like position, orientation, and dimensions (often referred to as affordance templates) for various object categories. MoveIt then uses this information to plan the motion.
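To make the point-cloud step concrete, here is a rough numpy sketch of how a 2D instance mask can be turned into 3D properties, assuming an organized point cloud registered pixel-for-pixel to the color image. The function name and array layout are illustrative, not the exact code we shipped.

```python
# Rough sketch: derive 3D object properties from a 2D instance mask, assuming
# an organized point cloud (H x W x 3) aligned with the color image.
import numpy as np


def object_properties(points_hw3: np.ndarray, mask_hw: np.ndarray):
    """Return (centroid, rotation, dimensions) for the masked object."""
    pts = points_hw3[mask_hw]                   # (N, 3) points inside the mask
    pts = pts[np.isfinite(pts).all(axis=1)]     # drop invalid depth returns

    centroid = pts.mean(axis=0)
    centered = pts - centroid

    # Principal axes from PCA give a coarse orientation estimate
    # (ambiguous up to axis flips, which downstream code must resolve).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    rotation = vt.T                             # columns are the principal axes

    # Extents along the principal axes approximate the object's dimensions.
    projected = centered @ rotation
    dimensions = projected.max(axis=0) - projected.min(axis=0)
    return centroid, rotation, dimensions
```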
The intersection of perception and manipulation in robotics presents real challenges. During my internship at PickNik Robotics, I found that combining instance segmentation with point-cloud processing was a straightforward yet remarkably effective solution, particularly in semi-structured environments. This approach also integrated cleanly with existing motion planning pipelines, making it a practical tool for advancing the capabilities of autonomous robots.
Overall, my internship at PickNik Robotics has been an invaluable experience, giving me a chance to explore the vast potential of machine learning in robotics and its ability to revolutionize the field of perception. I am grateful for the opportunity to contribute to MoveIt Pro and be a part of the team shaping the future of robotics through innovative technologies. Moving forward, I look forward to leveraging the knowledge and skills gained during this internship to further drive advancements in the field of robotic perception and manipulation.