Application Requirements

Piece picking applications have strict time budgets, and success is typically measured by throughput per hour. An hourly throughput of 1000 or more items picked per robot is considered viable, which corresponds to a robot cycle time of roughly 3 to 4 seconds (3600 s / 1000 picks = 3.6 s per pick). While the robot places an item, the vision system simultaneously captures and processes the data to compute the next picking pose, assuming a stationary-mounted camera. Typically, the camera time budget is between 350 and 700 ms, depending on the items handled. To meet typical piece picking requirements, you must find the point where the data quality is just good enough for the solution to be both time-efficient and successful.
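The relationship between throughput, cycle time, and camera budget is simple arithmetic, and a short sketch can make it concrete. The function names below are illustrative only and are not part of any Zivid API; the numbers are the ones quoted above.

```python
def cycle_time_s(picks_per_hour: float) -> float:
    """Average time available per pick, in seconds."""
    if picks_per_hour <= 0:
        raise ValueError("picks_per_hour must be positive")
    return 3600.0 / picks_per_hour


def camera_budget_share(camera_budget_ms: float, picks_per_hour: float) -> float:
    """Fraction of one robot cycle that the camera time budget occupies."""
    return (camera_budget_ms / 1000.0) / cycle_time_s(picks_per_hour)


# 1000 picks per hour leaves 3.6 s per pick.
print(cycle_time_s(1000))  # 3.6

# A 700 ms camera budget is a bit under a fifth of that cycle.
print(round(camera_budget_share(700, 1000), 3))  # 0.194
```

Because capture and processing run while the robot places the previous item, the camera budget overlaps with the robot cycle rather than adding to it, as long as it fits inside the placing motion.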

The main objective when picking an object is to compute an accurate picking pose for the robot. Both 2D and 3D data are useful for finding the picking pose. In 2D, object segmentation or recognition is typically done with, e.g., template matching or machine learning networks. When using 2D data, it is especially important to minimize defocus, blooming, and saturation. In 3D, object segmentation, detection, or pose estimation is typically done with, e.g., CAD model matching, machine learning networks, or geometry matching. When using 3D data, it is important to minimize the occurrence of false and missing points.

Today, piece picking is applied to various use cases in the logistics industry, so the variety of products that need to be handled is wide. Here we focus on a selection of these objects and scenes. The scene categories presented were chosen because each poses a distinct type of challenge. For each scene, we therefore list the object features that are especially important to preserve. Typical challenging scenes that appear in piece picking are:

Textured scenes



Scene: Closely stacked objects
Features: Gaps

Scene: Thin, overlapping objects
Features: Depth differences

Scene: Wrinkled, deformable objects, and very dark objects
Features: Transitions

Scene: Tiny objects
Features: Shape

For these scenes, it is most important to preserve 2D and 3D edges. Sharp edges make it possible to clearly see, for instance, gaps between closely stacked objects and depth differences between thin, overlapping objects. This makes it easy to see where one object ends and another begins, which is important for detection and segmentation algorithms in both 2D and 3D. 3D shapes are also important to preserve, for instance to enable plane fitting in detection and pose estimation algorithms that use 3D data. For 2D detection, textures should also be preserved, i.e., with minimal saturation, blooming, and halation.
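As an illustration of the plane fitting mentioned above, the following is a minimal sketch of a least-squares fit of z = a*x + b*y + c to a set of 3D points, written in plain Python. It assumes that missing points are encoded as NaN and are simply skipped; the function names are hypothetical, and this is not the method used by any particular detection library. The parametrization cannot represent planes parallel to the z-axis, which is usually acceptable for surfaces facing a top-mounted camera.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c; returns (a, b, c).

    Points containing NaN (assumed to mark missing data) are ignored.
    """
    valid = [p for p in points if not any(math.isnan(v) for v in p)]
    n = len(valid)
    if n < 3:
        raise ValueError("need at least three valid points")

    sx = sum(x for x, _, _ in valid)
    sy = sum(y for _, y, _ in valid)
    sz = sum(z for _, _, z in valid)
    sxx = sum(x * x for x, _, _ in valid)
    syy = sum(y * y for _, y, _ in valid)
    sxy = sum(x * y for x, y, _ in valid)
    sxz = sum(x * z for x, _, z in valid)
    syz = sum(y * z for _, y, z in valid)

    # Normal equations A * [a, b, c]^T = rhs.
    A = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, float(n)]]
    rhs = [sxz, syz, sz]

    d = det3(A)
    if abs(d) < 1e-12:  # simple guard, e.g. for collinear points
        raise ValueError("points do not define a unique plane")

    # Cramer's rule: replace one column at a time with rhs.
    coeffs = []
    for col in range(3):
        m = [row[:] for row in A]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(det3(m) / d)
    return tuple(coeffs)
```

The residuals of the points against the fitted plane give a rough indication of how well the 3D shape has been preserved: noisy or distorted data on a flat surface shows up as large residuals.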

Reflective scenes



Scene: Objects in poly bags
Features: Surface coverage

Scene: Highly reflective objects
Features: Shape and surface coverage

Scene: Objects in transparent packing
Features: Surface coverage

Scene: Objects in bubble wrap
Features: Surface coverage

For these scenes, it is important to preserve the 3D shape of the object. The surface coverage should also be as continuous as possible, which is particularly important for 3D detection algorithms. For 2D detection and object segmentation, the edges should be sharp and visible. Additionally, to ensure object detection in 2D, the textures should be preserved, i.e., with minimal saturation, blooming, and halation.
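One simple way to put a number on surface coverage is the fraction of pixels inside an object region that carry a valid 3D point. The sketch below assumes an organized depth map stored as rows of floats, with NaN marking missing points, and an optional boolean mask that selects the object; both the data layout and the function name are assumptions made for illustration.

```python
import math


def surface_coverage(depth_rows, mask_rows=None):
    """Fraction of (masked) pixels that have a valid, non-NaN depth value."""
    total = 0
    valid = 0
    for r, row in enumerate(depth_rows):
        for c, z in enumerate(row):
            if mask_rows is not None and not mask_rows[r][c]:
                continue  # pixel is outside the region of interest
            total += 1
            if not math.isnan(z):
                valid += 1
    if total == 0:
        raise ValueError("no pixels selected")
    return valid / total


nan = float("nan")
depth = [
    [800.0, 801.0, nan],
    [799.5, nan, 802.0],
]
print(surface_coverage(depth))
```

A single fraction does not say whether the coverage is continuous, since many small holes and one large hole can give the same value, so it is best read together with a visual inspection of the point cloud.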

Gripper compliance, motion planning, and collision avoidance

Additional elements to consider in piece picking are robot gripper compliance and motion planning with collision avoidance. Gripper compliance is often introduced to increase confidence in the success rate of a pick, because it reduces the chance of failing to reach an object, or of crashing into objects and damaging them or the gripper. Dimension trueness, point precision, and planarity are some of the factors that determine how much compliance the gripper needs.

Motion planning is used to optimize the robot’s trajectories while picking, thereby saving cycle time. It is often paired with collision avoidance to avoid crashing into obstacles such as bin walls, objects not currently being picked, and other environmental restrictions. The robot then avoids the obstacles seen by the vision system.

Ideally, the vision system's representation would match the real environment exactly. In practice, artifacts arise: false or missing data that does not align with the real world. False data appears, for instance, as ghost planes or floating blobs that do not exist in reality, whereas missing data appears as holes in the point cloud. Missing data results from incomplete surface coverage and comprises points that should have existed in the point cloud. Because of such artifacts, collision avoidance may prevent the robot from reaching its destination. Hence, motion planning needs to define which obstacles are safe to disregard and which are not. As the quality of the 2D and 3D data from the camera increases, the complexity of gripper compliance and of motion planning with collision avoidance can be reduced.
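To illustrate what deciding "which obstacles are safe to disregard" could look like, here is a deliberately simple, hypothetical heuristic: treat small isolated groups of occupied cells in a 2D occupancy grid as likely floating blobs and drop them before collision checking. The grid representation, the function name, and the `min_cells` threshold are all assumptions for this sketch; it is not the approach of any specific motion planner, and a real system would need to validate such a rule against its own safety requirements.

```python
from collections import deque


def remove_small_blobs(occupied, min_cells):
    """Keep only 4-connected components with at least min_cells occupied cells.

    occupied: 2D grid (list of rows) of truthy/falsy values.
    Returns a new grid of booleans with small components cleared.
    """
    rows = len(occupied)
    cols = len(occupied[0]) if rows else 0
    kept = [[False] * cols for _ in range(rows)]
    seen = [[False] * cols for _ in range(rows)]

    for r0 in range(rows):
        for c0 in range(cols):
            if not occupied[r0][c0] or seen[r0][c0]:
                continue
            # Breadth-first search to collect one connected component.
            component = []
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                component.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and occupied[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            # Components below the threshold are treated as artifacts.
            if len(component) >= min_cells:
                for r, c in component:
                    kept[r][c] = True
    return kept


grid = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],  # single isolated cell, a candidate floating blob
]
filtered = remove_small_blobs(grid, min_cells=2)
```

The better the point cloud quality, the less such filtering is needed, which is the sense in which higher data quality reduces the complexity of collision avoidance.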

This section has reviewed the requirements for piece picking. Now, the next step is to select the correct Zivid camera based on your scene volume.