2D + 3D Capture Strategy

Note: if you do not need color information, skip ahead to the next section, which covers selecting 3D and 2D settings based on capture speed.

Many detection algorithms commonly used in piece-picking applications rely on 2D data to identify which object to pick. In this article, we provide insights into different ways to acquire 2D information, their pros and cons, and external lighting conditions. We also touch upon various 2D-3D approaches, their data quality, and how they affect cycle times.

There are two approaches to get 2D data:

Separate 2D capture via camera.capture(Zivid::Settings2D).imageRGBA(), see 2D Image Capture Process.
Part of the 3D capture via camera.capture(Zivid::Settings).pointCloud().copyImageRGBA(), see Point Cloud Capture Process.

Which one to use depends on your requirements and your machine vision pipeline. We advocate for a dedicated 2D capture because it provides better control over the 2D settings for color optimization and can leverage multi-threading and optimized scheduling. It also gives you more flexibility in configuring the camera resolution and projector settings. Using the 2D data from the 3D capture is simpler, but you may have to compromise on speed to get the desired 2D quality.
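To illustrate the scheduling benefit, the sketch below overlaps 2D detection with the 3D capture in a 2D ➞ 3D cycle. The functions capture_2d, capture_3d, and detect_objects are hypothetical stand-ins simulated with short sleeps; they are not Zivid SDK calls and should be replaced with your own capture and detection code.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def capture_2d():
    """Hypothetical stand-in for a separate 2D capture."""
    time.sleep(0.01)
    return "image"


def capture_3d():
    """Hypothetical stand-in for a 3D capture with color disabled."""
    time.sleep(0.05)
    return "point_cloud"


def detect_objects(image):
    """Hypothetical stand-in for 2D detection or segmentation."""
    time.sleep(0.05)
    return [f"mask from {image}"]


def run_cycle(executor):
    image = capture_2d()
    # Start detection on the 2D image in a worker thread ...
    detection = executor.submit(detect_objects, image)
    # ... while the main thread acquires the 3D data.
    point_cloud = capture_3d()
    # Block until detection is done, then hand both results to the caller.
    return detection.result(), point_cloud


with ThreadPoolExecutor(max_workers=1) as executor:
    start = time.perf_counter()
    masks, point_cloud = run_cycle(executor)
    elapsed = time.perf_counter() - start

print(masks, point_cloud, f"{elapsed:.3f} s")
```

Because detection runs while the 3D data is being acquired, the cycle takes roughly the 2D capture time plus the longer of the two remaining steps, rather than the sum of all three.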

Tip

By taking a separate 2D capture, you can disable color in your 3D capture by setting Sampling::Color to disabled. This will reduce the capture time for the 3D acquisition.

Our recommendation:

Separate 2D capture with full resolution and projector on.
Subsampled 3D capture with color disabled.

Camera resolution and 1-to-1 mapping

For accurate 2D segmentation and detection, a high-resolution color image is beneficial. Zivid 2+ has a 5 MPx imaging sensor, while Zivid 2 and One+ have 2.3 MPx sensors. The following tables show the output resolutions of the different cameras for both 2D and 3D captures.

2D capture resolutions
2D capture	Zivid One+	Zivid 2	Zivid 2+
Full Resolution	1920 x 1200	1944 x 1200	2448 x 2048

3D capture resolutions
3D capture	Zivid One+	Zivid 2	Zivid 2+
Full resolution [1]	1920 x 1200	1944 x 1200	2448 x 2048
2x2 subsampled [1]	Not available	972 x 600	1224 x 1024
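The subsampled resolutions in the table follow from keeping one pixel per 2x2 block, which halves the width and height and quarters the number of points. A minimal sketch of that arithmetic, using only the Zivid 2 and Zivid 2+ values listed above (the function and dictionary names are illustrative):

```python
# Full-resolution (width, height) values taken from the tables above.
FULL_RESOLUTION = {
    "Zivid 2": (1944, 1200),
    "Zivid 2+": (2448, 2048),
}


def subsampled_resolution(width, height, divider=2):
    """Resolution after keeping one pixel per divider x divider block."""
    return width // divider, height // divider


for model, (width, height) in FULL_RESOLUTION.items():
    sub_width, sub_height = subsampled_resolution(width, height)
    # A 2x2 subsampled capture holds a quarter of the full-resolution points.
    ratio = (width * height) / (sub_width * sub_height)
    print(f"{model}: {width} x {height} -> {sub_width} x {sub_height} ({ratio:.0f}x fewer points)")
```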

Observe that 2D captures output full-resolution images, while 3D captures may be subsampled depending on the pixel sampling setting. With subsampling, there is no longer a 1-to-1 correlation between a 2D pixel and a 3D point, so it is more challenging to extract the 3D data from a segmented mask in the 2D image. To restore the correlation, we can either subsample or downsample the 2D image

[Figure panels: Full Resolution; Subsampled; Downsampled; Quarter resolution subsampled, zoomed again; Quarter resolution downsampled, zoomed again]

or recompute the mapping by extracting RGB values from the pixels that correspond to the blue or red pixels in the Bayer grid. The code below shows how to do this:

C++

std::cout << "Pixels to sample: " << pixelsToSample << std::endl;
// 2x2 subsampling halves both dimensions; full sampling keeps them unchanged.
const int subsampleDivider =
    (pixelsToSample.value() == Zivid::Settings::Sampling::Pixel::ValueType::all) ? 1 : 2;
// Blue pixels are taken from even rows/columns (offset 0), red pixels from odd ones (offset 1).
// With full sampling the offset must be 0 so that the image is copied unshifted.
const int offset =
    (pixelsToSample.value() == Zivid::Settings::Sampling::Pixel::ValueType::redSubsample2x2) ? 1 : 0;
cv::Mat
    mappedBGR(fullResolutionBGR.rows / subsampleDivider, fullResolutionBGR.cols / subsampleDivider, CV_8UC3);
std::cout << "Mapped width: " << mappedBGR.cols << ", height: " << mappedBGR.rows << std::endl;
// Copy every sampled full-resolution pixel to its position in the mapped image.
for(int row = 0; row < fullResolutionBGR.rows - offset; row += subsampleDivider)
{
    for(int col = 0; col < fullResolutionBGR.cols - offset; col += subsampleDivider)
    {
        mappedBGR.at<cv::Vec3b>(row / subsampleDivider, col / subsampleDivider) =
            fullResolutionBGR.at<cv::Vec3b>(row + offset, col + offset);
    }
}
return mappedBGR;

Python

if pixels_to_sample == zivid.Settings.Sampling.Pixel.blueSubsample2x2:
    return rgba[::2, ::2, 0:3]
if pixels_to_sample == zivid.Settings.Sampling.Pixel.redSubsample2x2:
    return rgba[1::2, 1::2, 0:3]
if pixels_to_sample == zivid.Settings.Sampling.Pixel.all:
    return rgba[:, :, 0:3]
raise RuntimeError(f"Invalid pixels to sample: {pixels_to_sample}")
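The same divider and offset can be used to translate individual pixel coordinates, for example when looking up the 3D point behind a pixel of a segmentation mask. The sketch below is an illustration, not part of the Zivid SDK: the function names are hypothetical, and plain strings stand in for the zivid.Settings.Sampling.Pixel values.

```python
def sampling_layout(pixels_to_sample):
    """Return (divider, offset) for a pixel sampling mode.

    Mirrors the slicing above: blue keeps even rows/columns, red keeps odd ones.
    """
    layouts = {
        "all": (1, 0),
        "blueSubsample2x2": (2, 0),
        "redSubsample2x2": (2, 1),
    }
    if pixels_to_sample not in layouts:
        raise ValueError(f"Invalid pixels to sample: {pixels_to_sample}")
    return layouts[pixels_to_sample]


def to_point_cloud_index(row, col, pixels_to_sample):
    """Index of the 3D point whose sampling block contains a full-resolution pixel."""
    divider, _ = sampling_layout(pixels_to_sample)
    return row // divider, col // divider


def to_full_resolution_pixel(row, col, pixels_to_sample):
    """Full-resolution pixel that a point cloud index was sampled from."""
    divider, offset = sampling_layout(pixels_to_sample)
    return row * divider + offset, col * divider + offset


# A mask pixel in the full-resolution 2D image and its subsampled 3D point.
print(to_point_cloud_index(601, 973, "redSubsample2x2"))
print(to_full_resolution_pixel(300, 486, "redSubsample2x2"))
```

Note that several full-resolution pixels map to the same 3D point when the capture is subsampled, so a mask converted this way covers each point up to four times.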

For more insight into resolution, sampling and mapping, check out Monochrome Capture.
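To make the difference between subsampling and downsampling the 2D image concrete, the sketch below applies both to a tiny single-channel image stored as nested lists. It assumes that downsampling means averaging each 2x2 block; in practice you would likely use an image library's resize function instead. The function names are illustrative.

```python
def subsample_2x2(image, offset=0):
    """Keep one pixel per 2x2 block (offset 0 or 1 selects which one)."""
    return [row[offset::2] for row in image[offset::2]]


def downsample_2x2(image):
    """Average each 2x2 block into a single pixel."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [
        [
            (
                image[2 * r][2 * c]
                + image[2 * r][2 * c + 1]
                + image[2 * r + 1][2 * c]
                + image[2 * r + 1][2 * c + 1]
            )
            / 4
            for c in range(cols)
        ]
        for r in range(rows)
    ]


image = [
    [0, 4, 8, 12],
    [4, 8, 12, 16],
    [8, 12, 16, 20],
    [12, 16, 20, 24],
]
print(subsample_2x2(image))   # [[0, 8], [8, 16]]
print(downsample_2x2(image))  # [[4.0, 12.0], [12.0, 20.0]]
```

Subsampling keeps original pixel values, so each output pixel corresponds to exactly one sensor pixel, whereas downsampling blends neighboring pixels into a new value.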

Note

If you use intrinsics and the 2D and 3D captures have different resolutions, make sure the intrinsics you use match the resolution of the data you apply them to. See Camera Intrinsics for more information.

Our recommendation:

2D capture with full resolution
3D monochrome capture with subsampled resolution

External light considerations

The ideal light source for a 2D capture is strong, because it reduces the influence of ambient light, and diffuse, because this limits the blooming effects. This light source can either come from the internal projector or from an external light source. A third option is not to use any light at all.

Regardless of the option you choose, you may encounter blooming. When using the internal projector as the light source, tilting the camera, changing the background, or tuning the 2D acquisition settings can mitigate the blooming effect. If you use external light instead, making the light diffuse or angling it may help. Note that external light introduces noise in the 3D data, so you should turn it off during the 3D capture. Consequently, external lights add complexity to your cell setup and to the scheduling of your machine vision pipeline.

Exposure variations caused by changes in ambient light, such as transitions from day to night, doors opening and closing, or changes in ceiling lighting, affect 2D and 3D data differently. For 2D data, they can impact segmentation performance, especially when the segmentation model is trained on specific datasets. For 3D data, exposure variations may affect point cloud completeness due to varying noise levels. Using either the internal projector or external diffuse light helps reduce these variations.

The table below summarizes the pros and cons of the different options with respect to 2D quality.

	Internal projector	External light [2]	Ambient light
Robot Cell setup	Simple	Complex	Simple
Resilience to ambient light variations	Acceptable	Good	Bad
Blooming in 2D images	Likely	Unlikely	Likely
2D color balance needed	No	Likely	Yes

Our recommendation:

Separate 2D capture with internal projector on

Capture strategies

Optimizing for 3D quality does not necessarily give you satisfactory 2D quality. Therefore, if you depend on color information, we recommend having a separate 2D capture. We can break it down to which data you need first. This gives us the three following strategies:

2D data before 3D data
2D data as part of 3D data
2D data after 3D data

Which strategy you should choose depends on your machine vision algorithms and pipeline. Below we summarize the performance of the different strategies. For a more in-depth understanding and comprehensive ZividBenchmarks, please see 2D + 3D Capture Strategy.

The following tables list the different 2D+3D capture configurations. They show how the configurations are expected to perform relative to each other with respect to speed and quality. We distinguish between two scenarios:

Cycle time is so fast that each capture cycle needs to happen right after the other.
Cycle time is slow enough to allow an additional dummy capture between each capture cycle (only relevant for Zivid One+). An additional capture can take up to 800ms in the worst case. A rule of thumb is that for cycle time greater than 2 seconds a dummy capture saves time.

Back-to-back captures

Capture Cycle (no wait between cycles)	Speed, Zivid One+	Speed, Zivid 2	2D-Quality
3D ➞ 2D [4]	Slowest	Fast	Best
2D ➞ 3D [3]	Slow	Faster	Best
3D (w/2D [6])	Fast	Fast	Better
3D	Fastest	Fastest	Good

For back-to-back captures, the switching delay cannot be avoided unless the projector brightness is the same in both captures. In that case, however, it is better to set Color Mode to UseFirstAcquisition; see Color Mode.

Captures with low duty cycle

Capture Cycle (time to wait for next cycle)	Speed, Zivid One+	Speed, Zivid 2	2D-Quality
3D ➞ 2D [4] ➞ 3D ([5])	Slow	Fast	Best
2D ➞ 3D [3] ➞ 2D ([5])	Good	Faster	Best
3D (w/2D [6]) ➞ 3D (w/2D [6])	Fast	Fast	Best
3D ➞ 3D	Fastest	Fastest	Good

The following tables show actual measurements on different hardware. For the 3D capture, we use the Fast Consumer Goods settings:

Zivid 2+: (Z2+ M130 Fast)
Zivid 2: (Z2 M70 Fast)
Zivid One+: (Z1+ M Fast)

Zivid 2+

Expected median (±stddev) in ms
2D+3D Capture			Intel UHD i5G1	NVIDIA 4070	Intel UHD 770
[7]	[8]		Low-end [9]	High-end [10]
Capture 2D and then 3D
✓	✓	2D	74 (±7) ms	73 (±1) ms	75 (±0.3) ms
✓	✓	3D	1951 (±198) ms	605 (±2) ms	1301 (±2) ms
✓		2D	81 (±23) ms	81 (±0.4) ms	81 (±0.4) ms
✓		3D	1980 (±9) ms	634 (±5) ms	1334 (±3) ms
	✓	2D	74 (±0.4) ms	73 (±1) ms	74 (±0.3) ms
	✓	3D	1968 (±151) ms	605 (±2) ms	1302 (±2) ms
		2D	43 (±20) ms	43 (±0.4) ms	43 (±0.5) ms
		3D	1966 (±8) ms	606 (±3) ms	1307 (±3) ms
Capture 3D and then 2D
✓	✓	2D	74 (±0.4) ms	74 (±1) ms	75 (±0.3) ms
✓	✓	3D	1944 (±257) ms	605 (±2) ms	1303 (±2) ms
✓		2D	85 (±16) ms	83 (±0.3) ms	86 (±0.4) ms
✓		3D	1817 (±540) ms	593 (±2) ms	1251 (±2) ms
	✓	2D	73 (±7) ms	73 (±1) ms	74 (±0.3) ms
	✓	3D	1963 (±192) ms	606 (±5) ms	1303 (±2) ms
		2D	74 (±76) ms	71 (±0.3) ms	74 (±0.3) ms
		3D	1780 (±500) ms	554 (±2) ms	1212 (±2) ms
Capture 3D including 2D
✓	✓	2D	0 (±0.0) ms	0 (±0.0) ms	0 (±0.0) ms
✓	✓	3D	3997 (±269) ms	1741 (±3) ms	2516 (±2) ms
✓		2D	0 (±0.0) ms	0 (±0.0) ms	0 (±0.0) ms
✓		3D	3948 (±269) ms	1741 (±8) ms	2516 (±8) ms

Zivid 2

Expected median (±stddev) in ms
2D+3D Capture			Intel UHD 750	Intel UHD i3G1	NVIDIA 3070
[11]	[12]		High-end [13]	Low-end [14]	High-end [15]
Capture 2D and then 3D
✓	✓	2D	52 (±1) ms	52 (±0.4) ms	50 (±1) ms
✓	✓	3D	814 (±7) ms	1259 (±8) ms	363 (±5) ms
✓		2D	25 (±0.5) ms	25 (±0.3) ms	25 (±0.5) ms
✓		3D	813 (±7) ms	1261 (±8) ms	361 (±3) ms
	✓	2D	50 (±1) ms	51 (±0.5) ms	49 (±1) ms
	✓	3D	815 (±7) ms	1257 (±6) ms	363 (±5) ms
		2D	13 (±0.2) ms	13 (±0.2) ms	13 (±0.2) ms
		3D	813 (±6) ms	1256 (±8) ms	361 (±4) ms
Capture 3D and then 2D
✓	✓	2D	51 (±1) ms	52 (±0.4) ms	50 (±1) ms
✓	✓	3D	800 (±4) ms	1242 (±8) ms	363 (±5) ms
✓		2D	47 (±0.3) ms	48 (±0.5) ms	46 (±0.3) ms
✓		3D	779 (±2) ms	1228 (±9) ms	353 (±3) ms
	✓	2D	50 (±1) ms	51 (±0.4) ms	49 (±1) ms
	✓	3D	799 (±4) ms	1242 (±6) ms	362 (±5) ms
		2D	40 (±0.3) ms	43 (±0.3) ms	40 (±0.3) ms
		3D	739 (±3) ms	1189 (±72) ms	312 (±4) ms
Capture 3D including 2D
✓	✓	2D	0 (±0.0) ms	0 (±0.0) ms	0 (±0.0) ms
✓	✓	3D	817 (±3) ms	1345 (±9) ms	395 (±2) ms
✓		2D	0 (±0.0) ms	0 (±0.0) ms	0 (±0.0) ms
✓		3D	822 (±7) ms	1374 (±331) ms	397 (±5) ms
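As a rough way to read the Zivid 2 table, the sketch below adds the 2D and 3D medians for each strategy and picks the lowest total per compute device. The numbers are copied from the rows with both check marks; the dictionary and function names are illustrative. Summing medians is only an estimate of the total acquisition time, and it says nothing about 2D quality or about processing that can run in parallel.

```python
# Median (2D, 3D) capture times in ms for Zivid 2, copied from the rows
# with both check marks in the table above.
ZIVID_2_MEDIANS_MS = {
    "2D then 3D": {
        "Intel UHD 750": (52, 814),
        "Intel UHD i3G1": (52, 1259),
        "NVIDIA 3070": (50, 363),
    },
    "3D then 2D": {
        "Intel UHD 750": (51, 800),
        "Intel UHD i3G1": (52, 1242),
        "NVIDIA 3070": (50, 363),
    },
    "3D including 2D": {
        "Intel UHD 750": (0, 817),
        "Intel UHD i3G1": (0, 1345),
        "NVIDIA 3070": (0, 395),
    },
}


def total_times(device):
    """Sum of the 2D and 3D medians per strategy for one compute device."""
    return {
        strategy: sum(times[device])
        for strategy, times in ZIVID_2_MEDIANS_MS.items()
    }


def fastest_strategy(device):
    """Strategy with the lowest summed median for one compute device."""
    totals = total_times(device)
    return min(totals, key=totals.get)


for device in ("Intel UHD 750", "Intel UHD i3G1", "NVIDIA 3070"):
    print(device, total_times(device), "->", fastest_strategy(device))
```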

Zivid One+

Expected median (±stddev) in ms
2D+3D Capture			Intel UHD 750	Intel UHD i3G1	NVIDIA 3070
[16]	[17]		High-end [18]	Low-end [19]	High-end [20]
Capture 2D and then 3D
✓	✓	2D	419 (±0.7) ms	421 (±0.6) ms	418 (±0.6) ms
✓	✓	3D	1381 (±7) ms	1829 (±9) ms	880 (±5) ms
✓		2D	24 (±0.5) ms	24 (±0.3) ms	24 (±0.5) ms
✓		3D	1380 (±7) ms	1829 (±6) ms	880 (±5) ms
	✓	2D	73 (±0.7) ms	75 (±0.6) ms	73 (±0.5) ms
	✓	3D	856 (±8) ms	1291 (±4) ms	339 (±2) ms
		2D	24 (±0.5) ms	24 (±0.4) ms	24 (±0.3) ms
		3D	844 (±8) ms	1286 (±4) ms	337 (±1) ms
Capture 3D and then 2D
✓	✓	2D	419 (±0.6) ms	421 (±0.6) ms	418 (±0.6) ms
✓	✓	3D	1380 (±10) ms	1822 (±9) ms	878 (±5) ms
✓		2D	419 (±0.6) ms	421 (±0.7) ms	418 (±0.5) ms
✓		3D	876 (±7) ms	1271 (±4) ms	368 (±2) ms
	✓	2D	73 (±0.5) ms	75 (±0.6) ms	73 (±0.5) ms
	✓	3D	849 (±7) ms	1282 (±5) ms	338 (±2) ms
		2D	73 (±0.5) ms	75 (±0.6) ms	73 (±0.5) ms
		3D	876 (±6) ms	1269 (±4) ms	368 (±2) ms
Capture 3D including 2D
✓	✓	2D	0 (±0.0) ms	0 (±0.0) ms	0 (±0.0) ms
✓	✓	3D	1378 (±47) ms	1797 (±44) ms	924 (±42) ms
✓		2D	0 (±0.0) ms	0 (±0.0) ms	0 (±0.0) ms
✓		3D	1389 (±48) ms	1824 (±44) ms	922 (±42) ms

Tip

To test different 2D-3D strategies on your PC, you can run the ZividBenchmark.cpp sample with settings loaded from YML files. Go to Samples, and select C++ for instructions.

In the following section, we guide you on selecting 3D and 2D settings based on capture speed.