2D + 3D Capture Strategy

If you require both 2D and 3D data for your application, then this tutorial is for you.

We compare the pros and cons of different 2D+3D capture approaches, clarify some limitations, and explain how they affect cycle time. We also touch on the difference between using external light and using the internal projector for 2D capture.

2D data: RGB image
3D data: Point Cloud

There are two different ways to get 2D data:

Independently via camera.capture(Zivid::Settings2D).imageRGBA(), see 2D Image Capture Process.
As part of a 3D capture via camera.capture(Zivid::Settings).pointCloud().copyImageRGBA(), see Point Cloud Capture Process.

Which one to use, however, depends on your requirements.

Different scenarios will lead to different tradeoffs. We break it down by which data you require first. Then we will discuss tradeoffs of speed versus quality for the different scenarios.

I need 2D data before 3D data
I need 2D data as part of how I use 3D data
I only need 2D data after I have used the 3D data

About external light

Before we go into the different strategies, we have to discuss external light. The ideal light source for a 2D capture is strong and diffuse because this limits blooming effects. With the internal projector as the light source, blooming effects are almost inevitable. Mounting the camera at an angle significantly reduces this effect, but an external diffuse light source is still better. However, external light introduces noise in the 3D data, so you should ideally turn the external light off during 3D capture.

In addition to the reduction in blooming effects, strong external light can smooth out variations in exposure due to variations in ambient light. Typical sources for variations in ambient light:

changes in daylight (day/night, clouds, etc.)
doors opening and closing
ceiling light turned on and off

Such variations in exposure impact 3D and 2D data differently. The impact of exposure variations in the 2D data depends on the detection algorithm used. If segmentation is performed in 2D, then these variations may or may not impact segmentation performance. For the point cloud, you may find variations in point cloud completeness due to variations in noise.

This leads to the question: Should we use the projector for 2D?

Whether or not to use the projector as the light source for 2D capture depends on the trade-offs described above.

Check out Optimizing Color Image for more information on that topic.

On Zivid One+ it is important to be aware of the switching penalty that occurs when the projector is on during 2D capture. For more information, see Limitation when performing captures in a sequence while switching between 2D and 3D capture calls.

If there is enough time between capture cycles, it is possible to mitigate the switching limitation. We can take on the penalty while the system is doing something else, for example, while the robot is moving in front of the camera. In this tutorial, we call this a dummy capture.

2D data before 3D data

If you, for example, perform segmentation in 2D and only later determine your picking pose, then you need the 2D data before the 3D data. The fastest way to get 2D data is via a separate 2D capture. Hence, if you need 2D data before 3D data, you should perform a separate 2D capture.

The following code sample shows how you can:

Capture 2D
Use 2D data and capture 3D in parallel
If duty cycle permits: perform a dummy capture to absorb the penalty where it does not impact performance.

C++

auto camera = zivid.connectCamera();
dummyCapture2D(camera, settings2D);
const auto frame2dAndCaptureTime = captureAndMeasure<Zivid::Frame2D>(camera, settings2D);
std::cout
    << "Starting 3D capture in current thread and using 2D data in separate thread, such that the two happen in parallel"
    << std::endl;
std::future<void> userThread =
    std::async(std::launch::async, useFrame<Zivid::Frame2D>, std::ref(frame2dAndCaptureTime.frame));
const auto frameAndCaptureTime = captureAndMeasure<Zivid::Frame>(camera, settings);
useFrame(frameAndCaptureTime.frame);
std::cout << "Wait for usage of 2D frame to finish" << std::endl;
userThread.get();
printCaptureFunctionReturnTime(frame2dAndCaptureTime.captureTime, frameAndCaptureTime.captureTime);
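
The helpers captureAndMeasure and useFrame are not shown in the excerpt above. The following is a minimal, SDK-free sketch of how such a timing helper and the capture-then-overlap pattern might look. All names here (FrameAndCaptureTime, FakeFrame2D, FakeFrame3D, fakeCapture2D, fakeCapture3D, use2D, captureThenOverlap) are hypothetical stand-ins, and the sleeps only imitate capture and processing time. It is not the implementation used in the Zivid sample.

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <thread>
#include <utility>

// Hypothetical result type: a captured frame plus the time the capture call took.
template<typename FrameT>
struct FrameAndCaptureTime
{
    FrameT frame;
    std::chrono::microseconds captureTime;
};

// Hypothetical helper: invokes any callable that returns a frame and times the call.
template<typename CaptureFn>
auto captureAndMeasure(CaptureFn &&capture)
{
    const auto before = std::chrono::steady_clock::now();
    auto frame = std::forward<CaptureFn>(capture)();
    const auto after = std::chrono::steady_clock::now();
    return FrameAndCaptureTime<decltype(frame)>{
        std::move(frame), std::chrono::duration_cast<std::chrono::microseconds>(after - before)
    };
}

// Stand-ins for real frames and captures (assumptions, not SDK types).
struct FakeFrame2D { int pixelCount; };
struct FakeFrame3D { int pointCount; };

FakeFrame2D fakeCapture2D()
{
    std::this_thread::sleep_for(std::chrono::milliseconds(5)); // imitate a short 2D capture
    return FakeFrame2D{ 640 * 480 };
}

FakeFrame3D fakeCapture3D()
{
    std::this_thread::sleep_for(std::chrono::milliseconds(20)); // imitate a longer 3D capture
    return FakeFrame3D{ 1000 };
}

// Imitates 2D processing (for example segmentation) and returns the pixel count.
int use2D(const FakeFrame2D &frame)
{
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    return frame.pixelCount;
}

// Capture 2D first, then use the 2D data in another thread while 3D is captured.
int captureThenOverlap()
{
    const auto frame2d = captureAndMeasure(fakeCapture2D);
    std::future<int> user2d = std::async(std::launch::async, use2D, std::cref(frame2d.frame));
    const auto frame3d = captureAndMeasure(fakeCapture3D);
    return user2d.get() + frame3d.frame.pointCount; // wait for the 2D user before returning
}
```

Calling captureThenOverlap() returns once both the simulated 3D capture and the 2D processing have finished. In a real application the fake capture functions would be replaced by calls on the camera.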

Following is a table with the expected performance for the different scenarios.

Platform used: 11th Gen Intel(R) Core(TM) i9-11900K @ 3.50GHz with NVIDIA GeForce RTX 3070
		2D followed by 3D, back-to-back		2D followed by 3D, with delay [1]
		2D with projector	2D without projector	2D with projector	2D without projector
One+	2D	~420 ms [3]	~70 ms	~30 ms	~30 ms
One+	3D	~870 ms [2]	~320 ms	~870 ms [2]	~320 ms
Two	2D	~50 ms	~35 ms	~30 ms	~25 ms
Two	3D	~140 ms	~130 ms	~140 ms	~130 ms

Note

A new capture will not start until all the processing on any ongoing capture (2D or 3D) on the same camera is completed. This affects the course of events when sequentially calling two captures with the same camera. See Performance limitation of sequential captures with the same camera for more information.

2D data as part of how I use 3D data

In this case, we don’t have to get access to the 2D data before the 3D data. You always get 2D data as part of a 3D acquisition. Thus we only have to care about overall speed and quality.

Speed

For optimal speed, we simply rely on 3D acquisitions to provide good 2D data. There is no additional acquisition or separate capture for 2D data.

2D Quality

For optimal 2D quality, it is recommended to use a separate acquisition for 2D. This can be either a separate 2D capture, as discussed in the previous section, or an additional acquisition in an HDR capture with Color Mode set to UseFirstAcquisition. Adding an acquisition to a 3D HDR capture just for color can be costly in terms of speed because the exposure time is multiplied by the number of patterns for the chosen Vision Engine. This is a limitation that may be removed in future SDK updates.

Following is a table that shows what you can expect from the different configurations. At the end you will find a table showing actual measurements on different hardware with the Fast Consumer Goods settings for Zivid Two M70 (Z2 M70 Fast).

Fast: Use 2D data from 3D capture. No special acquisition or settings for 2D.
Medium Fast: Separate 2D capture followed by 3D capture.
Slow: 3D capture with an additional acquisition with special settings for optimal 2D.

Platform used: 11th Gen Intel(R) Core(TM) i9-11900K @ 3.50GHz with NVIDIA GeForce RTX 3070
		Fast	Medium Fast		Slow
		3D [4]	2D with projector + 3D	2D without projector + 3D	3D (+1 for 2D) [5]
One+	2D	N/A	~420 ms	~30 ms	N/A
One+	3D	~340 ms	~870 ms	~320 ms	~910 ms
Two	2D	N/A	~40 ms	~40 ms	N/A
Two	3D	~280 ms	~330 ms	~290 ms	~370 ms
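
One way to compare the configurations in the table above is to add the 2D and 3D figures for each column. The sketch below does this for the Zivid Two rows. Treating the total as a plain sum and counting N/A as 0 ms are our simplifying assumptions, and Strategy, totalMs, zividTwoStrategies, and fastest are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One column of the table: a strategy with its approximate 2D and 3D times in ms.
struct Strategy
{
    std::string name;
    int capture2dMs; // 0 where the table says N/A (assumption)
    int capture3dMs;
};

// Assumption: the overall time is simply the sum of the two figures.
int totalMs(const Strategy &strategy)
{
    return strategy.capture2dMs + strategy.capture3dMs;
}

// Approximate Zivid Two values copied from the table above.
std::vector<Strategy> zividTwoStrategies()
{
    return {
        { "Fast: 3D", 0, 280 },
        { "Medium Fast: 2D with projector + 3D", 40, 330 },
        { "Medium Fast: 2D without projector + 3D", 40, 290 },
        { "Slow: 3D (+1 for 2D)", 0, 370 },
    };
}

// Returns the strategy with the smallest total; the list must not be empty.
Strategy fastest(const std::vector<Strategy> &strategies)
{
    return *std::min_element(
        strategies.begin(), strategies.end(), [](const Strategy &a, const Strategy &b) {
            return totalMs(a) < totalMs(b);
        });
}
```

Under these assumptions, the totals for Zivid Two come out at roughly 280 ms, 370 ms, 330 ms, and 370 ms, so the choice between the slower options is mainly about 2D quality and when the 2D data is needed.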

2D data after I have used the 3D data

You always get 2D data as part of a 3D acquisition. The table below shows 3D capture time examples.

Expected median (±stddev) in ms
Consumer Goods Settings	Zivid One+			Zivid Two
	Intel UHD 750	Intel UHD G1	NVIDIA 3070	Intel UHD 750	Intel UHD G1	NVIDIA 3070
	High-end [6]	Low-end [7]	High-end [8]	High-end [6]	Low-end [7]	High-end [8]
Z2 M70 Fast	NA	NA	NA	556 (±9) ms	947 (±379) ms	280 (±4) ms
Z2 L100 Fast	NA	NA	NA	558 (±9) ms	949 (±394) ms	281 (±5) ms
Z1+ M Fast	586 (±2) ms	903 (±4) ms	342 (±1) ms	NA	NA	NA
Z1+ L Fast	586 (±2) ms	901 (±4) ms	342 (±1) ms	NA	NA	NA

However, optimizing for 3D quality does not always optimize for 2D quality. Thus, it might be a good idea to have a separate 2D capture after the 3D capture. Following is a table with the expected performance for the different scenarios.

Platform used: 11th Gen Intel(R) Core(TM) i9-11900K @ 3.50GHz with NVIDIA GeForce RTX 3070
		3D followed by 2D, back-to-back		3D followed by 2D, with delay [9]
		2D with projector	2D without projector	2D with projector	2D without projector
One+	3D	~680 ms [10]	~120 ms	~120 ms	~120 ms
One+	2D	~380 ms [11]	~30 ms	~380 ms [11]	~30 ms
Two	3D	~140 ms	~140 ms	~130 ms	~110 ms
Two	2D	~40 ms	~40 ms	~40 ms	~40 ms

Note

A new capture will not start until all the processing on any ongoing capture (2D or 3D) on the same camera is completed. This affects the course of events when sequentially calling two captures with the same camera. See Performance limitation of sequential captures with the same camera for more information.

Summary

The following tables list the different 2D+3D capture configurations. They show how the configurations are expected to perform relative to each other with respect to speed and quality. We distinguish between two scenarios:

Cycle time is so fast that each capture cycle needs to happen right after the other.
Cycle time is slow enough to allow an additional dummy capture between each capture cycle. An additional capture can take up to 800 ms in the worst case. As a rule of thumb, a dummy capture saves time when the cycle time is longer than 2 seconds.
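
The rule of thumb above can be written as a small predicate. This is only a sketch: the 2-second threshold and the 800 ms worst case come from the text, while the function and constant names are hypothetical, and the idle-time check is an additional assumption of ours about when the penalty can be absorbed.

```cpp
#include <chrono>

// Figures taken from the text above.
constexpr std::chrono::milliseconds worstCaseDummyCapture{ 800 };
constexpr std::chrono::milliseconds ruleOfThumbCycleTime{ 2000 };

// Rule of thumb: a dummy capture tends to save time for cycle times above 2 seconds.
constexpr bool dummyCaptureLikelyWorthwhile(std::chrono::milliseconds cycleTime)
{
    return cycleTime > ruleOfThumbCycleTime;
}

// Assumption: the dummy capture is "free" if it fits in the time the camera is idle,
// that is, the cycle time minus the time spent on the real captures.
constexpr bool dummyCaptureFitsInIdleTime(
    std::chrono::milliseconds cycleTime,
    std::chrono::milliseconds captureTime)
{
    return cycleTime - captureTime >= worstCaseDummyCapture;
}
```

For example, a 2.5-second cycle with about 1.3 seconds of capturing leaves roughly 1.2 seconds idle, which is enough to hide an 800 ms dummy capture under these assumptions.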

Back-to-back captures

Capture Cycle (no wait between cycles)	Speed		2D-Quality
	One+	Two	
3D ➞ 2D [13]	Slowest	Fast	Best
2D ➞ 3D [12]	Slow	Faster	Best
3D (w/2D [15])	Fast	Fast	Better
3D	Fastest	Fastest	Good

For back-to-back captures, the switching delay cannot be avoided unless the projector brightness is the same. In that case, however, it is better to set Color Mode to UseFirstAcquisition; see Color Mode.

Captures with low duty cycle

Capture Cycle (time to wait for next cycle)	Speed		2D-Quality
	One+	Two	
3D ➞ 2D [13] ➞ 3D ([14])	Slow	Fast	Best
2D ➞ 3D [12] ➞ 2D ([14])	Good	Faster	Best
3D (w/2D [15]) ➞ 3D (w/2D [15])	Fast	Fast	Best
3D ➞ 3D	Fastest	Fastest	Good

Following is a table showing actual measurements on different hardware.

Note

We use the Fast Consumer Goods settings for Zivid Two (Z2 M70 Fast) and Zivid One+ (Z1+ M Fast).

Expected median (±stddev) in ms
2D+3D Capture			Zivid One+			Zivid Two
2D+3D Capture			Intel UHD 750	Intel UHD G1	NVIDIA 3070	Intel UHD 750	Intel UHD G1	NVIDIA 3070
[16]	[17]		High-end [18]	Low-end [19]	High-end [20]	High-end [18]	Low-end [19]	High-end [20]
Capture 2D and then 3D
✓	✓	2D	418 (±0.7) ms	420 (±0.8) ms	419 (±0.6) ms	51 (±1) ms	52 (±7) ms	50 (±1) ms
✓	✓	3D	1110 (±7) ms	1439 (±6) ms	865 (±9) ms	607 (±3) ms	1024 (±426) ms	325 (±2) ms
✓		2D	23 (±0.4) ms	24 (±0.4) ms	26 (±1) ms	23 (±0.3) ms	25 (±1) ms	24 (±0.3) ms
✓		3D	1104 (±8) ms	1438 (±9) ms	861 (±5) ms	608 (±3) ms	1030 (±456) ms	328 (±1) ms
	✓	2D	73 (±0.5) ms	74 (±0.7) ms	73 (±0.5) ms	50 (±1) ms	51 (±0.5) ms	49 (±1) ms
	✓	3D	566 (±5) ms	893 (±3) ms	319 (±4) ms	607 (±3) ms	1024 (±426) ms	326 (±0.5) ms
		2D	23 (±0.6) ms	23 (±0.4) ms	26 (±1) ms	12 (±0.2) ms	14 (±10) ms	13 (±0.3) ms
		3D	563 (±2) ms	897 (±5) ms	318 (±1) ms	610 (±3) ms	1027 (±407) ms	329 (±2) ms
Capture 3D and then 2D
✓	✓	2D	418 (±0.7) ms	420 (±0.7) ms	419 (±0.6) ms	51 (±1) ms	52 (±10) ms	50 (±1) ms
✓	✓	3D	1105 (±6) ms	1436 (±10) ms	864 (±6) ms	604 (±3) ms	1010 (±393) ms	324 (±1) ms
✓		2D	418 (±0.8) ms	420 (±0.7) ms	419 (±0.6) ms	43 (±0.2) ms	43 (±7) ms	43 (±0.3) ms
✓		3D	593 (±3) ms	889 (±17) ms	352 (±2) ms	592 (±3) ms	981 (±456) ms	323 (±0.9) ms
	✓	2D	73 (±0.5) ms	74 (±0.6) ms	73 (±0.5) ms	50 (±1) ms	51 (±14) ms	49 (±1) ms
	✓	3D	562 (±2) ms	891 (±3) ms	319 (±3) ms	604 (±3) ms	1008 (±338) ms	325 (±3) ms
		2D	73 (±0.6) ms	74 (±0.6) ms	73 (±0.5) ms	40 (±0.4) ms	40 (±12) ms	40 (±0.3) ms
		3D	593 (±3) ms	889 (±15) ms	352 (±2) ms	557 (±3) ms	945 (±408) ms	287 (±3) ms
Capture 3D including 2D
✓	✓	2D	0 (±0.0) ms	0 (±0.0) ms	0 (±0.0) ms	0 (±0.0) ms	0 (±0.0) ms	0 (±0.0) ms
✓	✓	3D	1108 (±41) ms	1403 (±44) ms	910 (±43) ms	622 (±3) ms	1108 (±503) ms	370 (±1) ms
✓		2D	0 (±0.0) ms	0 (±0.0) ms	0 (±0.0) ms	0 (±0.0) ms	0 (±0.0) ms	0 (±0.0) ms
✓		3D	1111 (±41) ms	1403 (±42) ms	911 (±42) ms	627 (±5) ms	1106 (±471) ms	373 (±4) ms
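
To produce figures of the same form, median (±stddev), from your own timing measurements, a sketch like the following can be used. Whether the table above uses the sample or the population standard deviation is not stated; this sketch assumes the sample standard deviation, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <numeric>
#include <stdexcept>
#include <vector>

// Median of the measured times; takes a copy so the caller's data is not reordered.
double median(std::vector<double> samplesMs)
{
    if(samplesMs.empty())
    {
        throw std::invalid_argument("median requires at least one sample");
    }
    std::sort(samplesMs.begin(), samplesMs.end());
    const auto n = samplesMs.size();
    return (n % 2 == 1) ? samplesMs[n / 2] : (samplesMs[n / 2 - 1] + samplesMs[n / 2]) / 2.0;
}

// Sample standard deviation (divides by n - 1); an assumption, see the text above.
double sampleStddev(const std::vector<double> &samplesMs)
{
    if(samplesMs.size() < 2)
    {
        throw std::invalid_argument("sampleStddev requires at least two samples");
    }
    const double n = static_cast<double>(samplesMs.size());
    const double mean = std::accumulate(samplesMs.begin(), samplesMs.end(), 0.0) / n;
    double sumOfSquares = 0.0;
    for(const double sample : samplesMs)
    {
        sumOfSquares += (sample - mean) * (sample - mean);
    }
    return std::sqrt(sumOfSquares / (n - 1.0));
}
```

A large standard deviation relative to the median, as in the Intel UHD G1 columns for Zivid Two, indicates that individual captures can take much longer than the median suggests.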

Tip

To test different 2D-3D strategies on your PC, you can run Capture2D+3D.cpp sample with settings loaded from YML files. Go to Samples, and select C++ for instructions.