2D + 3D Capture Strategy

If you require both 2D and 3D data for your application, then this tutorial is for you.

We explain and emphasize the pros and cons of different 2D-3D capture approaches, clarify some limitations, and explain how they affect cycle times. We also touch upon the difference between using external light and using the internal projector as the light source for 2D capture.

2D data: RGB image

3D data: Point cloud

There are two different ways to get 2D data:

  1. Independently, via camera.capture(Zivid::Settings2D).imageRGBA(); see 2D Image Capture Process.

  2. As part of a 3D capture, via camera.capture(Zivid::Settings).pointCloud().copyImageRGBA(); see Point Cloud Capture Process.

Which one to use, however, depends on your requirements.

Different scenarios lead to different tradeoffs. We break them down by which data you need first, and then discuss the tradeoffs between speed and quality for each scenario.

About external light

Before we go into the different strategies, we have to discuss external light. The ideal light source for a 2D capture is strong and diffuse, because this limits blooming effects. With the internal projector as the light source, blooming effects are almost inevitable. Mounting the camera at an angle significantly reduces this effect, but an external diffuse light source is still better. However, external light introduces noise in the 3D data, so ideally the external light should be turned off during 3D capture.

In addition to reducing blooming effects, strong external light can smooth out variations in exposure caused by variations in ambient light. Typical sources of variation in ambient light are:

  • changes in daylight (day/night, clouds, etc.)

  • doors opening and closing

  • ceiling light turned on and off

Such variations in exposure impact 3D and 2D data differently. The impact of exposure variations in the 2D data depends on the detection algorithm used. If segmentation is performed in 2D, then these variations may or may not impact segmentation performance. For the point cloud, you may find variations in point cloud completeness due to variations in noise.

This leads to the question: should we use the projector as the light source for 2D capture? Check out Optimizing Color Image for more information on how to decide.

2D data before 3D data

If you, for example, perform segmentation in 2D and only then determine your picking pose from the 3D data, you need the 2D data before the 3D data. The fastest way to get 2D data is a separate 2D capture. Hence, if you need 2D data before 3D data, you should perform a separate 2D capture.

Tip

When you capture 2D separately, you should disable RGB in the 3D capture. This saves both acquisition and processing time. Disable RGB in the 3D capture by setting Sampling::Color to disabled.

Warning

Zivid 2+ has a 4x4 subsampling mode (see Monochrome Capture). When this mode is used, there is a 35 ms switching penalty between 2D and 3D capture. The penalty applies only if the captures happen right after each other.
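When budgeting cycle time, the penalty from the warning can be added explicitly. The sketch below is a hypothetical helper with made-up 2D and 3D durations; only the 35 ms figure and the "right after each other" condition come from the warning.

```python
# Switching penalty (ms) between 2D and 3D capture on Zivid 2+ when the
# 4x4 subsampling mode is used, as stated in the warning above.
SWITCHING_PENALTY_MS = 35


def estimated_cycle_ms(capture_2d_ms, capture_3d_ms, uses_4x4_mode, back_to_back):
    """Hypothetical helper: sum of the 2D and 3D capture times, plus the
    switching penalty when the 4x4 mode is used AND the captures run
    immediately after each other."""
    total = capture_2d_ms + capture_3d_ms
    if uses_4x4_mode and back_to_back:
        total += SWITCHING_PENALTY_MS
    return total


# Example with made-up capture durations
print(estimated_cycle_ms(50, 150, uses_4x4_mode=True, back_to_back=True))   # 235
print(estimated_cycle_ms(50, 150, uses_4x4_mode=True, back_to_back=False))  # 200
```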

The following code sample shows how you can:

  1. Capture 2D

  2. Use 2D data and capture 3D in parallel


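// Capture 2D first so that the 2D data is available as early as possible
const auto frame2dAndCaptureTime = captureAndMeasure<Zivid::Frame2D>(camera, settings2D);
// Use the 2D frame on a separate thread (requires <future>) while the 3D capture runs
std::future<Duration> userThread =
    std::async(std::launch::async, useFrame<Zivid::Frame2D>, std::ref(frame2dAndCaptureTime.frame));
// Capture and use the 3D frame on the main thread
const auto frameAndCaptureTime = captureAndMeasure<Zivid::Frame>(camera, settings);
const auto processTime = useFrame(frameAndCaptureTime.frame);
// Wait for the 2D processing to finish
const auto processTime2D = userThread.get();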
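The same overlap can be tried without a camera. The sketch below uses Python's concurrent.futures with hypothetical stand-in functions (capture_2d, capture_3d and use_frame are ours; time.sleep only simulates work) to show that using the 2D frame overlaps with the 3D capture, so the cycle takes roughly the 2D capture time plus the longer of the two branches instead of the sum of all steps.

```python
import time
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for camera captures and user processing;
# time.sleep() only simulates how long each step takes.
def capture_2d():
    time.sleep(0.05)
    return "frame2d"


def capture_3d():
    time.sleep(0.10)
    return "frame3d"


def use_frame(frame):
    time.sleep(0.10)
    return frame


def run_cycle():
    start = time.perf_counter()
    frame_2d = capture_2d()  # 1. Capture 2D first
    with ThreadPoolExecutor(max_workers=1) as pool:
        future_2d = pool.submit(use_frame, frame_2d)  # 2a. Use 2D data in a worker thread
        frame_3d = capture_3d()  # 2b. Meanwhile, capture 3D on this thread
        result_3d = use_frame(frame_3d)
        result_2d = future_2d.result()  # Wait for the 2D work to finish
    return result_2d, result_3d, time.perf_counter() - start


result_2d, result_3d, elapsed = run_cycle()
# Running every step sequentially would take about 0.35 s; with the overlap,
# the cycle takes about 0.05 + max(0.10, 0.10 + 0.10) = 0.25 s.
print(f"Cycle time: {elapsed:.2f} s")
```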

The following shows actual benchmark numbers. You will find a more extensive table at the bottom of the page.

2D data as part of how I use 3D data

In this case, we do not need access to the 2D data before the 3D data. You always get 2D data as part of a 3D acquisition. Thus, we only have to consider overall speed and quality.

Speed

For optimal speed, we simply rely on 3D acquisitions to provide good 2D data. There is no additional acquisition or separate capture for 2D data.

2D Quality

For optimal 2D quality, it is recommended to use a separate acquisition for 2D.

The following table shows what you can expect from the different configurations. At the end of the page you will find a table showing actual measurements on different hardware.

Fast: Use 2D data from the 3D capture. No special acquisition or settings for 2D.

Best: Separate 2D capture followed by 3D capture.

Platform used: 11th Gen Intel(R) Core(TM) i9-11900K @ 3.50GHz with NVIDIA GeForce RTX 3070

Camera     Data    Fast (3D [2])   Best (2D + 3D)
Zivid 2    2D      N/A             ~25 ms
           3D      ~295 ms         ~290 ms
           2D+3D   ~295 ms         ~300 ms
Zivid 2+   2D      N/A             ~55 ms
           3D      ~170 ms         ~220 ms
           2D+3D   ~170 ms         ~260 ms
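The cost of the separate 2D capture can be read off the table by subtracting the Fast 2D+3D time from the Best 2D+3D time. The snippet below only does that arithmetic on the approximate numbers above; the dictionary layout and the extra_ms helper are ours.

```python
# Approximate 2D+3D cycle times (ms) from the table above
cycle_ms = {
    "Zivid 2": {"fast": 295, "best": 300},
    "Zivid 2+": {"fast": 170, "best": 260},
}


def extra_ms(camera):
    """Extra time of the Best strategy (separate 2D capture) over the Fast one."""
    times = cycle_ms[camera]
    return times["best"] - times["fast"]


for camera, times in cycle_ms.items():
    extra = extra_ms(camera)
    ratio = extra / times["fast"]
    print(f"{camera}: separate 2D capture adds ~{extra} ms ({ratio:.0%})")
```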

The following shows actual benchmark numbers. You will find a more extensive table at the bottom of the page.

2D data after I have used the 3D data

You always get 2D data as part of a 3D acquisition. The table below shows 3D capture time examples.

However, optimizing for 3D quality does not always optimize for 2D quality. Thus, it might be a good idea to have a separate 2D capture after the 3D capture.

The following shows actual benchmark numbers. You will find a more extensive table at the bottom of the page.

Camera resolution and 1-to-1 mapping

For accurate 2D segmentation and detection, a high-resolution color image is beneficial. Zivid 2+ has a 5 MPx imaging sensor, while Zivid 2 has a 2.3 MPx sensor. The following tables show the output resolutions of the different cameras for both 2D and 3D captures.

2D capture resolutions

2D capture        Zivid 2         Zivid 2+
Full resolution   1944 x 1200     2448 x 2048
2x2 subsampled    972 x 600       1224 x 1024
4x4 subsampled    Not available   612 x 512

3D capture resolutions

3D capture            Zivid 2         Zivid 2+
Full resolution [1]   1944 x 1200     2448 x 2048
2x2 subsampled [1]    972 x 600       1224 x 1024
4x4 subsampled [1]    Not available   612 x 512
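Both tables follow from dividing the full sensor resolution by the subsampling factor. A small sketch, assuming the factors listed above (1 for full resolution, 2 and 4 for subsampling, with 4x4 available only on Zivid 2+); the names are ours:

```python
# Full sensor resolutions (width, height) from the tables above
FULL_RESOLUTION = {"Zivid 2": (1944, 1200), "Zivid 2+": (2448, 2048)}
# Available factors per camera: 1 = full resolution, 4x4 is Zivid 2+ only
FACTORS = {"Zivid 2": (1, 2), "Zivid 2+": (1, 2, 4)}


def output_resolution(camera, factor):
    """Width and height after factor x factor subsampling."""
    if factor not in FACTORS[camera]:
        raise ValueError(f"{factor}x{factor} subsampling is not available on {camera}")
    width, height = FULL_RESOLUTION[camera]
    return width // factor, height // factor


for camera in FULL_RESOLUTION:
    for factor in FACTORS[camera]:
        print(camera, f"{factor}x{factor}:", output_resolution(camera, factor))
```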

The output resolution of both 2D and 3D captures is controlled via the combination of the Sampling::Pixel and Processing::Resampling settings; see pixel sampling and Resampling. This means that there may no longer be a 1-to-1 correspondence between a 2D pixel and a 3D point. Consequently, it is more challenging to extract the 3D data that corresponds to a segmented mask in the 2D image.

As mentioned, it is common to require high-resolution 2D data for segmentation and detection. For example, our recommended Consumer Goods preset, Z2+ M130 Quality, uses Sampling::Pixel set to blueSubsample2x2. In this case, we should either:

  • Upsample the 3D data to restore 1-to-1 correspondence, or

  • Map 2D indices to the indices in the subsampled 3D data.

Resampling

To match the resolution of the 2D capture, apply an upsampling that undoes the subsampling. This retains the speed advantages of the subsampled capture. For example:

C++

// 2D: full-resolution color image
auto settings2D = Zivid::Settings2D{
    Zivid::Settings2D::Acquisitions{ Zivid::Settings2D::Acquisition{} },
    Zivid::Settings2D::Sampling::Pixel::all,
};
// 3D: 2x2 subsampled acquisition without color, upsampled back to full resolution
auto settings = Zivid::Settings{
    Zivid::Settings::Engine::phase,
    Zivid::Settings::Acquisitions{ Zivid::Settings::Acquisition{} },
    Zivid::Settings::Sampling::Pixel::blueSubsample2x2,
    Zivid::Settings::Sampling::Color::disabled,
    Zivid::Settings::Processing::Resampling::Mode::upsample2x2,
};

Python

import zivid

# 2D: full-resolution color image
settings_2d = zivid.Settings2D()
settings_2d.acquisitions.append(zivid.Settings2D.Acquisition())
settings_2d.sampling.pixel = zivid.Settings2D.Sampling.Pixel.all
# 3D: 2x2 subsampled acquisition without color, upsampled back to full resolution
settings = zivid.Settings()
settings.engine = "phase"
settings.acquisitions.append(zivid.Settings.Acquisition())
settings.sampling.pixel = zivid.Settings.Sampling.Pixel.blueSubsample2x2
settings.sampling.color = zivid.Settings.Sampling.Color.disabled
settings.processing.resampling.mode = zivid.Settings.Processing.Resampling.Mode.upsample2x2
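Upsampling restores the 1-to-1 correspondence because the upsampled point grid has the same shape as the full-resolution 2D image, so a 2D pixel (r, c) indexes the point at (r, c). The toy sketch below illustrates only this shape relationship. It uses naive nearest-neighbor repetition as a stand-in; that is our assumption for illustration and not the algorithm behind upsample2x2.

```python
def upsample_2x2_nearest(grid):
    """Toy stand-in (NOT the SDK algorithm): repeat each value into a 2x2 block."""
    out = []
    for row in grid:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


# A 2x3 "subsampled point cloud" of z-values (made-up numbers)
sub = [[1.0, 2.0, 3.0],
       [4.0, 5.0, 6.0]]
up = upsample_2x2_nearest(sub)
print(len(up), len(up[0]))  # 4 6: same shape as a 4x6 full-resolution image
# A pixel (r, c) of a full-resolution 2D mask now indexes up[r][c] directly
print(up[3][5])  # 6.0
```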

For more details see Resampling.

Mapping pixel indices between different resolutions

The other option is to map the 2D indices to the indices in the subsampled 3D data. This option is a bit more complicated, but it is potentially more efficient. The point cloud can remain subsampled, and thus consume less memory and processing power.

To establish a correspondence between the full-resolution 2D image and the subsampled point cloud, a specific mapping technique is required. It involves extracting the RGB values at the pixels of the full-resolution image that correspond to the blue or red pixels of the Bayer grid.
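Conceptually, the mapping is an affine relation per axis: subsampled index i corresponds to full-resolution index i * stride + offset. A minimal sketch, assuming integer offsets and using function names of our own (not SDK API); the stride 2 and offset 0 in the example are assumed values for blue pixels on a 2x2 grid:

```python
def subsampled_to_full(index, stride, offset):
    """Full-resolution index of the pixel that subsampled index `index` came from."""
    return index * stride + int(offset)


def full_to_subsampled(index, stride, offset):
    """Subsampled index whose source pixel is `index`, or None if that
    full-resolution pixel was not sampled (e.g. a red or green pixel when
    sampling blue)."""
    shifted = index - int(offset)
    if shifted < 0 or shifted % stride != 0:
        return None
    return shifted // stride


# Assumed blueSubsample2x2-like parameters: stride 2, offset 0
print(subsampled_to_full(10, 2, 0))  # 20
print(full_to_subsampled(20, 2, 0))  # 10
print(full_to_subsampled(21, 2, 0))  # None
```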

Zivid::Experimental::Calibration::pixelMapping(camera, settings) returns the parameters (row and column stride and offset) required to perform this mapping. The following example uses this function.

C++

// Requires <iostream>, OpenCV (cv::Mat) and the Zivid SDK headers.
// Stride and offset that relate the full-resolution 2D image to the subsampled point cloud
const auto pixelMapping = Zivid::Experimental::Calibration::pixelMapping(camera, settings);
std::cout << "Pixel mapping: " << pixelMapping << std::endl;
const int rowStride = pixelMapping.rowStride();
const int colStride = pixelMapping.colStride();
const int rowOffset = static_cast<int>(pixelMapping.rowOffset());
const int colOffset = static_cast<int>(pixelMapping.colOffset());
// The mapped image has the same resolution as the subsampled point cloud
cv::Mat mappedBGR(fullResolutionBGR.rows / rowStride, fullResolutionBGR.cols / colStride, CV_8UC3);
std::cout << "Mapped width: " << mappedBGR.cols << ", height: " << mappedBGR.rows << std::endl;
// Each subsampled pixel (row, col) takes its color from the full-resolution
// pixel (row * rowStride + rowOffset, col * colStride + colOffset)
for(int row = 0; row < mappedBGR.rows; ++row)
{
    for(int col = 0; col < mappedBGR.cols; ++col)
    {
        mappedBGR.at<cv::Vec3b>(row, col) =
            fullResolutionBGR.at<cv::Vec3b>(row * rowStride + rowOffset, col * colStride + colOffset);
    }
}
return mappedBGR;
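Python

from zivid.experimental import calibration

# Stride and offset that relate the full-resolution 2D image to the subsampled point cloud
pixel_mapping = calibration.pixel_mapping(camera, settings)
# Keep every row_stride-th row and col_stride-th column, starting at the offsets,
# and drop the alpha channel (RGBA -> RGB)
return rgba[
    int(pixel_mapping.row_offset) :: pixel_mapping.row_stride,
    int(pixel_mapping.col_offset) :: pixel_mapping.col_stride,
    0:3,
]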
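A common use of this mapping is to collect the 3D points that belong to a segmentation mask computed on the full-resolution 2D image. The sketch below assumes the mask is a list of rows of booleans and uses a hypothetical helper name; it keeps only mask pixels that coincide with a sampled pixel and converts them to (row, col) indices into the subsampled point cloud.

```python
def mask_to_subsampled_indices(mask, row_stride, col_stride, row_offset, col_offset):
    """Return (row, col) indices into the subsampled point cloud for every
    sampled pixel that is set in the full-resolution mask.

    Assumes the image height and width are multiples of the strides."""
    row_offset, col_offset = int(row_offset), int(col_offset)
    indices = []
    # Visit only the full-resolution pixels that have a subsampled counterpart
    for row in range(row_offset, len(mask), row_stride):
        for col in range(col_offset, len(mask[0]), col_stride):
            if mask[row][col]:
                indices.append(((row - row_offset) // row_stride, (col - col_offset) // col_stride))
    return indices


# 4x4 mask with the bottom-right 2x2 block set (made-up example)
mask = [[False] * 4 for _ in range(4)]
for r in (2, 3):
    for c in (2, 3):
        mask[r][c] = True

print(mask_to_subsampled_indices(mask, 2, 2, 0, 0))  # [(1, 1)]
print(mask_to_subsampled_indices(mask, 2, 2, 1, 1))  # [(1, 1)]
```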

Details about the mapping (example for blueSubsample2x2)

To extract all the RGB values that correspond to the blue pixels, we use the (row, column) indices:

\[\begin{split}\begin{bmatrix} (0,0) & (0,2) & (0,4) & (0,...) & (0,W) \\ (2,0) & (2,2) & (2,4) & (2,...) & (2,W) \\ (4,0) & (4,2) & (4,4) & (4,...) & (4,W) \\ (...,0) & (...,2) & (...,4) & (...,...) & (...,...) \\ (H,0) & (H,2) & (H,4) & (H,...) & (H,W) \\ \end{bmatrix}\text{, where }\begin{matrix} W=width-2 \\ H=height-2 \end{matrix}\end{split}\]

To extract all the RGB values that correspond to the red pixels, we use the (row, column) indices:

\[\begin{split}\begin{bmatrix} (1,1) & (1,3) & (1,5) & (1,...) & (1,W) \\ (3,1) & (3,3) & (3,5) & (3,...) & (3,W) \\ (5,1) & (5,3) & (5,5) & (5,...) & (5,W) \\ (...,1) & (...,3) & (...,5) & (...,...) & (...,...) \\ (H,1) & (H,3) & (H,5) & (H,...) & (H,W) \\ \end{bmatrix}\text{, where }\begin{matrix} W=width-1 \\ H=height-1 \end{matrix}\end{split}\]
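Both index sets can be generated with strided ranges: blue starts at (0, 0) and red at (1, 1), each with a step of 2 along rows and columns, ending at (height - 2, width - 2) and (height - 1, width - 1) respectively for even image sizes. A short sketch (the helper name and the 4x4 size are just for illustration):

```python
def bayer_indices(height, width, start):
    """(row, col) indices on a step-2 grid starting at (start, start)."""
    return [(row, col) for row in range(start, height, 2) for col in range(start, width, 2)]


blue = bayer_indices(4, 4, start=0)
red = bayer_indices(4, 4, start=1)
print(blue)  # [(0, 0), (0, 2), (2, 0), (2, 2)]
print(red)   # [(1, 1), (1, 3), (3, 1), (3, 3)]
```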

Note

If you use camera intrinsics and your 2D and 3D captures have different resolutions, make sure you use the intrinsics that match the resolution of the data you are working with. See Camera Intrinsics for more information.
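To see why intrinsics depend on resolution, consider the pinhole model u = fx * X / Z + cx. If a subsampled pixel index u_s corresponds to the full-resolution index u = s * u_s + o (stride s and offset o, as in the pixel mapping), then u_s = (fx / s) * X / Z + (cx - o) / s. The sketch below applies this relation with made-up intrinsics; it is an illustrative assumption about conventions (pixel-center definitions may differ), so prefer the intrinsics the SDK provides for your settings.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    fx: float
    fy: float
    cx: float
    cy: float


def rescale_for_subsampling(k, row_stride, col_stride, row_offset, col_offset):
    """Pinhole intrinsics for a subsampled grid, ASSUMING that
    full-resolution index = stride * subsampled index + offset on each axis."""
    return Intrinsics(
        fx=k.fx / col_stride,
        fy=k.fy / row_stride,
        cx=(k.cx - col_offset) / col_stride,
        cy=(k.cy - row_offset) / row_stride,
    )


def project(k, x, y, z):
    """Project a 3D point (camera frame) to (u, v) pixel coordinates."""
    return k.fx * x / z + k.cx, k.fy * y / z + k.cy


full = Intrinsics(fx=1000.0, fy=1000.0, cx=972.0, cy=600.0)  # made-up values
sub = rescale_for_subsampling(full, 2, 2, 0, 0)
u_full, v_full = project(full, 0.1, 0.05, 1.0)
u_sub, v_sub = project(sub, 0.1, 0.05, 1.0)
# Projecting with the rescaled intrinsics and mapping back gives the same pixel
print(u_full, 2 * u_sub + 0)  # 1072.0 1072.0
```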

Summary

Our recommendation:
  • 2D capture with full resolution

  • 3D monochrome capture with subsampled resolution

Note

Subsampling or downsampling in user code is only necessary if you need a 1-to-1 pixel correspondence between 2D and 3D data that were captured and copied at different resolutions.

The following tables list the different 2D+3D capture configurations and show how they are expected to perform relative to each other with respect to speed and quality.

Capture cycle          Speed (Zivid 2)   Speed (Zivid 2+)   2D quality
                       Faster            Fast               Best
3D ➞ 2D / 2D ➞ 3D      Fast              Fast               Best
3D (w/RGB enabled)     Fastest           Fastest            Good

The following table shows actual measurements on different hardware. For the 3D capture we use the Fast Consumer Goods settings:

Zivid 2+ (Z2+ M130 Fast)

Zivid 2 (Z2 M70 Fast)

Tip

To test different 2D-3D strategies on your PC, you can run the ZividBenchmark.cpp sample with settings loaded from YML files. Go to Samples and select C++ for instructions.

Version History

SDK      Changes
2.12.0   Acquisition time is reduced by up to 50% for 2D captures and up to 5% for 3D captures for Zivid 2+. Zivid One+ has reached its End-of-Life and is no longer supported; thus, most of the complexities related to 2D+3D captures are no longer applicable.
2.11.0   Zivid 2 and Zivid 2+ now support concurrent processing and acquisition for 3D ➞ 2D and 2D ➞ 3D, and switching between capture modes has been optimized.