How To Use a Stereo Camera for Object Detection and Measurement — BMW No Code AI

Rami Naffah
9 min read · Jun 1, 2021

As humans grow up, their ability to interpret images improves until they can label and classify the objects they see. In parallel, machines are being developed to imitate that ability using sophisticated technologies such as object detection. This technique is one of the most widely used in artificial intelligence (AI), and especially in computer vision (CV), for identifying objects and placing them into different categories, e.g. cars, humans, cats, dogs, etc. However, the efficiency of any computer vision algorithm relies on the quality of the input data, or in other terms, on the quality of the images and the camera. In this article, the "Hemistereo NX" camera by 3D Vision Labs is adopted thanks to its various features, which are detailed in the following sections. In addition, an associated inference API to detect objects and measure them in terms of width, height and depth is presented.

1. Hemistereo NX 180 X Camera

Before going through the project and its inference API, let us take a look at the different features of the camera in question: Hemistereo NX (displayed in Figure 1).

Figure 1: Hemistereo NX 180 X Camera

1.1. Edge AI

The Hemistereo camera has powerful integrated AI capabilities, e.g. real-time mapping, 3D reconstruction and AI inference tasks. Applications can therefore run on the device itself, without external computing: no cloud is needed.

1.2. Wide Field of View

Hemistereo technology enables stereo vision [1], one of the most important and well-known spatial measurement methods. Stereo vision combines two, or sometimes three, cameras with perspective-projecting characteristics and associates their images to recover 3D depth information, also known as a point cloud [2]. Hemistereo additionally uses non-rectilinear fisheye lenses, which allow depth sensing over opening angles of 180 degrees. This overcomes the problem of a limited field of view.

1.3. Ultra-High Resolution

The camera consists of two Sony IMX477 12.48-megapixel CMOS sensors, which create highly detailed images of the scene. Using two sensors allows the camera to calculate a depth image via stereo matching [3]: stereo matching, also known as disparity estimation, finds corresponding pixels in the left and right images of a stereo pair; the disparity is the distance between two such corresponding pixels, from which depth can be computed. Depending on the image settings, the depth map is calculated at a resolution of up to 12 megapixels in real time. The depth map is normally rendered as an RGB image in which each color refers to a distance in the scene.

An RGB depth image captured in real time
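
To make the relation between disparity and depth concrete, below is a minimal Python sketch of the classic rectified pinhole-stereo formula. It is purely illustrative: the HemiStereo fisheye projection model differs from the pinhole model, and the focal length and baseline values used here are hypothetical.

    def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
        # disparity_px: offset between corresponding left/right pixels, in pixels
        # focal_px:     focal length expressed in pixels
        # baseline_m:   distance between the two camera centers, in meters
        return focal_px * baseline_m / disparity_px

    # Hypothetical values: a larger disparity means a closer object.
    print(depth_from_disparity(disparity_px=64, focal_px=1400, baseline_m=0.1))  # ~2.19 m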

1.4. Made for Developers

The Hemistereo NX camera operates on a Linux distribution, the Linux4Tegra OS. It gives developers full root access to implement, deploy and run their own applications on the device. It also has flexible Input/Output (IO) and an easy-to-use Software Development Kit. Moreover, it ships with device drivers and Docker as its container engine.

Note: System Status

The Hemistereo NX software is shipped as Docker images. Updating or restarting these images and their containers might therefore not always work as expected. In such situations, the following commands help troubleshoot by starting or stopping the faulty containers.

As an important notice, while the calibration-tool image is running, the stereo-backend image should be stopped so that the camera does not crash.

The Hemistereo apps and their status can be listed using these commands:

  • sudo hemistereo_apps_list
  • sudo hemistereo_apps_status

The following commands are used to enable, disable, run and stop apps:

  • sudo hemistereo_app_run
  • sudo hemistereo_app_stop
  • sudo hemistereo_app_enable
  • sudo hemistereo_app_disable

More commands can be found in the 3D Vision Labs documentation [4].

2. Developed Inference API

The Hemistereo camera by 3D Vision Labs is a rich device for object detection. To make the device and its features, e.g. object detection and measurement, easier to use, an inference API was developed on top of 3D Vision Labs' Python modules [5], such as imgconvert.py and context.py [6].

This section covers the installation process for the inference API, the list of its endpoints and a detailed explanation of each.

2.1. Installation

A detailed README.md file explains how to install all the required prerequisites, as well as how to build and run the Docker container. The file can be found in the following GitHub repository:

https://github.com/BMW-InnovationLab/BMW-HemiStereo-API.git

2.2. API Endpoints

To list all available endpoints, open your favorite browser and navigate to:

http://<hemistereo_camera_IP>:<docker_host_port>/docs

There are 7 different endpoints:

List of Hemistereo inference API’s endpoints

/set_camera_settings

This endpoint allows the user to set the vertical and horizontal field of view of the camera in degrees. The field of view is the angle that is observed by the sensor.

Input:

  • cam_ip: IP of the camera the user is using.
  • vertical_fov: Vertical field of view of the sensor in degrees.
  • horizontal_fov: Horizontal field of view of the sensor in degrees.

Output:

  • Returns null; the new settings are applied directly on the camera.
/set_camera_settings request screenshot
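
As an illustration, the endpoint could be called from Python as in the following sketch. It assumes the endpoint accepts a POST request with query parameters, as suggested by the interactive /docs page; the IP address, port and field-of-view values are placeholders.

    import requests

    BASE_URL = "http://192.168.1.50:8080"  # placeholder camera IP and docker host port

    response = requests.post(
        f"{BASE_URL}/set_camera_settings",
        params={
            "cam_ip": "192.168.1.50",  # placeholder camera IP
            "vertical_fov": 90,        # vertical field of view in degrees
            "horizontal_fov": 120,     # horizontal field of view in degrees
        },
    )
    response.raise_for_status()  # the endpoint returns null on success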

/single_shot

This endpoint returns a picture captured by the camera. It is a raw image, hence no objects are labeled yet. Moreover, this endpoint saves the image under a name that is a datetime stamp and serializes the depth map of the image as a pickle file, so that existing objects can be labeled and measured later on (see /detect_input).

Input:

  • cam_ip: IP of the camera the user is using.

Output:

  • Returns the image captured by the camera.
/single_shot request and response screenshot

Note: A watcher.py module listens to the directory in which the images are saved. It runs as long as the inference API is running and detects any change made in the directory: if an image is deleted, its corresponding pickle file is automatically removed.
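
A similar sketch for /single_shot, saving the returned raw picture to disk (same assumptions and placeholders as above):

    import requests

    BASE_URL = "http://192.168.1.50:8080"  # placeholder

    response = requests.post(f"{BASE_URL}/single_shot", params={"cam_ip": "192.168.1.50"})
    response.raise_for_status()

    # The response body is the raw, unlabeled image; keep it for /detect_input later.
    with open("single_shot.jpg", "wb") as f:
        f.write(response.content)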

/single_shot/distance_map

This endpoint returns and saves the raw image with its distance map in a raw_images folder in the project’s directory.

Input:

  • cam_ip: IP of the camera that the user is using.

Output:

  • Returns the depth map and the picture captured by the camera, combined in one single image.
/single_shot/distance_map request and response screenshot

/set_threshold

This endpoint allows the user to calibrate the camera's textureness threshold. The textureness threshold filters out the noise that can appear when, for example, an object is detected on a smooth white surface. The threshold value should also be re-adjusted whenever the environment, light intensity or distance changes.

Input:

  • cam_ip: IP of the camera that the user is using.
  • model: Previously trained model needed to detect an object in the image.
  • server: Server that contains the model. Note that the URL must end with a "/".

Output:

  • Returns the predicted threshold value and sets it in the camera settings.
/set_threshold request and response screenshot
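
A sketch of a /set_threshold call follows; the model name and inference-server URL below are hypothetical, and the server URL ends with the required "/".

    import requests

    BASE_URL = "http://192.168.1.50:8080"  # placeholder

    response = requests.post(
        f"{BASE_URL}/set_threshold",
        params={
            "cam_ip": "192.168.1.50",
            "model": "my_model",                    # hypothetical model name
            "server": "http://192.168.1.60:4343/",  # hypothetical server; must end with "/"
        },
    )
    print(response.json())  # the predicted threshold, which is also applied to the camera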

/detect

This endpoint performs object detection and labeling on images captured by the camera, detecting the specific objects supported by the trained model it is using.

Input:

  • cam_ip: IP of the camera that the user is using.
  • model: Previously trained model needed to detect an object in the image.
  • server: Server that contains the model. Note that the URL must end with a "/".
  • vertical_fov: Vertical field of view of the sensor in degrees.
  • horizontal_fov: Horizontal field of view of the sensor in degrees.

Output:

  • It returns bounding boxes, distance and dimensions: width, depth and height. Note that the dimensions could be somewhat inaccurate if the model used is not well trained (e.g. in the case of overly wide bounding boxes).
/detect response screenshot
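
A sketch of a /detect call and of iterating over the returned measurements. The exact JSON layout is an assumption made for illustration; consult the /docs page for the real response schema.

    import requests

    BASE_URL = "http://192.168.1.50:8080"  # placeholder

    response = requests.post(
        f"{BASE_URL}/detect",
        params={
            "cam_ip": "192.168.1.50",
            "model": "my_model",                    # hypothetical model name
            "server": "http://192.168.1.60:4343/",  # hypothetical server; must end with "/"
            "vertical_fov": 90,
            "horizontal_fov": 120,
        },
    )
    for detection in response.json():  # assumed: one entry per detected object
        print(detection)               # bounding box, distance, width/depth/height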

/detect_input

This endpoint allows the user to attach a previously saved raw image (not labeled, from /single_shot) in order to detect, label and measure the objects in it. The endpoint uses the depth map that was serialized into the image's corresponding pickle file when the image was saved.

Input:

  • image: Image the user wants to label.
  • model: Previously trained model needed to detect an object in the image.
  • server: Server that contains the model. Note that the URL must end with a "/".
  • vertical_fov: Vertical field of view of the sensor in degrees.
  • horizontal_fov: Horizontal field of view of the sensor in degrees.

Output:

  • It returns bounding boxes, distance and dimensions: width, depth and height. Note that the dimensions could be somewhat inaccurate if the model used is not well trained (e.g. in the case of overly wide bounding boxes).
/detect_input request and response screenshot
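
Because /detect_input takes an image file, it would typically be called with a multipart upload, as in this sketch (the multipart field name "image" follows the parameter list above; the other values are hypothetical):

    import requests

    BASE_URL = "http://192.168.1.50:8080"  # placeholder

    with open("single_shot.jpg", "rb") as f:  # an image previously saved via /single_shot
        response = requests.post(
            f"{BASE_URL}/detect_input",
            params={
                "model": "my_model",                    # hypothetical model name
                "server": "http://192.168.1.60:4343/",  # hypothetical server; must end with "/"
                "vertical_fov": 90,
                "horizontal_fov": 120,
            },
            files={"image": f},  # upload the raw image as multipart form data
        )
    print(response.json())  # bounding boxes, distance and dimensions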

/detect_save_image

This endpoint allows the user to run detection and save the labeled image.

Labeled images are saved in a labeled_images folder in the project's /src directory.

Input:

  • image: Image the user wants to label.
  • model: Previously trained model needed to detect an object in the image.
  • server: Server that contains the model. Note that the URL must end with a “/“.
  • vertical_fov: Vertical field of view of the sensor in degrees.
  • horizontal_fov: Horizontal field of view of the sensor in degrees.

Output:

  • It returns the labeled image that the user wants to save.
/detect_save_image response screenshot

3. Hemistereo Viewer Software

Although there are methods and endpoints implemented to change camera parameters, a change of environment can invalidate those calibrations; therefore, manual intervention is sometimes needed.

The Hemistereo Viewer software, available at [7], can be used to change more camera parameters. This section explains it in more detail.

3.1. Maximum Disparities

The camera does not give accurate values when an object is very close to it. In this case, the maximum disparities setting can be increased to 256 to solve the issue. However, the frame rate will drop; it is therefore recommended to avoid changing the maximum disparities value if the camera is used for streaming purposes.

3.2. Field of View

If the user needs to change the field of view of the camera in the viewer, they can do so using the slide bar provided by the software.

It is advised to give the matching-resolution field of view the same values as in the target camera settings.

3.3. Textureness Filter Settings

If the camera is observing a smooth surface on which light can affect the depth map, the textureness settings should be modified.

The threshold value is typically adjusted to filter out junk values in the depth map caused by light reflection or other factors.

As a rule of thumb, the farther the object is from the camera, the higher the threshold value should be set.

4. Conclusion

This article introduced the Hemistereo NX 180 X camera by citing and explaining its main features and their usefulness. It showed how easy it is to implement an API thanks to the many Python modules provided by the camera's designers, 3D Vision Labs.

Moreover, this article provided a detailed explanation of how to use our inference API to label an object and measure its dimensions using BMW's servers.

References

[1] 3dvisionlabs. (2021). HemiStereo NX 1.0 documentation. https://docs.3dvisionlabs.com/hs-nx/hemistereo/principle.html

[2] 3dvisionlabs. (2021). HemiStereo NX 1.0 documentation. https://docs.3dvisionlabs.com/hs-nx/hemistereo/principle.html#point-cloud

[3] Wikimedia Foundation. (2021, January 18). Computer stereo vision. Wikipedia. https://en.wikipedia.org/wiki/Computer_stereo_vision

[4] 3dvisionlabs. (2021). Setup — HemiStereo NX 1.0 documentation. https://docs.3dvisionlabs.com/hs-nx/setup/setup.html

[5] 3dvisionlabs. (2021). Python — Getting Started — HemiStereo NX 1.0 documentation. https://docs.3dvisionlabs.com/hs-nx/software/python_getting_started.html

[6] 3dvisionlabs. (2021). HemiStereo NX 1.0 documentation. https://docs.3dvisionlabs.com/hs-nx/index.html

[7] 3dvisionlabs. (n.d.). HemiStereo Viewer — HemiStereo NX 1.0 documentation. https://docs.3dvisionlabs.com/hs-nx/software/viewer.html
