
Sensor Fusion


Perception, localisation, planning, and control are all necessary for the success of any autonomous vehicle. Autonomous vehicles use a suite of sensors to collect data from their environment in order to detect possible obstacles (perception), identify the location of objects with centimetre-level accuracy (localisation), decide on high-level routes to traverse (planning), and determine the correct movement of the steering wheel (control).

Two of these sensors are the RGB camera, which provides 2D position and colour information, and the LiDAR, which outputs 3D position and depth information about the environment. Because both sensors can capture attributes of the environment directly and simultaneously, integrating those attributes with an efficient fusion approach greatly benefits reliable and consistent perception of the environment.

Stereo Vision

Using stereo vision cameras to construct a 3D view of an environment can be seen as an alternative to fusing camera and LiDAR data. However, processing stereo vision data is more computationally expensive and more susceptible to estimation errors.
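One way to see why stereo depth is sensitive to estimation errors is the standard relation for a rectified stereo pair, Z = f·B/d, where f is the focal length in pixels, B is the baseline between the two cameras and d is the disparity. The sketch below is only illustrative: the function name and the rig parameters (700 px focal length, 0.5 m baseline) are assumptions, not values from this article.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in metres for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Hypothetical rig: 700-pixel focal length, 0.5 m baseline (assumed values).
f, B = 700.0, 0.5
for d in (35.0, 7.0):
    z = depth_from_disparity(f, B, d)
    # Depth change if the matcher underestimates the disparity by one pixel.
    z_err = depth_from_disparity(f, B, d - 1.0) - z
    print(f"disparity {d:4.1f} px: depth {z:5.1f} m, "
          f"{z_err:.2f} m further if disparity is off by 1 px")
```

Because depth is inversely proportional to disparity, the same one-pixel matching error costs far more at long range than at short range.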

An RGB camera output and a 3D LiDAR output

Sensor Fusion Technology

To fuse sensory data from multiple sources, it is necessary to find the relative positions and orientations of all the sensors used. As sensor fusion technology is applied in various fields, calibration between sensors has become increasingly important, especially when fusing camera and LiDAR data in autonomous vehicles. To match distinctive features in the LiDAR point cloud with those in the camera image, an accurate correspondence between the two sensors must be determined. The fusion process should exploit each sensor’s information to perceive the surrounding environment: the result of fusion should be the union of the sensors’ information rather than a redundant combination of it.

To successfully fuse data from two sensors, the sensors should first share a common visible point, which makes it possible to calibrate or register data from one sensor to the other. Secondly, the sensors must be synchronised, or at least use a common time reference, so that common events can be identified. For example, fusing data from an RGB camera and a depth camera that have the same resolution and capture at the same time from the same position is simply a matter of pixel-by-pixel matching.
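When two sensors share a time reference but run at different rates, one simple way to identify common events is to pair each camera frame with the nearest LiDAR sweep in time. The following is a minimal sketch of that idea; the function name, the sample rates and the `max_gap` tolerance are illustrative assumptions.

```python
from bisect import bisect_left


def nearest_match(camera_times, lidar_times, max_gap):
    """Pair each camera timestamp with the nearest LiDAR timestamp.

    Both lists hold seconds on a shared clock; lidar_times must be sorted.
    Pairs that are further apart than max_gap are dropped.
    """
    pairs = []
    for t in camera_times:
        i = bisect_left(lidar_times, t)
        # Only the neighbours on either side of t can be the nearest.
        candidates = lidar_times[max(i - 1, 0):i + 1]
        if not candidates:
            continue
        best = min(candidates, key=lambda s: abs(s - t))
        if abs(best - t) <= max_gap:
            pairs.append((t, best))
    return pairs


camera = [0.00, 0.04, 0.08, 0.12]  # hypothetical 25 Hz camera
lidar = [0.01, 0.11, 0.21]         # hypothetical 10 Hz LiDAR
pairs = nearest_match(camera, lidar, max_gap=0.02)
print(pairs)
```

Camera frames with no LiDAR sweep within the tolerance are simply left unpaired here; a real system might interpolate instead.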

RGB camera and depth camera example
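Under the conditions just described (same resolution, same time, same position), pixel-by-pixel matching amounts to stacking the depth map as a fourth channel on the colour image. A minimal NumPy sketch, with a hypothetical `fuse_rgbd` helper and stand-in image data:

```python
import numpy as np


def fuse_rgbd(rgb, depth):
    """Stack an (H, W, 3) colour image and an (H, W) depth map into (H, W, 4)."""
    if rgb.shape[:2] != depth.shape:
        raise ValueError("images must share the same resolution")
    return np.concatenate([rgb, depth[..., np.newaxis]], axis=-1)


height, width = 4, 6
rgb = np.zeros((height, width, 3), dtype=np.float32)  # stand-in colour image
depth = np.ones((height, width), dtype=np.float32)    # stand-in depth map (metres)

rgbd = fuse_rgbd(rgb, depth)
print(rgbd.shape)
```

Each pixel of `rgbd` then carries its colour and its depth together.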

Unfortunately, this is not the case for an RGB camera and a LiDAR, as there is no direct pixel-to-pixel matching between the 2D and 3D data. Camera-based object detection and classification techniques are very robust and efficient, as seen previously; however, it is difficult to obtain the accurate depth of detected objects from a camera alone. In contrast, the LiDAR sensor makes it possible to estimate the depth of objects efficiently but struggles to classify small and distant objects. Four sensor fusion strategies that can be used to combine data from sensors like the RGB camera and LiDAR are:

  1. Early fusion, which involves the direct combination of the raw sensor data from the two sensors before any prediction about the environment is made. The combined data can be fed into a deep convolutional neural network to predict the objects in the environment.
  2. Late fusion, which involves combining predictions made independently from each sensor. For this approach, one can have two separate deep convolutional networks, each taking its input from one of the sensors, with their predictions combined to robustly identify the objects in the environment.
  3. Mid-level fusion, which involves building some intermediate representation and using it to train a deep convolutional neural network. This strategy can be seen as a hybrid between early and late fusion; however, it is generally more difficult to train because it has many parameters.
  4. Sequential or progressive fusion, which involves using data from the two sensors one after the other to build an understanding of the environment. For example, to combine RGB camera and radar data, one can use the radar sensor to detect a moving object and then point the camera in that direction to identify the exact object.
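Of the strategies listed above, late fusion is the easiest to sketch in a few lines. The example below assumes each sensor's network has already produced class probabilities for the same detected object, and combines them with a weighted average. The averaging rule, the class list and the probability values are illustrative assumptions; other combination rules are equally possible.

```python
import numpy as np

CLASSES = ["car", "pedestrian", "cyclist"]  # hypothetical label set


def late_fusion(camera_probs, lidar_probs, camera_weight=0.5):
    """Combine per-class probabilities from two independent models.

    Uses a weighted average (an assumed rule), renormalised to sum to 1.
    """
    camera_probs = np.asarray(camera_probs, dtype=float)
    lidar_probs = np.asarray(lidar_probs, dtype=float)
    fused = camera_weight * camera_probs + (1.0 - camera_weight) * lidar_probs
    return fused / fused.sum()


# Stand-in outputs of a camera network and a LiDAR network for one object.
camera_probs = [0.45, 0.50, 0.05]
lidar_probs = [0.70, 0.20, 0.10]

fused = late_fusion(camera_probs, lidar_probs)
label = CLASSES[int(np.argmax(fused))]
print(label, fused)
```

Here the camera alone slightly favours "pedestrian", but the combined evidence favours "car", which illustrates how two independent predictions can correct each other.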

The following assumes that the camera and LiDAR data are synchronised and taken from the same viewpoint.

3D LiDAR image and 2D camera image

To fuse a 2D RGB image with 3D LiDAR data, one can project each LiDAR point \(p\) onto the RGB image using \(\alpha[u, v, 1]^T = K(Rp + t)\), where \((u, v)\) is the pixel coordinate of the 3D point in the 2D image, \(\alpha\) is a scale factor (the depth of the point along the camera’s optical axis), \(K\) is the intrinsic calibration matrix of the camera, and \(R\) and \(t\) are the rotation matrix and translation vector that transform the 3D point from the LiDAR’s coordinate frame to the camera’s coordinate frame. The LiDAR point \(p\) is given by:

LiDAR point equation

where \(r\) is the range, \(\theta\) is the azimuth angle, and \(\Phi\) is the elevation angle of the laser that generated the return. A mapping from the LiDAR image to the RGB image can thus be obtained and subsequently used to copy features from the RGB image into the LiDAR image.
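The projection can be sketched in NumPy as follows. Several things here are assumptions rather than part of the article: the spherical-to-Cartesian axis convention (x forward, y left, z up in the LiDAR frame), the function names, and the example values of \(K\), \(R\) and \(t\). The projection step itself applies \(\alpha[u, v, 1]^T = K(Rp + t)\) as stated.

```python
import numpy as np


def spherical_to_cartesian(r, azimuth, elevation):
    """Convert LiDAR returns to (N, 3) points.

    ASSUMED convention: x forward, y left, z up; angles in radians.
    """
    x = r * np.cos(elevation) * np.cos(azimuth)
    y = r * np.cos(elevation) * np.sin(azimuth)
    z = r * np.sin(elevation)
    return np.stack([x, y, z], axis=-1)


def project_to_image(points, K, R, t):
    """Apply alpha * [u, v, 1]^T = K (R p + t) to (N, 3) LiDAR points.

    Returns pixel coordinates and depths for points in front of the camera,
    plus the boolean mask saying which input points were kept.
    """
    cam = points @ R.T + t   # LiDAR frame -> camera frame
    uvw = cam @ K.T          # camera frame -> homogeneous pixel coordinates
    alpha = uvw[:, 2]
    in_front = alpha > 0     # points behind the camera cannot be imaged
    uv = uvw[in_front, :2] / alpha[in_front, np.newaxis]
    return uv, alpha[in_front], in_front


# Hypothetical calibration: 500 px focal length, principal point (320, 240).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
# Rotation from the assumed LiDAR axes to camera axes (x right, y down, z forward).
R = np.array([[0.0, -1.0, 0.0],
              [0.0, 0.0, -1.0],
              [1.0, 0.0, 0.0]])
t = np.zeros(3)  # same viewpoint, so no translation

r = np.array([10.0, 10.0])
azimuth = np.array([0.0, 0.1])
elevation = np.array([0.0, 0.0])

points = spherical_to_cartesian(r, azimuth, elevation)
uv, alpha, kept = project_to_image(points, K, R, t)
print(uv, alpha)
```

With the resulting pixel coordinates, each LiDAR point can look up the colour (or any other image feature) at `(u, v)`, provided the coordinates fall inside the image bounds.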

© University of York
This article is from the free online

Intelligent Systems: An Introduction to Deep Learning and Autonomous Systems

Created by
FutureLearn - Learning For Life
