Data analysis involves two primary tasks: registration and feature extraction.
Registration aims to estimate the transformation that brings different sets of data into one coordinate system. For registration, local descriptor-based image registration and the iterative closest point (ICP) method for point cloud registration have been widely used. Feature extraction computes an abstraction of the 3D imaging data that is relevant to a specific application.
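As an illustration of point cloud registration, the following is a minimal sketch of ICP-based alignment using the open-source Open3D library; the file names and the correspondence-distance threshold are hypothetical placeholders.

```python
import open3d as o3d
import numpy as np

# Load two overlapping scans (hypothetical file names).
source = o3d.io.read_point_cloud("scan_a.ply")
target = o3d.io.read_point_cloud("scan_b.ply")

# Initial guess for the rigid transformation (identity here;
# in practice a coarse alignment is usually supplied first).
init = np.eye(4)

# Point-to-point ICP: iteratively matches closest points and
# re-estimates the rigid transform that minimizes their distances.
result = o3d.pipelines.registration.registration_icp(
    source, target,
    0.05,   # max correspondence distance (assumed units: meters)
    init,
    o3d.pipelines.registration.TransformationEstimationPointToPoint())

print("Estimated 4x4 transformation:\n", result.transformation)
```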
In 3D imaging, a feature is a compact numerical description of the data, in the same sense as in machine learning or image processing; features are needed because downstream tasks such as recognition, matching, and segmentation operate on these descriptions rather than on the raw measurements. Extracted features may be edges, corners, or histograms of an attribute. Other examples relate to the properties of a region (regional point descriptors) or to the distribution of neighboring points around a surface normal (spin images). For feature extraction, while automatic feature extraction through convolutional neural networks is possible for images, point cloud processing still relies heavily on the aforementioned hand-crafted features, such as spin images74 and regional point descriptors75.
Many approaches have been proposed to address the 3D feature problem for point clouds, namely: 1) domain-specific features, 2) sensor fusion, 3) extracting features in 2D space, and 4) 3D learned features. An overview of each approach is provided below. Domain-specific features. Previous research focused on developing domain-specific hand-crafted features using local descriptors, such as the (Fast) Point Feature Histogram (F/PFH)76, or global descriptors, such as the (Fast) Viewpoint Feature Histogram (F/VFH)77. The PFH measures the differences between the normals of points in the vicinity of a point or a cluster of points; it is a powerful local descriptor that captures the geometric information of the surrounding points. The VFH is the PFH augmented with viewpoint information, capturing the geometric signature of a cluster of points as seen from a particular viewing direction. In practice, the VFH is often calculated at different resolution scales and the results concatenated to improve robustness to local variations. Because these descriptors encode the characteristic geometry of an object class, they are useful for identifying specific components, such as scaffolding51 and mechanical/electrical/plumbing (MEP) components78.
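As an illustration of such descriptors, the following is a minimal sketch that computes FPFH features with the open-source Open3D library; the file name and search radii are hypothetical values that would need tuning to the scan resolution.

```python
import open3d as o3d

# Load a point cloud (hypothetical file name).
pcd = o3d.io.read_point_cloud("component_scan.ply")

# FPFH is computed from point normals, so estimate them first.
pcd.estimate_normals(
    o3d.geometry.KDTreeSearchParamHybrid(radius=0.05, max_nn=30))

# Compute a 33-dimensional FPFH descriptor for every point,
# summarizing the angular relations between nearby normals.
fpfh = o3d.pipelines.registration.compute_fpfh_feature(
    pcd,
    o3d.geometry.KDTreeSearchParamHybrid(radius=0.25, max_nn=100))

print(fpfh.data.shape)  # (33, number_of_points)
```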
Sensor fusion to leverage 2D features. Sensor fusion registers the data captured by multiple sensors into the same coordinate system and makes it possible to take advantage of the specific data provided by each sensor. Typical sensor fusion approaches include fusing camera and LiDAR data, and fusing camera, LiDAR, and localization sensor data53,79. Through camera calibration applied to the fused point cloud and image data, it is possible to accurately project a 3D point cloud onto 2D images. An example scenario in which such a capability is leveraged is defect measurement and annotation.
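The projection itself follows the standard pinhole camera model; the sketch below, with assumed intrinsic and extrinsic calibration values, shows how calibrated fusion maps 3D points to pixel coordinates.

```python
import numpy as np

def project_points(points_3d, K, R, t):
    """Project Nx3 world points to Nx2 pixel coordinates
    using the pinhole model: x = K (R X + t)."""
    cam = points_3d @ R.T + t          # world frame -> camera frame
    uv = cam @ K.T                     # apply camera intrinsics
    return uv[:, :2] / uv[:, 2:3]      # perspective divide by depth

# Assumed calibration values, for illustration only.
K = np.array([[1000.0,    0.0, 640.0],   # focal lengths and
              [   0.0, 1000.0, 360.0],   # principal point in pixels
              [   0.0,    0.0,   1.0]])
R = np.eye(3)                            # camera aligned with world axes
t = np.array([0.0, 0.0, 2.0])            # camera offset along the view axis

points = np.array([[0.1, -0.2, 1.0], [0.0, 0.0, 1.5]])
print(project_points(points, K, R, t))
```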
The reason for fusing a 3D point cloud with 2D images is that the 3D point cloud, although more accurate from a measurement perspective, does not provide an easy-to-use visual interface for measurement and annotation. 2D images, on the other hand, are visually more convenient to handle but suffer from scaling and perspective issues when used for making measurements80. Fusing the visual and spatial information can therefore better support defect measurement and annotation. By utilizing localization sensors, such as GPS/IMU, and fusing that data with image and/or point cloud data, it is possible to capture and register imaging data in real time. This alleviates the need for the off-site processing otherwise required to obtain a unified 3D point cloud. Having such a unified 3D point cloud available in real time gives a sense of the completeness of data capture, which in turn can help minimize occlusions.
2D Conversion. Through voxelization and projection, a 3D point cloud can be converted to image-like representations, enabling the application of existing 2D feature extraction techniques such as SIFT81 and HOG82. Voxelization first divides the whole space into many subspaces called voxels. Each voxel is then labeled with a binary value (occupied/empty) or a quantitative value (number of points, density, or another attribute). After voxelization, the point cloud becomes ordered and dense (the representation is dense even though the values may still be sparse), so grid-based methods such as 3D Convolutional Neural Networks (CNNs)83 can be applied directly to voxelized point clouds. Another advantage of 2D conversion methods is that they allow transfer learning for 3D imaging.
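A binary occupancy voxelization can be sketched in a few lines of NumPy; the voxel size below is an assumed value and the input cloud is synthetic.

```python
import numpy as np

def voxelize(points, voxel_size=0.1):
    """Convert an Nx3 point cloud into a binary occupancy grid."""
    origin = points.min(axis=0)
    # Map each point to integer voxel indices.
    idx = np.floor((points - origin) / voxel_size).astype(int)
    dims = idx.max(axis=0) + 1
    grid = np.zeros(dims, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True  # mark occupied voxels
    return grid

points = np.random.rand(1000, 3)        # synthetic cloud in a unit cube
grid = voxelize(points, voxel_size=0.1)
print(grid.shape, grid.sum(), "occupied voxels")
```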
Transfer learning stores the knowledge learned from other applications and applies it to a similar problem. A typical example of transfer learning is to learn convolutional neural network parameters on a large image set such as ImageNet84 and then fine-tune the network for a specific application85. Since most deep learning methods are data-hungry, transfer learning can reduce the need for large amounts of data and improve the generalizability of 3D imaging techniques.
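As a sketch of this fine-tuning workflow, the snippet below loads ImageNet-pretrained weights with the torchvision library (recent versions) and replaces the classifier head; the number of target classes is a hypothetical example.

```python
import torch.nn as nn
import torchvision.models as models

# Load a ResNet-18 pretrained on ImageNet (the "stored knowledge").
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Optionally freeze the pretrained backbone so only the new head trains.
for param in model.parameters():
    param.requires_grad = False

# Replace the 1000-class ImageNet head with a task-specific one
# (e.g., 5 defect categories; the class count is hypothetical).
# The new layer's parameters are trainable by default and can be
# fine-tuned on the smaller application-specific dataset.
model.fc = nn.Linear(model.fc.in_features, 5)
```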
Automatic 3D feature extraction. Instead of converting the point cloud to image-like representations, such as spin images or voxel grids, another approach is to learn 3D features directly from a given point cloud. These methods employ symmetric operations, such as max pooling or Euclidean norms, when extracting features. Symmetric operators are independent of the input order, so these methods can take the point cloud as a whole without regard to point ordering or sparseness86. With the ability to handle raw point clouds, it is possible to train a model that takes a raw point cloud as input and produces any desired output, such as a segmented point cloud, detected objects, or locations.
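The snippet below illustrates why a symmetric operation yields this order independence: a shared per-point transformation followed by max pooling produces the same global feature for any permutation of the input points (the weights here are random stand-ins for learned parameters).

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 64))    # stand-in for learned per-point weights

def global_feature(points):
    """Shared per-point embedding followed by symmetric (max) pooling."""
    per_point = np.maximum(points @ W, 0.0)   # shared layer with ReLU
    return per_point.max(axis=0)              # order-independent pooling

points = rng.normal(size=(100, 3))
shuffled = points[rng.permutation(len(points))]

# The global feature is identical regardless of point order.
print(np.allclose(global_feature(points), global_feature(shuffled)))  # True
```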