PointCNN

Yangyan Li et al. published PointCNN: Convolution On X-Transformed Points in 2018. The paper is online here. The GitHub code repository provided by the authors is here, which uses TensorFlow 1.6.0 for the implementation. As the title suggests, the proposed architecture performs learning on point clouds using convolution.

Abstract

The abstract points to (pun intended) two problems with point data:

  1. Point clouds are irregular and unordered.
  2. Convolving kernels directly against features associated with points discards shape information and is variant to point ordering.

PointCNN Paper Abstract


The first point is the cause and the second is the effect. Due to the nature of point data, we cannot simply push point data through CNN layers and expect good results. The proposed X-transformation comes to our rescue. Its benefits are twofold:

  1. It weights the input features associated with the points.
  2. It permutes the points into a latent, potentially canonical order.

These two benefits directly address our original two concerns about point data. Once we have performed the X-transformation, a typical convolution is applied to the result. The authors show state-of-the-art performance on multiple challenging benchmark datasets and tasks.
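To see why the first concern matters, here is a toy illustration (my own, not from the paper) of how naively convolving a fixed kernel against point features is sensitive to point ordering. The array sizes are arbitrary:

```python
import numpy as np

# Naively "convolving" a kernel against point features: each kernel row
# is paired with a feature row, so the result depends on point order.
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 3))  # 4 points, 3 feature channels each
kernel = rng.normal(size=(4, 3))    # a kernel matching the neighborhood size

out_original = np.sum(kernel * features)

# Same point set, but a different (equally valid) ordering of the points.
perm = [2, 0, 3, 1]
out_permuted = np.sum(kernel * features[perm])

print(out_original == out_permuted)  # almost surely False
```

Because a point cloud has no canonical order, both outputs describe the same shape yet disagree, which is exactly the problem the X-transformation is meant to fix.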

Introduction

PointCNN Paper Figure 1


In figure 1, the authors illustrate the difference between regular-domain representations, such as images, and point clouds. Since point clouds are unordered, the X-transformation is intended to neutralize differences in point ordering while maintaining the differences between shapes that are not the same. The architecture combining the X-transformation with convolution performs better than the state of the art at the time, which was PointNet++.

Hierarchical Convolution

PointCNN Paper Figure 2

In many CNN networks the layer height and width are reduced while the number of channels is increased, as illustrated in the top half of figure 2. Hierarchical convolution follows a similar strategy: the number of points is decreased, but the remaining points carry more information, as we see in the bottom half of figure 2.

PointCNN Paper Figure 3

Figure 3 again shows the process of reducing the number of points while increasing the number of channels.
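The shape progression can be sketched in a few lines. This is my own illustration, not the authors' code: the layer sizes are made up, the sampling is random rather than learned, and a random linear lift stands in for the real X-Conv aggregation:

```python
import numpy as np

# Hierarchical pattern: each layer keeps fewer representative points,
# but gives each surviving point more channels.
rng = np.random.default_rng(0)
points = rng.normal(size=(1024, 3))   # input point cloud (xyz)
feats = rng.normal(size=(1024, 16))   # per-point features

layer_shapes = [(1024, 16), (384, 64), (128, 128), (1, 256)]
for n_pts, n_ch in layer_shapes[1:]:
    # Sample representative points (random here; farthest-point
    # sampling is another option) and lift the channel count.
    idx = rng.choice(len(points), size=n_pts, replace=False)
    points = points[idx]
    lift = rng.normal(size=(feats.shape[1], n_ch))
    feats = feats[idx] @ lift         # stand-in for X-Conv aggregation
    print(points.shape, feats.shape)  # fewer points, more channels
```

By the last layer, a single point carries 256 channels summarizing the whole cloud.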

X-Conv Operator

PointCNN Paper Algorithm 1

The authors provide an in-depth description of the X-Conv algorithm. First, let us break down the inputs and output.

Inputs

K: The trainable convolution kernels

p: The representative point we are finding the projection and weights for

P: The K x Dim matrix of the K neighbor points of p, which can be expressed (p1, ..., pK)ᵀ

F: The K x C1 matrix of features associated with the neighbor points, which can be expressed (f1, ..., fK)ᵀ

Output

F_p: The features aggregated into representative point p


In step 1 we normalize the points to isolate their relative positions. We do this by simply subtracting the representative point p from each point in P to get the localized set of points P' = P - p.

Second, we apply a multi-layer perceptron, MLP_delta, to each point individually, lifting the points into C_delta-dimensional space and giving us the K x C_delta matrix F_delta = MLP_delta(P').

Third, we concatenate our input feature matrix F with F_delta to get the K x (C_delta + C1) matrix F_* = [F_delta, F].

Fourth, our output from step 1, P', is passed through another multi-layer perceptron to get X = MLP(P'), which is the K x K X-transformation matrix we will learn.

Fifth, we do a matrix multiplication of the X-transformation with F_* to get F_X = X * F_*.

In the final step we do a standard convolution between our trainable kernels K and the transformed feature matrix to get our final output, F_p = Conv(K, F_X).

Every step described here is differentiable. Therefore, the algorithm is differentiable and we can use backpropagation.
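The six steps can be sketched in NumPy. This is a toy illustration under my own assumptions, not the authors' code: the MLP weights are random placeholders rather than trained values, each MLP is a simple two-layer perceptron, and the final convolution is reduced to a single output channel:

```python
import numpy as np

def mlp(x, w1, w2):
    # A tiny two-layer perceptron with ReLU, applied row-wise.
    return np.maximum(x @ w1, 0) @ w2

def x_conv(p, P, F, weights):
    # Step 1: move the neighbors into p's local coordinate system.
    P_local = P - p
    # Step 2: lift each point into C_delta-dimensional feature space.
    F_delta = mlp(P_local, weights["w1"], weights["w2"])
    # Step 3: concatenate lifted features with the input features.
    F_star = np.concatenate([F_delta, F], axis=1)
    # Step 4: predict the K x K X-transformation from the local points.
    X = mlp(P_local, weights["x1"], weights["x2"])
    # Step 5: weight and permute the features.
    F_X = X @ F_star
    # Step 6: "convolve" with the trainable kernel (one output channel
    # here, for simplicity).
    return np.sum(weights["kernel"] * F_X)

rng = np.random.default_rng(0)
K, dim, C1, C_delta = 8, 3, 4, 4     # neighborhood size and channel counts
weights = {                           # random placeholders, not trained
    "w1": rng.normal(size=(dim, 16)), "w2": rng.normal(size=(16, C_delta)),
    "x1": rng.normal(size=(dim, 16)), "x2": rng.normal(size=(16, K)),
    "kernel": rng.normal(size=(K, C_delta + C1)),
}
p = rng.normal(size=(dim,))           # representative point
P = rng.normal(size=(K, dim))         # its K neighbor points
F = rng.normal(size=(K, C1))          # features of those neighbors
out = x_conv(p, P, F, weights)        # scalar feature aggregated into p
```

Note that nothing here is invariant to point order by construction; the paper's argument is that the X-transformation can learn to compensate for ordering through training.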

PointCNN Architectures

PointCNN Paper Figure 4

In figure 4 we see a breakdown of the PointCNN architectures. The final output point in figure 4a has a receptive field that covers the entire point set, which is fantastic. That final point will often have many channels, so to distill a final classification we may want to use one or more fully connected layers.

However, moving all the information into a single point is problematic. Therefore, the authors propose a dense representation, shown in figure 4b, which retains a canonical set of points. Using fully connected layers, we combine the canonical points and their information-rich channels to produce a binary or multi-class classification of objects. Alternatively, pooling using an average of the canonical point channels can be used as the input to a softmax operation for classification.

Finally, the point segmentation process is an extension of the dense PointCNN and is described in figure 4c. The point segmentation process uses the same X-Conv operator to increase the number of points in the second half of the network, following the example of U-Net.

The authors reduce overfitting by using dropout before the last fully connected layer. In addition, data augmentation is critical: the number of points to use is drawn from a Gaussian distribution, giving each training batch a new random point selection and point order.
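The augmentation step might look like this sketch (my own illustration; the Gaussian parameters and cloud size here are made up, not the paper's values):

```python
import numpy as np

# Each batch sees a different random subset and ordering of each cloud's
# points, so the model cannot memorize a particular count or order.
rng = np.random.default_rng(0)
cloud = rng.normal(size=(2048, 3))  # a full point cloud

def augment(cloud, mean=1024, std=256):
    # Draw the number of points to keep from a Gaussian, clipped to a
    # valid range, then take a random subset in a random order.
    n = int(np.clip(rng.normal(mean, std), 1, len(cloud)))
    idx = rng.choice(len(cloud), size=n, replace=False)
    return cloud[idx]

batch = [augment(cloud) for _ in range(4)]
print([b.shape[0] for b in batch])  # point counts vary per sample
```

Because the subset is sampled without replacement and `rng.choice` returns indices in random order, both the selection and the ordering differ from batch to batch.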

Results


Classification Results

PointCNN Paper Table 1

For classification PointCNN is evaluated using six datasets:

  1. ModelNet40
  2. ScanNet
  3. TU-Berlin
  4. Quick Draw
  5. MNIST
  6. CIFAR10

PointCNN was compared with the following classification strategies: Flex-Convolution, KCNet, Kd-Net, SO-Net, 3DmFV-Net, PCNN, PointNet, PointNet++, SpecGCN, SpiderCNN, and DGCNN.

Looking at table 1 from the paper we can see PointCNN does well for several of the datasets and configurations considered. PointCNN gets state-of-the-art results for overall accuracy with ModelNet40 and ScanNet.

Segmentation Results

PointCNN Paper Table 2

For segmentation PointCNN is evaluated using:

  1. ShapeNet Parts
  2. S3DIS
  3. ScanNet

PointCNN gets top scores on segmentation of ShapeNet Parts, S3DIS, and ScanNet.

Sketch and Image Classification Results

PointCNN Paper Table 3

We see PointCNN also does well with the classification of sketches and images.

Ablation Experiments and Visualizations

PointCNN Paper Table 5

The authors use ablation to show the importance of the X-transformation: with the transformation, the performance is better than without.


PointCNN Paper Figure 5

Figure 5 from the paper provides a strong visual argument for the X-transformation. Each dot color in the visualization represents a single sample from the dataset. Data from ModelNet40 is fed into the model in different random orders. Without the transformation, the variance for each sample is relatively high, leading to overlap between samples. With the transformation, the variance for each sample is relatively low, with much less overlap. The transformation is clearly a critical player in the performance of PointCNN. Furthermore, the lower variance of the sample point locations supports the authors' argument that a canonical set of points can be learned.

PointCNN Implementations

PointCNN Official Code Repository


Fortunately, the authors have provided detailed information in the supplementary section of their paper and an open-source PointCNN code repository on GitHub here. The authors' implementation is provided in TensorFlow. They even provide pretrained models to download here.

PointCNN has other notable implementations: