The Segment Anything Model (SAM) is another incredible contribution from the research team at Meta. Good news: this work is shared under an Apache-2.0 license, making it suitable for commercial use and more.
The best place to get started is the segment-anything GitHub repository here.
If you like to learn by running code, you are in luck. We have two notebooks to run.
The authors of the PointCNN paper have provided detailed information in the supplementary section of their paper and an open-source PointCNN code repository. The authors' implementation is provided in TensorFlow.
Here we will focus on the Esri PointCNN implementation. To better understand PointCNN, let's look at an implementation of the X-Conv algorithm.
For a refresher, the algorithm is shown above.
PyPI ArcGIS Page
You can find information on downloading and installing the arcgis Python module at
https://pypi.org/project/arcgis/#files. Download the tar file or install the package with pip. Once installed, find the
file at the path arcgis/learn/models/_pointcnn_utils.py.
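Assuming pip is available, one way to install the package and locate that file is the following sketch (the second command only works after a successful install):

```shell
# Install the arcgis package from PyPI (this is a large download).
pip install arcgis

# Print the on-disk location of the PointCNN utilities module.
python -c "import arcgis.learn.models._pointcnn_utils as m; print(m.__file__)"
```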
We can look at the class PointCNNSeg(nn.Module) declaration to
see the Python PointCNN segmentation implementation.
We reviewed the key lines of code required to implement the PointCNN algorithm
in PyTorch. To take a deeper dive into the PointCNN paper, see our post
here.
Yangyan Li et al. published
PointCNN: Convolution On
X-Transformed Points
in 2018. The paper is online
here. The GitHub code
repository provided by the author is
here, which uses
TensorFlow 1.6.0 for the implementation. As the title suggests the architecture
proposed is used to do learning on point clouds using convolution.
Abstract
The abstract points to (pun intended) two problems with point data:
Point clouds are irregular and unordered.
Convolving kernels against features associated with points will result in
desertion of shape information and variance to point ordering.
PointCNN Paper Abstract
The first point is the cause and the second is the effect. Due to the nature
of point data, we cannot simply push point data through CNN layers and expect
good results. The proposed
X-transformation comes to our rescue. The reasons for the transformation are twofold:
We promote the weighting of features associated with points.
We encourage the points to transform into canonical order.
These two benefits directly address our original two concerns about point
data. Once we have applied the transformation, a typical convolution is performed on the
result. The authors have shown state-of-the-art results on multiple challenging
benchmark datasets and tasks.
Introduction
PointCNN Paper Figure 1
In figure 1, the authors illustrate the difference between regular domain
representations, such as images, and point clouds. Since point clouds are unordered, the X-transformation is intended to neutralize the difference in order while
maintaining the difference between shapes that are not the same. The
architecture resulting from the transformation and CNN operations performs
better than the state of the art at the time, which was PointNet++.
Hierarchical Convolution
PointCNN Paper Figure 2
In many CNN networks the layer height and width are reduced, while the number
of channels is increased, as illustrated in the top half of figure 2.
Hierarchical convolution follows a similar strategy. The number of points is
decreased, but the remaining points carry more information, which we see in the
bottom half of figure 2.
PointCNN Paper Figure 3
Figure 3 again shows the process of reducing the number of points while
increasing the number of channels.
X-Conv Operator
PointCNN Paper Algorithm 1
The authors provide an in-depth description of the
X-Conv algorithm. First, let us break down the inputs and output.
Inputs
K: the trainable convolution kernels
p: the representative point we are finding the projection and weights for
P: the transpose matrix of the neighbor points p_1 through p_K, which can be expressed (p_1, ..., p_K)^T
F: the transpose matrix of the neighbor features f_1 through f_K, which can be expressed (f_1, ..., f_K)^T
Output
F_p: the features aggregated into the representative point p
In step one we normalize the points to isolate their relative positions. We do
this by simply subtracting point p from each point in P to get the normalized
set of points P'.
Second, we use a multi-layered perceptron network on each point to lift the
points into C_delta dimensional space, giving us the matrix F_delta.
We concatenate our input feature matrix F with F_delta to get the matrix F*.
Our output from step one, P', is passed through another multi-layered
perceptron network to get X, which is the K x K transformation matrix we will
learn.
We do a matrix multiplication with the X-transformation and F* to get F_X.
In the final step we do a standard convolution using our learnable kernel K
and our transformed feature matrix F_X to get F_p, our final output.
Every step described here is differentiable. Therefore, the algorithm is
differentiable and we can use backpropagation.
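The steps above can be sketched in PyTorch roughly as follows. This is a minimal illustrative sketch: the layer sizes, the two small MLPs, and the use of Conv1d for the final convolution are our own simplifications, not the authors' TensorFlow implementation or Esri's version.

```python
import torch
import torch.nn as nn

class XConv(nn.Module):
    """Illustrative sketch of the X-Conv operator (Algorithm 1)."""
    def __init__(self, dim=3, c_in=4, c_delta=8, c_out=16, k=8):
        super().__init__()
        self.k = k
        # Step 2: MLP_delta lifts each normalized point into C_delta dims.
        self.mlp_delta = nn.Sequential(
            nn.Linear(dim, c_delta), nn.ReLU(),
            nn.Linear(c_delta, c_delta), nn.ReLU())
        # Step 4: MLP that learns the K x K X-transformation from P'.
        self.mlp_x = nn.Sequential(
            nn.Linear(dim * k, k * k), nn.ReLU(),
            nn.Linear(k * k, k * k))
        # Step 6: the trainable kernel, applied as a convolution over
        # the K neighbors (implemented here as a single Conv1d).
        self.conv = nn.Conv1d(c_delta + c_in, c_out, kernel_size=k)

    def forward(self, p, P, F):
        # p: (B, dim) representative points
        # P: (B, K, dim) neighbor coordinates, F: (B, K, c_in) features
        B = p.shape[0]
        P_prime = P - p.unsqueeze(1)              # Step 1: localize around p
        F_delta = self.mlp_delta(P_prime)         # Step 2: lift points
        F_star = torch.cat([F_delta, F], dim=-1)  # Step 3: concatenate
        X = self.mlp_x(P_prime.reshape(B, -1)).reshape(B, self.k, self.k)  # Step 4
        F_X = torch.bmm(X, F_star)                # Step 5: weight and permute
        # Step 6: convolve the kernel with the transformed features.
        return self.conv(F_X.transpose(1, 2)).squeeze(-1)  # (B, c_out)

xconv = XConv()
p = torch.randn(2, 3)
P = torch.randn(2, 8, 3)
F = torch.randn(2, 8, 4)
out = xconv(p, P, F)
print(out.shape)  # torch.Size([2, 16])
```

Since every operation here is a standard differentiable PyTorch op, gradients flow through all six steps, matching the point made above.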
PointCNN Architectures
PointCNN Paper Figure 4
In figure 4 we see a breakdown of the PointCNN architectures. The final output
point in figure 4a has a receptive field ratio that includes the entire point
set, which is fantastic. The final point will often have many channels. So to
distill a final classification we may want to use one or more fully connected
layers.
However, moving all the information into a single point is problematic.
Therefore, the authors propose a dense representation shown in figure 4b, which
retains a canonical set of points. Using fully connected layers we use the
canonical points and their information-rich channels to extract binary or
multi-class classification of objects. Alternatively, pooling using an average
of the canonical point channels can be used as the input to a softmax
operation used for classification.
Finally, the point segmentation process is an extension of the dense PointCNN
and is described in figure 4c. The point segmentation process uses the same
X-Conv operator to increase the number of points for the second half of the
process, following the example of U-Net.
The authors reduce overfitting by using dropout before the last fully
connected layer. In addition, data augmentation is critical. A Gaussian
distribution is used to get a new random point selection and point order for
each training batch.
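A rough sketch of that augmentation idea is below; the mean and standard deviation of the sample count are assumptions for illustration, not the paper's exact settings.

```python
import torch

def augment(points, n_mean=1024, n_std=128):
    """Randomly subsample and shuffle a point cloud for one batch.

    points: (N, 3) tensor. Both the selected subset and its order differ
    from call to call, so the network never sees a fixed point ordering.
    The Gaussian-distributed sample count is an illustrative assumption.
    """
    n = int(torch.normal(torch.tensor(float(n_mean)),
                         torch.tensor(float(n_std))).item())
    n = max(1, min(n, points.shape[0]))        # keep n in a valid range
    idx = torch.randperm(points.shape[0])[:n]  # random subset, random order
    return points[idx]

cloud = torch.randn(2048, 3)
batch_a = augment(cloud)
batch_b = augment(cloud)
print(batch_a.shape, batch_b.shape)  # point counts vary around 1024
```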
Results
Classification Results
PointCNN Paper Table 1
For classification PointCNN is evaluated using six datasets:
ModelNet40
ScanNet
TU-Berlin
Quick Draw
MNIST
CIFAR10
PointCNN was compared with the following classification strategies:
Flex-Convolution, KCNet, Kd-Net, SO-Net, 3DmFV-Net, PCNN, PointNet,
PointNet++, SpecGCN, SpiderCNN, and DGCNN.
Looking at table 1 from the paper we can see PointCNN does well for several of
the datasets and configurations considered. PointCNN gets state-of-the-art
results for overall accuracy with ModelNet40 and ScanNet.
Segmentation Results
PointCNN Paper Table 2
For segmentation, PointCNN is evaluated using:
ShapeNet Parts
S3DIS
ScanNet
PointCNN gets top scores on segmentation of ShapeNet Parts, S3DIS, and
ScanNet.
Sketch and Image Classification Results
PointCNN Paper Table 3
We see PointCNN also does well with the classification of sketches and images.
Ablation Experiments and Visualizations
PointCNN Paper Table 5
The authors use ablation to show the importance of the
X-transformation. With the transformation, the performance is better than
without.
PointCNN Paper Figure 5
Figure 5 from the paper provides a strong visual argument for the
X-transformation. Each dot color in the visualization represents a single
sample from the ModelNet40 dataset. The data is fed into the model
using different random orders. Without the transformation, the variance for
each sample is relatively high and leads to overlap between samples. With the
transformation, the variance for each sample is relatively low, with much less
overlap. The transformation is clearly a critical player in the performance of
PointCNN. Furthermore, the lower variance of the sample point locations
supports the authors' argument that a canonical set of points can be learned.
PointCNN Implementations
PointCNN Official Code Repository
Fortunately, the authors have provided detailed information in the
supplementary section of their paper and an open-source PointCNN code
repository on GitHub here. The authors' implementation is provided in TensorFlow. They
even provide pretrained models to download
here.
In this tutorial, we will train a Generative Adversarial Network (GAN) to
generate synthetic satellite imagery. Two of the most critical limits of
AI today stem from the lack of training data and the difficulty AI has
simulating human creativity. Generative Adversarial Networks (GANs) provide a
generative strategy that allows us to produce new outputs drawn from the
same distribution as the training data. It is possible GANs hold the key
to addressing issues related both to limited data and to simulating human
creativity. Simulating synthetic satellite imagery has applications
ranging from video game content generation to creating data to train
self-driving cars.
GANs have been shown to effectively create convincing handwritten digits
and celebrity faces. As a starting point, we will use the fantastic tutorial by Nathan Inkawhich. However, instead of celebrity faces, we will use satellite imagery from
the Planet: Understanding the Amazon from Space dataset as our source data. Our discriminator will be trained to
distinguish the fakes from the images in the original dataset. Our generator will be trained
to generate convincing fake satellite images.
Generative Adversarial Networks glean their name from the key
relationship that makes them tick. The goal of the generator (G) is to
produce fakes drawn from the same distribution as the training set. The
math-speak "from the same distribution" simply means G produces images
that look similar to the training set images. The goal of the
discriminator (D) is to predict if an image is a real image from the
training set or a fake from G. Our loss function rewards G for making
good fakes and rewards D for finding the fakes. You can read more about GANs from the source, the 2014 paper Generative Adversarial Nets by Goodfellow et al. DCGAN is a very effective extension of the GAN that we will be
implementing; the paper on DCGAN is
here.
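The adversarial objective described above can be sketched as a single training step in PyTorch. The tiny linear G and D below are placeholders for illustration (the tutorial itself uses DCGAN models); the two binary cross-entropy losses follow the standard GAN formulation.

```python
import torch
import torch.nn as nn

z_dim, img_dim, batch = 16, 64, 8
G = nn.Sequential(nn.Linear(z_dim, img_dim), nn.Tanh())     # toy generator
D = nn.Sequential(nn.Linear(img_dim, 1), nn.Sigmoid())      # toy discriminator
loss_fn = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

real = torch.randn(batch, img_dim)   # stand-in for a batch of real images
ones = torch.ones(batch, 1)          # label: "real"
zeros = torch.zeros(batch, 1)        # label: "fake"

# Discriminator step: reward D for calling reals real and fakes fake.
fake = G(torch.randn(batch, z_dim)).detach()  # detach: don't update G here
d_loss = loss_fn(D(real), ones) + loss_fn(D(fake), zeros)
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: reward G when D mistakes its fakes for real images.
fake = G(torch.randn(batch, z_dim))
g_loss = loss_fn(D(fake), ones)
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
print(float(d_loss), float(g_loss))
```

Running this step in a loop over real image batches is the core of GAN training; everything else in the tutorial is model architecture and data plumbing.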
Figure 1: GAN Diagram
Model Inputs
Create a free
Kaggle account to get
access to the dataset. Install the Kaggle API, which allows us to quickly
download our data.
Install the Kaggle API
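Assuming pip is available (in Colab, prefix the command with `!`), installing the official client is one line:

```shell
# Install the official Kaggle API command-line client.
pip install kaggle
```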
Once you have an account, navigate to the account tab of your user profile and select Create API Token. You can now
download kaggle.json, which contains your API credentials. Now we
can import our credentials into Google Colab.
Import your Credentials
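A minimal sketch of placing the credentials where the Kaggle API expects them; the username and key below are placeholders for the values in your downloaded kaggle.json. In Colab you would typically upload the real file first (for example with google.colab's files.upload()).

```python
import json
import os
from pathlib import Path

# Placeholder credentials -- replace with the contents of the
# kaggle.json you downloaded from your Kaggle account page.
creds = {"username": "your-username", "key": "your-api-key"}

kaggle_dir = Path.home() / ".kaggle"
kaggle_dir.mkdir(exist_ok=True)
cred_path = kaggle_dir / "kaggle.json"
if not cred_path.exists():          # don't clobber real credentials
    cred_path.write_text(json.dumps(creds))
os.chmod(cred_path, 0o600)          # the API rejects world-readable creds
print(cred_path)
```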
Create the Dataset and Dataloader
Figure 2: Preview of Training Images
Model Outputs
You can see our loss function graph is very different from what we would expect to see from a traditional supervised learning process.
Figure 3: Generator and Discriminator Loss During Training
We see the generator go from producing white noise in the first epoch of training to producing images with a high level of detail after the 25th epoch.
Figure 4: GAN Learning Process the Output from Epoch 1 to 25
Let us compare the real images to the fake images after training. It appears we have a convincing set of fake images produced by our generator.
Figure 5: Compare Training Samples with Our Fake Images