How to get started with Meta's Segment Anything Model (SAM)?

The Segment Anything Model (SAM) is another incredible contribution from the research team at Meta. Good news: this work is shared under an Apache-2.0 license, making it suitable for commercial use and more.

The best place to get started is the segment-anything GitHub repository here.


If you like to learn by running code, you are in luck. We have two notebooks to run.


In this workflow we can select a point on the image and provide a few parameters to interactively extract masks. 


In this workflow we can provide an image with a few parameters to get masks for the entire image. 

To run the code in either notebook yourself, do the following:
  1. Click the Open in Colab button.
  2. Change the "using_colab" variable from False to True.
  3. Change the runtime settings to use a GPU.
  4. Run all the cells, providing any needed input.





PointCNN Implementation

The authors of the PointCNN paper have provided detailed information in the supplementary section of their paper and an open-source PointCNN code repository. The authors' implementation is provided in TensorFlow.

PointCNN has other notable implementations:

PointCNN GitHub Repository


Esri PointCNN

PointCNN Paper Algorithm

Here we will focus on the Esri PointCNN implementation. To better understand PointCNN, let's look at an implementation of the X-Conv algorithm. For a refresher, the algorithm is shown above.

PyPI ArcGIS Page

You can find information to download or install the arcgis Python module here: https://pypi.org/project/arcgis/#files. Download the tar archive or install the package with pip. Once downloaded, find the file at the path arcgis/learn/models/_pointcnn_utils.py.

We can look at the class PointCNNSeg(nn.Module) declaration to see the PyTorch PointCNN segmentation implementation.

Step 1 of the algorithm happens on line 188:

group_pts = group_pts - center_pts

Step 2 happens on lines 191 and 192:

group_pts = group_pts.permute(0,3,1,2).contiguous()
fts_lifted = self.MLP_delta(group_pts.contiguous())      # (B, C_delta, P, K)

Step 3 happens on lines 194 to 202:

if fts is not None:
	_, _, nf = fts.shape
	group_fts = fts.contiguous().view(-1, nf)
	group_fts = group_fts[k_ind].view(B, self.P, self.K * self.D, nf)
	group_fts = group_fts[:, :, rand_col, :]
	group_fts = group_fts.permute(0, 3, 1, 2).contiguous()
	feat = torch.cat((fts_lifted, group_fts), 1).contiguous()  # (B, C_delta + C_in, P, K)
else:
	feat = fts_lifted.contiguous()

Step 4 happens on lines 205 and 206:

X = self.MLP_X(group_pts).permute(0,2,3,1)  # (B, P, K, K)
X = X.contiguous().view(B*self.P, self.K, self.K)

Step 5 happens on lines 208 and 209:

feat = feat.permute(0,2,3,1).contiguous().view(B*self.P, self.K, -1)
feat = torch.bmm(X, feat).view(B, self.P, self.K, -1).permute(0,3,1,2)

Finally, step 6 happens on line 211:

feat = self.seperable_conv(feat.contiguous())

 

Summary

We reviewed the key lines of code required to implement the PointCNN algorithm in PyTorch. To take a deeper dive into the PointCNN paper see our post here.

PointCNN

Yangyan Li et al. published PointCNN: Convolution On X-Transformed Points in 2018. The paper is online here. The GitHub code repository provided by the authors is here, which uses TensorFlow 1.6.0 for the implementation. As the title suggests, the proposed architecture is used to do learning on point clouds using convolution.

Abstract

The abstract points to (pun intended) two problems with point data:

  1. Point clouds are irregular and unordered.
  2. Convolving kernels against features associated with points will result in desertion of shape information and variance to point ordering.

PointCNN Paper Abstract

 

The first point is the cause and the second is the effect. Due to the nature of point data, we cannot simply push point data through CNN layers and expect good results. The proposed X-transformation comes to our rescue. The reasons for the transformation are twofold:

  1. We promote the weighting of features associated with points.
  2. We encourage the points to transform into canonical order.

These two benefits directly address our original two concerns about point data. Once we have performed our transformation, convolution is performed on the result. The authors show state-of-the-art results on multiple challenging benchmark datasets and tasks.
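To make problem 2 concrete, here is a toy illustration (the shapes and weights are our own choices, not anything from the paper): apply a fixed weight vector to a flattened point set, then feed the same points in a different order. The output changes, showing that naively convolving over raw point lists is sensitive to point ordering.

```python
import numpy as np

# Five xyz points and a fixed "learned" filter over the flattened cloud.
rng = np.random.default_rng(0)
points = rng.normal(size=(5, 3))
weights = rng.normal(size=15)

out_original = points.reshape(-1) @ weights
# The SAME cloud, listed in a different order.
out_shuffled = points[[4, 2, 0, 3, 1]].reshape(-1) @ weights

# The two outputs differ, even though the underlying shape is identical.
print(out_original, out_shuffled)
```

This is exactly the variance to point ordering the X-transformation is designed to remove.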

Introduction

PointCNN Paper Figure 1

 

In figure 1, the authors illustrate the difference between regular-domain representations, such as images, and point clouds. The X-transformation is intended to neutralize the difference in point order while maintaining the difference between shapes that are not the same. The architecture resulting from the transformation and CNN operations performed better than the state of the art at the time, which was PointNet++.

Hierarchical Convolution

PointCNN Paper Figure 2

In many CNN networks the layer height and width are reduced while the number of channels is increased, as illustrated in the top half of figure 2. Hierarchical convolution follows a similar strategy: the number of points is decreased, but the remaining points carry more information, which we see in the bottom half of figure 2.
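The hierarchy can be sketched as shape bookkeeping (the layer sizes here are toy choices of ours, not the paper's): each stage keeps half the points but doubles the channels, just as image CNNs shrink height and width while growing depth.

```python
import torch

def downsample_stage(points, feats, out_channels):
    """Keep half the points as representatives and lift their features."""
    B, N, C = feats.shape
    keep = torch.randperm(N)[: N // 2]        # pick representative points
    lift = torch.nn.Linear(C, out_channels)   # richer per-point features
    return points[:, keep, :], lift(feats[:, keep, :])

pts = torch.randn(2, 1024, 3)   # batch of 2 clouds, 1024 xyz points each
fts = torch.randn(2, 1024, 32)  # 32 channels per point
pts, fts = downsample_stage(pts, fts, 64)
pts, fts = downsample_stage(pts, fts, 128)
print(pts.shape, fts.shape)  # 256 points left, each with 128 channels
```

Fewer points, fatter channels: the same trade illustrated in figure 2.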

PointCNN Paper Figure 3

Figure 3 again shows the process of reducing the number of points while increasing the number of channels.

X-Conv Operator

PointCNN Paper Algorithm 1

The authors provide an in-depth description of the X-Conv algorithm. First, let us break down the inputs and output.

Inputs

K: The trainable convolution kernel

p: The representative point we are finding the projection and weights for

P: The matrix of p's neighboring points, which can be expressed as P = (p_1, p_2, ..., p_K)^T

F: The matrix of features associated with those neighboring points, which can be expressed as F = (f_1, f_2, ..., f_K)^T

Output

F_p: The features aggregated into the representative point p

 

In step one we normalize the points to isolate their relative positions. We do this by simply subtracting the point p from each point in P to get the normalized set of points P' = P - p.

Second, we use a multi-layered perceptron network on each point to lift the points into C_delta-dimensional space, giving us the matrix F_delta = MLP_delta(P').

We concatenate our input feature matrix F with F_delta to get the matrix F_* = [F_delta, F].

Our output from step one, P', is passed through another multi-layered perceptron network to get X = MLP(P'), which is the K x K X-transformation matrix we will learn.

Do a matrix multiplication with the X-transformation and F_* to get F_X = X * F_*.

In the final step we do a standard convolution using our learnable kernel K and our transformed feature matrix F_X to get our final output, F_p = Conv(K, F_X).

Every step described here is differentiable. Therefore, the algorithm is differentiable and we can use backpropagation.
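The six steps above can be sketched end to end in a few lines of PyTorch. This is a toy forward pass under stated assumptions: the tensor sizes, MLP widths, and the final "convolution" stand-in (a linear layer followed by a max over neighbors) are our own illustrative choices, not the paper's or the arcgis implementation's exact layers.

```python
import torch
import torch.nn as nn

B, P, K, C_in, C_delta, C_out = 2, 8, 4, 16, 8, 32

mlp_delta = nn.Sequential(nn.Linear(3, C_delta), nn.ReLU())  # lifts coords to features
mlp_x = nn.Linear(3 * K, K * K)                              # learns the X matrix
conv_out = nn.Linear(C_delta + C_in, C_out)                  # stand-in for the final conv

center_pts = torch.randn(B, P, 1, 3)       # representative points
group_pts = torch.randn(B, P, K, 3)        # K neighbors per representative point
group_fts = torch.randn(B, P, K, C_in)     # neighbor features

group_pts = group_pts - center_pts                            # step 1: localize
fts_lifted = mlp_delta(group_pts)                             # step 2: (B, P, K, C_delta)
feat = torch.cat((fts_lifted, group_fts), dim=-1)             # step 3: concat features
X = mlp_x(group_pts.reshape(B, P, -1)).reshape(B * P, K, K)   # step 4: X-transformation
feat = feat.reshape(B * P, K, C_delta + C_in)
feat = torch.bmm(X, feat)                                     # step 5: weight and permute
out = conv_out(feat).max(dim=1).values.reshape(B, P, C_out)   # step 6: aggregate
print(out.shape)  # torch.Size([2, 8, 32])
```

Every operation here is differentiable, which is the property that lets the real implementation train X end to end with backpropagation.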

PointCNN Architectures

PointCNN Paper Figure 4

In figure 4 we see a breakdown of the PointCNN architectures. The final output point in figure 4a has a receptive field ratio that includes the entire point set, which is fantastic. The final point will often have many channels. So to distill a final classification we may want to use one or more fully connected layers.

However, moving all the information into a single point is problematic. Therefore, the authors propose a dense representation shown in figure 4b, which retains a canonical set of points. Using fully connected layers we use the canonical points and their information-rich channels to extract binary or multi-class classification of objects. Alternatively, pooling using an average of the canonical point channels can be used as the input to a softmax operation used for classification.

Finally, the point segmentation process is an extension of the dense PointCNN and is described in figure 4c. The point segmentation process uses the same X-Conv operator to increase the number of points in the second half of the process, following the example of U-Net.

The authors reduce overfitting by using dropout before the last fully connected layer. In addition, data augmentation is critical. A Gaussian distribution is used to get a new random point selection and point order for each training batch.
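The augmentation step can be sketched as follows. The Gaussian parameters below are assumptions for illustration, not the paper's exact settings: each call draws how many points to keep from a Gaussian, then returns a random subset of the cloud in a random order, so both point count and point order vary from batch to batch.

```python
import numpy as np

def sample_batch_points(cloud, std=64, rng=None):
    """Return a random, randomly ordered subset of a point cloud."""
    rng = rng if rng is not None else np.random.default_rng()
    n_total = len(cloud)
    # Draw the number of points to keep from a Gaussian, clipped to a valid range.
    n = int(np.clip(rng.normal(n_total * 0.9, std), 1, n_total))
    # A random permutation both selects the subset and shuffles the order.
    return cloud[rng.permutation(n_total)[:n]]

cloud = np.random.default_rng(0).normal(size=(1024, 3))
batch = sample_batch_points(cloud, rng=np.random.default_rng(1))
print(batch.shape)  # roughly 90% of the points, in a new order
```

Varying both the selection and the ordering per batch is what forces the network to learn order-robust features rather than memorizing one arrangement.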

Results

 

Classification Results

PointCNN Paper Table 1

For classification PointCNN is evaluated using six datasets:

  1. ModelNet40
  2. ScanNet
  3. TU-Berlin
  4. Quick Draw
  5. MNIST
  6. CIFAR10

PointCNN was compared with the following classification strategies: Flex-Convolution, KCNet, Kd-Net, SO-Net, 3DmFV-Net, PCNN, PointNet, PointNet++, SpecGCN, SpiderCNN, and DGCNN.

Looking at table 1 from the paper we can see PointCNN does well for several of the datasets and configurations considered. PointCNN gets state-of-the-art results for overall accuracy with ModelNet40 and ScanNet.

Segmentation Results

PointCNN Paper Table 2

For segmentation, PointCNN is evaluated using:

  1. ShapeNet Parts
  2. S3DIS
  3. ScanNet

PointCNN gets top scores on segmentation of ShapeNet Parts, S3DIS, and ScanNet.

Sketch and Image Classification Results

PointCNN Paper Table 3

We see PointCNN also does well with the classification of sketches and images.

Ablation Experiments and Visualizations

PointCNN Paper Table 5

The authors use ablation experiments to show the importance of the X-transformation. With the transformation, the performance is better than without.

 

PointCNN Paper Figure 5

Figure 5 from the paper provides a strong visual argument for the X-transformation. Each dot color in the visualization represents a single sample from the dataset. The data from ModelNet40 is fed into the model using different random orders. Without the transformation, the variance for each sample is relatively high and leads to the overlap. With the transformation, the variance for each sample is relatively low and has much less overlap. The transformation is clearly a critical player in the performance of PointCNN. Furthermore, the lower variance of the sample point locations demonstrates the authors' argument that a canonical set of points can be learned.

PointCNN Implementations

PointCNN Official Code Repository

 

Fortunately, the authors have provided detailed information in the supplementary section of their paper and an open-source PointCNN code repository on GitHub here. The authors' implementation is provided in TensorFlow. They even provide pretrained models to download here.


 

Generate Satellite Images using DCGAN

Introduction

In this tutorial, we will teach a Generative Adversarial Network (GAN) to generate synthetic satellite imagery. Two of the most critical limits of AI today stem from the lack of training data and the difficulty AI has simulating human creativity. GANs provide a generative strategy that allows us to produce new outputs drawn from the same distribution as the training data. It is possible GANs hold the key to addressing issues related both to limited data and to simulating human creativity. Simulating synthetic satellite imagery has applications ranging from video game content generation to creating data to train self-driving cars.

GANs have been shown to effectively create convincing handwritten digits and celebrity faces. As a starting point, we will use the fantastic tutorial by Nathan Inkawhich. However, instead of celebrity faces, we will use satellite imagery from the Planet: Understanding the Amazon from Space dataset as our source data. Our discriminator will be trained to tell the fakes apart from the original dataset images. Our generator will be trained to generate convincing fake satellite images.

In this blog post, we will cover the big ideas and high-level concepts. You can see all of the code required and run it yourself on Google Colab here.


GAN Explained

Generative Adversarial Networks take their name from the key relationship that makes them tick. The goal of the generator (G) is to produce fakes drawn from the same distribution as the training set. The math-speak "from the same distribution" simply means G produces images that look similar to the training set images. The goal of the discriminator (D) is to predict whether an image is a real image from the training set or a fake from G. Our loss function rewards G for making good fakes and rewards D for finding the fakes. You can read more about GANs from the source, the 2014 paper Generative Adversarial Nets by Goodfellow et al. DCGAN is a very effective extension of the GAN we will be implementing; the paper on DCGAN is here.
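The adversarial relationship boils down to two alternating gradient steps. Here is a minimal sketch on toy 1-D "data"; the network sizes and the data are our own stand-ins, far smaller than the DCGAN we will actually train, but the loss structure is the standard GAN recipe.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
G = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 8))   # noise -> "image"
D = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))   # "image" -> real/fake logit
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(32, 8)   # stand-in for a batch of real images
noise = torch.randn(32, 4)

# Discriminator step: reward D for labeling real as 1 and fakes as 0.
fake = G(noise).detach()    # detach so this step does not update G
d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake), torch.zeros(32, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: reward G for making D say "real" on its fakes.
fake = G(noise)
g_loss = bce(D(fake), torch.ones(32, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
print(round(float(d_loss), 3), round(float(g_loss), 3))
```

Alternating these two steps is the whole game: D's loss falls when it spots fakes, G's loss falls when its fakes slip through.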

Figure 1: GAN Diagram


Model Inputs

Create a free Kaggle account to get access to the dataset. Install the Kaggle API, which allows us to quickly download our data. 

Install the Kaggle API

Once you have an account, navigate to the Account tab of your user profile and select Create API Token. This downloads kaggle.json, which contains your API credentials. Now we can import our credentials into Google Colab.
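One way to put the credentials where the Kaggle CLI expects them is a small helper like the following. The ~/.kaggle/kaggle.json location and the 600 permissions are the standard Kaggle API convention; the function name itself is ours.

```python
import os
import shutil
from pathlib import Path

def install_kaggle_credentials(json_path, target_dir=None):
    """Copy kaggle.json into the Kaggle config dir with owner-only permissions."""
    target = Path(target_dir) if target_dir else Path.home() / ".kaggle"
    target.mkdir(parents=True, exist_ok=True)
    dest = target / "kaggle.json"
    shutil.copy(json_path, dest)
    os.chmod(dest, 0o600)   # keep the API token private, as the CLI requires
    return dest
```

In Colab you would upload kaggle.json with files.upload() and then call this helper on the uploaded path.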

Import your Credentials


Create the Dataset and Dataloader


Figure 2: Preview of Training Images



Model Outputs

You can see our loss function graph is quite different from what we would expect to see from a traditional supervised learning process.
Figure 3: Generator and Discriminator Loss During Training


We see the generator go from producing white noise in the first epoch of training to images with a high level of detail after the 25th epoch of training.


Figure 4: GAN Learning Process the Output from Epoch 1 to 25


Let us compare the real images to the fake images after training. It appears we have a convincing set of fake images produced by our generator. 


Figure 5: Compare Training Samples with Our Fake Images