MNIST is a dataset containing images of handwritten digits from 0–9. It has two sets of images: one set used for training the model and the other used for testing.

Classification means that, given an input, our machine learning model should be able to determine which class the input belongs to.

For example, assume the classes are the ten digits 0–9 and the input is one of the handwritten images from the MNIST dataset; our model should classify whether the image is a 0, a 1, a 2, and so on.

**Building the Convolutional Neural Network Model:**

TensorFlow.js provides an API to create sequential models, where the output of one layer is used as the input to the next.

```javascript
const model = tf.sequential();
```

**Adding a Layer:**

Now let us add a two-dimensional convolutional layer. It takes an object as an argument containing the properties that determine the layer's structure.

```javascript
model.add(tf.layers.conv2d({
  inputShape: [28, 28, 1],
  kernelSize: 5,
  filters: 8,
  strides: 1,
  activation: 'relu',
  kernelInitializer: 'varianceScaling'
}));
```

inputShape: the width, height & depth of the input image.

kernelSize: size of the convolutional filter that slides over the input data.

filters: number of filters of size kernelSize that slide over the input data.

strides: how many pixels the convolutional filter moves over the input data at each step.

activation: Rectified Linear Unit (ReLU), which sets inputs ≤ 0 to zero.

kernelInitializer: randomly initializes the weights, which are nothing but the convolutional filter values. If the filter is a 2×2 matrix, the values in that matrix are the weights.
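As a quick aside, the ReLU activation mentioned above can be sketched in plain JavaScript (a conceptual sketch, not TensorFlow.js internals):

```javascript
// ReLU: passes positive values through and maps everything <= 0 to zero
const relu = (x) => Math.max(0, x);

console.log(relu(3.5)); // 3.5
console.log(relu(-2));  // 0
```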

Note: the model must be trained well enough that these weights learn to accurately recognize the handwritten digits.

**Max Pooling Layer:**

We can add one more layer to downsample our data. Max pooling is commonly used for downsampling, so let us add a max pooling layer for this purpose.

```javascript
model.add(tf.layers.maxPooling2d({
  poolSize: [2, 2],
  strides: [2, 2]
}));
```

poolSize defines a window, similar to a convolutional filter, that slides over the input data and takes the maximum value from each region it covers.
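To make this concrete, here is a plain-JavaScript sketch (not TensorFlow.js code) of 2×2 max pooling with stride 2 on a 4×4 input:

```javascript
// 2x2 max pooling with stride 2: each 2x2 window of the input
// contributes its maximum value to the downsampled output
function maxPool2x2(input) {
  const out = [];
  for (let i = 0; i < input.length; i += 2) {
    const row = [];
    for (let j = 0; j < input[0].length; j += 2) {
      row.push(Math.max(input[i][j], input[i][j + 1],
                        input[i + 1][j], input[i + 1][j + 1]));
    }
    out.push(row);
  }
  return out;
}

const input = [
  [1, 3, 2, 4],
  [5, 6, 1, 2],
  [7, 2, 9, 0],
  [1, 8, 3, 4]
];
console.log(maxPool2x2(input)); // [[6, 4], [8, 9]]
```

Notice the 4×4 input is reduced to 2×2, which is exactly the downsampling described above.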

You can add both layers to the model once more. Before passing the output to the final layer, it is common practice to flatten it. The dense layer is a fully connected layer that performs the final classification.
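In TensorFlow.js the flattening step is done with `model.add(tf.layers.flatten());`. Conceptually, flattening just unrolls a multi-dimensional output into a single vector, as this plain-JavaScript sketch shows (the example feature-map shape is illustrative):

```javascript
// Flattening unrolls a multi-dimensional array into a 1-D vector,
// which is the shape a dense (fully connected) layer expects
const flatten = (arr) => arr.flat(Infinity);

// e.g. a 2x2 feature map with 2 channels becomes a vector of length 8
const featureMap = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]];
console.log(flatten(featureMap)); // [1, 2, 3, 4, 5, 6, 7, 8]
```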

**Dense Layer:**

```javascript
model.add(tf.layers.dense({
  units: 10,
  kernelInitializer: 'varianceScaling',
  activation: 'softmax'
}));
```

Softmax is an activation function that creates a probability distribution over our 10 classes. Before evaluating the model, we first need to compile it by specifying the loss function, the optimizer, and the metrics used for evaluation.

```javascript
// The optimizer must be defined before compiling; stochastic gradient
// descent with a learning rate of 0.15 is one common choice
const optimizer = tf.train.sgd(0.15);

model.compile({
  optimizer: optimizer,
  loss: 'categoricalCrossentropy',
  metrics: ['accuracy'],
});
```
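Once compiled, the model can be trained on batches of MNIST images with `model.fit`. The fragment below is a minimal sketch, assuming `trainXs` and `trainYs` have been prepared elsewhere (those names, and the batch size and epoch count, are placeholders, not from the original):

```javascript
// A minimal training sketch (run inside an async function), assuming
// trainXs ([batch, 28, 28, 1] image tensor) and trainYs ([batch, 10]
// one-hot label tensor) have been prepared elsewhere
await model.fit(trainXs, trainYs, {
  batchSize: 64,
  epochs: 3,
  validationSplit: 0.15
});
```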

**Convolution Process Flowchart:**

**How are the weights updated?**

Assume the image of the digit 7 is a matrix of zeroes and ones, where the grey spot areas contain ones and the rest are zeroes.

Convolution filters are also matrices of zeroes and ones, where the red spots in the filter window are ones and the blue parts are zeroes.

When this filter slides over the input data, the dot product between the filter matrix and the overlapping part of the input is calculated. The filter moves by 1 pixel at a time until it covers the entire image, producing one of the output channels.
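The sliding dot product described above can be sketched in plain JavaScript for a single filter and a single 2-D input (a conceptual sketch with made-up values, not the TensorFlow.js internals):

```javascript
// Valid 2-D convolution (strictly, cross-correlation, as in most CNN
// libraries): slide the filter one pixel at a time and take the dot
// product of the filter with the patch of the input it overlaps
function conv2d(input, filter) {
  const outH = input.length - filter.length + 1;
  const outW = input[0].length - filter[0].length + 1;
  const out = [];
  for (let i = 0; i < outH; i++) {
    const row = [];
    for (let j = 0; j < outW; j++) {
      let sum = 0;
      for (let fi = 0; fi < filter.length; fi++) {
        for (let fj = 0; fj < filter[0].length; fj++) {
          sum += input[i + fi][j + fj] * filter[fi][fj];
        }
      }
      row.push(sum);
    }
    out.push(row);
  }
  return out;
}

// A 3x3 binary image convolved with a 2x2 binary filter
const image = [
  [1, 1, 0],
  [0, 1, 1],
  [0, 0, 1]
];
const filter = [
  [1, 0],
  [0, 1]
];
console.log(conv2d(image, filter)); // [[2, 2], [0, 2]]
```

Each number in the output channel measures how strongly the filter pattern matched that region of the image.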

Now, these output channels are downsampled through max pooling and given as input to the next layer. This process repeats, and the final output is given as input to the fully connected layer.

The softmax activation function creates a probability distribution over the 10 classes, and the class with the maximum probability is picked.
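This step can be sketched in plain JavaScript: exponentiate the raw scores, normalize them into probabilities, and pick the index of the largest one (the score values below are made up for illustration):

```javascript
// Softmax: turn raw scores (logits) into a probability distribution
function softmax(logits) {
  const max = Math.max(...logits); // subtract the max for numerical stability
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Hypothetical raw scores for the 10 digit classes
const scores = [1, 2, 0, 0, 0, 0, 0, 8, 0, 0];
const probs = softmax(scores);            // probabilities summing to 1
const predicted = probs.indexOf(Math.max(...probs));
console.log(predicted); // 7 -- the class with the highest probability
```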

The loss function calculates the error between the probability distribution generated by our output layer and the expected distribution. Based on this error, the model internally updates its weights to minimize the loss.
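The categorical cross-entropy loss used when compiling the model measures exactly this error. For a one-hot label it reduces to the negative log of the probability assigned to the correct class, as this sketch shows (the example distributions are made up):

```javascript
// Categorical cross-entropy between a one-hot label and a predicted
// distribution: -sum(label[i] * log(pred[i])), which for a one-hot
// label is just -log(probability of the true class)
function crossEntropy(label, pred) {
  return -label.reduce((sum, y, i) => sum + y * Math.log(pred[i]), 0);
}

const label = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]; // true class is 7

const confident = Array(10).fill(0.01);       // 0.91 on the true class
confident[7] = 0.91;
const unsure = Array(10).fill(0.1);           // uniform guess

// A confident, correct prediction yields a much smaller loss
console.log(crossEntropy(label, confident));  // ~0.094
console.log(crossEntropy(label, unsure));     // ~2.303
```

Minimizing this quantity during training pushes the predicted distribution toward the one-hot label.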

I would like to thank the references below, which helped me understand how handwritten images are classified using TensorFlow.js.

**References:**

https://www.youtube.com/watch?v=HMcx-zY8JSg&index=4&list=PL9Hr9sNUjfsmEu1ZniY0XpHSzl5uihcX