For Dummies — Internal working of Neural networks

siddharth joshi
8 min read · Jun 24, 2020


In my last post, I covered the key points for understanding neural networks, based on what I learned in the Neural Networks and Deep Learning Coursera course taught by Andrew Ng.

In this post, the aim is to go deeper and understand the various internal components of a neural network. The post assumes the neural network is used for supervised learning.

The post aims to:

  1. Help non-technical folks understand neural networks
  2. Explain the internal components of neural networks in simple language

A quick recap of the basics of Neural networks

A neural network is a technique used in Machine learning. In neural networks, multiple layers (stacked together) are used to process the input and understand what the input means. The presence of multiple layers helps in learning the underlying pattern in the training dataset better.

The layers are nothing but mathematical formulae of different types. Hence, a trained neural network is simply a neural network whose variables (the values used in those mathematical formulae) have been finalized. (To understand the basics of neural networks, do check out my last post.)

With that summary in mind, let us proceed.

Before we jump into the details of the neural networks, let us define the goal behind developing a Neural network.

The goal of developing a Neural network

The goal is to have an algorithm that processes the training set iteratively and learns the pattern in it, that is, how each training input relates to its expected output.

This neural network can then be saved and used in further tasks like Prediction.

E.g., to develop a neural network that identifies a cat as a cat in pictures, we might train it on, say, 1,000 labeled pictures, some of cats and some of other things. During training, the neural network tries to learn the pattern and settles on values for the various algorithm variables. If this trained model is then given a new picture of a cat, it should output 1 (meaning the input picture is of a cat).

Next, let us look at the internals of a neural network.

What are the components of a neural network

The following are the internal components of a Neural network

  1. Layers
  2. Size of the layer
  3. Training dataset
  4. Layer variables/Algorithm variables/Neural network variables
  5. Learning process
  6. Activation function
  7. Forward propagation
  8. Cost function
  9. Gradient descent
  10. Backward propagation
  11. Update Algorithm variables
  12. Static variables
      a. Number of iterations
      b. Learning rate

  • What is a layer in a neural network
  1. A neural network consists of more than one layer.
  2. Each neural network layer is a collection of one or more neurons.
  3. A neuron can be defined as:

Neuron = Processing formula (e.g., linear regression) + Neuron activation function

  4. The layers between the input and the output are called hidden layers.
  5. By convention, the input layer is not counted when stating how many layers a network has; a network with one hidden layer and one output layer is called a two-layer network.
  6. Keep reading to learn about activation functions.
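To make the neuron formula concrete, here is a minimal sketch of a single neuron in plain Python. It assumes the processing formula is the linear one, z = w·x + b, and that the activation function is the sigmoid; the input and variable values are made up for illustration.

```python
import math

def sigmoid(z):
    """Assumed activation: squashes any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, b):
    """One neuron = linear processing formula + activation function.

    x: list of input values, w: list of weights (same length), b: bias.
    """
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # processing step: z = w.x + b
    return sigmoid(z)                              # activation step

# Hypothetical inputs and variable values, chosen only for illustration.
print(neuron([1.0, 0.0, 2.0], w=[0.5, -1.0, 0.25], b=-1.0))  # z = 0.0, so this prints 0.5
```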
  • Size of the layer: the number of neurons in the layer

A neural network layer can have more than one neuron. The number of neurons (also called units) in a layer is known as the size of the layer. The size is chosen by whoever designs the network; more complex input data generally calls for larger layers.
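The layer sizes decide how many algorithm variables each layer has. Below is a minimal sketch, assuming each layer's W has one row per neuron and one column per input from the previous layer, with both W and b set to small random values. The helper name `init_layer_variables` and the 6-4-1 network are hypothetical.

```python
import random

def init_layer_variables(layer_sizes, seed=0):
    """Randomly initialize W and b for every layer.

    layer_sizes[0] is the number of input features; each later entry is the
    size (number of neurons) of one layer. Layer l gets a W with one row per
    neuron and one column per input from the previous layer, and one b per neuron.
    """
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = [[rng.uniform(-0.01, 0.01) for _ in range(n_in)] for _ in range(n_out)]
        b = [rng.uniform(-0.01, 0.01) for _ in range(n_out)]
        params.append((W, b))
    return params

# Hypothetical network: 6 input features, a hidden layer of 4 neurons, an output layer of 1 neuron.
params = init_layer_variables([6, 4, 1])
for l, (W, b) in enumerate(params, start=1):
    print(f"layer {l}: W is {len(W)}x{len(W[0])}, b has {len(b)} value(s)")
# layer 1: W is 4x6, b has 4 value(s)
# layer 2: W is 1x4, b has 1 value(s)
```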

  • Input to a neural network: the training dataset

The input to a neural network is a dataset called the Training set. The training set has the input data and also the expected output or meaning of the data.

For example, a training set of cat images will have each image's RGB pixel profile as a row, together with a label saying whether the image is of a cat (1) or not (0). A sample dataset is given below:

Pixel profile          Cat
[[1,0,0],[0,1,1]]      1
[[1,0,0],[1,1,1]]      0
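Here is a minimal sketch of how the sample table could be held in code, assuming each pixel profile is flattened into one row of numbers (a common way to feed images to a simple network). The names `training_set`, `flatten`, `X` and `Y` are my own illustrative choices.

```python
# The two sample rows from the table: (pixel profile, cat label).
training_set = [
    ([[1, 0, 0], [0, 1, 1]], 1),
    ([[1, 0, 0], [1, 1, 1]], 0),
]

def flatten(pixel_profile):
    """Turn a 2-D pixel grid into one flat row (one feature per pixel)."""
    return [value for row in pixel_profile for value in row]

X = [flatten(pixels) for pixels, _ in training_set]  # one row per training example
Y = [label for _, label in training_set]              # 1 = cat, 0 = not cat

print(X)  # [[1, 0, 0, 0, 1, 1], [1, 0, 0, 1, 1, 1]]
print(Y)  # [1, 0]
```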

  • Algorithm variables

The algorithm variables are the variables used in the mathematical formula of each neuron in the layers.

E.g., if the neuron uses a linear formula, as in linear regression:

$Z = W X^{T} + b$

then W and b are the layer variables, X is the input to the layer (one training example per row, hence the transpose), and Z is the output of this linear step.

Z is then passed as input to the activation function of the layer. The output of the activation function is the input to the next layer.

W and b are initialized randomly for the first run, for all the layers.

Before the next iteration, the values of W and b are updated using gradient descent. Keep reading to understand what gradient descent is.
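The formula above can be sketched for one layer and a whole batch of examples as follows. This is a minimal plain-Python version, assuming sigmoid as the layer's activation and X stored with one training example per row; the W and b values and the helper name `layer_forward` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(X, W, b):
    """Compute Z = W . X^T + b and A = sigmoid(Z) for a whole batch.

    X: one row per training example (m x n_in)
    W: one row of weights per neuron (n_out x n_in)
    b: one bias per neuron (n_out)
    Returns Z and A, each n_out x m (one column per training example).
    """
    Z = [[sum(w * x for w, x in zip(W_row, x_row)) + b_j for x_row in X]
         for W_row, b_j in zip(W, b)]
    A = [[sigmoid(z) for z in Z_row] for Z_row in Z]  # activation, element by element
    return Z, A

X = [[1, 0, 0, 0, 1, 1], [1, 0, 0, 1, 1, 1]]  # the two sample examples, flattened
W = [[0.1] * 6, [-0.1] * 6]                   # hypothetical values for a 2-neuron layer
b = [0.0, 0.5]
Z, A = layer_forward(X, W, b)
print(Z)  # roughly [[0.3, 0.4], [0.2, 0.1]]
print(A)  # every value lies between 0 and 1
```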

  • How a neural network learns
  1. Iteratively process the training dataset
  2. Compare the neural network output to the expected output as per the training dataset.
  3. Find how far the neural network output is from the desired output
  4. And fine-tune the neural network variables to minimize the gap.
  • What is an activation function

The output of a given layer depends upon the activation function used in the layer.

An activation function is a mathematical formula that acts as a gate.

Based on its input, the activation function calculates a value that is passed on to the next layer. It decides how strongly each neuron's result is passed on, and, because it is non-linear, it lets the network learn patterns that a purely linear formula cannot.

Some common activation functions are sigmoid, ReLU, and tanh.
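Here is a minimal sketch of the three activation functions named above, written with Python's standard `math` module so you can try a few inputs yourself.

```python
import math

def sigmoid(z):
    """Maps any number into (0, 1); handy for yes/no outputs such as cat or not cat."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Passes positive values through unchanged and turns negative values into 0."""
    return max(0.0, z)

def tanh(z):
    """Maps any number into (-1, 1), centred on 0."""
    return math.tanh(z)

for z in (-2.0, 0.0, 2.0):
    print(z, round(sigmoid(z), 3), relu(z), round(tanh(z), 3))
```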

  • What is forward propagation

In forward propagation, the training data flows through the layers in order: each layer processes its input and passes its output on to the next layer. The output of the last layer is then compared with the desired output from the training dataset. This comparison is called calculating the cost.
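Below is a minimal sketch of forward propagation through stacked layers, shown for a single training example to keep it short. It assumes a hypothetical 2-layer network (3 inputs, 2 tanh hidden neurons, 1 sigmoid output neuron) with made-up W and b values; the cache of each layer's input and Z is what backward propagation would reuse.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward_propagation(x, params, activations):
    """Pass one training example through every layer in order.

    params: list of (W, b) per layer; W has one row of weights per neuron.
    activations: one activation function per layer.
    Returns the final output and a cache of (layer input, z) per layer.
    """
    a = x
    cache = []
    for (W, b), g in zip(params, activations):
        z = [sum(w * ai for w, ai in zip(W_row, a)) + b_j for W_row, b_j in zip(W, b)]
        cache.append((a, z))     # keep what went into and came out of the linear step
        a = [g(zj) for zj in z]  # this layer's output becomes the next layer's input
    return a, cache

# Hypothetical 2-layer network: 3 inputs -> 2 hidden neurons (tanh) -> 1 output neuron (sigmoid).
params = [
    ([[0.2, -0.1, 0.0], [0.0, 0.3, -0.2]], [0.0, 0.1]),
    ([[0.5, -0.5]], [0.0]),
]
output, cache = forward_propagation([1.0, 0.0, 1.0], params, [math.tanh, sigmoid])
print(output)  # a single value between 0 and 1
```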

  • What is backward propagation

In backward propagation, the calculation runs in the backward direction, from the last layer to the first.

In backward propagation, we calculate how much each layer's variables (W and b) contributed to the gap between the network's output and the desired output. Technically, this is the derivative (gradient) of the cost with respect to each variable.

It is called backward propagation because the calculation starts at the network's output and moves back layer by layer: the result for the last layer is used to compute the result for the layer before it, and so on until we reach the first layer.

The gradients found this way for each layer are stored and used to update the variables.

For the calculation, we use derivatives from calculus (specifically the chain rule). You get to know the maths in detail in the course.
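To make the backward direction concrete, here is a minimal sketch for a tiny hypothetical network with one tanh neuron in layer 1 and one sigmoid neuron in layer 2. It assumes the binary cross-entropy loss (see the cost function section) and uses scalar values throughout. The derivatives are computed from the output layer back to layer 1 with the chain rule, then checked against a numerical estimate obtained by nudging each variable slightly. All values and function names are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(a, y):
    """Assumed loss: binary cross-entropy for one example."""
    return -(y * math.log(a) + (1 - y) * math.log(1 - a))

def forward(x, w1, b1, w2, b2):
    a1 = math.tanh(w1 * x + b1)   # layer 1 (hidden): one tanh neuron
    a2 = sigmoid(w2 * a1 + b2)    # layer 2 (output): one sigmoid neuron
    return a1, a2

def backward(x, y, w1, b1, w2, b2):
    """Derivative of the loss w.r.t. each variable, computed last layer first."""
    a1, a2 = forward(x, w1, b1, w2, b2)
    dz2 = a2 - y                  # d(loss)/d(z2) for sigmoid + cross-entropy
    dw2, db2 = dz2 * a1, dz2
    da1 = dz2 * w2                # pass the error back into layer 1
    dz1 = da1 * (1 - a1 ** 2)     # derivative of tanh is 1 - tanh^2
    dw1, db1 = dz1 * x, dz1
    return {"w1": dw1, "b1": db1, "w2": dw2, "b2": db2}

# Hypothetical values for one training example and the four variables.
x, y = 1.5, 1
v = {"w1": 0.4, "b1": -0.2, "w2": 0.7, "b2": 0.1}
grads = backward(x, y, **v)

# Sanity check: nudge each variable a little and measure how much the loss really changes.
eps = 1e-6
numeric_grads = {}
for name in v:
    up, down = dict(v), dict(v)
    up[name] += eps
    down[name] -= eps
    numeric_grads[name] = (loss(forward(x, **up)[1], y) - loss(forward(x, **down)[1], y)) / (2 * eps)
    print(name, round(grads[name], 6), round(numeric_grads[name], 6))
```

If the two printed columns agree, the backward calculation is consistent with the forward one.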

  • What is a Cost function

The difference between the algorithm output and the desired output (as per the training dataset) for a single training example is measured by the loss function.

When the loss is summed (in practice, usually averaged) over the entire training set, the result is called the cost function.

In simple words, the cost function tells you how far the output of the neural network is from the desired output in a given iteration, considering all training examples.

You want the cost to be minimal at the end of the training of the model.
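Here is a minimal sketch of a loss and a cost, assuming the binary cross-entropy loss that is commonly used for yes/no tasks such as cat versus not cat. It averages the loss over the examples rather than summing it; that only rescales the cost. The predicted outputs below are made up.

```python
import math

def loss(a, y):
    """Binary cross-entropy: loss for one example with predicted output a and label y."""
    return -(y * math.log(a) + (1 - y) * math.log(1 - a))

def cost(outputs, labels):
    """Average the loss over all training examples."""
    return sum(loss(a, y) for a, y in zip(outputs, labels)) / len(labels)

print(cost([0.9, 0.2], [1, 0]))  # predictions close to the labels: small cost
print(cost([0.1, 0.8], [1, 0]))  # predictions far from the labels: large cost
```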

  • What is gradient descent

Gradient descent is the name of the approach taken to update the algorithm variables.

After completing backward propagation, we know, for each layer, how the cost changes when each algorithm variable changes (the gradient).

Using this information, we want to change the algorithm variables so that the cost is reduced. This approach of changing the algorithm variable values using the gradients from backward propagation is known as gradient descent.

Graphically, it is like walking down a slope from a high position (a large difference between the algorithm output and the actual output) towards the lowest point (the smallest difference the network can reach).

How fast or slow we move down this slope is controlled by a static variable called the learning rate.
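The idea can be seen on a single variable. This minimal sketch assumes a made-up cost, (w - 3)^2, whose lowest point is at w = 3 and whose slope is 2 * (w - 3); gradient descent repeatedly steps w against the slope.

```python
def gradient_descent(slope, w, learning_rate, iterations):
    """Repeatedly step w downhill: w = w - learning_rate * slope(w)."""
    for _ in range(iterations):
        w = w - learning_rate * slope(w)
    return w

def slope(w):
    """Slope of the made-up cost (w - 3)^2."""
    return 2 * (w - 3)

print(gradient_descent(slope, w=0.0, learning_rate=0.1, iterations=100))  # very close to 3
```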

  • Update Algorithm variables

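The gradient from backward propagation is multiplied by the learning rate, and the result is subtracted from the current value of the variable. This gives the new algorithm variable values to be used in the next iteration.

The formula, for each layer l, is:

$W^{[l]} = W^{[l]} - \alpha \, dW^{[l]}$ and $b^{[l]} = b^{[l]} - \alpha \, db^{[l]}$

where $\alpha$ is the learning rate and $dW^{[l]}$, $db^{[l]}$ are the gradients computed for layer l in backward propagation. In words: new variable value for the given layer = old variable value for the given layer - learning rate * gradient from backward propagation for the given layer.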
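Here is a minimal sketch of this update applied to every layer, assuming W is stored as a list of rows and b as a flat list. The helper name `update_variables` and the numbers are hypothetical.

```python
def update_variables(params, grads, learning_rate):
    """Apply W = W - learning_rate * dW and b = b - learning_rate * db to every layer.

    params and grads hold one (W, b) / (dW, db) pair per layer;
    W and dW are lists of rows, b and db are flat lists.
    """
    new_params = []
    for (W, b), (dW, db) in zip(params, grads):
        W_new = [[w - learning_rate * dw for w, dw in zip(W_row, dW_row)]
                 for W_row, dW_row in zip(W, dW)]
        b_new = [bj - learning_rate * dbj for bj, dbj in zip(b, db)]
        new_params.append((W_new, b_new))
    return new_params

params = [([[0.5, -0.2]], [0.1])]  # hypothetical one-layer network: 1 neuron, 2 inputs
grads = [([[0.4, -0.6]], [0.2])]   # hypothetical gradients from backward propagation
print(update_variables(params, grads, learning_rate=0.5))  # W close to [[0.3, 0.1]], b close to [0.0]
```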

  • What are the static variables (usually called hyperparameters) that control the neural network
  1. The first static variable is the learning rate. It is set by the data scientist. The learning rate is used in gradient descent to decide how big each update to the algorithm variables (in this post, W and b) should be. If the learning rate is too high, the updates can overshoot the lowest point, so the cost jumps around or even grows instead of settling. If the learning rate is too low, learning becomes very slow, and the network may need an impractically large number of iterations to finish learning.

The learning rate decides the size of the jump, i.e., the difference between the values of a variable in two consecutive iterations.

2. A single iteration of forward propagation and backward propagation is rarely enough to train the neural network well. Hence, these two steps (together with the variable update) are repeated many times. The number of repetitions is known as the number of iterations and is another static variable set by the data scientist.

Training the neural network is all about updating the algorithm variables (W and b) iteratively until the output of the trained model is satisfactory (the gap between the expected output and the algorithm output is minimal).
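The effect of the two static variables can be seen on the same kind of made-up cost, (w - 3)^2, starting from w = 0. This is a minimal sketch, and the learning rates and iteration counts are arbitrary choices for illustration.

```python
def final_cost(learning_rate, iterations, w=0.0):
    """Run gradient descent on the made-up cost (w - 3)^2 and return the cost it ends at."""
    for _ in range(iterations):
        w = w - learning_rate * 2 * (w - 3)  # slope of (w - 3)^2 is 2 * (w - 3)
    return (w - 3) ** 2

for lr in (0.001, 0.1, 1.1):
    print(f"learning rate {lr}: cost after 50 iterations = {final_cost(lr, 50):.4g}")

# A too-low learning rate can still get there, but it needs many more iterations.
print(f"learning rate 0.001: cost after 5000 iterations = {final_cost(0.001, 5000):.4g}")
```

With 0.001 the cost barely moves in 50 iterations, with 0.1 it becomes tiny, and with 1.1 every step overshoots so the cost grows.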

Putting it all together — The flow of the Neural Network training

The flow of training is as follows:

  1. If it is the first iteration, initialize the algorithm variables (W and b) of all the layers randomly.
  2. Run forward propagation:
      • The neural network processes the training set.
      • Each layer processes its input, applies its activation function, and passes its output on to the next layer.
      • The intermediate values of each layer (such as Z and the activation output) are stored in memory (a cache) for use in backward propagation.
  3. Calculate the cost as the sum (usually the average) of the loss over all the training examples. This is required to know whether the algorithm is learning anything.
  4. Run backward propagation to compute, for each layer, how the cost changes with respect to that layer's variables (the gradients). The processing starts at the last layer and goes back to the first layer.
  5. Run gradient descent: update the algorithm variables using the learning rate and the gradients from backward propagation. It is called gradient descent because each variable is moved a small step in the direction opposite to its gradient, i.e., downhill on the cost.
  6. Repeat steps 2 to 5 with the updated variable values. If the algorithm is learning, the cost should keep decreasing from one iteration to the next.
  7. After the set number of iterations is completed, check the accuracy:
      • Accuracy is calculated by comparing the output of the algorithm with the actual output for a set of test examples.
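To tie the steps together, here is a minimal end-to-end sketch in plain Python, trained on the two sample rows from the training-set table. Everything specific here is my assumption for illustration, not a recipe from the course: a hidden layer of 3 tanh neurons, one sigmoid output neuron, the binary cross-entropy loss, a learning rate of 0.5, 1,000 iterations, and the hypothetical helper names `hidden_layer`, `train` and `predict`. The comments mark which step of the list above each part corresponds to.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hidden_layer(W1, b1, x):
    """Forward step for the hidden layer: tanh(W1 . x + b1), one value per hidden neuron."""
    return [math.tanh(sum(w * xi for w, xi in zip(W_row, x)) + bj) for W_row, bj in zip(W1, b1)]

def train(X, Y, n_hidden=3, learning_rate=0.5, iterations=1000, seed=1):
    """Train a hypothetical 2-layer network (tanh hidden layer, one sigmoid output neuron)."""
    rng = random.Random(seed)
    n_in, m = len(X[0]), len(X)
    # Step 1: initialize the algorithm variables randomly (first iteration only).
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    costs = []
    for _ in range(iterations):  # Step 6: repeat for the chosen number of iterations
        dW1 = [[0.0] * n_in for _ in range(n_hidden)]
        db1 = [0.0] * n_hidden
        dW2 = [0.0] * n_hidden
        db2 = 0.0
        total_loss = 0.0
        for x, y in zip(X, Y):
            # Step 2: forward propagation through both layers.
            a1 = hidden_layer(W1, b1, x)
            a2 = sigmoid(sum(w * a for w, a in zip(W2, a1)) + b2)
            # Step 3: add this example's cross-entropy loss to the running total.
            total_loss += -(y * math.log(a2) + (1 - y) * math.log(1 - a2))
            # Step 4: backward propagation, output layer first, then the hidden layer.
            dz2 = a2 - y
            db2 += dz2
            for j in range(n_hidden):
                dW2[j] += dz2 * a1[j]
                dz1 = dz2 * W2[j] * (1 - a1[j] ** 2)
                db1[j] += dz1
                for i in range(n_in):
                    dW1[j][i] += dz1 * x[i]
        costs.append(total_loss / m)  # cost = average loss over the training set
        # Step 5: gradient descent with the gradients averaged over all examples.
        for j in range(n_hidden):
            W2[j] -= learning_rate * dW2[j] / m
            b1[j] -= learning_rate * db1[j] / m
            for i in range(n_in):
                W1[j][i] -= learning_rate * dW1[j][i] / m
        b2 -= learning_rate * db2 / m
    return (W1, b1, W2, b2), costs

def predict(params, x):
    """Return 1 (cat) if the network's output is above 0.5, else 0."""
    W1, b1, W2, b2 = params
    a1 = hidden_layer(W1, b1, x)
    return 1 if sigmoid(sum(w * a for w, a in zip(W2, a1)) + b2) > 0.5 else 0

# The two sample rows from the training-set table, flattened into one row each.
X = [[1, 0, 0, 0, 1, 1], [1, 0, 0, 1, 1, 1]]
Y = [1, 0]
params, costs = train(X, Y)
print("cost at first and last iteration:", round(costs[0], 3), round(costs[-1], 3))
# Step 7: check accuracy (here on the training rows, since the sample has no test rows).
predictions = [predict(params, x) for x in X]
print("accuracy:", sum(p == y for p, y in zip(predictions, Y)) / len(Y))
```

Two training rows are far too few for real use; the point is only to show the mechanics. A real project would typically use NumPy or a deep learning library instead of Python loops, and would measure accuracy on separate test examples.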

https://youtu.be/lTjDdlR4yZc

At the end of the iterations, we expect to have a model that can predict the output for test data satisfactorily. This is called a trained model.

The trained model can be saved and used for prediction in later jobs.

I hope reading this post has helped you understand the internal working of neural networks. Do share your feedback in the comments…

Happy learning!!!


Written by siddharth joshi

Product Manager, Ex Edtech Startup founder, Interested in the intersection of Technology, Management and Human Psychology
