Neural Network From Scratch with NumPy and MNIST


This article was first published by IBM Developer and authored by Casper Hansen.

Creating complex neural networks with different architectures in Python should be a standard practice for any Machine Learning Engineer or Data Scientist. But a genuine understanding of how a neural network works is equally valuable. That is what this article aims to cover: the very fundamentals of how we can build neural networks, without the help of the frameworks that make it easy for us.

Please open the notebook from GitHub and run the code alongside reading the explanations in this article.

Prerequisite Knowledge

In this specific article, we explore how to make a basic deep neural network, by implementing the forward and backward pass (backpropagation). This requires some specific knowledge on the functionality of neural networks – which I went over in this complete introduction to neural networks.

Neural Networks: Feedforward and Backpropagation Explained
What are neural networks? Developers should understand backpropagation to figure out why their code sometimes does not work. A visual and down-to-earth explanation of the math of backpropagation.

It's also important to know the fundamentals of linear algebra, to understand why we do certain operations in this article. I have a series of articles here, where you can learn some of the fundamentals. My best recommendation, though, is to watch 3Blue1Brown's brilliant series Essence of linear algebra.

Essence of linear algebra - YouTube
A geometric understanding of matrices, determinants, eigen-stuffs and more.

We are building a basic deep neural network with 4 layers in total: 1 input layer, 2 hidden layers and 1 output layer. All layers will be fully connected.
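The shape of such a fully connected network can be sketched as a list of weight matrices, one per connection between consecutive layers. The hidden-layer sizes below (128 and 64) are illustrative choices, not values from the article:

```python
import numpy as np

# Layer sizes: 784 inputs (28x28 pixels), two hidden layers
# (128 and 64 are illustrative sizes), 10 output classes.
layer_sizes = [784, 128, 64, 10]

rng = np.random.default_rng(0)
# One weight matrix per pair of consecutive layers.
weights = [rng.standard_normal((n_in, n_out)) * 0.01
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

for w in weights:
    print(w.shape)  # (784, 128), (128, 64), (64, 10)
```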

We are making this neural network because we are trying to classify digits from 0 to 9, using a dataset called MNIST, which consists of 70,000 images that are 28 by 28 pixels. The dataset contains one label for each image, specifying the digit we are seeing in each image. We say that there are 10 classes, since we have 10 labels.
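Since there are 10 classes, each label is typically converted to a one-hot vector before training. A minimal NumPy sketch (the label values are made up for illustration):

```python
import numpy as np

labels = np.array([5, 0, 9])  # illustrative digit labels
num_classes = 10

# One row per label, with a 1 in the column matching the digit.
one_hot = np.eye(num_classes)[labels]
print(one_hot.shape)  # (3, 10)
```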

[Figure: 10 examples of the digits from the MNIST dataset, scaled up 2x.]

For training the neural network, we will use stochastic gradient descent, which means we put one image through the neural network at a time.
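One-sample stochastic gradient descent can be sketched as a loop that updates the weights after every single example. Here it is shown on a toy linear model with squared error, not the network itself; all numbers are illustrative:

```python
import numpy as np

# Toy data: 5 samples with 3 features, targets from a known weight vector.
X = np.array([[1., 0., 0.],
              [0., 1., 0.],
              [0., 0., 1.],
              [1., 1., 0.],
              [0., 1., 1.]])
y = X @ np.array([1.0, -2.0, 0.5])

w = np.zeros(3)
lr = 0.1
for _ in range(200):              # epochs
    for xi, yi in zip(X, y):      # one sample at a time
        grad = 2 * (xi @ w - yi) * xi  # gradient of squared error
        w -= lr * grad                 # update immediately

print(w)  # close to [1.0, -2.0, 0.5]
```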

Let's try to define the layers in an exact way. To be able to classify digits, we must end up with the probabilities of an image belonging to a certain class, after running the neural network, because then we can quantify how well our neural network performed.

  1. Input layer: In this layer, we input our dataset, consisting of 28x28 images. We flatten these images into one array with $28 \times 28 = 784$ elements.
  2. Hidden layers: Two fully connected hidden layers transform the flattened input; their sizes are hyperparameters we choose.
  3. Output layer: A fully connected layer with 10 units, one per class, whose activations we turn into the class probabilities described above.
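Turning the output layer's raw scores into class probabilities is typically done with a softmax function; a minimal sketch (the scores are illustrative):

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # illustrative output-layer scores
probs = softmax(scores)
print(probs.sum())  # probabilities sum to 1
```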
