Backpropagation Neural Networks for MNIST Classification

Description

MNIST stands for Modified National Institute of Standards and Technology. It is a database of handwritten digits that is widely used for training and testing image processing systems and in machine learning research LeCun et al. [1998]. It contains 60,000 training images and 10,000 testing images of handwritten digits LeCun et al. [1998]. In 2017, an extended database (EMNIST) was published with 240,000 training images and 40,000 testing images. Because MNIST has become a standard dataset for handwritten digits, similar databases have since been released for digits in other languages and scripts. The dataset is considered a benchmark for neural networks worldwide. It suits people who want to try machine learning techniques and neural network methods on real-world data while spending minimal effort on preprocessing and formatting LeCun et al. [1998]. In this work, we used the MNIST dataset to train networks built from fully connected (dense) layers.

In this project, we examined backpropagation neural networks for MNIST classification using MATLAB. We focused on two multilayer perceptron (MLP) families:

  1. Sigmoid networks trained with mean squared error (MSE), representative of earlier BPNN designs (MLP MNIST P2.m).
  2. ReLU networks with softmax output and cross-entropy loss, reflecting contemporary practice (MLP2 MNIST P2.m).
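The practical difference between the two output/loss pairings shows up in the error signal that backpropagation sends back from the output layer. The following is a minimal Python sketch of that signal, not the project's MATLAB code. The function names, the three-unit output layer, the example logits, and the 0.5 scaling of the squared error are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    # Subtract the maximum logit for numerical stability.
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid_mse_delta(zs, target):
    """Output-layer error signal for sigmoid units with loss 0.5 * sum((a - y)^2).

    The factor a * (1 - a) is the sigmoid derivative; it shrinks toward
    zero whenever a unit saturates near 0 or 1.
    """
    acts = [sigmoid(z) for z in zs]
    return [(a - y) * a * (1.0 - a) for a, y in zip(acts, target)]

def softmax_ce_delta(zs, target):
    """Output-layer error signal for softmax with cross-entropy loss: p - y."""
    probs = softmax(zs)
    return [p - y for p, y in zip(probs, target)]

# Hypothetical, confidently wrong prediction: the true class is index 0,
# but the network assigns it a very negative pre-activation.
logits = [-8.0, 8.0, 0.0]
one_hot = [1.0, 0.0, 0.0]

mse_delta = sigmoid_mse_delta(logits, one_hot)
ce_delta = softmax_ce_delta(logits, one_hot)
```

Under these assumptions, the sigmoid/MSE signal for the true class is tiny because the unit is saturated, while the softmax/cross-entropy signal stays close to -1. This is one common explanation for why the first family can train slowly, though the sketch does not reproduce the project's experiments.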

For each family, we systematically studied the effects of momentum, network depth, and weight initialization.

Conclusions

This project evaluated backpropagation MLPs for MNIST in MATLAB, contrasting sigmoid/MSE networks with ReLU/softmax models while probing momentum, depth, and initialization. Three conclusions stand out:

  1. Momentum aids convergence. Classical momentum consistently accelerates and stabilizes training for ReLU networks. It also improves accuracy and lowers the final loss.
  2. ReLU dominates sigmoid. Non-saturating activations avoid vanishing gradients and deliver better performance (∼98% on MNIST), whereas sigmoid networks failed to learn in our experiments.
  3. Depth and initialization interact with the training budget. Additional depth modestly boosts generalization but more markedly reduces loss. The benefits of He initialization become evident with longer training on the full dataset.
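For readers unfamiliar with the two techniques named above, the following is a minimal Python sketch of the classical momentum update and of He initialization. It is an illustration under stated assumptions, not the project's MATLAB implementation. The learning rate, momentum coefficient, step count, hidden-layer width of 128, and the toy quadratic loss are hypothetical choices. The input size of 784 corresponds to flattened 28x28 MNIST images.

```python
import math
import random

def he_init(fan_in, fan_out, rng):
    """He (Kaiming) normal initialization: weights drawn from N(0, 2 / fan_in)."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]

def momentum_step(w, v, grad, lr, mu):
    """Classical momentum: v <- mu * v - lr * grad, then w <- w + v."""
    v_new = mu * v - lr * grad
    return w + v_new, v_new

def minimize(grad_fn, w0, lr, mu, steps):
    """Run `steps` momentum updates on a scalar parameter, starting from w0."""
    w, v = w0, 0.0
    for _ in range(steps):
        w, v = momentum_step(w, v, grad_fn(w), lr, mu)
    return w

def quadratic_grad(w):
    # Gradient of the toy loss L(w) = 0.5 * w^2, whose minimum is at w = 0.
    return w

# Same learning rate and step count, with and without momentum (assumed values).
plain = minimize(quadratic_grad, 1.0, lr=0.01, mu=0.0, steps=100)
with_momentum = minimize(quadratic_grad, 1.0, lr=0.01, mu=0.9, steps=100)

# He-initialized weights for a hypothetical 784 -> 128 ReLU layer.
weights = he_init(784, 128, random.Random(0))
```

On this toy problem the momentum run ends much closer to the minimum than plain gradient descent for the same number of steps. That mirrors the acceleration described in the first conclusion, but it says nothing about the MNIST results themselves.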

Authors

DOI: 10.5281/zenodo.20682487

Publication Date: 2026-03-14
