Backpropagation Neural Networks for MNIST Classification

Mouawad, Meshaal


Description

MNIST stands for Modified National Institute of Standards and Technology. MNIST is a database of handwritten digits that is widely used for training and testing image processing systems and in machine learning research LeCun et al. [1998]. It contains 60,000 training images and 10,000 testing images of handwritten digits LeCun et al. [1998]. In 2017, an extended database was published with 240,000 training images and 40,000 testing images. Because MNIST is a standard training dataset for handwritten digits in English, similar databases have since been released for digits in other languages. The dataset is regarded as a benchmark for neural networks worldwide. The MNIST database suits anyone who wants to try machine learning techniques and neural network methods on real-world data while spending minimal effort on preprocessing and formatting LeCun et al. [1998]. In this project, we used the MNIST dataset and implemented neural networks with fully connected (dense) layers.
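As an illustration of how little preprocessing MNIST needs, the sketch below assumes each image has already been read as a flat list of 784 unsigned 8-bit pixel values (MNIST images are 28 × 28 grayscale) and each label as an integer from 0 to 9. It scales pixels to [0, 1] and one-hot encodes the label, a common input/target format for an MLP. The helper names `scale_pixels` and `one_hot` are hypothetical; the project's own MATLAB preprocessing is not described here and may differ.

```python
# Hypothetical helpers: a minimal MNIST preprocessing sketch, not the
# project's MATLAB code.

def scale_pixels(pixels):
    """Map raw 8-bit intensities (0 to 255) to floats in [0, 1]."""
    return [p / 255.0 for p in pixels]


def one_hot(label, num_classes=10):
    """Encode a digit label 0 to 9 as a one-hot target vector."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} out of range")
    target = [0.0] * num_classes
    target[label] = 1.0
    return target


# Toy example: a blank 28x28 image with one fully bright pixel, labelled 7.
image = [0] * (28 * 28)
image[0] = 255
x = scale_pixels(image)
y = one_hot(7)
print(len(x), max(x), y.index(1.0))  # 784 1.0 7
```

Keeping targets one-hot lets the same label format serve both an MSE loss and a cross-entropy loss.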

In this project, we examined backpropagation neural networks for MNIST classification using MATLAB. We focused on two multilayer perceptron (MLP) families:

Sigmoid networks trained with mean squared error (MSE), representative of earlier backpropagation neural network (BPNN) designs (MLP MNIST P2.m).
ReLU networks with softmax output and cross-entropy loss, reflecting contemporary practice (MLP2 MNIST P2.m).
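To make the contrast between the two families concrete, the sketch below compares their output layers and losses on a single example in plain Python. The function names are hypothetical, and the MSE convention (½ Σ (a − y)², no averaging over output units) is an assumption, since the scripts' exact scaling is not stated. For sigmoid/MSE, the error signal at each output unit is (a − y)·a(1 − a), which shrinks when the unit saturates; for softmax with cross-entropy it simplifies to p − y.

```python
import math

# Hypothetical sketch; assumed MSE convention: 0.5 * sum of squared errors.


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(z):
    # Subtract the max logit for numerical stability; the result is unchanged.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid_mse(z, y):
    """Loss 0.5*sum((a - y)^2) and its gradient w.r.t. pre-activations z."""
    a = [sigmoid(v) for v in z]
    loss = 0.5 * sum((ai - yi) ** 2 for ai, yi in zip(a, y))
    delta = [(ai - yi) * ai * (1.0 - ai) for ai, yi in zip(a, y)]
    return loss, delta


def softmax_cross_entropy(z, y):
    """Loss -sum(y*log p) and its gradient w.r.t. logits z, which is p - y."""
    p = softmax(z)
    loss = -sum(yi * math.log(pi) for yi, pi in zip(y, p) if yi > 0)
    delta = [pi - yi for pi, yi in zip(p, y)]
    return loss, delta


# A confidently wrong prediction: the true class is 0, but logit 1 is large.
z = [-6.0, 6.0, 0.0]
y = [1.0, 0.0, 0.0]
print(sigmoid_mse(z, y)[1][0])            # about -0.0025: sigmoid(-6) is saturated
print(softmax_cross_entropy(z, y)[1][0])  # about -1.0: strong corrective signal
```

The example shows why the sigmoid/MSE pairing can stall: a badly wrong but saturated output unit passes back almost no gradient, while softmax with cross-entropy passes back nearly the full error.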

For each family, we systematically studied:

The effect of momentum on optimization dynamics,
The impact of depth (one versus two hidden layers), and
The role of weight initialization on final accuracy.
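Two of the factors listed above, momentum and weight initialization, can be written in a few lines. The sketch assumes the classical (heavy-ball) momentum form v ← μv − η∇L, w ← w + v, and He initialization drawing weights from a zero-mean Gaussian with standard deviation √(2/fan_in). The learning rate, momentum coefficient, layer sizes, and the toy quadratic objective are illustrative assumptions, not the project's settings.

```python
import math
import random

# Illustrative sketch; hyperparameters and the toy objective are assumptions.


def he_init(fan_in, fan_out, rng):
    """He (Kaiming) normal initialization: N(0, 2/fan_in), suited to ReLU."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


def momentum_step(w, v, grad, lr, mu):
    """Classical momentum: v <- mu*v - lr*grad, then w <- w + v."""
    v = [mu * vi - lr * gi for vi, gi in zip(v, grad)]
    w = [wi + vi for wi, vi in zip(w, v)]
    return w, v


def minimise(mu, steps=100, lr=0.01):
    # Toy objective f(w) = 0.5 * (w0^2 + 10 * w1^2): an elongated bowl where
    # plain gradient descent crawls along the shallow w0 direction.
    w, v = [5.0, 1.0], [0.0, 0.0]
    for _ in range(steps):
        grad = [w[0], 10.0 * w[1]]
        w, v = momentum_step(w, v, grad, lr=lr, mu=mu)
    return 0.5 * (w[0] ** 2 + 10.0 * w[1] ** 2)


print("loss without momentum:", minimise(mu=0.0))
print("loss with momentum:   ", minimise(mu=0.9))

W = he_init(784, 128, random.Random(0))
print(len(W), len(W[0]))  # 128 784
```

With the same step size and step count, the momentum run reaches a much lower loss on this ill-conditioned bowl, which mirrors the kind of acceleration the study examines; real MNIST training curves will of course differ.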

Conclusions

This project evaluated backpropagation MLPs for MNIST in MATLAB, contrasting sigmoid/MSE networks with ReLU/softmax models while probing momentum, depth, and initialization. Three conclusions stand out:

Momentum aids convergence. Classical momentum consistently accelerates and stabilizes training of the ReLU networks, improving accuracy and lowering the terminal loss.
ReLU dominates sigmoid. Non-saturating activations avoid the vanishing-gradient problem and deliver better performance (∼98% accuracy on MNIST), whereas the sigmoid networks failed to learn in our experiments.
Depth and initialization interact with the training budget. Additional depth modestly improves generalization but reduces the loss more markedly. The benefits of He initialization become evident with longer training on the full dataset.
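The vanishing-gradient argument behind the "ReLU dominates sigmoid" conclusion can be checked with a little arithmetic. The sigmoid derivative σ(z)(1 − σ(z)) never exceeds 0.25, so backpropagating through several sigmoid layers multiplies the error signal by at most 0.25 per layer, while an active ReLU unit passes it through with factor 1. The sketch below isolates the activation's contribution by ignoring the weight matrices (an assumption: real weights can amplify or shrink the signal); the function names are hypothetical.

```python
import math

# Hypothetical sketch; assumes unit weights to isolate activation derivatives.


def sigmoid_grad(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)


def relu_grad(z):
    return 1.0 if z > 0 else 0.0


def chain_factor(grad_fn, z, depth):
    """Product of activation derivatives across `depth` layers, each at
    pre-activation z, ignoring the weight matrices."""
    factor = 1.0
    for _ in range(depth):
        factor *= grad_fn(z)
    return factor


print(sigmoid_grad(0.0))  # 0.25, the maximum of the sigmoid derivative
for depth in (1, 2, 5):
    print(depth, chain_factor(sigmoid_grad, 0.0, depth), chain_factor(relu_grad, 1.0, depth))
```

Even in the best case (z = 0) the sigmoid factor falls to 0.25^5 ≈ 0.001 after five layers, and saturated units (large |z|) make it far smaller, whereas the ReLU factor stays at 1 on active units.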

Authors

Mouawad, Meshaal

DOI: 10.5281/zenodo.20682487

Publication Date: 2026-03-14
