The Role of Gradient in CNN Training

This article follows up on our discussion about loss functions. For newcomers, I recommend reading the previous article on “Losses” for a comprehensive understanding.

Understanding Gradient in Machine Learning

After calculating the loss, the next step involves computing the gradient: the vector of partial derivatives of the loss with respect to each of the model's parameters (weights and biases). The objective is to adjust those parameters to minimize the loss for a better model fit.

Partial Derivative: Why It’s Crucial

  • Concept: The partial derivative indicates how the loss function changes with a slight variation in one parameter, keeping others constant.
  • Impact: In machine learning, this concept helps us understand how tweaking model parameters affects the loss, guiding us toward optimal adjustments.
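The "one parameter varies, the others stay constant" idea can be checked numerically. The sketch below is illustrative (the loss surface and step size are my own choices, not from this article): it approximates each partial derivative with a central finite difference by nudging one weight while holding the other fixed.

```python
def loss_fn(w1, w2):
    # illustrative quadratic loss surface: (w1 - 1)^2 + (w2 + 2)^2
    return (w1 - 1) ** 2 + (w2 + 2) ** 2

h = 1e-5          # small step for the finite-difference approximation
w1, w2 = 3.0, 0.0

# partial derivative w.r.t. w1: vary w1 only, keep w2 constant
dloss_dw1 = (loss_fn(w1 + h, w2) - loss_fn(w1 - h, w2)) / (2 * h)
# partial derivative w.r.t. w2: vary w2 only, keep w1 constant
dloss_dw2 = (loss_fn(w1, w2 + h) - loss_fn(w1, w2 - h)) / (2 * h)

print(dloss_dw1)  # analytically 2*(w1 - 1) = 4 at w1 = 3
print(dloss_dw2)  # analytically 2*(w2 + 2) = 4 at w2 = 0
```

Both values match the hand-computed derivatives, which is exactly the signal gradient descent uses to decide how to tweak each parameter.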

Improving Model Performance by Minimizing Loss

Weight update through the gradient
  • Process: Updating the model’s parameters in the direction opposite to the gradient reduces the loss, leading to better model predictions.
  • Goal: The aim is to find parameter values that minimize loss, aligning the model’s predictions closely with true values.

Practical Implementation of Gradient Calculation

Here’s a simple Python implementation showcasing how to calculate and use gradients for optimizing model weights:

import numpy as np

# synthetic data: two input features and random ground-truth targets
input_feature_1 = np.random.rand(100, 1)
input_feature_2 = np.random.rand(100, 1)
input_ground_truth = np.random.rand(100, 1)

weight_1 = 2  # initial weight for input_feature_1
weight_2 = 3  # initial weight for input_feature_2
print("INITIAL WEIGHTS: ", weight_1, " ", weight_2)

# forward pass: y = w1*I1 + w2*I2, with w1 = 2 and w2 = 3
output_feature_predicted = weight_1 * input_feature_1 + weight_2 * input_feature_2

# MSE (L2) regression loss
loss = np.mean((output_feature_predicted - input_ground_truth) ** 2)
print("LOSS:", loss)

# To minimize this loss we compute the gradient of the weights (w1, w2 here).
# Each component is a partial derivative, e.g. gradient of w1 = dLoss/dw1.
# With loss = mean(X**2) and X = output_feature_predicted - input_ground_truth,
# the chain rule gives dLoss/dw1 = mean(2 * X * dX/dw1) = 2 * mean(X * input_feature_1).
gradient_w1 = 2 * np.mean((output_feature_predicted - input_ground_truth) * input_feature_1)
gradient_w2 = 2 * np.mean((output_feature_predicted - input_ground_truth) * input_feature_2)

# The learning rate should be neither too large (e.g. 0.5) nor too small (e.g. 0.001).
learning_rate = 0.1  # move each weight by 10% of its gradient

# gradient descent update: step in the direction opposite to the gradient
weight_1 = weight_1 - learning_rate * gradient_w1
weight_2 = weight_2 - learning_rate * gradient_w2
print("UPDATED WEIGHTS: ", weight_1, " ", weight_2)
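A single update step rarely suffices in practice; the same compute-gradient-then-update cycle is repeated until the loss stops improving. The sketch below is an extension of the snippet above (the loop, seeded generator, and shortened variable names are my additions) showing the loss shrinking over iterations:

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.random((100, 1))
x2 = rng.random((100, 1))
y_true = rng.random((100, 1))

w1, w2 = 2.0, 3.0
learning_rate = 0.1

losses = []
for step in range(200):
    y_pred = w1 * x1 + w2 * x2                # forward pass
    loss = np.mean((y_pred - y_true) ** 2)    # MSE loss
    losses.append(loss)
    # partial derivatives of the loss w.r.t. each weight
    grad_w1 = 2 * np.mean((y_pred - y_true) * x1)
    grad_w2 = 2 * np.mean((y_pred - y_true) * x2)
    # step opposite to the gradient
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

print(losses[0], losses[-1])  # the final loss is far below the initial one
```

Because the targets here are random, the loss plateaus at an irreducible floor rather than reaching zero; on real data with learnable structure it would drop much further.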

Conclusion: From Theory to Practice

Understanding and calculating gradients is key to optimizing CNNs. By effectively updating model parameters using techniques like Stochastic Gradient Descent (SGD), we enhance the model’s predictive accuracy. Stay tuned for further exploration of various optimization algorithms in upcoming articles.
