
What are hyperparameters?

In simple terms, hyperparameters are a set of knobs you can tune before starting the learning process in machine learning. When they are set to specific values, the model delivers a specific level of performance; when they are set to new values, the model's performance changes.
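
As a minimal sketch (the names and values below are purely illustrative and not tied to any particular library), the knobs might look like this:

```python
# Illustrative hyperparameter settings, fixed before training starts.
hyperparameters = {
    "mini_batch_size": 32,   # how many samples feed each gradient update
    "learning_rate": 0.01,   # how large each update step is
    "epochs": 10,            # how many full passes over the training data
}

# model = train(training_data, **hyperparameters)  # hypothetical training call;
# changing any value above changes the resulting model's performance.
```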

What is the performance of a model?

In any machine learning process, we want to optimize the objective function: a fancy name for a function that evaluates model predictions against the target values provided during training. A gradient descent algorithm can find a minimum of an objective function. For example, in regression, the closer the predicted values are to the real values, the better the model's performance.
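
To make this concrete, here is a minimal sketch in plain Python/NumPy (the data and values are made up for illustration): a one-parameter regression model whose objective function is the mean squared error, minimized with gradient descent.

```python
import numpy as np

# Toy 1-D regression: find w that minimizes the mean squared error (the objective function).
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x + rng.normal(scale=0.1, size=100)  # true weight is 3.0

w = 0.0                # initial guess
learning_rate = 0.1

for step in range(100):
    predictions = w * x
    error = predictions - y
    cost = np.mean(error ** 2)          # objective: predictions vs. training values
    gradient = 2 * np.mean(error * x)   # derivative of the cost with respect to w
    w -= learning_rate * gradient       # gradient descent step toward a minimum

print(f"learned w = {w:.3f}, final cost = {cost:.5f}")
```

The learned weight should land close to 3.0, i.e. the cost gets lower as the predictions get closer to the real values.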

How changing a hyperparameter impacts the model

This blog presents an impact analysis in which mini-batch size and learning rate are used as control variables. The impact of changing these controls is studied on the following variables:

  • Number of iterations/epochs
  • Algorithm convergence speed
  • Model performance and accuracy
  • Chance of finding better or global minimum
  • Noise in cost/evaluation metric

Knowing the impact of changing the mini-batch size and learning rate hyperparameters is very important in the everyday work of a Machine Learning Engineer or a Data Scientist. It is also very important for the MLS-C01 exam.

Impact of lowering Mini-Batch Size

The following statements are general rules of thumb and can vary depending on the specific model-training scenario, but they hold in most cases.

  • A lower mini-batch size leads to a lower number of iterations to achieve similar model performance; in other words, a larger batch size requires a higher number of iterations
    • Remember that:
      • mini-batch gradient descent uses n data points per gradient update (iteration)
      • SGD (stochastic gradient descent) uses 1 data point per gradient update
      • batch gradient descent uses all data points in the data set for one update
    • Think of it this way: the larger the batch size, the less well the model generalizes, and vice versa
  • A lower number of iterations naturally leads to faster convergence
  • Lowering the mini-batch size increases noise in the objective (cost) function
  • That extra noise lets the optimizer jump out of local minima, increasing the chance of finding the global minimum (see the numeric sketch after this list)
  • Lowering the mini-batch size can also improve model performance when a better minimum is found
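
As a rough numeric illustration of the noise point above, the sketch below (toy data, NumPy only; all values are made up) estimates the same gradient from mini-batches of different sizes and compares how much the estimates vary:

```python
import numpy as np

# Sketch: smaller mini-batches give noisier gradient estimates on a toy regression
# problem (one weight w, mean squared error). Values are illustrative only.
rng = np.random.default_rng(1)
x = rng.normal(size=10_000)
y = 3.0 * x + rng.normal(scale=0.5, size=10_000)
w = 0.0  # evaluate gradient noise at this (arbitrary) point

def minibatch_gradient(batch_size):
    idx = rng.choice(len(x), size=batch_size, replace=False)
    xb, yb = x[idx], y[idx]
    return 2 * np.mean((w * xb - yb) * xb)   # gradient of the MSE w.r.t. w on this batch

for batch_size in (8, 64, 512):
    grads = [minibatch_gradient(batch_size) for _ in range(200)]
    print(f"batch size {batch_size:4d}: gradient std = {np.std(grads):.4f}")

# Expected pattern: the spread (noise) of the gradient estimates shrinks as the batch
# size grows, which is why small batches are more likely to jump out of shallow minima.
```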

Impact of lowering Learning Rate

The following statements are general rules of thumb and can vary depending on the specific model-training scenario, but they hold in most cases.

  • A lower learning rate produces smaller update steps, which increases the number of iterations required (see the sketch after this list)
  • The more iterations required, the slower the convergence
  • Smaller steps can settle into a finer minimum, so the cost ends up lower and model performance better
  • The chance of finding the global minimum decreases because there is less noise, i.e., the optimizer cannot jump out of a local minimum
    • That is fine if we are already heading towards the global minimum, but not if we are on a path towards a local minimum
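
To illustrate the iteration-count point, here is a small sketch (a made-up one-dimensional quadratic cost, not a real model) that counts how many gradient descent steps are needed to get close to the minimum at different learning rates:

```python
# Sketch: on a simple quadratic cost f(w) = (w - 5)^2, a lower learning rate needs
# more iterations to approach the minimum. Numbers are illustrative only.
def iterations_to_converge(learning_rate, tol=1e-6, max_iter=100_000):
    w = 0.0
    for i in range(1, max_iter + 1):
        gradient = 2 * (w - 5.0)
        w -= learning_rate * gradient      # smaller learning rate means a smaller step
        if abs(w - 5.0) < tol:
            return i
    return max_iter

for lr in (0.1, 0.01, 0.001):
    print(f"learning rate {lr}: converged in {iterations_to_converge(lr)} iterations")

# Expected pattern: roughly an order of magnitude more iterations for each
# order-of-magnitude drop in the learning rate, hence slower convergence.
```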

A diagram for ready reference showing the impact chain

Figure: ready reference for hyperparameter impact on model training