What are hyperparameters?
In simple terms, hyperparameters are a set of knobs you can tune before starting the learning process in machine learning. When they are set to specific values, the model achieves a specific level of performance; when they are set to new values, the model's performance changes accordingly.
What is the performance of a model?
In any machine learning process, we want to optimize the objective function: a fancy name for a function that evaluates model predictions against the target values provided during training. A gradient descent algorithm can find a minimum of an objective function. For example, in regression, the closer the predicted values are to the real values, the better the model's performance.
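To make this concrete, here is a minimal sketch of gradient descent driving a cost function down as predictions move closer to the real values. It assumes a toy one-dimensional linear regression with a mean-squared-error objective; the data, learning rate, and step count are illustrative assumptions, not from any specific case.

```python
# Minimal sketch: gradient descent on a mean-squared-error objective
# for a toy 1-D linear regression (hypothetical data, for illustration only).
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=100)                 # features
y = 3.0 * X + 0.5 + rng.normal(0, 0.1, 100)      # targets: a true line plus noise

w, b = 0.0, 0.0          # model parameters
learning_rate = 0.1      # a hyperparameter: set before training starts

for step in range(200):
    y_pred = w * X + b                           # model predictions
    error = y_pred - y                           # predictions vs. training values
    cost = np.mean(error ** 2)                   # objective (cost) function: MSE
    # Gradients of the MSE with respect to w and b
    grad_w = 2 * np.mean(error * X)
    grad_b = 2 * np.mean(error)
    # Step downhill: the size of this step is controlled by the learning rate
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"learned w={w:.2f}, b={b:.2f}, final cost={cost:.4f}")
```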
How changing a hyperparameter impacts the model
This blog aims to show an impact analysis when mini-batch size and learning rate are used as control variables. The impact of changing these controls is studied on the following variables:
- Number of iterations/epochs
- Algorithm convergence speed
- Model performance and accuracy
- Chance of finding better or global minimum
- Noise in cost/evaluation metric
Knowing the impact of changing the hyperparameters mini-batch size and learning rate is very important in the everyday work of a Machine Learning Engineer or a Data Scientist. It is also very important for the MLS-C01 exam.
Impact of lowering Mini-Batch Size
The following statements are general rules of thumb and can vary depending on the specifics of model training; however, they are generally true.
- A lower mini-batch size leads to a lower number of iterations to achieve similar model performance; in other words, a larger batch size requires a higher number of iterations
- Remember that:
  - mini-batch gradient descent uses n data points per gradient update
  - SGD (stochastic gradient descent) uses 1 data point per gradient update
  - (batch) gradient descent uses all the data points in the data set for one update, i.e. one epoch
- Think of it this way: the larger the batch size, the less well a model can generalize, and vice versa
- A lower number of iterations obviously leads to faster convergence
- Lowering the mini-batch size increases noise in the objective (cost) function, since each gradient is estimated from fewer samples
- The increased noise enables the optimizer to jump out of local minima, thereby increasing the chance of finding the global minimum
- Lowering the mini-batch size can also improve model performance when a better minimum is found (see the sketch after this list)
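As a rough illustration of these rules of thumb, here is a sketch that trains the same toy regression model with a batch size of 1 (SGD), 32 (mini-batch), and the full data set (batch gradient descent). The data, batch sizes, epoch count, and learning rate are illustrative assumptions; the point is simply to compare the number of updates and the noisiness of the recorded cost.

```python
# Sketch (hypothetical toy data): the same linear-regression problem trained
# with different batch sizes, so the update count and cost noise can be compared.
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=256)
y = 3.0 * X + 0.5 + rng.normal(0, 0.1, 256)

def train(batch_size, epochs=20, lr=0.05):
    w, b = 0.0, 0.0
    costs = []                               # full-data cost after every update
    for _ in range(epochs):
        order = rng.permutation(len(X))      # shuffle the data each epoch
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            error = w * X[idx] + b - y[idx]  # gradient from this batch only
            w -= lr * 2 * np.mean(error * X[idx])
            b -= lr * 2 * np.mean(error)
            costs.append(np.mean((w * X + b - y) ** 2))
    return np.array(costs)

for bs in (1, 32, 256):                      # SGD, mini-batch, full-batch GD
    costs = train(bs)
    # Plotting `costs` would show a noisier but faster-moving curve for smaller batches.
    print(f"batch_size={bs:>3}  updates={len(costs):>5}  final cost={costs[-1]:.4f}")
```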
Impact of lowering Learning Rate
The following statements are general rules of thumb and can vary depending on the specifics of model training; however, they are generally true.
- A lower learning rate produces smaller update steps (each gradient step is scaled by the learning rate), which increases the number of iterations required
- The more iterations required, the slower the convergence
- Smaller steps can lead to better model performance, as the optimizer settles closer to a minimum and the final cost is lower
- The chance of finding the global minimum decreases because there is less noise, i.e., the optimizer cannot jump out of a local minimum
- That is a good thing if we are already descending toward the global minimum, but not if we are on a path toward a local minimum (see the sketch after this list)
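To see the iteration count grow as the learning rate shrinks, here is a sketch using the same hypothetical toy regression. It counts how many full-batch gradient-descent iterations each learning rate needs before the cost drops below a fixed threshold; the threshold, learning rates, and data are illustrative assumptions. (Because this toy objective has a single minimum, the sketch only illustrates the iteration/convergence trade-off, not the local-minimum behaviour.)

```python
# Sketch (hypothetical toy data): full-batch gradient descent with different
# learning rates, counting iterations needed to push the cost below a threshold.
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=256)
y = 3.0 * X + 0.5 + rng.normal(0, 0.1, 256)

def iterations_to_converge(lr, threshold=0.02, max_iters=10_000):
    w, b = 0.0, 0.0
    for i in range(1, max_iters + 1):
        error = w * X + b - y
        if np.mean(error ** 2) < threshold:   # converged: cost below threshold
            return i
        w -= lr * 2 * np.mean(error * X)      # smaller lr -> smaller step
        b -= lr * 2 * np.mean(error)
    return max_iters                          # did not reach the threshold

for lr in (0.001, 0.01, 0.1):
    print(f"learning_rate={lr:<6} iterations to cost < 0.02: {iterations_to_converge(lr)}")
```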