
Formulae Cheat Sheet to Prepare for Machine Learning Specialty
Remember how to write Confusion Matrix
- Know how to write the confusion matrix when `Actual` and `Predicted` are swapped
- Write down both versions of the confusion matrix on the rough sheet provided, as soon as you start the exam

Basic Formulae for Classification
Precision, Recall and Specificity
$$ Precision = \text{Positive Predictive Value (PPV)} = \frac{TP}{TP+FP} $$ $$ Recall = \text{Sensitivity} = \text{True Positive Rate (TPR)} = \frac{TP}{TP+FN} $$ $$ Specificity = \text{True Negative Rate (TNR)} = \frac{TN}{TN+FP} $$
Sudo Exam Tip:
- How to remember the above formulae?
- Once you understand what the formulae mean, the only way to reproduce them quickly is some sort of recall :) from memory. Here is the trick, the way I remember:
- Precision:
  - The Precision formula has all components ending with P, i.e. Precision = TP/(TP+FP)
- Recall:
  - Once you know Precision, for Recall just replace FP -> FN, that's all!
- Specificity:
  - Once you know Precision, for Specificity just replace TP -> TN, that's all!
F1-Score
$$ F1 = \underbrace{\frac{2 * TP}{2*TP + FP + FN}}_{\text{In terms of TP, FP and FN}} $$
$$ F1 = \underbrace{\frac{2 * Precision * Recall}{Precision + Recall}}_{\text{In terms of Precision and Recall}} $$
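The formulae above can be sketched in Python as a quick self-check (a minimal sketch; the TP/FP/TN/FN counts are made-up illustration values):

```python
# Precision, Recall, Specificity and F1 from confusion-matrix counts.
# Counts below are illustrative, not from any real model.

def classification_metrics(tp, fp, tn, fn):
    precision = tp / (tp + fp)        # all components end with P
    recall = tp / (tp + fn)           # Precision with FP -> FN
    specificity = tn / (tn + fp)      # Precision with TP -> TN
    f1 = 2 * tp / (2 * tp + fp + fn)  # equals 2*P*R / (P + R)
    return precision, recall, specificity, f1

p, r, s, f1 = classification_metrics(tp=80, fp=20, tn=90, fn=10)
print(p, r, s, f1)  # precision 0.8, recall ≈ 0.889, specificity ≈ 0.818, F1 ≈ 0.842
```

Note that the two F1 forms agree: `2*tp/(2*tp+fp+fn)` is algebraically the same as `2*p*r/(p+r)`.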
Sample Question and Solution

TF-IDF: Term Frequency - Inverse Document Frequency
- In information retrieval, `tf-idf`, `TF*IDF`, or `TFIDF`, short for term frequency-inverse document frequency, is a numerical statistic intended to reflect how important a word is to a document in a collection or corpus
- `tf-idf` is one of the most popular term-weighting schemes today
- Read more on [Wikipedia](https://en.wikipedia.org/wiki/Tf%E2%80%93idf)
- Term frequency (`tf`) of a `term` is calculated over all `n` documents. If `tf` is to be calculated for many terms, the process is repeated on all `n` documents
$$ \begin{aligned} tf(t, d_{n}) \cr &= {\text{Term frequency of term } t \text{ in } document_{n} } \cr &= \frac{\text{Number of times }t\text{ occurs in } d_{n}}{\text{Number of words in }d_{n}} \cr \end{aligned} $$
- Document frequency (`df`) of a `term` is calculated only once over all `n` documents
$$ \begin{aligned} df(t) \cr &= {\text{Document frequency of term } t \text{ in all documents} } \cr &= \frac{\text{Number of documents with term }t}{\text{Total number of documents i.e. }n} \end{aligned} $$
- Inverse document frequency `idf` is the log of the inverse of `df`
$$ idf(t) = \log\left(\frac{1}{df(t)}\right) $$
- Finally, `tf-idf` is calculated for every term using `tf` and `idf`
$$ \begin{aligned} &tf(t, d_{n}) * idf(t) \cr\cr &{\text{Which translates to every term as }} \cr &tf(t, d_{1}) * idf(t) \cr &tf(t, d_{2}) * idf(t) \cr &tf(t, d_{3}) * idf(t) \cr &tf(t, d_{4}) * idf(t) \cr &{\text{so on … till …}} \cr &tf(t, d_{n}) * idf(t) \cr \end{aligned} $$
TF-IDF Exercise
- TF-IDF tells us the significance of a term in a document
- Let's consider a few documents:
  - Document 1 - `d1`: Sudo Code blogs are a very good resource to prepare for machine learning specialty exam. The learning experience is very good. Machine learning specialty preparation is made easy
  - Document 2 - `d2`: Sudo Code blogs are a very good resource to prepare for machine learning specialty exam. The blogs are very informative and to the point. The blogs take a new approach
  - Document 3 - `d3`: Sudo Code blogs are very helpful for MLS-C01 exam
- The question we ask is: how significant is the term `learning` in each document?
- The answer is to calculate TF-IDF
Sample Question and Solution
$$ \begin{aligned} tf(learning,d_{1}) \cr &= \frac {\text{No. of times term learning occurs in } d_{1}} {\text{No. of words in } d_{1} } \cr &= \frac{3}{28} = 0.11 \cr \cr tf(learning,d_{2}) \cr &= \frac {\text{No. of times term learning occurs in } d_{2}} {\text{No. of words in } d_{2} } \cr &= \frac{1}{30} = 0.03 \cr \cr tf(learning,d_{3}) \cr &= \frac {\text{No. of times term learning occurs in } d_{3}} {\text{No. of words in } d_{3} } \cr &= \frac{0}{9} = 0 \cr \cr df(learning) \cr &= \frac {\text{No. of documents with term learning}} {\text{No. of documents }} \cr &= \frac{2}{3} \cr \cr idf(learning) \cr &= log(\frac{1}{df(learning)}) \cr &= log(\frac{3}{2}) \cr &= log(1.5) = 0.176 \cr \cr tfidf(learning, d_{1}) \cr &= tf(learning, d_{1}) * idf(learning) \cr &= 0.11 * 0.176 = 0.01936 \cr \cr tfidf(learning, d_{2}) \cr &= tf(learning, d_{2}) * idf(learning) \cr &= 0.03 * 0.176 = 0.00528 \cr \cr tfidf(learning, d_{3}) \cr &= tf(learning, d_{3}) * idf(learning) \cr &= 0 * 0.176 = 0 \cr \end{aligned} $$
Interpretation of TF-IDF Values
- The tf-idf of the term `learning` is largest for `document 1`, hence the term is most significant in `document 1`, with weight 0.0194
- The next is `document 2`, with weight 0.0053
- `document 3` has no significance, with weight 0
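The worked example above can be sketched in Python (a minimal sketch: log base 10 and plain whitespace tokenization, matching the hand calculation; values differ in the fourth decimal place because `tf` is not rounded to two decimals here):

```python
# tf-idf of the term "learning" across the three exercise documents.
# Document strings are the exercise texts, lowercased with punctuation dropped.
import math

docs = {
    "d1": "sudo code blogs are a very good resource to prepare for machine "
          "learning specialty exam the learning experience is very good "
          "machine learning specialty preparation is made easy",
    "d2": "sudo code blogs are a very good resource to prepare for machine "
          "learning specialty exam the blogs are very informative and to "
          "the point the blogs take a new approach",
    "d3": "sudo code blogs are very helpful for mls-c01 exam",
}
term = "learning"

def tf(term, doc):
    # term frequency: occurrences of term / total words in the document
    words = doc.split()
    return words.count(term) / len(words)

# document frequency: fraction of documents containing the term
df = sum(term in doc.split() for doc in docs.values()) / len(docs)
idf = math.log10(1 / df)  # log10(3/2) ≈ 0.176

tfidf = {name: tf(term, doc) * idf for name, doc in docs.items()}
print(tfidf)  # d1 ≈ 0.0189, d2 ≈ 0.0059, d3 = 0
```

The ordering matches the interpretation above: `d1` has the largest weight, then `d2`, and `d3` is zero.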
Kinesis Shards Calculation
- The number of shards required for a `Kinesis` stream is a precise calculation based on:
  - Record size
  - Write bandwidth (into the Kinesis stream)
  - Read bandwidth (out of the Kinesis stream)
- The number of shards, `shards`, is calculated as
$$ \begin{aligned} shards \cr &= max(\frac{\text{Write bandwidth in KB}}{1000},\frac{\text{Read bandwidth in KB}}{2000}) \cr \cr Where,\cr &\text{Write Bandwidth in KB} = \text{Average Record Size in KB} * \text{Records Per Second} \cr &\text{Read Bandwidth in KB} = \text{Write Bandwidth in KB} * \text{Number of Consumers} \end{aligned} $$
- Reference: Read this FAQ on Kinesis Data Streams
- Search for the question `How do I decide the throughput of my Kinesis stream?`

Kinesis Shard Calculation Example
- You are designing a system where Kinesis data streams are used for realtime processing of data produced by IoT systems
- The average record size produced by the IoT devices is 500 KB
- The data records are written to the Kinesis stream by the IoT devices directly, using the `PutRecord` API, at a rate of 120 records per minute
- There are 7 Lambda instances that read from the Kinesis stream, process the data, and finally store it in DynamoDB
- How many shards does the Kinesis stream need to support the system described above?
Solution
- Remember to convert `RPM` (records per minute) to `RPS` (records per second)
$$ 120 \text{ RPM} = \frac{120}{60} = 2 \text{ RPS} $$ $$ \begin{aligned} \text{Write Bandwidth in KB} \cr &= \text{Average Record Size in KB} * \text{Records Per Second} \cr &= 500 * 2 \cr &= 1000 KB \end{aligned} $$ $$ \begin{aligned} \text{Read Bandwidth in KB} \cr &= \text{Write Bandwidth in KB} * \text{Number of Consumers} \cr &= 1000 * 7 \cr &= 7000 KB \end{aligned} $$ $$ \begin{aligned} shards \cr &= max(\frac{\text{Write bandwidth in KB}}{1000},\frac{\text{Read bandwidth in KB}}{2000}) \cr &= max(\frac{1000}{1000},\frac{7000}{2000}) \cr &= max(1, 3.5) \cr &= 3.5 \approx 4 \end{aligned} $$
Answer
- As the number of `shards` cannot be a fraction, round up to the next integer, i.e. `4 shards` are needed to support the demands of the system
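The shard calculation above can be sketched in Python (a minimal sketch; the function name is illustrative, and the per-shard limits of 1000 KB/s write and 2000 KB/s read are taken from the formula above):

```python
# Kinesis shard count: max of write and read bandwidth demands,
# rounded up since shards cannot be fractional.
import math

def kinesis_shards(avg_record_kb, records_per_second, num_consumers):
    write_kb = avg_record_kb * records_per_second  # KB/s into the stream
    read_kb = write_kb * num_consumers             # KB/s out of the stream
    # per-shard capacity: 1000 KB/s write, 2000 KB/s read
    return math.ceil(max(write_kb / 1000, read_kb / 2000))

# IoT example: 500 KB records at 120 RPM (= 2 RPS), 7 Lambda consumers
print(kinesis_shards(500, 120 / 60, 7))  # 4
```

Here the read side dominates (7000/2000 = 3.5), so the answer rounds up to 4 shards.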
Autoscaling Sagemaker
- The production variants of your model need to be autoscaled to handle fluctuations in traffic
- Perform load testing to find the peak `SageMakerVariantInvocationsPerInstance` that your model's production variant can handle
- The recommended `SAFETY_FACTOR` to start with is 0.5, as per AWS
- Refer here for a detailed AWS blog on fine-tuning SageMaker
If RPS is used:
$$ \begin{aligned} SageMakerVariantInvocationsPerInstance = MAX\_RPS * SAFETY\_FACTOR * 60 \end{aligned} $$
If RPM is used:
$$ \begin{aligned} SageMakerVariantInvocationsPerInstance = MAX\_RPM * SAFETY\_FACTOR \end{aligned} $$
- Where `MAX_RPS` is the maximum RPS determined from the load test, and `SAFETY_FACTOR` is the safety factor you chose to ensure that your clients don't exceed the maximum RPS. The same holds for `MAX_RPM`
Sudo Exam Tip:
- `SageMakerVariantInvocationsPerInstance` is the average number of times *per minute* that each instance for a variant is invoked. The Gist!
- The final configuration provided to SageMaker for `SageMakerVariantInvocationsPerInstance` should be in terms of RPM
Exercise Question: When load testing results are in RPS
A Machine Learning Specialist wants to determine the appropriate `SageMakerVariantInvocationsPerInstance` setting for an endpoint automatic scaling configuration. The Specialist has performed a load test on a single instance and determined that the peak requests per second (RPS) without service degradation is about 20 RPS. As this is the first deployment, the Specialist intends to set the invocation safety factor to 0.5. Based on the stated parameters, and given that the invocations-per-instance setting is measured on a per-minute basis, what should the Specialist set as the `SageMakerVariantInvocationsPerInstance` setting?
Solution
$$ \begin{aligned} SageMakerVariantInvocationsPerInstance \cr &= MAX\_RPS * SAFETY\_FACTOR * 60 \cr &= 20 * 0.5 * 60 \cr &= 10 * 60 = 600 \cr \end{aligned} $$
Exercise Question: When load testing results are in RPM
A Machine Learning Specialist has performed a load test on a single instance and determined that the peak requests per minute (RPM) without service degradation is about 1400 RPM. The Specialist intends to set the invocation safety factor to 0.7.
What should the Specialist set as the SageMakerVariantInvocationsPerInstance setting?
Solution
$$ \begin{aligned} SageMakerVariantInvocationsPerInstance \cr &= MAX\_RPM * SAFETY\_FACTOR \cr &= 1400 * 0.7 \cr &= 980 \cr \end{aligned} $$
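Both cases above can be sketched in one helper (a minimal sketch; the function name and `rate_unit` flag are illustrative, not part of any AWS API):

```python
# SageMakerVariantInvocationsPerInstance from load-test results.
# The setting is per minute, so an RPS result is multiplied by 60;
# an RPM result only needs the safety factor applied.

def invocations_per_instance(peak_rate, safety_factor, rate_unit="rps"):
    if rate_unit == "rps":
        return peak_rate * safety_factor * 60
    return peak_rate * safety_factor  # rate is already per minute

print(invocations_per_instance(20, 0.5))           # 600.0
print(invocations_per_instance(1400, 0.7, "rpm"))  # ≈ 980
```

This reproduces both exercise answers: 600 for the 20 RPS load test, 980 for the 1400 RPM one.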