
Ready reference for MLS-C01: SageMaker Algorithms Compared

  • Study the following comparison of the built-in algorithms in SageMaker
  • Use it as a ready reckoner for the MLS-C01 AWS Certified Machine Learning - Specialty exam
For each algorithm below: learning type, input format, required data type (INT/FLOAT), processor and instance guidance, multi-GPU (single machine) and multi-machine support, use cases, comments, and key hyperparameters (HP).
Linear Learner (SUPERVISED)
  • Input format: RecordIO-wrapped protobuf or CSV; Float32 data only
  • Processor/instance: CPU or GPU, on any CPU or GPU instance; multiple CPUs on a single machine help, multi-GPU does not
  • Use cases: regression and classification (binary or multi-class)
  • Comments: data must be normalized, else the algorithm may not converge; multiple models are trained in parallel
  • HP: balance_multiclass_weights, learning_rate, mini_batch_size, L1, L2
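
To make these entries concrete, here is a minimal sketch of launching a built-in algorithm with the SageMaker Python SDK, using Linear Learner as the example; the role ARN, bucket paths, and hyperparameter values are hypothetical placeholders, and `l1`/`wd` are assumed to be the regularization parameter names:

```python
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator

session = sagemaker.Session()

# Resolve the built-in Linear Learner container image for the current region
container = image_uris.retrieve("linear-learner", session.boto_region_name)

linear = Estimator(
    image_uri=container,
    role="arn:aws:iam::111122223333:role/SageMakerRole",  # hypothetical role ARN
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/linear-learner/output",   # hypothetical bucket
    sagemaker_session=session,
)
linear.set_hyperparameters(
    predictor_type="binary_classifier",  # or "regressor" / "multiclass_classifier"
    learning_rate=0.01,
    mini_batch_size=1000,
    l1=0.0,  # L1 regularization strength
    wd=0.0,  # weight decay, i.e., L2 regularization
)
linear.fit({"train": "s3://my-bucket/linear-learner/train"})  # hypothetical path
```

The same Estimator pattern applies to every algorithm below; only the container name, instance guidance, and hyperparameters change.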
XGBoost (SUPERVISED)
  • Input format: CSV or LibSVM (an open-source algorithm adapted by AWS, hence no RecordIO-protobuf)
  • Processor/instance: CPU only (M4); the algorithm is memory-bound rather than compute-bound; no multi-machine training
  • Use cases: regression and classification (binary or multi-class)
  • Comments: uses extreme gradient boosting of trees; outputs the trained model as a pickle file
  • HP:
    - subsample (lower values reduce overfitting)
    - eta (equivalent to learning rate)
    - alpha, gamma, lambda (higher values give more conservative trees)
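
A hedged sketch of the hyperparameters above on the SageMaker XGBoost container; the version string and values are illustrative, and the role ARN is again a placeholder. Note that `lambda` must be passed via dict unpacking because it is a reserved word in Python:

```python
from sagemaker import image_uris
from sagemaker.estimator import Estimator

container = image_uris.retrieve("xgboost", session.boto_region_name, version="1.5-1")
xgb = Estimator(
    container,
    role="arn:aws:iam::111122223333:role/SageMakerRole",  # hypothetical role ARN
    instance_count=1,
    instance_type="ml.m5.xlarge",
)
xgb.set_hyperparameters(
    objective="binary:logistic",
    num_round=100,
    eta=0.2,           # equivalent to learning rate
    subsample=0.8,     # row subsampling; lower values reduce overfitting
    alpha=0.0,         # L1 regularization
    gamma=0.0,         # minimum loss reduction to split; higher = more conservative
    **{"lambda": 1.0}, # L2 regularization
)
```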
Seq2Seq (SUPERVISED)
  • Input format: RecordIO-protobuf, with tokens as integers (INT)
  • Processor/instance: GPU only (P3); can use multiple GPUs on a single machine; no multi-machine training
  • Use cases: machine translation, text summarization, speech-to-text; any use case where the input is a sequence and the output is a sequence
  • Comments: start with tokenized text files, then convert to RecordIO-protobuf; along with the training and validation data files, you must provide vocabulary files (for text seq2seq); uses RNNs and CNNs internally
  • HP: batch_size, optimizer, learning_rate, num_layers_encoder, num_layers_decoder; can optimize on accuracy, BLEU score (machine translation), or perplexity
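
A sketch of the corresponding hyperparameter block, assuming a seq2seq Estimator (`seq2seq`) built with the generic pattern shown earlier; `optimized_metric` is assumed to be the hyperparameter that selects among the metrics in the last bullet:

```python
seq2seq.set_hyperparameters(
    batch_size=64,
    optimizer="adam",
    learning_rate=0.0003,
    num_layers_encoder=1,
    num_layers_decoder=1,
    optimized_metric="bleu",  # or "accuracy" / "perplexity"
)
```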
DeepAR (SUPERVISED)
  • Input format: JSON Lines (optionally GZIP-compressed) or Parquet; each record contains:
    - start: the starting timestamp
    - target: the time-series values to learn/predict
  • Processor/instance: CPU (C4) or GPU (P3); multi-GPU on a single machine and multi-machine training are both supported
  • Use cases: stock price prediction, sales and promotion effectiveness; any time-oriented, single-dimension forecasting
  • Comments: uses RNNs; can train on several related time series at once, and the more series, the better the results, as it learns relationships between the time series; start with CPU (C4.2xlarge or higher) and move to GPU only if necessary, since only large models need GPU
  • HP: context_length (number of time points back in time the model learns from), epochs, batch_size, learning_rate, num_cells
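
A minimal sketch of the JSON Lines training format DeepAR expects; `start` and `target` are the documented fields, `cat` is an optional categorical feature, and the file name and values are made up:

```python
import json

# Two related time series; DeepAR learns relationships between them.
series = [
    {"start": "2024-01-01 00:00:00", "target": [112.0, 118.5, 121.3, 119.8]},
    {"start": "2024-01-01 00:00:00", "target": [45.1, 44.0, 47.2, 49.9], "cat": [1]},
]
with open("train.jsonl", "w") as f:
    for record in series:
        f.write(json.dumps(record) + "\n")  # one JSON object per line
```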
Blazing Text - Text Classification (SUPERVISED)
  • Input format: one sentence per line, space-tokenized (punctuation tokenized too), with the label prefixed, e.g.:
    "__label__1 this is a sentence with , punctuation also tokenized . one sentence per line . label at the start"
    The augmented manifest text format is also supported
  • Processor/instance: CPU or GPU; data < 2 GB: C5; data > 2 GB: single GPU (P2, P3); no multi-machine training
  • Use cases: web search and information retrieval; predicts labels for a sentence
  • HP: epochs, learning_rate, word_ngrams, vector_dim
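
A small helper sketching how to produce the line format above; the regex tokenizer is a deliberately simple stand-in for a real one:

```python
import re

def to_blazingtext_line(label: int, sentence: str) -> str:
    # Space-tokenize words and punctuation, lowercase, and prefix the label.
    tokens = re.findall(r"\w+|[^\w\s]", sentence.lower())
    return f"__label__{label} " + " ".join(tokens)

print(to_blazingtext_line(1, "This is a sentence, with punctuation."))
# -> "__label__1 this is a sentence , with punctuation ."
```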
Blazing Text - Word2Vec (UNSUPERVISED)
  • Input format: a text file with one sentence per line
  • Processor/instance: CPU or GPU (P3); CBOW and skip-gram run on a single CPU or GPU instance; batch skip-gram can scale across multiple CPU instances
  • Use cases: preparing input for NLP use cases; vectorization of text for machine translation and sentiment analysis; semantic similarity of words
  • Comments: represents words as vectors; semantically similar words are represented by vectors close to each other (semantic: of or relating to meaning in language)
  • Multiple modes:
    - CBOW (Continuous Bag of Words): the order of words does not matter
    - Skip-gram (n-grams): the order of words matters
    - Batch skip-gram: the order of words matters; supports distributed training
  • HP: mode (mandatory), learning_rate, window_size, vector_dim, negative_samples
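
A sketch of choosing a mode via hyperparameters, assuming a BlazingText Estimator (`bt`) built with the generic pattern shown earlier; values are illustrative:

```python
bt.set_hyperparameters(
    mode="batch_skipgram",  # or "cbow" / "skipgram" for single-instance training
    vector_dim=100,
    window_size=5,
    negative_samples=5,
    learning_rate=0.05,
)
```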
Object2Vec (SUPERVISED)
  • Input format: any object tokenized into integers (INT); training data consists of pairs or sequences of tokens
  • Processor/instance: CPU or GPU (M5, P2); single machine only, no multi-machine training
  • Use cases: collaborative recommendation systems, multi-label document classification, sentence embeddings
  • Comments: learns relations or associations:
    - sentence to sentence
    - labels to sequence (genre to description)
    - product to product (recommendation)
    - user to item (recommendation)
  • Uses CNNs and RNNs; the input passes through two encoders in parallel, and a comparator learns the associations between the encoder outputs
  • Encoder types:
    - Hierarchical CNN (HCNN)
    - BiLSTM
    - pooled_embedding
  • HP: dropout, early_stopping, epochs, learning_rate, batch_size, layers, activation function, optimizer, weight_decay
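
A sketch of one JSON Lines training record for Object2Vec, where `in0` and `in1` hold the integer token sequences fed to the two parallel encoders; the token ids and label are made up:

```python
import json

# A positive pair: the comparator learns that in0 and in1 are associated.
pair = {"label": 1, "in0": [6, 17, 606], "in1": [16, 21, 13]}
with open("train.jsonl", "w") as f:
    f.write(json.dumps(pair) + "\n")
```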
Object Detection (SUPERVISED)
  • Input format: MXNet RecordIO (NOT protobuf), or images (JPEG/PNG) plus a JSON manifest (one JSON per image) containing the annotations
  • Processor/instance: GPU only for training (P2, P3); multi-GPU on a single machine and multi-machine training are both supported; CPUs can be used for inference, not for training
  • Use cases: detect objects in an image; object tracking
  • Comments: uses a CNN with the Single Shot multibox Detector (SSD); transfer learning/incremental learning supported; applies flip, rescale, and jitter internally to avoid overfitting
  • HP: standard CNN hyperparameters such as learning_rate, batch_size, optimizer, etc.
Image Classification (SUPERVISED)
  • Input format:
    - Pipe mode: Apache MXNet RecordIO (NOT protobuf), for interoperability with other DNN frameworks
    - File mode: raw JPEG/PNG plus *.lst files, which associate image index, class label, and path to the image
    - To use images directly in Pipe mode, use the JSON-based augmented manifest format
  • Processor/instance: GPU for training (P2, P3); multi-GPU on a single machine and multi-machine training are both supported; CPU can be used for inference, and if that is not sufficient, move to GPU
  • Use cases: classify images into multiple classes (dog/cat/rat/tiger, etc.)
  • Comments:
    - Full training: a ResNet CNN is used, with the network initialized with random weights
    - Transfer learning/pre-trained: a network pre-trained on ImageNet is used, initialized with pre-trained weights; only the top fully connected (FC) layer is initialized with random weights
  • HP: batch_size, learning_rate, optimizer (and optimizer-specific parameters: beta_1, beta_2, eps, gamma)
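
A sketch of generating the *.lst file used in File mode: tab-separated image index, class label, and relative image path (the file names here are hypothetical):

```python
# Each line: <image index> <TAB> <class label> <TAB> <relative path>
samples = [("cats/cat_001.jpg", 0), ("dogs/dog_001.jpg", 1)]  # hypothetical paths
with open("train.lst", "w") as f:
    for idx, (path, label) in enumerate(samples):
        f.write(f"{idx}\t{label}\t{path}\n")
```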
Semantic Segmentation (SUPERVISED)
  • Input format: raw JPEG/PNG plus annotations in File mode; add the augmented manifest format for Pipe mode
  • Processor/instance: GPU only for training (P2, P3); multi-GPU on a single machine supported; no multi-machine training; inference can use CPU or GPU
  • Use cases: self-driving cars, medical imaging and diagnostics, robot sensing; given a pixel, which object does it belong to?
  • Comments:
    - Under the hood: GluonCV on MXNet; the algorithm choices are FCN (Fully Convolutional Network), PSP (Pyramid Scene Parsing), and DeepLabV3
    - Architecture: ResNet50/ResNet101, chosen via the "backbone" hyperparameter; both are trained on ImageNet data
    - Incremental/transfer learning allowed
  • Each of the three algorithms has two distinct components:
    - The backbone (or encoder): a network that produces reliable activation maps of features
    - The decoder: a network that constructs the segmentation mask from the encoded activation maps
  • The segmentation output is represented as a grayscale image, called a segmentation mask, with the same shape as the input image
  • HP: epochs, learning_rate, batch_size, algorithm, backbone
Random Cut Forest (UNSUPERVISED)
  • Input format: RecordIO-protobuf or CSV
  • Processor/instance: CPU only (M4, C4, C5)
  • Use cases: anomaly detection; detecting unexpected spikes in time-series data; some have used it for fraud detection
  • Comments: assigns an anomaly score to each data point; uses a forest of trees, looking at the expected change in tree complexity that results from adding a point; uses random sampling; RCF is also used in Kinesis Analytics in real time
  • HP: num_trees; num_samples_per_tree (choose so that 1/num_samples_per_tree approximates the ratio of anomalous to normal records in the dataset)
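
A sketch of the RCF hyperparameter guidance above, reusing the generic Estimator pattern; `feature_dim` is required by the algorithm, and the comment on `num_samples_per_tree` encodes the rule of thumb from the HP bullet:

```python
from sagemaker import image_uris
from sagemaker.estimator import Estimator

container = image_uris.retrieve("randomcutforest", session.boto_region_name)
rcf = Estimator(
    container,
    role="arn:aws:iam::111122223333:role/SageMakerRole",  # hypothetical role ARN
    instance_count=1,
    instance_type="ml.m4.xlarge",
)
rcf.set_hyperparameters(
    feature_dim=1,             # single-dimension time series
    num_trees=100,             # more trees smooth out noise in the anomaly score
    num_samples_per_tree=256,  # 1/256 ~ expected ratio of anomalous to normal points
)
```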

Neural Topic Model (NTM) (UNSUPERVISED)
  • Input format: RecordIO-protobuf or CSV; words must be tokenized to integers (INT), with an auxiliary channel for the vocabulary
  • Processor/instance: GPU for training (P2, P3); CPU or GPU for inference
  • Use cases: organize documents into topics; summarize documents based on topics
  • Comments: the algorithm is Neural Variational Inference; you define how many topics to group the documents into; used only on text
  • HP: num_topics, mini_batch_size, learning_rate, variation_loss (tuned at the expense of training time)
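
A sketch of tokenizing text into integer counts and writing the RecordIO-protobuf train channel; `CountVectorizer` is one convenient stand-in for the tokenization step, and `write_spmatrix_to_sparse_tensor` is the SDK helper for sparse protobuf:

```python
import io
from sklearn.feature_extraction.text import CountVectorizer
from sagemaker.amazon.common import write_spmatrix_to_sparse_tensor

docs = ["machine learning on aws", "topic models group documents"]  # toy corpus
vectorizer = CountVectorizer()                             # maps words to integer ids
counts = vectorizer.fit_transform(docs).astype("float32")  # doc x vocab count matrix

buf = io.BytesIO()
write_spmatrix_to_sparse_tensor(buf, counts)  # RecordIO-protobuf payload
buf.seek(0)
# Upload buf to S3 as the train channel; the vocabulary
# (vectorizer.get_feature_names_out()) goes to the auxiliary channel.
```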
LDA (Latent Dirichlet Allocation) (UNSUPERVISED)
  • Input format: RecordIO-protobuf (Pipe mode) or CSV; words must be tokenized to integers, with an auxiliary channel for the vocabulary
  • Processor/instance: single-instance CPU only (M4); no multi-GPU, no multi-machine training
  • Use cases: cluster customers based on purchases; harmonic analysis in music
  • Comments: classic LDA with open-source availability, not DNN-based; can process more than text, e.g., harmonic music analysis
  • HP: num_topics; alpha0 (small values give sparse topic mixtures, values > 1 give uniform topic mixtures)
kNN (k-Nearest Neighbors) (SUPERVISED)
  • Input format: RecordIO-protobuf or CSV (File or Pipe mode); the first column contains the label
  • Processor/instance: CPU or GPU
  • Use cases: classification and regression
  • Comments: SageMaker automates three steps:
    - Sample the data (the full dataset can't be used when it is huge)
    - Dimensionality reduction ("sign" or "fjlt" methods)
    - Build an index for looking up neighbors
  • HP: k, sample_size
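
A sketch of the kNN hyperparameters, including the dimensionality-reduction methods named above, assuming an Estimator (`knn`) built with the generic pattern; values are illustrative:

```python
knn.set_hyperparameters(
    k=10,
    sample_size=200000,               # number of points sampled to build the index
    predictor_type="classifier",      # or "regressor"
    feature_dim=50,
    dimension_reduction_type="fjlt",  # or "sign"
    dimension_reduction_target=25,    # reduced dimensionality
)
```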
K-Means (UNSUPERVISED)
  • Input format: RecordIO-protobuf or CSV (File or Pipe mode)
  • Processor/instance: CPU recommended (M4, M5, C4, C5); GPU possible
  • Use cases: cluster data (unsupervised); find groups of data points based on similarity
  • Comments: SageMaker implements web-scale K-Means; similarity is measured by Euclidean distance; works to optimize the centers of each of the k clusters
  • Algorithm:
    1) Determine the initial cluster centers in one of two ways: k-means++ (tries to make the initial centers far apart) or random
    2) Iterate over the data and recalculate the cluster centers
    3) Reduce from K to k clusters, using Lloyd's method or k-means++

    K is the number of extra cluster centers trained to improve accuracy, later reduced back to k: K = k * x (see the sketch below)
  • HP: k, mini_batch_size, extra_center_factor (x), init_method (k-means++ or random)
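
A sketch of the K = k * x relationship in hyperparameters, assuming an Estimator (`kmeans`) built with the generic pattern: with k=10 and extra_center_factor=4, training works with K=40 centers before reducing back to 10:

```python
kmeans.set_hyperparameters(
    feature_dim=20,
    k=10,                    # final number of clusters
    extra_center_factor=4,   # x: train with K = k * x = 40 centers, then reduce to k
    init_method="kmeans++",  # or "random"
    mini_batch_size=500,
)
```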
PCA (Principal Component Analysis) (UNSUPERVISED)
  • Input format: RecordIO-protobuf or CSV (File or Pipe mode)
  • Processor/instance: CPU or GPU
  • Use cases: dimensionality reduction; mitigates the curse of dimensionality
  • Comments: the reduced dimensions are called components; the first component captures the largest possible variability, the second the next largest, and so on; uses Singular Value Decomposition (SVD)
  • Two modes:
    - Regular: sparse data, moderate number of features and rows
    - Randomized: dense data, large number of rows and features; uses approximation algorithms
  • HP: algorithm_mode (regular or randomized); subtract_mean (unbiases the data)
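
A sketch of the two PCA modes expressed as hyperparameters, assuming an Estimator (`pca`) built with the generic pattern; values are illustrative:

```python
pca.set_hyperparameters(
    feature_dim=784,
    num_components=50,            # number of reduced dimensions to keep
    algorithm_mode="randomized",  # dense/large data; use "regular" for sparse data
    subtract_mean=True,           # unbias the data before decomposition
)
```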
Factorization Machines (SUPERVISED)
  • Input format: RecordIO-protobuf with Float32 data; CSV is not practical for sparse data, hence not supported
  • Processor/instance: CPU recommended; GPU not recommended, as the data is sparse and GPUs work better on dense data
  • Use cases: regression, classification, and recommendation, all in one general-purpose algorithm for sparse data; click prediction; item recommendation
  • Comments: limited to pairwise (2nd-order) interactions, e.g., user-to-item interactions
  • HP: initialization methods for the bias, factor, and linear terms:
    - methods: uniform, normal, or constant
    - the properties of each method can be tuned
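
Because CSV is not supported, sparse input has to be written as RecordIO-protobuf; a sketch using the SDK helper `write_spmatrix_to_sparse_tensor` on a toy interaction matrix (the data is made up):

```python
import io
import numpy as np
import scipy.sparse as sp
from sagemaker.amazon.common import write_spmatrix_to_sparse_tensor

# Toy sparse one-hot (user, item) feature rows with click labels
X = sp.csr_matrix(np.eye(4, dtype="float32"))  # sparse feature matrix
y = np.array([1, 0, 1, 0], dtype="float32")    # click / no-click labels

buf = io.BytesIO()
write_spmatrix_to_sparse_tensor(buf, X, labels=y)
buf.seek(0)
# Upload buf to S3 as the train channel (content type application/x-recordio-protobuf)
```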
IP Insights (UNSUPERVISED)
  • Input format: CSV only for training; JSON Lines, CSV, or JSON for inference; only IPv4 is supported
  • Processor/instance: GPU recommended (multi-GPU supported); CPU possible
  • Use cases: identify suspicious IP addresses in a security context: logins from anomalous IPs, accounts creating resources from anomalous IPs
  • Comments: uses a neural network to learn latent vector representations of entities and IP addresses; entities are hashed and embedded (use a large hash size); automatically generates negative samples by randomly pairing entities and IPs, since real data is highly imbalanced
  • HP:
    - num_entity_vectors (hash size; set to twice the number of unique entity identifiers)
    - vector_dim (size of the embedding vectors; scales the model size)
    - others: epochs, batch_size, learning_rate, etc.
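
A sketch of the headerless two-column CSV (entity, IP address) that IP Insights trains on; the entities and addresses are made up:

```python
import csv

rows = [
    ("user_alice", "192.0.2.10"),
    ("user_bob", "198.51.100.7"),
    ("user_alice", "192.0.2.11"),
]
with open("train.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)  # no header: entity_id,ipv4_address per line
```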
Reinforcement Learning (REINFORCEMENT LEARNING)
  • Input format: nothing specific to SageMaker
  • Processor/instance: GPU; multi-GPU on a single machine and multi-instance GPU training are recommended
  • Use cases: games, supply chain management, HVAC systems, industrial robotics, dialog systems, autonomous vehicles
  • Comments: supports Intel Coach and Ray RLlib on TensorFlow and MXNet; custom, commercial, and open-source environments are supported: MATLAB Simulink, EnergyPlus, Roboschool, PyBullet, Amazon Sumerian, AWS RoboMaker
  • HP: depend on the framework and algorithm used; nothing is tied to SageMaker

Algorithms categorized by feature

  • Here a feature is one of:
    • The required input data type
    • Whether the algorithm trains on CPU or GPU
    • Whether the algorithm can be trained incrementally
      • Incremental training lets you resume training, or retrain a DNN from a pre-trained model by replacing the final FC layer, i.e., transfer learning (a sketch follows this list)
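
A sketch of incremental training as described above: the artifacts of a previous training job are passed as an extra model channel (the S3 paths are hypothetical; the content type is the one AWS documents for model artifacts):

```python
from sagemaker.inputs import TrainingInput

model_channel = TrainingInput(
    "s3://my-bucket/prev-job/output/model.tar.gz",  # hypothetical prior artifacts
    content_type="application/x-sagemaker-model",
)
estimator.fit({
    "train": "s3://my-bucket/train",
    "validation": "s3://my-bucket/validation",
    "model": model_channel,  # resume / fine-tune from the previous model
})
```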
  • Mandatory FLOAT32: Linear Learner, Factorization Machines
  • Mandatory INT32: Seq2Seq, Object2Vec, NTM
  • CPU only: XGBoost, RCF, LDA
  • GPU only: Seq2Seq, Image Classification, Semantic Segmentation, Object Detection, NTM, Reinforcement Learning
  • Incremental training available: Image Classification, Semantic Segmentation, Object Detection

Algorithms that support distributed training

  • Mnemonic to remember: F-SKILBDR, as in SkillBuilder (if you can come up with something better, please share it in the comments!)
  • An entirely new blog post explaining distributed training and how it works is coming soon!
Distributed Training Support
Factorization Machines
Seq2Seq
K-Means
IP Insights
Linear Learner (Not LDA)
Blazing Text - Word2Vec
DeepAR
RCF - Random Cut Forests

Leave a comment to show your appreciation for the article or to request a change. Thanks!