The buzz around the term “Machine Learning” is hard to miss. Machine Learning is a subset of Artificial Intelligence, and we encounter it many times a day, often without realizing it. Machine learning systems learn patterns from data, improve as they are fed more of it, and make effective decisions with little human involvement. Product recommendations on e-commerce sites, spam filtering in email, virtual assistants like Siri and Alexa, and self-driving cars are a few of the many examples of machine learning around us.
To get more familiar with the science behind it all, let us go through the crucial terms and subfields of the machine learning vocabulary:
Dataset is a collection of examples, typically rows of data, that machine learning models are trained and evaluated on. The examples in the dataset may be labeled or unlabeled.
Label is the value that we want to predict, also called the target. In supervised learning, the model learns to predict the label by training on labeled data, in which each example consists of features and a label.
The dataset is divided into separate portions called training data and testing data. Training is the process of making the ML model learn good values for its parameters from the training data.
Testing is performed once, after training of the model is complete. The trained model is given only the features of the test examples and predicts their labels, which are then compared with the true labels.
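As a rough illustration of the split described above, here is a minimal sketch in plain Python. The function name `train_test_split`, the 20% test fraction, and the toy dataset are assumptions made for this example, not part of any particular library.

```python
import random


def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle the examples and split them into training and testing portions."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical labeled dataset: each example is (features, label).
dataset = [((hours, hours * 2), hours >= 5) for hours in range(10)]

train_data, test_data = train_test_split(dataset)

# At test time the model sees only the features; the labels are held back
# and used afterwards to check the predictions.
test_features = [features for features, _ in test_data]
test_labels = [label for _, label in test_data]
```

Shuffling before splitting is a common choice so that neither portion ends up containing only one kind of example.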
Accuracy is the measure of how correctly the model has predicted the test data labels. It is obtained by dividing the number of correctly predicted observations by the total number of test observations.
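The accuracy calculation can be sketched in a few lines of Python. The function name `accuracy` and the spam/ham labels are made up for this example.

```python
def accuracy(predicted_labels, true_labels):
    """Fraction of test observations whose label was predicted correctly."""
    if len(predicted_labels) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    if not true_labels:
        raise ValueError("at least one test observation is required")
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)


# Hypothetical predictions for five test emails.
predicted = ["spam", "ham", "spam", "ham", "ham"]
actual = ["spam", "ham", "ham", "ham", "ham"]

score = accuracy(predicted, actual)  # 4 of the 5 predictions match
```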
Loss, also known as cost, is a measure of how much the predictions made by the model differ from the true labels. The lower the loss, the better the model is performing. Loss is measured through a loss function, which is chosen based on the specific machine learning technique.
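To make the idea of a loss function concrete, the sketch below uses mean squared error, one common choice for numeric predictions. It is only one of many possible loss functions, and the sample numbers are invented for illustration.

```python
def mean_squared_error(predictions, targets):
    """Average of the squared differences between predictions and true values."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


targets = [3.0, 5.0, 7.0]
close_predictions = [2.5, 5.0, 7.5]  # near the true values
far_predictions = [0.0, 9.0, 2.0]    # far from the true values

loss_close = mean_squared_error(close_predictions, targets)
loss_far = mean_squared_error(far_predictions, targets)
# The better predictions produce the smaller loss.
```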
An epoch is one complete pass of the ML algorithm over the entire training data, so the number of epochs is the number of such passes. Training for too many epochs can cause the model to overfit the training data and generalize badly to new data.
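The following sketch shows where epochs appear in a training loop. It assumes a deliberately tiny model, y = w * x with a single parameter w, trained by gradient descent on squared error. The learning rate and data are arbitrary choices for this example, and it illustrates only what an epoch is, not overfitting.

```python
def train(xs, ys, epochs, learning_rate=0.01):
    """Fit y = w * x by gradient descent, recording the loss after each epoch."""
    w = 0.0
    loss_history = []
    for _ in range(epochs):
        # One epoch: every training example is visited exactly once.
        for x, y in zip(xs, ys):
            error = w * x - y
            w -= learning_rate * 2 * error * x  # gradient of error**2 w.r.t. w
        loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
        loss_history.append(loss)
    return w, loss_history


# Toy training data generated from y = 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w, loss_history = train(xs, ys, epochs=20)
# The loss shrinks from one epoch to the next as w approaches 2.
```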
Supervised learning is a subcategory of machine learning that uses labeled data to train its models. The main objective behind supervised learning is to correctly predict the labels of new input data presented to the model. The two major subcategories of supervised learning are classification and regression.
Unsupervised learning is the counterpart of supervised learning, in which models work without labels or any other kind of supervision. It uses unlabeled data and extracts previously unknown information, such as groupings or structure, from the characteristics of the data. Clustering and principal component analysis are the major kinds of unsupervised learning techniques.
Regression is a supervised learning technique used for predicting continuous-valued outcomes. It models an outcome variable Y as a function of one or more predictor variables X and fits an equation of that form to the data. Linear regression is the most common type. Logistic regression, despite its name, predicts the probability of a class and is mostly used for classification. Predicting pricing trends is an example of regression.
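As a small worked example of regression, the sketch below fits a straight line Y = slope * X + intercept using the standard least-squares formulas. The function name `fit_line` and the size and price figures are hypothetical values chosen so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: floor area (X) and price (Y), where price = 3 * area.
sizes = [50.0, 80.0, 100.0, 120.0]
prices = [150.0, 240.0, 300.0, 360.0]

slope, intercept = fit_line(sizes, prices)

# Predict a continuous outcome for an unseen input.
predicted_price = slope * 90.0 + intercept
```

With several predictor variables the same idea applies, but the fit is usually done with matrix methods or a library rather than these one-variable formulas.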