
Model Evaluation Techniques

A machine learning model is trained and then asked to predict outputs. If we train the model on a dataset and test it on that same dataset, we will get almost 100% accuracy, yet the model may still fail to predict the correct output on unseen data. This problem is called overfitting; a good model should also work well on data it has never seen. Model evaluation, then, means calculating the model's accuracy score on unseen data. Evaluation methods fall into two broad categories: holdout and cross-validation.

Holdout: 

In this method, the model is trained on one portion of the data and tested on a different, unseen portion. That is, we divide the original dataset into training data and testing data, commonly in an 80:20 ratio. It involves the following steps.
  1. Divide the dataset into training and testing data (generally 80:20).
  2. Train the model using the training data.
  3. Test the model using the testing data, and calculate the accuracy score on this unseen data.


The holdout method is attractive for its simplicity, speed, and flexibility: we train the model and calculate the accuracy score only once.
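The three holdout steps above can be sketched in a few lines. This is a minimal illustration using scikit-learn's `train_test_split`; the Iris dataset and the decision-tree classifier are just placeholder choices, not part of the method itself.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Step 1: divide the dataset 80:20 into training and testing data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Step 2: train the model using only the training data.
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Step 3: calculate the accuracy score on the unseen testing data.
y_pred = model.predict(X_test)
print(accuracy_score(y_test, y_pred))
```

Note that `random_state` fixes the random split so the result is reproducible; without it, each run holds out a different 20% of the data.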

Cross-Validation:

In this technique, the original dataset is divided into more than one part, an accuracy score is calculated for each part, and the average of all scores is taken as the final accuracy score of the model. It has several variants, but the most commonly used is K-fold cross-validation, which performs the following steps.
  1. Divide the original dataset into k approximately equal parts. The value of k is user-defined, and generally it is between 5 and 10.
  2. Repeat steps 3 and 4 k times.
  3. One part (a different one each time) is used as testing data, and the remaining k-1 parts are used as training data to train the model.
  4. Calculate the accuracy score for that model.
  5. This yields k accuracy scores; their average is the final accuracy score of the model.
The process is easy to understand with a picture:

[Figure: K-fold cross-validation with K = 10 — each fold is used once as the test set, and the ten scores are averaged at the end.]

In the figure, the value of K is 10, and the average of all ten scores is taken at the end. Because every observation is used for testing exactly once, this method reduces the risk of an overfit evaluation and gives a more reliable estimate of the model's effectiveness.
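The five K-fold steps can be written out explicitly with scikit-learn's `KFold` splitter. This is a sketch, again using the Iris dataset and a decision tree purely as stand-ins; here k is 5, but any value in the usual 5-to-10 range works the same way.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
k = 5  # user-defined number of folds

# Step 1: divide the dataset into k approximately equal parts.
kf = KFold(n_splits=k, shuffle=True, random_state=42)

scores = []
# Step 2: repeat steps 3 and 4 k times.
for train_idx, test_idx in kf.split(X):
    # Step 3: one fold is the testing data; the other k-1 folds train the model.
    model = DecisionTreeClassifier(random_state=42)
    model.fit(X[train_idx], y[train_idx])
    # Step 4: calculate the accuracy score for this fold.
    scores.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))

# Step 5: the final score is the average of the k fold accuracies.
print(np.mean(scores))
```

In practice the loop can be replaced by a single call to `cross_val_score(model, X, y, cv=k)`, which returns the same array of per-fold scores.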

Conclusion:

  1. Evaluating a model means calculating its accuracy on unseen data.
  2. The two main evaluation techniques are holdout and cross-validation.
