Hyperparameter Tuning the Weighted Average Ensemble in Python

Jinhang Jiang
2 min read · Nov 15, 2021

Introduction

In a previous blog, “Simple Weighted Average Ensemble | Machine Learning,” I showed how to build a weighted average ensemble from multiple classifiers, each capturing different aspects of the data, to boost model performance. In this blog, I will go one step further and demonstrate how to tune the hyperparameters of your ensemble in Python.

Data & Model

You can find the data here: https://www.kaggle.com/c/homesite-quote-conversion/data

You can find the code for implementing the ensemble here: https://medium.com/analytics-vidhya/simple-weighted-average-ensemble-machine-learning-777824852426

We are going to use the VotingClassifier from the sklearn library to conduct the hyperparameter tuning. Its documentation is here: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.VotingClassifier.html

Code

Step 1

First, let’s load the VotingClassifier from sklearn and fit it with the classifiers trained earlier: Decision Tree, K-Nearest Neighbors, Multi-Layer Perceptron, Random Forest, and XGBoost. We omit the “weights” parameter, so every classifier counts equally, to get a base score.
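A minimal sketch of this step, with assumptions: a toy dataset stands in for the Kaggle data, and sklearn’s GradientBoostingClassifier stands in for XGBoost so the snippet runs without extra dependencies.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (VotingClassifier, RandomForestClassifier,
                              GradientBoostingClassifier)
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

# Toy data standing in for the Homesite dataset
X, y = make_classification(n_samples=500, n_features=20, random_state=42)

estimators = [
    ("dt", DecisionTreeClassifier(random_state=42)),
    ("knn", KNeighborsClassifier()),
    ("mlp", MLPClassifier(max_iter=500, random_state=42)),
    ("rf", RandomForestClassifier(random_state=42)),
    ("gb", GradientBoostingClassifier(random_state=42)),  # stand-in for XGBoost
]

# No "weights" argument: each classifier gets an equal vote, giving a base score
ensemble = VotingClassifier(estimators=estimators)
scores = cross_val_score(ensemble, X, y, cv=5)
print(f"Base CV score: {scores.mean():.4f} (+- {scores.std():.4f})")
```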


The average base cross-validation score is 0.9048 (+- 0.0487)

Step 2

Let’s use GridSearchCV to find the best parameters and the best score. According to the sklearn documentation: if voting is set to “hard,” the majority voting rule is used; if voting is set to “soft,” the argmax of the sums of the predicted probabilities is used, and this method is recommended for an ensemble of well-calibrated classifiers.
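A sketch of the grid search, under assumptions: a toy dataset and a trimmed three-classifier ensemble keep it fast, and the candidate weight tuples are illustrative, not the grid used in the post.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=20, random_state=42)

ensemble = VotingClassifier(estimators=[
    ("dt", DecisionTreeClassifier(random_state=42)),
    ("knn", KNeighborsClassifier()),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
])

# Tune the voting scheme and the per-classifier weights together
params = {
    "voting": ["hard", "soft"],
    "weights": [(1, 1, 1), (1, 1, 2), (2, 1, 2), (1, 2, 2)],
}

grid = GridSearchCV(ensemble, params, cv=5)
grid.fit(X, y)
print("Best params:", grid.best_params_)
print(f"Best CV score: {grid.best_score_:.4f}")
```

Each weight tuple lines up positionally with the estimator list, so a weight of 2 doubles that classifier’s influence on the vote.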


The best params are: {‘voting’: ‘soft’, ‘weights’: (1, 1, 1, 2, 2)}

The average tuned cross-validation score is 0.9188 (+- 0.0341)

Conclusion

This blog showed how to optimize your ensemble in Python. The VotingClassifier facilitates hyperparameter tuning for your ensemble: the tuned cross-validation results had a higher average score and a much smaller standard deviation, which means the ensemble performs better and more stably. There is a similar class in sklearn for regression problems, called “VotingRegressor.” The details can be found here: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.VotingRegressor.html
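For the regression case, a minimal VotingRegressor sketch (estimator choices and weights here are illustrative, not from the post):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)

# VotingRegressor averages the individual predictions, weighted if requested
reg = VotingRegressor(
    estimators=[
        ("lr", LinearRegression()),
        ("dt", DecisionTreeRegressor(random_state=42)),
        ("rf", RandomForestRegressor(n_estimators=50, random_state=42)),
    ],
    weights=(1, 1, 2),
)
reg.fit(X, y)
print(reg.predict(X[:3]))
```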

Please feel free to connect with me on LinkedIn.

Related Reading

Simple Weighted Average Ensemble | Machine Learning
