
Most Important Steps to Win Any Machine Learning Competition

Reading the blogs of winners of online competitions has shown me the approaches they follow and how they build their solutions. These are the most important points I try to follow when I take on a new machine learning challenge:

1. Understand the data: Study the data thoroughly, including what each feature means and how it is related to the target variable. Exploring the data with plots gives a first idea of how the features behave.
2. Data cleaning: Impute missing values and pre-process the data.
3. Feature engineering: This is the most important step, and good feature engineering can give you an edge over others in any competition. It is also the hardest step to get right, and it comes only with experience and practice.
4. Splitting your data: Decide on a strategy for splitting your data into training and validation sets. A validation set whose score tracks your leaderboard score is crucial for fine-tuning your model.
5. Benchmark algorithm: Create a benchmark score with a standard algorithm so that you have something to improve on. Basic algorithms such as SVM, GBM, or Lasso regression with standard parameters work well for this purpose.
6. Choosing the best algorithm: Try removing less important features, because they can make the model overfit. Build the model with a suitable algorithm; boosting algorithms tend to win many competitions. Also try an ensemble of two or more algorithms to make the model perform better on new data. Choose the final algorithm based on its score on the validation set.
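
As a rough illustration of steps 2, 4, 5 and 6, here is a minimal Python sketch using only the standard library. Everything in it is an assumption made for the example: the synthetic data, the column names "x" and "y", the helper names, and the two toy models (a mean predictor as the benchmark and a one-feature least-squares line as the candidate). In a real competition you would normally use a library such as scikit-learn and much stronger models; the point here is only the workflow of splitting, benchmarking, and choosing by validation score.

```python
import random
import statistics

def train_valid_split(rows, valid_fraction=0.25, seed=0):
    """Step 4: shuffle once with a fixed seed, then hold out a validation set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_fraction)
    return shuffled[n_valid:], shuffled[:n_valid]

def column_mean(rows, col):
    """Mean of a column, ignoring missing values (None)."""
    return statistics.mean(r[col] for r in rows if r[col] is not None)

def fill_missing(rows, col, value):
    """Step 2: replace missing values in one column with a given value."""
    return [{**r, col: value if r[col] is None else r[col]} for r in rows]

def rmse(y_true, y_pred):
    """Root mean squared error, used here as the validation score."""
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5

def fit_mean_baseline(train, target):
    """Step 5: benchmark model that always predicts the training mean."""
    mean = statistics.mean(r[target] for r in train)
    return lambda row: mean

def fit_simple_linear(train, feature, target):
    """Candidate model: least-squares line on a single feature."""
    xs = [r[feature] for r in train]
    ys = [r[target] for r in train]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda row: slope * row[feature] + intercept

# Made-up data: y depends linearly on x plus noise; every 20th x is missing.
rng = random.Random(42)
rows = []
for i in range(200):
    x = rng.uniform(0, 10)
    y = 3.0 * x + 2.0 + rng.gauss(0, 1)
    rows.append({"x": None if i % 20 == 0 else x, "y": y})

# Split first, then compute the fill value on the training set only,
# so no information from the validation set leaks into training.
train, valid = train_valid_split(rows)
fill_value = column_mean(train, "x")
train = fill_missing(train, "x", fill_value)
valid = fill_missing(valid, "x", fill_value)

models = {
    "mean baseline": fit_mean_baseline(train, "y"),
    "linear": fit_simple_linear(train, "x", "y"),
}

# Step 6: score every model on the same validation set and keep the best.
y_valid = [r["y"] for r in valid]
scores = {name: rmse(y_valid, [model(r) for r in valid]) for name, model in models.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: validation RMSE = {score:.3f}")
print("best model:", best)
```

The same loop extends naturally: add more entries to `models` (or an average of several models as a simple ensemble) and compare them all on the same held-out rows.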

For a brief explanation of standard machine learning algorithms such as Gradient Boosting (GBM), SVM, Decision Trees, Random Forest, XGBoost, and regression techniques, follow this link: ML Algorithms
