3rd Place Winning Solution Approach to the Mobile Money and Financial Inclusion in Tanzania Challenge

raheem nasirudeen
Jul 8, 2019

I'm writing up the approach behind my 3rd-place finish out of 164 participants in a competition that ran for about 4 months.

We all know machine learning competitions can be challenging, fun and educational, and the winners get prizes at the end. The funniest thing about me: I kept checking the leaderboard daily. So much fun!

Artificial intelligence has become the talk of the technology world, and data science has been called the sexiest job of the 21st century. Data science competitions are one of the best ways to show your machine learning skills on real-world problems and to learn more than tutorial videos teach: those usually use completely clean data and focus on the training set (seen data), so overfitting on the test set (unseen data) often goes unnoticed. Let me show my approach below.

Forget about being a polytechnic student

PROBLEM DESCRIPTION

The competition was hosted on Zindi.africa.

Only 16.7% of the population in Tanzania has a bank account. But an additional 48.6% of Tanzanians who don’t have a bank account do have other types of formal financial services, primarily mobile money.

For people who have been traditionally excluded from the formal financial system in Africa and other developing markets, mobile money has become an important entry point to financial inclusion. While mobile money is a tool for transferring money among people and businesses/other institutions, it is increasingly becoming a platform for people to access a broad range of financial services, including savings, credit, and insurance.

The objective of this competition is to create a machine learning model to predict which individuals are most likely to use mobile money and other financial services (savings, credit, and insurance).

N.B.: this post covers the approach; the code implementation is in the GitHub repo.

Data Heading

The main goal is to classify each individual's mobile money usage into one of four classes, which makes this a multiclass classification problem (background reading: https://en.wikipedia.org/wiki/Multiclass_classification).

Approach

  1. Using the raw dataset to predict
  2. Feature engineering
  3. Believing too much in the Extreme Gradient Boosting (XGBoost) algorithm
  4. More feature engineering (enriching the data using ArcGIS and researching domain knowledge)
  5. Ensembling using a weighted average

I started the competition by making predictions from the raw features provided, relying heavily on algorithms like XGBoost and LightGBM to get better predictions and reach the top of the leaderboard. None of it worked well, and two months into the competition I had dropped drastically to 54th position. N.B.: what I observed is that the raw data alone is rarely enough to stay on top among over 150 participants, because it is all anyone has access to at the start of the competition. The most important part is to use the raw data to find patterns that can improve the model, crucially through feature engineering.

Always be conscious of the data

Feature engineering from the latitude and longitude went a long way. After researching how to generate more features from latitude and longitude data, I created distance features using the Haversine and Manhattan distances, which moved me back into the top 30 of the competition. That was not enough, and generating more features took a lot of investigation, which led me to research Tanzania as a country in Africa. I was eventually able to map the latitude and longitude to a location and create region and district features, which moved me into the top 20 of the leaderboard. Next, thanks to the 5th-place finisher Eniola (@galileoeni), who told us about enriching the data with ArcGIS through a YouTube tutorial (https://youtu.be/8i6hQn5yY1s), I finally moved into the top 10 of the leaderboard.
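
One simple way to turn raw coordinates into region labels is a nearest-centroid lookup by great-circle (Haversine) distance, sketched below. This is not necessarily how the original region and district features were built: the region names and coordinates in CENTROIDS are hypothetical placeholders, assign_nearest_region is a hypothetical helper, and a nearest-centroid rule only approximates a proper point-in-polygon lookup against official boundary data.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

# HYPOTHETICAL lookup table: placeholder names and coordinates for illustration
# only. Real region/district labels would come from an official boundary dataset.
CENTROIDS = {
    "region_A": (-6.8, 39.3),
    "region_B": (-3.4, 36.7),
    "region_C": (-2.5, 32.9),
}


def assign_nearest_region(lats, lons, centroids=CENTROIDS):
    """Label each (lat, lon) point, in degrees, with its closest centroid."""
    names = list(centroids)
    c_lat = np.radians([centroids[n][0] for n in names])  # shape (k,)
    c_lon = np.radians([centroids[n][1] for n in names])
    lat = np.radians(np.asarray(lats, dtype=float))[:, None]  # shape (n, 1)
    lon = np.radians(np.asarray(lons, dtype=float))[:, None]
    # Haversine formula broadcast to an (n, k) matrix of distances in km.
    a = (np.sin((c_lat - lat) / 2) ** 2
         + np.cos(lat) * np.cos(c_lat) * np.sin((c_lon - lon) / 2) ** 2)
    dist_km = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))
    # Index of the nearest centroid for every point.
    return [names[i] for i in dist_km.argmin(axis=1)]


print(assign_nearest_region([-6.7, -2.6], [39.2, 33.0]))
```

The resulting labels can then be used as categorical features, which CatBoost in particular can consume directly.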

Haversine method
After geocoding and using ArcGIS to enrich my data
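
The Haversine and Manhattan distance features can be sketched as follows. Haversine gives the great-circle distance between two points; the "Manhattan" variant here adds a north-south leg and an east-west leg, each measured with Haversine, which is one common convention for coordinates and is assumed rather than taken from the original code. REF_LAT/REF_LON is a hypothetical reference point and distance_features is a hypothetical helper.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def manhattan_km(lat1, lon1, lat2, lon2):
    """'Manhattan' distance: north-south leg plus east-west leg, each via Haversine."""
    north_south = haversine_km(lat1, lon1, lat2, lon1)
    east_west = haversine_km(lat2, lon1, lat2, lon2)
    return north_south + east_west


# HYPOTHETICAL reference point (e.g. a city centre) to measure distances from.
REF_LAT, REF_LON = -6.8, 39.3


def distance_features(lat, lon):
    """Hypothetical helper: both distances from one point to the reference point."""
    return {
        "haversine_km": haversine_km(lat, lon, REF_LAT, REF_LON),
        "manhattan_km": manhattan_km(lat, lon, REF_LAT, REF_LON),
    }


print(distance_features(-3.0, 36.0))
```

Applied to every row, these give each respondent a distance to the chosen reference point; several reference points produce several features.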

To improve my model further and make sure fellow participants didn't push me down the leaderboard, I applied the "No Free Lunch" idea by trying every algorithm I could lay my hands on, since there is no single best model for all data. After all the feature engineering, XGBoost, LightGBM and RandomForest gave no further improvement, so I concluded that CatBoost was the algorithm that improved my model drastically on the data available.

In the end, my best solution was a weighted-average ensemble of CatBoost models trained with different parameter settings, with the predictions rounded to 3 decimal places.
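
A weighted-average ensemble blends the class-probability outputs of several models. The sketch below assumes two hypothetical CatBoost runs: p_run1 and p_run2 are made-up probabilities, the 0.6/0.4 weights are illustrative, and weighted_average is a hypothetical helper.

```python
import numpy as np


def weighted_average(prob_list, weights):
    """Blend per-class probability matrices of shape (n, k) with non-negative weights."""
    probs = np.stack([np.asarray(p, dtype=float) for p in prob_list])  # (m, n, k)
    w = np.asarray(weights, dtype=float)
    if probs.shape[0] != w.shape[0]:
        raise ValueError("need exactly one weight per model")
    w = w / w.sum()  # normalise so each blended row still sums to 1
    return np.tensordot(w, probs, axes=1)  # weighted sum over models -> (n, k)


# HYPOTHETICAL outputs of two CatBoost runs with different parameters.
p_run1 = np.array([[0.10, 0.20, 0.30, 0.40],
                   [0.70, 0.10, 0.10, 0.10]])
p_run2 = np.array([[0.20, 0.20, 0.20, 0.40],
                   [0.50, 0.30, 0.10, 0.10]])

blend = weighted_average([p_run1, p_run2], weights=[0.6, 0.4])
submission = np.round(blend, 3)  # round the final probabilities to 3 decimals
print(submission)
```

In practice the weights would be chosen on validation data rather than fixed by hand.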

The best solution code can be simple; it is not all about the code but about the desired result.

New notebook overview of the different versions (laugh)

N.B.: below are some techniques I tried on this data that did not perform better here but may do better in other competitions.

  1. More feature interactions
  2. Using K-fold validation
  3. Doing Principal Component Analysis (PCA)

This is how far I have come since finishing 102/144 in the Inter-campus Machine Learning 2018 competition by Data Science Nigeria. So inspiring to me.

A little advice for newbies: machine learning is a field for everyone, irrespective of gender, qualifications, age, environment and more. Learn the basics of any tool (for example, Python or R), more statistics, pattern recognition and a lot more in the long run.

Keep getting your hands dirty with datasets. zindi.africa, kaggle.com, machinehack.com and http://analyticsvidhya.com are all there to keep practicing with.

Connect with me on @Nasereliver

GitHub repo: https://github.com/nasirudeenraheem/Mobile-Money-and-Financial-Inclusion-in-Tanzania-Challenge

Thanks to @DataScienceNIG and @AISaturdaysIB for moving the artificial intelligence community in Nigeria forward.
