Data is a tool; how you use it is the art

Data visualization is not only about expressing your data in visual formats (infographics, images, clip art, charts, etc.); it also means ensuring that these formats help consumers easily internalize the patterns and insights identified in the data, and the stories or realities behind each of them. Simply put, data visualization holds the key to unraveling the truth embedded in the data through relatable data stories.

Data Storytelling

The concept of using stories to relay the realities embedded in your data is called data storytelling. This concept holds…


Be ‘visibly’ sure there are clusters in that dataset before profiling what the main themes are.

Customer segmentation, audience segmentation, and similar analyses are cost-effective ways of making evidence-based decisions by finding themes in unlabeled data. However, running a cluster analysis on a dataset and acting on the results without first verifying that clusters actually exist can be far more damaging and costly.

In this article, we will explain how you can use two visualization algorithms (VAT and iVAT) to assess the clustering tendency of an unlabeled dataset in Python before commencing any cluster analysis.

To look at the…
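
As a rough illustration of the idea behind VAT (a minimal sketch, not the article's own code): VAT reorders a pairwise dissimilarity matrix so that similar points sit next to each other. When the reordered matrix is drawn as a grayscale image, dark square blocks along the diagonal suggest that clusters may exist. The toy points, the function names, and the standard-library-only approach are all assumptions made for illustration; iVAT's additional distance transform is not shown.

```python
import math


def pairwise_distances(points):
    """Euclidean dissimilarity matrix as a list of lists."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def vat_order(dist):
    """Return the VAT ordering of indices for a square dissimilarity matrix."""
    n = len(dist)
    # Start from one endpoint of the largest dissimilarity in the matrix.
    start = max(range(n), key=lambda i: max(dist[i]))
    order = [start]
    remaining = set(range(n)) - {start}
    # nearest[j] = distance from j to the closest point already in the ordering
    nearest = {j: dist[start][j] for j in remaining}
    while remaining:
        # Append the unordered point that is closest to the ordered set.
        nxt = min(remaining, key=lambda j: (nearest[j], j))
        order.append(nxt)
        remaining.remove(nxt)
        for j in remaining:
            nearest[j] = min(nearest[j], dist[nxt][j])
    return order


def reorder(dist, order):
    """Apply the same permutation to the rows and columns of the matrix."""
    return [[dist[i][j] for j in order] for i in order]


# Hypothetical toy data: two well-separated groups, deliberately interleaved.
points = [(0.0, 0.0), (10.0, 10.0), (0.5, 0.2), (10.3, 9.8), (0.1, 0.6), (9.7, 10.4)]
dist = pairwise_distances(points)
order = vat_order(dist)
vat_image = reorder(dist, order)  # plot this matrix as an image to inspect it
```

After reordering, the three points of each group end up adjacent, so `vat_image` contains two blocks of small distances on its diagonal. On real data you would plot the matrix (for example with matplotlib's `imshow`) and judge the blocks by eye.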


Test each predictor feature statistically at the bivariate level before including it in the model

Did you know? Using boxplots, scatterplots, and other visualization tools is awesome, but it is never enough to confirm that a relationship exists between two variables (the target feature (Y) and each of the predictor features (X)).

I know you may ask, why?

Well, because a statistician may argue that the visual pattern you are seeing in your sample does not truly hold in the population your sample represents. …


Visualization is never enough to conclude that a relationship exists between two variables. It is best to confirm it statistically.

When preparing data for modeling or analysis, a common practice is to visualize the target feature (Y) against each of the predictor features (X) to roughly see whether any relationship/association exists between them. The truth is, visualization alone is never enough to conclude that a relationship/association exists between any two variables. No, it is not. The preferred standard is to combine the bivariate visualization with a statistical test.

The gold standard: “Use data visualization tools (charts/two-way tables) to visualize the possible existence of a relationship or association between the two features, then conduct…
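
The preview is cut off before it names a specific test, so the following is only an assumed example: for two categorical features summarized in a two-way table, a chi-square test of independence is one common choice. This minimal sketch handles a 2x2 table with the standard library only; the function name and the counts are hypothetical.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value). No continuity correction is applied.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # A 2x2 table has 1 degree of freedom; for 1 df the chi-square
    # upper-tail probability equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: rows = predictor category, columns = target category.
stat, p_value = chi_square_2x2([[30, 10], [10, 30]])
null_stat, null_p = chi_square_2x2([[20, 20], [20, 20]])
```

A small p-value (conventionally below 0.05) suggests that the association seen in the chart is unlikely to be a sampling accident. In practice you would more likely call `scipy.stats.chi2_contingency`, which handles larger tables and applies a continuity correction to 2x2 tables by default.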


Starting out in an unfamiliar domain/field should be easy

Imagine being hired as the Data Manager Consultant for a malaria-focused clinical study when you know close to nothing about malaria and its associated metrics/indicators.

How then do you navigate through this unfamiliar terrain and succeed in your role?

Before proceeding to the main aim of this article, let me quickly answer a pertinent question that may be on the minds of those who are strictly pro data science: How is a Data Manager different from a Data Scientist?

My short answer: the two terms are related but not the same.

A Data Scientist is…


Meet Jane

Jane Doxney is a young, dynamic, brilliant, and intellectually sound individual who rose quickly from being a machine learning enthusiast to a seasoned machine learning expert. In an instant, she had become a star of internationally recognized competitions and had received several awards and accolades in client behavior, satisfaction, and intention modelling.

Jane and BrownTech

Jane caught the eye of Brown Foxy, an investor and entrepreneur who was keen on investing in the next big thing. Foxy had redirected all his investments to his new start-up, BrownTech, and had engaged Jane to help lead the data-driven arm. It was all rosy from…


In this article, we will state the appropriate criteria for applying k-fold cross-validation to an imbalanced class distribution problem, and demonstrate how to implement it in Python in a Jupyter notebook.

This article will cover the following sections:

A Quick Overview Of The K-Fold Cross-Validation

Overview Of Data Source

Rules For Correctly Applying The K-Fold Cross-Validation On An Imbalanced Class Distribution Model

How To Apply A K-Fold Cross-Validation On An Imbalanced Classification Problem In Python

A Quick Overview Of The K-Fold Cross-Validation

It is statistically unreliable to evaluate the performance of a model just once. It is best to repeat the performance evaluation…
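
The preview does not show the author's own criteria, so take this as an assumed illustration of one widely used safeguard: stratification, which keeps the class ratio roughly the same in every fold so that no fold ends up without minority-class rows. The sketch below uses only the standard library; the function name and the toy labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds keep the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's rows round-robin so every fold gets its share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Hypothetical imbalanced target: 10 positives and 90 negatives.
labels = [1] * 10 + [0] * 90
splits = list(stratified_k_fold(labels, k=5))
minority_per_fold = [sum(labels[i] for i in test) for _, test in splits]
```

With these toy labels every test fold holds 2 positives and 18 negatives, matching the overall 10% positive rate. In a real project, scikit-learn's `StratifiedKFold` provides the same behavior and plugs directly into its model evaluation helpers.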


In this article, we use real-world data from the Nigeria Demographic and Health Survey to develop a logistic regression classification model that predicts modern contraceptive use in rural areas of Nigeria.

This article will have the following sections:

Overview of analysis, and data source

Loading data into Python

Data and features description

Selecting the rural respondents sample from the whole dataframe

Univariate (single feature) exploratory data analysis

Feature Engineering

Bivariate (two-feature) exploratory data analysis and test of association

Checking features for missing values

Dummy coding the features

Class imbalance…

Ayobami Akiode

Researcher|M&E Expert|Data analyst|ML|Business Intelligence|CAPI Expert|Writer| STATA|— Just write it.
