Data is a tool; how you use it is the art
Data visualization is not only about expressing your data in visual formats (infographics, images, clip art, charts, etc.); it extends to ensuring that these formats help consumers easily internalize the patterns and insights identified in the data, and the stories or realities behind each one. Simply put, data visualization holds the key to unraveling the truth embedded in the data through relatable data stories.
The concept of using stories to relay the realities embedded in your data is called data storytelling. This concept holds…
Be ‘visibly’ sure there are clusters in that dataset before profiling what the main themes are.
Customer segmentation, audience segmentation, and similar analyses are cost-effective ways of making evidence-based decisions by finding themes in unlabeled data. However, running a cluster analysis on a dataset and acting on the results without first verifying that clusters actually exist can be far more damaging and costly.
In this article, we will explain how you can use two visualization algorithms (VAT and iVAT) to assess the clustering tendency of an unlabeled dataset in Python before commencing any cluster analysis.
To look at the…
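To make the idea of a visual clustering-tendency check concrete, here is a minimal sketch of the VAT reordering step using only NumPy. It is an illustration under stated assumptions, not the article's own implementation: the function name `vat_order` and the toy two-blob data are made up for this example, and the iVAT refinement is not covered.

```python
import numpy as np


def vat_order(dissimilarity):
    """Return a VAT-style ordering of the objects in a square dissimilarity matrix."""
    R = np.asarray(dissimilarity, dtype=float)
    n = R.shape[0]
    # Start from one end of the most dissimilar pair of objects.
    first, _ = np.unravel_index(np.argmax(R), R.shape)
    order = [int(first)]
    remaining = [k for k in range(n) if k != first]
    # Repeatedly append the unvisited object closest to any visited object
    # (the same greedy rule Prim's algorithm uses to grow a spanning tree).
    while remaining:
        block = R[np.ix_(order, remaining)]
        _, col = np.unravel_index(np.argmin(block), block.shape)
        order.append(remaining.pop(int(col)))
    return np.array(order)


# Hypothetical toy data: two well-separated blobs, shuffled together.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(20, 2)),
    rng.normal(loc=10.0, scale=0.5, size=(20, 2)),
])
labels = np.array([0] * 20 + [1] * 20)
shuffle = rng.permutation(len(points))
points, labels = points[shuffle], labels[shuffle]

# Pairwise Euclidean distances via broadcasting.
diffs = points[:, None, :] - points[None, :, :]
R = np.sqrt((diffs ** 2).sum(axis=-1))

order = vat_order(R)
# Reordered matrix: blocks of small distances along the diagonal suggest clusters.
reordered = R[np.ix_(order, order)]
```

Displaying `reordered` as a grayscale image (for example with matplotlib's `imshow`) should show blocks along the diagonal when the data has cluster structure, and no clear blocks when it does not.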
Did you know that using boxplots, scatterplots, and other visualization tools is awesome but never enough to confirm that a relationship exists between two variables (the target feature (Y) and each of the predictor features (X))?
I know you may ask: why?
Well, because a statistician may argue that the visual pattern you are seeing in your sample is not what truly holds in the population your sample represents. …
When preparing data for modeling or analysis, one common practice is to visualize the target feature (Y) against each of the predictor features (X) to get a rough sense of whether any relationship or association exists between them. The truth is, visualization alone is never enough to conclude that a relationship or association exists between two variables. The preferred standard is to combine the bivariate visualization with a statistical test.
The gold standard: “Use data visualization tools (charts/two-way tables) to visualize the possible existence of a relationship or association between the two features, then conduct…
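The excerpt is cut off before it names the test, so the following is only an assumption about what such a follow-up could look like: a Pearson chi-square test of independence on a two-way table, which is one common choice when both features are categorical. The counts are invented for illustration, and the closed-form p-value used here is valid only for a 2x2 table (one degree of freedom).

```python
import math

import numpy as np

# Hypothetical two-way table of counts:
# rows = two groups of a predictor X, columns = target Y (no, yes).
observed = np.array([[30.0, 10.0],
                     [20.0, 40.0]])

row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
n = observed.sum()

# Counts expected if X and Y were independent.
expected = row_totals @ col_totals / n

# Pearson chi-square statistic (no continuity correction).
chi2_stat = float(((observed - expected) ** 2 / expected).sum())
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# With one degree of freedom, P(chi-square > x) equals erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2_stat / 2))

print(f"chi2 = {chi2_stat:.2f}, dof = {dof}, p = {p_value:.2g}")
```

For larger tables, a library routine such as `scipy.stats.chi2_contingency` is the more usual route; note that it applies a continuity correction to 2x2 tables by default, so its statistic will differ slightly from the uncorrected one computed above.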
Imagine being hired as the Data Manager Consultant for a malaria-focused clinical study when you know close to nothing about malaria and its associated metrics and indicators.
How then do you navigate through this unfamiliar terrain and succeed in your role?
Before proceeding to the main aim of this article, let me quickly answer a pertinent question that may be running through the minds of those who are strictly pro-data science: how is a Data Manager different from a Data Scientist?
My short answer: the two terms are related but not the same.
Jane Doxney is a young, dynamic, and brilliant individual who rose quickly from being a machine learning enthusiast to a seasoned machine learning expert. In no time, she had become a star of internationally recognized competitions and had received several awards and accolades in client behavior, satisfaction, and intention modelling.
Jane caught the eye of Brown Foxy, an investor and entrepreneur who was keen on investing in the next big thing. Foxy had redirected all his investments to his new start-up, BrownTech, and had engaged Jane to help lead the data-driven arm. It was all rosy from…
In this article, we will state the appropriate criteria for applying k-fold cross-validation to an imbalanced class distribution problem, and demonstrate how to implement it in Python using a Jupyter notebook.
This article will cover the following sections:
A Quick Overview Of K-Fold Cross-Validation
Overview Of Data Source
Rules For Correctly Applying K-Fold Cross-Validation On An Imbalanced Class Distribution Model
How To Apply K-Fold Cross-Validation On An Imbalanced Classification Problem In Python
It is statistically unreliable to evaluate the performance of a model just once. It is best to repeat the performance evaluation…
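One widely used way to make repeated evaluation fair on imbalanced classes is stratification, where every fold keeps roughly the same class mix as the full dataset. Whether this matches the rules the article goes on to state is an assumption. The sketch below hand-rolls the idea with NumPy so it stays self-contained; `stratified_folds` and the 90/10 toy target are hypothetical names and data.

```python
import numpy as np


def stratified_folds(y, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds keep the class mix of y."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    fold_parts = [[] for _ in range(n_splits)]
    for cls in np.unique(y):
        cls_idx = rng.permutation(np.flatnonzero(y == cls))
        # Deal this class's shuffled indices out across the folds.
        for fold, chunk in enumerate(np.array_split(cls_idx, n_splits)):
            fold_parts[fold].append(chunk)
    all_idx = np.arange(len(y))
    for parts in fold_parts:
        test_idx = np.sort(np.concatenate(parts))
        train_idx = np.setdiff1d(all_idx, test_idx)
        yield train_idx, test_idx


# Hypothetical imbalanced target: 90 negatives and 10 positives.
y = np.array([0] * 90 + [1] * 10)
folds = list(stratified_folds(y, n_splits=5))

# Each test fold should hold the same number of the rare positive class.
positives_per_fold = [int(y[test_idx].sum()) for _, test_idx in folds]
print(positives_per_fold)
```

In a real project, scikit-learn's `StratifiedKFold` from `sklearn.model_selection` provides the same kind of split and plugs directly into its cross-validation helpers.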
In this article, we use real-world data from the Demographic and Health Survey for Nigeria to develop a logistic regression classification model that predicts modern contraceptive use in rural Nigeria.
This article will have the following sections:
Overview of analysis and data source
Loading data into Python
Data and features description
Selecting the rural respondents sample from the whole dataframe
Univariate (single-feature) exploratory data analysis
Bivariate (two-feature) exploratory data analysis and test of association
Checking features for missing values
Dummy coding the features