Encounter with Pandas🐼 profiling

Jyothi Panuganti
Jan 6, 2021

๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ

I found pandas profiling interesting

Generating profile reports with pandas is now very easy: two simple lines of code reveal information about the data that is otherwise hidden and would take many more commands to bring out.

Pandas profiling is also very targeted: it produces the plots for us with just a couple of lines of code.

For exploratory data analysis (EDA) in pandas, we usually use df.describe() and a few other quick commands, but pandas_profiling extends the pandas DataFrame with df.profile_report(), which generates some awesome reports for your data.
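For comparison, here is a minimal sketch of the kind of quick pandas commands mentioned above. The toy DataFrame and its columns (`age`, `city`) are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: a numeric column with one missing value and a text column
df = pd.DataFrame({
    "age": [23, 35, 35, np.nan, 51],
    "city": ["Pune", "Delhi", "Delhi", "Pune", None],
})

summary = df.describe()    # count, mean, std, min, quartiles, max of numeric columns only
missing = df.isna().sum()  # number of missing values per column
dtypes = df.dtypes         # dtype pandas assigned to each column

print(summary)
print(missing)
print(dtypes)
```

Each command answers one question at a time; df.profile_report() gathers this kind of information, and much more, into a single report.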

For each column, the following statistics (where relevant for the column type) are presented in an HTML report:

๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ

  • Type inference: detect the types of columns.
  • Essentials: type, unique values, missing values
  • Quantile statistics: minimum value, Q1, median, Q3, maximum, range, interquartile range
  • Descriptive statistics: mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness
  • Most frequent values
  • Histograms
  • Correlations: highlighting of highly correlated variables; Spearman, Pearson and Kendall matrices
  • Missing values: matrix, count, heatmap, and dendrogram of missing values
  • Duplicate rows: lists the most frequently occurring duplicate rows
  • Text analysis: learn about categories (Uppercase, Space), scripts (Latin, Cyrillic), and blocks (ASCII) of text data.
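To make the list above more concrete, here is a rough plain-pandas approximation of several of these statistics for one numeric column, plus most frequent values, missing values, duplicate rows and correlations. It only illustrates what the numbers mean; it is not how pandas-profiling computes them internally. The toy DataFrame and the helper name `numeric_summary` are hypothetical.

```python
import numpy as np
import pandas as pd


def numeric_summary(s: pd.Series) -> dict:
    """Approximate some of the per-column statistics listed above (hypothetical helper)."""
    q1, median, q3 = s.quantile([0.25, 0.5, 0.75])  # NaN values are skipped
    return {
        "min": s.min(),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": s.max(),
        "range": s.max() - s.min(),
        "iqr": q3 - q1,
        "mean": s.mean(),
        "mode": s.mode().tolist(),
        "std": s.std(),
        "sum": s.sum(),
        # median absolute deviation: median of |x - median(x)|
        "mad": (s - median).abs().median(),
        # coefficient of variation: std / mean
        "cv": s.std() / s.mean(),
        "kurtosis": s.kurt(),
        "skewness": s.skew(),
        "missing": int(s.isna().sum()),
        "unique": s.nunique(),
    }


# Hypothetical toy data
df = pd.DataFrame({
    "age": [23, 35, 35, np.nan, 51],
    "income": [30.0, 52.0, 52.0, 41.0, 80.0],
    "city": ["Pune", "Delhi", "Delhi", "Pune", None],
})

stats = numeric_summary(df["age"])
most_frequent = df["city"].value_counts()        # most frequent values
missing_per_column = df.isna().sum()              # missing values count
duplicates = df[df.duplicated(keep=False)]        # every row that has an exact duplicate
pearson = df[["age", "income"]].corr(method="pearson")
spearman = df[["age", "income"]].corr(method="spearman")

print(stats)
print(most_frequent)
print(missing_per_column)
print(duplicates)
print(pearson)
print(spearman)
```

Doing this by hand for every column, plus the plots, is exactly the repetitive work the profile report saves.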

๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ๐Ÿผ

Now, let's move on to the installation.

Using pip

pip install -U pandas-profiling[notebook]
jupyter nbextension enable --py widgetsnbextension

If you want to install from inside a notebook (Google Colab, a Kaggle notebook, or any other):

import sys
!{sys.executable} -m pip install -U pandas-profiling[notebook]
!jupyter nbextension enable --py widgetsnbextension

After the installation, just restart your kernel or runtime.
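After restarting, one quick way to confirm that the kernel can see the package is to ask the Python standard library for its installed version. This is a minimal sketch assuming Python 3.8 or newer (for `importlib.metadata`); the helper name `installed_version` is hypothetical, and `pandas-profiling` is the pip distribution name used above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if it is not found."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# None means the current kernel's environment does not have the package
print(installed_version("pandas-profiling"))
```

If this prints None even though the install finished, the kernel is most likely running a different Python environment from the one pip installed into, which is why the notebook variant above calls pip through `sys.executable`.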

Using conda (Anaconda Prompt)

i. conda create -n pandas-profiling (this creates a new environment)
ii. conda activate pandas-profiling
iii. conda install -c conda-forge pandas-profiling

  • If you want to install into the base environment instead, just run this line in the command prompt: conda install -c conda-forge pandas-profiling

For the pandas profiling Jupyter notebook widgets to work, you need to enable the widgets extension.

For Jupyter Notebook:

jupyter nbextension enable --py widgetsnbextension

This command also works from the conda command prompt.

Have a try and enjoy.

I found pandas profiling very useful for the EDA part.

I used a small dataset to demonstrate pandas profiling.

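import pandas as pd
import numpy as np

# Load your own small dataset; "your_dataset.csv" is a placeholder path
df = pd.read_csv("your_dataset.csv")
df.head()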

To generate the profile report for the whole dataset:

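# Importing pandas_profiling also registers the profile_report() method on DataFrames
from pandas_profiling import ProfileReport

# In a Jupyter notebook, this renders the full report inline
df.profile_report()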


Jyothi Panuganti

Data Science Enthusiast, Blogger, content writer, and Freelancer.