How to Use the pivot_table Function for Advanced Data Summarization in Pandas

This guide walks through how to use the Pandas pivot_table function to summarize your data.

Preparation

Let’s start with installing the necessary packages.

pip install pandas seaborn

Then, we import the packages and load the example dataset, Titanic.

import pandas as pd
import seaborn as sns

titanic = sns.load_dataset('titanic')

With the packages installed and the dataset loaded, let’s move on to the next section.

Pivot Table with Pandas

Pivot tables in Pandas allow for flexible data reorganization and analysis. Let’s examine some practical applications, starting with a simple one: the average age for each combination of passenger class and sex.

pivot = pd.pivot_table(titanic, values="age", index='class', columns="sex", aggfunc="mean")
print(pivot)

Output>>>
sex        female       male
class                       
First   34.611765  41.281386
Second  28.722973  30.740707
Third   21.750000  26.507589

The resulting pivot table displays average ages, with passenger classes on the vertical axis and gender categories across the top.
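To see what pivot_table does here, it can help to rebuild the same table by hand. The sketch below uses a small made-up DataFrame (the `toy` data is hypothetical and only mimics the Titanic column names) and compares the pivot table with a groupby followed by unstack.

```python
import pandas as pd

# Hypothetical toy data that mimics the Titanic columns used above.
toy = pd.DataFrame({
    "class": ["First", "First", "First", "Second", "Second", "Third", "Third", "Third"],
    "sex": ["female", "male", "male", "female", "male", "female", "female", "male"],
    "age": [30.0, 40.0, 50.0, 20.0, 28.0, 18.0, 22.0, 35.0],
})

# Pivot table: classes as rows, sexes as columns, mean age in each cell.
pivot = pd.pivot_table(toy, values="age", index="class", columns="sex", aggfunc="mean")

# The same summary by hand: group, aggregate, then move "sex" into the columns.
manual = toy.groupby(["class", "sex"])["age"].mean().unstack("sex")

print(pivot)

# Raises an AssertionError if the two tables differ.
pd.testing.assert_frame_equal(pivot, manual)
```

For a simple case like this one, the two approaches give the same table; pivot_table mainly saves you the reshaping step.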

We can go even further by passing a list of functions to aggfunc, for example to calculate both the mean and the sum of fares.

pivot = pd.pivot_table(titanic, values="fare", index='class', columns="sex", aggfunc=['mean', 'sum'])
print(pivot)

Output>>>
             mean                   sum           
sex         female       male     female       male
class                                              
First   106.125798  67.226127  9975.8250  8201.5875
Second   21.970121  19.741782  1669.7292  2132.1125
Third    16.118810  12.661633  2321.1086  4393.5865
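Passing a list to aggfunc produces two-level column labels, with the function name on top and the sex below. The sketch below shows one way to pull pieces out of such a table. It uses a small made-up DataFrame (the `toy` data and its fares are hypothetical, not Titanic values).

```python
import pandas as pd

# Hypothetical toy data that mimics the Titanic columns used above.
toy = pd.DataFrame({
    "class": ["First", "First", "First", "Second", "Second", "Second", "Third", "Third"],
    "sex": ["female", "female", "male", "female", "male", "male", "female", "male"],
    "fare": [100.0, 120.0, 60.0, 20.0, 18.0, 22.0, 10.0, 8.0],
})

pivot = pd.pivot_table(toy, values="fare", index="class", columns="sex", aggfunc=["mean", "sum"])

# The columns are a MultiIndex: (function name, sex).
print(pivot.columns.tolist())

# Select every column under one function name.
mean_only = pivot["mean"]
print(mean_only)

# Select a single cell with a (function name, sex) tuple.
first_female_sum = pivot.loc["First", ("sum", "female")]
print(first_female_sum)
```

Selecting by the top-level label keeps the result as a regular table, which is handy when you only need one of the aggregations for a plot or export.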

We can also pass our own function to aggfunc. For example, the function below takes the difference between the maximum and minimum values in each group and divides it by two.

def data_div_two(x):
    return (x.max() - x.min())/2

pivot = pd.pivot_table(titanic, values="age", index='class', columns="sex", aggfunc=data_div_two)
print(pivot)

Output>>>
sex     female    male
class                 
First   30.500  39.540
Second  27.500  34.665
Third   31.125  36.790
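A custom function is easier to trust after checking it on data small enough to verify by hand. The sketch below runs the same data_div_two function on a made-up DataFrame (the `toy` data is hypothetical) and also shows that a lambda with the same body gives the same table.

```python
import pandas as pd

def data_div_two(x):
    # x holds the "age" values of one (class, sex) group.
    return (x.max() - x.min()) / 2

# Hypothetical toy data: two ages per group so the result is easy to check.
toy = pd.DataFrame({
    "class": ["First", "First", "First", "First", "Second", "Second", "Second", "Second"],
    "sex": ["female", "female", "male", "male", "female", "female", "male", "male"],
    "age": [20.0, 50.0, 30.0, 40.0, 10.0, 30.0, 25.0, 35.0],
})

pivot = pd.pivot_table(toy, values="age", index="class", columns="sex", aggfunc=data_div_two)
print(pivot)

# A lambda with the same logic works too, if you do not need a reusable name.
pivot_lambda = pd.pivot_table(
    toy, values="age", index="class", columns="sex",
    aggfunc=lambda x: (x.max() - x.min()) / 2,
)

# Raises an AssertionError if the two tables differ.
pd.testing.assert_frame_equal(pivot, pivot_lambda)
```

For the First class females in this toy data, the ages are 20 and 50, so the result is (50 - 20) / 2 = 15.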

Lastly, you can set margins=True to add an "All" row and column, which lets you compare each sub-group's average with the overall averages.

pivot = pd.pivot_table(titanic, values="age", index='class', columns="sex", aggfunc="mean", margins=True)
print(pivot)

Output>>>
sex        female       male        All
class                                  
First   34.611765  41.281386  38.233441
Second  28.722973  30.740707  29.877630
Third   21.750000  26.507589  25.140620
All     27.915709  30.726645  29.699118
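One detail worth knowing is that the margin values are calculated from the underlying rows, not by averaging the cell averages. The sketch below illustrates this on a small made-up DataFrame (the `toy` data is hypothetical) and also uses the margins_name parameter to rename the "All" label.

```python
import pandas as pd

# Hypothetical toy data with unequal group sizes.
toy = pd.DataFrame({
    "class": ["First"] * 6 + ["Second"] * 2,
    "sex": ["female", "female", "male", "male", "male", "male", "female", "male"],
    "age": [20.0, 40.0, 60.0, 60.0, 60.0, 60.0, 10.0, 30.0],
})

pivot = pd.pivot_table(
    toy, values="age", index="class", columns="sex",
    aggfunc="mean", margins=True, margins_name="Overall",
)
print(pivot)

# Margin for First class: the mean over all six First class rows.
first_overall = pivot.loc["First", "Overall"]

# Averaging the two cell means instead would give a different number.
mean_of_cell_means = (pivot.loc["First", "female"] + pivot.loc["First", "male"]) / 2

print(first_overall, mean_of_cell_means)
```

Because the First class group has four males and only two females, the margin leans toward the male average, which a plain average of the two cells would miss.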

Mastering the pivot_table function will help you draw insights from your dataset.

Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he shares Python and data tips through social media and his writing. Cornellius writes on a variety of AI and machine learning topics.
