How to Use MultiIndex for Hierarchical Data Organization in Pandas
Image by Editor | Midjourney & Canva

 

Let’s learn how to use MultiIndex in Pandas for hierarchical data.

 

Preparation

 

We need the Pandas package, so make sure it is installed. If it is not, you can install it by running pip install pandas from the command line.

 

With Pandas installed, let’s learn how to handle MultiIndex data.

 

Using MultiIndex in Pandas

 

MultiIndex in Pandas lets a DataFrame or Series carry multiple index levels. This is helpful when we work with higher-dimensional data in a 2D tabular structure: with MultiIndex, we can index data with multiple keys and organize it hierarchically. Let’s use an example dataset to understand it better.

import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('A', 1), ('A', 2), ('B', 1), ('B', 2)],
    names=['Category', 'Number']
)

df = pd.DataFrame({
    'Value': [10, 20, 30, 40]
}, index=index)

print(df)

 

The output:

                Value
Category Number       
A        1          10
         2          20
B        1          30
         2          40

 

As you can see, the DataFrame above has a two-level index, with Category and Number as its levels.
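The two levels can also be checked programmatically. Here is a minimal sketch, assuming the same df as above (rebuilt here so the snippet runs on its own), using the index attributes nlevels and names and the get_level_values method:

```python
import pandas as pd

# Rebuild the example DataFrame so this snippet is self-contained.
index = pd.MultiIndex.from_tuples(
    [('A', 1), ('A', 2), ('B', 1), ('B', 2)],
    names=['Category', 'Number']
)
df = pd.DataFrame({'Value': [10, 20, 30, 40]}, index=index)

# Number of levels in the index.
n_levels = df.index.nlevels
print(n_levels)  # 2

# Names of the levels, outermost first.
level_names = list(df.index.names)
print(level_names)  # ['Category', 'Number']

# Labels stored in one level, one entry per row.
categories = df.index.get_level_values('Category').tolist()
print(categories)  # ['A', 'A', 'B', 'B']
```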

It’s also possible to set the MultiIndex with the existing columns in our DataFrame.

data = {
    'Category': ['A', 'A', 'B', 'B'],
    'Number': [1, 2, 1, 2],
    'Value': [10, 20, 30, 40]
}
df = pd.DataFrame(data)
df.set_index(['Category', 'Number'], inplace=True)

print(df)

 

The output:

                Value
Category Number       
A        1          10
         2          20
B        1          30
         2          40

 

Even with different methods, we get the same result. That’s how we can create a MultiIndex for our DataFrame.

If you already have a MultiIndex DataFrame, it’s possible to swap its levels, for example with the swaplevel method.
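Here is a minimal sketch of that swap, assuming the df built above (rebuilt here so it runs standalone). Called without arguments, DataFrame.swaplevel exchanges the two innermost levels. The variable name swapped is only illustrative.

```python
import pandas as pd

# Rebuild the example DataFrame so this snippet is self-contained.
df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Number': [1, 2, 1, 2],
    'Value': [10, 20, 30, 40]
}).set_index(['Category', 'Number'])

# swaplevel() returns a new DataFrame; df itself keeps its original level order.
swapped = df.swaplevel()

print(swapped)
```

Because the result is not assigned back to df, later examples still see Category as the first level.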

 

The output:

                Value
Number Category       
1      A            10
2      A            20
1      B            30
2      B            40

 

Of course, we can turn the MultiIndex levels back into regular columns, for example with the reset_index method.
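Here is a minimal sketch, assuming the df from above (rebuilt so the snippet runs on its own). The variable name flat is only illustrative.

```python
import pandas as pd

# Rebuild the example DataFrame so this snippet is self-contained.
df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Number': [1, 2, 1, 2],
    'Value': [10, 20, 30, 40]
}).set_index(['Category', 'Number'])

# reset_index() moves every index level back into a column
# and replaces the index with a default 0..n-1 range.
flat = df.reset_index()

print(flat)
```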

 

The output:

  Category  Number  Value
0        A       1     10
1        A       2     20
2        B       1     30
3        B       2     40

 

So, how do we access data in a MultiIndex DataFrame? We can use the .loc indexer for that. For example, we can select all the rows under one label of the first level.
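Here is a minimal sketch of first-level selection, assuming the df from above (rebuilt so it runs standalone) and assuming the label of interest is 'A'. The variable name subset is only illustrative.

```python
import pandas as pd

# Rebuild the example DataFrame so this snippet is self-contained.
df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Number': [1, 2, 1, 2],
    'Value': [10, 20, 30, 40]
}).set_index(['Category', 'Number'])

# A single label selects on the first level; that level is dropped
# from the result, leaving Number as the index.
subset = df.loc['A']

print(subset)
#         Value
# Number
# 1          10
# 2          20
```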

 

The output:

 

We can also access a specific row by passing a tuple with one label per index level.
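Here is a minimal sketch, assuming the df from above (rebuilt so it runs on its own) and the tuple ('A', 1). The variable names row and value are only illustrative.

```python
import pandas as pd

# Rebuild the example DataFrame so this snippet is self-contained.
df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Number': [1, 2, 1, 2],
    'Value': [10, 20, 30, 40]
}).set_index(['Category', 'Number'])

# A full tuple (one label per level) selects a single row as a Series.
row = df.loc[('A', 1)]
print(row)

# Adding a column label returns the scalar value directly.
value = df.loc[('A', 1), 'Value']
print(value)  # 10
```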

 

The output:

Value    10
Name: (A, 1), dtype: int64

 

Lastly, we can perform statistical aggregation with MultiIndex using the .groupby method.

print(df.groupby(level=['Category']).sum())

 

The output:

          Value
Category       
A            30
B            70

 

Mastering MultiIndex in Pandas will help you organize and gain insight into hierarchical data.

 


Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and writing media. Cornellius writes on a variety of AI and machine learning topics.


