Explaining the Inplace Parameter for Beginners
What I wish I knew when I was getting started with Pandas Dataframes
If you’ve ever found yourself copy pasta’ing Pandas code from Stack Overflow, you may have run into a dear old friend, the inplace parameter. At first glance it probably looks like a random parameter, popping up here and there but more often hiding in the background, defaulting to False.
When I was starting to learn Pandas, I never once encountered an explanation of what the inplace parameter truly was, or what it was used for. Well, maybe I did, but I certainly don’t remember reading anything that gave me confidence in how to deal with it.
Fast forward to the present day, and I believe I can finally help others on their Pandas journey to grasp the concept once and for all. You can also learn by watching a video I made on this concept as part of my course, Build an NBA Fantasy Projection Model in Python with Pandas.
The inplace parameter in all its glory
At its core, the inplace parameter helps you decide how you want to affect the underlying data of the Pandas object. Do you want to make a change to the dataframe object you are working on and overwrite what was there before? Or do you want to get back a modified copy of the dataframe, which you can assign to a different variable while the original data stays untouched? Those are the two questions you need to ask yourself. Your answers will tell you whether to set the inplace parameter to True or False.
To help explain it further, let’s see it in action. We’ll run through an example using NBA stats…because ball is life.
Sorting an NBA Stats Dataframe
Let’s read in a dataframe with the 2018–19 Per Game Stats, which we grabbed from stats.nba.com by using this handy Chrome extension to download the data as a CSV file. We can use the read_csv() function and save the result as a dataframe called df.
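Here is a minimal sketch of that step. Because the real CSV file isn’t included here, it feeds read_csv() a small in-memory CSV through io.StringIO. The column names PLAYER, TEAM and PTS, the team code TOR, and every row are hypothetical placeholders, not real 2018–19 stats. The actual export’s headers may differ.

```python
import io

import pandas as pd

# Stand-in for the downloaded CSV file.
# Column names and rows are hypothetical placeholders, not real NBA stats.
csv_text = """PLAYER,TEAM,PTS
Player A,TOR,10.0
Player B,BOS,20.0
Player C,TOR,25.0
Player D,TOR,5.0
"""

# With the real file you would pass its path instead of a StringIO buffer
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```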
We’ll filter the dataframe to contain only the Toronto Raptors players since they are the reigning champs, and we should respect the champs. We can display the dataframe, which we’ve saved as df, by putting it on the last line of the Jupyter Notebook cell.
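The filter can be sketched with a boolean mask. This assumes a TEAM column that holds the code TOR for Raptors players, which is a guess about the export’s format. The rows are made-up placeholders.

```python
import io

import pandas as pd

# Hypothetical placeholder data standing in for the per-game stats CSV
csv_text = """PLAYER,TEAM,PTS
Player A,TOR,10.0
Player B,BOS,20.0
Player C,TOR,25.0
Player D,TOR,5.0
"""
df = pd.read_csv(io.StringIO(csv_text))

# Keep only rows whose (assumed) TEAM value is "TOR"
df = df[df["TEAM"] == "TOR"]

# In Jupyter, a bare `df` on the last line displays it; print() works in a script
print(df)
```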
If you look at the output below, you’ll see the df dataframe is sorted in alphabetical order based on the player names.
Now let’s say we want to sort our dataframe by the points per game column, with the highest points per game average at the top. We can use the .sort_values() method to do this.
We’ll pass it the PTS column we want to sort on and set the ascending parameter equal to False so that it sorts from highest to lowest. When we run this cell we get the following output:
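A minimal sketch of that call is below. It uses a small, made-up stand-in for the filtered Raptors dataframe: placeholder player names, an assumed PLAYER column, and invented point values. The result is assigned to result only so it can be inspected. In the notebook, the bare call on the last line is enough to display it.

```python
import pandas as pd

# Hypothetical stand-in for the filtered Raptors dataframe (placeholder values)
df = pd.DataFrame(
    {"PLAYER": ["Player A", "Player C", "Player D"], "PTS": [10.0, 25.0, 5.0]}
)

# sort_values() returns a new DataFrame ordered by PTS, highest first
result = df.sort_values(by="PTS", ascending=False)
print(result)
```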
Great! Mission accomplished. It looks like the df dataframe is sorted by the PTS column. But what happens if I print the df dataframe again in the next notebook cell? Would I still see the same ordering with Kawhi at the top?
No. We see the original ordering again, with the player names in alphabetical order…
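You can reproduce this with a short sketch on placeholder data. The PLAYER column and the values are hypothetical. The sorted copy is created, but because nothing stores it, df keeps its original order.

```python
import pandas as pd

# Hypothetical stand-in data with placeholder values
df = pd.DataFrame(
    {"PLAYER": ["Player A", "Player C", "Player D"], "PTS": [10.0, 25.0, 5.0]}
)

# A sorted copy is built here, but no variable keeps a reference to it
df.sort_values(by="PTS", ascending=False)

# df itself was never modified, so the original row order remains
print(df)
```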
The reason is this: by default, methods like sort_values() don’t change the original dataframe. They return a new, modified copy, and unless you assign that copy to a variable, it is displayed once and then thrown away. This is why you often see the common alternative to the inplace parameter, which is saving the result back to the same dataframe variable.
Let’s sort the values by PTS again, then assign the result back to the df variable and print df.
This time when we print df, the rows are ordered from highest to lowest PTS. This is because we sorted the values and then assigned the result back over the original df variable. You will commonly see this practice around the web.
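The reassignment pattern looks like the sketch below. The PLAYER column and the point values are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical stand-in data with placeholder values
df = pd.DataFrame(
    {"PLAYER": ["Player A", "Player C", "Player D"], "PTS": [10.0, 25.0, 5.0]}
)

# Overwrite the df name with the sorted copy that sort_values() returns
df = df.sort_values(by="PTS", ascending=False)
print(df)
```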
Sorting an NBA Stats Dataframe Using the Inplace Parameter
So where does the inplace parameter come into play? If we clear our notebook output and then rerun our original cell to get the filtered Raptors dataframe again, we can walk through the same exercise. Only this time, we will use the inplace parameter.
This time when we sort the values, we are going to add in the inplace parameter and set it equal to True.
Now if we print df in the next cell, we might expect to get the unsorted Raptors dataframe again, since we haven’t assigned the result to any variable.
But look what happens when we print it. The dataframe is still sorted by the PTS column. Aha! The magical inplace parameter at work!
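A minimal sketch of the inplace=True version is below, again using placeholder data with an assumed PLAYER column. It also captures what the call returns. With inplace=True, pandas sorts df itself and returns None.

```python
import pandas as pd

# Hypothetical stand-in data with placeholder values
df = pd.DataFrame(
    {"PLAYER": ["Player A", "Player C", "Player D"], "PTS": [10.0, 25.0, 5.0]}
)

# inplace=True reorders df directly; the method's return value is None
returned = df.sort_values(by="PTS", ascending=False, inplace=True)

print(returned)  # None
print(df)        # now sorted by PTS, highest first
```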
We asked ourselves:
Do you want to make a change to the dataframe object you are working on and overwrite what was there before?
And we answered “Yes,” which is why we set the inplace parameter equal to True. In other words, we told the original dataframe it was OK to update itself by sorting on PTS in descending order. We didn’t need to assign the result to any variable. We modified the dataframe “in place.”
Setting the parameter equal to False
The flip side is that whenever we don’t set the inplace parameter to True, it keeps its default value of False. Let’s go through the exercise one final time, explicitly passing the inplace parameter and setting it equal to False for good measure.
We will use the same code as our first time around, but we’ll make sure to set the inplace parameter equal to False. If we print the dataframe, what should we expect to get?
See what happens? The df dataframe is unchanged! That means this is identical to our first attempt. We just didn’t need to spell out the inplace parameter then, since it is False by default.
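This sketch passes inplace=False explicitly on placeholder data; the PLAYER column and values are hypothetical. The original df is untouched, and the sorted copy only exists in whatever variable you assign it to.

```python
import pandas as pd

# Hypothetical stand-in data with placeholder values
df = pd.DataFrame(
    {"PLAYER": ["Player A", "Player C", "Player D"], "PTS": [10.0, 25.0, 5.0]}
)

# Same as leaving inplace out: a sorted copy is returned, df is left alone
sorted_copy = df.sort_values(by="PTS", ascending=False, inplace=False)

print(df)           # original order
print(sorted_copy)  # highest PTS first
```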
Reviewing the inplace parameter one last time
To sum it up nicely for you, you can think of the True and False settings for the inplace parameter as follows:
When inplace = True, the data is modified in place, which means the method returns nothing (None) and the dataframe itself is updated.
When inplace = False, which is the default, the operation is performed and a new copy of the object is returned. You then need to save it to something. That something can be a new variable or the same variable name.
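One consequence of “returns nothing” is a common beginner bug: combining reassignment with inplace=True. The sketch below compares the two valid styles with the mixed one. The make_df() helper, the PLAYER column and the values are hypothetical, introduced only for illustration.

```python
import pandas as pd


def make_df():
    # Hypothetical helper returning fresh placeholder data each time
    return pd.DataFrame(
        {"PLAYER": ["Player A", "Player C", "Player D"], "PTS": [10.0, 25.0, 5.0]}
    )


# Style 1: let the method modify the dataframe directly
df_inplace = make_df()
df_inplace.sort_values(by="PTS", ascending=False, inplace=True)

# Style 2: keep the returned copy by reassigning
df_copy = make_df()
df_copy = df_copy.sort_values(by="PTS", ascending=False)

# Mixing them: inplace=True returns None, so the variable is overwritten with None
df_broken = make_df()
df_broken = df_broken.sort_values(by="PTS", ascending=False, inplace=True)

print(df_inplace.equals(df_copy))  # True
print(df_broken)                   # None
```

Either style on its own gives the same sorted dataframe. The practical rule is to pick one per line and never assign the result of an inplace=True call.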
Bonus section for making it this far!
The inplace parameter is commonly used with the following methods:
dropna()
drop_duplicates()
fillna()
query()
rename()
reset_index()
sort_index()
sort_values()
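Several of the methods in this list can be chained step by step with inplace=True, as in the sketch below. It uses a small hypothetical dataframe with a duplicate row and a missing value. The data and column names are placeholders. Each call edits df directly and returns None.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one exact duplicate row and one missing PTS value
df = pd.DataFrame(
    {
        "PLAYER": ["Player A", "Player B", "Player B", "Player C"],
        "PTS": [10.0, 20.0, 20.0, np.nan],
    }
)

df.drop_duplicates(inplace=True)                    # drops the repeated Player B row
df.fillna(value={"PTS": 0.0}, inplace=True)         # fills the missing PTS with 0.0
df.rename(columns={"PTS": "POINTS"}, inplace=True)  # renames the column
df.reset_index(drop=True, inplace=True)             # renumbers rows 0..n-1

print(df)
```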
This post is part of my new Learn With Jabe Series which focuses on Python and Pandas tutorials for beginners. The topics and code are taken directly from a recently published Udemy course, Build an NBA Projection Model in Python with Pandas which you can access with a discounted code here.