We will find the fastest way to rename the columns of a pandas dataframe. The execution time of three different methods will be compared.
1. rename method with names replaced inplace
2. rename method with names replaced and returned
3. direct replacement of the entire list of column names
To demonstrate the differences in execution time we will create a large table (100 rows by 100,000 columns) and fill it with random numbers.
import numpy as np
import pandas as pd
data = np.random.lognormal(2, 5, size=(100, 100000))
df = pd.DataFrame(data)
rename method with names replaced inplace
colsmap = {0: 'firstcol', 1: 'secondcol'}
df.rename(columns=colsmap, inplace=True)
614 ms ± 14.6 ms per loop
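The text does not say how the timings were taken; the "per loop" format resembles the output of IPython's %timeit, but that is an assumption. As a rough sketch, a similar measurement can be made with the standard timeit module. The table here is deliberately much smaller than the one above so that it runs quickly, and the repeat count of 7 is an arbitrary choice, so the numbers will not match the ones reported in this post.

```python
import timeit

import numpy as np

# Assumption: a much smaller table than the 100 x 100000 one used above.
SETUP = """
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.lognormal(2, 5, size=(100, 1000)))
colsmap = {0: 'firstcol', 1: 'secondcol'}
"""

# With number=1 the setup code runs before every measurement, so each run
# starts from a frame that still has its original integer column names.
times = timeit.repeat(
    "df.rename(columns=colsmap, inplace=True)",
    setup=SETUP,
    repeat=7,
    number=1,
)

mean_ms = 1000 * np.mean(times)
std_ms = 1000 * np.std(times)
print(f"{mean_ms:.3f} ms ± {std_ms:.3f} ms per loop")
```

Rebuilding the frame in the setup step matters for the inplace variant: once the columns are renamed, a second call finds no matching keys and would be timing a rename that changes nothing.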
rename method with names replaced and returned
colsmap = {0: 'firstcol', 1: 'secondcol'}
df = df.rename(columns=colsmap)
673 ms ± 7.53 ms per loop
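Apart from speed, the two rename variants differ in what they do to the original dataframe. The small sketch below illustrates this on a 3 x 4 table (the size and the variable names original_names and renamed are chosen only for illustration).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.lognormal(2, 5, size=(3, 4)))
colsmap = {0: 'firstcol', 1: 'secondcol'}
original_names = list(df.columns)

# Returned variant: df keeps its names, the new names live on the result.
renamed = df.rename(columns=colsmap)
names_after_returned = list(df.columns)
print(names_after_returned)
print(list(renamed.columns))

# Inplace variant: df itself is modified.
df.rename(columns=colsmap, inplace=True)
print(list(df.columns))
```

With the returned variant the old names survive until the result is assigned back, which is why the timed version above writes df = df.rename(columns=colsmap).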
The following piece of code starts with a column map from old names to new ones. The column map is not exhaustive, i.e., not all columns are renamed (which is quite common). The leftover columns are added to the column map, each mapped to itself, so that the full list of column names can be replaced directly by the new list.
colsmap = {0: 'firstcol', 1: 'secondcol'}
dfcols = df.columns
missingcols = {c: c for c in dfcols if c not in colsmap}
colsmap.update(missingcols)
df.columns = [colsmap[c] for c in dfcols]
664 ms ± 21.8 ms per loop
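The same list of names can also be built without extending the column map first. This is a variant of the code above, not the version that was timed, so the 664 ms figure does not apply to it. It relies on dict.get with the old name as the fallback value, and the check against rename is there only to show that both approaches produce the same names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.lognormal(2, 5, size=(3, 4)))
colsmap = {0: 'firstcol', 1: 'secondcol'}

# colsmap.get(c, c) returns the new name if c is in the map and c itself
# otherwise, so columns missing from the map keep their current name.
new_names = [colsmap.get(c, c) for c in df.columns]

# Reference result from rename, used only for comparison.
expected = list(df.rename(columns=colsmap).columns)

df.columns = new_names
print(list(df.columns) == expected)
```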
In this benchmark the fastest way to rename columns of a pandas dataframe is to rename inplace with the rename method (614 ms). Returning the renamed dataframe is costlier: at 673 ms it is about 10% slower. When we cannot replace inplace, cycling through the column names and replacing the entire list of column names is marginally faster (664 ms), although that gap is within the measurement spread. The final dataframe looks like this.