Pandas column rename methods compared

Posted by Saugata Chatterjee on July 10, 2022 · 3 mins read

We will find the fastest way to rename the columns of a pandas dataframe by comparing the execution time of three different methods.

  1. In-built rename method with names replaced inplace
  2. In-built rename method with names replaced and returned
  3. Rename columns with custom code

To demonstrate the differences in execution time we will create a large table and fill it with random numbers.

import numpy as np
import pandas as pd
data = np.random.lognormal(2, 5, size=(100, 100000))
df = pd.DataFrame(data)

1 In-built rename method with names replaced inplace

colsmap = {0: 'firstcol', 1: 'secondcol'}
df.rename(columns=colsmap, inplace=True)

614 ms ± 14.6 ms per loop
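The figure above is in the style that IPython's `%timeit` prints. As a rough sketch of how a comparable number could be obtained in a plain Python script, the standard-library `timeit` module can be used. The helper name `rename_inplace`, the smaller table, and the repeat counts are assumptions made for illustration, so the absolute numbers will not match the ones in this post.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller than the table in the post so the sketch runs quickly (assumption).
data = np.random.lognormal(2, 5, size=(100, 1000))
colsmap = {0: 'firstcol', 1: 'secondcol'}


def rename_inplace():
    # Hypothetical helper: rebuild the frame on every call so that columns
    # 0 and 1 always exist. The construction cost is part of the measured time.
    df = pd.DataFrame(data)
    df.rename(columns=colsmap, inplace=True)
    return df


# 5 repeats of 10 calls each; report mean and standard deviation per call.
runs = timeit.repeat(rename_inplace, number=10, repeat=5)
per_call = np.array(runs) / 10
print(f"{per_call.mean() * 1e3:.3f} ms ± {per_call.std() * 1e3:.3f} ms per loop")
```

Rebuilding the frame inside the timed function is a design choice: if one frame were renamed repeatedly, only the first call would find columns `0` and `1`, and the later calls would find nothing to rename.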

2 In-built rename method with names replaced and returned

colsmap = {0: 'firstcol', 1: 'secondcol'}
df = df.rename(columns=colsmap)

673 ms ± 7.53 ms per loop
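The two variants of `rename` differ in what they return, which is easy to see on a tiny table. The sketch below uses a 2x3 frame of zeros (an assumption chosen only to keep the output short).

```python
import numpy as np
import pandas as pd

colsmap = {0: 'firstcol', 1: 'secondcol'}

# Variant 1: inplace=True modifies df_a itself and returns None.
df_a = pd.DataFrame(np.zeros((2, 3)))
result = df_a.rename(columns=colsmap, inplace=True)
print(result)                 # None
print(list(df_a.columns))     # ['firstcol', 'secondcol', 2]

# Variant 2: without inplace, a new dataframe is returned and
# df_b keeps its original column names.
df_b = pd.DataFrame(np.zeros((2, 3)))
renamed = df_b.rename(columns=colsmap)
print(list(df_b.columns))     # [0, 1, 2]
print(list(renamed.columns))  # ['firstcol', 'secondcol', 2]
```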

3 Rename columns with custom code

The following piece of code starts with a column map from old names to new ones. The column map is not exhaustive, i.e., not all columns are renamed (which is quite common). The leftover columns are added to the column map, each mapped to itself, so that the full list of column names can be replaced directly by the new list.

colsmap = {0: 'firstcol', 1: 'secondcol'}
dfcols = df.columns
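# Map every column that is not being renamed to itself
missingcols = {c: c for c in dfcols if c not in colsmap}
colsmap.update(missingcols)
# Replace the whole list of column names in one assignment
df.columns = [colsmap[c] for c in dfcols]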

664 ms ± 21.8 ms per loop
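The same idea can be wrapped in a small function. The sketch below is one possible variation, not the code that was timed above: the name `rename_by_list` is hypothetical, and it uses `dict.get` with the old name as the fallback so that the column map does not have to be padded with the leftover columns.

```python
import numpy as np
import pandas as pd


def rename_by_list(df, colsmap):
    """Hypothetical helper: replace df.columns in one assignment."""
    # colsmap.get(c, c) keeps the old name when c has no entry in colsmap,
    # so the caller's mapping is left unmodified.
    df.columns = [colsmap.get(c, c) for c in df.columns]
    return df


df = pd.DataFrame(np.zeros((2, 4)))
colsmap = {0: 'firstcol', 1: 'secondcol'}
rename_by_list(df, colsmap)
print(list(df.columns))
```

Because the map is only read, the same `colsmap` can be reused on other dataframes without carrying over entries from this one.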

Conclusion

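In these measurements, the fastest way to rename columns of a pandas dataframe is to rename inplace with the rename method (614 ms). Returning the renamed dataframe is about 10% slower (673 ms). When we cannot replace inplace, cycling through the column names and replacing the entire list of column names (664 ms) is marginally faster than the returned rename, although that 9 ms gap is smaller than the reported spread of the custom-code timing (± 21.8 ms). The final dataframe looks like this.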

[Image: rename_columns]