3 ways to rename columns of a pandas dataframe - Saugata Chatterjee Portfolio

We look at 3 ways to rename the columns of a pandas dataframe. The first is the plain brute-force approach. The other 2 are more Pythonic and make the code more widely applicable and more efficient.

1. The brute-force approach

When the new column names bear no relationship to the old ones, we have little choice but to type out the entire mapping dictionary.

df = df.rename(columns={'SIS Login ID': 'email', 'SIS User ID': 'id', 
                        'Current Score': 'grade', 'Section': 'course'})
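As a minimal, self-contained sketch of this approach, the snippet below builds a small dataframe with the same four column names and applies the mapping. The row values are invented purely for illustration. Note that rename() returns a new dataframe by default, which is why the result is assigned back to df.

```python
import pandas as pd

# Toy data: the values are invented for illustration only.
df = pd.DataFrame({
    'SIS Login ID': ['a@example.edu', 'b@example.edu'],
    'SIS User ID': [101, 102],
    'Current Score': [88.5, 92.0],
    'Section': ['MATH205', 'MATH205'],
})

# Every old name is spelled out next to its new name.
df = df.rename(columns={'SIS Login ID': 'email', 'SIS User ID': 'id',
                        'Current Score': 'grade', 'Section': 'course'})

print(list(df.columns))
```

Column names that do not appear in the dictionary are left untouched, so a partial mapping is also valid.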

2. The dict comprehension approach

When a relationship between the old and the new column names exists, the process can be sped up with a dict comprehension. Let’s say we want to append the string ‘ old’ to each column name. We first create a list of new column names from the old ones.

Map from old column names to new column names

cols = df1.columns 
cols_new = [x + ' old' for x in cols]

['MATH205 - Chapter 1 and 2 Test (380919) old',
 'MATH205 - Chapter 3 and 4 Test (380916) old',
 'MATH205 - Chapter 5 Test (380926) old',
 'MATH205 - Chapter 6 Test (380900) old',
 'MATH205 - Chapter 7 Test (380902) old',
 'MATH205 - Chapter 8 Test (380938) old',
 'MATH205 - Chapter 9 Test (380922) old',
 'MATH205 - Chapter 10 Test (380929) old']

We then create the mapping.

mapping = {key1: key2 for key1, key2 in zip(cols, cols_new)}

{'MATH205 - Chapter 1 and 2 Test (380919)': 'MATH205 - Chapter 1 and 2 Test (380919) old',
 'MATH205 - Chapter 3 and 4 Test (380916)': 'MATH205 - Chapter 3 and 4 Test (380916) old',
 'MATH205 - Chapter 5 Test (380926)': 'MATH205 - Chapter 5 Test (380926) old',
 'MATH205 - Chapter 6 Test (380900)': 'MATH205 - Chapter 6 Test (380900) old',
 'MATH205 - Chapter 7 Test (380902)': 'MATH205 - Chapter 7 Test (380902) old',
 'MATH205 - Chapter 8 Test (380938)': 'MATH205 - Chapter 8 Test (380938) old',
 'MATH205 - Chapter 9 Test (380922)': 'MATH205 - Chapter 9 Test (380922) old',
 'MATH205 - Chapter 10 Test (380929)': 'MATH205 - Chapter 10 Test (380929) old'}

Here, we have used a dict comprehension to generate the key-value pairs. The key is the old column name and the value is the new column name. We can now rename the columns using this mapping.

df1.rename(columns=mapping)

MATH205 - Chapter 1 and 2 Test (380919) old	MATH205 - Chapter 3 and 4 Test (380916) old	MATH205 - Chapter 5 Test (380926) old	MATH205 - Chapter 6 Test (380900) old	MATH205 - Chapter 7 Test (380902) old	MATH205 - Chapter 8 Test (380938) old	MATH205 - Chapter 9 Test (380922) old	MATH205 - Chapter 10 Test (380929) old
0	53.13	53.13	53.13	53.13	53.13	53.13	53.13
1	47.34	48.72	38.73	34.43	30.39	46.17	47.50
2	50.63	0.00	43.09	25.18	7.49	19.87	4.25
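For a simple rule like this one, there are two shorter spellings worth knowing. The sketch below uses a toy dataframe with two of the column names above (the values are invented) and shows that dict(zip(...)) builds the same mapping, and that rename() also accepts a function that is applied to each column name. It also illustrates that the original dataframe is not modified unless the result is assigned back.

```python
import pandas as pd

# Toy dataframe with two of the column names used above; values are invented.
df1 = pd.DataFrame({
    'MATH205 - Chapter 5 Test (380926)': [50.0, 40.0],
    'MATH205 - Chapter 6 Test (380900)': [55.0, 35.0],
})

# dict(zip(...)) builds the same old-to-new mapping as the dict comprehension.
cols = df1.columns
mapping = dict(zip(cols, [x + ' old' for x in cols]))
renamed_with_dict = df1.rename(columns=mapping)

# Passing a function applies it to every column name directly.
renamed_with_func = df1.rename(columns=lambda name: name + ' old')

print(list(renamed_with_dict.columns))
print(list(renamed_with_func.columns))
print(list(df1.columns))  # df1 itself still has the original names
```

The explicit mapping is easier to inspect before applying it, while the function form avoids building intermediate lists.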

3. The regular expression approach

When a relationship between the old and the new column names exists but is too complicated for simple string operations, we can use a regular expression (regex). Here we use a lambda function together with the re module to perform the string manipulation.

Preparing new column names

Transforming `MATH205 - Chapter 1 and 2 Test (380919)` to `Chapter 1 and 2 Test`

We have to strip out the first part and the last part. This can be achieved using a regex. Let’s first strip the first part.

import re
cols_test_new = list(map(lambda x: re.sub('MATH205 - ', '', x), cols_test))  # cols_test keeps the original names

['Chapter 1 and 2 Test (380919)',
 'Chapter 3 and 4 Test (380916)',
 'Chapter 5 Test (380926)',
 'Chapter 6 Test (380900)',
 'Chapter 7 Test (380902)',
 'Chapter 8 Test (380938)',
 'Chapter 9 Test (380922)',
 'Chapter 10 Test (380929)']

Now, let’s strip the last part.

# Raw string, so the backslashes in \( and \d reach the regex engine unchanged
cols_test_new = list(map(lambda x: re.sub(r' \(\d+\)', '', x), cols_test_new))

['Chapter 1 and 2 Test',
 'Chapter 3 and 4 Test',
 'Chapter 5 Test',
 'Chapter 6 Test',
 'Chapter 7 Test',
 'Chapter 8 Test',
 'Chapter 9 Test',
 'Chapter 10 Test']

This is the new list which will replace the old column names.

Renaming columns using a dict

This part is similar to what we did in Section 2. We will create a mapping and use that mapping in the rename() function of the pandas dataframe.

mapping = {key1: key2 for key1, key2 in zip(cols_test, cols_test_new)}

{'MATH205 - Chapter 1 and 2 Test (380919)': 'Chapter 1 and 2 Test',
 'MATH205 - Chapter 3 and 4 Test (380916)': 'Chapter 3 and 4 Test',
 'MATH205 - Chapter 5 Test (380926)': 'Chapter 5 Test',
 'MATH205 - Chapter 6 Test (380900)': 'Chapter 6 Test',
 'MATH205 - Chapter 7 Test (380902)': 'Chapter 7 Test',
 'MATH205 - Chapter 8 Test (380938)': 'Chapter 8 Test',
 'MATH205 - Chapter 9 Test (380922)': 'Chapter 9 Test',
 'MATH205 - Chapter 10 Test (380929)': 'Chapter 10 Test'}
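To complete the example, the mapping is passed to rename() exactly as in Section 2. The sketch below is a self-contained version of the whole section under a few assumptions: the dataframe is a toy one with two of the column names above and invented values, and the two substitutions are merged into a single pattern (a variation on the two-step version, not a requirement). The names cols_test and cols_test_new follow the mapping line above.

```python
import re
import pandas as pd

# Toy dataframe with two of the original column names; values are invented.
df1 = pd.DataFrame({
    'MATH205 - Chapter 1 and 2 Test (380919)': [50.0, 40.0],
    'MATH205 - Chapter 10 Test (380929)': [60.0, 45.0],
})

# One pattern for both parts: a leading 'MATH205 - ' or a trailing ' (digits)'.
pattern = re.compile(r'^MATH205 - | \(\d+\)$')

cols_test = list(df1.columns)                                  # old names
cols_test_new = [pattern.sub('', name) for name in cols_test]  # cleaned names
mapping = dict(zip(cols_test, cols_test_new))

# rename() returns a new dataframe, so assign the result back.
df1 = df1.rename(columns=mapping)
print(list(df1.columns))
```

Anchoring the pattern with ^ and $ keeps the substitution from touching a similar-looking fragment in the middle of a column name.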