Aggregation functions like mean, median, etc. can be modified using the pandas apply() function.
For example, the common pandas mean
method finds the mean for the entire data series.
We can extend this functionality to find the mean of a 95%-quantile range instead of the entire data range.
We will call this trimmed_mean.
First we write the fucntion which is an extension of the standard mean
.
def trimmed_mean(x: pd.Series, trim=0.05):
if trim>1 or trim<0:
print("Error: Trim value must be between 0 and 1")
return x
if not isinstance(x, pd.Series):
return [np.nan]
try:
x = x.astype(float)
except:
return [np.nan]
q1, q2 = np.quantile(x, [trim/2, 1-trim/2])
mask = (x >= q1) & (x <= q2)
tm = x[mask].mean()
return tm
We can apply this function column by column
df = pd.read_csv('https://raw.githubusercontent.com/saugatach/WQU/main/MScFE%20650%20MLF%20GWP%20Data.csv')
trimmed_mean(df['MSCI USA'])
2523.87
We can also apply trimmed_mean
to the entire dataframe using the apply
method.
df.apply(trimmed_mean, trim=0.25)
LUXXX 1489.462043
MSCI ARGENTINA 2477.794043
BLP ORIENTE MEDIO 3087.156948
The above piece of code calculates the mean of 75-percentile of the data i.e. mean of data in the top 75%. This method is useful when we want to know how each quantile performed.
How did the top 75% perform?
Trim off the bottom 25% and find the mean of the rest df.apply(trimmed_mean, trim=0.25)
.
How did the top 10% perform?
Trim off the bottom 90% and find the mean of the rest df.apply(trimmed_mean, trim=0.9)
.
The behaviour of the modified trimmed_mean
is similar to the pandas in-built mean
method. However, we are
able to achieve a substantial amount of customization. Note the keyword argument trim
to the method
trimmed_mean(x, trim)
is passed as a **kwargs
following the definition.
DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwargs)
Reference
Pandas API reference