I wanted to add a related use case here that I didn't see listed above, in case it helps someone else. I often need to apply a custom function to many columns, where that function itself takes multiple columns of a DataFrame, and where the exact columns can be a pain to spell out or change subtly from one DataFrame to the next. So it's the same problem as the OP's, but where the function may be user-defined and require multiple columns.
I took the basic idea from Rajib's comment above. I wanted to post it here because, while it may be overkill for some cases, it is useful in others. In that situation you'll need apply, and you'll want to wrap the results in a pd.Series to return them as a normal-looking summary table.
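To make the pattern concrete before the full example, here is a minimal sketch on a tiny hypothetical frame (the column names are illustrative): apply hands each group to the lambda as a DataFrame, and returning a pd.Series turns each dict key into a column of the result.

```python
import pandas as pd

# Hypothetical tiny frame, just to show the shape of the result
tiny = pd.DataFrame({"group": ["a", "a", "b"],
                     "x": [1, 2, 3],
                     "y": [10, 20, 30]})

# apply() receives each group as a DataFrame; returning a Series
# makes each dict key a column of the summary table
out = tiny.groupby("group")[["x", "y"]].apply(
    lambda g: pd.Series({"x_sum": g["x"].sum(), "y_max": g["y"].max()})
)
# One row per group, one column per dict key,
# e.g. out.loc["a", "x_sum"] == 3
```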
# Toy data
import numpy as np
import pandas as pd

inc_data = {f"inc_{k}": np.random.randint(1000, size=1000)
            for k in range(1, 21)}
other_data = {f"o_{k}": np.random.randint(1000, size=1000)
              for k in range(1, 21)}  # Unnecessary cols to simulate real-world dfs
group = {"group": ["a"] * 250 + ["b"] * 250 + ["c"] * 100 + ["d"] * 400}
data = {**group, **inc_data, **other_data}
df = pd.DataFrame.from_dict(data)

# Identify needed columns
check = [c for c in df.columns if "inc" in c]  # Cols we want to check
need = check + ["o_1"]  # Cols we need
ref = "o_1"  # Reference column

# Not an actual function I use, but just a sufficiently complicated one
def myfunc(data, x, y, n):
    return data.nlargest(n, x)[y].mean()

df.groupby("group")[need].apply(  # Use apply() to access entire groupby columns
    lambda g: pd.Series(          # Use a Series to return columns of a summary table
        {c: myfunc(g, c, ref, 5)  # Dict comprehension to loop through many cols
         for c in check}
    ))
There may be much more performant ways to do this, but I had a hard time figuring this one out. This method requires no pre-defined functions beyond your custom one, and if the goal is simply to speed up a lot of repetitive work, it beats the manual methods of building a Series detailed here, which have lots of good tips if the functions themselves are very different.
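One possible speed-up, sketched under the assumption that ties don't matter: replace nlargest (which sorts) with np.argpartition, which finds the top n in roughly linear time. top_n_mean is an illustrative name, not something from the answer above.

```python
import numpy as np
import pandas as pd

def top_n_mean(g, x, y, n):
    # Mean of column y over the n rows with the largest values in column x.
    # argpartition avoids a full sort; on tied data it may select different
    # rows than nlargest, so the means can differ slightly.
    vals = g[x].to_numpy()
    idx = np.argpartition(vals, -n)[-n:]
    return g[y].to_numpy()[idx].mean()

# Tiny check on made-up data: top-2 rows by inc_1 are 9 and 5,
# so we average their o_1 values (30 and 10)
g = pd.DataFrame({"inc_1": [5, 1, 9, 3], "o_1": [10, 20, 30, 40]})
top_n_mean(g, "inc_1", "o_1", 2)
```

Dropping this in for myfunc keeps the rest of the groupby/apply scaffolding unchanged.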