79585634

Date: 2025-04-22 03:17:13
Score: 0.5

I wanted to add a related use case that I didn't see listed above, in case it helps someone else. I often need to apply a custom function across many columns, where the function itself takes several columns of the DataFrame as input and the exact column list is tedious to spell out or changes slightly from one DataFrame to the next. So it's the same problem as the OP's, except that the function is user-defined and needs more than one column at a time.

I took the basic idea from Rajib's comment above. It may be overkill for simple cases, but it is useful in others, so I wanted to post it here. For this use case you'll need apply, which hands each group to your function as a whole DataFrame, and you'll want to wrap the results in a pd.Series so they come back as the columns of a normal-looking table.

# Toy data
import numpy as np
import pandas as pd
inc_data = {f"inc_{k}" : np.random.randint(1000, size=1000)
        for k in range(1,21)}
other_data = {f"o_{k}" : np.random.randint(1000, size=1000)
        for k in range(1,21)} # Adding unnecessary cols to simulate real-world dfs
group = {"group" :
     ["a"]*250 + ["b"]*250 + ["c"]*100 + ["d"]*400}
data = {**group, **inc_data, **other_data}
df = pd.DataFrame.from_dict(data)

# Identify needed columns
check = [c for c in df.columns if "inc" in c]  # Cols we want to check
ref = "o_1"  # Reference column
need = check + [ref]  # Cols we need

# Not an actual function I use, but just a sufficiently complicated one:
# mean of column y over the n rows with the largest values in column x
def myfunc(data, x, y, n):
    return data.nlargest(n, x)[y].mean()

df.groupby("group")[need].apply(  # Use apply() to access entire groupby columns
    lambda g: pd.Series(  # Use a Series to return the results as columns of a summary table
        {c: myfunc(g, c, ref, 5)  # Dict comprehension to loop through many cols
         for c in check}
    )
)
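
To see why the pd.Series wrapper matters, here is a minimal sketch on a tiny hand-made frame. The names small, top_mean, as_dicts and as_table are made up for this illustration and are not part of the code above; top_mean just mirrors the shape of myfunc.

```python
import pandas as pd

# Hypothetical toy frame (not the data above): two groups,
# two "inc" columns and one reference column.
small = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "inc_1": [1, 5, 3, 2, 9, 4],
    "inc_2": [7, 2, 8, 6, 1, 3],
    "o_1":   [10, 20, 30, 40, 50, 60],
})

def top_mean(g, x, y, n):
    """Mean of column y over the n rows with the largest values in column x."""
    return g.nlargest(n, x)[y].mean()

cols = ["inc_1", "inc_2"]
grouped = small.groupby("group")[cols + ["o_1"]]

# Returning a plain dict: each group's dict is kept as a single opaque value.
as_dicts = grouped.apply(
    lambda g: {c: top_mean(g, c, "o_1", 2) for c in cols}
)

# Wrapping the dict in pd.Series: the keys are spread out into columns.
as_table = grouped.apply(
    lambda g: pd.Series({c: top_mean(g, c, "o_1", 2) for c in cols})
)

print(type(as_dicts).__name__, type(as_table).__name__)
print(as_table)
```

On the pandas versions I have tried, the first call gives back a Series holding one dict per group, while the second gives a DataFrame with one row per group and one column per checked column, which is the summary-table shape you usually want.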

There may well be more performant ways to do this, but I had a hard time figuring this one out. This method doesn't require any predefined functions beyond your custom function, and if the goal is just to speed up a lot of repetitive work, it is more convenient than the manual methods of creating a Series detailed here (that answer has lots of good tips if the functions themselves are very different).
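
If you want to convince yourself that the one-liner does what you expect, one option is to compare it against a plain loop over the groups. This is only a sanity-check sketch on smaller, seeded toy data; the names rng, via_apply, rows and via_loop are made up for the comparison, and the loop is not meant as a faster alternative.

```python
import numpy as np
import pandas as pd

# Smaller, seeded toy data so the check is reproducible (assumed sizes).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": ["a"] * 20 + ["b"] * 25 + ["c"] * 15,
    **{f"inc_{k}": rng.integers(0, 1000, size=60) for k in range(1, 4)},
    "o_1": rng.integers(0, 1000, size=60),
})

check = [c for c in df.columns if "inc" in c]
ref = "o_1"

def myfunc(data, x, y, n):
    # Mean of column y over the n rows with the largest values in column x.
    return data.nlargest(n, x)[y].mean()

# The groupby/apply/pd.Series pattern from the answer.
via_apply = df.groupby("group")[check + [ref]].apply(
    lambda g: pd.Series({c: myfunc(g, c, ref, 5) for c in check})
)

# The same computation as an explicit loop over groups.
rows = {}
for name, g in df.groupby("group"):
    rows[name] = {c: myfunc(g, c, ref, 5) for c in check}
via_loop = pd.DataFrame.from_dict(rows, orient="index")

# Both tables should hold the same numbers in the same layout.
print(np.allclose(via_apply.to_numpy(), via_loop.to_numpy()))
```

The loop makes it explicit that each group is passed to myfunc once per checked column, which is also a reasonable place to start looking if performance becomes a concern.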

Reasons:
  • RegEx Blacklisted phrase (1): same problem
  • Long answer (-1):
  • Has code block (-0.5):
  • Low reputation (1):
Posted by: gjmb