import pandas as pd
import numpy as np
Raw Data:
date lplp_returns
2018-03-14 0.000000
2018-03-15 0.000000
2018-03-16 0.000006
2018-03-19 -0.000469
2018-03-20 -0.001312
... ...
2025-04-10 0.082415
2025-04-11 0.002901
2025-04-14 0.005738
2025-04-15 0.007664
2025-04-16 0.012883
1848 rows × 1 columns
Creating groups before using groupBy helped get me going in the right direction:
start_date = pd.Timestamp('2018-04-20')
df['group'] = ((df.index - start_date).days // 365)
grouped = df.groupby('group')
result = grouped['lplp_returns']
From there, I want cumulative returns (cumprod). This, of course, is problematic because it is a groupby/transform operation.
g = result.apply(lambda x: np.cumprod(1 + x) - 1)
g.groupby('group').tail(1)
Output:
group Date
-1 2018-04-19 0.003971
0 2019-04-19 -0.077341
1 2020-04-17 -0.068972
2 2021-04-16 0.429971
3 2022-04-18 -0.024132
4 2023-04-18 0.032741
5 2024-04-17 0.190119
6 2025-04-16 0.131955
Name: lplp_returns, dtype: float64
This gets me 95% to where I want to be.
Needs for improvement:
(1) I don't want/need group '-1',
(2) I want each group to start on or after 'xxxx-04-20' not to proceed 'xxxx-04-20', and
(3) to stop on or before 'xxxx-04-20' not to exceed 'xxxx-04-20'. (This is to address trading days).
Suggestions on coding or approaching/solving this in a better way?