79149819

Date: 2024-11-02 02:09:58
Score: 0.5

Your idea is sound; however, you could refine it along these lines:

  1. Group Level 1 - Group by number of rows: This is straightforward and effective. It helps ensure that the sample includes dataframes of different lengths, which could represent various intervals or scales.
  2. Group Level 2 - Group by days using DBSCAN: DBSCAN is a good choice for clustering days because it copes with non-uniformly distributed data and can find clusters of arbitrary shape.
  3. Group Level 3 - Group by parameter values using DBSCAN: Your choice here is also good. DBSCAN will capture clusters of similar parameter values. Because DBSCAN is sensitive to the scale of its input, apply feature scaling first (you may try MinMaxScaler) so that the parameters p1, p2 and p3 carry equal weight in the clustering.
  4. Group Level 4 - Sampling from GL3 groups: Use random sampling within each GL3 group to get broad representation. To avoid oversampling from certain clusters, you could stratify the sample using additional metadata (for example the average or range of days within each GL3 group).
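
The four levels above could be sketched roughly as follows. This is only a minimal illustration, not a tuned implementation: it assumes a hypothetical summary table with one row per dataframe and columns named `n_rows`, `days`, `p1`, `p2` and `p3`, and the function names and the `eps` / `min_samples` values are placeholders you would need to adjust to your data.

```python
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import MinMaxScaler


def assign_groups(summary, eps_days=3.0, eps_params=0.15, min_samples=2):
    """Label each row with GL2 and GL3 cluster ids (assumes a unique index)."""
    out = summary.copy()
    out["gl2"] = -1
    out["gl3"] = -1

    # GL3 preparation: scale p1..p3 to [0, 1] so each parameter has equal influence.
    cols = ["p1", "p2", "p3"]
    scaled = pd.DataFrame(
        MinMaxScaler().fit_transform(out[cols]), index=out.index, columns=cols
    )

    # GL1: group by number of rows. GL2: cluster days inside each GL1 group.
    for _, g1 in out.groupby("n_rows"):
        days = g1[["days"]].to_numpy(dtype=float)
        labels = DBSCAN(eps=eps_days, min_samples=min_samples).fit_predict(days)
        out.loc[g1.index, "gl2"] = labels

    # GL3: cluster the scaled parameters inside each (GL1, GL2) group.
    # Note: DBSCAN labels noise points as -1; here they end up sharing one group.
    for _, g2 in out.groupby(["n_rows", "gl2"]):
        labels = DBSCAN(eps=eps_params, min_samples=min_samples).fit_predict(
            scaled.loc[g2.index].to_numpy()
        )
        out.loc[g2.index, "gl3"] = labels
    return out


def sample_groups(labelled, n_per_group=1, seed=0):
    """GL4: draw up to n_per_group random rows from every GL3 group."""
    parts = [
        g.sample(n=min(n_per_group, len(g)), random_state=seed)
        for _, g in labelled.groupby(["n_rows", "gl2", "gl3"])
    ]
    return pd.concat(parts)
```

Calling `sample_groups(assign_groups(summary))` would then return one randomly chosen row per GL3 group. Drawing the same number of rows from every group is one simple way to keep large clusters from dominating the sample; stratifying on extra metadata, as suggested in step 4, would be a further refinement on top of this.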
Reasons:
  • Long answer (-1):
  • No code block (0.5):
  • Low reputation (1):
Posted by: Oladimeji Kazeem