79392414

Date: 2025-01-27 23:26:15

You can construct a new, flattened dtype with the same memory layout as the original and then reinterpret the existing data under that type with ndarray.view, without copying it.
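Before any recursion, the core idea can be seen in a small hand-written case. A flat dtype lists every leaf field at its original byte offset and keeps the original itemsize, and view then reads the same bytes under that layout. The nested dtype below (state, measured.mean, measured.low) is a hypothetical stand-in loosely modelled on the question's columns. It is not the question's actual array.

```python
import numpy as np

# Hypothetical nested record type, loosely modelled on the question's columns
nested = np.dtype([('state', 'f8'), ('measured', [('mean', 'f8'), ('low', 'f8')])])
arr = np.array([(4.0, (0.52, 0.41)), (5.0, (0.80, 0.71))], dtype=nested)

# Flat layout written by hand: each leaf field keeps its original byte offset,
# and itemsize matches the nested record so the view is byte-for-byte compatible
flat = np.dtype({
    'names': ['state', 'measured_mean', 'measured_low'],
    'formats': ['f8', 'f8', 'f8'],
    'offsets': [0, 8, 16],
    'itemsize': nested.itemsize,
})

flat_arr = arr.view(flat)  # reinterpret the same bytes; nothing is copied
print(flat_arr.dtype.names)
print(flat_arr['measured_low'])
```

The flatten_dtype function below automates this by reading the offsets from the nested dtype, so none of them have to be written out manually.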

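import numpy as np

def flatten_dtype(dtype: np.dtype, join_char: str = '_') -> np.dtype:
    """Return a flat structured dtype with the same memory layout as `dtype`.

    Names of nested fields are joined with `join_char`,
    e.g. 'measured' + 'mean' -> 'measured_mean'.
    """
    if not dtype.fields:
        # Not a structured type: nothing to flatten
        return dtype
    fields = {
        'names': [],
        'formats': [],
        'offsets': [],
        'itemsize': dtype.itemsize,  # keep the original record size, including padding
    }
    # Iterate over dtype.names: dtype.fields may also contain title aliases,
    # and its values are (dtype, offset) or (dtype, offset, title)
    for field_name in dtype.names:
        field_dtype, field_offset = dtype.fields[field_name][:2]
        # Pass join_char down so deeper levels use the same separator
        flattened_dtype = flatten_dtype(field_dtype, join_char)
        if not flattened_dtype.fields:
            # Field is not a structured type, just add it as is
            fields['names'].append(field_name)
            fields['formats'].append(field_dtype)
            fields['offsets'].append(field_offset)
            continue
        # Field is a structured type, so add its (already flattened) subfields,
        # shifting their offsets by the offset of the enclosing field
        for sub_name in flattened_dtype.names:
            sub_dtype, sub_offset = flattened_dtype.fields[sub_name][:2]
            fields['names'].append(f'{field_name}{join_char}{sub_name}')
            fields['formats'].append(sub_dtype)
            fields['offsets'].append(field_offset + sub_offset)

    return np.dtype(fields)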

With the structured array from the question (named example here), it can be used as follows:

import pandas as pd

# print(example)
# print(example.dtype)

flattened_example = example.view(flatten_dtype(example.dtype))
# print(flattened_example)
# print(flattened_example.dtype)

df = pd.DataFrame(flattened_example)

print(df)

which gives the desired output:

   state  variability  target  measured_mean  measured_low  measured_hi  var_mid  var_low  var_hi
0    4.0          0.0    0.51           0.52          0.41         0.68      0.6      0.2     0.2
1    5.0          0.0    0.89           0.80          0.71         1.12      0.6      0.2     0.2
2    4.0         -1.0    0.59           0.62          0.46         0.78      0.6      0.2     0.2
3    5.0         -1.0    0.94           1.10          0.77         1.19      0.6      0.2     0.2

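This solution has the advantage of operating only on the type of the array, not on its contents: view returns a new array object over the same memory buffer, so no data is copied when flattening. For large arrays this is likely to be more efficient than any approach that extracts and copies the columns one at a time (although pd.DataFrame may still copy the data when it builds its columns).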
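The no-copy claim can be checked directly. Because the flattened array is a view, np.shares_memory reports that it overlaps the original, and a write through the flat field names is visible in the nested original. The following is a minimal sketch using a hypothetical nested dtype. The field names a, b, x and y are illustrative only, and the flat layout is written by hand to keep the snippet short.

```python
import numpy as np

# Hypothetical nested record: an int 'a' and a sub-record 'b' with two ints
nested = np.dtype([('a', 'i4'), ('b', [('x', 'i4'), ('y', 'i4')])])
arr = np.zeros(3, dtype=nested)

# Hand-written flat layout: every leaf keeps its original byte offset
flat = np.dtype({
    'names': ['a', 'b_x', 'b_y'],
    'formats': ['i4', 'i4', 'i4'],
    'offsets': [0, 4, 8],
    'itemsize': nested.itemsize,
})
flat_arr = arr.view(flat)

# The view shares the original buffer, so no per-column copy was made
print(np.shares_memory(arr, flat_arr))

# Writing through the flat view is visible in the nested original
flat_arr['b_y'][1] = 7
print(arr['b']['y'])
```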

Posted by: ajprieger