79603136

Date: 2025-05-02 09:47:05
Score: 1.5
Natty:
Report link

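You can also deduplicate with PyArrow's grouped aggregations (see also https://arrow.apache.org/docs/python/compute.html#grouped-aggregations): add a temporary row-index column, group by the key columns, aggregate one index per group, and take those rows from the table: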

import numpy as np
import pyarrow as pa
from typing import Literal

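def deduplicate(table: pa.Table, keys: str | list[str], op: Literal["one", "first", "last"]="one") -> pa.Table:
    # Tag every row with its position so the aggregation can report which row to keep.
    table=table.append_column('__index__', pa.array(np.arange(len(table))))
    # "first"/"last" are ordered aggregations, so threading is only enabled for "one".
    grps=table.group_by(keys, use_threads=(op == "one")).aggregate([('__index__', op)])
    # The aggregated column is named '<column>_<function>', e.g. '__index___one'.
    table=table.take(grps['__index___'+op])
    return table.drop_columns(['__index__'])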
Reasons:
  • Probably link only (1):
  • Has code block (-0.5):
  • Low reputation (1):
Posted by: S165