The `expr` function doesn't automatically translate Python's `in` operator to its SQL equivalent when working with array types. The standard Spark SQL function for checking whether an element exists in an array is `array_contains`.

You should be able to fix this by using `array_contains` inside your `filter` expression.
Pseudocode:

```python
from pyspark.sql import functions as F

# Keep only the events whose id appears in the row's target_ids array
df = df.withColumn(
    'target_events',
    F.expr('filter(events, x -> array_contains(target_ids, x.id))')
)
```
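For intuition, here is a plain-Python sketch of what that expression computes for a single row: the higher-order `filter` walks the `events` array and keeps each struct `x` for which `array_contains(target_ids, x.id)` is true. The row data below is hypothetical sample data, not from your question.

```python
# Hypothetical single row: an array of event structs plus an array of target ids.
row = {
    "events": [
        {"id": 1, "name": "click"},
        {"id": 2, "name": "view"},
        {"id": 3, "name": "purchase"},
    ],
    "target_ids": [1, 3],
}

# Equivalent of: filter(events, x -> array_contains(target_ids, x.id))
target_events = [x for x in row["events"] if x["id"] in row["target_ids"]]

print(target_events)  # only the events with id 1 and id 3 remain
```

The Spark version does the same thing per row, but entirely inside the JVM, so no Python UDF is needed.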
I don't know whether this tutorial will be useful, but I'll link it anyway: