79432239

Date: 2025-02-12 07:22:25
Score: 0.5
Natty:
Report link

Based on your question (without data and code samples) I could suggest using broadcast join the map of values for filtering.

Will Shuffling Occur? No shuffling will occur if:

The Map is broadcast to executors (e.g., using broadcast() in Spark).

Filtering is applied per partition (e.g., using filter or where with partition keys).

Data is already partitioned by the key used in the Map.

Example Code in Scala

    val df = spark.read.parquet("/path/partitioned_by_country")
    val filterMap = Map("US" -> "North America", "CA" -> "North America")
    val broadcastFilter = spark.sparkContext.broadcast(filterMap)
    val filteredDF = df.filter(col("country").isin(broadcastFilter.value.keys.toSeq: _*))
Reasons:
  • Long answer (-0.5):
  • Has code block (-0.5):
  • Contains question mark (0.5):
  • Low reputation (1):
Posted by: Alexei Sosenkov