79199576

Date: 2024-11-18 10:40:28
Score: 1.5
Natty:
Report link

I found the root cause of the error.

We are experimenting with Dataproc Serverless, which sets spark.sql.autoBroadcastJoinThreshold=16g. Because of this, joins are planned as broadcast joins. However, the Spark code here checks against a hard 8GB limit on broadcast tables, and that check fails because the broadcast data is larger than 8GB.

Ideally, Spark should enforce a default maximum for spark.sql.autoBroadcastJoinThreshold (8GB); anything set higher should be reset to 8GB.
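The cap proposed above could be sketched as follows. This is a minimal illustration, not actual Spark code: the helper names are hypothetical, and only the 8GiB broadcast-table limit and the Spark-style size suffixes (k/m/g/t) come from real Spark behavior.

```python
# Hypothetical sketch: parse a Spark-style size string and clamp
# spark.sql.autoBroadcastJoinThreshold at 8 GiB, the hard limit
# Spark enforces on broadcast tables. Not actual Spark internals.

UNITS = {"b": 1, "k": 1 << 10, "m": 1 << 20, "g": 1 << 30, "t": 1 << 40}
MAX_BROADCAST_BYTES = 8 * (1 << 30)  # Spark's 8 GiB broadcast-table limit

def parse_size(value: str) -> int:
    """Convert a size string like '16g' or '10485760' (plain bytes) to bytes."""
    value = value.strip().lower()
    if value and value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)

def clamp_broadcast_threshold(value: str) -> int:
    """Reset any configured threshold above 8 GiB back down to 8 GiB."""
    return min(parse_size(value), MAX_BROADCAST_BYTES)
```

With this in place, the Dataproc Serverless default of 16g would be clamped to 8GiB instead of later failing the broadcast-size check. (The practical workaround today is to set the threshold yourself, e.g. to a value below 8GB, or to -1 to disable auto-broadcast.)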

Reasons:
  • No code block (0.5):
  • Low reputation (1):
Posted by: Chari