79294200

Date: 2024-12-19 12:08:08
Score: 1.5
Natty:
Report link

@G.Shand I saw your answer while facing a similar issue running my DBT job via a Glue session. Since the data volume is huge, lowering the resources fails with a memory error.

If I increase the worker type, then I need to set a minimum of 2 workers, which again increases the DPU count and causes the failure.
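
For context, this is how I am sizing the session today (Glue interactive session magics; a minimal sketch, and the worker type shown is just an example):

%worker_type G.1X
%number_of_workers 2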

Is there something I can do to play around with the Spark configuration instead?

Use case: I am trying to merge into an Iceberg table when I hit this issue. Config passed:

spark.sql.catalog.glue_catalog=org.apache.iceberg.spark.SparkCatalog
--conf spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog
--conf spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
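
For reference, this is roughly how those settings translate into the session and the merge (a minimal sketch; the database, table, and column names are placeholders, not my real ones):

from pyspark.sql import SparkSession

# Session wired with the same Iceberg-on-Glue settings as above
spark = (
    SparkSession.builder
    .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .getOrCreate()
)

# The merge that runs out of memory, in outline
spark.sql("""
    MERGE INTO glue_catalog.db.target t
    USING glue_catalog.db.staging u
    ON t.id = u.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")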

Also tried reducing the maximum connections:

# Route s3a:// through the S3A filesystem and cap its connection pool
sc._jsc.hadoopConfiguration().set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
sc._jsc.hadoopConfiguration().setInt("fs.s3a.connection.maximum", 100)
# Buffer uploads in off-heap byte buffers instead of on disk
sc._jsc.hadoopConfiguration().set("fs.s3a.fast.upload", "true")
sc._jsc.hadoopConfiguration().set("fs.s3a.fast.upload.buffer", "bytebuffer")
# Timeouts are in milliseconds
sc._jsc.hadoopConfiguration().setInt("fs.s3a.connection.timeout", 500)
sc._jsc.hadoopConfiguration().setInt("fs.s3a.connection.acquisition.timeout", 500)

https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/performance.html
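
For completeness, the session-level knobs I plan to try next (a sketch only; the values are guesses, and on Glue the executor memory itself is fixed by the worker type):

# Enable adaptive execution and skew-join handling, which MERGE INTO is prone to need
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
# Raise the shuffle partition count so each task holds less data in memory (value is a guess)
spark.conf.set("spark.sql.shuffle.partitions", "400")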

Reasons:
  • Blacklisted phrase (0.5): I need
  • Blacklisted phrase (1): I am trying to
  • Long answer (-1):
  • Has code block (-0.5):
  • Contains question mark (0.5):
  • Low reputation (1):
Posted by: barney_st_er