The issue was due to missing Spark configurations. Spark executed the SELECT query, but it didn't leverage partitioning until I explicitly set:
sparkConf.set("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension");
sparkConf.set("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog");
spark.sql.extensions: Enables Delta Lake optimizations.
spark.sql.catalog.spark_catalog: Ensures Spark treats the table as a Delta table, allowing partition pruning.

Without these, Spark treated the table as a generic Parquet table, ignoring partitioning. After adding them, partition filters appeared in the physical plan, improving query efficiency. Zeppelin worked because it had these configs set.
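For anyone checking the same thing, here is a minimal sketch of how the pruning can be verified from Java; the table name "events" and partition column "event_date" are placeholders, not my actual schema:

import org.apache.spark.SparkConf;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class DeltaPruningCheck {
    public static void main(String[] args) {
        // The two Delta configs from the fix above.
        SparkConf sparkConf = new SparkConf()
                .set("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
                .set("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog");

        SparkSession spark = SparkSession.builder()
                .appName("delta-pruning-check")
                .config(sparkConf)
                .getOrCreate();

        // Query restricted to a single partition ("events" / "event_date" are placeholder names).
        Dataset<Row> df = spark.sql(
                "SELECT * FROM events WHERE event_date = '2024-01-01'");

        // With the configs set, PartitionFilters should appear in the physical plan;
        // without them, the same query scans the whole table.
        df.explain(true);

        spark.stop();
    }
}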
However, DELETE queries still don’t apply partition pruning, even with these configs. The physical plan remains the same, causing a full table scan. Is this expected behavior in Delta Lake, or is there a way to optimize DELETE operations?
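For reference, the DELETE looks roughly like this (reusing the spark session and placeholder names from the sketch above); despite the filter on the partition column, its physical plan still shows a full table scan:

// Delete one partition's worth of rows; "events" / "event_date" are placeholders.
// Unlike the SELECT, the resulting plan does not show partition pruning.
spark.sql("DELETE FROM events WHERE event_date = '2024-01-01'");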
Would appreciate any insights!