You shouldn't run Spark inside Airflow, especially on MWAA, which uses the Celery Executor by default, so all tasks on a worker share the same compute. Airflow is designed for workflow orchestration, not heavy data processing. Running Spark directly within an Airflow task makes it compete with the scheduler heartbeat and other tasks for the worker's limited CPU and memory, which leads to resource contention and task failures on MWAA's constrained environment sizes.
Instead, offload the Spark job to a dedicated service such as AWS Glue or EMR, and use the corresponding Airflow operators to trigger and monitor it. The worker then only polls for job status, which is cheap. A Glue example follows.
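Here's a minimal sketch using `GlueJobOperator` from the `apache-airflow-providers-amazon` package (the job name, script location, bucket, and IAM role below are placeholders you'd swap for your own; assumes Airflow 2.4+ for the `schedule` argument):

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

with DAG(
    dag_id="trigger_glue_spark_job",
    start_date=datetime(2024, 1, 1),
    schedule=None,   # trigger manually or set a cron expression
    catchup=False,
) as dag:
    run_spark_etl = GlueJobOperator(
        task_id="run_spark_etl",
        job_name="my-spark-etl-job",                       # placeholder Glue job name
        script_location="s3://my-bucket/scripts/etl.py",   # placeholder Spark script in S3
        s3_bucket="my-bucket",                             # placeholder bucket for logs/artifacts
        iam_role_name="my-glue-execution-role",            # placeholder IAM role Glue assumes
        # The MWAA worker only polls Glue for job status here;
        # the Spark job itself runs on Glue's compute, not MWAA's.
        wait_for_completion=True,
    )
```

If the job is long-running, you can instead set `wait_for_completion=False` and pair the operator with a deferrable sensor, so the worker slot is freed while Glue does the work.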