That's a good question. Let me first clarify a few things:
- Executors are independent JVM processes that run tasks in isolation so they cannot interfere with each other's work. Sharing data between them would require inter-process communication (IPC), which adds complexity and creates bottlenecks.
- When Spark broadcasts data, each executor gets its own read-only copy, so if one executor corrupts its copy, the others are unaffected. If the data were shared, a single corruption would make it inconsistent for every executor on the node.
- Spark is built around fault tolerance: if a task fails, Spark re-executes it on another executor with available resources. Keeping a copy of the broadcast data on each executor simplifies that re-execution (the sketch below shows basic broadcast usage).
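To make the mechanics concrete, here is a minimal Scala sketch of broadcast usage; the app name and lookup map are illustrative assumptions, not anything from the original question:

```scala
import org.apache.spark.sql.SparkSession

object BroadcastExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("BroadcastExample")
      .master("local[*]") // assumption: local mode, just for illustration
      .getOrCreate()
    val sc = spark.sparkContext

    // A small lookup table we want every task to see.
    val countryNames = Map("BR" -> "Brazil", "US" -> "United States")

    // broadcast() ships one read-only copy to each executor,
    // instead of serializing the map into every task closure.
    val bcNames = sc.broadcast(countryNames)

    val codes = sc.parallelize(Seq("BR", "US", "BR"))
    // Tasks read the executor-local copy; no IPC between executors.
    val names = codes.map(code => bcNames.value.getOrElse(code, "unknown"))

    names.collect().foreach(println)
    spark.stop()
  }
}
```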
Why can't it use off-heap space?
- Performance: Accessing a shared off-heap region could be slower than reading a local copy, due to IPC overhead, serialization, and synchronization. (Note that Spark's existing off-heap support is scoped per executor; see the sketch after this list.)
- Lack of built-in mechanisms: Spark's core design treats the executor as the primary unit of memory management. Sharing data directly between executors on the same node would require significant architectural changes to Spark's memory management system.
- Compatibility issues with different cluster managers: Spark runs on YARN, Kubernetes, Mesos, and its standalone manager; introducing node-level shared memory would require modifications and support in each of these environments.
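Worth noting: Spark does support off-heap memory today, but each executor allocates its own off-heap region; nothing is shared across processes on the node. A minimal sketch of enabling it (the app name and size are illustrative assumptions):

```scala
import org.apache.spark.sql.SparkSession

object OffHeapConfig {
  def main(args: Array[String]): Unit = {
    // Off-heap memory is enabled per executor; each executor allocates
    // its own region, and nothing is shared between executor processes.
    val spark = SparkSession.builder()
      .appName("OffHeapConfig")
      .config("spark.memory.offHeap.enabled", "true")
      .config("spark.memory.offHeap.size", "2g") // sized per executor, not per node
      .getOrCreate()

    spark.stop()
  }
}
```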
Alternatives:
- A distributed file system, like HDFS (see the join sketch after this list).
- Distributed caching, like Redis or Memcached.
- Custom shared storage, such as files on the node that all executors can read.
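As an illustration of the first alternative, a large lookup table can live in HDFS and be joined on demand instead of broadcast. A sketch under stated assumptions: the paths and the join column "key" are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession

object HdfsLookupJoin {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("HdfsLookupJoin").getOrCreate()

    // Hypothetical paths; substitute your cluster's actual locations.
    val lookup = spark.read.parquet("hdfs:///shared/lookup_table")
    val events = spark.read.parquet("hdfs:///data/events")

    // Every executor pulls the shared data from HDFS on demand instead of
    // each holding a broadcast copy in its own memory.
    val joined = events.join(lookup, Seq("key"))
    joined.show()

    spark.stop()
  }
}
```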
Resources used:
1 - https://blog.devgenius.io/spark-on-heap-and-off-heap-memory-27b625af778b
2 - https://spark.apache.org/docs/latest/tuning.html