Configure the Number of Spark Executors

Configuring the number of Apache Spark executors correctly is essential for building scalable, high-performance data processing applications. This article looks at how executors affect performance, resource utilization, and parallel processing in Spark.

1. Overview of Apache Spark and Executor Configuration

Apache Spark is a powerful distributed data processing framework designed for large-scale data computation. It provides high-level APIs and an optimized execution engine for batch and stream processing. One of the most critical factors influencing Spark application performance is how executors are configured.

Executors are the backbone of Spark’s distributed execution model. They are responsible for running tasks, caching data in memory, handling shuffle operations, and communicating results back to the driver. Poor executor configuration can lead to issues such as memory bottlenecks, excessive garbage collection, or underutilized cluster resources. Setting the right number of executors and tuning their resources ensures:

  • Efficient utilization of cluster hardware
  • Reduced job execution time
  • Better fault tolerance and stability
  • Optimized parallel processing

1.1 What Are Spark Executors?

A Spark executor is a JVM process launched on a worker node within the cluster. Each executor is dedicated to a single Spark application and, under static allocation, stays alive for the application’s whole lifetime (with dynamic allocation, idle executors can be released earlier). Executors have the following responsibilities:

  • Execute tasks assigned by the driver program
  • Store intermediate and cached data in memory (RDD/DataFrame caching)
  • Perform shuffle read/write operations
  • Report execution status and metrics back to the driver

Executor resources are controlled by the following settings:

  • Executor Memory – Amount of RAM allocated for computation and caching
  • Executor Cores – Number of CPU cores available for parallel task execution
  • Executor Instances – Total number of executors running in the application
  • Memory Overhead – Extra memory allocated for JVM overhead, shuffle buffers, and off-heap storage

The total parallelism of a Spark application is calculated as:

Total Parallelism = Number of Executors × Cores per Executor

Additionally, Spark internally uses task slots, where each core represents one slot capable of running one task at a time.
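
The slot arithmetic above can be sketched in plain Java. This is a minimal illustration rather than Spark code: the class and method names are hypothetical, and it assumes the default of one CPU core per task (spark.task.cpus=1).

```java
// Hypothetical helper, not part of Spark: models task slots as executors x cores.
final class ParallelismSketch {

    // Total task slots, assuming one core per task (spark.task.cpus=1).
    static int totalSlots(int executors, int coresPerExecutor) {
        return executors * coresPerExecutor;
    }

    // Number of scheduling "waves" needed to run every task of a stage.
    static int waves(int tasks, int slots) {
        return (tasks + slots - 1) / slots; // ceiling division
    }

    public static void main(String[] args) {
        int slots = totalSlots(5, 2);
        System.out.println("Task slots: " + slots);
        System.out.println("Waves for 25 tasks: " + waves(25, slots));
    }
}
```

With 5 executors of 2 cores each there are 10 slots, so a stage of 25 tasks needs three waves, and the last wave leaves half of the slots idle.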

1.1.1 Executor Configuration Considerations

  • Too few executors → underutilized cluster
  • Too many executors → scheduling overhead and increased shuffle cost
  • Too many cores per executor → increased garbage collection pauses
  • Too little memory → frequent disk spills and slow performance

1.2 Static Executor Allocation

In static allocation, the number of executors is fixed when the Spark application starts. This approach is simple and works well for predictable workloads. You can define executor settings with spark-submit command-line arguments (--num-executors is honored on YARN and, in recent Spark releases, on Kubernetes):

--num-executors 5
--executor-cores 2
--executor-memory 4G

Or via configuration properties:

spark.executor.instances=5
spark.executor.cores=2
spark.executor.memory=4g
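
As a rough sketch of how these flags fit together, the helper below assembles a complete spark-submit command line for a statically sized job. The helper itself, the YARN master, the main class com.example.MyJob, and the jar name my-job.jar are all assumptions made for illustration; only the spark-submit flags and the spark.dynamicAllocation.enabled property come from Spark.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds a spark-submit command for static allocation.
final class StaticSubmitCommand {

    static List<String> build(int executors, int cores, String memory,
                              String mainClass, String appJar) {
        List<String> cmd = new ArrayList<>();
        cmd.add("spark-submit");
        cmd.add("--master");
        cmd.add("yarn"); // assumption: the job runs on YARN
        cmd.add("--num-executors");
        cmd.add(String.valueOf(executors));
        cmd.add("--executor-cores");
        cmd.add(String.valueOf(cores));
        cmd.add("--executor-memory");
        cmd.add(memory);
        // Dynamic allocation is off by default; stating it makes the intent explicit.
        cmd.add("--conf");
        cmd.add("spark.dynamicAllocation.enabled=false");
        cmd.add("--class");
        cmd.add(mainClass);
        cmd.add(appJar);
        return cmd;
    }

    public static void main(String[] args) {
        // Placeholder class and jar names, not taken from a real project.
        List<String> cmd = build(5, 2, "4G", "com.example.MyJob", "my-job.jar");
        System.out.println(String.join(" ", cmd));
    }
}
```

Keeping the three sizing values as parameters makes it easy to generate the same command for different environments without editing the job itself.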

1.2.1 How Static Allocation Works

  • Executors are allocated once at application startup
  • No scaling happens during runtime
  • Resources remain reserved even if idle

1.2.2 Benefits of Static Allocation

  • Predictable and stable resource usage
  • Easy to debug and monitor
  • No runtime overhead of scaling

1.2.3 Limitations of Static Allocation

  • Can lead to resource underutilization
  • Not suitable for variable or unpredictable workloads
  • May increase cost in cloud environments

1.2.4 When to Use Static Allocation

  • Batch jobs with fixed input sizes
  • Dedicated clusters
  • Performance-critical workloads requiring stability

1.3 Dynamic Executor Allocation

Dynamic allocation enables Spark to automatically scale the number of executors based on workload demand. This feature is especially useful in multi-tenant or cloud environments.

spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.minExecutors=1
spark.dynamicAllocation.maxExecutors=10
spark.dynamicAllocation.initialExecutors=2

1.3.1 How Dynamic Allocation Works

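  • Executors are requested when tasks have been pending longer than spark.dynamicAllocation.schedulerBacklogTimeout (1s by default)
  • Executors are removed after staying idle for spark.dynamicAllocation.executorIdleTimeout (60s by default)
  • Shuffle data must survive executor removal, which requires either the external shuffle service (spark.shuffle.service.enabled=true) or, from Spark 3.0, shuffle tracking (spark.dynamicAllocation.shuffleTracking.enabled=true)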
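
The scale-up and scale-down rules can be modeled with a small sketch. It is a deliberate simplification and not Spark's actual allocation manager: the class and method names are hypothetical, the target is computed in one step rather than ramped up gradually, and the idle timeout is passed in as a plain number.

```java
// Simplified model of dynamic allocation sizing; not Spark's real implementation.
final class DynamicAllocationSketch {

    // Executors needed to run all pending and running tasks at once,
    // clamped to the configured [minExecutors, maxExecutors] range.
    static int targetExecutors(int pendingTasks, int runningTasks, int coresPerExecutor,
                               int minExecutors, int maxExecutors) {
        int tasks = pendingTasks + runningTasks;
        int needed = (tasks + coresPerExecutor - 1) / coresPerExecutor; // ceiling division
        return Math.max(minExecutors, Math.min(maxExecutors, needed));
    }

    // An executor becomes a removal candidate once it has been idle for the timeout.
    static boolean shouldRemove(long idleSeconds, long idleTimeoutSeconds) {
        return idleSeconds >= idleTimeoutSeconds;
    }

    public static void main(String[] args) {
        // Bounds match the configuration above: minExecutors=1, maxExecutors=10.
        System.out.println(targetExecutors(30, 0, 2, 1, 10)); // large backlog, capped at max
        System.out.println(targetExecutors(7, 2, 2, 1, 10));  // moderate load
        System.out.println(targetExecutors(0, 0, 2, 1, 10));  // idle, falls back to min
        System.out.println(shouldRemove(75, 60));
    }
}
```

The clamp is the important part: no matter how large the backlog grows, the application never asks for more than maxExecutors, and it never drops below minExecutors when idle.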

1.3.2 Benefits of Dynamic Allocation

  • Improved resource utilization
  • Automatic scaling based on workload
  • Cost-efficient in cloud environments

1.3.3 Limitations of Dynamic Allocation

  • Slight latency during executor ramp-up
  • Requires a correctly configured external shuffle service or shuffle tracking
  • Debugging can be more complex

1.3.4 When to Use Dynamic Allocation

  • Streaming jobs with fluctuating loads
  • Shared clusters (YARN, Kubernetes)
  • Ad-hoc or exploratory workloads

1.4 Executors and Parallel Processing

Executors directly control the level of parallelism in a Spark application. Higher parallelism enables faster processing, but it must be balanced carefully.

  • More executors → higher concurrency
  • More cores per executor → fewer executors but heavier JVMs

However, excessive parallelism can lead to:

  • Task scheduling overhead
  • Network congestion during shuffle
  • Memory pressure and GC overhead

1.4.1 Executor Configuration Best Practices

  • Use 2–5 cores per executor for optimal balance
  • Avoid very large executors (reduces GC impact)
  • Reserve 1 core per node for OS and cluster manager
  • Allocate memory considering both execution and storage
  • Keep total tasks at least 2–3x of total cores for better utilization
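
Two of these rules of thumb are easy to check mechanically. The sketch below is a hypothetical checklist helper, not a Spark API; its thresholds simply restate the 2–5 cores guideline and the lower bound of two tasks per core from the list above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical checklist helper; thresholds mirror the rules of thumb listed above.
final class ExecutorConfigReview {

    static List<String> review(int coresPerExecutor, int totalCores, int totalTasks) {
        List<String> warnings = new ArrayList<>();
        if (coresPerExecutor < 2 || coresPerExecutor > 5) {
            warnings.add("cores per executor is outside the suggested 2-5 range");
        }
        if (totalTasks < 2 * totalCores) {
            warnings.add("fewer than 2 tasks per core; some cores may sit idle");
        }
        return warnings;
    }

    public static void main(String[] args) {
        System.out.println(review(5, 75, 225)); // within both guidelines
        System.out.println(review(8, 64, 100)); // violates both guidelines
    }
}
```

A check like this could run before job submission, but the thresholds are guidelines rather than hard limits, so warnings should prompt a review rather than block the job.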

1.4.2 Executor and Parallelism Calculation Example

Cluster: 5 nodes
Each node: 16 cores

Reserve 1 core per node → usable cores per node = 15
Total usable cores = 5 × 15 = 75

Executor cores = 5

Executors per node = 15 / 5 = 3
Total executors = 5 × 3 = 15

Total parallelism = 15 × 5 = 75 tasks
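
The same arithmetic can be written as a short Java sketch so it can be rerun for other cluster shapes. The class and method names are hypothetical, and integer division is used on purpose: cores that do not fill a whole executor are left unused.

```java
// Hypothetical sizing helper that reproduces the worked example above.
final class ExecutorLayoutSketch {

    // Executors that fit on one node after reserving cores for the OS and daemons.
    static int executorsPerNode(int coresPerNode, int reservedCores, int executorCores) {
        return (coresPerNode - reservedCores) / executorCores; // leftover cores stay unused
    }

    static int totalExecutors(int nodes, int executorsPerNode) {
        return nodes * executorsPerNode;
    }

    static int totalParallelism(int totalExecutors, int executorCores) {
        return totalExecutors * executorCores;
    }

    public static void main(String[] args) {
        int perNode = executorsPerNode(16, 1, 5);
        int executors = totalExecutors(5, perNode);
        System.out.println("Executors per node: " + perNode);
        System.out.println("Total executors: " + executors);
        System.out.println("Total parallelism: " + totalParallelism(executors, 5));
    }
}
```

On YARN the ApplicationMaster also occupies a container, so some sizing guides subtract one executor from the total; this sketch does not model that.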

1.4.3 Memory Calculation Example

Node memory = 64 GB
Reserve 1 GB for OS → usable = 63 GB

Executors per node = 3
Memory per executor = 63 / 3 = 21 GB

Recommended:
spark.executor.memory = 18G
spark.executor.memoryOverhead = 3G
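
A small sketch can confirm that the recommended split fits the per-executor budget. The helper names are hypothetical. The default overhead formula, max(384 MiB, 10% of executor memory), is the one Spark documents for JVM workloads; the 3G used above is a more generous manual choice than that default.

```java
// Hypothetical helper for checking an executor memory split against a node budget.
final class ExecutorMemorySketch {

    static final long MIB_PER_GIB = 1024;

    // Memory available to each executor container on a node, in MiB.
    static long perExecutorBudgetMiB(long nodeGiB, long reservedGiB, int executorsPerNode) {
        return (nodeGiB - reservedGiB) * MIB_PER_GIB / executorsPerNode;
    }

    // Spark's documented default overhead for JVM jobs: max(384 MiB, 10% of the heap).
    static long defaultOverheadMiB(long heapMiB) {
        return Math.max(384, (long) (heapMiB * 0.10));
    }

    // Heap plus overhead must fit inside the container budget.
    static boolean fits(long heapMiB, long overheadMiB, long budgetMiB) {
        return heapMiB + overheadMiB <= budgetMiB;
    }

    public static void main(String[] args) {
        long budget = perExecutorBudgetMiB(64, 1, 3);
        long heap = 18 * MIB_PER_GIB;
        System.out.println("Budget per executor (MiB): " + budget);
        System.out.println("Default overhead for 18G heap (MiB): " + defaultOverheadMiB(heap));
        System.out.println("18G + 3G fits: " + fits(heap, 3 * MIB_PER_GIB, budget));
    }
}
```

The 18G heap plus 3G overhead uses the 21 GB budget exactly, which leaves no slack on the node beyond the 1 GB already reserved for the OS.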

2. Executor Configuration Code Example

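// SparkExecutorExample.java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkExecutorExample {

    public static void main(String[] args) {

        // local[*] runs the driver and all tasks inside one JVM, one thread per core.
        // The spark.executor.* values below are shown for illustration: they take
        // effect on a cluster manager such as YARN or Kubernetes, not in local mode.
        SparkConf conf = new SparkConf()
                .setAppName("Executor Configuration Example")
                .setMaster("local[*]")
                .set("spark.executor.instances", "4")
                .set("spark.executor.cores", "2")
                .set("spark.executor.memory", "2g");

        JavaSparkContext sc = new JavaSparkContext(conf);

        // Distribute the list across 4 partitions, which yields 4 tasks per stage
        JavaRDD<Integer> numbers = sc.parallelize(
                Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8), 4);

        // map is lazy: nothing runs until an action such as collect() is called
        JavaRDD<Integer> squared = numbers.map(x -> {
            System.out.println("Processing: " + x +
                " in Thread: " + Thread.currentThread().getName());
            return x * x;
        });

        // Run the job first so the "Processing" lines appear before the results
        List<Integer> results = squared.collect();

        System.out.println("Output:");
        for (Integer num : results) {
            System.out.println(num);
        }

        sc.close();
    }
}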

This Java program shows where executor settings are declared and how partitioning drives parallelism in a simple computation. The application begins by creating a SparkConf object that sets the application name and the master to local[*], meaning the job runs locally with one worker thread per available CPU core. In a real deployment the master would instead point to a cluster manager such as YARN or Kubernetes.

The configuration sets spark.executor.instances=4, spark.executor.cores=2, and spark.executor.memory=2g. On a cluster, this requests 4 executors, each able to run 2 tasks in parallel with a 2 GB heap. In local mode these values are not used to launch executors: the driver and the tasks share a single JVM, parallelism is determined by the local[*] thread count, and memory is bounded by the driver's heap.

A JavaSparkContext is then created to initialize the Spark execution environment. Next, a JavaRDD named numbers is created with the parallelize method, which distributes the list of integers across 4 partitions and therefore produces 4 tasks. The program applies a map transformation to compute the square of each number; a print statement inside the transformation logs which thread processes each element. Because transformations are lazy, nothing runs until the collect() action triggers the job and brings the results back to the driver, where they are printed to the console. Finally, sc.close() stops the Spark context and releases its resources.

2.1 Code Output

Processing: 1 in Thread: ForkJoinPool.commonPool-worker-1
Processing: 2 in Thread: ForkJoinPool.commonPool-worker-3
Processing: 3 in Thread: ForkJoinPool.commonPool-worker-2
Processing: 4 in Thread: ForkJoinPool.commonPool-worker-1
Processing: 5 in Thread: ForkJoinPool.commonPool-worker-3
Processing: 6 in Thread: ForkJoinPool.commonPool-worker-2
Processing: 7 in Thread: ForkJoinPool.commonPool-worker-1
Processing: 8 in Thread: ForkJoinPool.commonPool-worker-3

Output:
1
4
9
16
25
36
49
64

The output shows Spark processing the data in parallel on multiple threads, each acting as a task slot. When the map transformation runs, every number is processed independently, and the log statement prints the number together with the name of the thread that handled it. Because the application runs in local[*] mode, all tasks execute as threads inside a single JVM rather than in separate executor processes. The thread names depend on the Spark version and runtime; Spark normally names its task threads with the prefix "Executor task launch worker", so treat the names above as placeholders. The order of the "Processing" lines may also vary between runs, because the scheduling of parallel tasks is non-deterministic, and Spark's own log lines are omitted here. After all tasks complete, collect() returns the squared results to the driver in partition order, which is why the final list is printed in the same order as the input even though the tasks ran concurrently.

3. Conclusion

Setting the right number of Spark executors is critical for achieving optimal performance. Static allocation provides predictability, while dynamic allocation offers flexibility and efficiency. Understanding how executors impact parallelism helps in designing scalable and high-performance Spark applications. Always tune executor settings based on workload characteristics, cluster capacity, and job requirements to get the best results.

Yatin Batra

An experienced full-stack engineer well versed in Core Java, Spring/Spring Boot, MVC, Security, AOP, Frontend (Angular & React), and cloud technologies (such as AWS, GCP, Jenkins, Docker, K8s).