Getting Started with Distributed Job Scheduling Using ElasticJob

ElasticJob is a distributed job scheduling framework for Java within the Apache ShardingSphere ecosystem. It uses Apache ZooKeeper for distributed coordination. ElasticJob enables features such as job sharding, failover handling, and elastic scaling, allowing scheduled tasks to run efficiently across multiple nodes. In this article, we explore how ElasticJob works and show how to configure and run a distributed job.

1. Understanding ElasticJob Architecture

ElasticJob is designed to distribute scheduled tasks across multiple nodes in a cluster. Instead of a single application instance executing all scheduled tasks, ElasticJob divides the work into shards and distributes them across available nodes. This allows large workloads to be processed in parallel while maintaining reliability and scalability.

ElasticJob relies on several components:

  • Scheduler: triggers jobs based on cron expressions
  • Registry Center: coordinates distributed nodes
  • Job Sharding: splits jobs into smaller tasks
  • Failover Mechanism: redistributes tasks when nodes fail
  • Elastic Scaling: automatically rebalances workloads when nodes join or leave the cluster

The coordination between nodes is managed through Apache ZooKeeper, which stores job metadata and cluster state information.
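To make the idea of sharding and rebalancing more concrete, here is a minimal sketch of an even split of shard indexes across nodes. It is plain Java and does not call ElasticJob. The class name, method name, and node names are hypothetical, and the split rule shown is only an assumption about what an even allocation could look like; the strategy ElasticJob actually applies may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: models an even split of shard indexes across nodes.
// It does not use ElasticJob; all names here are hypothetical.
final class ShardAllocationSketch {

    // Gives every node floor(shards / nodes) consecutive shard indexes,
    // then hands any leftover indexes to the first nodes, one each.
    static Map<String, List<Integer>> allocate(List<String> nodes, int shardingTotalCount) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        if (nodes.isEmpty()) {
            return result;
        }
        int perNode = shardingTotalCount / nodes.size();
        int next = 0;
        for (String node : nodes) {
            List<Integer> items = new ArrayList<>();
            for (int i = 0; i < perNode; i++) {
                items.add(next++);
            }
            result.put(node, items);
        }
        // Distribute the remainder (always fewer items than there are nodes).
        for (String node : nodes) {
            if (next >= shardingTotalCount) {
                break;
            }
            result.get(node).add(next++);
        }
        return result;
    }

    public static void main(String[] args) {
        // Three nodes, three shards: one shard per node.
        System.out.println(allocate(List.of("node-1", "node-2", "node-3"), 3));
        // One node has left: the same three shards are spread over two nodes.
        System.out.println(allocate(List.of("node-1", "node-2"), 3));
    }
}
```

Running the sketch with three nodes yields one shard per node. Running it again with two nodes shows the third shard moving to a surviving node, which is the kind of rebalancing the component list above describes.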

1.1 Job Types in ElasticJob

ElasticJob supports several job types that allow developers to execute scheduled tasks in different ways depending on the application requirements. These job types make it possible to run Java logic, process data streams, execute scripts, or trigger remote services. The main job types include Simple Jobs, Dataflow Jobs, Script Jobs, and HTTP Jobs.

Simple Job

A Simple Job is the most basic job type in ElasticJob. It executes a single task whenever the scheduler triggers the job according to the configured cron expression. Each shard runs independently, allowing different nodes in the cluster to process separate shards in parallel. Simple jobs are commonly used for tasks such as sending notifications, generating reports, or performing scheduled maintenance tasks.

Dataflow Job

A Dataflow Job is designed for scenarios where data needs to be processed in batches or continuously. It separates the work into a fetch step and a process step. With streaming enabled, the job repeatedly fetches and processes data until no more data is available; without it, each trigger performs a single fetch-and-process pass. This type is useful for large-scale processing tasks such as batch record processing, ETL pipelines, or message queue consumption.
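The fetch-then-process cycle can be sketched in plain Java as follows. This is a simplified model, not ElasticJob's implementation: the nested Dataflow interface, the runUntilDrained helper, and the in-memory queue are all hypothetical stand-ins used only to show the loop's shape.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Illustrative only: a plain-Java model of a fetch-then-process loop.
// The Dataflow interface below is hypothetical and is not ElasticJob's API.
final class DataflowLoopSketch {

    interface Dataflow<T> {
        List<T> fetchData();              // returns an empty list when nothing is left
        void processData(List<T> batch);
    }

    // Keeps fetching and processing batches until a fetch comes back null or empty.
    // Returns the number of batches that were processed.
    static <T> int runUntilDrained(Dataflow<T> job) {
        int batches = 0;
        List<T> batch = job.fetchData();
        while (batch != null && !batch.isEmpty()) {
            job.processData(batch);
            batches++;
            batch = job.fetchData();
        }
        return batches;
    }

    public static void main(String[] args) {
        // Hypothetical pending records, fetched two at a time.
        Queue<String> pending = new ArrayDeque<>(List.of("r1", "r2", "r3", "r4", "r5"));
        List<String> processed = new ArrayList<>();

        Dataflow<String> job = new Dataflow<>() {
            @Override
            public List<String> fetchData() {
                List<String> batch = new ArrayList<>();
                while (batch.size() < 2 && !pending.isEmpty()) {
                    batch.add(pending.poll());
                }
                return batch;
            }

            @Override
            public void processData(List<String> batch) {
                processed.addAll(batch);
            }
        };

        int batches = runUntilDrained(job);
        System.out.println("Batches: " + batches + ", processed: " + processed);
    }
}
```

With five records and a batch size of two, the loop processes three batches and then stops on the first empty fetch.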

Script Job

A Script Job allows ElasticJob to execute external scripts instead of Java code. The scheduler runs a script file based on the configured schedule. This job type is useful for system-level operations such as backups, file cleanup, or infrastructure automation. It also allows existing scripts written in languages like Bash or Python to be integrated into the scheduling system without rewriting them in Java.

HTTP Job

An HTTP Job allows ElasticJob to send an HTTP request when a job is triggered. Instead of executing local code, the scheduler calls a remote API endpoint. This job type is useful in microservices architectures where scheduled tasks trigger actions in other services.

2. Project Setup

We first need to add the required dependency to our project. The elasticjob-bootstrap Maven module simplifies configuration by providing bootstrap classes that initialize the scheduler and connect to the registry center.

<dependency>
    <groupId>org.apache.shardingsphere.elasticjob</groupId>
    <artifactId>elasticjob-bootstrap</artifactId>
    <version>3.0.5</version>
    <scope>compile</scope>
</dependency>

3. Creating the Job Implementation

ElasticJob jobs are implemented by creating a class that implements the SimpleJob interface. This interface defines a single execute method that runs when the job is triggered.

public class DataProcessingJob implements SimpleJob {

    @Override
    public void execute(ShardingContext shardingContext) {

        System.out.println("Job executed on shard: " + shardingContext.getShardingItem());

        System.out.println("Total shards: " + shardingContext.getShardingTotalCount());

        System.out.println("Job parameter: " + shardingContext.getJobParameter());

        // Simulate processing
        System.out.println("Processing data on node...");
    }
}

This class defines the actual task executed by the scheduler. The ShardingContext provides information about the shard being processed, including the shard index and total number of shards. When the job runs across multiple nodes, each node processes a different shard, enabling parallel processing.
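One common way to put the shard index to work is modulo partitioning, where each shard picks only the records whose id maps to its index. This is an assumption about how a job might divide its data, not something ElasticJob prescribes. The sketch below is plain Java; the class, the idsForShard helper, and the sample ids are hypothetical, and the two int arguments stand in for the values that getShardingItem() and getShardingTotalCount() would supply inside a real job.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: shows how a shard index and a total shard count
// could be used to select a disjoint slice of record ids.
final class ShardFilterSketch {

    // Keeps the ids whose remainder, modulo the total shard count,
    // equals this shard's index.
    static List<Long> idsForShard(List<Long> allIds, int shardingItem, int shardingTotalCount) {
        return allIds.stream()
                .filter(id -> Math.floorMod(id, (long) shardingTotalCount) == shardingItem)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> allIds = List.of(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L);
        int shardingTotalCount = 3;
        for (int shardingItem = 0; shardingItem < shardingTotalCount; shardingItem++) {
            System.out.println("Shard " + shardingItem + " -> "
                    + idsForShard(allIds, shardingItem, shardingTotalCount));
        }
    }
}
```

Because every id falls into exactly one remainder class, the three shards cover all records without overlap, regardless of which node ends up running each shard.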

4. Configuring the Registry Center

ElasticJob requires a registry center to coordinate job execution across nodes. A commonly used registry center is Apache ZooKeeper.

public class RegistryCenterConfig {

    public static ZookeeperRegistryCenter createRegistryCenter() {

        ZookeeperConfiguration configuration = new ZookeeperConfiguration("localhost:2181", "elasticjob-demo");
        ZookeeperRegistryCenter registryCenter = new ZookeeperRegistryCenter(configuration);
        registryCenter.init();

        return registryCenter;
    }
}

This configuration connects the application to a ZooKeeper instance running on localhost:2181. The namespace elasticjob-demo is used to isolate job metadata in ZooKeeper. When multiple application instances start, they register themselves in this registry center and coordinate job execution through it.

5. Configuring the Job Scheduler

Once the registry center is configured, the next step is to define the job configuration and scheduler. This includes specifying the cron expression, sharding configuration, and job parameters.

public class ElasticJobConfig {

    public static void startJob() {

        ZookeeperRegistryCenter registryCenter = RegistryCenterConfig.createRegistryCenter();

        JobConfiguration jobConfiguration = JobConfiguration.newBuilder("dataProcessingJob", 3)
                .cron("0/10 * * * * ?")
                .shardingItemParameters("0=A,1=B,2=C")
                .jobParameter("ElasticJob Demo")
                .build();

        ScheduleJobBootstrap bootstrap = new ScheduleJobBootstrap(registryCenter, new DataProcessingJob(), jobConfiguration);

        bootstrap.schedule();
    }
}

This configuration defines the scheduling behaviour of the job. The JobConfiguration builder takes the job name and the number of shards, and then sets the cron schedule, the per-shard parameters, and a job-wide parameter. In this example, the job runs every 10 seconds and is divided into three shards, which are assigned the values A, B, and C through shardingItemParameters. Each shard can be executed by a different node in the cluster.
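The two string settings in this configuration are compact, so it may help to see them expanded. The sketch below is plain Java and does not use ElasticJob's or Quartz's own parsers. The class and both helper methods are hypothetical, and the cron helper only understands the start/step form used in this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: expands the two string settings used in the example
// configuration. Neither helper is part of ElasticJob.
final class JobConfigValuesSketch {

    // Expands a "start/step" seconds field (for example "0/10") into the
    // seconds of each minute at which the job would fire.
    static List<Integer> expandSeconds(String secondsField) {
        String[] parts = secondsField.split("/");
        int start = Integer.parseInt(parts[0]);
        int step = Integer.parseInt(parts[1]);
        if (step <= 0) {
            throw new IllegalArgumentException("Step must be positive: " + secondsField);
        }
        List<Integer> seconds = new ArrayList<>();
        for (int s = start; s < 60; s += step) {
            seconds.add(s);
        }
        return seconds;
    }

    // Parses "0=A,1=B,2=C" into a map from shard index to shard value.
    static Map<Integer, String> parseItemParameters(String shardingItemParameters) {
        Map<Integer, String> result = new LinkedHashMap<>();
        if (shardingItemParameters == null || shardingItemParameters.isBlank()) {
            return result;
        }
        for (String pair : shardingItemParameters.split(",")) {
            String[] kv = pair.split("=", 2);
            if (kv.length != 2) {
                throw new IllegalArgumentException("Expected index=value but got: " + pair);
            }
            result.put(Integer.parseInt(kv[0].trim()), kv[1].trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String cron = "0/10 * * * * ?";
        String secondsField = cron.split("\\s+")[0];
        System.out.println("Fires at seconds: " + expandSeconds(secondsField));
        System.out.println("Shard values: " + parseItemParameters("0=A,1=B,2=C"));
    }
}
```

Inside a job, the value assigned to the current shard is available through shardingContext.getShardingParameter(), so the shard with index 1 in this example would see B.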

Below is the main class that starts the scheduler and initializes the job configuration.

public class ElasticjobDistributedScheduler {

    public static void main(String[] args) {

        ElasticJobConfig.startJob();
        System.out.println("ElasticJob scheduler started...");
    }
}

This class starts the application and triggers the job configuration initialization. Once the application runs, it connects to ZooKeeper, registers the node, and schedules the job according to the defined cron expression.

5.1 Running the Distributed Job

To simulate a distributed environment, we need to run ZooKeeper and start multiple instances of the application. The easiest way to run ZooKeeper is using Docker.

docker run -d --name zookeeper -p 127.0.0.1:2181:2181 zookeeper

This command pulls the ZooKeeper image if it is not already present and runs it in a detached Docker container, publishing port 2181 on 127.0.0.1 so that clients on the local machine can connect.

To simulate multiple nodes in the cluster, open multiple terminal windows and run the Java application in each. Each instance will connect to the same ZooKeeper registry center and coordinate job execution.

java -jar target/elasticjob-distributed-scheduler-1.0.jar

Run the above command in at least three separate terminals so that each of the three shards can be picked up by its own instance.

Example Output

When multiple instances are running, ElasticJob distributes shards across nodes. Example output from different instances might look like this:

Instance 1:

ElasticJob scheduler started...
Job executed on shard: 0
Total shards: 3
Job parameter: ElasticJob Demo
Processing data on node...

Instance 2:

ElasticJob scheduler started...
Job executed on shard: 1
Total shards: 3
Job parameter: ElasticJob Demo
Processing data on node...

Instance 3:

ElasticJob scheduler started...
Job executed on shard: 2
Total shards: 3
Job parameter: ElasticJob Demo
Processing data on node...

Each node handles a different shard, demonstrating distributed job execution. If one instance stops, ElasticJob detects the change through the registry center and reassigns that instance's shard to one of the remaining instances on a later run.

6. Conclusion

In this article, we explored how to implement distributed job scheduling using ElasticJob. We built a complete example that implements a job, configures the registry center, defines the scheduling configuration, and runs the application using the elasticjob-bootstrap Maven module.

By combining job sharding, distributed coordination through Apache ZooKeeper, and flexible scheduling capabilities, ElasticJob provides a powerful solution for executing background tasks across multiple nodes in a distributed system.

7. Download the Source Code

This article explored job scheduling in Java using ElasticJob.

Download
You can download the full source code of this example here: java elasticjob scheduling

Omozegie Aziegbe

Omos Aziegbe is a technical writer and web/application developer with a BSc in Computer Science and Software Engineering from the University of Bedfordshire. Specializing in Java enterprise applications with the Jakarta EE framework, Omos also works with HTML5, CSS, and JavaScript for web development. As a freelance web developer, Omos combines technical expertise with research and writing on topics such as software engineering, programming, web application development, computer science, and technology.