How to Install Apache Spark on Ubuntu 26.04

Apache Spark is a powerful unified analytics engine designed for large-scale data processing. Whether you are working with batch data, real-time streams, or machine learning pipelines, Spark provides a fast and flexible framework. In this tutorial, you will learn how to install Apache Spark on Ubuntu 26.04, configure it for local development, and verify the installation with practical examples in both Scala and Python. Since Spark requires a compatible Java version, this guide also covers the necessary Java setup before the Spark installation itself.

In this tutorial you will learn:

  • How to install the correct Java version required by Apache Spark on Ubuntu 26.04
  • How to download and install Apache Spark 4.1.1
  • How to configure SPARK_HOME and related environment variables
  • How to verify the installation using the Spark Shell (Scala) and PySpark (Python)
  • How to run a sample Spark application using spark-submit

Figure: Installing and configuring Apache Spark on Ubuntu 26.04

Software Requirements

Software Requirements and Linux Command Line Conventions
Category    | Requirements, Conventions or Software Version Used
System      | Ubuntu 26.04 Resolute Raccoon
Software    | Apache Spark 4.1.1, OpenJDK 17 or 21
Other       | Privileged access to your Linux system as root or via the sudo command.
Conventions | # – requires given Linux commands to be executed with root privileges, either directly as the root user or by use of the sudo command
            | $ – requires given Linux commands to be executed as a regular non-privileged user

TL;DR
Install Java 17 or 21, download the Spark binary, extract it, and set environment variables.

Quick Steps to Install Apache Spark on Ubuntu 26.04
Step                                | Command/Action
1. Install Java                     | $ sudo apt install openjdk-21-jdk
2. Download and extract Spark       | $ wget https://dlcdn.apache.org/spark/spark-4.1.1/spark-4.1.1-bin-hadoop3.tgz && tar xzf spark-4.1.1-bin-hadoop3.tgz
3. Move to /opt and set environment | $ sudo mv spark-4.1.1-bin-hadoop3 /opt/spark, then set SPARK_HOME=/opt/spark and add $SPARK_HOME/bin to PATH in ~/.bashrc
4. Verify installation              | $ spark-shell

Prerequisites: Install Java for Apache Spark on Ubuntu 26.04

Apache Spark 4.1.1 requires Java 17 or 21 to run. Ubuntu 26.04 ships with OpenJDK 25 as the default Java version, which is not officially supported by Spark at this time. Therefore, you must install a specific Java version on Ubuntu 26.04 before proceeding.

  1. Install OpenJDK 21: This is the recommended version for Spark 4.x.
    $ sudo apt update
    $ sudo apt install openjdk-21-jdk
  2. Set OpenJDK 21 as the active version: If you have multiple Java versions installed, switch to OpenJDK 21 using update-alternatives.
    $ sudo update-alternatives --config java
    $ sudo update-alternatives --config javac

    Select OpenJDK 21 from the list for both commands.

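  3. Set JAVA_HOME: If you have not already configured JAVA_HOME, set it now. The path below applies to amd64 systems; on other architectures the directory suffix differs (for example arm64), so check the contents of /usr/lib/jvm first. For a detailed walkthrough, see our guide on how to install Java on Ubuntu 26.04.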
    $ echo 'export JAVA_HOME=/usr/lib/jvm/java-21-openjdk-amd64' >> ~/.bashrc
    $ source ~/.bashrc
  4. Verify Java:
    $ java -version

IMPORTANT
Apache Spark 4.x officially supports Java 17 and 21. While OpenJDK 25 (the Ubuntu 26.04 default) may work in some cases, it is not officially supported and may produce unexpected errors. Use OpenJDK 21 for the best compatibility.

Figure: Setting up OpenJDK 21 and JAVA_HOME as a prerequisite for Apache Spark installation. The terminal shows JAVA_HOME set to java-21-openjdk-amd64 and java -version confirming OpenJDK 21.0.11-ea on Ubuntu 26.04.
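
The version requirement above can also be checked from a script. The following is a minimal Python sketch, not part of Spark: the helper names parse_java_major, is_supported_for_spark, and detect_java_major are hypothetical, and the supported set {17, 21} is simply the one stated in this guide.

```python
import re
import subprocess

# Java major versions this guide lists as supported by Spark 4.x (assumption
# taken from the text above, not queried from Spark itself).
SUPPORTED_MAJORS = {17, 21}


def parse_java_major(version_output):
    """Extract the Java major version from `java -version` output.

    Handles modern strings such as 'openjdk version "21.0.11-ea"' as well as
    the legacy '1.8.0_292' scheme, where the major version is the second field.
    Returns None if no version string is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    if first == "1" and second is not None:
        return int(second)  # legacy scheme: 1.8 means Java 8
    return int(first)


def is_supported_for_spark(major):
    """Return True if the major version is in the supported set above."""
    return major in SUPPORTED_MAJORS


def detect_java_major(java_cmd="java"):
    """Run `java -version` and return its major version, or None if unavailable."""
    try:
        # `java -version` writes its report to stderr, not stdout.
        result = subprocess.run([java_cmd, "-version"], capture_output=True, text=True)
    except OSError:
        return None  # java is not installed or not on PATH
    return parse_java_major(result.stderr)


if __name__ == "__main__":
    major = detect_java_major()
    print(f"Detected Java major version: {major}")
    print(f"Supported by Spark 4.x per this guide: {is_supported_for_spark(major)}")
```

If the script reports an unsupported version, rerun the update-alternatives step and check JAVA_HOME again.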

Download and Install Apache Spark on Ubuntu 26.04

Apache Spark is distributed as a pre-built binary archive. The installation process consists of downloading the archive, extracting it, and placing it in a suitable location on your system.

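  1. Download the Spark binary: Fetch Apache Spark 4.1.1, pre-built with Hadoop 3 support. If this release has since been superseded, the download link may no longer resolve; older releases are kept at archive.apache.org/dist/spark/.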
    $ wget https://dlcdn.apache.org/spark/spark-4.1.1/spark-4.1.1-bin-hadoop3.tgz
  2. Extract the archive:
    $ tar xzf spark-4.1.1-bin-hadoop3.tgz
  3. Move Spark to /opt: Place the Spark installation in the /opt directory, which is the conventional location for optional software on Linux.
    $ sudo mv spark-4.1.1-bin-hadoop3 /opt/spark
  4. Set ownership (optional): If you want your regular user to manage the Spark installation without sudo:
    $ sudo chown -R $USER:$USER /opt/spark

Figure: Downloading Apache Spark 4.1.1, extracting the archive, and moving it to /opt/spark on Ubuntu 26.04.
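
The download step above does not check the archive's integrity. Apache publishes a .sha512 file alongside each release artifact, so one option is to compare digests before trusting the archive. The following is a minimal Python sketch under the assumption that you copy the expected hex digest from that file yourself; the helper names are hypothetical, and no particular layout of the .sha512 file is assumed.

```python
import hashlib


def sha512_of_file(path, chunk_size=1024 * 1024):
    """Compute the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large archives do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's digest with a published one, ignoring case and whitespace."""
    normalized = "".join(expected_hex.split()).lower()
    return sha512_of_file(path) == normalized
```

For example, matches_expected("spark-4.1.1-bin-hadoop3.tgz", digest) returns True only when digest, pasted from the published checksum, matches the local file.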

Configure Environment Variables for Apache Spark

To use Spark commands from any directory, you need to set the SPARK_HOME environment variable and add the Spark binaries to your PATH. Without this step, you would have to call each Spark command by its full path under /opt/spark/bin.

  1. Add Spark environment variables to your shell profile: Append the following lines to your ~/.bashrc file.
    $ cat << 'EOF' >> ~/.bashrc
    export SPARK_HOME=/opt/spark
    export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
    EOF
  2. Apply the changes:
    $ source ~/.bashrc
  3. Verify the environment: Confirm that the SPARK_HOME variable is set and the Spark binaries are accessible.
    $ echo $SPARK_HOME
    $ spark-shell --version

Figure: Setting SPARK_HOME and PATH environment variables and verifying the Spark 4.1.1 installation. The terminal shows spark-shell --version reporting Apache Spark 4.1.1 with Scala 2.13.17 and OpenJDK 21 on Ubuntu 26.04.
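
The same verification can be scripted. The following is a minimal Python sketch, assuming the layout used in this guide (a bin/spark-submit file under SPARK_HOME); check_spark_env is a hypothetical helper, not a Spark API.

```python
import os
from pathlib import Path


def check_spark_env(env):
    """Return a list of problems found in an environment mapping (empty if none)."""
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        return ["SPARK_HOME is not set"]

    problems = []
    bin_dir = Path(spark_home, "bin")
    if not (bin_dir / "spark-submit").is_file():
        problems.append(f"spark-submit not found in {bin_dir}")

    # Normalize PATH entries so that trailing slashes do not cause false alarms.
    path_dirs = {
        os.path.normpath(entry)
        for entry in env.get("PATH", "").split(os.pathsep)
        if entry
    }
    if os.path.normpath(str(bin_dir)) not in path_dirs:
        problems.append(f"{bin_dir} is not on PATH")
    return problems


if __name__ == "__main__":
    for line in check_spark_env(os.environ) or ["No problems found"]:
        print(line)
```

Run it from a new terminal so that it sees the variables exported in ~/.bashrc.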

Verify Apache Spark Installation with Spark Shell (Scala)

The Spark Shell provides an interactive Scala REPL that connects to a local Spark instance. This is the quickest way to verify that your installation is working correctly.

  1. Launch the Spark Shell:
    $ spark-shell

    After a few moments, you will see the Spark logo and the scala> prompt, indicating that Spark is running.

  2. Run a basic test: Create a simple dataset and perform a filter operation to confirm Spark is processing data correctly.
    val data = spark.range(1, 100)
    data.filter($"id" % 2 === 0).count()

    This creates a dataset of numbers 1 through 99 and counts the even numbers. The expected result is 49.

  3. Exit the Spark Shell:
    :quit

Figure: Running a basic data filter test in the Spark Shell to verify the Apache Spark 4.1.1 installation. The terminal shows Spark Shell 4.1.1 with Scala 2.13.17 and the even-number query returning 49 on Ubuntu 26.04.
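
To see where the expected value of 49 comes from, the same arithmetic can be reproduced without Spark. This plain Python sketch relies only on the fact stated above that the range covers 1 through 99, so the upper bound 100 is excluded, just as with Python's built-in range.

```python
# The test dataset covers 1 through 99: the end value 100 is excluded.
ids = range(1, 100)

# Keep only the even ids, mirroring the filter used in the Spark test.
even_ids = [n for n in ids if n % 2 == 0]

print(len(ids))                   # 99 values in total
print(even_ids[0], even_ids[-1])  # smallest and largest even value: 2 98
print(len(even_ids))              # 49, the count Spark should report
```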

Verify Apache Spark Installation with PySpark (Python)

PySpark is the Python API for Apache Spark. It is included in the Spark distribution, so no additional installation is required. Consequently, you can use PySpark as soon as the Spark installation is complete.

  1. Launch PySpark:
    $ pyspark

    You will see the Spark logo and the Python >>> prompt.

  2. Run a basic test: Perform the same filter operation using the Python API.
    df = spark.range(1, 100)
    df.filter(df.id % 2 == 0).count()

    The expected result is 49, confirming that PySpark is functioning correctly.

  3. Exit PySpark:
    exit()

Figure: Running a basic data filter test in PySpark to verify the Apache Spark 4.1.1 installation. The terminal shows PySpark 4.1.1 with Python 3.14.3 and the even-number query returning 49 on Ubuntu 26.04.

Run a Spark Application with spark-submit

In addition to the interactive shells, Spark includes a spark-submit command for running standalone applications. Apache Spark ships with several example programs that you can use to verify your setup further.

  1. Run the Pi estimation example: This bundled example uses a Monte Carlo method to estimate the value of Pi. The trailing argument 10 sets the number of partitions the work is split into.
    $ spark-submit --class org.apache.spark.examples.SparkPi $SPARK_HOME/examples/jars/spark-examples_2.13-4.1.1.jar 10

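    After the computation completes, look for the line beginning with Pi is roughly in the output. The value should be close to 3.14, but because the estimate is randomized it varies slightly between runs (for example 3.1387).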

COMPLETED
If you see the Pi estimation result, your Apache Spark installation on Ubuntu 26.04 is fully operational. You are now ready to develop and run Spark applications locally.

Figure: Running the bundled SparkPi example to verify the Apache Spark 4.1.1 installation. The terminal shows the spark-submit output with the line Pi is roughly 3.1387 highlighted on Ubuntu 26.04.
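
For readers unfamiliar with the Monte Carlo method mentioned above, the idea can be illustrated in a few lines of plain Python. This is a sketch of the general technique only, not the code of the bundled SparkPi example, which distributes the sampling across Spark partitions; estimate_pi is a hypothetical name.

```python
import random


def estimate_pi(samples, seed=None):
    """Estimate Pi by sampling points uniformly in the square [-1, 1] x [-1, 1].

    The fraction of points that fall inside the unit circle approaches Pi / 4,
    so multiplying that fraction by 4 gives an estimate of Pi.
    """
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples


if __name__ == "__main__":
    print(f"Pi is roughly {estimate_pi(200_000)}")
```

Because the points are random, each run prints a slightly different value, which is why the Spark output is described as roughly 3.14 and not an exact figure.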

Conclusion

You have successfully installed Apache Spark on Ubuntu 26.04 and verified the installation using the Spark Shell, PySpark, and spark-submit. With Spark configured in /opt/spark and the environment variables set, you can now develop data processing applications, run machine learning pipelines, or experiment with Spark SQL from your local machine.

Keep in mind that Apache Spark 4.x requires Java 17 or 21. If you need to manage multiple Java versions, our guide on how to install a specific Java version on Ubuntu 26.04 covers the full process. For more detailed information about Spark configuration and usage, consult the official Apache Spark documentation.

Frequently Asked Questions

  1. Does Apache Spark work with OpenJDK 25 on Ubuntu 26.04? Apache Spark 4.x officially supports Java 17 and 21. While OpenJDK 25 may work for basic tasks, it is not officially tested or supported by the Spark project. For production use and to avoid unexpected issues, install OpenJDK 21 using sudo apt install openjdk-21-jdk.
  2. Do I need Hadoop installed to run Apache Spark? No. The pre-built Spark binary (spark-4.1.1-bin-hadoop3.tgz) bundles the necessary Hadoop client libraries. You can run Spark in standalone or local mode without a separate Hadoop installation.
  3. How do I update Apache Spark to a newer version? Download the new version, extract it, and replace the contents of /opt/spark. Since the SPARK_HOME environment variable points to this directory, no further configuration changes are needed.
  4. What is the difference between spark-shell and PySpark? spark-shell is the interactive Scala REPL for Spark, while pyspark is the Python equivalent. Both connect to a local Spark instance and provide full access to the Spark API in their respective languages.
  5. Can I install PySpark separately using pip? Yes. You can run pip install pyspark to install PySpark as a Python package. This package is suitable for local use and for interacting with an existing Spark cluster, but it does not contain the tools required to set up your own standalone cluster. For the full Spark distribution, the manual installation described in this guide is recommended.