Optimize MongoStepExecutionDao#getLastStepExecution to filter and sort at the database level #5385

## Problem

  `MongoStepExecutionDao#getLastStepExecution` currently performs filtering and
  sorting in Java after loading all step executions for the job instance into
  memory. This scales poorly as the number of step executions per job instance
  grows.

  ### Current behavior

  `spring-batch-core/src/main/java/org/springframework/batch/core/repository/dao/mongodb/MongoStepExecutionDao.java`
  (lines 113–147):

  ```java
  @Nullable
  @Override
  public StepExecution getLastStepExecution(JobInstance jobInstance, String stepName) {
      // TODO optimize the query
      // get all step executions
      Query query = query(where("jobInstanceId").is(jobInstance.getId()));
      List<...JobExecution> jobExecutions = this.mongoOperations.find(query, ...);
      List<...StepExecution> stepExecutions = this.mongoOperations.find(
          query(where("jobExecutionId").in(jobExecutions.stream()
              .map(...::getJobExecutionId).toList())),
          ...StepExecution.class, STEP_EXECUTIONS_COLLECTION_NAME);
      // sort step executions by creation date then id (see contract) and return the last one
      Optional<...StepExecution> lastStepExecution = stepExecutions.stream()
          .filter(stepExecution -> stepExecution.getName().equals(stepName))
          .max(Comparator
              .comparing(...::getCreateTime)
              .thenComparing(...::getStepExecutionId));
      ...
  }
  ```

  Specifically:

  1. Two round trips are made: one loading every job execution document of the
     job instance, and one loading **all** of their step executions (no `name`
     filter is applied on the server).
  2. All returned documents are held in the JVM heap.
  3. The full list is filtered by `stepName` and then scanned with `Stream#max`
     using the `(createTime, stepExecutionId)` comparator, in Java.

  Only a single step execution is ultimately returned, yet the transfer,
  memory, and CPU cost grows linearly with the job instance's history.

  ## Proposed fix

  Push the `stepName` filter, the ordering (`createTime` DESC, then
  `stepExecutionId` DESC), and the `limit(1)` down to MongoDB. Two reasonable
  implementation options:

  - A single aggregation pipeline on the step execution collection: `$lookup`
    against the job execution collection, then `$match`, `$sort`, `$limit`.
  - Two queries: project only `jobExecutionId` from the job execution collection
    for the given `jobInstanceId`, then query the step execution collection with
    `{ jobExecutionId: { $in: [...] }, name: stepName }` combined with
    `.with(Sort.by(...)).limit(1)`.
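
  The second option can be pinned down with a small in-memory reference model,
  which could double as a test oracle. This is a sketch under stated
  assumptions: `LastStepQueryModel`, `JobExecutionDoc` and `StepExecutionDoc`
  are hypothetical names that only mirror the document fields mentioned above;
  it is not Spring Data MongoDB code and never touches a database. Each method
  is commented with the MongoDB query it stands in for.

  ```java
  import java.time.LocalDateTime;
  import java.util.Comparator;
  import java.util.List;
  import java.util.Optional;
  import java.util.Set;
  import java.util.stream.Collectors;

  // Hypothetical in-memory model of the two-query option; not Spring Batch code.
  final class LastStepQueryModel {

      // Stand-ins for documents of the job execution and step execution
      // collections; only the fields used by the query are modelled.
      record JobExecutionDoc(long jobExecutionId, long jobInstanceId) {}

      record StepExecutionDoc(long stepExecutionId, long jobExecutionId, String name,
              LocalDateTime createTime) {}

      // Contract order: creation time, then id. "Last" means greatest.
      static final Comparator<StepExecutionDoc> CONTRACT_ORDER = Comparator
              .comparing(StepExecutionDoc::createTime)
              .thenComparingLong(StepExecutionDoc::stepExecutionId);

      // Query 1: find({ jobInstanceId: <id> }), projecting only jobExecutionId.
      static Set<Long> jobExecutionIds(List<JobExecutionDoc> jobExecutions, long jobInstanceId) {
          return jobExecutions.stream()
                  .filter(je -> je.jobInstanceId() == jobInstanceId)
                  .map(JobExecutionDoc::jobExecutionId)
                  .collect(Collectors.toSet());
      }

      // Query 2: find({ jobExecutionId: { $in: ids }, name: stepName }),
      // sorted by { createTime: -1, stepExecutionId: -1 } with limit 1.
      static Optional<StepExecutionDoc> lastStepExecution(List<JobExecutionDoc> jobExecutions,
              List<StepExecutionDoc> stepExecutions, long jobInstanceId, String stepName) {
          Set<Long> ids = jobExecutionIds(jobExecutions, jobInstanceId);
          return stepExecutions.stream()
                  .filter(se -> ids.contains(se.jobExecutionId()) && se.name().equals(stepName))
                  .sorted(CONTRACT_ORDER.reversed()) // both keys descending
                  .findFirst();                      // limit(1)
      }
  }
  ```

  The model returns only the newest execution of the requested step within the
  given job instance, ignoring executions of other instances even when they are
  more recent. In the DAO itself, the filter, sort, and limit would be carried
  out by MongoDB rather than by a Java stream.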

  Either approach preserves the contract in `StepExecutionDao#getLastStepExecution`:

  > Retrieve the last `StepExecution` for a given `JobInstance` ordered by
  > creation time and then id.
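
  The contract fixes the ordering keys, and one detail worth covering in the
  tests is that *both* keys must be descending for `limit(1)` to return the same
  element as the current `Stream#max` call. A minimal sketch, assuming a
  hypothetical `Step` record and `ContractOrderCheck` class (not Spring Batch
  types), with name filtering left out to isolate the ordering:

  ```java
  import java.time.LocalDateTime;
  import java.util.Comparator;
  import java.util.List;
  import java.util.Optional;

  // Hypothetical helper comparing the current selection with sort + limit(1).
  final class ContractOrderCheck {

      record Step(long stepExecutionId, LocalDateTime createTime) {}

      // Contract: creation time, then id. The "last" step execution is the maximum.
      static final Comparator<Step> CONTRACT_ORDER = Comparator
              .comparing(Step::createTime)
              .thenComparingLong(Step::stepExecutionId);

      // Current implementation: a single pass with Stream#max.
      static Optional<Step> viaMax(List<Step> steps) {
          return steps.stream().max(CONTRACT_ORDER);
      }

      // Pushed-down shape: sort { createTime: -1, stepExecutionId: -1 }, limit 1.
      // Reversing the whole comparator flips both keys to descending.
      static Optional<Step> viaSortBothDesc(List<Step> steps) {
          return steps.stream().sorted(CONTRACT_ORDER.reversed()).findFirst();
      }

      // Pitfall: sort { createTime: -1, stepExecutionId: 1 }, limit 1.
      // Only the first key is descending, so a createTime tie yields the smallest id.
      static Optional<Step> viaSortMixed(List<Step> steps) {
          Comparator<Step> mixed = Comparator.comparing(Step::createTime).reversed()
                  .thenComparingLong(Step::stepExecutionId);
          return steps.stream().sorted(mixed).findFirst();
      }
  }
  ```

  With two step executions sharing a `createTime`, `viaMax` and
  `viaSortBothDesc` both pick the higher `stepExecutionId`, while `viaSortMixed`
  picks the lower one.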

  ## Prior art

  This change aligns with two closely related efforts that have already been
  applied to the JDBC and MongoDB DAOs:

  - **PR #4798** (merged in 6.0.0-RC2) did the analogous optimization for
    `JdbcStepExecutionDao#getLastStepExecution` — moving the ordering back to
    the database and using `setMaxRows(1)`.
  - **Issue #5061** (opened by @fmbenhassine) applied the same
    "filter/aggregate at the database layer, not in Java" principle to
    `MongoStepExecutionDao#countStepExecutions`.

  The MongoDB version of `getLastStepExecution` is the remaining gap in the
  pattern, and the existing source comment (`// TODO optimize the query`)
  explicitly flags it.

  I would like to submit a PR for this, including tests verifying semantic
  equivalence with the current implementation.

  ---
