Problem
MongoStepExecutionDao#getLastStepExecution currently performs filtering and
sorting in Java after loading all step executions for the job instance into
memory. This scales poorly as the number of step executions per job instance
grows.
Current behavior
spring-batch-core/src/main/java/org/springframework/batch/core/repository/dao/mongodb/MongoStepExecutionDao.java
(lines 113–147):
    @Nullable
    @Override
    public StepExecution getLastStepExecution(JobInstance jobInstance, String stepName) {
        // TODO optimize the query
        // get all step executions
        Query query = query(where("jobInstanceId").is(jobInstance.getId()));
        List<...JobExecution> jobExecutions = this.mongoOperations.find(query, ...);
        List<...StepExecution> stepExecutions = this.mongoOperations.find(
                query(where("jobExecutionId").in(jobExecutions.stream()
                    .map(...::getJobExecutionId).toList())),
                ...StepExecution.class, STEP_EXECUTIONS_COLLECTION_NAME);
        // sort step executions by creation date then id (see contract) and return the last one
        Optional<...StepExecution> lastStepExecution = stepExecutions.stream()
            .filter(stepExecution -> stepExecution.getName().equals(stepName))
            .max(Comparator
                .comparing(...::getCreateTime)
                .thenComparing(...::getStepExecutionId));
        ...
    }
Specifically:
- The server returns all step executions belonging to the job instance
  (no name filter applied on the server).
- All returned documents are held in the JVM heap.
- The full list is filtered by stepName and scanned for the maximum by
  (createTime, stepExecutionId) in Java.
Only a single step execution is ultimately returned, yet the cost (documents
transferred, heap used, comparisons made) grows linearly with the history of
the job instance.
Proposed fix
Push the stepName filter, the ordering (createTime DESC, then
stepExecutionId DESC), and the limit(1) down to MongoDB. Two reasonable
implementation options:
- A single aggregation pipeline with $lookup against the job execution
  collection, $match, $sort, $limit.
- Two queries: project only jobExecutionId from the job execution collection
  for the given jobInstanceId, then query the step execution collection with
  { jobExecutionId: { $in: [...] }, name: stepName } combined with
  .with(Sort.by(...)).limit(1).
Either approach preserves the contract in StepExecutionDao#getLastStepExecution:
Retrieve the last StepExecution for a given JobInstance ordered by
creation time and then id.
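The equivalence claimed here can be checked without a database. Below is a minimal in-memory sketch of the two-query option, assuming every step execution has a non-null createTime. It does not call MongoDB or Spring Data; the class, records, and method names are hypothetical stand-ins, not the actual persistence model. `current` mirrors the existing filter-then-max logic, and `proposed` mirrors "filter by $in and name, sort descending, limit(1)".

```java
import java.time.LocalDateTime;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

/** In-memory model of both selection strategies; all names are illustrative. */
final class LastStepExecutionSketch {

    /** Hypothetical stand-ins for the persisted documents. */
    record JobExec(long jobExecutionId, long jobInstanceId) {}
    record StepExec(long stepExecutionId, long jobExecutionId, String name, LocalDateTime createTime) {}

    /** Contract ordering: creation time first, then id. */
    static final Comparator<StepExec> CONTRACT_ORDER =
            Comparator.comparing(StepExec::createTime).thenComparingLong(StepExec::stepExecutionId);

    /** First query in both variants: job execution ids of one job instance. */
    static Set<Long> jobExecutionIds(List<JobExec> jobExecs, long jobInstanceId) {
        return jobExecs.stream()
                .filter(j -> j.jobInstanceId() == jobInstanceId)
                .map(JobExec::jobExecutionId)
                .collect(Collectors.toSet());
    }

    /** Mirrors the current DAO: load by job execution id, then filter and take the max in Java. */
    static Optional<StepExec> current(List<JobExec> jobExecs, List<StepExec> steps,
            long jobInstanceId, String stepName) {
        Set<Long> ids = jobExecutionIds(jobExecs, jobInstanceId);
        List<StepExec> loaded = steps.stream().filter(s -> ids.contains(s.jobExecutionId())).toList();
        return loaded.stream().filter(s -> s.name().equals(stepName)).max(CONTRACT_ORDER);
    }

    /** Mirrors the proposed query: name filter, descending sort and limit(1) in one step. */
    static Optional<StepExec> proposed(List<JobExec> jobExecs, List<StepExec> steps,
            long jobInstanceId, String stepName) {
        Set<Long> ids = jobExecutionIds(jobExecs, jobInstanceId);
        return steps.stream()
                .filter(s -> ids.contains(s.jobExecutionId()) && s.name().equals(stepName))
                .sorted(CONTRACT_ORDER.reversed())
                .limit(1)
                .findFirst();
    }

    public static void main(String[] args) {
        LocalDateTime t = LocalDateTime.of(2024, 1, 1, 0, 0);
        List<JobExec> jobExecs = List.of(new JobExec(10, 1), new JobExec(11, 1), new JobExec(20, 2));
        List<StepExec> steps = List.of(
                new StepExec(1, 10, "load", t),
                new StepExec(2, 11, "load", t.plusMinutes(5)),
                new StepExec(3, 11, "load", t.plusMinutes(5)),    // same createTime, higher id wins
                new StepExec(4, 11, "export", t.plusMinutes(9)),  // other step name
                new StepExec(5, 20, "load", t.plusMinutes(30)));  // other job instance
        System.out.println(current(jobExecs, steps, 1, "load").equals(proposed(jobExecs, steps, 1, "load")));
    }
}
```

A test in the PR could follow the same shape against a real MongoDB instance: seed step executions with tied creation times, other step names, and other job instances, then assert that the new query returns the same step execution as the old in-memory selection.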
Prior art
This change aligns with two closely related efforts that have already been
applied to the JDBC and MongoDB DAOs:
- #4798 (JdbcStepExecutionDao::getLastStepExecution, merged in 6.0.0-RC2) did
  the analogous optimization for JdbcStepExecutionDao#getLastStepExecution,
  moving the ordering back to the database and using setMaxRows(1).
- A related change applied the same "filter/aggregate at the database layer,
  not in Java" principle to MongoStepExecutionDao#countStepExecutions.
The MongoDB version of getLastStepExecution is the remaining gap in the
pattern, and the existing source comment (// TODO optimize the query)
explicitly flags it.
I would like to submit a PR for this, including tests verifying semantic
equivalence with the current implementation.