OperatorState
OperatorState目前只有一种实现:DefaultOperatorStateBackend。无论哪种stateBackend,
OperatorState都是保存在内存中的。
org.apache.flink.runtime.state.DefaultOperatorStateBackend.getListState 代码:
private <S> ListState<S> getListState(
ListStateDescriptor<S> stateDescriptor,
OperatorStateHandle.Mode mode) throws StateMigrationException {
Preconditions.checkNotNull(stateDescriptor);
String name = Preconditions.checkNotNull(stateDescriptor.getName());
@SuppressWarnings("unchecked")
// 这里检查state是否存在。name就是stateDescriptor定义的name。
PartitionableListState<S> previous = (PartitionableListState<S>) accessedStatesByName.get(name);
if (previous != null) {
checkStateNameAndMode(
previous.getStateMetaInfo().getName(),
name,
previous.getStateMetaInfo().getAssignmentMode(),
mode);
return previous;
}
// end up here if its the first time access after execution for the
// provided state name; check compatibility of restored state, if any
// TODO with eager registration in place, these checks should be moved to restore()
stateDescriptor.initializeSerializerUnlessSet(getExecutionConfig());
TypeSerializer<S> partitionStateSerializer = Preconditions.checkNotNull(stateDescriptor.getElementSerializer());
@SuppressWarnings("unchecked")
// registeredOperatorStates 是一个map,保存了所有的OperatorState
PartitionableListState<S> partitionableListState = (PartitionableListState<S>) registeredOperatorStates.get(name);
if (null == partitionableListState) {
// no restored state for the state name; simply create new state holder
// 创建一个新的OperatorState
partitionableListState = new PartitionableListState<>(
new RegisteredOperatorStateBackendMetaInfo<>(
name,
partitionStateSerializer,
mode));
registeredOperatorStates.put(name, partitionableListState);
} else {
// has restored state; check compatibility of new state access
checkStateNameAndMode(
partitionableListState.getStateMetaInfo().getName(),
name,
partitionableListState.getStateMetaInfo().getAssignmentMode(),
mode);
RegisteredOperatorStateBackendMetaInfo<S> restoredPartitionableListStateMetaInfo =
partitionableListState.getStateMetaInfo();
// check compatibility to determine if new serializers are incompatible
TypeSerializer<S> newPartitionStateSerializer = partitionStateSerializer.duplicate();
TypeSerializerSchemaCompatibility<S> stateCompatibility =
restoredPartitionableListStateMetaInfo.updatePartitionStateSerializer(newPartitionStateSerializer);
if (stateCompatibility.isIncompatible()) {
throw new StateMigrationException("The new state typeSerializer for operator state must not be incompatible.");
}
partitionableListState.setStateMetaInfo(restoredPartitionableListStateMetaInfo);
}
// 返回这个状态。
accessedStatesByName.put(name, partitionableListState);
return partitionableListState;
}
checkpoint核心代码,在org.apache.flink.runtime.state.DefaultOperatorStateBackendSnapshotStrategy.snapshot:
@Nonnull
@Override
public RunnableFuture<SnapshotResult<OperatorStateHandle>> snapshot(
final long checkpointId,
final long timestamp,
@Nonnull final CheckpointStreamFactory streamFactory,
@Nonnull final CheckpointOptions checkpointOptions) throws IOException {
if (registeredOperatorStates.isEmpty() && registeredBroadcastStates.isEmpty()) {
return DoneFuture.of(SnapshotResult.empty());
}
final Map<String, PartitionableListState<?>> registeredOperatorStatesDeepCopies =
new HashMap<>(registeredOperatorStates.size());
final Map<String, BackendWritableBroadcastState<?, ?>> registeredBroadcastStatesDeepCopies =
new HashMap<>(registeredBroadcastStates.size());
ClassLoader snapshotClassLoader = Thread.currentThread().getContextClassLoader();
Thread.currentThread().setContextClassLoader(userClassLoader);
try {
// eagerly create deep copies of the list and the broadcast states (if any)
// in the synchronous phase, so that we can use them in the async writing.
// 将相关的状态深度拷贝到变量中。
// 这里拷贝OperatorState
if (!registeredOperatorStates.isEmpty()) {
for (Map.Entry<String, PartitionableListState<?>> entry : registeredOperatorStates.entrySet()) {
PartitionableListState<?> listState = entry.getValue();
if (null != listState) {
listState = listState.deepCopy();
}
registeredOperatorStatesDeepCopies.put(entry.getKey(), listState);
}
}
// 这里拷贝广播状态
if (!registeredBroadcastStates.isEmpty()) {
for (Map.Entry<String, BackendWritableBroadcastState<?, ?>> entry : registeredBroadcastStates.entrySet()) {
BackendWritableBroadcastState<?, ?> broadcastState = entry.getValue();
if (null != broadcastState) {
broadcastState = broadcastState.deepCopy();
}
registeredBroadcastStatesDeepCopies.put(entry.getKey(), broadcastState);
}
}
} finally {
Thread.currentThread().setContextClassLoader(snapshotClassLoader);
}
AsyncSnapshotCallable<SnapshotResult<OperatorStateHandle>> snapshotCallable =
new AsyncSnapshotCallable<SnapshotResult<OperatorStateHandle>>() {
@Override
protected SnapshotResult<OperatorStateHandle> callInternal() throws Exception {
// 这里创建了checkopint写入的文件流。
// checkpoin目录是: checkpointDir/chk-$id
// 文件名称是uuid生成的。
// operater state是排他性的,独属于某个checkpoint/savepoint
CheckpointStreamFactory.CheckpointStateOutputStream localOut =
streamFactory.createCheckp

本文深入探讨了Flink的OperatorState和keyed state,详细解析了状态创建、快照调用、增量checkpoint的实现。重点分析了checkpoint元数据的写入流程,包括job manager如何处理subtask提交的元数据。同时,文章对比了savepoint和checkpoint在RocksDB及内存模式下的区别,并介绍了增量checkpoint在应用中的清理策略。

526

被折叠的 条评论
为什么被折叠?



