Flink checkpoint metadata: a source code analysis

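This article takes a close look at Flink's OperatorState and keyed state, walking through state creation, the snapshot call path, and the implementation of incremental checkpoints. It focuses on how checkpoint metadata is written, including how the JobManager handles the metadata submitted by each subtask. It also compares savepoints and checkpoints under the RocksDB and in-memory modes, and describes how incremental checkpoints are cleaned up in practice.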

OperatorState

OperatorState currently has only one implementation: DefaultOperatorStateBackend. Regardless of which state backend is configured, OperatorState is always kept in memory.

Code of org.apache.flink.runtime.state.DefaultOperatorStateBackend.getListState:

	private <S> ListState<S> getListState(
			ListStateDescriptor<S> stateDescriptor,
			OperatorStateHandle.Mode mode) throws StateMigrationException {

		Preconditions.checkNotNull(stateDescriptor);
		String name = Preconditions.checkNotNull(stateDescriptor.getName());

		@SuppressWarnings("unchecked")
		// Check whether the state has already been accessed; name is the name defined by the stateDescriptor.
		PartitionableListState<S> previous = (PartitionableListState<S>) accessedStatesByName.get(name);
		if (previous != null) {
			checkStateNameAndMode(
					previous.getStateMetaInfo().getName(),
					name,
					previous.getStateMetaInfo().getAssignmentMode(),
					mode);
			return previous;
		}

		// end up here if its the first time access after execution for the
		// provided state name; check compatibility of restored state, if any
		// TODO with eager registration in place, these checks should be moved to restore()

		stateDescriptor.initializeSerializerUnlessSet(getExecutionConfig());
		TypeSerializer<S> partitionStateSerializer = Preconditions.checkNotNull(stateDescriptor.getElementSerializer());

		@SuppressWarnings("unchecked")
		// registeredOperatorStates is a map that holds all OperatorState instances
		PartitionableListState<S> partitionableListState = (PartitionableListState<S>) registeredOperatorStates.get(name);

		if (null == partitionableListState) {
			// no restored state for the state name; simply create new state holder
			// create a new OperatorState
			partitionableListState = new PartitionableListState<>(
				new RegisteredOperatorStateBackendMetaInfo<>(
					name,
					partitionStateSerializer,
					mode));

			registeredOperatorStates.put(name, partitionableListState);
		} else {
			// has restored state; check compatibility of new state access

			checkStateNameAndMode(
					partitionableListState.getStateMetaInfo().getName(),
					name,
					partitionableListState.getStateMetaInfo().getAssignmentMode(),
					mode);

			RegisteredOperatorStateBackendMetaInfo<S> restoredPartitionableListStateMetaInfo =
				partitionableListState.getStateMetaInfo();

			// check compatibility to determine if new serializers are incompatible
			TypeSerializer<S> newPartitionStateSerializer = partitionStateSerializer.duplicate();

			TypeSerializerSchemaCompatibility<S> stateCompatibility =
				restoredPartitionableListStateMetaInfo.updatePartitionStateSerializer(newPartitionStateSerializer);
			if (stateCompatibility.isIncompatible()) {
				throw new StateMigrationException("The new state typeSerializer for operator state must not be incompatible.");
			}

			partitionableListState.setStateMetaInfo(restoredPartitionableListStateMetaInfo);
		}

		// Record the state as accessed and return it.
		accessedStatesByName.put(name, partitionableListState);
		return partitionableListState;
	}
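
The lookup order in getListState (first the states already accessed, then the registered or restored states, and only then a new holder) can be sketched in plain Java. This is a simplified model rather than Flink code: OperatorStateRegistrySketch, ListStateHolder, and the trimmed-down Mode enum are hypothetical stand-ins, and the serializer compatibility check is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified, hypothetical model of the lookup order in getListState. Not Flink code. */
public class OperatorStateRegistrySketch {

    /** Stand-in for OperatorStateHandle.Mode (assumption: only two modes modeled). */
    public enum Mode { SPLIT_DISTRIBUTE, UNION }

    /** Stand-in for PartitionableListState: a named in-memory list plus its mode. */
    public static final class ListStateHolder<S> {
        public final String name;
        public final Mode mode;
        public final List<S> values = new ArrayList<>();

        ListStateHolder(String name, Mode mode) {
            this.name = name;
            this.mode = mode;
        }
    }

    // Every state known to the backend, including states restored from a checkpoint.
    private final Map<String, ListStateHolder<?>> registeredStates = new HashMap<>();
    // States that have already been handed out during this execution.
    private final Map<String, ListStateHolder<?>> accessedStatesByName = new HashMap<>();

    @SuppressWarnings("unchecked")
    public <S> ListStateHolder<S> getListState(String name, Mode mode) {
        // 1. Fast path: the state was already accessed under this name.
        ListStateHolder<S> previous = (ListStateHolder<S>) accessedStatesByName.get(name);
        if (previous != null) {
            checkMode(previous, mode);
            return previous;
        }

        // 2. First access: reuse a registered (possibly restored) state, or create one.
        ListStateHolder<S> state = (ListStateHolder<S>) registeredStates.get(name);
        if (state == null) {
            state = new ListStateHolder<>(name, mode);
            registeredStates.put(name, state);
        } else {
            checkMode(state, mode);
        }

        // 3. Remember that the state has been accessed, then return it.
        accessedStatesByName.put(name, state);
        return state;
    }

    private static void checkMode(ListStateHolder<?> existing, Mode requested) {
        if (existing.mode != requested) {
            throw new IllegalStateException(
                "State '" + existing.name + "' was registered with mode " + existing.mode
                    + " but requested with mode " + requested);
        }
    }

    public static void main(String[] args) {
        OperatorStateRegistrySketch backend = new OperatorStateRegistrySketch();
        ListStateHolder<Long> offsets = backend.getListState("offsets", Mode.SPLIT_DISTRIBUTE);
        offsets.values.add(42L);

        // A second request under the same name returns the very same holder.
        ListStateHolder<Long> again = backend.getListState("offsets", Mode.SPLIT_DISTRIBUTE);
        System.out.println(again == offsets);
        System.out.println(again.values);
    }
}
```

Requesting the same name with a different mode fails in this sketch, which mirrors the role checkStateNameAndMode plays in the listing above.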

The core checkpoint code is in org.apache.flink.runtime.state.DefaultOperatorStateBackendSnapshotStrategy.snapshot:

	@Nonnull
	@Override
	public RunnableFuture<SnapshotResult<OperatorStateHandle>> snapshot(
		final long checkpointId,
		final long timestamp,
		@Nonnull final CheckpointStreamFactory streamFactory,
		@Nonnull final CheckpointOptions checkpointOptions) throws IOException {

		if (registeredOperatorStates.isEmpty() && registeredBroadcastStates.isEmpty()) {
			return DoneFuture.of(SnapshotResult.empty());
		}

		final Map<String, PartitionableListState<?>> registeredOperatorStatesDeepCopies =
			new HashMap<>(registeredOperatorStates.size());
		final Map<String, BackendWritableBroadcastState<?, ?>> registeredBroadcastStatesDeepCopies =
			new HashMap<>(registeredBroadcastStates.size());

		ClassLoader snapshotClassLoader = Thread.currentThread().getContextClassLoader();
		Thread.currentThread().setContextClassLoader(userClassLoader);
		try {
			// eagerly create deep copies of the list and the broadcast states (if any)
			// in the synchronous phase, so that we can use them in the async writing.

			// Deep-copy the relevant states into the local maps declared above.

			// Copy the OperatorState here
			if (!registeredOperatorStates.isEmpty()) {
				for (Map.Entry<String, PartitionableListState<?>> entry : registeredOperatorStates.entrySet()) {
					PartitionableListState<?> listState = entry.getValue();
					if (null != listState) {
						listState = listState.deepCopy();
					}
					registeredOperatorStatesDeepCopies.put(entry.getKey(), listState);
				}
			}

			// Copy the broadcast state here
			if (!registeredBroadcastStates.isEmpty()) {
				for (Map.Entry<String, BackendWritableBroadcastState<?, ?>> entry : registeredBroadcastStates.entrySet()) {
					BackendWritableBroadcastState<?, ?> broadcastState = entry.getValue();
					if (null != broadcastState) {
						broadcastState = broadcastState.deepCopy();
					}
					registeredBroadcastStatesDeepCopies.put(entry.getKey(), broadcastState);
				}
			}
		} finally {
			Thread.currentThread().setContextClassLoader(snapshotClassLoader);
		}

		AsyncSnapshotCallable<SnapshotResult<OperatorStateHandle>> snapshotCallable =
			new AsyncSnapshotCallable<SnapshotResult<OperatorStateHandle>>() {

				@Override
				protected SnapshotResult<OperatorStateHandle> callInternal() throws Exception {
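					// Create the output stream that the checkpoint is written to.
					// The checkpoint directory is checkpointDir/chk-$id.
					// The file name is generated from a UUID.
					// Operator state is exclusive: it belongs to exactly one checkpoint/savepoint.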
					CheckpointStreamFactory.CheckpointStateOutputStream localOut =
						streamFactory.createCheckp
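
The two phases visible in the snapshot method above, a synchronous deep copy followed by an asynchronous write that only touches the copies, can be illustrated with a small self-contained sketch. It is not Flink code: TwoPhaseSnapshotSketch and its methods are hypothetical, state is modeled as plain lists of strings, and the "write" simply returns the copied data instead of streaming it to a file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.FutureTask;
import java.util.concurrent.RunnableFuture;

/** Hypothetical sketch of a copy-then-write snapshot. Not Flink code. */
public class TwoPhaseSnapshotSketch {

    // Live state, mutated by the processing thread.
    private final Map<String, List<String>> registeredStates = new HashMap<>();

    public void add(String stateName, String value) {
        registeredStates.computeIfAbsent(stateName, k -> new ArrayList<>()).add(value);
    }

    /**
     * Synchronous phase: deep-copy every registered state.
     * The returned future is the asynchronous phase; it runs later and
     * reads only the copies, never the live state.
     */
    public RunnableFuture<Map<String, List<String>>> snapshot() {
        final Map<String, List<String>> copies = new HashMap<>(registeredStates.size());
        for (Map.Entry<String, List<String>> entry : registeredStates.entrySet()) {
            copies.put(entry.getKey(), new ArrayList<>(entry.getValue()));
        }

        // Assumption: a real backend would serialize the copies to a checkpoint
        // stream here; this sketch just hands them back.
        Callable<Map<String, List<String>>> writer = () -> copies;
        return new FutureTask<>(writer);
    }

    public static void main(String[] args) throws Exception {
        TwoPhaseSnapshotSketch backend = new TwoPhaseSnapshotSketch();
        backend.add("offsets", "1");
        backend.add("offsets", "2");

        RunnableFuture<Map<String, List<String>>> future = backend.snapshot();

        // Processing continues and mutates the live state before the write runs.
        backend.add("offsets", "3");

        future.run(); // typically executed on a separate thread
        System.out.println(future.get()); // the snapshot does not contain "3"
    }
}
```

Because the copy is taken before the future is returned, later updates to the live state cannot leak into a snapshot that is still being written.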