CANN/catlass: Basic Matmul TLA Visitor

Basic Matmul TLA Visitor

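[Free download link] catlass: this project is CANN's operator template library, providing template examples of high-performance matrix multiplication and related fused operators on the NPU. Project address: https://gitcode.com/cann/catlass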

Code path: include/catlass/gemm/kernel/basic_matmul_tla_visitor.hpp

Description

This kernel is the EVG entry point that takes the GM (global memory) workspace path.

The execution workflow is as follows:

  1. The AIC (cube core) completes the MMAD computation and writes the intermediate results to the GM workspace.
  2. AIV waits for the cross-core synchronization flag.
  3. AIV invokes BlockEpilogue to execute the EVG processing.
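The three steps above can be pictured with a small host-side analogy. This is only a sketch under stated assumptions: `ToyWorkspace`, `CubeStage`, `VectorStage`, and `RunTwoStageSketch` are hypothetical names, not catlass APIs; a `std::atomic<bool>` stands in for the cross-core synchronization flag; a ReLU stands in for whatever the EVG would compute; and the two stages run back-to-back on one host thread, whereas on the device they run on different cores.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side analogy only. All names below are hypothetical, not catlass APIs.
struct ToyWorkspace {
    std::vector<float> data;         // stands in for the GM workspace
    std::atomic<bool> ready{false};  // stands in for the cross-core flag
};

// Step 1: the cube-side stage computes A (m x k) * B (k x n), both row-major,
// stores the result in the workspace, then raises the flag.
void CubeStage(const std::vector<float>& a, const std::vector<float>& b,
               std::size_t m, std::size_t n, std::size_t k, ToyWorkspace& ws) {
    assert(a.size() == m * k && b.size() == k * n);
    ws.data.assign(m * n, 0.0f);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += a[i * k + p] * b[p * n + j];
            }
            ws.data[i * n + j] = acc;
        }
    }
    ws.ready.store(true, std::memory_order_release);
}

// Steps 2 and 3: the vector-side stage waits for the flag, then runs a toy
// epilogue (ReLU) over the workspace contents.
std::vector<float> VectorStage(const ToyWorkspace& ws) {
    while (!ws.ready.load(std::memory_order_acquire)) {
        // On a device this would be a cross-core wait. In this sketch the
        // flag is already set because CubeStage ran first.
    }
    std::vector<float> out(ws.data.size());
    for (std::size_t i = 0; i < ws.data.size(); ++i) {
        out[i] = ws.data[i] > 0.0f ? ws.data[i] : 0.0f;
    }
    return out;
}

// Runs the two stages in order and returns the epilogue output.
std::vector<float> RunTwoStageSketch(const std::vector<float>& a,
                                     const std::vector<float>& b,
                                     std::size_t m, std::size_t n, std::size_t k) {
    ToyWorkspace ws;
    CubeStage(a, b, m, n, k, ws);
    return VectorStage(ws);
}
```

The point of the sketch is the ordering: the epilogue reads only from the workspace, and only after the producer has signalled that the workspace is complete.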

This entry point is designed for standard EVG scenarios and is used by most EVG examples in the current repository.

Template Parameters

template <
    class BlockMmad_,
    class BlockEpilogue_,
    class BlockScheduler_
>
class BasicMatmulTlaVisitor;

  • BlockMmad_: GEMM main loop implementation
  • BlockEpilogue_: EVG-specific epilogue, typically configured as BlockEpilogue<EpilogueVisitor<false>, ...>
  • BlockScheduler_: Block scheduler

Key Fields of Arguments

struct Arguments {
    GemmCoord problemShape;
    GM_ADDR ptrA; LayoutA layoutA;
    GM_ADDR ptrB; LayoutB layoutB;
    GM_ADDR ptrC; LayoutC layoutC;
    GM_ADDR ptrBias{nullptr};
    typename BlockEpilogue::EVG::Arguments evg_args;
};

Where:

  • ptrC and layoutC are still kept in the public Arguments structure.
  • evg_args contains the specific execution parameters for the EVG graph.

Note that the ToUnderlyingArguments() implementation of the current visitor kernel does not consume ptrC or layoutC. The actual writeback logic and destination address are governed by the VisitorAuxStore contained within evg_args.
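The note above can be illustrated with a toy conversion from public arguments to kernel parameters. This is a minimal sketch, not the catlass implementation: every type and function here (`ToyAuxStoreArgs`, `ToyEvgArgs`, `ToyArguments`, `ToyParams`, `ToUnderlyingSketch`, `Destination`) is a hypothetical stand-in, and the `ptrD` field name is an assumption made purely for illustration.

```cpp
#include <cassert>
#include <cstdint>

// All types below are hypothetical stand-ins, not catlass definitions.
struct ToyAuxStoreArgs {
    float* ptrD = nullptr;  // assumed name: where the store node writes
};

struct ToyEvgArgs {
    ToyAuxStoreArgs store;  // plays the role of the VisitorAuxStore arguments
};

struct ToyArguments {
    std::uint32_t m = 0, n = 0, k = 0;
    const float* ptrA = nullptr;
    const float* ptrB = nullptr;
    float* ptrC = nullptr;  // present in the public struct, unused below
    ToyEvgArgs evg_args;
};

struct ToyParams {
    std::uint32_t m, n, k;
    const float* ptrA;
    const float* ptrB;
    float* ptrWorkspace;    // MMAD results go here, not to ptrC
    ToyEvgArgs evg_args;
};

// Builds kernel parameters without reading args.ptrC at all.
ToyParams ToUnderlyingSketch(const ToyArguments& args, float* workspace) {
    return ToyParams{args.m, args.n, args.k, args.ptrA, args.ptrB,
                     workspace, args.evg_args};
}

// The final destination comes from the EVG arguments.
float* Destination(const ToyParams& params) {
    return params.evg_args.store.ptrD;
}
```

In this toy model, setting `ptrC` has no effect on where the result lands; only the pointer carried inside `evg_args` does.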

Workspace Rules

GetWorkspaceSize() returns:

sizeof(ElementC) * M * N + EVG::get_workspace_size(...)

The first term allocates space to store the MMAD results, while the second term accounts for the workspace required by individual EVG nodes.
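As a worked example of the formula, the sketch below adds the two terms for a toy EVG. The names are assumptions: `ToyEvg` and `WorkspaceBytesSketch` are hypothetical, and the `(m, n)` parameter list of the toy `get_workspace_size` is invented for illustration, since the real argument list is elided above. The toy EVG pretends that one node needs a scratch row of N floats.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical EVG stand-in: pretends a single node needs one float scratch
// row of length n. The (m, n) signature is an assumption of this sketch.
struct ToyEvg {
    static std::size_t get_workspace_size(std::uint32_t /*m*/, std::uint32_t n) {
        return static_cast<std::size_t>(n) * sizeof(float);
    }
};

// sizeof(ElementC) * M * N for the MMAD results, plus whatever the EVG asks for.
template <class ElementC, class Evg>
std::size_t WorkspaceBytesSketch(std::uint32_t m, std::uint32_t n) {
    // Widen before multiplying so large shapes do not overflow 32 bits.
    const std::size_t mmadBytes = sizeof(ElementC) *
                                  static_cast<std::size_t>(m) *
                                  static_cast<std::size_t>(n);
    return mmadBytes + Evg::get_workspace_size(m, n);
}
```

With M = 128 and N = 256, a 4-byte ElementC gives 4 * 128 * 256 = 131072 bytes for the MMAD results plus 256 * 4 = 1024 bytes for the toy EVG, 132096 bytes in total. A 2-byte ElementC gives 65536 + 1024 = 66560 bytes.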

Usage Conditions

  • BlockEpilogue::USE_UB_WORKSPACE must evaluate to false.
  • This kernel applies to scenarios where the MMAD results are written to GM before the epilogue runs.
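One way to express the first condition is a compile-time check. The sketch below is an assumption about how such a constraint could be written, not a statement of how catlass enforces it; `ToyGmVisitorKernel`, `ToyGmEpilogue`, `ToyUbEpilogue`, `ToyMmad`, and `ToyScheduler` are all hypothetical types.

```cpp
#include <cassert>

// Hypothetical epilogue stand-ins that expose only the flag discussed above.
struct ToyGmEpilogue { static constexpr bool USE_UB_WORKSPACE = false; };
struct ToyUbEpilogue { static constexpr bool USE_UB_WORKSPACE = true; };

// Empty placeholders for the other two template parameters.
struct ToyMmad {};
struct ToyScheduler {};

template <class BlockMmad_, class BlockEpilogue_, class BlockScheduler_>
class ToyGmVisitorKernel {
public:
    using BlockMmad = BlockMmad_;
    using BlockEpilogue = BlockEpilogue_;
    using BlockScheduler = BlockScheduler_;

    // Reject epilogues that expect their input in a UB workspace.
    static_assert(!BlockEpilogue::USE_UB_WORKSPACE,
                  "This kernel expects the epilogue to read MMAD results from GM");

    // Marker used only by this sketch.
    static constexpr bool kReadsMmadFromGm = true;
};

// Accepted: the epilogue reports USE_UB_WORKSPACE == false.
using OkKernel = ToyGmVisitorKernel<ToyMmad, ToyGmEpilogue, ToyScheduler>;

// Instantiating the template with ToyUbEpilogue (for example by declaring an
// object of that type) would fail to compile because of the static_assert.
```

A compile-time check like this reports a mismatched epilogue at build time instead of leaving it to surface as wrong results at run time.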

Authoring statement: parts of this article were generated with AI assistance (AIGC) and are for reference only.
