cann/catlass: Block Matrix Multiplication (Ping-Pong)

Block Mmad Pingpong

catlass: the CANN operator template library, providing high-performance matrix multiplication and related fused-operator template samples for the NPU. Project address: https://gitcode.com/cann/catlass

Code location

Description

A partial specialization of BlockMmad for block-level MMAD (matrix multiply-accumulate) computation. It does not compute a bias term, runs synchronously (not asynchronously), and is not built on the TLA implementation.
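When testing a kernel built on this block, a host-side reference for what one block computes is handy. Because this specialization has no bias, each output element is simply C[i][j] = sum over p of A[i][p] * B[p][j]. The sketch below is a minimal illustration, assuming row-major storage and float arithmetic (the NPU path uses half inputs and the caller's layouts); ReferenceBlockMmad is a hypothetical name, not part of catlass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference for one block: C = A * B with no bias.
// A is m x k, B is k x n, C is m x n, all row-major (an assumption).
std::vector<float> ReferenceBlockMmad(const std::vector<float>& a,
                                      const std::vector<float>& b,
                                      std::size_t m, std::size_t n, std::size_t k) {
    assert(a.size() == m * k && b.size() == k * n);
    std::vector<float> c(m * n, 0.0f);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t p = 0; p < k; ++p) {
            const float aVal = a[i * k + p];
            // Accumulate row i of A times row p of B into row i of C.
            for (std::size_t j = 0; j < n; ++j) {
                c[i * n + j] += aVal * b[p * n + j];
            }
        }
    }
    return c;
}
```

Comparing the NPU output against such a reference needs a tolerance, since half-precision inputs will not match float results bit for bit.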

Scheduling Policy

// Currently, ENABLE_UNIT_FLAG_ must be false when the input element type is int8
template <bool ENABLE_UNIT_FLAG_ = false>
struct MmadAtlasA2Pingpong : public MmadAtlasA2 {
    static constexpr uint32_t STAGES = 2;                        // Two buffer stages: ping and pong
    static constexpr bool ENABLE_UNIT_FLAG = ENABLE_UNIT_FLAG_;  // Whether the L0C unit flag is enabled
};

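Setting ENABLE_UNIT_FLAG_ to true enables the unit flag, which allows L0C to be read out and written concurrently instead of waiting for one to finish before the other starts. This increases pipeline parallelism. As noted in the code comment, it must stay false for int8 inputs.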
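STAGES = 2 is what makes this policy "ping-pong": while the MMAD consumes the tile in one buffer, the next tile is loaded into the other. The hypothetical host-side model below only illustrates that buffer alternation along K. It does not model the unit flag or real AscendC synchronization, and Step and PingPongSchedule are illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t kStages = 2;  // Mirrors STAGES = 2 in MmadAtlasA2Pingpong

// One step of the hypothetical schedule.
struct Step {
    uint32_t kTile;       // Index of the K tile the MMAD computes in this step
    uint32_t computeBuf;  // Buffer (0 = ping, 1 = pong) the MMAD reads from
    int32_t prefetchBuf;  // Buffer being filled with tile kTile + 1, or -1 if none
};

// Tile k lives in buffer k % kStages. Tile 0 is assumed to be loaded before the
// loop; in each step the next tile is loaded into the other buffer while the
// current tile is computed.
std::vector<Step> PingPongSchedule(uint32_t kTiles) {
    std::vector<Step> steps;
    steps.reserve(kTiles);
    for (uint32_t k = 0; k < kTiles; ++k) {
        const int32_t prefetch =
            (k + 1 < kTiles) ? static_cast<int32_t>((k + 1) % kStages) : -1;
        steps.push_back({k, k % kStages, prefetch});
    }
    // With two stages, the compute and prefetch buffers never coincide.
    for (const Step& s : steps) {
        assert(s.prefetchBuf != static_cast<int32_t>(s.computeBuf));
    }
    return steps;
}
```

For example, PingPongSchedule(4) computes from buffers 0, 1, 0, 1 while prefetching into buffers 1, 0, 1, and prefetches nothing on the last step.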

Example

Block Assembly

See basic_matmul.

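constexpr bool enableUnitFlag = true;  // Allowed here because the inputs are half, not int8
using MmadDispatchPolicy = Gemm::MmadAtlasA2Pingpong<enableUnitFlag>;
using L1TileShape = GemmShape<128, 256, 256>;  // L1 tile: M = 128, N = 256, K = 256
using L0TileShape = GemmShape<128, 256, 64>;   // L0 tile: M = 128, N = 256, K = 64
// LayoutA, LayoutB and LayoutC are defined by the surrounding kernel code, as in basic_matmul
using AType = Gemm::GemmType<half, LayoutA>;
using BType = Gemm::GemmType<half, LayoutB>;
using CType = Gemm::GemmType<half, LayoutC>;
using BlockMmad = Gemm::Block::BlockMmad<MmadDispatchPolicy, L1TileShape, L0TileShape, AType, BType, CType>;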
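The tile shapes above can be sanity-checked at compile time by computing the K-loop count and the buffer footprints. The sketch below uses a hypothetical TileShape stand-in for GemmShape. The on-chip capacities (512 KB L1, 64 KB L0A and L0B, 128 KB L0C) are assumptions about Atlas A2 that should be checked against the ArchTag definitions in catlass, and the footprint formulas assume that L1 and L0A/L0B each hold STAGES copies of the A and B tiles and that L0C accumulates in fp32.

```cpp
#include <cstdint>

// Hypothetical stand-in for GemmShape<M, N, K>.
template <uint32_t M_, uint32_t N_, uint32_t K_>
struct TileShape {
    static constexpr uint32_t M = M_;
    static constexpr uint32_t N = N_;
    static constexpr uint32_t K = K_;
};

using L1Tile = TileShape<128, 256, 256>;
using L0Tile = TileShape<128, 256, 64>;

constexpr uint32_t kStages = 2;     // STAGES of MmadAtlasA2Pingpong
constexpr uint32_t kHalfBytes = 2;  // sizeof(half)
constexpr uint32_t kAccBytes = 4;   // Assumed fp32 accumulation in L0C

// Assumed Atlas A2 capacities in bytes; verify against the catlass ArchTag.
constexpr uint32_t kL1Size = 512 * 1024;
constexpr uint32_t kL0ASize = 64 * 1024;
constexpr uint32_t kL0BSize = 64 * 1024;
constexpr uint32_t kL0CSize = 128 * 1024;

// Each L1 tile is consumed in L1 K / L0 K steps along K.
static_assert(L1Tile::K % L0Tile::K == 0, "L0 tile K should divide L1 tile K");
constexpr uint32_t kL0StepsPerL1Tile = L1Tile::K / L0Tile::K;

// Footprints, assuming kStages copies of the A and B tiles in L1 and L0A/L0B.
constexpr uint32_t kL1Bytes =
    kStages * (L1Tile::M * L1Tile::K + L1Tile::K * L1Tile::N) * kHalfBytes;
constexpr uint32_t kL0ABytes = kStages * L0Tile::M * L0Tile::K * kHalfBytes;
constexpr uint32_t kL0BBytes = kStages * L0Tile::K * L0Tile::N * kHalfBytes;
constexpr uint32_t kL0CBytes = L0Tile::M * L0Tile::N * kAccBytes;

static_assert(kL1Bytes <= kL1Size, "A/B tiles do not fit in L1");
static_assert(kL0ABytes <= kL0ASize, "A tiles do not fit in L0A");
static_assert(kL0BBytes <= kL0BSize, "B tiles do not fit in L0B");
static_assert(kL0CBytes <= kL0CSize, "C tile does not fit in L0C");
```

Under these assumptions the configuration takes 4 L0 steps per L1 tile, uses 384 KB of L1, and fills L0B and L0C exactly.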

Block Instantiation

Performed inside the kernel's void operator()<AscendC::AIC> function, which runs on the AIC (cube) core; see basic_matmul for a complete example.

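Arch::Resource<ArchTag> resource;  // On-chip resources of the target architecture (ArchTag)
BlockMmad blockMmad(resource);     // The block is constructed from those resources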
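The constructor receives resource, from which a ping-pong block has to obtain one buffer per stage. The sketch below is a hypothetical model, not the catlass allocation code, of one way L1 could be carved into per-stage regions: the STAGES A tiles first, then the STAGES B tiles. StageBuffers and PartitionL1 are illustrative names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr uint32_t kStages = 2;  // Mirrors STAGES = 2

// Hypothetical per-stage view of L1: byte offsets of the A and B tiles.
struct StageBuffers {
    uint32_t offsetA;
    uint32_t offsetB;
};

// Hypothetical layout: all kStages A tiles first, followed by all kStages B tiles.
std::array<StageBuffers, kStages> PartitionL1(uint32_t tileABytes, uint32_t tileBBytes) {
    assert(tileABytes > 0 && tileBBytes > 0);
    std::array<StageBuffers, kStages> stages{};
    for (uint32_t s = 0; s < kStages; ++s) {
        stages[s].offsetA = s * tileABytes;
        stages[s].offsetB = kStages * tileABytes + s * tileBBytes;
    }
    return stages;
}
```

With the half-precision 128×256 A tile (64 KB) and 256×256 B tile (128 KB) from the assembly example, this places the pong A tile at 64 KB and the two B tiles at 128 KB and 256 KB, ending at 384 KB.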

Block Execution

Called inside the kernel's void operator()<AscendC::AIC> function, which runs on the AIC (cube) core; see basic_matmul for a complete example.

blockMmad(gmA[gmOffsetA],       // GM address of this block's tile of matrix A
        params.layoutA,         // Storage layout of matrix A in GM
        gmB[gmOffsetB],         // GM address of this block's tile of matrix B
        params.layoutB,         // Storage layout of matrix B in GM
        gmC[gmOffsetC],         // GM address of this block's tile of matrix C
        params.layoutC,         // Storage layout of matrix C in GM
        actualBlockShape);      // Actual shape of this block; edge blocks may be smaller than L1TileShape
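The offsets and actualBlockShape passed to blockMmad are computed by the caller. As a minimal sketch, assuming row-major A (M×K), B (K×N) and C (M×N) and a block that spans the full K dimension, the hypothetical helper below computes the element offsets for block coordinate (mIdx, nIdx) and clips the block shape at the matrix edges. BlockShape, BlockOffsets and ComputeBlockOffsets are illustrative names, not catlass APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct BlockShape { uint32_t m, n, k; };  // Hypothetical stand-in for the block shape

struct BlockOffsets {
    uint64_t offsetA, offsetB, offsetC;  // Element offsets into GM
    BlockShape actual;                   // Clipped shape, passed as actualBlockShape
};

// Hypothetical helper for row-major A, B and C; the block covers a
// tileM x tileN region of C and the full K dimension.
BlockOffsets ComputeBlockOffsets(uint32_t M, uint32_t N, uint32_t K,
                                 uint32_t tileM, uint32_t tileN,
                                 uint32_t mIdx, uint32_t nIdx) {
    const uint32_t rowStart = mIdx * tileM;
    const uint32_t colStart = nIdx * tileN;
    assert(rowStart < M && colStart < N);
    BlockOffsets out{};
    out.offsetA = static_cast<uint64_t>(rowStart) * K;             // A[rowStart][0]
    out.offsetB = colStart;                                        // B[0][colStart]
    out.offsetC = static_cast<uint64_t>(rowStart) * N + colStart;  // C[rowStart][colStart]
    // Edge blocks are smaller than the tile; interior blocks keep the full tile.
    out.actual = {std::min(tileM, M - rowStart), std::min(tileN, N - colStart), K};
    return out;
}
```

For a 300×500 output with K = 1024 and 128×256 tiles, block (2, 1) starts at row 256 and column 256, so its actual shape is 44×244×1024.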

Constraints

  • The template parameter BiasType_ is accepted but not used; bias computation is not supported.
  • ENABLE_UNIT_FLAG_ must be false when the input element type is int8.
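The int8 restriction on the unit flag can be enforced at compile time. The sketch below uses a hypothetical PingpongPolicy that mirrors only the two fields of MmadAtlasA2Pingpong shown earlier, and a hypothetical UnitFlagAllowed check; float stands in for half because half is not a standard C++ type.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in mirroring the fields of MmadAtlasA2Pingpong.
template <bool ENABLE_UNIT_FLAG_ = false>
struct PingpongPolicy {
    static constexpr uint32_t STAGES = 2;
    static constexpr bool ENABLE_UNIT_FLAG = ENABLE_UNIT_FLAG_;
};

// True unless the unit flag is enabled together with an int8 input type.
template <class Policy, class ElementA, class ElementB>
constexpr bool UnitFlagAllowed() {
    return !(Policy::ENABLE_UNIT_FLAG &&
             (std::is_same<ElementA, int8_t>::value ||
              std::is_same<ElementB, int8_t>::value));
}

// Example guard placed next to a BlockMmad alias.
static_assert(UnitFlagAllowed<PingpongPolicy<true>, float, float>(),
              "unit flag must be disabled for int8 inputs");
```

Placing such a check next to the BlockMmad alias turns an invalid policy and element-type combination into a compile error.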

Disclosure: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.
