ld_st_reg_mask Example
Overview
This example uses the Reg programming interface to load data from UB (Unified Buffer) into MaskReg (mask register) and store it back, as well as to perform mask-controlled store operations. It supports multiple scenarios, selected through the CMake build parameter SCENARIO_NUM.
| SCENARIO_NUM | Scenario Type |
|---|---|
| 1 | Basic transfer scenario: Uses LoadAlign, StoreAlign and other APIs to implement MaskReg load/store |
| 2 | Composite computation scenario: Uses LoadAlign to load MaskReg, then uses it as a mask for Select to complete selection computation |
Supported Products and CANN Versions
| Product | CANN Version |
|---|---|
| Ascend 950PR/Ascend 950DT | >= CANN 9.1.0 |
Directory Structure
├── reg_load_store_mask
│ ├── scripts
│ │ └── gen_data.py // Input data and ground truth generation script
│ ├── figures // Illustrations
│ ├── CMakeLists.txt // Build configuration file
│ ├── data_utils.h // Data read/write functions
│ ├── README.md // Example description
│ ├── ld_st_reg_mask_scenario_1.asc // Scenario 1: Ascend C implementation for basic transfer scenario
│ └── ld_st_reg_mask_scenario_2.asc // Scenario 2: Ascend C implementation for composite computation scenario
Example Description
Scenario 1: Basic Transfer Scenario
- Example functionality:
- Data load: Takes the first 256 bits of data from the input matrix and calls the LoadAlign API to load data from UB into MaskReg.
- Data store: Sets all bits of MaskReg to 1 and calls the StoreAlign API to store data from MaskReg to UB.
- Example specifications:

  | Example Type (OpType) | AIV Example | | |
  |---|---|---|---|
  | | name | shape | data type |
  | Example Input | x | [1, 1024] | uint8_t |
  | Example Output | y | [1, 1024] | uint8_t |
  | Kernel Function Name | ld_st_reg_mask_scenario_1 | | |

- Example implementation:
  1. In the CopyVF function, call the LoadAlign API to load 256 bits (32 uint8_t values) of data from UB into MaskReg, which sets the mask dynamically. In this example, the 32 uint8_t values are 1,0,...,1: the first and last values are 1 (b'00000001) and the rest are 0. Because the chip fills MaskReg starting from the low bit of each value, these 32 values end up in MaskReg as b'1000...1...000.
  2. Call the Duplicate API to fill data. MaskReg indicates which elements participate in the computation. From step 1, the 1st bit and the 249th bit in MaskReg are 1, so with this mask only the 1st and 249th elements of RegTensor are filled with the value 2.
  3. Use the StoreAlign API to save the results from RegTensor to UB.
  4. Set all bits in MaskReg to 1, then use the StoreAlign API to save the MaskReg data to UB (address = save address from step 3 + 256B). This stores the MaskReg contents on UB as 32 uint8_t values. Every bit of every value is 1, so each value is 255 (0xFF).
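The bit layout described in the steps above can be checked with a small host-side sketch. This is plain Python, not Ascend C: the helper names `unpack_mask` and `pack_mask` are hypothetical and only model the low-bit-first ordering stated in the text. It does not call any CANN API.

```python
def unpack_mask(mask_bytes):
    """Expand bytes into a list of bits, low bit of each byte first."""
    return [(byte >> bit) & 1 for byte in mask_bytes for bit in range(8)]


def pack_mask(bits):
    """Inverse of unpack_mask: pack groups of 8 bits (low bit first) into bytes."""
    return bytes(
        sum(bit << pos for pos, bit in enumerate(bits[i:i + 8]))
        for i in range(0, len(bits), 8)
    )


# 32 uint8_t values: the first and last are 1, the rest are 0.
mask_bytes = bytes([1] + [0] * 30 + [1])
mask_bits = unpack_mask(mask_bytes)

# 1-based positions of the set bits, matching the "1st" and "249th" wording.
active = [i + 1 for i, bit in enumerate(mask_bits) if bit]
print(active)  # [1, 249]

# An all-ones 256-bit mask stored back as bytes gives 32 values of 0xFF.
all_ones = pack_mask([1] * 256)
print(all_ones == bytes([0xFF] * 32))  # True
```

The last value sits in byte 31, whose low bit is bit 248 counting from zero, which is why the text refers to it as the 249th bit.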
Scenario 2: Composite Computation Scenario
- Example functionality: Selects values from the xReg or yReg vector at corresponding positions based on the bits in maskReg. When a mask bit is 1, the corresponding element from src0 is selected; when a mask bit is 0, the corresponding element from src1 is selected.
- Example specifications:

  | Example Type (OpType) | AIV Example | | |
  |---|---|---|---|
  | | name | shape | data type |
  | Example Input | x | [1, 256] | float |
  | | y | [1, 256] | float |
  | | mask | [1, 32] | uint8_t |
  | Example Output | z | [1, 256] | float |
  | Kernel Function Name | ld_st_reg_mask_scenario_2 | | |

- Example implementation: In the SelectVF function, call the LoadAlign API to load mask data from UB into MaskReg, then pass it to the Select API for computation, writing the result back to UB.
- Invocation implementation: Uses the kernel invocation syntax <<<>>> to call the kernel function.
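The selection rule can be illustrated with a host-side reference sketch in plain Python. It rests on assumptions that the text does not confirm: that src0 corresponds to x and src1 to y, that mask bit i governs element i, and that bits are read low bit first within each byte. The function name `select_reference` is hypothetical, and this is not the kernel code or the logic of gen_data.py.

```python
def select_reference(x, y, mask_bytes):
    """Pick x[i] where mask bit i is 1, otherwise y[i].

    Assumption: bit i of the mask (low bit of each byte first)
    governs element i, with x as src0 and y as src1.
    """
    bits = [(byte >> bit) & 1 for byte in mask_bytes for bit in range(8)]
    if not (len(x) == len(y) == len(bits)):
        raise ValueError("x, y and the expanded mask must have equal length")
    return [a if m else b for a, b, m in zip(x, y, bits)]


# Shapes follow the specification table: 256 floats and 32 mask bytes.
x = [float(i) for i in range(256)]
y = [-1.0] * 256
mask = bytes([0x0F] + [0x00] * 31)  # only the lowest four bits are set

z = select_reference(x, y, mask)
print(z[:6])  # [0.0, 1.0, 2.0, 3.0, -1.0, -1.0]
```

With this mask, the first four outputs come from x and every remaining output comes from y.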
Build and Run
Run the following steps in the root directory of this example to build and run it.
- Configure environment variables

  Configure the environment variables according to how the CANN development kit was installed in the current environment.

      source ${install_path}/cann/set_env.sh

  Note: ${install_path} is the CANN package installation directory. When no installation directory is specified, the default installation path is /usr/local/Ascend.

- Run the example

  Run the following commands in the example directory.

      SCENARIO_NUM=1 # Set the scenario number
      mkdir -p build && cd build; # Create and enter the build directory
      cmake -DCMAKE_ASC_ARCHITECTURES=dav-3510 -DSCENARIO_NUM=$SCENARIO_NUM ..;make -j; # Build the project (default npu mode)
      python3 ../scripts/gen_data.py -scenarioNum=$SCENARIO_NUM # Generate test input data
      ./demo # Run the compiled executable to execute the example

  To use CPU debug or NPU simulation mode, add the -DCMAKE_ASC_RUN_MODE=cpu or -DCMAKE_ASC_RUN_MODE=sim parameter. Examples:

      cmake -DCMAKE_ASC_RUN_MODE=cpu -DCMAKE_ASC_ARCHITECTURES=dav-3510 -DSCENARIO_NUM=$SCENARIO_NUM ..;make -j; # CPU debug mode
      cmake -DCMAKE_ASC_RUN_MODE=sim -DCMAKE_ASC_ARCHITECTURES=dav-3510 -DSCENARIO_NUM=$SCENARIO_NUM ..;make -j; # NPU simulation mode

  Notice: Clear the cmake cache before switching build modes. Run rm CMakeCache.txt in the build directory and re-run cmake.

- Build option description

  | Option | Values | Description |
  |---|---|---|
  | CMAKE_ASC_RUN_MODE | npu (default), cpu, sim | Run mode: NPU execution, CPU debug, NPU simulation |
  | CMAKE_ASC_ARCHITECTURES | dav-3510 | NPU architecture: dav-3510 corresponds to Ascend 950PR/Ascend 950DT |
  | SCENARIO_NUM | 1, 2 | Scenario number: 1=basic transfer scenario, 2=composite computation scenario |

- Execution result

  The following execution result indicates that the precision comparison is successful.

      test pass!



