CANN/GE图引擎Agent指南-CSDN博客

AGENTS.md

【免费下载链接】ge GE（Graph Engine）是面向昇腾的图编译器和执行器，提供了计算图优化、多流并行、内存复用和模型下沉等技术手段，加速模型执行效率，减少模型内存占用。 GE 提供对 PyTorch、TensorFlow 前端的友好接入能力，并同时支持 onnx、pb 等主流模型格式的解析与编译。项目地址: https://gitcode.com/cann/ge

This file provides guidance for agents working in this code repository.

Project Overview

GE (Graph Engine) is Huawei CANN's (Compute Architecture for Neural Networks) graph engine - a high-performance graph compiler and executor for Ascend AI processors. GE provides graph optimization, multi-stream parallel execution, memory reuse optimization, and model sinking capabilities.

Key Directories

Directory	Purpose
`api/`	Public API interfaces (ACL, ATC, session, Python bindings)
`base/`	Base components (graph structure, utilities, host CPU engine)
`compiler/`	Graph compilation (analyzers, engines, graph compiler, operator compiler)
`runtime/`	Runtime execution (V1 static executor, V2 dynamic executor RT2.0)
`dflow/`	Distributed flow framework (LLM data distribution, UDF)
`parser/`	Model format parsers (ONNX, PB, Caffe, MindSpore)
`graph_metadef/`	Graph metadata definitions and operator registration
`tests/`	Comprehensive test suite (UT, ST, benchmarks)
`examples/`	Usage examples and samples

Build Commands

Basic Build

# Build all components (ge_compiler, ge_executor, dflow)
bash build.sh

# Build specific components
bash build.sh --ge_compiler    # Build compiler package
bash build.sh --ge_executor    # Build executor package
bash build.sh --dflow          # Build dflow package

Build Options

bash build.sh -j16                  # 16 parallel threads (default: 8)
bash build.sh --build-type=Debug    # Debug build (default: Release)
bash build.sh --verbose             # Verbose output
bash build.sh --asan                # Enable AddressSanitizer
bash build.sh --cov                 # Enable code coverage
bash build.sh --output_path=<PATH>  # Set custom output path

Testing

DT Test Development

Use Skill: ge-dt-developer Applicable Scenarios: Writing and adding UT/ST test cases.

GE Unit Tests / System Tests

Use Skill: ge-dt-runner Applicable Scenarios: Compiling and running GE project unit tests (UT) and system tests (ST).

Clean Build Artifacts

Not necessary, avoid cleaning. Use incremental compilation whenever possible to save 90% compilation time

rm -rf build_ut/ build_st/ output/ build/ build_out/ cov/ build_cmake_gcov/

Feature Development and New Features

Trigger Words: Add new feature/requirement/capability, develop new feature/requirement/capability, implement feature/requirement/capability

Use Skill: superpower brainstorming skill

Architecture Document Loading

Important: When exploring the repository/project, answering questions, modifying code, outputting design documents, requirement specs, or conducting code reviews, load corresponding documents according to the table below. Each document only needs to be loaded once, triggered by any matching trigger word, involved directory, or scenario. Also, after modifying code, update corresponding documents under docs/en/design/features/ and docs/en/design/modules.

Document	Trigger Words	Involved Directories
`architecture.md`	Architecture overview, overall design, GE introduction, system architecture, compilation optimization, plugin extension	First time understanding the project
`ascend-ir.md`	AscendIR, graph structure, operator registration, Anchor, DAG	`base/`, `inc/` (graph structure related)
`compiler.md`	Compiler, optimization pass, fusion, engine partition, operator compilation	`compiler/` (non memory/split/stream subdirectories)
`runtime.md`	Dynamic runtime executor, model loading, model execution, Hybrid, v2 architecture	`runtime/` (overall architecture level)
`fusion_pattern_pass.md`	Fusion Pattern Pass, PatternFusionPass, DecomposePass, custom fusion Pass, MeetRequirements, CaptureTensor, Replacement, PatternMatcherConfig	`compiler/graph/fusion/`, `compiler/graph/passes/feature/`, `examples/fusion_pass/`
`datadump.md`	dump, overflow, disk write, datadump, exception dump	`common/dump/`, `runtime/*/dump/`
`external_weight.md`	External weight, external weight, FileConstant, weight separation, weight disk write	Use trigger words only
`constant_folding.md`	Constant folding, constant folding, constant folding optimization, constant expression evaluation	`constant_folding`
`dynamic_gear.md`	Dynamic gear, dynamic gear, gear, dynamic batch, dynamic resolution, dynamic_dims	`compiler/graph/preprocess/`, `compiler/graph/passes/multi_batch/`, `runtime/v1/executor/`
`memory_conflict.md`	Memory conflict, memory layout conflict, read-write conflict, Inplace conflict, subgraph address isolation	`compiler/graph/passes/memory_conflict/`, `mem_rw_conflict_optimize.cc`, `compiler/graph/optimize/mem_layout_conflict_optimize/`, `mem_inplace.cc`
`model_cache.md`	Model cache, model cache, compilation cache, OM cache, JIT cache	`model_cache`
`profiling.md`	profiling, performance analysis, performance collection, msprof, performance tuning	`profiling`
`so_in_om.md`	SO in OM, operator packaging, so packaging, self-contained model, operator dependency	`op_so_store`
`tensormove_delete.md`	TensorMove elimination, TensorMove deletion, redundant copy elimination	`tensor_move_delete`
`variable_manager.md`	Variable management, Variable, variable memory, VarRef, variable lifecycle, constant, FileConstant, external weight	`var_manager`, `variable_optimize`
`zero_copy.md`	Zero copy, zero copy, model input/output, user input/output, user memory/address	`zero_copy`
`concat_no_task.md`	Concat No Task, concat optimization, continuous memory concatenation, virtual operator, no Task generation	`concat_notask`
`ge_local_operator.md`	GE Local operator, local operator, GeLocal engine, NoOp, GeDeletedOp, PhonyConcat, PhonySplit	`local_engine`, `ge_local`
`engine.md`	Engine, Engine, engine selection, engine registration, engine partition, EnginePlacer, EnginePartitioner, DNNEngine, OpsKernelInfoStore	`compiler/engines/`, `engine_place`, `dnnengine`
`tiling_sink.md`	Tiling sink, tiling sink, AICPU Tiling, tiling_schedule_optimize	`tiling_sink`, `fe_gentask_utils`
`graph_splitter.md`	Graph split, Graph Split, dynamic-static split, DynamicShapePartitioner, EnginePartitioner, cluster, PartitionedCall	`compiler/graph/partition/`, `dynamic_shape_partition`
`known_shape_executor.md`	Static executor, Known Shape Executor, Task Sink, DavinciModel, address refresh, model sink	`runtime/v1/graph/load/model_manager/`
`unknown_shape_executor.md`	Dynamic executor, Unknown Shape Executor, RT2.0, Lowering, ExecuteGraph, ModelV2Executor, dynamic shape execution	`runtime/v2/`, `runtime/v1/hybrid/executor/`
`stream_allocator.md`	Stream allocation, stream, multi-stream, stream reuse, event synchronization, stream activation	`compiler/graph/build/stream/`
`infer_shape.md`	InferShape, Shape inference, OriginShape, StorageShape, dynamic Shape, symbolic inference	`infer_shape`, `symbolic_shape`
`infer_format.md`	Format inference, Format inference, InferFormat, OriginFormat, StorageFormat, TransData, format propagation	`format_refiner`, `format_optimize`

Key Feature Design Principles and Software Constraints	Trigger Words	Involved Directories
`memory-constraints.md`	Memory, memory reuse, block_mem, allocator, zero copy, continuous memory, memory layout conflict, memory release	`compiler/graph/build/memory/`, `compiler/graph/optimize/mem_layout_conflict_optimize/`
`rt2_runtime.md`	RT2, dynamic shape, rt2 executor, hybrid execution	`runtime/v2/`
`known_shape_runtime.md`	Static shape, known shape, davinci model, sink mode, address refresh	`runtime/v1/`
`graph_split.md`	Graph split, graph cutting, cluster, dynamic graph split, executor selection	`compiler/graph/split/`
`stream_allocator.md`	Stream allocation, stream, multi-stream, stream reuse, event synchronization, stream activation	`compiler/graph/build/stream/`
`graph_metadef.md`	Graph basic structure	`graph_metadef/`

Development Standards

gitcode pr/issue/ci operations @.claude/skills/default-skills/SKILL.md

Code Review Checklist

Trigger Words: Review code, review pr

Use Skill: ge-code-reviewer

Design Document Checklist

Trigger Words: Design document, design spec, spec output, design document, design solution output, brainstorming output document, write design document, write spec, write design, save spec, save spec, save design, write to docs/superpowers/specs, design solution, architecture design, technical solution

Any scenario that outputs design documents/specs (including but not limited to superpowers brainstorming skill, user directly requesting design document writing, design solution output), must first read the template file [docs/en/design/design_document_template.md], then output according to the template format. Each section of the template must be covered. Even if superpowers skill has its own format requirements, this template must be followed.

Also, must check the following items one by one:

Cross-feature impact (cross-feature-check): For all modules/directories involved in the design solution, must first read cross_feature_check.md, and analyze each scenario according to its guidance. Evaluate item by item according to the scenario table in cross_feature_check.md whether there are missing features/scenarios, and explicitly state in the design document
Key feature design principles and software constraints: Based on the directories involved in the design solution, load corresponding architecture documents from the table above "Key Feature Design Principles and Software Constraints", ensure the design solution is consistent with existing constraints. If existing constraints need to be broken, must explicitly state the reason and impact scope

Example output format:

### Design Document Review Results
- [x] Cross-feature impact: Analyzed each scenario according to cross_feature_check.md,
      solution involves runtime/v2/ and compiler/graph/build/memory/,
      checked rt2_runtime.md and memory-constraints.md, solution is compatible with existing memory model,
      does not affect RT2 dynamic shape execution flow
- [x] Key feature design principles: Loaded rt2_runtime.md, solution follows hybrid execution constraints,
      no breaking of existing design principles

Code Style

Follow Google open source code standards
if/for/while/do-while statements should use braces
Use GetPeerInDataAnchorsPtr instead of GetPeerInDataAnchors, the former does not need to construct smart pointers and has better performance. Similarly for GetNamePtr and GetName, prefer interfaces that do not return smart pointers.

Language

Answer questions in Chinese

Think Before Coding

Do not assume. Do not hide confusion. Put trade-offs on the table.

Before implementation:

Clearly state your assumptions. If uncertain, ask.
If multiple interpretations exist, list all of them - do not silently choose one.
If a simpler solution exists, say it. Raise objections when necessary.
If unclear areas exist, stop. Point out the confusion. Ask questions.

Simplicity First

Solve problems with minimal code. Do not make speculative designs.

Do not implement features beyond requirements.
Do not make abstractions for one-time code.
Do not add unrequested "flexibility" or "configurability".
Do not do error handling for impossible scenarios.
If you wrote 200 lines when 50 lines are enough, rewrite.

Ask yourself: "Would a senior engineer think this is too complex?" If yes, simplify it.

Precise Modifications

Only change what must be changed. Only clean up what you messed up.

When editing existing code:

Do not "improve" adjacent code, comments, or formatting.
Do not refactor things that are not broken.
Match existing style, even if you would write it differently.
If you notice unrelated dead code, point it out - do not delete it.

When your modification creates orphaned code:

Delete imports/variables/functions that became unused due to your modification.
Do not delete dead code that existed before, unless requested.

Test standard: Every modification should be traceable to the user request.

Goal-Driven Execution

Define success criteria. Iterate until goals are met.

Transform tasks into verifiable goals:

"Add validation" → "Write tests for invalid inputs, then make them pass"
"Fix bug" → "Write a test that reproduces the bug, then make it pass"
"Refactor X" → "Ensure tests pass before and after refactoring"

For multi-step tasks, briefly state the plan:

1. [Step] → Verify: [Check item]
2. [Step] → Verify: [Check item]
3. [Step] → Verify: [Check item]

Clear success criteria enable independent iteration. Vague criteria ("make it work") require continuous communication.

创作声明：本文部分内容由AI辅助生成（AIGC），仅供参考