CANN/GE Graph Engine Agent Guide

AGENTS.md

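GE (Graph Engine) is a graph compiler and executor for Ascend. It provides computation graph optimization, multi-stream parallelism, memory reuse, and model sinking to speed up model execution and reduce model memory usage. GE offers easy integration with the PyTorch and TensorFlow front ends, and supports parsing and compiling mainstream model formats such as onnx and pb. Project address: https://gitcode.com/cann/ge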

This file provides guidance for agents working in this code repository.

Project Overview

GE (Graph Engine) is the graph engine of Huawei CANN (Compute Architecture for Neural Networks): a high-performance graph compiler and executor for Ascend AI processors. It provides graph optimization, multi-stream parallel execution, memory reuse optimization, and model sinking.

Key Directories

| Directory | Purpose |
| --- | --- |
| `api/` | Public API interfaces (ACL, ATC, session, Python bindings) |
| `base/` | Base components (graph structure, utilities, host CPU engine) |
| `compiler/` | Graph compilation (analyzers, engines, graph compiler, operator compiler) |
| `runtime/` | Runtime execution (V1 static executor, V2 dynamic executor RT2.0) |
| `dflow/` | Distributed flow framework (LLM data distribution, UDF) |
| `parser/` | Model format parsers (ONNX, PB, Caffe, MindSpore) |
| `graph_metadef/` | Graph metadata definitions and operator registration |
| `tests/` | Comprehensive test suite (UT, ST, benchmarks) |
| `examples/` | Usage examples and samples |

Build Commands

Basic Build

# Build all components (ge_compiler, ge_executor, dflow)
bash build.sh

# Build specific components
bash build.sh --ge_compiler    # Build compiler package
bash build.sh --ge_executor    # Build executor package
bash build.sh --dflow          # Build dflow package

Build Options

bash build.sh -j16                  # 16 parallel threads (default: 8)
bash build.sh --build-type=Debug    # Debug build (default: Release)
bash build.sh --verbose             # Verbose output
bash build.sh --asan                # Enable AddressSanitizer
bash build.sh --cov                 # Enable code coverage
bash build.sh --output_path=<PATH>  # Set custom output path
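
As a rough illustration of how the component flags and build options above might be combined, the sketch below only prints a command line instead of running it. `ge_build_cmd` is a hypothetical helper, and passing a component flag together with `-j` and `--build-type` in one invocation is an assumption; confirm it against the usage text of build.sh before relying on it.

```bash
#!/bin/sh
# Dry-run sketch: assemble a build.sh command line and print it.
# ge_build_cmd is a hypothetical name, not part of the repository.
# Assumption: a component flag can be combined with -j and --build-type.

ge_build_cmd() {
  component=$1              # ge_compiler, ge_executor, dflow, or "" for all
  jobs=${2:-8}              # 8 is the documented default thread count
  build_type=${3:-Release}  # Release is the documented default build type

  cmd="bash build.sh"
  if [ -n "$component" ]; then
    cmd="$cmd --$component"
  fi
  cmd="$cmd -j$jobs --build-type=$build_type"
  printf '%s\n' "$cmd"
}

# Print a Debug build of the compiler package with 16 threads.
ge_build_cmd ge_compiler 16 Debug
# Print a build of all components with the documented defaults.
ge_build_cmd ""
```

Printing the command first keeps the step reviewable; the printed line can then be executed once it looks right.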

Testing

DT Test Development

Use Skill: ge-dt-developer

Applicable Scenarios: Writing and adding UT/ST test cases.

GE Unit Tests / System Tests

Use Skill: ge-dt-runner

Applicable Scenarios: Compiling and running GE project unit tests (UT) and system tests (ST).

Clean Build Artifacts

Cleaning is not necessary and should be avoided. Use incremental compilation whenever possible; it saves about 90% of compilation time. If a clean build is truly required:

rm -rf build_ut/ build_st/ output/ build/ build_out/ cov/ build_cmake_gcov/

Feature Development and New Features

Trigger Words: Add new feature/requirement/capability, develop new feature/requirement/capability, implement feature/requirement/capability

Use Skill: superpowers brainstorming skill

Architecture Document Loading

Important: When exploring the repository/project, answering questions, modifying code, writing design documents or requirement specs, or conducting code reviews, load the corresponding documents according to the tables below. Loading is triggered by any matching trigger word, involved directory, or scenario, and each document only needs to be loaded once. After modifying code, also update the corresponding documents under docs/en/design/features/ and docs/en/design/modules.

| Document | Trigger Words | Involved Directories |
| --- | --- | --- |
| architecture.md | Architecture overview, overall design, GE introduction, system architecture, compilation optimization, plugin extension | First time understanding the project |
| ascend-ir.md | AscendIR, graph structure, operator registration, Anchor, DAG | `base/`, `inc/` (graph structure related) |
| compiler.md | Compiler, optimization pass, fusion, engine partition, operator compilation | `compiler/` (non memory/split/stream subdirectories) |
| runtime.md | Dynamic runtime executor, model loading, model execution, Hybrid, v2 architecture | `runtime/` (overall architecture level) |
| fusion_pattern_pass.md | Fusion Pattern Pass, PatternFusionPass, DecomposePass, custom fusion Pass, MeetRequirements, CaptureTensor, Replacement, PatternMatcherConfig | `compiler/graph/fusion/`, `compiler/graph/passes/feature/`, `examples/fusion_pass/` |
| datadump.md | dump, overflow, disk write, datadump, exception dump | `common/dump/`, `runtime/*/dump/` |
| external_weight.md | External weight, FileConstant, weight separation, weight disk write | Use trigger words only |
| constant_folding.md | Constant folding, constant folding optimization, constant expression evaluation | `*constant_folding*` |
| dynamic_gear.md | Dynamic gear, gear, dynamic batch, dynamic resolution, dynamic_dims | `compiler/graph/preprocess/`, `compiler/graph/passes/multi_batch/`, `runtime/v1/executor/` |
| memory_conflict.md | Memory conflict, memory layout conflict, read-write conflict, Inplace conflict, subgraph address isolation | `compiler/graph/passes/memory_conflict/`, `mem_rw_conflict_optimize.cc`, `compiler/graph/optimize/mem_layout_conflict_optimize/`, `mem_inplace.cc` |
| model_cache.md | Model cache, compilation cache, OM cache, JIT cache | `*model_cache*` |
| profiling.md | profiling, performance analysis, performance collection, msprof, performance tuning | `*profiling*` |
| so_in_om.md | SO in OM, operator packaging, so packaging, self-contained model, operator dependency | `*op_so_store*` |
| tensormove_delete.md | TensorMove elimination, TensorMove deletion, redundant copy elimination | `*tensor_move_delete*` |
| variable_manager.md | Variable management, Variable, variable memory, VarRef, variable lifecycle, constant, FileConstant, external weight | `*var_manager*`, `*variable_optimize*` |
| zero_copy.md | Zero copy, model input/output, user input/output, user memory/address | `*zero_copy*` |
| concat_no_task.md | Concat No Task, concat optimization, continuous memory concatenation, virtual operator, no Task generation | `*concat_notask*` |
| ge_local_operator.md | GE Local operator, local operator, GeLocal engine, NoOp, GeDeletedOp, PhonyConcat, PhonySplit | `*local_engine*`, `*ge_local*` |
| engine.md | Engine, engine selection, engine registration, engine partition, EnginePlacer, EnginePartitioner, DNNEngine, OpsKernelInfoStore | `compiler/engines/`, `*engine_place*`, `*dnnengine*` |
| tiling_sink.md | Tiling sink, AICPU Tiling, tiling_schedule_optimize | `*tiling_sink*`, `*fe_gentask_utils*` |
| graph_splitter.md | Graph split, dynamic-static split, DynamicShapePartitioner, EnginePartitioner, cluster, PartitionedCall | `compiler/graph/partition/`, `*dynamic_shape_partition*` |
| known_shape_executor.md | Static executor, Known Shape Executor, Task Sink, DavinciModel, address refresh, model sink | `runtime/v1/graph/load/model_manager/` |
| unknown_shape_executor.md | Dynamic executor, Unknown Shape Executor, RT2.0, Lowering, ExecuteGraph, ModelV2Executor, dynamic shape execution | `runtime/v2/`, `runtime/v1/hybrid/executor/` |
| stream_allocator.md | Stream allocation, stream, multi-stream, stream reuse, event synchronization, stream activation | `compiler/graph/build/stream/` |
| infer_shape.md | InferShape, Shape inference, OriginShape, StorageShape, dynamic Shape, symbolic inference | `*infer_shape*`, `*symbolic_shape*` |
| infer_format.md | Format inference, InferFormat, OriginFormat, StorageFormat, TransData, format propagation | `*format_refiner*`, `*format_optimize*` |

| Key Feature Design Principles and Software Constraints | Trigger Words | Involved Directories |
| --- | --- | --- |
| memory-constraints.md | Memory, memory reuse, block_mem, allocator, zero copy, continuous memory, memory layout conflict, memory release | `compiler/graph/build/memory/`, `compiler/graph/optimize/mem_layout_conflict_optimize/` |
| rt2_runtime.md | RT2, dynamic shape, rt2 executor, hybrid execution | `runtime/v2/` |
| known_shape_runtime.md | Static shape, known shape, davinci model, sink mode, address refresh | `runtime/v1/` |
| graph_split.md | Graph split, graph cutting, cluster, dynamic graph split, executor selection | `compiler/graph/split/` |
| stream_allocator.md | Stream allocation, stream, multi-stream, stream reuse, event synchronization, stream activation | `compiler/graph/build/stream/` |
| graph_metadef.md | Graph basic structure | `graph_metadef/` |
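
The directory column above can be read as a set of path patterns. The following is a minimal sketch of that lookup, assuming shell `case` globbing is an acceptable stand-in for the matching an agent performs; `docs_for_path` is a hypothetical helper, only a few rows are encoded, and the file names in the loop are made up for illustration.

```bash
#!/bin/sh
# Sketch: map a file path to the architecture documents to load.
# docs_for_path is a hypothetical helper; only a few table rows are encoded.
# Each pattern gets its own case statement so one path can match several rows.

docs_for_path() {
  _p=$1
  case "$_p" in *constant_folding*) echo "constant_folding.md" ;; esac
  case "$_p" in *zero_copy*) echo "zero_copy.md" ;; esac
  case "$_p" in compiler/graph/build/stream/*) echo "stream_allocator.md" ;; esac
  case "$_p" in compiler/graph/build/memory/*) echo "memory-constraints.md" ;; esac
  case "$_p" in runtime/v1/graph/load/model_manager/*) echo "known_shape_executor.md" ;; esac
  case "$_p" in runtime/v1/*) echo "known_shape_runtime.md" ;; esac
  case "$_p" in
    runtime/v2/*)
      echo "unknown_shape_executor.md"
      echo "rt2_runtime.md"
      ;;
  esac
}

# Each document only needs to be loaded once, so deduplicate across files.
# The paths below are illustrative, not real files in the repository.
for f in runtime/v2/a.cc runtime/v2/b.cc compiler/graph/build/stream/c.cc; do
  docs_for_path "$f"
done | sort -u
```

Trigger words and scenarios ("First time understanding the project", "Use trigger words only") are not path-based and would still need to be matched separately.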

Development Standards

For GitCode PR/issue/CI operations, see @.claude/skills/default-skills/SKILL.md

Code Review Checklist

Trigger Words: Review code, review pr

Use Skill: ge-code-reviewer

Design Document Checklist

Trigger Words: Design document, design spec, spec output, design solution output, brainstorming output document, write design document, write spec, write design, save spec, save design, write to docs/superpowers/specs, design solution, architecture design, technical solution

Any scenario that outputs design documents/specs (including but not limited to the superpowers brainstorming skill, a user directly requesting a design document, or design solution output) must first read the template file [docs/en/design/design_document_template.md] and then produce output in the template format. Every section of the template must be covered. This template must be followed even if the superpowers skill has its own format requirements.

In addition, check the following items one by one:

  • Cross-feature impact (cross-feature-check): For every module/directory involved in the design solution, first read cross_feature_check.md and analyze each scenario according to its guidance. Evaluate item by item, against the scenario table in cross_feature_check.md, whether any features/scenarios are missing, and state the result explicitly in the design document.
  • Key feature design principles and software constraints: Based on the directories involved in the design solution, load the corresponding architecture documents from the "Key Feature Design Principles and Software Constraints" table above, and ensure the design solution is consistent with existing constraints. If an existing constraint must be broken, explicitly state the reason and the impact scope.

Example output format:

### Design Document Review Results
- [x] Cross-feature impact: Analyzed each scenario according to cross_feature_check.md,
      solution involves runtime/v2/ and compiler/graph/build/memory/,
      checked rt2_runtime.md and memory-constraints.md, solution is compatible with existing memory model,
      does not affect RT2 dynamic shape execution flow
- [x] Key feature design principles: Loaded rt2_runtime.md, solution follows hybrid execution constraints,
      no breaking of existing design principles
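
The requirement that every template section be covered can be partly checked mechanically. The sketch below assumes the template marks its sections with lines starting with `## ` and that the design document repeats those heading lines verbatim; both are assumptions about the template, and `missing_sections` is a hypothetical helper.

```bash
#!/bin/sh
# Sketch: list template headings that do not appear in a design document.
# missing_sections is a hypothetical helper.
# Assumption: template sections are lines that start with "## ".

missing_sections() {
  template=$1
  design=$2
  grep '^## ' "$template" | while IFS= read -r heading; do
    # -x: match the whole line, -F: treat the heading as a fixed string
    grep -qxF -e "$heading" "$design" || printf '%s\n' "$heading"
  done
}

# Usage (the second path is a placeholder for the document being written):
#   missing_sections docs/en/design/design_document_template.md path/to/design.md
```

Empty output means every heading was found. The check says nothing about whether a section's content is adequate, and trailing whitespace or renamed headings will be reported as missing.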

Code Style

  • Follow Google open source code standards
  • if/for/while/do-while statements should use braces
  • Use GetPeerInDataAnchorsPtr instead of GetPeerInDataAnchors; the former does not need to construct smart pointers and performs better. Likewise, prefer GetNamePtr over GetName. In general, prefer interfaces that do not return smart pointers.

Language

Answer questions in Chinese

Think Before Coding

Do not assume. Do not hide confusion. Put trade-offs on the table.

Before implementation:

  • Clearly state your assumptions. If uncertain, ask.
  • If multiple interpretations exist, list all of them - do not silently choose one.
  • If a simpler solution exists, say so. Raise objections when necessary.
  • If unclear areas exist, stop. Point out the confusion. Ask questions.

Simplicity First

Solve problems with minimal code. Do not make speculative designs.

  • Do not implement features beyond requirements.
  • Do not make abstractions for one-time code.
  • Do not add unrequested "flexibility" or "configurability".
  • Do not add error handling for impossible scenarios.
  • If you wrote 200 lines when 50 would be enough, rewrite it.

Ask yourself: "Would a senior engineer think this is too complex?" If yes, simplify it.

Precise Modifications

Only change what must be changed. Only clean up what you messed up.

When editing existing code:

  • Do not "improve" adjacent code, comments, or formatting.
  • Do not refactor things that are not broken.
  • Match existing style, even if you would write it differently.
  • If you notice unrelated dead code, point it out - do not delete it.

When your modification creates orphaned code:

  • Delete imports/variables/functions that became unused due to your modification.
  • Do not delete dead code that existed before, unless requested.

Test standard: Every modification should be traceable to the user request.

Goal-Driven Execution

Define success criteria. Iterate until goals are met.

Transform tasks into verifiable goals:

  • "Add validation" → "Write tests for invalid inputs, then make them pass"
  • "Fix bug" → "Write a test that reproduces the bug, then make it pass"
  • "Refactor X" → "Ensure tests pass before and after refactoring"

For multi-step tasks, briefly state the plan:

1. [Step] → Verify: [Check item]
2. [Step] → Verify: [Check item]
3. [Step] → Verify: [Check item]
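
One way to make the "[Step] → Verify: [Check item]" pattern concrete is a small runner that refuses to report success until the check passes. This is only a sketch: `run_step`, `write_marker`, and `marker_ok` are hypothetical names, and the marker-file step is a stand-in for real work such as building or running a test.

```bash
#!/bin/sh
# Sketch of the "[Step] -> Verify: [Check item]" loop.
# run_step and the demo functions are hypothetical names.

run_step() {
  name=$1    # short description of the step
  step=$2    # function or command that performs the step
  check=$3   # function or command that verifies the result

  if ! "$step"; then
    echo "FAILED (step): $name"
    return 1
  fi
  if ! "$check"; then
    echo "FAILED (verify): $name"
    return 1
  fi
  echo "OK: $name"
}

# Demo: the step writes a marker file, the check confirms its content.
workdir=$(mktemp -d)
write_marker() { echo "done" > "$workdir/marker"; }
marker_ok()    { [ "$(cat "$workdir/marker")" = "done" ]; }

run_step "write marker file" write_marker marker_ok
```

Keeping the step and its check as separate commands makes it visible when a step "succeeds" without actually meeting its success criterion.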

Clear success criteria enable independent iteration. Vague criteria ("make it work") require continuous communication.


Disclosure: Parts of this article were generated with AI assistance (AIGC) and are for reference only.
