Analyzing the Kernel Source on aarch64, Part 3: Boot Code Analysis

Tags: #kernel

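This article walks through the Linux kernel boot process on arm64: the kernel startup entry point and the functions primary_entry, record_mmu_state, preserve_boot_args, create_idmap, init_kernel_el and __cpu_setup. These functions do the key early work during boot: recording the MMU state, preserving the boot arguments, creating the ID map (identity mapping), initializing the exception level the kernel runs at, and handling the address space layout.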

1. Kernel startup entry point

/*
 * Kernel startup entry point.
 * ---------------------------
 *
 * The requirements are:
 *   MMU = off, D-cache = off, I-cache = on or off,
 *   x0 = physical address to the FDT blob.
 * Annotation: these are the requirements and constraints on the kernel
 * startup entry point: the MMU (memory management unit) must be off, the
 * data cache (D-cache) must be off, the instruction cache (I-cache) may be
 * on or off, and register x0 must hold the physical address of the FDT
 * (flattened device tree) blob.
 *
 * Note that the callee-saved registers are used for storing variables
 * that are useful before the MMU is enabled. The allocations are described
 * in the entry routines.
 */
 	
 	/* Annotation: __HEAD places what follows in the .head.text section, the kernel startup entry point */
	__HEAD
	/*
	 * DO NOT MODIFY. Image header expected by Linux boot-loaders.
	 */
	efi_signature_nop			// special NOP to identity as PE/COFF executable
	b	primary_entry			// branch to kernel start, magic
	/* Annotation: the rest of the .head.text section after the entry point is the image header that Linux boot loaders expect: load offset, image size, flags and so on */
	.quad	0				// Image load offset from start of RAM, little-endian
	le64sym	_kernel_size_le			// Effective size of kernel image, little-endian
	le64sym	_kernel_flags_le		// Informative flags, little-endian
	.quad	0				// reserved
	.quad	0				// reserved
	.quad	0				// reserved
	.ascii	ARM64_IMAGE_MAGIC		// Magic number
	.long	.Lpe_header_offset		// Offset to the PE header.

	__EFI_PE_HEADER

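	/* Annotation: the __EFI_PE_HEADER macro above emits the EFI (Extensible Firmware Interface) PE (Portable Executable) header; the directive below switches to the .idmap.text section */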
	.section ".idmap.text","a"
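The header fields listed above can be pictured as a fixed 64-byte structure. The following is a minimal sketch, assuming the layout described in the kernel's arm64 booting documentation; the struct name, field names and the helper `image_header_ok` are illustrative and do not come from the kernel source.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch of the 64-byte arm64 Image header (all fields little-endian).
 * Names are illustrative; each comment points at the assembly line above.
 */
struct arm64_image_header {
    uint32_t code0;        /* efi_signature_nop */
    uint32_t code1;        /* b primary_entry */
    uint64_t text_offset;  /* image load offset from start of RAM */
    uint64_t image_size;   /* _kernel_size_le */
    uint64_t flags;        /* _kernel_flags_le */
    uint64_t res2;         /* reserved */
    uint64_t res3;         /* reserved */
    uint64_t res4;         /* reserved */
    uint32_t magic;        /* ARM64_IMAGE_MAGIC, the bytes "ARM\x64" */
    uint32_t pe_offset;    /* .Lpe_header_offset */
};

/* "ARM\x64" read as a little-endian 32-bit value. */
#define ARM64_MAGIC_LE 0x644d5241u

/*
 * Hypothetical helper: returns 1 if buf is long enough to hold the header
 * and carries the magic number at byte offset 56, else 0. The magic is
 * assembled byte by byte so the check does not depend on host endianness.
 */
static int image_header_ok(const unsigned char *buf, size_t len)
{
    uint32_t magic;

    if (len < 64)
        return 0;
    magic = (uint32_t)buf[56] |
            ((uint32_t)buf[57] << 8) |
            ((uint32_t)buf[58] << 16) |
            ((uint32_t)buf[59] << 24);
    return magic == ARM64_MAGIC_LE;
}
```

A boot loader would perform a check of this kind on the start of the image before branching to it; a real loader also interprets the size and flags fields, which this sketch ignores.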

2. primary_entry

	/* Annotation: the directive below places the following code in the .idmap.text section, i.e. code that runs from the ID map */
	.section ".idmap.text","a"
	
	/*
	 * The following callee saved general purpose registers are used on the
	 * primary lowlevel boot path:
	 *
	 *  Register   Scope                      Purpose
	 *  x19        primary_entry() .. start_kernel()        whether we entered with the MMU on
	 *  x20        primary_entry() .. __primary_switch()    CPU boot mode 
	 *  x21        primary_entry() .. start_kernel()        FDT pointer passed at boot in x0
	 *  x22        create_idmap() .. start_kernel()         ID map VA of the DT blob
	 *  x23        primary_entry() .. start_kernel()        physical misalignment/KASLR offset
	 *  x24        __primary_switch()                       linear map KASLR seed
	 *  x25        primary_entry() .. start_kernel()        supported VA size
	 *  x28        create_idmap()                           callee preserved temp register
	 */
SYM_CODE_START(primary_entry)
	bl	record_mmu_state
	bl	preserve_boot_args
	bl	create_idmap

	/*
	 * If we entered with the MMU and caches on, clean the ID mapped part
	 * of the primary boot code to the PoC so we can safely execute it with
	 * the MMU off.
	 */
	cbz	x19, 0f
	adrp	x0, __idmap_text_start
	adr_l	x1, __idmap_text_end
	adr_l	x2, dcache_clean_poc
	blr	x2
0:	mov	x0, x19
	bl	init_kernel_el			// w0=cpu_boot_mode
	mov	x20, x0

	/*
	 * The following calls CPU setup code, see arch/arm64/mm/proc.S for
	 * details.
	 * On return, the CPU will be ready for the MMU to be turned on and
	 * the TCR will have been set.
	 */
#if VA_BITS > 48
	mrs_s	x0, SYS_ID_AA64MMFR2_EL1
	tst	x0, #0xf << ID_AA64MMFR2_EL1_VARange_SHIFT
	mov	x0, #VA_BITS
	mov	x25, #VA_BITS_MIN
	csel	x25, x25, x0, eq
	mov	x0, x25
#endif
	bl	__cpu_setup			// initialise processor
	b	__primary_switch
SYM_CODE_END(primary_entry)

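This code is primary_entry, the main entry function. It performs a series of early boot steps and then hands control to the rest of the kernel boot path.

Here is a line-by-line explanation: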

SYM_CODE_START(primary_entry)

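This line marks the start of primary_entry. Note that SYM_CODE_START defines a global symbol; it is SYM_CODE_START_LOCAL (used for record_mmu_state below) that defines a local one.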

bl	record_mmu_state
bl	preserve_boot_args
bl	create_idmap

These three lines call three different functions. First record_mmu_state records the state of the MMU, then preserve_boot_args preserves the boot arguments, and finally create_idmap creates the ID map (identity mapping).

cbz	x19, 0f
adrp	x0, __idmap_text_start
adr_l	x1, __idmap_text_end
adr_l	x2, dcache_clean_poc
blr	x2
0:	mov	x0, x19
bl	init_kernel_el			// w0=cpu_boot_mode
mov	x20, x0

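These lines branch on the value of x19, which holds the MMU state recorded at entry. If x19 is zero, cbz jumps straight to label 0 and skips the cache maintenance. If x19 is non-zero (the kernel was entered with the MMU and caches on), execution falls through: the addresses of __idmap_text_start and __idmap_text_end are loaded into x0 and x1, the address of dcache_clean_poc is loaded into x2, and blr x2 calls dcache_clean_poc to clean the ID-mapped part of the boot code to the PoC (point of coherency). Both paths then reach label 0, where x19 is passed in x0 to init_kernel_el and the return value (the CPU boot mode) is saved in x20.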
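The control flow of this fragment can be sketched in C as follows. This is only a model under stated assumptions: `struct call_log`, `log_call` and `entry_after_idmap` are hypothetical names used to record which routines would be called, and no real cache maintenance is performed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical call log used only for this illustration. */
struct call_log {
    const char *names[2];
    size_t count;
};

static void log_call(struct call_log *log, const char *name)
{
    log->names[log->count++] = name;
}

/*
 * Models the instructions between create_idmap and "mov x20, x0".
 * x19 is non-zero only if the kernel was entered with the MMU on.
 */
static void entry_after_idmap(uint64_t x19, struct call_log *log)
{
    if (x19 != 0)                          /* cbz x19, 0f is not taken */
        log_call(log, "dcache_clean_poc"); /* clean the ID-mapped text to the PoC */
    log_call(log, "init_kernel_el");       /* label 0: reached on both paths */
}
```

With x19 equal to zero only init_kernel_el is recorded; with a non-zero x19 the cache clean is recorded first. The point of the sketch is that init_kernel_el runs on both paths.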

#if VA_BITS > 48
mrs_s	x0, SYS_ID_AA64MMFR2_EL1
tst	x0, #0xf << ID_AA64MMFR2_EL1_VARange_SHIFT
mov	x0, #VA_BITS
mov	x25, #VA_BITS_MIN
csel	x25, x25, x0, eq
mov	x0, x25
#endif
bl	__cpu_setup			// initialise processor
b	__primary_switch

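This part starts with a conditional compilation block that is only compiled in when VA_BITS is greater than 48. It reads SYS_ID_AA64MMFR2_EL1 into x0 and tests the 4-bit VARange field. If the field is zero, x25 is set to VA_BITS_MIN; otherwise it is set to VA_BITS. The result, the supported VA size, is then copied into x0 as the argument for __cpu_setup. After the conditional block, __cpu_setup is called to initialize the processor, and finally the unconditional branch instruction b transfers control to __primary_switch.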
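The tst/csel pair in the conditional block amounts to a two-way selection. A minimal C sketch, assuming the VARange field occupies bits [19:16] of ID_AA64MMFR2_EL1; the function name `supported_va_bits` is hypothetical, and the values 52 and 48 used below are example inputs rather than values taken from this listing.

```c
#include <stdint.h>

/* ID_AA64MMFR2_EL1.VARange is assumed to be the 4-bit field at bits [19:16]. */
#define VARANGE_SHIFT 16

/*
 * Models:
 *   tst  x0, #0xf << ID_AA64MMFR2_EL1_VARange_SHIFT
 *   mov  x0, #VA_BITS
 *   mov  x25, #VA_BITS_MIN
 *   csel x25, x25, x0, eq
 * Returns the value that ends up in x25 (and x0).
 */
static unsigned supported_va_bits(uint64_t mmfr2, unsigned va_bits,
                                  unsigned va_bits_min)
{
    /* tst sets Z (condition "eq") when the masked field is zero */
    int field_is_zero = ((mmfr2 >> VARANGE_SHIFT) & 0xf) == 0;

    /* csel keeps VA_BITS_MIN on "eq", otherwise selects VA_BITS */
    return field_is_zero ? va_bits_min : va_bits;
}
```

For example, `supported_va_bits(0, 52, 48)` yields 48, while a register value with bit 16 set yields 52.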

SYM_CODE_END(primary_entry)

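This line marks the end of primary_entry.

As this code shows, primary_entry calls several helper functions during boot: it records the MMU state, preserves the boot arguments and creates the ID map. After the conditional cache clean, init_kernel_el and the processor setup, it branches to __primary_switch, which goes on to complete kernel initialization and startup.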

3. record_mmu_state

SYM_CODE_START_LOCAL(record_mmu_state)
	mrs	x19, CurrentEL
	cmp	x19, #CurrentEL_EL2
	mrs	x19, sctlr_el1
	b.ne	0f
	mrs	x19, sctlr_el2
0:
CPU_LE( tbnz	x19, #SCTLR_ELx_EE_SHIFT, 1f	)
CPU_BE( tbz	x19, #SCTLR_ELx_EE_SHIFT, 1f	)
	tst	x19, #SCTLR_ELx_C		// Z := (C == 0)
	and	x19, x19, #SCTLR_ELx_M		// isolate M bit
	csel	x19, xzr, x19, eq		// clear x19 if Z
	ret

	/*
	 * Set the correct endianness early so all memory accesses issued
	 * before init_kernel_el() occur in the correct byte order. Note that
	 * this means the MMU must be disabled, or the active ID map will end
	 * up getting interpreted with the wrong byte order.
	 */
1:	eor	x19, x19, #SCTLR_ELx_EE
	bic	x19, x19, #SCTLR_ELx_M
	b.ne	2f
	pre_disable_mmu_workaround
	msr	sctlr_el2, x19
	b	3f
2:	pre_disable_mmu_workaround
	msr	sctlr_el1, x19
3:	isb
	mov	x19, xzr
	ret
SYM_CODE_END(record_mmu_state)

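This code defines the local symbol record_mmu_state. The main purpose of this function is to record the state of the MMU (memory management unit). Here is a line-by-line explanation: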

SYM_CODE_START_LOCAL(record_mmu_state)

This line declares a local symbol and marks the start of the record_mmu_state function.

mrs x19, CurrentEL
cmp x19, #CurrentEL_EL2

These two lines read the current exception level into register x19 and compare it with the CurrentEL_EL2 macro to check whether the CPU is running at EL2.
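As a small illustration of the comparison above: the CurrentEL register holds the exception level in bits [3:2], so the constant for EL2 is 2 shifted left by 2. The sketch below assumes that encoding; the macro and function names are illustrative.

```c
#include <stdint.h>

/* The exception level is held in bits [3:2] of CurrentEL. */
#define CURRENT_EL_SHIFT 2
#define CURRENT_EL_EL1   (1u << CURRENT_EL_SHIFT)
#define CURRENT_EL_EL2   (2u << CURRENT_EL_SHIFT)

/* Models "cmp x19, #CurrentEL_EL2": true when the CPU booted at EL2. */
static int booted_at_el2(uint64_t current_el_reg)
{
    return current_el_reg == CURRENT_EL_EL2;
}

/* Extracts the numeric exception level (0..3) from the register value. */
static unsigned exception_level(uint64_t current_el_reg)
{
    return (unsigned)((current_el_reg >> CURRENT_EL_SHIFT) & 0x3);
}
```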

mrs x19, sctlr_el1
b.ne 0f
mrs x19, sctlr_el2
0:

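The mrs instruction does not change the condition flags, so sctlr_el1 is always read into x19 first. If the comparison above found that the CPU is not at EL2, b.ne jumps to label 0 and x19 keeps the sctlr_el1 value; otherwise execution falls through and x19 is overwritten with the value of sctlr_el2.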
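The register selection can be modelled in C as an unconditional read followed by a conditional overwrite. This is a sketch only: the function `pick_sctlr` and its parameters are hypothetical stand-ins for the system register reads.

```c
#include <stdint.h>

/*
 * Models:
 *   mrs  x19, sctlr_el1
 *   b.ne 0f
 *   mrs  x19, sctlr_el2
 * at_el2 is the result of the earlier "cmp x19, #CurrentEL_EL2".
 */
static uint64_t pick_sctlr(int at_el2, uint64_t sctlr_el1, uint64_t sctlr_el2)
{
    uint64_t x19 = sctlr_el1;  /* always executed; mrs leaves the flags alone */

    if (at_el2)                /* b.ne 0f is not taken only at EL2 */
        x19 = sctlr_el2;       /* overwrite with the EL2 control register */
    return x19;
}
```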

CPU_LE( tbnz x19, #SCTLR_ELx_EE_SHIFT, 1f )
CPU_BE( tbz x19, #SCTLR_ELx_EE_SHIFT, 1f )

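The CPU_LE() and CPU_BE() macros emit their instruction only when the kernel is built little-endian or big-endian respectively, so only one of these two lines is actually assembled. It tests the EE bit of the SCTLR value held in x19 (EE is a bit within SCTLR_ELx, not a separate register). If the byte order selected by that bit does not match the kernel's, execution jumps to label 1.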
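The endianness test can be sketched as a single predicate. The sketch assumes that EE is bit 25 of SCTLR_ELx and that a set EE bit selects big-endian data accesses; `endianness_mismatch` and its `kernel_is_big_endian` parameter are illustrative names standing in for the build-time CPU_LE/CPU_BE choice.

```c
#include <stdint.h>

/* SCTLR_ELx.EE is assumed to be bit 25. */
#define SCTLR_EE_SHIFT 25

/*
 * Models the pair:
 *   CPU_LE( tbnz x19, #SCTLR_ELx_EE_SHIFT, 1f )  little-endian kernel, EE set
 *   CPU_BE( tbz  x19, #SCTLR_ELx_EE_SHIFT, 1f )  big-endian kernel, EE clear
 * Returns 1 when the branch to label 1 would be taken.
 */
static int endianness_mismatch(uint64_t sctlr, int kernel_is_big_endian)
{
    int ee = (int)((sctlr >> SCTLR_EE_SHIFT) & 1);

    return ee != (kernel_is_big_endian ? 1 : 0);
}
```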

tst x19, #SCTLR_ELx_C          // Z := (C == 0)
and x19, x19, #SCTLR_ELx_M     // isolate M bit
csel x19, xzr, x19, eq
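Going by the inline comments on these three instructions, the value left in x19 can be modelled as below. This is a sketch assuming that M is bit 0 and C is bit 2 of SCTLR_ELx; the function name `recorded_mmu_state` is hypothetical.

```c
#include <stdint.h>

#define SCTLR_M (1ull << 0)  /* MMU enable bit, assumed bit 0 */
#define SCTLR_C (1ull << 2)  /* data cache enable bit, assumed bit 2 */

/*
 * Models:
 *   tst  x19, #SCTLR_ELx_C     // Z := (C == 0)
 *   and  x19, x19, #SCTLR_ELx_M
 *   csel x19, xzr, x19, eq
 * The result is non-zero only when both the M and C bits are set.
 */
static uint64_t recorded_mmu_state(uint64_t sctlr)
{
    int c_is_clear = (sctlr & SCTLR_C) == 0;  /* tst */
    uint64_t m_bit = sctlr & SCTLR_M;         /* and: isolate the M bit */

    return c_is_clear ? 0 : m_bit;            /* csel: clear x19 if Z */
}
```

This matches how primary_entry later uses x19: the cache clean guarded by `cbz x19, 0f` only runs when the kernel was entered with both the MMU and the caches on.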
