Linux Kernel Preemption Explained in One Article

This content is licensed under CC 4.0 BY-SA.

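This article describes the three preemption models of the Linux kernel: non-preemptive (suited to servers), preemptive (suited to desktop systems), and voluntary kernel preemption (in between the two). Experiments show how processes behave under each model, and a reading of the code explains how additional preemption points make the kernel preemptible in specific situations.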

Overview of the three preemption models

The Linux kernel configuration offers three preemption models:

       ( ) No Forced Preemption (Server)
       (X) Voluntary Kernel Preemption (Desktop)
       ( ) Preemptible Kernel (Low-Latency Desktop)

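No Forced Preemption (Server)
Non-preemptive, suited to server systems: a non-preemptible kernel performs fewer context switches, so the overhead saved goes to useful work. Note that "non-preemptive" refers only to tasks running in kernel mode; tasks running in user mode can still be preempted. If they could not, Linux could hardly be called a multitasking operating system.
Preemptible Kernel (Low-Latency Desktop)
Preemptive: tasks running in kernel mode can be preempted as well. This suits desktop systems, where responsiveness matters most, so preemption happens whenever it is due.
Voluntary Kernel Preemption (Desktop)
Voluntary kernel preemption: the kernel gives up the CPU voluntarily at points it chooses. It is also usable on desktop systems and sits between the preemptible and non-preemptible models.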

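Experiencing preemptible and non-preemptible kernels first-hand

The following three experiments give a feel for how the three preemption models behave:

Experiment code: kernel part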

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/device.h>
#include <linux/io.h>
#include <linux/cdev.h>
#include <linux/fs.h>
#include <linux/types.h>
#include <linux/delay.h>

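static struct cdev cdev;
static dev_t devno;
static struct class *class;

/* Busy-wait for 5 seconds in kernel mode every time the device is opened. */
static int my_open(struct inode *inode, struct file *file)
{
	pr_info("open\n");
	mdelay(5000);	/* mdelay() spins; it never sleeps or yields the CPU */
	return 0;
}

static const struct file_operations my_ops = {
	.owner = THIS_MODULE,
	.open  = my_open,
};

static int __init my_test_init(void)
{
	int ret;

	pr_info("my test init\n");

	/* Reserve one device number and register the character device. */
	ret = alloc_chrdev_region(&devno, 0, 1, "mytest");
	if (ret)
		return ret;

	cdev_init(&cdev, &my_ops);
	ret = cdev_add(&cdev, devno, 1);
	if (ret) {
		unregister_chrdev_region(devno, 1);
		return ret;
	}

	/*
	 * Create /dev/mytest. Return values are left unchecked, as in the
	 * original. The two-argument class_create() matches older kernels;
	 * kernels from 6.4 on take only the name.
	 */
	class = class_create(THIS_MODULE, "mytest");
	device_create(class, NULL, devno, NULL, "mytest");
	return 0;
}

static void __exit my_test_exit(void)
{
	/* Incomplete on purpose: device_destroy() and class_destroy() are not called. */
	cdev_del(&cdev);
	unregister_chrdev_region(devno, 1);
}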


module_init(my_test_init);
module_exit(my_test_exit);

MODULE_AUTHOR("HanterLiu");
MODULE_DESCRIPTION("test");
MODULE_LICENSE("GPL v2");

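The code is simple: it creates a character device (the exit path is incomplete and does not tear everything down; that was laziness). Opening the device busy-waits for 5 seconds. By the definitions of preemptible and non-preemptible, on a preemptible kernel other user programs can still run while user program A is opening the device; on a non-preemptible kernel, other user programs that need the same CPU cannot run during that time.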

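Non-preemptible kernel
cat /dev/mytest &
Although the command runs in the background, the shell stops responding as soon as it is issued and recovers only after 5 seconds. Once the cat process enters kernel mode it busy-waits for 5 seconds, and during those 5 seconds it cannot be preempted. A desktop system clearly cannot accept this.
Preemptible kernel
cat /dev/mytest &
After the command is issued, the shell still responds to other commands.
Voluntary kernel preemption
cat /dev/mytest &
The result is the same as on the non-preemptible kernel. Why?
Voluntary kernel preemption means that the kernel is preempted only where it volunteers, so kernel code has to declare for itself the points at which it may be preempted. Modify the my_open function as follows: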

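static int my_open(struct inode *inode, struct file *file)
{
	printk("open\n");
	mdelay(1000);		/* busy-wait 1 second */
	cond_resched();		/* voluntary preemption point: reschedule if requested */
	mdelay(4000);		/* busy-wait 4 more seconds */
	return 0;
}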

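Run cat /dev/mytest & again and keep pressing Enter. After 1 second the shell responds once and then freezes again; after another 4 seconds it recovers. In other words, the kernel allowed itself to be preempted exactly once, at the 1-second mark. One caveat: as the scheduler comment quoted further down shows, an explicit cond_resched() call is a rescheduling point on any kernel built without CONFIG_PREEMPT, including No Forced Preemption. What the voluntary model adds is that the might_sleep() annotations already spread through the kernel also act as such points.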

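A quick look at the code

The experiments above give an intuitive feel, but for the full picture we need to read the code. There is a lot of code around process scheduling and preemption, and at first glance it is hard to know where to start. Fortunately, the main scheduler function of the Linux kernel carries a clear comment.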

/*
 * __schedule() is the main scheduler function.
 *
 * The main means of driving the scheduler and thus entering this function are:
 *
 *   1. Explicit blocking: mutex, semaphore, waitqueue, etc.
 *
 *   2. TIF_NEED_RESCHED flag is checked on interrupt and userspace return
 *      paths. For example, see arch/x86/entry_64.S.
 *
 *      To drive preemption between tasks, the scheduler sets the flag in timer
 *      interrupt handler scheduler_tick().
 *
 *   3. Wakeups don't really cause entry into schedule(). They add a
 *      task to the run-queue and that's it.
 *
 *      Now, if the new task added to the run-queue preempts the current
 *      task, then the wakeup sets TIF_NEED_RESCHED and schedule() gets
 *      called on the nearest possible occasion:
 *
 *       - If the kernel is preemptible (CONFIG_PREEMPT=y):
 *
 *         - in syscall or exception context, at the next outmost
 *           preempt_enable(). (this might be as soon as the wake_up()'s
 *           spin_unlock()!)
 *
 *         - in IRQ context, return from interrupt-handler to
 *           preemptible context
 *
 *       - If the kernel is not preemptible (CONFIG_PREEMPT is not set)
 *         then at the next:
 *
 *          - cond_resched() call
 *          - explicit schedule() call
 *          - return from syscall or exception to user-space
 *          - return from interrupt-handler to user-space
 *
 * WARNING: must be called with preemption disabled!
 */

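The comment is very clear: the timer tick interrupt handler decides whether a reschedule is needed and, if so, sets TIF_NEED_RESCHED. Scheduling then takes place at certain specific points, and those are the points at which preemption is possible.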

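When CONFIG_PREEMPT is not set

Four situations trigger a reschedule:

A call to cond_resched
A call to schedule
Return from a system call or exception to user space
Taking ARM64 as an example, the corresponding code is: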

ret_fast_syscall:
	disable_irq				// disable interrupts
	str	x0, [sp, #S_X0]			// returned x0
	ldr	x1, [tsk, #TI_FLAGS]		// re-check for syscall tracing
	and	x2, x1, #_TIF_SYSCALL_WORK
	cbnz	x2, ret_fast_syscall_trace
	and	x2, x1, #_TIF_WORK_MASK
	cbnz	x2, work_pending

Return from an interrupt handler to user space
Taking ARM64 as an example, the corresponding code is:

ret_to_user:
	disable_irq				// disable interrupts
	ldr	x1, [tsk, #TI_FLAGS]
	and	x2, x1, #_TIF_WORK_MASK
	cbnz	x2, work_pending

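This makes it clear: if an interrupt or exception is taken from user mode, a reschedule can happen on the way out, so user-mode tasks can be preempted. If the CPU was running in kernel mode, no reschedule happens on the way out, and that is what a non-preemptible kernel means.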

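When CONFIG_PREEMPT is set

New preemption points are added for kernel mode:

When preempt_enable is called to re-enable preemption
On return from an interrupt. The main addition is that when an interrupt is taken from kernel mode, a reschedule can happen on exit from the interrupt, which means kernel-mode tasks can be preempted. Taking ARM64 as an example, the code is as follows: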

el1_irq:
	kernel_entry 1
	enable_dbg
#ifdef CONFIG_TRACE_IRQFLAGS
	bl	trace_hardirqs_off
#endif

	irq_handler

#ifdef CONFIG_PREEMPT
	ldr	w24, [tsk, #TI_PREEMPT]		// get preempt count
	cbnz	w24, 1f				// preempt count != 0
	ldr	x0, [tsk, #TI_FLAGS]		// get flags
	tbz	x0, #TIF_NEED_RESCHED, 1f	// needs rescheduling?
	bl	el1_preempt
1:
#endif
#ifdef CONFIG_TRACE_IRQFLAGS
	bl	trace_hardirqs_on
#endif
	kernel_exit 1
ENDPROC(el1_irq)