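1 Introduction to watermarks
Each zone of physical memory in Linux has its own set of three watermark levels:
- Minimum watermark (WMARK_MIN): if the number of free pages in the zone falls below this level, the zone is severely short of memory.
- Low watermark (WMARK_LOW): if the number of free pages falls below this level, the zone is mildly short of memory. By default this value is 125% of WMARK_MIN.
- High watermark (WMARK_HIGH): if the number of free pages is above this level, the zone has plenty of free memory. By default this value is 150% of WMARK_MIN.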
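As a quick worked example of the default ratios in the list above, here is a minimal userspace sketch in C. The type struct wmarks and the function derive_wmarks() are hypothetical names used only for illustration. They are not kernel APIs, and the sketch ignores any additional tuning the kernel may apply.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical container for one zone's three watermark levels, in pages. */
struct wmarks {
	unsigned long min;
	unsigned long low;
	unsigned long high;
};

/*
 * Derive low and high from min using the default ratios quoted above:
 * low = 125% of min, high = 150% of min.
 */
static struct wmarks derive_wmarks(unsigned long min_pages)
{
	struct wmarks w;

	/* Guard against overflow in min + min / 2. */
	assert(min_pages <= ULONG_MAX / 2);

	w.min = min_pages;
	w.low = min_pages + (min_pages >> 2);	/* min + 25% */
	w.high = min_pages + (min_pages >> 1);	/* min + 50% */
	return w;
}
```

For example, derive_wmarks(1000) yields min = 1000, low = 1250 and high = 1500 pages.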
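During allocation, if the allocator (for example the buddy allocator) finds that free memory is below "low" but still above "min", memory is under some pressure. kswapd is then woken up to reclaim memory once this allocation has completed. The allocation triggers reclaim in this case but is not blocked by it; the two run asynchronously.
How much must kswapd reclaim before its job is done? It only needs to bring free memory back up to the "high" watermark, so the amount depends on the gap between the current free memory and "high": the larger the gap, the more has to be reclaimed. "low" can be viewed as a warning level and "high" as a safe level.
If the allocator finds that free memory has fallen below "min", memory is severely short. There are two cases. By default the allocator waits synchronously for reclaim to finish before allocating, which is direct reclaim. The special case is an allocation made by a task that has the PF_MEMALLOC flag set: if the current free memory can still satisfy the request, the allocation happens first and reclaim follows.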
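The three cases described above can be summarized in a small decision function. This is a simplified userspace model in C of the behaviour described in the text, not the kernel's actual allocation path. The names alloc_action and classify_alloc() are hypothetical, and treating a free-page count exactly equal to a watermark as "below" it is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes, named after the cases described in the text. */
enum alloc_action {
	ALLOC_NO_RECLAIM,	/* above "low": just allocate */
	ALLOC_WAKE_KSWAPD,	/* between "min" and "low": allocate, kswapd reclaims asynchronously */
	ALLOC_DIRECT_RECLAIM,	/* at or below "min": reclaim synchronously, then allocate */
	ALLOC_THEN_RECLAIM	/* at or below "min" with PF_MEMALLOC: allocate first, reclaim later */
};

static enum alloc_action classify_alloc(unsigned long free_pages,
					unsigned long request_pages,
					unsigned long min_wmark,
					unsigned long low_wmark,
					bool pf_memalloc)
{
	assert(min_wmark <= low_wmark);

	if (free_pages > low_wmark)
		return ALLOC_NO_RECLAIM;
	if (free_pages > min_wmark)
		return ALLOC_WAKE_KSWAPD;
	/* Severe shortage: only a PF_MEMALLOC task may allocate before reclaim. */
	if (pf_memalloc && free_pages >= request_pages)
		return ALLOC_THEN_RECLAIM;
	return ALLOC_DIRECT_RECLAIM;
}
```

With min = 1000 and low = 1250, a zone with 1100 free pages falls into the kswapd case, while 900 free pages leads to direct reclaim unless the task has PF_MEMALLOC set.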
2 Watermark data structures
How does each zone record its watermark levels, and how are those values read back?
struct zone {
	/* Read-mostly fields */
	/*
	 * Watermark values (WMARK_MIN/WMARK_LOW/WMARK_HIGH), used by the page
	 * allocator and by kswapd page reclaim; access them with the
	 * *_wmark_pages(zone) macros.
	 */
	unsigned long watermark[NR_WMARK];
	unsigned long nr_reserved_highatomic;
	/* Memory reserved in this zone */
	long lowmem_reserve[MAX_NR_ZONES];
	...
	...
	...
};
#define min_wmark_pages(z) (z->watermark[WMARK_MIN])
#define low_wmark_pages(z) (z->watermark[WMARK_LOW])
#define high_wmark_pages(z) (z->watermark[WMARK_HIGH])
enum zone_watermarks {
	WMARK_MIN,
	WMARK_LOW,
	WMARK_HIGH,
	NR_WMARK
};
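3 Watermark initialization
How are the three watermark levels of each zone calculated?
Before walking through the calculation, we need to know what a few kernel values mean.
3.1 What managed_pages, spanned_pages and present_pages mean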
/*
* spanned_pages is the total pages spanned by the zone, including
* holes, which is calculated as:
* spanned_pages = zone_end_pfn - zone_start_pfn;
*
* present_pages is physical pages existing within the zone, which
* is calculated as:
* present_pages = spanned_pages - absent_pages(pages in holes);
*
* managed_pages is present pages managed by the buddy system, which
* is calculated as (reserved_pages includes pages allocated by the
* bootmem allocator):
* managed_pages = present_pages - reserved_pages;
*/
unsigned long managed_pages;
unsigned long spanned_pages;
unsigned long present_pages;
- spanned_pages: all pages spanned by the zone, including holes. Formula: zone_end_pfn - zone_start_pfn
- present_pages: the physical pages that actually exist in the zone. Formula: spanned_pages - hole_pages
- managed_pages: the pages managed by the buddy system. Formula: present_pages - reserved_pages
- Relationship between the three: spanned_pages >= present_pages >= managed_pages
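The three formulas above can be checked with a small userspace sketch in C. The struct zone_layout type and the calc_*() helpers are hypothetical names for illustration only; the kernel keeps these counters directly in struct zone.

```c
#include <assert.h>

/* Hypothetical description of one zone's layout, all values in pages. */
struct zone_layout {
	unsigned long start_pfn;	/* first page frame number of the zone */
	unsigned long end_pfn;		/* one past the last page frame number */
	unsigned long hole_pages;	/* pages inside the span that do not exist */
	unsigned long reserved_pages;	/* existing pages not handed to the buddy system */
};

/* spanned_pages = zone_end_pfn - zone_start_pfn */
static unsigned long calc_spanned_pages(const struct zone_layout *z)
{
	assert(z->end_pfn >= z->start_pfn);
	return z->end_pfn - z->start_pfn;
}

/* present_pages = spanned_pages - hole_pages */
static unsigned long calc_present_pages(const struct zone_layout *z)
{
	unsigned long spanned = calc_spanned_pages(z);

	assert(z->hole_pages <= spanned);
	return spanned - z->hole_pages;
}

/* managed_pages = present_pages - reserved_pages */
static unsigned long calc_managed_pages(const struct zone_layout *z)
{
	unsigned long present = calc_present_pages(z);

	assert(z->reserved_pages <= present);
	return present - z->reserved_pages;
}
```

For a zone spanning pfn 4096 to 1052672 with 2048 hole pages and 1024 reserved pages, this gives spanned = 1048576, present = 1046528 and managed = 1045504.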
3.2 What is min_free_kbytes
min_free_kbytes:
This is used to force the Linux VM to keep a minimum number
of kilobytes free. The VM uses this number to compute a
watermark[WMARK_MIN] value for each lowmem zone in the system.
Each lowmem zone gets a number of reserved free pages based
proportionally on its size.
Some minimal amount of memory is needed to satisfy PF_MEMALLOC
allocations; if you set this to lower than 1024KB, your system will
become subtly broken, and prone to deadlock under high loads.
Setting this too high will OOM your machine instantly.
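From the above:
- min_free_kbytes is the minimum amount of free memory the system keeps in reserve
- watermark[WMARK_MIN] is calculated from min_free_kbytes
The kernel initializes min_free_kbytes in the function init_per_zone_wmark_min. The computed value has a lower and an upper bound: it cannot be lower than 128 KiB or higher than 65536 KiB. In practice, a value of at least 1024 KiB is usually recommended.
//Count the pages in ZONE_DMA and ZONE_NORMAL that lie above the high watermark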
lowmem_kbytes = nr_free_buffer_pages() * (PAGE_SIZE >> 10);
min_free_kbytes = int_sqrt(lowmem_kbytes * 16);
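To make the formula concrete, here is a minimal userspace sketch in C that reproduces the calculation together with the 128 KiB and 65536 KiB bounds mentioned above. isqrt_ull() is a hypothetical stand-in for the kernel's int_sqrt(), and calc_min_free_kbytes() is an illustrative name, not a kernel function.

```c
#include <assert.h>

/* Integer square root: the largest r with r * r <= x. */
static unsigned long long isqrt_ull(unsigned long long x)
{
	unsigned long long r = 0;
	unsigned long long bit = 1ULL << 62;	/* highest power of four that fits */

	while (bit > x)
		bit >>= 2;
	while (bit != 0) {
		if (x >= r + bit) {
			x -= r + bit;
			r = (r >> 1) + bit;
		} else {
			r >>= 1;
		}
		bit >>= 2;
	}
	return r;
}

/* min_free_kbytes = sqrt(lowmem_kbytes * 16), clamped to [128, 65536]. */
static unsigned long long calc_min_free_kbytes(unsigned long long lowmem_pages,
					       unsigned long page_size)
{
	unsigned long long lowmem_kbytes;
	unsigned long long kbytes;

	assert(page_size >= 1024);
	lowmem_kbytes = lowmem_pages * (page_size >> 10);
	kbytes = isqrt_ull(lowmem_kbytes * 16);
	if (kbytes < 128)
		kbytes = 128;
	if (kbytes > 65536)
		kbytes = 65536;
	return kbytes;
}
```

With 1048576 low-memory pages of 4096 bytes (4 GiB), lowmem_kbytes is 4194304, lowmem_kbytes * 16 is 2^26, and the result is 8192 KiB.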
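3.3 Initializing the min, low and high watermark levels
3.3.1 Walkthrough of watermark initialization
The watermark values of ZONE_HIGHMEM are a special case. 64-bit systems no longer use ZONE_HIGHMEM, so we do not cover it in detail here; for the calculation, see the figure below.
For zones other than ZONE_HIGHMEM, the min watermark is derived from the computed min_free_kbytes, and the low and high watermarks are then derived from that min value. The detailed calculation is shown in the figure below.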

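Note: according to the figure above, computing lowmem_pages involves each zone's high watermark. At this point in initialization, however, no zone's high watermark has been set yet, so each one is presumably 0, i.e. lowmem_pages = ZONE_NORMAL->managed_pages + ZONE_DMA->managed_pages (this remains to be verified).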
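The min_free_kbytes documentation quoted earlier says that each lowmem zone receives a share of the reserve in proportion to its size. A minimal sketch of such a proportional split in C follows. It assumes that min_free_kbytes is first converted to pages and then scaled by managed_pages / lowmem_pages, which is how I understand the kernel of this era to behave; that is an assumption, not something confirmed by the text. The function name zone_min_wmark() is hypothetical.

```c
#include <assert.h>

/*
 * Share of the global minimum reserve assigned to one non-highmem zone.
 * Assumption: min = (min_free_kbytes in pages) * managed_pages / lowmem_pages.
 */
static unsigned long zone_min_wmark(unsigned long min_free_kbytes,
				    unsigned int page_shift,
				    unsigned long zone_managed_pages,
				    unsigned long lowmem_pages)
{
	unsigned long long pages_min;

	assert(page_shift >= 10);
	assert(lowmem_pages > 0);

	/* Convert KiB to pages: one page is 2^(page_shift - 10) KiB. */
	pages_min = min_free_kbytes >> (page_shift - 10);
	return (unsigned long)(pages_min * zone_managed_pages / lowmem_pages);
}
```

For example, with min_free_kbytes = 8192 and 4 KiB pages the global reserve is 2048 pages. A ZONE_DMA of 4096 managed pages and a ZONE_NORMAL of 1044480 managed pages (1048576 in total) would then get min watermarks of 8 and 2040 pages respectively.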
3.3.2 Kernel code for watermark initialization
First, the function flow chart.

The init_per_zone_wmark_min function
//mm/page_alloc.c
/*
 * Initialise min_free_kbytes.
 *
 * For small machines we want it small (128k min). For large machines
 * we want it large (64MB max). But it is not linear, because network
 * bandwidth does not increase linearly with machine size. We use
 *
 *	min_free_kbytes = 4 * sqrt(lowmem_kbytes), for better accuracy:
 *	min_free_kbytes = sqrt(lowmem_kbytes * 16)
 */
int __meminit init_per_zone_wmark_min(void)
{
	unsigned long lowmem_kbytes;
	int new_min_free_kbytes;

	/*
	 * nr_free_buffer_pages() returns the total number of pages above the
	 * high watermark in ZONE_DMA and ZONE_NORMAL:
	 * nr_free_buffer_pages = managed_pages - high_pages
	 */
	lowmem_kbytes = nr_free_buffer_pages() * (PAGE_SIZE >> 10);
	//Compute new_min_free_kbytes with the formula from the comment above this function
	new_min_free_kbytes = int_sqrt(lowmem_kbytes * 16);

	/*
	 * Pick the final min_free_kbytes based on new_min_free_kbytes
	 * (user_min_free_kbytes is the value the user set through the proc
	 * interface):
	 *  1. If new_min_free_kbytes > user_min_free_kbytes, min_free_kbytes
	 *     takes the value of new_min_free_kbytes, clamped to the range
	 *     [128k, 65536k].
	 *  2. If new_min_free_kbytes <= user_min_free_kbytes, min_free_kbytes
	 *     keeps the user-defined value.
	 */
	if (new_min_free_kbytes > user_min_free_kbytes) {
		min_free_kbytes = new_min_free_kbytes;
		if (min_free_kbytes < 128)
			min_free_kbytes = 128;
		if (min_free_kbytes > 65536)
			min_free_kbytes = 65536;
	} else {
		pr_warn("min_free_kbytes is not updated to %d because user defined value %d is preferred\n",
			new_min_free_kbytes, user_min_free_kbytes);
	}
	//Use the initialized min_free_kbytes to compute each zone's min, low and high values
	setup_per_zone_wmarks();
	refresh_zone_stat_thresholds();
	setup_per_zone_lowmem_reserve();

#ifdef CONFIG_NUMA
	setup_min_unmapped_ratio();
	setup_min_slab_ratio();
#endif

	return 0;
}
core_initcall(init_per_zone_wmark_min)
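The choice between the computed value and the user-defined value in init_per_zone_wmark_min can be isolated into a small pure function. This is a userspace sketch in C; pick_min_free_kbytes() and its parameters are hypothetical names. It assumes that when the user value is preferred, min_free_kbytes simply stays at whatever it currently holds, as the else branch above suggests.

```c
#include <assert.h>

/*
 * Returns the value min_free_kbytes ends up with.
 *   computed: new_min_free_kbytes from the sqrt formula
 *   user:     user_min_free_kbytes (value set through the proc interface)
 *   current:  min_free_kbytes before the call
 */
static int pick_min_free_kbytes(int computed, int user, int current)
{
	assert(computed >= 0);

	if (computed > user) {
		/* The computed value wins, clamped to [128, 65536] KiB. */
		if (computed < 128)
			return 128;
		if (computed > 65536)
			return 65536;
		return computed;
	}
	/* The user-defined value is preferred: leave min_free_kbytes untouched. */
	return current;
}
```

Note that the clamp applies only to the computed value; a user-defined value that is at least as large as the computed one is kept as it is.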
The nr_free_buffer_pages function
//Counts the pages in ZONE_DMA and ZONE_NORMAL above the high watermark; during initialization each zone's high watermark is 0
unsigned long nr_free_buffer_pages(void)
{
	return nr_free_zone_pages(gfp_zone(GFP_USER));
}
EXPORT_SYMBOL_GPL(nr_free_buffer_pages);
/*
 * For each zone, add the number of pages above the high watermark to sum.
 * That number is computed as managed_pages minus watermark[WMARK_HIGH],
 * which yields the total for all zones in the zonelist.
 */
static unsigned long nr_free_zone_pages(int offset)
{
	struct zoneref *z;
	struct zone *zone;
	/* Just pick one node, since fallback list is circular */
	unsigned long sum = 0;
	struct zonelist *zonelist = node_zonelist(numa_node_id(), GFP_KERNEL);

	for_each_zone_zonelist(zone, z, zonelist, offset) {
		unsigned long size = zone->managed_pages;
		unsigned long high = high_wmark_pages(zone);

		if (size > high)
			sum += size - high;
	}

	return sum;
}
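The summation performed by nr_free_zone_pages can be modelled in userspace to see what it returns before and after the watermarks are set. This is a sketch in C under the assumption that only managed_pages and the high watermark matter; struct zone_model and pages_beyond_high() are hypothetical names, and the zonelist walk is replaced by a plain array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for struct zone. */
struct zone_model {
	unsigned long managed_pages;
	unsigned long high_wmark;
};

/* Sum of (managed_pages - high watermark) over all zones where that is positive. */
static unsigned long pages_beyond_high(const struct zone_model *zones, size_t nr_zones)
{
	unsigned long sum = 0;
	size_t i;

	assert(zones != NULL || nr_zones == 0);

	for (i = 0; i < nr_zones; i++) {
		unsigned long size = zones[i].managed_pages;
		unsigned long high = zones[i].high_wmark;

		if (size > high)
			sum += size - high;
	}
	return sum;
}
```

While every high watermark is still 0, the result is simply the total of managed_pages, which matches the remark that the high watermark is 0 during initialization.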
