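libevent Source Code Analysis: evbuffer

1 Introduction

  • evbuffer is the dynamic buffer that libevent provides for managing I/O data streams. It is especially useful in network programming with asynchronous I/O and streamed data: it abstracts the read and write buffers, stores data flexibly, and avoids many unnecessary memory copies.
  • If I/O multiplexing (select / poll / epoll / kqueue / devpoll / IOCP) solves the problem of detecting I/O readiness, then the reactor pattern solves the problem of needing one thread or process per client. With "one loop per thread", a single event loop can serve many concurrent connections.
  • With a reactor-style event-driven framework, programmers can write highly concurrent applications, but something is still missing. Developers cannot focus on business logic alone: they also have to handle network I/O, design their own buffers, and wrap them in an easy-to-use API. In other words, another layer has to be built on top of the reactor before business development can start.
  • This is exactly the problem that evbuffer solves in libevent. With the evbuffer and bufferevent modules, libevent hides the details of network I/O and becomes what we call a network library, so developers can spend more effort on business logic.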

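1.1 What evbuffer provides

  • Dynamic buffer: an evbuffer grows and shrinks automatically as data is added and removed.
  • Efficient I/O: callers do not manage memory directly, and the number of memory copies and system calls is reduced.
  • Segmented data: internally an evbuffer is a linked list, so it can manage non-contiguous memory blocks.
  • Reading and writing: a convenient API reads data from the buffer and writes data into it.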

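1.2 Main APIs

  • evbuffer_new(): creates a new evbuffer.

  • evbuffer_free(): frees the buffer.

  • evbuffer_add(): appends data to the end of the buffer.

  • evbuffer_remove(): copies data out of the front of the buffer and removes it from the buffer.

  • evbuffer_add_buffer(): moves the contents of one evbuffer to the end of another.

  • evbuffer_read(): reads data from a file descriptor or socket into the buffer.

  • evbuffer_write(): writes data from the buffer to a file descriptor or socket.

  • evbuffer_add_vprintf(): appends formatted output built from a va_list, in the style of vprintf() / vsnprintf().

  • evbuffer_add_iovec(): appends several memory blocks, described by an array of evbuffer_iovec structures, in one call. iovec-style structures describe multiple blocks for a single operation; with readv() / writev() they reduce the overhead of many separate read/write system calls, which is most noticeable with many small blocks.

  • evbuffer_add_file(): adds the contents of a file, or a part of it, to the evbuffer. Depending on the platform, the data is delivered through sendfile, mmap, or lseek plus read.

  • evbuffer_add_file_segment(): adds part of a file segment to the evbuffer. evbuffer_add_file() is implemented on top of this function.

    The position of evbuffer within the libevent network library is shown below:

 (Figure: evbuffer layer diagram)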

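2 Principles

2.1 Data storage structure

    An evbuffer organizes its data as a linked list of blocks called evbuffer_chain. Each chain is a separately allocated memory region, so the buffer as a whole does not need to be contiguous, which makes non-contiguous data easy to handle. This list structure lets libevent avoid frequent memory copies and improves efficiency on large data streams.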

struct evbuffer {
	/** The first chain in this buffer's linked list of chains. */
    // First chain of the evbuffer
	struct evbuffer_chain *first;
	/** The last chain in this buffer's linked list of chains. */
    // Last chain of the evbuffer
	struct evbuffer_chain *last;

	/** Pointer to the next pointer pointing at the 'last_with_data' chain.
	 *
	 * To unpack:
	 *
	 * The last_with_data chain is the last chain that has any data in it.
	 * If all chains in the buffer are empty, it is the first chain.
	 * If the buffer has no chains, it is NULL.
	 *
	 * The last_with_datap pointer points at _whatever 'next' pointer_
	 * points at the last_with_datap chain.  If the last_with_data chain
	 * is the first chain, or it is NULL, then the last_with_datap pointer
	 * is &buf->first.
	 */
    // Address of the 'next' pointer (or of 'first') that refers to the last chain holding data
	struct evbuffer_chain **last_with_datap;

	/** Total amount of bytes stored in all chains.*/
    // Size of the data in the evbuffer, in bytes
	size_t total_len;

	/** Number of bytes we have added to the buffer since we last tried to
	 * invoke callbacks. */
    // Bytes added to the evbuffer since callbacks were last invoked
	size_t n_add_for_cb;
	/** Number of bytes we have removed from the buffer since we last
	 * tried to invoke callbacks. */
    // Bytes removed from the evbuffer since callbacks were last invoked
	size_t n_del_for_cb;

#ifndef EVENT__DISABLE_THREAD_SUPPORT
	/** A lock used to mediate access to this buffer. */
	void *lock;
#endif
	/** True iff we should free the lock field when we free this
	 * evbuffer. */
    // 1 if the lock should be freed together with the evbuffer
	unsigned own_lock : 1;
	/** True iff we should not allow changes to the front of the buffer
	 * (drains or prepends). */
    // 1 if the front of the evbuffer must not be modified
	unsigned freeze_start : 1;
	/** True iff we should not allow changes to the end of the buffer
	 * (appends) */
    // 1 if the end of the evbuffer must not be modified
	unsigned freeze_end : 1;
	/** True iff this evbuffer's callbacks are not invoked immediately
	 * upon a change in the buffer, but instead are deferred to be invoked
	 * from the event_base's loop.	Useful for preventing enormous stack
	 * overflows when we have mutually recursive callbacks, and for
	 * serializing callbacks in a single thread. */
    // 1 if callbacks are deferred and run from the event loop
	unsigned deferred_cbs : 1;
#ifdef _WIN32
	/** True iff this buffer is set up for overlapped IO. */
    // Set when the buffer is used with IOCP
	unsigned is_overlapped : 1;
#endif
	/** Zero or more EVBUFFER_FLAG_* bits */
	ev_uint32_t flags;

	/** Used to implement deferred callbacks. */
    // event_base this evbuffer belongs to; used to run deferred callbacks
	struct event_base *cb_queue;

	/** A reference count on this evbuffer.	 When the reference count
	 * reaches 0, the buffer is destroyed.	Manipulated with
	 * evbuffer_incref and evbuffer_decref_and_unlock and
	 * evbuffer_free. */
    // Reference count; the evbuffer can be destroyed when it reaches 0
	int refcnt;

	/** A struct event_callback handle to make all of this buffer's callbacks
	 * invoked from the event loop. */
    // Deferred-callback handle that is scheduled on the event loop thread
	struct event_callback deferred;

	/** A doubly-linked-list of callback functions */
    // Callbacks registered on this evbuffer, kept in a doubly linked list
	LIST_HEAD(evbuffer_cb_queue, evbuffer_cb_entry) callbacks;

	/** The parent bufferevent object this evbuffer belongs to.
	 * NULL if the evbuffer stands alone. */
    // bufferevent that owns this evbuffer, if any
	struct bufferevent *parent;
};

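2.2 Data block management (evbuffer_chain)

  • The evbuffer_chain structure represents one data block of an evbuffer. Each block holds the actual data plus metadata such as its used length and capacity.

  • The core idea of evbuffer_chain is segmented storage: data is split across several blocks, and each block is managed independently. This avoids frequent allocation and release of memory when processing large data streams.

    Definition of struct evbuffer_chain: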

/** A single item in an evbuffer. */
struct evbuffer_chain {
	/** points to next buffer in the chain */
    // An evbuffer is made of evbuffer_chain nodes linked into a list
	struct evbuffer_chain *next;

	/** total allocation available in the buffer field. */
    // Capacity allocated for this evbuffer_chain
	size_t buffer_len;

	/** unused space at the beginning of buffer or an offset into a
	 * file for sendfile buffers. */
    // Number of unused bytes at the front of buffer,
    // or, for file-backed chains, the file offset at which reading starts
	ev_misalign_t misalign;

	/** Offset into buffer + misalign at which to start writing.
	 * In other words, the total number of bytes actually stored
	 * in buffer. */
    // Number of data bytes actually stored in this evbuffer_chain
	size_t off;

	/** Set if special handling is required for this chain */
    // Flags for file-backed chains, memory references, and memory pinning
	unsigned flags;
#define EVBUFFER_FILESEGMENT	0x0001  /**< A chain used for a file segment */
#define EVBUFFER_SENDFILE	0x0002	/**< a chain used with sendfile */
#define EVBUFFER_REFERENCE	0x0004	/**< a chain with a mem reference */
#define EVBUFFER_IMMUTABLE	0x0008	/**< read-only chain */
	/** a chain that mustn't be reallocated or freed, or have its contents
	 * memmoved, until the chain is un-pinned. */
#define EVBUFFER_MEM_PINNED_R	0x0010
#define EVBUFFER_MEM_PINNED_W	0x0020
#define EVBUFFER_MEM_PINNED_ANY (EVBUFFER_MEM_PINNED_R|EVBUFFER_MEM_PINNED_W)
	/** a chain that should be freed, but can't be freed until it is
	 * un-pinned. */
#define EVBUFFER_DANGLING	0x0040
	/** a chain that is a referenced copy of another chain */
#define EVBUFFER_MULTICAST	0x0080

	/** number of references to this chain */
    // Reference count of this evbuffer_chain; it is freed at 0
	int refcnt;

	/** Usually points to the read-write memory belonging to this
	 * buffer allocated as part of the evbuffer_chain allocation.
	 * For mmap, this can be a read-only buffer and
	 * EVBUFFER_IMMUTABLE will be set in flags.  For sendfile, it
	 * may point to NULL.
	 */
    // Memory that holds the actual data of this evbuffer_chain
	unsigned char *buffer;
};
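
The relationship between buffer_len, misalign and off can be made concrete with a small stand-alone model. The sketch below is not libevent code: struct mini_chain and its helper functions are hypothetical names that only mirror the three bookkeeping fields described above, with a fixed 16-byte buffer assumed for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for evbuffer_chain: only the three
 * bookkeeping fields discussed above plus a small inline buffer. */
struct mini_chain {
	size_t buffer_len;          /* total capacity of buffer */
	size_t misalign;            /* unused bytes at the front */
	size_t off;                 /* bytes of payload stored */
	unsigned char buffer[16];
};

/* Writable space at the tail: capacity minus front gap minus payload. */
static size_t mini_chain_space(const struct mini_chain *c)
{
	return c->buffer_len - c->misalign - c->off;
}

/* Append n bytes if they fit at the tail; 0 on success, -1 otherwise. */
static int mini_chain_append(struct mini_chain *c, const void *data, size_t n)
{
	if (n > mini_chain_space(c))
		return -1;
	memcpy(c->buffer + c->misalign + c->off, data, n);
	c->off += n;
	return 0;
}

/* Discard n bytes from the front without moving the rest:
 * only the two offsets change. */
static void mini_chain_drain(struct mini_chain *c, size_t n)
{
	assert(n <= c->off);
	c->misalign += n;
	c->off -= n;
}

/* Slide the payload back to the start of buffer so that the front gap
 * becomes usable tail space again. */
static void mini_chain_align(struct mini_chain *c)
{
	memmove(c->buffer, c->buffer + c->misalign, c->off);
	c->misalign = 0;
}
```

In this model, draining only grows the front gap, so the tail space stays the same until the payload is realigned. That is the situation in which the evbuffer_add() listing in section 3.2 calls evbuffer_chain_align().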

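2.3 Memory management and growth

  • Dynamic growth: when data is added and the current block cannot hold it, libevent allocates a new block and appends it to the list.

  • Data concatenation: evbuffer can join different blocks together and can append data from several sources, such as other evbuffers.

2.4 Efficient data movement

  • Block-wise reads and writes: libevent avoids shifting data around on every read or write and manages it through pointers and offsets where it can. For example, evbuffer_drain() discards data from the front of the buffer by adjusting the offsets of the first chain instead of moving the remaining bytes. evbuffer_add() and evbuffer_remove() still copy data between the caller's memory and the chains.

  • Avoiding copies: functions such as evbuffer_add_buffer() move the contents of one buffer to another by relinking chains, without copying the payload.

2.5 Callback mechanism

An evbuffer can invoke callback functions whenever its contents change. A callback registered with evbuffer_add_cb() is told how many bytes were added and removed, so user code can apply its own conditions, for example reacting only once the buffered length reaches some threshold.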
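
The accounting behind such callbacks can be sketched with a stand-alone model. Everything below is hypothetical (struct sketch_buf, struct sketch_cb_info and the sketch_* functions are illustration names, not libevent API); only the field names total_len, n_add_for_cb and n_del_for_cb are taken from struct evbuffer above, and the 8-byte threshold is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the information a change callback receives. */
struct sketch_cb_info {
	size_t orig_size;   /* size before the reported changes */
	size_t n_added;     /* bytes added since the last callback run */
	size_t n_deleted;   /* bytes removed since the last callback run */
};

typedef void (*sketch_cb_fn)(const struct sketch_cb_info *info, void *arg);

struct sketch_buf {
	size_t total_len, n_add_for_cb, n_del_for_cb;
	sketch_cb_fn cb;
	void *cb_arg;
};

static void sketch_note_added(struct sketch_buf *b, size_t n)
{
	b->total_len += n;
	b->n_add_for_cb += n;
}

static void sketch_note_removed(struct sketch_buf *b, size_t n)
{
	assert(n <= b->total_len);
	b->total_len -= n;
	b->n_del_for_cb += n;
}

/* Report the accumulated changes once, then reset the counters. */
static void sketch_run_callback(struct sketch_buf *b)
{
	struct sketch_cb_info info;

	if (b->cb == NULL || (b->n_add_for_cb == 0 && b->n_del_for_cb == 0))
		return;
	info.n_added = b->n_add_for_cb;
	info.n_deleted = b->n_del_for_cb;
	info.orig_size = b->total_len + info.n_deleted - info.n_added;
	b->n_add_for_cb = b->n_del_for_cb = 0;
	b->cb(&info, b->cb_arg);
}

/* Example user condition: count upward crossings of an 8-byte threshold. */
static void sketch_threshold_cb(const struct sketch_cb_info *info, void *arg)
{
	size_t new_size = info->orig_size + info->n_added - info->n_deleted;

	if (info->orig_size < 8 && new_size >= 8)
		++*(int *)arg;
}
```

The threshold logic lives entirely in the user callback. The buffer itself only reports how much was added and removed since the previous report.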

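2.6 Characteristics of evbuffer

  1. Segmented storage, fewer copies: because evbuffer keeps its data in a singly linked list, it avoids unnecessary copying when data is joined or split. This matters most for large data streams such as file transfers and video streams.

  2. Automatic memory management: the buffer grows and shrinks as needed. Users do not allocate or free the memory themselves, which reduces complexity.

  3. Zero copy: through the list structure and the related APIs, some operations need no payload copy at all. For example, data can be passed between evbuffers by pointer manipulation alone.

  4. Asynchronous I/O support: evbuffer is a core component of libevent and is designed for asynchronous I/O. Combined with bufferevent, it handles non-blocking reads and writes on network connections.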

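3 Source code analysis

3.1 evbuffer structure

    The structure of an evbuffer is shown in the figure below:

(Figure: evbuffer structure diagram)

Explanation:

  • An evbuffer does not store the payload itself; it manages the singly linked list of evbuffer_chain nodes that do.
  • An evbuffer holds the head and tail pointers first / last of the chain list, plus last_with_datap, which leads to the last evbuffer_chain node that contains data.
  • An evbuffer_chain is one node of the evbuffer and stores the actual data. Its unsigned char *buffer points to the data, buffer_len is the capacity, off is the number of bytes stored, misalign is the offset at which the data starts, and evbuffer_chain *next points to the next node of the list.

    Because evbuffer has many API functions, this article only covers the implementation of a few important and commonly used ones:

3.2 evbuffer_add

  • Appends data to an evbuffer. It only takes a const void *data_in, so the source can be any memory, on the heap or on the stack.
  • It looks for room at the tail of the evbuffer. If there is enough space, the data is copied there; otherwise a new evbuffer_chain node is allocated for the data and appended to the tail.
  • evbuffer_add therefore always copies the data.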
/* Adds data to an event buffer */

int
evbuffer_add(struct evbuffer *buf, const void *data_in, size_t datlen)
{
	struct evbuffer_chain *chain, *tmp;
	const unsigned char *data = data_in;
	size_t remain, to_alloc;
	int result = -1;

	EVBUFFER_LOCK(buf);

	if (buf->freeze_end) {
		goto done;
	}
	/* Prevent buf->total_len overflow */
	if (datlen > EV_SIZE_MAX - buf->total_len) {
		goto done;
	}

	if (*buf->last_with_datap == NULL) {
		chain = buf->last;
	} else {
		chain = *buf->last_with_datap;
	}

	/* If there are no chains allocated for this buffer, allocate one
	 * big enough to hold all the data. */
    // The evbuffer has no chain yet: allocate an evbuffer_chain large enough
    // for the data and insert it at the tail of the evbuffer
	if (chain == NULL) {
		chain = evbuffer_chain_new(datlen);
		if (!chain)
			goto done;
		evbuffer_chain_insert(buf, chain);
	}

	if ((chain->flags & EVBUFFER_IMMUTABLE) == 0) {
		/* Always true for mutable buffers */
		EVUTIL_ASSERT(chain->misalign >= 0 &&
		    (ev_uint64_t)chain->misalign <= EVBUFFER_CHAIN_MAX);
		remain = chain->buffer_len - (size_t)chain->misalign - chain->off;
		if (remain >= datlen) {
			/* there's enough space to hold all the data in the
			 * current last chain */
            // The current evbuffer_chain has enough room: copy in place
			memcpy(chain->buffer + chain->misalign + chain->off,
			    data, datlen);
			chain->off += datlen;
			buf->total_len += datlen;
			buf->n_add_for_cb += datlen;
			goto out;
		} else if (!CHAIN_PINNED(chain) &&
		    evbuffer_chain_should_realign(chain, datlen)) {
			/* we can fit the data into the misalignment */
			evbuffer_chain_align(chain);

			memcpy(chain->buffer + chain->off, data, datlen);
			chain->off += datlen;
			buf->total_len += datlen;
			buf->n_add_for_cb += datlen;
			goto out;
		}
	} else {
		/* we cannot write any data to the last chain */
        // This evbuffer_chain is immutable, so a new evbuffer_chain must be
        // allocated and appended to the tail of the evbuffer
		remain = 0;
	}

	/* we need to add another chain */
	to_alloc = chain->buffer_len;
    // Double the capacity of the last chain, unless it already exceeds
    // EVBUFFER_CHAIN_MAX_AUTO_SIZE/2
	if (to_alloc <= EVBUFFER_CHAIN_MAX_AUTO_SIZE/2)
		to_alloc <<= 1;
    // If datlen exceeds to_alloc, size the new evbuffer_chain by datlen
	if (datlen > to_alloc)
		to_alloc = datlen;
	tmp = evbuffer_chain_new(to_alloc);
	if (tmp == NULL)
		goto done;

	if (remain) {
        // The current evbuffer_chain cannot hold all of the data: fill its
        // remaining space first; the rest goes into the new evbuffer_chain
		memcpy(chain->buffer + chain->misalign + chain->off,
		    data, remain);
		chain->off += remain;
		buf->total_len += remain;
		buf->n_add_for_cb += remain;
	}

	data += remain;
	datlen -= remain;

	memcpy(tmp->buffer, data, datlen);
	tmp->off = datlen;
    // Finally append the new evbuffer_chain to the tail of the evbuffer
	evbuffer_chain_insert(buf, tmp);
	buf->n_add_for_cb += datlen;

out:
	evbuffer_invoke_callbacks_(buf);
	result = 0;
done:
	EVBUFFER_UNLOCK(buf);
	return result;
}
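
The sizing rule for the new chain (the to_alloc computation in the listing above) can be isolated in a few lines. This is a sketch under assumptions: sketch_next_chain_size is a hypothetical helper, and SKETCH_CHAIN_MAX_AUTO_SIZE stands in for EVBUFFER_CHAIN_MAX_AUTO_SIZE with an assumed value of 4096, which may differ from the constant in a given libevent version.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cap for automatic doubling; stands in for
 * EVBUFFER_CHAIN_MAX_AUTO_SIZE. The value is for illustration only. */
#define SKETCH_CHAIN_MAX_AUTO_SIZE 4096

/* Hypothetical helper mirroring the to_alloc computation in
 * evbuffer_add(): last_len is the capacity of the current last chain,
 * datlen is the number of bytes passed to the call. */
static size_t sketch_next_chain_size(size_t last_len, size_t datlen)
{
	size_t to_alloc = last_len;

	assert(last_len > 0);
	/* Double the previous capacity while it is still small. */
	if (to_alloc <= SKETCH_CHAIN_MAX_AUTO_SIZE / 2)
		to_alloc <<= 1;
	/* Never allocate less than the data that must fit. */
	if (datlen > to_alloc)
		to_alloc = datlen;
	return to_alloc;
}
```

With these assumptions, a 512-byte last chain leads to a 1024-byte successor for a small write, a 4096-byte chain is not doubled any further, and a write larger than the doubled size determines the allocation on its own.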

3.3 evbuffer_add_buffer

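  • Appends one evbuffer to the tail of another, joining the two into one.
  • evbuffer_add_buffer() moves all chains (evbuffer_chain) of the source evbuffer (inbuf) into the destination evbuffer (outbuf), leaving the source empty. It relinks list pointers instead of copying payload wherever possible, which avoids deep copies and improves performance.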

int
evbuffer_add_buffer(struct evbuffer *outbuf, struct evbuffer *inbuf)
{
	struct evbuffer_chain *pinned, *last;
	size_t in_total_len, out_total_len;
	int result = 0;

	EVBUFFER_LOCK2(inbuf, outbuf);
	in_total_len = inbuf->total_len;
	out_total_len = outbuf->total_len;

	if (in_total_len == 0 || outbuf == inbuf)
		goto done;

	if (outbuf->freeze_end || inbuf->freeze_start) {
		result = -1;
		goto done;
	}

	if (PRESERVE_PINNED(inbuf, &pinned, &last) < 0) {
		result = -1;
		goto done;
	}

	if (out_total_len == 0) {
		/* There might be an empty chain at the start of outbuf; free
		 * it. */
        // The destination evbuffer is empty: take over the source's chains.
        // Only list metadata is copied, so this is O(1)
		evbuffer_free_all_chains(outbuf->first);
		COPY_CHAIN(outbuf, inbuf);
	} else {
        // Append the chains of inbuf to the end of outbuf.
        // Linking two singly linked lists is O(1)
		APPEND_CHAIN(outbuf, inbuf);
	}

	RESTORE_PINNED(inbuf, pinned, last);

	inbuf->n_del_for_cb += in_total_len;
	outbuf->n_add_for_cb += in_total_len;

    // Invoke callbacks now that both evbuffers have changed
	evbuffer_invoke_callbacks_(inbuf);
	evbuffer_invoke_callbacks_(outbuf);

done:
	EVBUFFER_UNLOCK2(inbuf, outbuf);
	return result;
}
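
Why this costs O(1) regardless of the amount of data can be seen in a minimal list model. The types struct node and struct list and the function list_splice below are hypothetical and much simpler than the COPY_CHAIN / APPEND_CHAIN macros used above; pinned chains, locking and callbacks are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal chain node: only the link and a payload length. */
struct node {
	struct node *next;
	size_t len;
};

/* Hypothetical minimal buffer header with head, tail and total size. */
struct list {
	struct node *first, *last;
	size_t total_len;
};

/* Move every node of src to the tail of dst by relinking pointers.
 * No payload byte is touched, so the cost does not depend on the
 * amount of data held by src. */
static void list_splice(struct list *dst, struct list *src)
{
	assert(dst != NULL && src != NULL);
	if (src->first == NULL || dst == src)
		return;
	if (dst->first == NULL)
		dst->first = src->first;
	else
		dst->last->next = src->first;
	dst->last = src->last;
	dst->total_len += src->total_len;

	/* The source ends up empty, as with evbuffer_add_buffer(). */
	src->first = src->last = NULL;
	src->total_len = 0;
}
```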

3.4 evbuffer_add_vprintf

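evbuffer_add_vprintf() appends a formatted string to an evbuffer, taking a va_list in the style of vprintf. It relies on vsnprintf (through evutil_vsnprintf) and formats directly into the free space at the end of the evbuffer.

  • First attempt: evbuffer_add_vprintf() makes sure that at least 64 bytes are free in a single chain and calls vsnprintf() on that space. vsnprintf() writes as much as fits and returns the length that the complete result requires.

  • Growing the buffer: if the returned length does not fit in the available space, the function expands the evbuffer so that one chain has at least that length plus one byte free.

  • Formatting again: once the space is ready, vsnprintf() is called again and writes the formatted data directly into the evbuffer_chain. No intermediate buffer and no evbuffer_add() call is involved.

  • Return value: evbuffer_add_vprintf() returns the number of bytes added to the evbuffer, that is, the length of the formatted string, or -1 on failure.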

int
evbuffer_add_vprintf(struct evbuffer *buf, const char *fmt, va_list ap)
{
	char *buffer;
	size_t space;
	int sz, result = -1;
	va_list aq;
	struct evbuffer_chain *chain;


	EVBUFFER_LOCK(buf);

	if (buf->freeze_end) {
		goto done;
	}

	/* make sure that at least some space is available */
    // The length of the formatted output is unknown in advance, so probe:
    // reserve 64 bytes first and expand by vsnprintf's return value if needed
	if ((chain = evbuffer_expand_singlechain(buf, 64)) == NULL)
		goto done;

	for (;;) {
#if 0
		size_t used = chain->misalign + chain->off;
		buffer = (char *)chain->buffer + chain->misalign + chain->off;
		EVUTIL_ASSERT(chain->buffer_len >= used);
		space = chain->buffer_len - used;
#endif
		buffer = (char*) CHAIN_SPACE_PTR(chain);
		space = (size_t) CHAIN_SPACE_LEN(chain);

#ifndef va_copy
#define	va_copy(dst, src)	memcpy(&(dst), &(src), sizeof(va_list))
#endif
		va_copy(aq, ap);

		sz = evutil_vsnprintf(buffer, space, fmt, aq);

		va_end(aq);

		if (sz < 0)
			goto done;
		if (INT_MAX >= EVBUFFER_CHAIN_MAX &&
		    (size_t)sz >= EVBUFFER_CHAIN_MAX)
			goto done;
		if ((size_t)sz < space) {
            // The output fit: update the metadata and invoke the evbuffer callbacks
			chain->off += sz;
			buf->total_len += sz;
			buf->n_add_for_cb += sz;

			advance_last_with_data(buf);
			evbuffer_invoke_callbacks_(buf);
			result = sz;
			goto done;
		}
        // Expand to sz + 1 bytes based on vsnprintf's return value
		if ((chain = evbuffer_expand_singlechain(buf, sz + 1)) == NULL)
			goto done;
	}
	/* NOTREACHED */

done:
	EVBUFFER_UNLOCK(buf);
	return result;
}
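
The same "format, check the return value, grow, format again" loop works with the standard library alone. The sketch below uses a hypothetical struct strbuf in place of an evbuffer and realloc() in place of evbuffer_expand_singlechain(); it only illustrates the retry pattern and is not libevent code.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string used only for this sketch. */
struct strbuf {
	char *data;
	size_t len, cap;
};

/* Make sure at least `need` free bytes are available after len. */
static int strbuf_reserve(struct strbuf *sb, size_t need)
{
	char *p;

	if (sb->cap - sb->len >= need)
		return 0;
	p = realloc(sb->data, sb->len + need);
	if (p == NULL)
		return -1;
	sb->data = p;
	sb->cap = sb->len + need;
	return 0;
}

/* Append formatted output; returns the number of bytes added or -1. */
static int strbuf_vprintf(struct strbuf *sb, const char *fmt, va_list ap)
{
	assert(sb != NULL && fmt != NULL);
	/* Start with a small reservation, as the listing above does. */
	if (strbuf_reserve(sb, 64) < 0)
		return -1;
	for (;;) {
		size_t space = sb->cap - sb->len;
		va_list aq;
		int sz;

		/* A va_list can only be consumed once, so work on a copy. */
		va_copy(aq, ap);
		sz = vsnprintf(sb->data + sb->len, space, fmt, aq);
		va_end(aq);

		if (sz < 0)
			return -1;
		if ((size_t)sz < space) {
			/* Everything fit, including the terminating NUL. */
			sb->len += (size_t)sz;
			return sz;
		}
		/* Too small: the return value says how much is needed. */
		if (strbuf_reserve(sb, (size_t)sz + 1) < 0)
			return -1;
	}
}

static int strbuf_printf(struct strbuf *sb, const char *fmt, ...)
{
	va_list ap;
	int r;

	va_start(ap, fmt);
	r = strbuf_vprintf(sb, fmt, ap);
	va_end(ap);
	return r;
}
```

A short string is handled in a single pass. A string longer than the initial 64-byte reservation takes exactly one retry, because the first vsnprintf() call already reports the required length.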

3.5 evbuffer_add_iovec

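    evbuffer_add_iovec() appends several scattered memory blocks to an evbuffer in one call, so the caller does not have to merge them into one contiguous region first. It takes an array of struct evbuffer_iovec, which has the same iov_base / iov_len members as the POSIX struct iovec and describes multiple non-contiguous blocks at once.

  • Summing the iovec array: each element holds a pointer and the size of its block. evbuffer_add_iovec() first walks the array to compute the total number of bytes.

  • Reserving space: before copying, it calls evbuffer_expand_fast_() with that total so that the evbuffer has enough room, adding chains if necessary.

  • Appending block by block: it then walks the array again and passes each block to evbuffer_add(), which copies into the current evbuffer_chain and allocates a new one when the space runs out.

  • Copy behaviour: each block is copied into the evbuffer; the iov_base pointers are not stored by reference. The gain is that the caller needs no intermediate merge buffer and that space is reserved once up front.

  • Metadata and return value: every evbuffer_add() call updates total_len and the chain offsets. The function returns the number of bytes added, which is smaller than the total if one of the calls fails.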

size_t
evbuffer_add_iovec(struct evbuffer * buf, struct evbuffer_iovec * vec, int n_vec) {
	int n;
	size_t res;
	size_t to_alloc;

	EVBUFFER_LOCK(buf);

	res = to_alloc = 0;

	for (n = 0; n < n_vec; n++) {
		to_alloc += vec[n].iov_len;
	}

	if (evbuffer_expand_fast_(buf, to_alloc, 2) < 0) {
		goto done;
	}

	for (n = 0; n < n_vec; n++) {
		/* XXX each 'add' call here does a bunch of setup that's
		 * obviated by evbuffer_expand_fast_, and some cleanup that we
		 * would like to do only once.  Instead we should just extract
		 * the part of the code that's needed. */

		if (evbuffer_add(buf, vec[n].iov_base, vec[n].iov_len) < 0) {
			goto done;
		}

		res += vec[n].iov_len;
	}

done:
    EVBUFFER_UNLOCK(buf);
    return res;
}
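
The two-pass shape of this function (sum first, then copy block by block) can be shown with the POSIX struct iovec and a plain destination array. gather_iovec is a hypothetical helper written for this sketch; it assumes a POSIX system that provides <sys/uio.h> and, unlike the evbuffer version, it refuses the whole request instead of growing when the blocks do not fit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical helper: copy n_vec scattered blocks into dst, which has
 * room for cap bytes. Returns the number of bytes copied, or 0 if the
 * blocks do not all fit. */
static size_t gather_iovec(unsigned char *dst, size_t cap,
    const struct iovec *vec, int n_vec)
{
	size_t total = 0, pos = 0;
	int n;

	assert(dst != NULL && vec != NULL);

	/* First pass: total size, so that space is checked once up front. */
	for (n = 0; n < n_vec; n++)
		total += vec[n].iov_len;
	if (total > cap)
		return 0;

	/* Second pass: copy one block after the other. */
	for (n = 0; n < n_vec; n++) {
		memcpy(dst + pos, vec[n].iov_base, vec[n].iov_len);
		pos += vec[n].iov_len;
	}
	return pos;
}
```

A typical caller holds a protocol header and a body in separate buffers and hands both over in one call, without building a merged copy first.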

3.6 evbuffer_add_file

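     evbuffer_add_file() adds file data to an evbuffer. Where the platform supports it (for example sendfile() on Linux), the data can later be sent with zero-copy techniques: instead of loading the file into user space and copying it into the evbuffer, the kernel transfers it directly. This is a clear performance advantage for large files.

    The implementation builds on platform-specific optimizations such as sendfile() and mmap(). The core idea is to describe the file range in the evbuffer and move the data to the network socket without an extra user-space copy wherever possible.

  • Describing the file range: the function takes an already open file descriptor fd together with offset and length, and wraps them in an evbuffer_file_segment created with evbuffer_file_segment_new(). The EVBUF_FS_CLOSE_ON_FREE flag means the descriptor is closed when the segment is released.

  • Choosing a strategy: which mechanism is used depends on the platform and on the evbuffer. With sendfile() support the data can go straight from the file to a socket; otherwise the segment is mapped with mmap or read into memory.

  • Zero-copy path (sendfile()): when the evbuffer drains to a file descriptor and the segment supports it, the chain is only marked with EVBUFFER_SENDFILE and records the file offset and length. The transfer happens later, when the evbuffer is written out, without passing through user space.

  • Fallback path: if sendfile() cannot be used, the file contents are made available in memory (by mapping or by reading), and the chain points at that memory.

  • Error handling: if creating the segment or adding it fails, evbuffer_add_file() returns -1. On success it drops its own reference to the segment, and the chain added to the evbuffer keeps the segment alive.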

int
evbuffer_add_file(struct evbuffer *buf, int fd, ev_off_t offset, ev_off_t length)
{
	struct evbuffer_file_segment *seg;
	unsigned flags = EVBUF_FS_CLOSE_ON_FREE;
	int r;

	seg = evbuffer_file_segment_new(fd, offset, length, flags);
	if (!seg)
		return -1;
	r = evbuffer_add_file_segment(buf, seg, 0, length);
	if (r == 0)
		evbuffer_file_segment_free(seg);
	return r;
}

3.7 evbuffer_add_file_segment 

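  • evbuffer_add_file_segment() adds a file segment, or part of one, to an evbuffer. Unlike evbuffer_add_file(), it works on an existing evbuffer_file_segment object and takes an offset and length relative to that segment, so the same segment can be added several times and to several evbuffers.

  • Its design is likewise built around efficient transfer, in particular on systems with zero-copy support such as sendfile() on Linux. A part of the file, starting at a given offset and covering a given number of bytes, is added to the evbuffer and can be transmitted efficiently.

    Basic steps:

  • Defining the segment: a file segment is described by struct evbuffer_file_segment, which holds the file descriptor, offset, length, and a reference count. It represents a logical part of a file and can be referenced by several evbuffer instances, which avoids loading the file more than once.

  • Reference counting: every chain that uses the segment increments seg->refcnt, so one segment can be used many times. The file is closed and the resources are released only after all references have been dropped.

  • Adding to the evbuffer: if the evbuffer is flagged with EVBUFFER_FLAG_DRAINS_TO_FD and the segment can use sendfile, the file contents are not loaded at all; the chain only records offset and length, and the kernel sends the data to the socket later. Otherwise the segment is materialized first (evbuffer_file_segment_materialize()) and the chain points into the mapped or loaded contents.

  • Platform differences: like evbuffer_add_file(), the function adapts to the platform. The sendfile path is preferred when available; on Windows a view of the file mapping is created with MapViewOfFile(); in the remaining cases the chain references seg->contents.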

int
evbuffer_add_file_segment(struct evbuffer *buf,
    struct evbuffer_file_segment *seg, ev_off_t offset, ev_off_t length)
{
	struct evbuffer_chain *chain;
	struct evbuffer_chain_file_segment *extra;
	int can_use_sendfile = 0;

	EVBUFFER_LOCK(buf);
	EVLOCK_LOCK(seg->lock, 0);
	if (buf->flags & EVBUFFER_FLAG_DRAINS_TO_FD) {
		can_use_sendfile = 1;
	} else {
		if (!seg->contents) {
			if (evbuffer_file_segment_materialize(seg)<0) {
				EVLOCK_UNLOCK(seg->lock, 0);
				EVBUFFER_UNLOCK(buf);
				return -1;
			}
		}
	}
	++seg->refcnt;
	EVLOCK_UNLOCK(seg->lock, 0);

	if (buf->freeze_end)
		goto err;

	if (length < 0) {
		if (offset > seg->length)
			goto err;
		length = seg->length - offset;
	}

	/* Can we actually add this? */
	if (offset+length > seg->length)
		goto err;

	chain = evbuffer_chain_new(sizeof(struct evbuffer_chain_file_segment));
	if (!chain)
		goto err;
	extra = EVBUFFER_CHAIN_EXTRA(struct evbuffer_chain_file_segment, chain);

	chain->flags |= EVBUFFER_IMMUTABLE|EVBUFFER_FILESEGMENT;
	if (can_use_sendfile && seg->can_sendfile) {
		chain->flags |= EVBUFFER_SENDFILE;
		chain->misalign = seg->file_offset + offset;
		chain->off = length;
		chain->buffer_len = chain->misalign + length;
	} else if (seg->is_mapping) {
#ifdef _WIN32
		ev_uint64_t total_offset = seg->mmap_offset+offset;
		ev_uint64_t offset_rounded=0, offset_remaining=0;
		LPVOID data;
		if (total_offset) {
			SYSTEM_INFO si;
			memset(&si, 0, sizeof(si)); /* cargo cult */
			GetSystemInfo(&si);
			offset_remaining = total_offset % si.dwAllocationGranularity;
			offset_rounded = total_offset - offset_remaining;
		}
		data = MapViewOfFile(
			seg->mapping_handle,
			FILE_MAP_READ,
			offset_rounded >> 32,
			offset_rounded & 0xfffffffful,
			length + offset_remaining);
		if (data == NULL) {
			mm_free(chain);
			goto err;
		}
		chain->buffer = (unsigned char*) data;
		chain->buffer_len = length+offset_remaining;
		chain->misalign = offset_remaining;
		chain->off = length;
#else
		chain->buffer = (unsigned char*)(seg->contents + offset);
		chain->buffer_len = length;
		chain->off = length;
#endif
	} else {
		chain->buffer = (unsigned char*)(seg->contents + offset);
		chain->buffer_len = length;
		chain->off = length;
	}

	extra->segment = seg;
	buf->n_add_for_cb += length;
	evbuffer_chain_insert(buf, chain);

	evbuffer_invoke_callbacks_(buf);

	EVBUFFER_UNLOCK(buf);

	return 0;
err:
	EVBUFFER_UNLOCK(buf);
	evbuffer_file_segment_free(seg); /* Lowers the refcount */
	return -1;
}
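
The interplay between ++seg->refcnt above and the evbuffer_file_segment_free() call in evbuffer_add_file() can be followed with a small counter model. struct sketch_segment and its functions are hypothetical; in particular, the assumption that a newly created segment starts with one reference held by its creator is made for this sketch and is not shown in the listings above.

```c
#include <assert.h>

/* Hypothetical stand-in for a shared file segment. */
struct sketch_segment {
	int refcnt;   /* number of holders */
	int closed;   /* set to 1 once the underlying resource is released */
};

static void sketch_segment_init(struct sketch_segment *s)
{
	s->refcnt = 1;   /* assumption: the creator holds the first reference */
	s->closed = 0;
}

/* Taken by every chain that refers to the segment. */
static void sketch_segment_ref(struct sketch_segment *s)
{
	assert(s->refcnt > 0);
	++s->refcnt;
}

/* Dropped by the creator and by every chain when it is freed; the
 * resource is released only when the last reference goes away. */
static void sketch_segment_unref(struct sketch_segment *s)
{
	assert(s->refcnt > 0);
	if (--s->refcnt > 0)
		return;
	s->closed = 1;
}
```

Read against evbuffer_add_file(): the segment is created, the chain takes a reference, and the creator then drops its own. The segment survives until the chain is freed, which is when a segment created with EVBUF_FS_CLOSE_ON_FREE would close the descriptor.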

3.8 defer callback

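    Next, let us look at the deferred callbacks of evbuffer. A deferred callback is one whose execution is postponed and run by the event loop thread, instead of immediately at the point of change.

    The interface for deferred callbacks is declared as follows: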

/*
 * Copyright (c) 2009-2012 Niels Provos and Nick Mathewson
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. The name of the author may not be used to endorse or promote products
 *    derived from this software without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
 * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
 * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
 * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
 * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
 * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
 * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
 * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */
#ifndef DEFER_INTERNAL_H_INCLUDED_
#define DEFER_INTERNAL_H_INCLUDED_

#ifdef __cplusplus
extern "C" {
#endif

#include "event2/event-config.h"
#include "evconfig-private.h"

#include <sys/queue.h>

struct event_callback;
typedef void (*deferred_cb_fn)(struct event_callback *, void *);

/**
   Initialize an empty, non-pending event_callback.

   @param deferred The struct event_callback structure to initialize.
   @param priority The priority that the callback should run at.
   @param cb The function to run when the struct event_callback executes.
   @param arg The function's second argument.
 */
void event_deferred_cb_init_(struct event_callback *, ev_uint8_t, deferred_cb_fn, void *);
/**
   Change the priority of a non-pending event_callback.
 */
void event_deferred_cb_set_priority_(struct event_callback *, ev_uint8_t);
/**
   Cancel a struct event_callback if it is currently scheduled in an event_base.
 */
void event_deferred_cb_cancel_(struct event_base *, struct event_callback *);
/**
   Activate a struct event_callback if it is not currently scheduled in an event_base.

   Return true if it was not previously scheduled.
 */
int event_deferred_cb_schedule_(struct event_base *, struct event_callback *);

#ifdef __cplusplus
}
#endif

#endif /* EVENT_INTERNAL_H_INCLUDED_ */

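    First, how deferred callbacks work:

  • event_deferred_cb_init_ registers a deferred callback and its priority for the evbuffer.
  • When the contents of the evbuffer change, evbuffer_invoke_callbacks_ is called. It calls event_deferred_cb_schedule_, which puts the deferred callback on the struct evcallback_list activequeues of the event_base, where it waits for the event loop thread to run it.
  • Deferred callbacks run in the event loop thread. If evbuffer_invoke_callbacks_ is called from a different thread, the event loop thread is woken up so that it can run them.
  • If more than 32 (MAX_DEFERREDS_QUEUED) deferred callbacks are already queued, the callback goes to struct evcallback_list active_later_queue instead of activequeues. Entries of the former run later than those of the latter, on a subsequent pass of the loop.

    The public function that enables deferred callbacks on an evbuffer: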

int
evbuffer_defer_callbacks(struct evbuffer *buffer, struct event_base *base)
{
	EVBUFFER_LOCK(buffer);
	buffer->cb_queue = base;
	buffer->deferred_cbs = 1;
    // Initialize the evbuffer's deferred callback and set its priority
	event_deferred_cb_init_(&buffer->deferred,
	    event_base_get_npriorities(base) / 2,
	    evbuffer_deferred_callback, buffer);
	EVBUFFER_UNLOCK(buffer);
	return 0;
}

     When the evbuffer changes, evbuffer_invoke_callbacks_ is called to schedule the deferred callback:

void
evbuffer_invoke_callbacks_(struct evbuffer *buffer)
{
	if (LIST_EMPTY(&buffer->callbacks)) {
		buffer->n_add_for_cb = buffer->n_del_for_cb = 0;
		return;
	}

	if (buffer->deferred_cbs) {
		if (event_deferred_cb_schedule_(buffer->cb_queue, &buffer->deferred)) {
			evbuffer_incref_and_lock_(buffer);
			if (buffer->parent)
				bufferevent_incref_(buffer->parent);
		}
		EVBUFFER_UNLOCK(buffer);
	}

	evbuffer_run_callbacks(buffer, 0);
}

    Now expand event_deferred_cb_schedule_ to see how it works:

#define MAX_DEFERREDS_QUEUED 32
// Schedule a deferred callback
int
event_deferred_cb_schedule_(struct event_base *base, struct event_callback *cb)
{
	int r = 1;
	if (!base)
		base = current_base;
	EVBASE_ACQUIRE_LOCK(base, th_base_lock);
    // Once more than MAX_DEFERREDS_QUEUED deferred callbacks are queued,
    // further ones are postponed to the active_later queue
	if (base->n_deferreds_queued > MAX_DEFERREDS_QUEUED) {
		r = event_callback_activate_later_nolock_(base, cb);
	} else {
		r = event_callback_activate_nolock_(base, cb);
		if (r) {
			++base->n_deferreds_queued;
		}
	}
	EVBASE_RELEASE_LOCK(base, th_base_lock);
	return r;
}

// Put the deferred callback on the struct evcallback_list active_later_queue
int
event_callback_activate_later_nolock_(struct event_base *base,
    struct event_callback *evcb)
{
	if (evcb->evcb_flags & (EVLIST_ACTIVE|EVLIST_ACTIVE_LATER))
		return 0;

	event_queue_insert_active_later(base, evcb);
	if (EVBASE_NEED_NOTIFY(base))
		evthread_notify_base(base);
	return 1;
}

// Put the deferred callback on the struct evcallback_list activequeues
int
event_callback_activate_nolock_(struct event_base *base,
    struct event_callback *evcb)
{
	int r = 1;

	if (evcb->evcb_flags & EVLIST_FINALIZING)
		return 0;

	switch (evcb->evcb_flags & (EVLIST_ACTIVE|EVLIST_ACTIVE_LATER)) {
	default:
		EVUTIL_ASSERT(0);
	case EVLIST_ACTIVE_LATER:
		event_queue_remove_active_later(base, evcb);
		r = 0;
		break;
	case EVLIST_ACTIVE:
		return 0;
	case 0:
		break;
	}

	event_queue_insert_active(base, evcb);

    // Wake up the event loop thread
	if (EVBASE_NEED_NOTIFY(base))
		evthread_notify_base(base);

	return r;
}

    The evbuffer deferred callbacks above have to be enabled by code outside libevent. Inside libevent, bufferevent already provides an example of how deferred callbacks are used:

int
bufferevent_init_common_(struct bufferevent_private *bufev_private,
    struct event_base *base,
    const struct bufferevent_ops *ops,
    enum bufferevent_options options)
{
	struct bufferevent *bufev = &bufev_private->bev;

	if (!bufev->input) {
		if ((bufev->input = evbuffer_new()) == NULL)
			return -1;
	}

	if (!bufev->output) {
		if ((bufev->output = evbuffer_new()) == NULL) {
			evbuffer_free(bufev->input);
			return -1;
		}
	}

	bufev_private->refcnt = 1;
	bufev->ev_base = base;

	/* Disable timeouts. */
	evutil_timerclear(&bufev->timeout_read);
	evutil_timerclear(&bufev->timeout_write);

	bufev->be_ops = ops;

	bufferevent_ratelim_init_(bufev_private);

	/*
	 * Set to EV_WRITE so that using bufferevent_write is going to
	 * trigger a callback.  Reading needs to be explicitly enabled
	 * because otherwise no data will be available.
	 */
	bufev->enabled = EV_WRITE;

#ifndef EVENT__DISABLE_THREAD_SUPPORT
	if (options & BEV_OPT_THREADSAFE) {
		if (bufferevent_enable_locking_(bufev, NULL) < 0) {
			/* cleanup */
			evbuffer_free(bufev->input);
			evbuffer_free(bufev->output);
			bufev->input = NULL;
			bufev->output = NULL;
			return -1;
		}
	}
#endif
	if ((options & (BEV_OPT_DEFER_CALLBACKS|BEV_OPT_UNLOCK_CALLBACKS))
	    == BEV_OPT_UNLOCK_CALLBACKS) {
		event_warnx("UNLOCK_CALLBACKS requires DEFER_CALLBACKS");
		return -1;
	}
	if (options & BEV_OPT_UNLOCK_CALLBACKS)
        // BEV_OPT_UNLOCK_CALLBACKS: deferred callbacks run with the lock released
		event_deferred_cb_init_(
		    &bufev_private->deferred,
		    event_base_get_npriorities(base) / 2,
		    bufferevent_run_deferred_callbacks_unlocked,
		    bufev_private);
	else
        // Otherwise: deferred callbacks run while holding the lock
		event_deferred_cb_init_(
		    &bufev_private->deferred,
		    event_base_get_npriorities(base) / 2,
		    bufferevent_run_deferred_callbacks_locked,
		    bufev_private);

	bufev_private->options = options;

	evbuffer_set_parent_(bufev->input, bufev);
	evbuffer_set_parent_(bufev->output, bufev);

	return 0;
}

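     When the reactor detects that an event is ready, the event loop thread runs the deferred callbacks: the read callback, the write callback, and the event callback (connect succeeded or an error occurred):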

// Locked version
// Runs when the deferred callback scheduled for a ready event is executed
static void
bufferevent_run_deferred_callbacks_locked(struct event_callback *cb, void *arg)
{
	struct bufferevent_private *bufev_private = arg;
	struct bufferevent *bufev = &bufev_private->bev;

	BEV_LOCK(bufev);
	if ((bufev_private->eventcb_pending & BEV_EVENT_CONNECTED) &&
	    bufev->errorcb) {
		/* The "connected" happened before any reads or writes, so
		   send it first. */
		bufev_private->eventcb_pending &= ~BEV_EVENT_CONNECTED;
		bufev->errorcb(bufev, BEV_EVENT_CONNECTED, bufev->cbarg);
	}
	if (bufev_private->readcb_pending && bufev->readcb) {
		bufev_private->readcb_pending = 0;
		bufev->readcb(bufev, bufev->cbarg);
	}
	if (bufev_private->writecb_pending && bufev->writecb) {
		bufev_private->writecb_pending = 0;
		bufev->writecb(bufev, bufev->cbarg);
	}
	if (bufev_private->eventcb_pending && bufev->errorcb) {
		short what = bufev_private->eventcb_pending;
		int err = bufev_private->errno_pending;
		bufev_private->eventcb_pending = 0;
		bufev_private->errno_pending = 0;
		EVUTIL_SET_SOCKET_ERROR(err);
		bufev->errorcb(bufev, what, bufev->cbarg);
	}
	bufferevent_decref_and_unlock_(bufev);
}

// Same as the function above, but user callbacks are invoked with the lock released
static void
bufferevent_run_deferred_callbacks_unlocked(struct event_callback *cb, void *arg)
{
	struct bufferevent_private *bufev_private = arg;
	struct bufferevent *bufev = &bufev_private->bev;

	BEV_LOCK(bufev);
#define UNLOCKED(stmt) \
	do { BEV_UNLOCK(bufev); stmt; BEV_LOCK(bufev); } while(0)

	if ((bufev_private->eventcb_pending & BEV_EVENT_CONNECTED) &&
	    bufev->errorcb) {
		/* The "connected" happened before any reads or writes, so
		   send it first. */
		bufferevent_event_cb errorcb = bufev->errorcb;
		void *cbarg = bufev->cbarg;
		bufev_private->eventcb_pending &= ~BEV_EVENT_CONNECTED;
		UNLOCKED(errorcb(bufev, BEV_EVENT_CONNECTED, cbarg));
	}
	if (bufev_private->readcb_pending && bufev->readcb) {
		bufferevent_data_cb readcb = bufev->readcb;
		void *cbarg = bufev->cbarg;
		bufev_private->readcb_pending = 0;
		UNLOCKED(readcb(bufev, cbarg));
	}
	if (bufev_private->writecb_pending && bufev->writecb) {
		bufferevent_data_cb writecb = bufev->writecb;
		void *cbarg = bufev->cbarg;
		bufev_private->writecb_pending = 0;
		UNLOCKED(writecb(bufev, cbarg));
	}
	if (bufev_private->eventcb_pending && bufev->errorcb) {
		bufferevent_event_cb errorcb = bufev->errorcb;
		void *cbarg = bufev->cbarg;
		short what = bufev_private->eventcb_pending;
		int err = bufev_private->errno_pending;
		bufev_private->eventcb_pending = 0;
		bufev_private->errno_pending = 0;
		EVUTIL_SET_SOCKET_ERROR(err);
		UNLOCKED(errorcb(bufev,what,cbarg));
	}
	bufferevent_decref_and_unlock_(bufev);
#undef UNLOCKED
}

// Schedule the deferred callback
#define SCHEDULE_DEFERRED(bevp)						\
	do {								\
		if (event_deferred_cb_schedule_(			\
			    (bevp)->bev.ev_base,			\
			&(bevp)->deferred))				\
			bufferevent_incref_(&(bevp)->bev);		\
	} while (0)


// Trigger the bufferevent read callback
void
bufferevent_run_readcb_(struct bufferevent *bufev, int options)
{
	/* Requires that we hold the lock and a reference */
	struct bufferevent_private *p =
	    EVUTIL_UPCAST(bufev, struct bufferevent_private, bev);
	if (bufev->readcb == NULL)
		return;
	if ((p->options|options) & BEV_OPT_DEFER_CALLBACKS) {
		p->readcb_pending = 1;
		// Mark the callback pending and let the event loop run it later
		SCHEDULE_DEFERRED(p);
	} else {
		// Run the callback right away in the current thread
		bufev->readcb(bufev, bufev->cbarg);
	}
}

// Trigger the bufferevent write callback
void
bufferevent_run_writecb_(struct bufferevent *bufev, int options)
{
	/* Requires that we hold the lock and a reference */
	struct bufferevent_private *p =
	    EVUTIL_UPCAST(bufev, struct bufferevent_private, bev);
	if (bufev->writecb == NULL)
		return;
	if ((p->options|options) & BEV_OPT_DEFER_CALLBACKS) {
		p->writecb_pending = 1;
		// Mark the callback pending and let the event loop run it later
		SCHEDULE_DEFERRED(p);
	} else {
		// Run the callback right away in the current thread
		bufev->writecb(bufev, bufev->cbarg);
	}
}

    That concludes the walkthrough of deferred callbacks.
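
    The pending-flag pattern in the listings above can be modelled in isolation. The following is a minimal sketch, not libevent code: struct mini_bev, mini_run_readcb, mini_run_writecb, mini_run_deferred, count_cb and the scheduled flag are hypothetical names that stand in for bufferevent_private, bufferevent_run_readcb_, bufferevent_run_writecb_, the deferred runner and the queued deferred callback. Locking and reference counting are left out on purpose.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct bufferevent_private. */
struct mini_bev {
	void (*readcb)(struct mini_bev *, void *);
	void (*writecb)(struct mini_bev *, void *);
	void *cbarg;
	unsigned readcb_pending : 1;
	unsigned writecb_pending : 1;
	unsigned defer : 1;     /* plays the role of BEV_OPT_DEFER_CALLBACKS */
	unsigned scheduled : 1; /* plays the role of the queued deferred callback */
};

/* Sample callback for experiments: counts how often it was invoked. */
static void
count_cb(struct mini_bev *b, void *arg)
{
	(void)b;
	++*(int *)arg;
}

/* Like bufferevent_run_readcb_: run now, or only mark as pending. */
static void
mini_run_readcb(struct mini_bev *b)
{
	if (b->readcb == NULL)
		return;
	if (b->defer) {
		b->readcb_pending = 1;
		b->scheduled = 1; /* the real code queues work on the event base */
	} else {
		b->readcb(b, b->cbarg);
	}
}

/* Same idea for the write side. */
static void
mini_run_writecb(struct mini_bev *b)
{
	if (b->writecb == NULL)
		return;
	if (b->defer) {
		b->writecb_pending = 1;
		b->scheduled = 1;
	} else {
		b->writecb(b, b->cbarg);
	}
}

/* Like the deferred runner: invoked later, from the "event loop". */
static void
mini_run_deferred(struct mini_bev *b)
{
	assert(b->scheduled); /* only runs after something was scheduled */
	b->scheduled = 0;
	if (b->readcb_pending && b->readcb) {
		b->readcb_pending = 0; /* clear before calling, as above */
		b->readcb(b, b->cbarg);
	}
	if (b->writecb_pending && b->writecb) {
		b->writecb_pending = 0;
		b->writecb(b, b->cbarg);
	}
}
```

    In this model, calling mini_run_readcb twice before mini_run_deferred runs still produces a single callback, because the pending state is a flag rather than a counter.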

3.9 evbuffer_read

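    Next, let us analyze evbuffer_read, the interface that reads data from a socket into an evbuffer:

  • On systems that support readv / WSARecv, it first calls evbuffer_expand_fast_(buf, howmuch, NUM_READ_IOVEC), with NUM_READ_IOVEC equal to 4, to reserve space in the evbuffer_chains and set up the struct iovec array, and then reads the fd data into the evbuffer with readv / WSARecv;
  • On systems that only offer read / recv, it calls evbuffer_expand_singlechain(buf, howmuch) to reserve space in a single evbuffer_chain and then performs the read;
  • In both cases a single call reads at most 4 KB (EVBUFFER_MAX_READ); after a successful read, the evbuffer's callbacks are invoked through evbuffer_invoke_callbacks_.
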
#define EVBUFFER_MAX_READ	4096

/* TODO(niels): should this function return ev_ssize_t and take ev_ssize_t
 * as howmuch? */
int
evbuffer_read(struct evbuffer *buf, evutil_socket_t fd, int howmuch)
{
	struct evbuffer_chain **chainp;
	int n;
	int result;

#ifdef USE_IOVEC_IMPL
	int nvecs, i, remaining;
#else
	struct evbuffer_chain *chain;
	unsigned char *p;
#endif

	EVBUFFER_LOCK(buf);

	if (buf->freeze_end) {
		result = -1;
		goto done;
	}

	// Ask how many bytes are currently readable on the socket
	n = get_n_bytes_readable_on_socket(fd);
	// Read at most 4096 bytes per call
	if (n <= 0 || n > EVBUFFER_MAX_READ)
		n = EVBUFFER_MAX_READ;
	if (howmuch < 0 || howmuch > n)
		howmuch = n;

#ifdef USE_IOVEC_IMPL
	/* Since we can use iovecs, we're willing to use the last
	 * NUM_READ_IOVEC chains. */
	if (evbuffer_expand_fast_(buf, howmuch, NUM_READ_IOVEC) == -1) {
		result = -1;
		goto done;
	} else {
		IOV_TYPE vecs[NUM_READ_IOVEC];
#ifdef EVBUFFER_IOVEC_IS_NATIVE_
		nvecs = evbuffer_read_setup_vecs_(buf, howmuch, vecs,
		    NUM_READ_IOVEC, &chainp, 1);
#else
		/* We aren't using the native struct iovec.  Therefore,
		   we are on win32. */
		struct evbuffer_iovec ev_vecs[NUM_READ_IOVEC];
		nvecs = evbuffer_read_setup_vecs_(buf, howmuch, ev_vecs, 2,
		    &chainp, 1);

		for (i=0; i < nvecs; ++i)
			WSABUF_FROM_EVBUFFER_IOV(&vecs[i], &ev_vecs[i]);
#endif

#ifdef _WIN32
		{
			DWORD bytesRead;
			DWORD flags=0;
			if (WSARecv(fd, vecs, nvecs, &bytesRead, &flags, NULL, NULL)) {
				/* The read failed. It might be a close,
				 * or it might be an error. */
				if (WSAGetLastError() == WSAECONNABORTED)
					n = 0;
				else
					n = -1;
			} else
				n = bytesRead;
		}
#else
		n = readv(fd, vecs, nvecs);
#endif
	}

#else /*!USE_IOVEC_IMPL*/
	/* If we don't have FIONREAD, we might waste some space here */
	/* XXX we _will_ waste some space here if there is any space left
	 * over on buf->last. */
	// First make sure a single evbuffer_chain has enough free space
	if ((chain = evbuffer_expand_singlechain(buf, howmuch)) == NULL) {
		result = -1;
		goto done;
	}

	/* We can append new data at this point */
	p = chain->buffer + chain->misalign + chain->off;

#ifndef _WIN32
	n = read(fd, p, howmuch);
#else
	n = recv(fd, p, howmuch, 0);
#endif
#endif /* USE_IOVEC_IMPL */

	if (n == -1) {
		result = -1;
		goto done;
	}
	if (n == 0) {
		result = 0;
		goto done;
	}

#ifdef USE_IOVEC_IMPL
	remaining = n;
	for (i=0; i < nvecs; ++i) {
		/* can't overflow, since only mutable chains have
		 * huge misaligns. */
		size_t space = (size_t) CHAIN_SPACE_LEN(*chainp);
		/* XXXX This is a kludge that can waste space in perverse
		 * situations. */
		if (space > EVBUFFER_CHAIN_MAX)
			space = EVBUFFER_CHAIN_MAX;
		if ((ev_ssize_t)space < remaining) {
			(*chainp)->off += space;
			remaining -= (int)space;
		} else {
			(*chainp)->off += remaining;
			buf->last_with_datap = chainp;
			break;
		}
		chainp = &(*chainp)->next;
	}
#else
	chain->off += n;
	advance_last_with_data(buf);
#endif
	buf->total_len += n;
	buf->n_add_for_cb += n;

	/* Tell someone about changes in this buffer */
	// The evbuffer's contents changed; notify the registered callbacks
	evbuffer_invoke_callbacks_(buf);
	result = n;
done:
	EVBUFFER_UNLOCK(buf);
	return result;
}
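
    The two key steps of the iovec path, clamping the request size and crediting the bytes returned by readv to each chunk in order, can be tried out without libevent. This is a minimal POSIX-only sketch under stated assumptions: struct chunk, MAX_READ, clamp_howmuch and scatter_read are hypothetical names, the chunks are fixed 8-byte arrays rather than real evbuffer_chains, and the readable argument stands in for the result of get_n_bytes_readable_on_socket.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_READ 4096 /* same role as EVBUFFER_MAX_READ */

/* Hypothetical fixed-size chunk; stands in for evbuffer_chain. */
struct chunk {
	char data[8];
	size_t off; /* bytes of payload stored so far */
};

/* Clamp a request the way the listing does. readable is the hint from
 * the socket (<= 0 when unknown); howmuch < 0 means "no preference". */
static int
clamp_howmuch(int readable, int howmuch)
{
	int n = readable;
	if (n <= 0 || n > MAX_READ)
		n = MAX_READ;
	if (howmuch < 0 || howmuch > n)
		howmuch = n;
	return howmuch;
}

/* One readv() into the free space of up to 4 chunks, then bookkeeping. */
static ssize_t
scatter_read(int fd, struct chunk *chunks, int nchunks)
{
	struct iovec vecs[4];
	ssize_t n, remaining;
	int i;

	assert(nchunks > 0 && nchunks <= 4);
	for (i = 0; i < nchunks; ++i) {
		vecs[i].iov_base = chunks[i].data + chunks[i].off;
		vecs[i].iov_len = sizeof(chunks[i].data) - chunks[i].off;
	}
	n = readv(fd, vecs, nchunks);
	if (n <= 0)
		return n; /* error (-1) or end of stream (0) */

	/* The kernel fills the vectors in order, so credit each chunk in
	 * turn until the byte count is used up. */
	remaining = n;
	for (i = 0; i < nchunks && remaining > 0; ++i) {
		size_t space = vecs[i].iov_len;
		size_t take = (size_t)remaining < space ?
		    (size_t)remaining : space;
		chunks[i].off += take;
		remaining -= (ssize_t)take;
	}
	assert(remaining == 0);
	return n;
}
```

    Writing ten bytes into a pipe and calling scatter_read with two empty chunks leaves eight bytes in the first chunk and two in the second, which is the same accounting the remaining loop performs on chain->off.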

3.10 evbuffer_write

    With evbuffer_read covered, let us turn to the evbuffer_write interface.

int
evbuffer_write(struct evbuffer *buffer, evutil_socket_t fd)
{
	return evbuffer_write_atmost(buffer, fd, -1);
}

    Expanding evbuffer_write_atmost shows the actual implementation:

int
evbuffer_write_atmost(struct evbuffer *buffer, evutil_socket_t fd,
    ev_ssize_t howmuch)
{
	int n = -1;

	EVBUFFER_LOCK(buffer);

	if (buffer->freeze_start) {
		goto done;
	}

	if (howmuch < 0 || (size_t)howmuch > buffer->total_len)
		howmuch = buffer->total_len;

	if (howmuch > 0) {
#ifdef USE_SENDFILE
		struct evbuffer_chain *chain = buffer->first;
		// The first chain refers to a file: send it in-kernel with sendfile
		if (chain != NULL && (chain->flags & EVBUFFER_SENDFILE))
			n = evbuffer_write_sendfile(buffer, fd, howmuch);
		else {
#endif
#ifdef USE_IOVEC_IMPL
		// On systems with iovec support, send through WSASend / writev
		n = evbuffer_write_iovec(buffer, fd, howmuch);
#elif defined(_WIN32)
		/* XXX(nickm) Don't disable this code until we know if
		 * the WSARecv code above works. */
		void *p = evbuffer_pullup(buffer, howmuch);
		EVUTIL_ASSERT(p || !howmuch);
		n = send(fd, p, howmuch, 0);
#else
		void *p = evbuffer_pullup(buffer, howmuch);
		EVUTIL_ASSERT(p || !howmuch);
		n = write(fd, p, howmuch);
#endif
#ifdef USE_SENDFILE
		}
#endif
	}

	// Drain only the bytes that were actually written
	if (n > 0)
		evbuffer_drain(buffer, n);

done:
	EVBUFFER_UNLOCK(buffer);
	return (n);
}
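
    The contract of evbuffer_write_atmost (clamp howmuch, write once, drain only what was accepted) can be illustrated with a flat buffer. This is a minimal POSIX-only sketch, not the libevent implementation: struct outbuf, outbuf_add and outbuf_write_atmost are hypothetical names, and a fixed 64-byte array replaces the chain list. It deliberately keeps one detail visible in the listing: n starts at -1, so a call on an empty buffer returns -1 without writing anything.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical flat output buffer; evbuffer uses a chain list instead. */
struct outbuf {
	char data[64];
	size_t start; /* index of the first unsent byte */
	size_t len;   /* number of unsent bytes */
};

/* Append n bytes; returns -1 if they do not fit. */
static int
outbuf_add(struct outbuf *b, const char *src, size_t n)
{
	if (b->start + b->len + n > sizeof(b->data))
		return -1;
	memcpy(b->data + b->start + b->len, src, n);
	b->len += n;
	return 0;
}

/* howmuch < 0 means "everything", as in evbuffer_write_atmost. */
static ssize_t
outbuf_write_atmost(struct outbuf *b, int fd, ssize_t howmuch)
{
	ssize_t n = -1;

	assert(b->start + b->len <= sizeof(b->data));
	if (howmuch < 0 || (size_t)howmuch > b->len)
		howmuch = (ssize_t)b->len;
	if (howmuch > 0)
		n = write(fd, b->data + b->start, (size_t)howmuch);
	if (n > 0) {
		/* A short write is normal on non-blocking sockets: keep
		 * the unsent tail for the next call. */
		b->start += (size_t)n;
		b->len -= (size_t)n;
	}
	return n;
}
```

    A caller typically invokes such a function again when the descriptor becomes writable, until the buffer is empty; the bytes that were not accepted stay at the front of the buffer in the meantime.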

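    Expanding evbuffer_write_sendfile shows that it calls the platform's sendfile interface, which transfers file data inside the kernel and avoids the cost of copying it through user space: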

#ifdef USE_SENDFILE
static inline int
evbuffer_write_sendfile(struct evbuffer *buffer, evutil_socket_t dest_fd,
    ev_ssize_t howmuch)
{
	struct evbuffer_chain *chain = buffer->first;
	struct evbuffer_chain_file_segment *info =
	    EVBUFFER_CHAIN_EXTRA(struct evbuffer_chain_file_segment,
		chain);
	const int source_fd = info->segment->fd;
#if defined(SENDFILE_IS_MACOSX) || defined(SENDFILE_IS_FREEBSD)
	int res;
	ev_off_t len = chain->off;
#elif defined(SENDFILE_IS_LINUX) || defined(SENDFILE_IS_SOLARIS)
	ev_ssize_t res;
	ev_off_t offset = chain->misalign;
#endif

	ASSERT_EVBUFFER_LOCKED(buffer);

#if defined(SENDFILE_IS_MACOSX)
	res = sendfile(source_fd, dest_fd, chain->misalign, &len, NULL, 0);
	if (res == -1 && !EVUTIL_ERR_RW_RETRIABLE(errno))
		return (-1);

	return (len);
#elif defined(SENDFILE_IS_FREEBSD)
	res = sendfile(source_fd, dest_fd, chain->misalign, chain->off, NULL, &len, 0);
	if (res == -1 && !EVUTIL_ERR_RW_RETRIABLE(errno))
		return (-1);

	return (len);
#elif defined(SENDFILE_IS_LINUX)
	/* TODO(niels): implement splice */
	res = sendfile(dest_fd, source_fd, &offset, chain->off);
	if (res == -1 && EVUTIL_ERR_RW_RETRIABLE(errno)) {
		/* if this is EAGAIN or EINTR return 0; otherwise, -1 */
		return (0);
	}
	return (res);
#elif defined(SENDFILE_IS_SOLARIS)
	{
		const off_t offset_orig = offset;
		res = sendfile(dest_fd, source_fd, &offset, chain->off);
		if (res == -1 && EVUTIL_ERR_RW_RETRIABLE(errno)) {
			if (offset - offset_orig)
				return offset - offset_orig;
			/* if this is EAGAIN or EINTR and no bytes were
			 * written, return 0 */
			return (0);
		}
		return (res);
	}
#endif
}
#endif

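    When sendfile is not supported, or the first chain is not a sendfile chain, evbuffer_write_iovec does the sending instead; it wraps each platform's vectored-write call: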

#ifdef USE_IOVEC_IMPL
static inline int
evbuffer_write_iovec(struct evbuffer *buffer, evutil_socket_t fd,
    ev_ssize_t howmuch)
{
	IOV_TYPE iov[NUM_WRITE_IOVEC];
	struct evbuffer_chain *chain = buffer->first;
	int n, i = 0;

	if (howmuch < 0)
		return -1;

	ASSERT_EVBUFFER_LOCKED(buffer);
	/* XXX make this top out at some maximal data length?  if the
	 * buffer has (say) 1MB in it, split over 128 chains, there's
	 * no way it all gets written in one go. */
	while (chain != NULL && i < NUM_WRITE_IOVEC && howmuch) {
#ifdef USE_SENDFILE
		/* we cannot write the file info via writev */
		if (chain->flags & EVBUFFER_SENDFILE)
			break;
#endif
		iov[i].IOV_PTR_FIELD = (void *) (chain->buffer + chain->misalign);
		if ((size_t)howmuch >= chain->off) {
			/* XXXcould be problematic when windows supports mmap*/
			iov[i++].IOV_LEN_FIELD = (IOV_LEN_TYPE)chain->off;
			howmuch -= chain->off;
		} else {
			/* XXXcould be problematic when windows supports mmap*/
			iov[i++].IOV_LEN_FIELD = (IOV_LEN_TYPE)howmuch;
			break;
		}
		chain = chain->next;
	}
	if (! i)
		return 0;

#ifdef _WIN32
	{
		DWORD bytesSent;
		// On Windows, vectored writes go through WSASend
		if (WSASend(fd, iov, i, &bytesSent, 0, NULL, NULL))
			n = -1;
		else
			n = bytesSent;
	}
#else
	// All other platforms use writev
	n = writev(fd, iov, i);
#endif
	return (n);
}
#endif
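
    The gather loop in evbuffer_write_iovec can be reproduced on a plain linked list. This is a minimal POSIX-only sketch under stated assumptions: struct node, MAX_IOV and gather_write are hypothetical names, misalign and the sendfile check are omitted, and MAX_IOV is set to 4 only for illustration (the listing uses NUM_WRITE_IOVEC).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_IOV 4 /* hypothetical cap; the listing uses NUM_WRITE_IOVEC */

/* Hypothetical node; stands in for evbuffer_chain (misalign omitted). */
struct node {
	const char *buf;
	size_t off; /* payload length of this node */
	struct node *next;
};

/* Describe up to howmuch bytes from the list in an iovec array and
 * hand them to the kernel with a single writev() call. */
static ssize_t
gather_write(struct node *chain, int fd, size_t howmuch)
{
	struct iovec iov[MAX_IOV];
	int i = 0;

	while (chain != NULL && i < MAX_IOV && howmuch) {
		iov[i].iov_base = (void *)chain->buf;
		if (howmuch >= chain->off) {
			/* The whole node fits into the budget. */
			iov[i++].iov_len = chain->off;
			howmuch -= chain->off;
		} else {
			/* Budget ends inside this node: send a prefix. */
			iov[i++].iov_len = howmuch;
			break;
		}
		chain = chain->next;
	}
	if (i == 0)
		return 0; /* nothing to send */
	assert(i > 0 && i <= MAX_IOV);
	return writev(fd, iov, i);
}
```

    With the three nodes "hello", " world" and "!!!" and a budget of 8 bytes, the call sends "hello wo": the first node in full and a three-byte prefix of the second. No payload is copied in user space; only the iovec descriptors are built.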

    That is all for this walkthrough of evbuffer.

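4 Summary

  • evbuffer is the core libevent component for dynamic data buffering. It manages memory flexibly through a linked list of chains and offers efficient read and write APIs, which makes it well suited to asynchronous I/O and to streaming data in network programs. With evbuffer, programmers do less manual memory management while the program stays efficient when handling large volumes of data.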