Unix System Call Timeouts

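This article explores the challenges of implementing timeouts for system calls on Unix, particularly when waiting for a child process. The author examines several system calls, such as wait and waitpid, and points out that none of them directly supports a timeout. The article discusses the limitations of approaches such as alarm() and signalfd(), and mentions the more involved solution on modern Linux systems: combining signal handling with epoll or select to wait with a timeout.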

Mar 12, 2017

Recently I was writing some code where I wanted to wait for a child process, and I wanted the wait call to have a timeout. The use case is something like this: you spawn a subprocess, and you expect the subprocess to complete within ten seconds. If it doesn’t complete in that time, you want to treat it as an error (and perhaps kill the child).

There are a lot of “wait” system calls on Linux. In section 2 of the Linux man pages you get all of the following:

pid_t wait(int *wstatus);
pid_t waitpid(pid_t pid, int *wstatus, int options);
int waitid(idtype_t idtype, id_t id, siginfo_t *infop, int options);
pid_t wait3(int *wstatus, int options, struct rusage *rusage);
pid_t wait4(pid_t pid, int *wstatus, int options, struct rusage *rusage);

Wow, look at all of those ways to wait for a process! In reality, a lot of these are wrapper functions required by POSIX, or provided by glibc. Only two of these are true system calls on Linux: waitid() and wait4(). But that’s still a lot of ways to wait for things.

As you can see, none of these accept a timeout. I Googled the question, and found a Stack Overflow post titled “Waitpid equivalent with timeout?”. The top answer suggests using alarm(), which apparently is wrong. Of course it’s wrong: if you’ve done a lot of Unix systems programming, you’ll know that alarm() is always the wrong answer. Then there are numerous other answers that go into crazy gymnastics to solve the problem.

Modern Linux systems have a system call called signalfd(), which allows you to register a file descriptor to receive signal events. With this technique, you can register a signalfd for SIGCHLD events, and then put it into an epoll or select loop with a timeout. This is a lot simpler than the other Stack Overflow answers, but is still kind of complicated. Furthermore, signalfd() wasn’t added to Linux until 2007, with kernel 2.6.22. This is certainly old enough for pretty much all real-world Linux applications, but it’s not a standard Unix feature and therefore isn’t portable. On classic Unix systems you need to resort to the kind of tricks in the Stack Overflow post.

These poor API decisions come up more frequently in Unix than people like to admit. In fact, the reason there are so many “wait” calls available is that the original APIs were poorly designed and had to be modified to be more flexible.

Things get even worse with a lot of the blocking I/O system calls. For instance, suppose you want to create a directory. You get two choices:

int mkdir(const char *pathname, mode_t mode);
int mkdirat(int dirfd, const char *pathname, mode_t mode);

Neither of these takes a timeout, and neither of them exposes an interface that can be used with select or epoll. If you’ve ever had the displeasure of reading the libuv source code, you’ll know that there’s a trick to turning these kinds of I/O operations into something you can put into an event loop: you run the desired operation (mkdir() in this case) in another thread, and then wait for the thread to finish with a timeout. I’ve been told it’s common practice for vendors of things like NFS hardware appliances to patch the kernel (most likely BSD in this case) to add new system calls to make it possible to implement operations like this natively.

 

Hopefully one day we can redo all of the I/O stuff in Unix. But I’m not holding my breath.

Update: Henrique Almeida sent me the following email telling me about some system calls I was not familiar with:

Hello, if you need to wait with a timeout for a child to exit I think you should use sigprocmask with either pselect or sigtimedwait and wait for SIGCHLD. You don’t need signalfd or alarm.

 

After looking into this, I see there’s also an epoll_pwait(), which is the equivalent call for the epoll family. So it looks like there are a lot of options for waiting with a timeout. On the asynchronous I/O side, however, things are still a mess.
