Python 3.7 Coroutines in Practice

This content is licensed under the CC 4.0 BY-SA license.

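This article covers the practical use of coroutines in Python 3.7. It compares how CPU-bound and IO-bound work should be handled and highlights how coroutines improve efficiency and avoid lock contention. Using examples from the official documentation and common usage patterns, it shows how to create and manage coroutine tasks, shares optimization strategies for single-core and multi-core environments (in particular, combining coroutines with multiple processes), and touches on a real-world use: validating proxies for a web crawler.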

Coroutines in Practice

Only common usage is covered here. For the lower-level mechanics, review yield from; for the underlying theory, refer to the papers.

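CPU-bound

For CPU-bound work, multiple processes are the only real option: in CPython the GIL keeps threads and coroutines within one process from running Python code on several cores at once.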
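
As a rough illustration of the multi-process route for CPU-bound work, here is a minimal sketch using the standard library's concurrent.futures.ProcessPoolExecutor. The prime-counting workload and the names count_primes and count_in_processes are made up for this example; they are not from the original article.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """CPU-bound toy workload: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        # n is prime if no d in [2, sqrt(n)] divides it
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_in_processes(limits, workers=4):
    """Run one count_primes() call per limit, spread over `workers` processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() preserves the order of `limits` in the returned results
        return list(pool.map(count_primes, limits))
```

Call count_in_processes([50_000, 60_000, 70_000, 80_000]) from under an if __name__ == "__main__": guard, so that child processes can import the module safely.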

IO-bound

For IO-bound work, the traditional approach is multithreading.
Now there is a more efficient option: coroutines!

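At heart, a coroutine works like a function callback: it gives up control at a wait point and is called back into when the result is ready.
The switch is handled by the event loop inside the process rather than by the OS scheduler, so it is very cheap, and because only one coroutine runs at a time on the loop's thread, shared state usually needs no locks.
The drawback of the raw callback style is that you have to write the callbacks yourself and the code gets complicated quickly; the async/await syntax used below hides most of that.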
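
To make the "no locks" point concrete, here is a small sketch in which many coroutines update one shared counter without any lock. The names bump and run_counter and the dict-based state are illustrative assumptions, not part of the original article.

```python
import asyncio


async def bump(state, times):
    for _ in range(times):
        # Read-modify-write with no await in between: no other coroutine
        # can run in the middle of this statement.
        state["count"] += 1
        # Explicit switch point: hand control back to the event loop.
        await asyncio.sleep(0)


async def run_counter(workers, times):
    state = {"count": 0}
    await asyncio.gather(*(bump(state, times) for _ in range(workers)))
    return state["count"]


if __name__ == "__main__":
    print(asyncio.run(run_counter(50, 100)))
```

The guarantee only holds between await points: if a coroutine awaits in the middle of a multi-step update, other coroutines can observe the half-finished state, and an asyncio.Lock may still be needed.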

Python 3.7 coroutine code

Two examples from the official documentation

Creating tasks with asyncio.create_task()

import asyncio

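async def test():
    # sleep() returns a coroutine; without `await` it would never run
    await asyncio.sleep(1)

async def main():
    # create_task() schedules each coroutine on the event loop right away
    task1 = asyncio.create_task(test())
    task2 = asyncio.create_task(test())

    # Wait for both tasks to finish; they run concurrently in the meantime
    await task1
    await task2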

asyncio.run(main())
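
One way to see that tasks start running as soon as they are created, and not in the order they are awaited, is to record the order in which they finish. This is a minimal sketch; worker, finish_order and the delays are invented for the illustration.

```python
import asyncio


async def worker(name, delay, finished):
    await asyncio.sleep(delay)
    finished.append(name)


async def finish_order():
    finished = []
    slow = asyncio.create_task(worker("slow", 0.2, finished))
    fast = asyncio.create_task(worker("fast", 0.05, finished))
    await slow   # while we wait here, "fast" is running as well
    await fast   # already done by now, so this returns immediately
    return finished


if __name__ == "__main__":
    print(asyncio.run(finish_order()))
```

Although slow is awaited first, fast is the first entry in the list, since both tasks were already scheduled by create_task().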

Collecting multiple tasks with asyncio.gather()

import asyncio


async def who(name):
    # sleep() must be awaited, otherwise the coroutine never runs
    await asyncio.sleep(1)
    print(name)


async def main():
    # Schedule three calls *concurrently*:
    await asyncio.gather(
        who('Bob'),
        who('Amy'),
        who('Mike'),
    )

asyncio.run(main())
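
gather() also hands back the return values, in the order the awaitables were passed in rather than the order they finished. The sketch below illustrates this, along with the return_exceptions=True flag; greet, fail and the delays are hypothetical names and values chosen for the example.

```python
import asyncio


async def greet(name, delay):
    await asyncio.sleep(delay)
    return "hello " + name


async def fail():
    raise ValueError("boom")


async def collect():
    # Results come back in argument order, even though Amy finishes first
    return await asyncio.gather(
        greet("Bob", 0.03),
        greet("Amy", 0.01),
        greet("Mike", 0.02),
    )


async def collect_with_errors():
    # With return_exceptions=True a failure is returned as a value
    # instead of being raised out of gather()
    return await asyncio.gather(greet("Bob", 0.01), fail(), return_exceptions=True)


if __name__ == "__main__":
    print(asyncio.run(collect()))
    print(asyncio.run(collect_with_errors()))
```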

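Common usage

Coroutines on a single core

tasks = [asyncio.create_task(test(1)) for proxy in range(10000)] creates the tasks and schedules them on the event loop right away.
[await t for t in tasks] then waits for each of them to finish.

There are 10,000 tasks here, and the run took 1.2640011310577393 seconds (on a 200-yuan AMD processor).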

import asyncio
import time


async def test(seconds):
    await asyncio.sleep(seconds)


async def main():
    start_time = time.time()
    tasks = [asyncio.create_task(test(1)) for proxy in range(10000)]
    [await t for t in tasks]
    print(time.time() - start_time)

if __name__ == "__main__":
    asyncio.run(main())
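
The reason 10,000 one-second sleeps finish in a little over one second is that the waits overlap. The scaled-down sketch below compares awaiting the sleeps one by one with running them concurrently; nap, sequential, concurrent, compare and the default sizes are assumptions made for the illustration.

```python
import asyncio
import time


async def nap(seconds):
    await asyncio.sleep(seconds)


async def sequential(n, seconds):
    start = time.perf_counter()
    for _ in range(n):
        await nap(seconds)   # each sleep must finish before the next starts
    return time.perf_counter() - start


async def concurrent(n, seconds):
    start = time.perf_counter()
    await asyncio.gather(*(nap(seconds) for _ in range(n)))   # all sleeps overlap
    return time.perf_counter() - start


async def compare(n=20, seconds=0.05):
    return await sequential(n, seconds), await concurrent(n, seconds)


if __name__ == "__main__":
    seq, conc = asyncio.run(compare())
    print("sequential:", seq, "concurrent:", conc)
```

The sequential version takes roughly n times the sleep duration, while the concurrent one takes roughly a single sleep plus scheduling overhead.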

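Coroutines on multiple cores

To use multiple cores, the natural choice is multiprocessing plus coroutines.
The 10,000 tasks are split across four processes, and the run takes about 1.08 seconds (on a 200-yuan AMD processor):
Process1 elapsed: 1.0899991989135742
Process2 elapsed: 1.0870001316070557
Process3 elapsed: 1.0749988555908203
Process4 elapsed: 1.070998191833496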

from multiprocessing import Pool
import asyncio
import time


async def test(seconds):
    await asyncio.sleep(seconds)


async def main(num):
    start_time = time.time()
    tasks = [asyncio.create_task(test(1)) for proxy in range(num)]
    [await t for t in tasks]
    print(time.time() - start_time)


def run(num):
    asyncio.run(main(num))


if __name__ == "__main__":
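    # Pool() starts os.cpu_count() worker processes by default
    p = Pool()
    for i in range(4):
        # Each call runs its own event loop with 2500 coroutines
        p.apply_async(run, args=(2500,))
    p.close()   # no more work will be submitted
    p.join()    # wait for all worker processes to finish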
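
If the task count does not divide evenly, or the workers need to hand results back, the same "multiprocessing plus coroutines" idea can be sketched as follows. The helpers split_evenly, run_chunk and run_all are hypothetical names for this illustration; Pool.map is used here instead of apply_async so that return values come back to the parent.

```python
import asyncio
from multiprocessing import Pool


def split_evenly(total, parts):
    """Split `total` tasks into `parts` chunk sizes that differ by at most one."""
    base, extra = divmod(total, parts)
    return [base + 1 if i < extra else base for i in range(parts)]


async def nap(seconds):
    await asyncio.sleep(seconds)
    return 1


async def run_chunk_async(count, seconds):
    results = await asyncio.gather(*(nap(seconds) for _ in range(count)))
    return sum(results)


def run_chunk(args):
    """Process entry point: every process starts its own event loop."""
    count, seconds = args
    return asyncio.run(run_chunk_async(count, seconds))


def run_all(total, processes=4, seconds=1):
    """Run `total` coroutines spread over `processes` worker processes."""
    chunks = [(count, seconds) for count in split_evenly(total, processes)]
    with Pool(processes) as pool:
        # map() blocks until every chunk is done and returns their results
        return sum(pool.map(run_chunk, chunks))
```

As with the listing above, run_all(10000) should be called from under an if __name__ == "__main__": guard.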

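In practice: checking proxy validity

Crawling work needs proxies, and a self-built proxy pool needs a validation module.
Test each proxy against the very site you intend to crawl.
Here that site is http://xa.ganji.com/
The test case is as follows: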

import aiohttp
import asyncio


async def test_single_proxy(proxy):
    # conn = aiohttp.TCPConnector(ssl=False)
    # async with aiohttp.ClientSession(connector=conn) as session:
    async with aiohttp.ClientSession() as session:
        try:
            # Proxies read from a store such as Redis may arrive as bytes
            if isinstance(proxy, bytes):
                proxy = proxy.decode('utf8')
            real_proxy = 'http://' + proxy
            async with session.get('http://xa.ganji.com/', proxy=real_proxy, timeout=6) as response:
                if response.status in [200]:
                    print(proxy, 'valid pass ……')
                else:
                    print(proxy, 'remove')
        except Exception as exc:
            # Connection errors and timeouts both mean the proxy is unusable
            print(proxy, 'request failed:', repr(exc))


api_list = [
    '60.169.217.151:46174',
    '27.220.121.8:54276',
    '180.122.145.179:33419',
    '59.58.209.134:36154',
    '123.180.209.136:41878',
    '123.188.102.233:38182',
    '60.182.33.2:23016',
    '171.11.138.254:33794',
    '123.162.151.90:23693',
    '125.65.91.130:24209'
]


async def main(api_list):
    tasks = [asyncio.create_task(test_single_proxy(proxy))
             for proxy in api_list]
    [await t for t in tasks]

if __name__ == "__main__":
    asyncio.run(main(api_list))
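
For a large proxy pool it may be worth capping how many checks run at once and returning the valid proxies instead of printing them. The following is only a sketch under stated assumptions: validate_all, its check parameter and the limit default are hypothetical, and fake_check is a stand-in for a real HTTP probe such as test_single_proxy so that the example runs without network access.

```python
import asyncio


async def validate_all(proxies, check, limit=100):
    """Return the proxies for which `check(proxy)` resolves to True.

    `check` is any coroutine function taking a proxy string; at most
    `limit` checks are in flight at once.
    """
    async def guarded(proxy, gate):
        async with gate:
            try:
                return await check(proxy)
            except Exception:
                # A failed request counts as an invalid proxy
                return False

    gate = asyncio.Semaphore(limit)   # created inside the running loop
    verdicts = await asyncio.gather(*(guarded(p, gate) for p in proxies))
    return [p for p, ok in zip(proxies, verdicts) if ok]


async def fake_check(proxy):
    """Stand-in probe: pretends that ports ending in 4 work."""
    await asyncio.sleep(0.01)
    if proxy.endswith(":0"):
        raise ConnectionError("unreachable")
    return proxy.endswith("4")


if __name__ == "__main__":
    sample = ["192.0.2.1:8004", "192.0.2.2:8005", "192.0.2.3:0"]
    print(asyncio.run(validate_all(sample, fake_check, limit=2)))
```

To use it against a real site, check would be a coroutine that performs the request and returns True on a 200 response.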