Advanced Guide to Python Generators and Iterators

Originally published on 2026-04-15 13:32:56

This content is licensed under CC 4.0 BY-SA.

Tags

#rust #github #python

1. What Is an Iterator?

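An iterator is an object that implements the __iter__() and __next__() methods. __iter__() returns the iterator itself. __next__() returns the next element, or raises StopIteration when no elements are left. Elements are produced one at a time on request, so an iterator lets us walk through a sequence without holding all of its elements in memory at once.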

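class MyIterator:
    """Iterates over the integers in [start, end)."""
    def __init__(self, start, end):
        self.current = start  # next value to return
        self.end = end        # exclusive upper bound

    def __iter__(self):
        # An iterator returns itself from __iter__
        return self

    def __next__(self):
        if self.current >= self.end:
            raise StopIteration  # signals that iteration is finished
        value = self.current
        self.current += 1
        return value

# Use the iterator
for i in MyIterator(1, 5):
    print(i)  # Output: 1, 2, 3, 4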
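The sketch below drives the iteration protocol by hand to show what a for loop does with such an object. iter() obtains the iterator, next() is called repeatedly, and the loop ends when StopIteration is raised. The class is a compact copy of MyIterator so the snippet runs on its own. `manual_for` is an illustrative helper name, not a standard function.

```python
class MyIterator:
    """Yields the integers in [start, end), one per call to __next__."""
    def __init__(self, start, end):
        self.current = start
        self.end = end

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.end:
            raise StopIteration
        value = self.current
        self.current += 1
        return value


def manual_for(iterable):
    """Roughly what `for x in iterable` does: collect items until StopIteration."""
    results = []
    it = iter(iterable)          # calls iterable.__iter__()
    while True:
        try:
            item = next(it)      # calls it.__next__()
        except StopIteration:    # the iterator is exhausted
            break
        results.append(item)
    return results


print(manual_for(MyIterator(1, 5)))  # [1, 2, 3, 4]

# An iterator is single-use: once exhausted, it stays exhausted.
it = MyIterator(1, 3)
print(list(it))  # [1, 2]
print(list(it))  # []
```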

2. What Is a Generator?

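A generator is a special kind of iterator that produces values with the yield statement. Calling a generator function does not execute its body; it returns a generator object instead. The body runs only when a value is requested (by next() or a for loop). It pauses at each yield and resumes from that point on the next request.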

def my_generator(start, end):
    current = start
    while current < end:
        yield current
        current += 1

# Use the generator
for i in my_generator(1, 5):
    print(i)  # Output: 1, 2, 3, 4
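We can check directly that calling a generator function does not run its body. This sketch logs each step to a list (`events` and `traced_generator` are illustrative names) so the execution order is visible. Nothing happens at call time, and each next() resumes the body up to the following yield.

```python
events = []

def traced_generator(start, end):
    """Same logic as my_generator, but logs when the body actually runs."""
    events.append("body started")
    current = start
    while current < end:
        events.append(f"yielding {current}")
        yield current
        current += 1
    events.append("body finished")

gen = traced_generator(1, 3)
print(events)            # [] : calling the function only creates the generator object
print(iter(gen) is gen)  # True: a generator is its own iterator

print(next(gen))         # 1
print(events)            # ['body started', 'yielding 1']
print(list(gen))         # [2]
print(events)            # ['body started', 'yielding 1', 'yielding 2', 'body finished']
```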

3. Generator Expressions

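A generator expression is a concise way to create a generator. Its syntax resembles a list comprehension but uses parentheses instead of square brackets. It produces values lazily instead of building a complete list.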

# Generator expression
gen = (x * 2 for x in range(5))
for value in gen:
    print(value)  # Output: 0, 2, 4, 6, 8

# List comprehension (for comparison)
list_comp = [x * 2 for x in range(5)]
print(list_comp)  # Output: [0, 2, 4, 6, 8]
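In practice, the list comprehension builds every element up front, while the generator expression holds only its current state. The sketch below shows this roughly with sys.getsizeof. That function reports the size of the container object itself, not the elements it refers to. Exact byte counts vary by Python version and platform, so only the comparison is printed.

```python
import sys

n = 1_000_000
squares_list = [x * x for x in range(n)]   # all n results are computed and stored now
squares_gen = (x * x for x in range(n))    # nothing is computed yet

# The list object is orders of magnitude larger than the generator object.
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))  # True

# A generator expression that is the sole argument of a call needs no extra parentheses.
total = sum(x * x for x in range(10))
print(total)  # 285
```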

4. Advanced Applications

4.1 Lazy Evaluation

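An important property of generators is lazy evaluation: a value is produced only when it is requested. This makes generators well suited to processing large datasets.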

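def large_file_reader(file_path):
    """Read a large file line by line instead of loading it into memory all at once"""
    with open(file_path, 'r', encoding='utf-8') as f:
        for line in f:  # file objects are themselves lazy line iterators
            yield line.strip()

# Read a large file with the generator
for line in large_file_reader('large_file.txt'):
    print(line)  # placeholder: replace with the real per-line processing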
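The generator pulls one line at a time, so a consumer that stops early never reads the rest of the file. The sketch below shows this with a small temporary file standing in for large_file.txt. The file contents are made up, and `first_matching` is a hypothetical helper.

```python
import os
import tempfile

def large_file_reader(file_path):
    """Yield stripped lines one at a time."""
    with open(file_path, 'r', encoding='utf-8') as f:
        for line in f:
            yield line.strip()

def first_matching(lines, keyword):
    """Return the first line containing keyword (or None), consuming nothing past it."""
    for line in lines:
        if keyword in line:
            return line
    return None

# A small throwaway file stands in for 'large_file.txt'.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False, encoding='utf-8') as tmp:
    tmp.write("alpha\nbeta ERROR here\ngamma\ndelta\n")
    path = tmp.name

try:
    reader = large_file_reader(path)
    print(first_matching(reader, 'ERROR'))  # beta ERROR here
    print(next(reader))                     # gamma: the reader resumes where it stopped
    reader.close()                          # exits the with block inside the generator, closing the file
finally:
    os.remove(path)
```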

4.2 Infinite Sequences

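A generator produces values only on demand, so it can represent an infinite sequence. The consumer decides how many values to take.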

def fibonacci():
    """Generate the infinite Fibonacci sequence"""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Use the infinite generator
fib = fibonacci()
for _ in range(10):
    print(next(fib))  # Print the first 10 Fibonacci numbers
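An infinite generator must always be consumed with some bound. Besides a counted for loop, the standard itertools.islice and itertools.takewhile functions express that bound directly, as this short sketch shows.

```python
from itertools import islice, takewhile

def fibonacci():
    """Infinite Fibonacci sequence: 0, 1, 1, 2, 3, ..."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# The first 10 Fibonacci numbers
first_ten = list(islice(fibonacci(), 10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# All Fibonacci numbers below 100 (takewhile stops at the first value >= 100)
below_100 = list(takewhile(lambda n: n < 100, fibonacci()))
print(below_100)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```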

4.3 Coroutines

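A generator can act as a coroutine that receives values. The send() method passes a value into the generator, where it becomes the result of the paused yield expression. Before the first send() with a value, the generator must be started (primed) with next() or send(None) so that it is paused at a yield.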

def coroutine():
    while True:
        value = yield
        print(f"Received: {value}")

# Use the coroutine
c = coroutine()
next(c)  # Prime the coroutine: run it to the first yield
c.send(1)  # Output: Received: 1
c.send(2)  # Output: Received: 2
c.close()  # Close the coroutine
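yield can also hand a value back to whoever called send(), which lets a coroutine keep state between calls. Below is a minimal sketch of a running-average coroutine (`averager` is an illustrative name). It still has to be primed with next() before the first real send().

```python
def averager():
    """Receive numbers via send() and yield the running average after each one."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average   # hand back the current average, then wait for the next value
        total += value
        count += 1
        average = total / count

avg = averager()
next(avg)              # prime: run up to the first yield (which yields None)
print(avg.send(10))    # 10.0
print(avg.send(20))    # 15.0
print(avg.send(30))    # 20.0
avg.close()
```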

4.4 Context Managers

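Generators can also be used to create context managers with the contextlib.contextmanager decorator. The code before yield runs when the with block is entered, and the code after yield runs when it is exited.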

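import time
from contextlib import contextmanager

@contextmanager
def timer():
    """Context manager that measures how long the with block takes"""
    start = time.perf_counter()  # high-resolution clock intended for measuring intervals
    try:
        yield  # the body of the with block runs here
    finally:
        # Runs even if the with block raises an exception
        end = time.perf_counter()
        print(f"Elapsed time: {end - start:.4f} seconds")

# Use the context manager
with timer():
    # Do some work
    for i in range(1000000):
        pass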
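The same decorator fits any "set up, hand over, restore" task. Whatever the generator yields is bound by `as`. If the with body raises, the exception is re-raised at the yield, so restoring code belongs in a finally clause. The sketch below uses a hypothetical `temporary_setting` helper that overrides one key of a plain dict.

```python
from contextlib import contextmanager

@contextmanager
def temporary_setting(config, key, value):
    """Set config[key] = value inside the with block, then restore the previous state."""
    missing = object()                 # sentinel: marks that the key was absent before
    old = config.get(key, missing)
    config[key] = value
    try:
        yield config                   # bound by `with ... as cfg`
    finally:
        if old is missing:
            del config[key]
        else:
            config[key] = old

settings = {"debug": False}
with temporary_setting(settings, "debug", True) as cfg:
    print(cfg["debug"])      # True
print(settings["debug"])     # False: restored on exit

try:
    with temporary_setting(settings, "verbose", 2):
        raise ValueError("boom")
except ValueError:
    pass                     # the exception still propagates to the caller
print("verbose" in settings)  # False: cleanup ran despite the exception
```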

4.5 Pipeline Pattern

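Generators can be chained into a data-processing pipeline. Each generator performs one processing step and pulls its input from the previous one.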

def read_file(file_path):
    """Read the file one line at a time"""
    with open(file_path, 'r', encoding='utf-8') as f:
        for line in f:
            yield line.strip()

def filter_lines(lines, keyword):
    """Keep only the lines that contain the keyword"""
    for line in lines:
        if keyword in line:
            yield line

def count_words(lines):
    """Count the words in each line"""
    for line in lines:
        yield len(line.split())

# Build the data-processing pipeline (no data is read yet)
file_path = 'data.txt'
lines = read_file(file_path)
filtered_lines = filter_lines(lines, 'python')
word_counts = count_words(filtered_lines)

# Consume the results; each value is pulled through all three stages
for count in word_counts:
    print(count)
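The same pipeline can be tried without a data.txt file by feeding it an in-memory text stream. This sketch uses io.StringIO with made-up sample text. Each stage is written as a generator expression, which is equivalent to the generator functions above for simple map and filter steps. `total_words` is an illustrative name, and no stage does any work until sum() starts pulling values.

```python
import io

def total_words(stream, keyword):
    """Read -> filter -> count pipeline built from generator expressions."""
    lines = (line.strip() for line in stream)                # stage 1: read lines lazily
    filtered = (line for line in lines if keyword in line)   # stage 2: keep matching lines
    counts = (len(line.split()) for line in filtered)        # stage 3: words per line
    return sum(counts)                                       # pulls values through all stages

sample = io.StringIO(
    "python generators are lazy\n"
    "rust iterators too\n"
    "python pipelines chain stages\n"
)
print(total_words(sample, 'python'))  # 8
```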

5. Built-in Functions and Modules

5.1 The `iter()` Function

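The iter() function creates an iterator. It accepts either an iterable, or a callable together with a sentinel value. In the second form, the callable is called repeatedly with no arguments, and iteration stops as soon as it returns a value equal to the sentinel.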

# Create an iterator from an iterable
my_list = [1, 2, 3]
iter_obj = iter(my_list)
print(next(iter_obj))  # Output: 1
print(next(iter_obj))  # Output: 2

# Using a sentinel value
import random
def random_number():
    return random.randint(1, 10)

# Keep calling random_number() until it returns 5 (the 5 itself is not printed)
for num in iter(random_number, 5):
    print(num)
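A common real-world use of the two-argument form reads a binary stream in fixed-size chunks until read() returns the empty bytes object b''. This sketch uses functools.partial, with an in-memory io.BytesIO in place of a real file. The 4-byte chunk size is arbitrary.

```python
import io
from functools import partial

stream = io.BytesIO(b"abcdefghij")

# Call stream.read(4) repeatedly; stop as soon as it returns b'' (end of stream).
chunks = list(iter(partial(stream.read, 4), b''))
print(chunks)  # [b'abcd', b'efgh', b'ij']
```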

5.2 The `next()` Function

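The next() function returns the next element of an iterator. It optionally takes a default value, which is returned instead of raising StopIteration once the iterator is exhausted.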

my_iter = iter([1, 2, 3])
print(next(my_iter))  # Output: 1
print(next(my_iter))  # Output: 2
print(next(my_iter))  # Output: 3
print(next(my_iter, 'End'))  # Output: End
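Combined with a generator expression, next() with a default gives a compact "find the first match, or fall back" idiom. Scanning stops at the first hit. The `users` list in this sketch is made-up sample data.

```python
users = [
    {"name": "alice", "active": False},
    {"name": "bob", "active": True},
    {"name": "carol", "active": True},
]

# First active user, or None if there is none; scanning stops at "bob".
first_active = next((u["name"] for u in users if u["active"]), None)
print(first_active)  # bob

# With no match, the default is returned instead of raising StopIteration.
print(next((u for u in users if u["name"] == "dave"), "not found"))  # not found
```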

5.3 The `itertools` Module

The itertools module provides many utility functions for creating and combining iterators.

import itertools

# Infinite iterator: count
count = itertools.count(start=1, step=2)
for _ in range(5):
    print(next(count))  # Output: 1, 3, 5, 7, 9

# Cycling iterator: cycle
cycle = itertools.cycle(['A', 'B', 'C'])
for _ in range(5):
    print(next(cycle))  # Output: A, B, C, A, B

# Repeating iterator: repeat
repeat = itertools.repeat('Hello', 3)
for item in repeat:
    print(item)  # Output: Hello, Hello, Hello

# Cartesian product
product = itertools.product([1, 2], ['a', 'b'])
print(list(product))  # Output: [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]

# Permutations of length 2 (order matters)
permutations = itertools.permutations([1, 2, 3], 2)
print(list(permutations))  # Output: [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]

# Combinations of length 2 (order does not matter)
combinations = itertools.combinations([1, 2, 3], 2)
print(list(combinations))  # Output: [(1, 2), (1, 3), (2, 3)]
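Here are a few more itertools functions that often come up when composing iterators. All are in the standard library. Note that groupby only groups *consecutive* items with equal keys, so its input is usually sorted by the same key first.

```python
import itertools

# chain: iterate over several iterables as one sequence
print(list(itertools.chain([1, 2], (3, 4), range(5, 7))))  # [1, 2, 3, 4, 5, 6]

# islice: slice any iterator lazily (start, stop, step)
print(list(itertools.islice(itertools.count(), 2, 10, 3)))  # [2, 5, 8]

# accumulate: running totals (addition by default)
print(list(itertools.accumulate([1, 2, 3, 4])))  # [1, 3, 6, 10]

# groupby: group consecutive items sharing a key; sort first so equal keys are adjacent
words = sorted(["apple", "avocado", "banana", "blueberry", "cherry"], key=lambda w: w[0])
groups = {k: list(g) for k, g in itertools.groupby(words, key=lambda w: w[0])}
print(groups)  # {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```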

6. Practical Use Cases

6.1 Processing Large Datasets

Generators are well suited to processing large datasets because they never need to load all of the data into memory at once.

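def process_large_dataset(file_path):
    """Process a large dataset one line at a time"""
    with open(file_path, 'r', encoding='utf-8') as f:
        for line in f:
            # Split each line into comma-separated fields
            data = line.strip().split(',')
            # Do the real per-record processing here; this version yields the fields as-is
            yield data

# Process the data with the generator
for data in process_large_dataset('large_dataset.csv'):
    # Handle each processed record
    pass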
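Sometimes each record is cheap to handle but downstream work (such as a database insert) is cheaper in bulk. A generator can then regroup a lazy record stream into fixed-size batches without ever holding the whole dataset. In this sketch, `batched` is a hand-written helper. Python 3.12 added itertools.batched with similar behavior, but it returns tuples. The CSV text is made-up sample data read with the standard csv module.

```python
import csv
import io
from itertools import islice

def batched(iterable, size):
    """Yield lists of up to `size` items from iterable, lazily."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

sample = io.StringIO("id,value\n1,10\n2,20\n3,30\n4,40\n5,50\n")
rows = csv.DictReader(sample)          # parses one row at a time, on demand

for batch in batched(rows, 2):
    print([int(row["value"]) for row in batch])
# [10, 20]
# [30, 40]
# [50]
```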

6.2 Generating Infinite Data Streams

Generators can produce infinite data streams, such as sensor readings or random numbers.

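import random
import time
from itertools import islice

def sensor_data():
    """Simulate sensor readings"""
    while True:
        # Simulate reading a value from a sensor
        data = {
            'timestamp': time.time(),
            'value': random.uniform(0, 100)
        }
        yield data
        time.sleep(1)  # produce one data point per second

# Read sensor data from the generator (islice bounds the endless stream to 5 readings)
for data in islice(sensor_data(), 5):
    print(data)
    # Process the sensor data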
    # 处理传感器数据

6.3 Implementing Custom Iterators

We can create a custom iterator by implementing the __iter__() and __next__() methods.

class ReverseIterator:
    """反向迭代器"""
    def __init__(self, data):
        self.data = data
        self.index = len(data) - 1
    
    def __iter__(self):
        return self
    
    def __next__(self):
        if self.index < 0:
            raise StopIteration
        value = self.data[self.index]
        self.index -= 1
        return value

# Use the reverse iterator
for item in ReverseIterator([1, 2, 3, 4, 5]):
    print(item)  # Output: 5, 4, 3, 2, 1
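ReverseIterator is single-use: after one loop its index is below zero, and any further loop yields nothing. If an object should support repeated iteration, a common design makes the container's __iter__ a generator function. Every for loop then gets a fresh iterator. `ReversedView` in this sketch is an illustrative name.

```python
class ReversedView:
    """Re-iterable reverse view: each iter() call creates a new generator."""
    def __init__(self, data):
        self.data = data

    def __iter__(self):
        # A generator function as __iter__: fresh state for every loop
        for index in range(len(self.data) - 1, -1, -1):
            yield self.data[index]

view = ReversedView([1, 2, 3])
print(list(view))  # [3, 2, 1]
print(list(view))  # [3, 2, 1]: works again, unlike ReverseIterator
```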

6.4 Lazy Computation with Generators

Generators enable lazy computation. Each step of the function body runs only when the next value is requested.

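def lazy_calculation():
    """Lazy computation: each step runs only when the next value is requested"""
    print("Starting calculation...")
    yield 1
    print("Continuing calculation...")
    yield 2
    print("Finishing calculation...")
    yield 3

# Use the lazy computation
gen = lazy_calculation()
print(next(gen))  # Output: Starting calculation... then 1
print(next(gen))  # Output: Continuing calculation... then 2
print(next(gen))  # Output: Finishing calculation... then 3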