[AI Large Models] DAY 1: Learning LangChain from Scratch

Original post · last modified 2026-02-05 16:51:01


This content is licensed under CC 4.0 BY-SA.


Tags

#python #langchain #大模型 #agent

First published 2026-02-04 23:42:49


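Summary

These notes follow the official documentation, with extra comments and explanations added where they help. Basic Python syntax is all you need to follow along; I have checked that myself. (Not finished yet!)

1. Agents

Overview

An agent combines a language model with tools to build a system that can reason about a task, decide which tools to use, and iterate toward a solution. The agent runs tools in a loop until a stop condition is met: either the model emits a final answer or the iteration limit is reached.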
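
The loop described above can be sketched in plain Python. This is only an illustration under stated assumptions: `fake_model`, `TOOLS`, and `run_agent` are hypothetical stand-ins invented for this sketch, nothing here uses LangChain, and the real agent runs on a graph runtime rather than a `for` loop.

```python
# Hypothetical stand-ins: a scripted "model" and one toy tool.
def fake_model(messages):
    """Request the weather tool once, then produce a final answer."""
    if not any(m["role"] == "tool" for m in messages):
        return {"role": "assistant", "content": "",
                "tool_calls": [{"name": "get_weather",
                                "args": {"location": "Paris"},
                                "id": "call_1"}]}
    return {"role": "assistant",
            "content": f"Answer based on: {messages[-1]['content']}",
            "tool_calls": []}

TOOLS = {"get_weather": lambda location: f"Weather in {location}: sunny"}

def run_agent(user_input, max_iterations=5):
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_iterations):          # stop condition 2: iteration limit
        reply = fake_model(messages)
        messages.append(reply)
        if not reply["tool_calls"]:          # stop condition 1: final answer
            return messages
        for call in reply["tool_calls"]:     # run every requested tool
            result = TOOLS[call["name"]](**call["args"])
            messages.append({"role": "tool", "content": result,
                             "tool_call_id": call["id"]})
    return messages

history = run_agent("What is the weather in Paris?")
print(history[-1]["content"])
```

The history ends up with four entries: the user message, the assistant's tool request, the tool result, and the final answer.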

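Core components

1. Model

The model is the agent's reasoning engine. Both static and dynamic model selection are supported.

Static model

A static model is configured once when the agent is created and stays the same throughout execution.

Option 1: a model identifier string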

from langchain.agents import create_agent

agent = create_agent(
    "openai:gpt-5",  # model identifier
    tools=tools
)

The provider can be inferred from the model name: for example, "gpt-5" is resolved to "openai:gpt-5".

Option 2: pass a model instance directly (recommended)

from langchain.agents import create_agent
from langchain_openai import ChatOpenAI

model = ChatOpenAI(
    model="gpt-5",
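    temperature=0.1,      # sampling randomness; lower values are more deterministic
    max_tokens=1000,      # maximum number of tokens to generate
    timeout=30,           # request timeout in seconds
    # ... other parameters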
)

agent = create_agent(model, tools=tools)

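This approach gives you full control over the model configuration.

Dynamic model

A dynamic model is chosen at runtime from the current state and context, which allows more elaborate routing logic and cost optimization.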

from langchain_openai import ChatOpenAI
from langchain.agents import create_agent
from langchain.agents.middleware import wrap_model_call, ModelRequest, ModelResponse

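# Create two model instances
basic_model = ChatOpenAI(model="gpt-4o-mini")  # lightweight model, lower cost
advanced_model = ChatOpenAI(model="gpt-4o")     # stronger model, better quality

# @wrap_model_call is a decorator: it marks this function as a wrapper around the model call
@wrap_model_call
def dynamic_model_selection(request: ModelRequest, handler) -> ModelResponse:
    """
    request: the model request object, which carries the state and other information
    handler: the handler that performs the actual model call
    -> ModelResponse: the model response is returned
    """
    # Number of messages in the current conversation
    message_count = len(request.state["messages"])

    # Pick a model based on the message count
    if message_count > 10:
        model = advanced_model  # long conversation: use the stronger model
    else:
        model = basic_model     # short conversation: use the lightweight model

    # Replace the model on the request
    request.model = model

    # Call the actual model handler
    return handler(request)

# Create the agent
agent = create_agent(
    model=basic_model,          # default model (the middleware can override it)
    tools=tools,                # tool list (assumed to be defined elsewhere)
    middleware=[dynamic_model_selection]  # middleware list
    # At run time: user input → middleware → model inference → output
)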

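Note: pre-bound models (models on which bind_tools has already been called) are not supported when structured output is used.

2. Tools

Tools give the agent the ability to take actions. Agents support:

Multiple tool calls in sequence, triggered by a single prompt
Parallel tool calls where appropriate
Dynamic tool selection based on earlier results
Tool retry logic and error handling
State persistence across tool calls

Defining tools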

from langchain.tools import tool
from langchain.agents import create_agent

@tool
def search(query: str) -> str:
    """Search for information."""
    return f"Results for: {query}"

@tool
def get_weather(location: str) -> str:
    """Get weather information for a location."""
    return f"Weather in {location}: sunny, 72°F"

agent = create_agent(model, tools=[search, get_weather])

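If you pass an empty tool list, the agent consists of a single LLM node and has no tool-calling capability.

Tool error handling

To customize how tool errors are handled: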

from langchain.agents import create_agent
from langchain.agents.middleware import wrap_tool_call
from langchain_core.messages import ToolMessage

@wrap_tool_call
def handle_tool_errors(request, handler):
    """Handle tool execution errors with a custom message."""
    try:
        return handler(request)
    except Exception as e:
        # Return a custom error message to the model
        return ToolMessage(
            content=f"Tool error: please check your input and try again. ({e})",
            tool_call_id=request.tool_call["id"]
        )

agent = create_agent(
    model="openai:gpt-4o",
    tools=[search, get_weather],
    middleware=[handle_tool_errors]
)

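3. System prompt

The system prompt shapes how the agent behaves.

Basic usage

agent = create_agent(
    model,
    tools,
    system_prompt="You are a helpful assistant. Be concise and accurate."
)

Dynamic system prompt

Generate the prompt dynamically from the runtime context: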

from typing import TypedDict
from langchain.agents import create_agent
from langchain.agents.middleware import dynamic_prompt, ModelRequest

class Context(TypedDict):
    user_role: str

@dynamic_prompt
def user_role_prompt(request: ModelRequest) -> str:
    """Generate the system prompt based on the user's role."""
    user_role = request.runtime.context.get("user_role")
    base_prompt = "You are a helpful assistant."

    if user_role == "expert":
        return f"{base_prompt} Provide detailed technical responses."
    elif user_role == "beginner":
        return f"{base_prompt} Explain concepts simply and avoid jargon."

    return base_prompt

agent = create_agent(
    model="openai:gpt-4o",
    tools=[web_search],
    middleware=[user_role_prompt],
    context_schema=Context
)

# The system prompt is set dynamically from the context
result = agent.invoke(
    {"messages": [{"role": "user", "content": "Explain machine learning"}]},
    context={"user_role": "expert"}
)

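Invoking the agent

Basic invocation

result = agent.invoke(
    {"messages": [{"role": "user", "content": "What's the weather in San Francisco?"}]}
)

The agent follows the LangGraph Graph API and supports all of the associated methods.

Advanced concepts

Structured output

An agent can return its output in a specific format. There are two strategies:

1. ToolStrategy

ToolStrategy produces structured output through an artificial tool call, so it works with any model that supports tool calling: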

# Imports
from pydantic import BaseModel  # Pydantic handles data validation
from langchain.agents import create_agent  # main function for creating an agent
from langchain.agents.structured_output import ToolStrategy  # structured-output strategy based on tool calling

# Step 1: define the data structure
# BaseModel is the Pydantic base class for data models
class ContactInfo(BaseModel):
    # Three required string fields
    name: str   # name
    email: str  # email address
    phone: str  # phone number

# Step 2: create an agent that uses ToolStrategy
agent = create_agent(
    model="openai:gpt-4o-mini",
    tools=[search_tool],  # search_tool is assumed to be defined elsewhere
    response_format=ToolStrategy(ContactInfo)
)

# Step 3: invoke the agent
result = agent.invoke({
    # the user message
    "messages": [{
        "role": "user",
        "content": "Extract contact info: John Doe, john@example.com, (555) 123-4567"
    }]
})

# Step 4: read the structured response
result["structured_response"]
# Output: ContactInfo(name='John Doe', email='john@example.com', phone='(555) 123-4567')
# This is a ContactInfo object, not a plain string

# Using the object:
contact = result["structured_response"]
print(contact.name)   # "John Doe"
print(contact.email)  # "john@example.com"
print(contact.phone)  # "(555) 123-4567"

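2. ProviderStrategy

ProviderStrategy uses the model provider's native structured output (response_format). It is more reliable but only works with providers that support it: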

from langchain.agents.structured_output import ProviderStrategy

agent = create_agent(
    model="openai:gpt-4o",  
    response_format=ProviderStrategy(ContactInfo)
)

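Note: as of LangChain 1.0 you must use ToolStrategy or ProviderStrategy explicitly; passing a schema directly is no longer supported.

Memory

An agent keeps the conversation history automatically through its message state. You can also configure a custom state schema.

Memory option 1: define the state through middleware (recommended)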

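# Imports
from typing import Any
from langchain.agents import AgentState, create_agent  # AgentState is the base class for agent state
from langchain.agents.middleware import AgentMiddleware  # base class for middleware

# Step 1: define a custom state class
# It inherits from AgentState, the base class of all agent states
class CustomState(AgentState):
    """
    Custom agent state.
    By default the agent state only has a messages field (the conversation history).
    Here we add a user_preferences field to store the user's preferences.
    """
    user_preferences: dict  # new field: user preference settings
    # AgentState already contains the messages field,
    # so CustomState effectively has two fields: messages + user_preferences

# Step 2: create a custom middleware
class CustomMiddleware(AgentMiddleware):
    """
    Custom middleware.
    Middleware can insert custom logic at different stages of agent execution.
    """

    # The state schema this middleware uses
    state_schema = CustomState
    # This tells the agent: "this middleware needs state of type CustomState"

    # Tools managed by this middleware (optional)
    tools = [tool1, tool2]  # tool1 and tool2 are assumed to be defined elsewhere
    # These tools can access the extra fields in CustomState

    # Hook that runs before each model call
    def before_model(self, state: CustomState, runtime) -> dict[str, Any] | None:
        """
        Called before model inference.
        state: the current agent state (messages and user_preferences)
        runtime: the runtime context
        Returns a dict to update the state, or None to leave it unchanged.
        """
        # Example: adapt the prompt to the user's preferences
        if "user_preferences" in state:
            preferences = state["user_preferences"]
            if preferences.get("style") == "technical":
                # For a technical user, this is where you could add
                # technical hints, for example by modifying the messages
                pass

        # Return None when the state is not modified
        return None

    # Other hooks you can define:
    # after_model() - runs after each model call
    # before_agent() / after_agent() - run at the start / end of an agent run
    # wrap_model_call() / wrap_tool_call() - wrap a model call / a tool call

# Step 3: create the agent with the custom middleware
agent = create_agent(
    model,  # model instance
    tools=tools,  # the agent's main tool list
    middleware=[CustomMiddleware()]  # add the custom middleware
    # Execution order: user input → middleware → model inference → output
)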

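# Step 4: invoke the agent and pass in the custom state
result = agent.invoke({
    # Required: message history (part of the default agent state)
    "messages": [{
        "role": "user",
        "content": "I prefer technical explanations"
    }],

    # Custom state field (defined in CustomState)
    "user_preferences": {
        "style": "technical",      # prefers technical explanations
        "verbosity": "detailed",   # prefers detailed explanations
        # more custom keys can be added
        "language": "zh",         # language preference
        "expertise_level": "advanced"  # expertise level
    }

    # How the state is passed around:
    # 1. The dict passed here becomes the initial state (typed as CustomState)
    # 2. Middleware can read and update this state
    # 3. The state is kept for the whole run of this invocation
})

# Step 5: later invocations
# State carries over to the next call only if the agent was created with a
# checkpointer and both calls use the same thread_id in their config.
# Without that, every invoke starts from a fresh state.
result2 = agent.invoke({
    "messages": [{
        "role": "user",
        "content": "Please explain neural networks"
    }]
    # With a checkpointer and a shared thread_id, user_preferences does not
    # need to be passed again and the middleware can keep using it
})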

Memory option 2: define the state through state_schema

This option does not depend on middleware.

from langchain.agents import AgentState

class CustomState(AgentState):
    user_preferences: dict

agent = create_agent(
    model,
    tools=[tool1, tool2],
    state_schema=CustomState
)

Note: as of LangChain 1.0 a custom state must be a TypedDict type; Pydantic models and dataclasses are no longer supported.

Streaming

Show intermediate progress while the agent runs:

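# Stream the run with agent.stream()
# Initial state: a single user message
for chunk in agent.stream({
    "messages": [{"role": "user", "content": "Search for AI news and summarize the findings"}]
}, stream_mode="values"):
    # chunk: a full snapshot of the state at this point in the run
    # stream_mode="values": stream the complete state object after each step

    # Latest message (the last one)
    latest_message = chunk["messages"][-1]
    # chunk["messages"] holds the entire message history so far

    # Check what kind of message the latest one is
    if latest_message.content:
        # Text content (for example, text produced by the model)
        print(f"Agent: {latest_message.content}")
        # In "values" mode this prints once per step with the whole message,
        # not token by token
    elif latest_message.tool_calls:
        # A tool call (the model decided to call a tool)
        print(f"Calling tools: {[tc['name'] for tc in latest_message.tool_calls]}")
        # tc['name'] is the tool name, e.g. "search_news"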
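
To make the "values" mode concrete: each chunk is the whole state after a step, not a delta. The generator below is a hypothetical stand-in for `agent.stream`, written in plain Python without LangChain; `fake_stream_values` and the message dicts are invented for this sketch.

```python
def fake_stream_values(steps):
    """Yield a full state snapshot after each step, like stream_mode="values"."""
    messages = []
    for message in steps:
        messages.append(message)
        yield {"messages": list(messages)}   # copy, so each snapshot is complete

steps = [
    {"type": "human", "content": "Search for AI news"},
    {"type": "ai", "content": "", "tool_calls": ["search"]},
    {"type": "tool", "content": "3 articles found"},
    {"type": "ai", "content": "Summary: ..."},
]

sizes = []
for chunk in fake_stream_values(steps):
    latest = chunk["messages"][-1]
    sizes.append(len(chunk["messages"]))
    # Show either the text or, when there is none, the requested tools
    print(len(chunk["messages"]), latest["type"],
          latest["content"] or latest.get("tool_calls"))
```

The snapshot grows by one message per step, which is why reading `chunk["messages"][-1]` picks up whatever was added most recently.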

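Regular invocation

# agent.invoke() blocks until the whole run has finished
result = agent.invoke({"messages": [...]})
# ...some seconds later...
print(result["messages"][-1].content)

Everything is returned at once; no intermediate progress is visible.

What does the message structure look like?

The listing below is a simplified, dict-style illustration. At runtime the entries are message objects such as HumanMessage, AIMessage, and ToolMessage.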

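{
    "messages": [
        # 1: the user message
        {
            "role": "user",
            "content": "Search for AI news and summarize the findings",
            "type": "human"  # type: human means the user
        },

        # 2: the AI thinking out loud
        {
            "role": "assistant",
            "content": "I'll search for AI news. First I need to call the search tool...",
            "type": "ai"  # type: ai means the model
        },

        # 3: the AI decides to call a tool
        {
            "role": "assistant",
            "content": None,  # no text content
            "type": "ai",
            "tool_calls": [  # list of tool calls
                {
                    "name": "search_news",  # tool name
                    "args": {               # tool arguments
                        "query": "AI news 2024"
                    },
                    "id": "call_abc123"     # unique call ID
                }
            ]
        },

        # 4: the tool returns its result
        {
            "role": "tool",
            "content": "Found 3 articles: 1. OpenAI released... 2. Google...",  # content returned by the tool
            "type": "tool",
            "tool_call_id": "call_abc123"  # which tool call this answers
        },

        # 5: the AI keeps thinking
        {
            "role": "assistant",
            "content": "Based on the search results, let me summarize...",
            "type": "ai",
            "tool_calls": None  # no tool call this time
        },

        # 6: the AI's final answer
        {
            "role": "assistant",
            "content": "Summary: there have been three major developments in AI recently...",
            "type": "ai"
        }
    ]
}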
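
One thing the structure above implies is that every tool result points back to a tool call through its ID. The helper below is a small sketch of that pairing check on dict-style messages; `unanswered_tool_calls` is a hypothetical name, and the code works on plain dicts rather than LangChain message objects.

```python
def unanswered_tool_calls(messages):
    """Return the ids of tool calls that no later tool message answers."""
    pending = []
    for msg in messages:
        for call in msg.get("tool_calls") or []:
            pending.append(call["id"])
        if msg.get("type") == "tool" and msg.get("tool_call_id") in pending:
            pending.remove(msg["tool_call_id"])
    return pending

conversation = [
    {"type": "human", "content": "Search AI news"},
    {"type": "ai", "content": None,
     "tool_calls": [{"name": "search_news",
                     "args": {"query": "AI news"},
                     "id": "call_abc123"}]},
    {"type": "tool", "content": "3 articles", "tool_call_id": "call_abc123"},
    {"type": "ai", "content": "Summary ..."},
]

print(unanswered_tool_calls(conversation))       # every call has a result
print(unanswered_tool_calls(conversation[:2]))   # the call is still open
```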

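Middleware

Middleware makes agent behavior highly extensible: it can intercept and modify the data flow at different stages of execution.

What middleware can do:

Process state before the model is called (message trimming, context injection)
Modify or validate the model's response (guardrails, content filtering)
Handle tool execution errors with custom logic
Implement dynamic model selection based on state or context
Add custom logging, monitoring, or analytics

Common decorators:

@before_model: runs before the model call
@after_model: runs after the model call
@wrap_tool_call: wraps a tool call
@dynamic_prompt: generates the prompt dynamically
@wrap_model_call: wraps the model call

Example: a custom logging middleware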

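from langchain.agents import create_agent
from langchain.agents.middleware import before_model

@before_model
def log_request(state, runtime):
    """Log each model request."""
    print(f"Calling model, message count: {len(state.get('messages', []))}")
    print(f"Current state: {state}")
    return None  # leave the state unchanged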

agent = create_agent(
    model="openai:gpt-4o",
    tools=tools,
    middleware=[log_request]
)
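
The "wrap" style of middleware is essentially function composition: each middleware receives the request plus a handler and decides when to call it. Here is a minimal plain-Python sketch of that idea. `compose`, `logging_mw`, `uppercase_mw`, and `base_model` are hypothetical names, and the nesting order (first in the list is outermost) is a choice made for this sketch, not a statement about LangChain internals.

```python
def compose(middlewares, base_handler):
    """Wrap base_handler so the first middleware in the list is outermost."""
    handler = base_handler
    for mw in reversed(middlewares):
        # Bind mw and the current handler as defaults to avoid late binding
        handler = (lambda request, mw=mw, inner=handler: mw(request, inner))
    return handler

log = []

def logging_mw(request, handler):
    log.append(f"before: {len(request['messages'])} messages")
    response = handler(request)      # delegate to the next layer
    log.append("after")
    return response

def uppercase_mw(request, handler):
    return handler(request).upper()  # post-process the response

def base_model(request):
    return "hello from the model"

call = compose([logging_mw, uppercase_mw], base_model)
result = call({"messages": ["hi"]})
print(result)
print(log)
```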

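2. Models

The model is the agent's reasoning engine: it decides which tools to call, how to interpret the results, and when to give the final answer.

Basic usage

Initializing a model

Option 1: `init_chat_model` (recommended)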

import os
from langchain.chat_models import init_chat_model

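# Set the API key through an environment variable
os.environ["OPENAI_API_KEY"] = "sk-..."

# Initialize a model (the provider is inferred automatically)
model = init_chat_model("gpt-4.1")  # equivalent to "openai:gpt-4.1"

# Or name the provider explicitly
model = init_chat_model("anthropic:claude-sonnet-4-5")

# Pass configuration parameters
model = init_chat_model(
    "openai:gpt-4o",
    temperature=0.7,      # sampling randomness; higher values give more varied output
    max_tokens=1000,      # limit on output length
    timeout=30,           # timeout in seconds
    max_retries=3         # number of retries on failure
)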

Option 2: use the provider classes directly

from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic

# OpenAI
model = ChatOpenAI(
    model="gpt-4.1",
    temperature=0.5,
    max_tokens=500
)

# Anthropic
model = ChatAnthropic(
    model="claude-sonnet-4-5",
    temperature=0.3,
    max_tokens=1000
)

# Google Gemini
from langchain_google_genai import ChatGoogleGenerativeAI
model = ChatGoogleGenerativeAI(model="gemini-2.5-flash-lite")

# Azure OpenAI
from langchain_openai import AzureChatOpenAI
model = AzureChatOpenAI(
    model="gpt-4.1",
    azure_deployment="your-deployment-name",
    api_version="2025-03-01-preview"
)

Supported model providers

# Major providers and how to initialize them
# (each call needs that provider's integration package and credentials)
providers = {
    "OpenAI": init_chat_model("openai:gpt-4o"),
    "Anthropic": init_chat_model("anthropic:claude-sonnet-4-5"),
    "Google": init_chat_model("google_genai:gemini-2.0-flash"),
    "Azure": init_chat_model("azure_openai:gpt-4.1"),
    "AWS Bedrock": init_chat_model("bedrock_converse:anthropic.claude-3-5-sonnet"),
    "Local model": init_chat_model("ollama:llama3.1"),  # requires a local Ollama installation
}

Invocation methods

1. `invoke()` - synchronous call

# Simplest form: pass a string
response = model.invoke("Why is the sky blue?")
print(response.text)  # the text content
print(response)       # the full AIMessage object

# Pass a list of messages (conversation history)
from langchain_core.messages import HumanMessage, SystemMessage, AIMessage

messages = [
    SystemMessage("You are a helpful translation assistant"),
    HumanMessage("Translate 'Hello' into French"),
    AIMessage("Bonjour"),
    HumanMessage("Translate 'Goodbye' into French")
]

response = model.invoke(messages)
print(response.text)  # e.g. "Au revoir"

# Messages in dict format
messages_dict = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hello"}
]
response = model.invoke(messages_dict)
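2. `stream()` - streaming call

# Output arrives chunk by chunk and is shown as it is generated
for chunk in model.stream("Tell a short story about AI"):
    print(chunk.text, end="", flush=True)  # print each chunk as it arrives
# Output: Once... upon... a... time... there... was... an... AI... (appears gradually)

# Stream with a conversation history
messages = [HumanMessage("Explain what machine learning is")]
for chunk in model.stream(messages):
    if chunk.text:
        print(chunk.text, end="", flush=True)

# Aggregate the streamed chunks
full_response = None
for chunk in model.stream("What's the weather like?"):
    full_response = chunk if full_response is None else full_response + chunk
    print(full_response.text)  # shows the text accumulated so far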
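3. `batch()` - batch call

# Process several independent requests in one batch
questions = [
    "What is artificial intelligence?",
    "Explain machine learning",
    "What is deep learning?",
    "How do neural networks work?"
]

# Synchronous batch call
responses = model.batch(questions)
for i, response in enumerate(responses):
    print(f"Question {i+1}: {response.text[:50]}...")

# Batch call with configuration
responses = model.batch(
    questions,
    config={
        'max_concurrency': 2,  # maximum number of parallel calls
        'tags': ['batch_demo']
    }
)

# Results in completion order
# batch_as_completed yields (index, message) tuples; index is the position in questions
for index, response in model.batch_as_completed(questions):
    print(f"Finished question {index + 1}: {response.text[:30]}...")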
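
Why an index comes back with each result is easier to see in a standalone sketch. The code below imitates completion-order batching with the standard library only; `fake_model_call` and `batch_as_completed_sketch` are hypothetical names for this illustration and are not LangChain functions.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def fake_model_call(question):
    """Hypothetical stand-in for a model call; longer questions take longer."""
    time.sleep(0.01 * len(question))
    return f"answer to {question!r}"

def batch_as_completed_sketch(questions, max_concurrency=2):
    """Yield (index, result) pairs in completion order, not input order."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        futures = {pool.submit(fake_model_call, q): i
                   for i, q in enumerate(questions)}
        for future in as_completed(futures):
            yield futures[future], future.result()

questions = ["a long question here", "short", "medium one"]
results = dict(batch_as_completed_sketch(questions))

# The index lets us put the answers back into input order
ordered = [results[i] for i in range(len(questions))]
print(ordered)
```

Because results can arrive in any order, the index is the only reliable way to match an answer to its question.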

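Tool calling

Defining and binding tools

from langchain.tools import tool

# Define tool functions
@tool
def get_weather(location: str) -> str:
    """Get weather information for the given city."""
    # A real weather API could be called here
    return f"Weather in {location}: sunny, 25°C"

@tool
def calculator(expression: str) -> str:
    """Evaluate a math expression."""
    try:
        result = eval(expression)  # Warning: eval on model-generated input is unsafe; use a proper expression parser in real code
        return f"{expression} = {result}"
    except Exception:
        return "Could not evaluate the expression"

# Bind the tools to the model
model_with_tools = model.bind_tools([get_weather, calculator])

# You can also force tool use
model_forced = model.bind_tools(
    [get_weather],
    tool_choice="any"  # or "get_weather" to force that specific tool
)

tool_choice="auto" - the model decides whether to use a tool (the default)
tool_choice="any" - the model must call one of the tools
tool_choice="<tool name>" - the model must call the named tool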
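The tool-calling loop

from langchain_core.messages import HumanMessage, ToolMessage

# 1. The model generates tool calls
messages = [HumanMessage("What's the weather in Beijing?")]
ai_response = model_with_tools.invoke(messages)

# Check whether there are tool calls
if ai_response.tool_calls:
    print(f"The model wants to call {len(ai_response.tool_calls)} tool(s)")

    # The AI message carrying the tool calls must come before the tool results
    messages.append(ai_response)

    # 2. Execute every tool
    for tool_call in ai_response.tool_calls:
        tool_name = tool_call["name"]
        tool_args = tool_call["args"]

        print(f"Calling tool: {tool_name}({tool_args})")

        # Run the matching tool with its argument dict
        if tool_name == "get_weather":
            result = get_weather.invoke(tool_args)
        elif tool_name == "calculator":
            result = calculator.invoke(tool_args)
        else:
            result = f"Unknown tool: {tool_name}"

        # 3. Add the result as a tool message
        tool_msg = ToolMessage(
            content=result,
            tool_call_id=tool_call["id"]
        )
        messages.append(tool_msg)

    # 4. Send the results back to the model for the final answer
    final_response = model_with_tools.invoke(messages)
    print(f"Final answer: {final_response.text}")
else:
    print(f"Direct answer: {ai_response.text}")

When you use a model on its own (rather than an agent), you have to execute the tools yourself.
A bare model with manual execution is more work but gives you more control.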
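
When the number of tools grows, an if/elif chain gets awkward. One common alternative is a name-to-function registry. The sketch below shows that pattern on plain dicts; `TOOL_REGISTRY`, `run_tool_calls`, and the toy tools are hypothetical and stand in for real LangChain tools.

```python
def get_weather(location: str) -> str:
    return f"Weather in {location}: sunny, 25°C"

def add(a: int, b: int) -> str:
    return str(a + b)

# Map each tool name to the function that implements it
TOOL_REGISTRY = {"get_weather": get_weather, "add": add}

def run_tool_calls(tool_calls):
    """Execute each call and return one tool-message dict per call."""
    outputs = []
    for call in tool_calls:
        func = TOOL_REGISTRY.get(call["name"])
        if func is None:
            content = f"Unknown tool: {call['name']}"   # report, do not crash
        else:
            content = func(**call["args"])
        outputs.append({"role": "tool", "content": content,
                        "tool_call_id": call["id"]})
    return outputs

calls = [
    {"name": "get_weather", "args": {"location": "Beijing"}, "id": "call_1"},
    {"name": "add", "args": {"a": 2, "b": 3}, "id": "call_2"},
    {"name": "missing", "args": {}, "id": "call_3"},
]
tool_messages = run_tool_calls(calls)
for message in tool_messages:
    print(message)
```

Returning an error string for an unknown tool keeps the message sequence complete, so the model still gets one result per call ID.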

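Parallel tool calls

# The user asks about the weather in two cities
response = model_with_tools.invoke("What's the weather in Beijing and Shanghai right now?")

# Check whether the model generated tool calls
if response.tool_calls:
    print("Tools called in parallel:")

    # Iterate over all tool call requests
    for tool_call in response.tool_calls:
        # Print the details of each tool call
        # Example tool_call structure:
        # {
        #     "name": "get_weather",             # tool function name
        #     "args": {"location": "Beijing"},   # tool arguments
        #     "id": "call_abc123"                # unique ID of this call
        # }
        print(f"- {tool_call['name']}: {tool_call['args']}")

    # Two cities were asked about, so the model most likely produced two independent tool calls,
    # for example:
    # call 1: get_weather({"location": "Beijing"})
    # call 2: get_weather({"location": "Shanghai"})

    # Import the async module
    import asyncio

    # Async function that runs the tools concurrently
    async def execute_tools():
        tasks = []  # list of (call ID, task) pairs

        # Create an async task for each tool call
        for tool_call in response.tool_calls:
            # Only handle the weather tool
            if tool_call["name"] == "get_weather":
                # asyncio.create_task() schedules the coroutine on the running event loop
                # get_weather.ainvoke() is the async version of invoke; pass it the argument dict
                task = asyncio.create_task(
                    get_weather.ainvoke(tool_call["args"])
                )
                # Keep the task with its call ID so results can be matched later
                tasks.append((tool_call["id"], task))

        # Wait for all tasks and collect the results
        results = {}  # dict: {tool call ID: tool result}
        for tool_id, task in tasks:
            # await suspends this function until the task is done;
            # the other tasks keep running in the meantime
            results[tool_id] = await task

        return results  # all tool results

    # Note: this only defines the function. To actually run it you need an event loop.

    # Complete call examples for real use:
    """
    # Option 1: from an async entry point
    async def main():
        tool_results = await execute_tools()
        print(f"Tool results: {tool_results}")

    # Run it from synchronous code
    asyncio.run(main())

    # Option 2: in Jupyter and similar environments (a loop is already running)
    tool_results = await execute_tools()

    # Option 3: asyncio.run inside an already running loop (not recommended, but possible)
    import nest_asyncio
    nest_asyncio.apply()
    tool_results = asyncio.run(execute_tools())
    """

    # After the tools finish, the results go back to the model for the final answer
    # The full flow is:
    # 1. User asks → model generates tool calls
    # 2. Run all tool calls in parallel
    # 3. Wrap each tool result in a ToolMessage
    # 4. Send the tool results back to the model for the final summary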
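
For reference, here is a self-contained version of the concurrent-execution step that runs without a model. It is a sketch under assumptions: `fake_get_weather` is a hypothetical async tool whose `asyncio.sleep` stands in for network latency, and it uses `asyncio.gather` instead of awaiting tasks one by one.

```python
import asyncio

async def fake_get_weather(location: str) -> str:
    """Hypothetical async tool; the sleep stands in for network latency."""
    await asyncio.sleep(0.05)
    return f"Weather in {location}: sunny"

async def execute_tools(tool_calls):
    """Run all tool calls concurrently and map call id -> result."""
    coroutines = [fake_get_weather(**call["args"]) for call in tool_calls]
    results = await asyncio.gather(*coroutines)   # results keep input order
    return {call["id"]: result for call, result in zip(tool_calls, results)}

tool_calls = [
    {"name": "get_weather", "args": {"location": "Beijing"}, "id": "call_1"},
    {"name": "get_weather", "args": {"location": "Shanghai"}, "id": "call_2"},
]
tool_results = asyncio.run(execute_tools(tool_calls))
print(tool_results)
```

`asyncio.gather` returns results in the order the coroutines were passed in, which makes it straightforward to pair them with the original call IDs.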

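Streaming tool calls

# Stream the model output to watch it being generated
# The query contains two tasks: 1. a calculation 2. a weather lookup
for chunk in model_with_tools.stream("Compute (3+4)*5, then check the weather in Beijing"):
    # chunk is a small piece of output generated by the model in real time.
    # It may contain:
    # - text content (chunk.text)
    # - tool call chunks (chunk.tool_call_chunks)
    # - or both
    #
    # Illustrative order of the stream:
    # 1. optional text, e.g. "Let me calculate..."
    # 2. the calculator tool call, built up piece by piece
    # 3. the weather tool call, built up piece by piece
    # A bare model only emits the tool call requests. It does not run the
    # tools; executing them and sending the results back is up to your code.

    # Does this chunk carry tool call information?
    if chunk.tool_call_chunks:
        # tool_call_chunks are streamed fragments of tool calls.
        # A tool call can be large, so the model emits it in several pieces.
        #
        # Example of how one tool call is generated:
        # chunk 1: name='calculator'            # the tool name comes first
        # chunk 2: args='{"expression": "'      # the arguments start
        # chunk 3: args='(3+4)*5'               # argument content
        # chunk 4: args='"}'                    # the arguments end

        # Iterate over all tool call fragments in this chunk
        for tool_chunk in chunk.tool_call_chunks:
            # A fragment can carry the name key with a None value,
            # so test the value rather than the presence of the key
            if tool_chunk.get("name"):
                # Printed when a tool name appears, e.g.
                # Tool name: calculator
                # Tool name: get_weather
                print(f"Tool name: {tool_chunk['name']}")

            # Does this fragment carry argument text?
            if tool_chunk.get("args"):
                # The arguments arrive in several pieces, e.g.
                # Args chunk: {"expression": "
                # Args chunk: (3+4)*5
                # Args chunk: "}
                # Each piece is a fragment of a JSON string; concatenate
                # the pieces to get the complete JSON
                print(f"Args chunk: {tool_chunk['args']}")

    # Otherwise, if this chunk is plain text
    elif chunk.text:
        # Regular text generated by the model, shown incrementally
        print(chunk.text, end="", flush=True)
        # end="" avoids a newline
        # flush=True prints immediately without buffering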
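
Since the argument fragments only become valid JSON once concatenated, it helps to see the merge step in isolation. The sketch below groups fragments by index and joins their `args` strings. `merge_tool_call_chunks` and the sample fragments are hypothetical; the dict shape (name, args, id, index) is an assumption modeled on the fragments described above.

```python
import json

def merge_tool_call_chunks(chunks):
    """Group chunks by index, concatenate their args strings, then parse JSON."""
    merged = {}
    for chunk in chunks:
        slot = merged.setdefault(chunk["index"],
                                 {"name": None, "id": None, "args": ""})
        slot["name"] = chunk.get("name") or slot["name"]
        slot["id"] = chunk.get("id") or slot["id"]
        slot["args"] += chunk.get("args") or ""
    return [
        {"name": s["name"], "id": s["id"], "args": json.loads(s["args"] or "{}")}
        for _, s in sorted(merged.items())
    ]

# Hypothetical fragment sequence for two tool calls
chunks = [
    {"index": 0, "name": "calculator", "id": "call_1", "args": ""},
    {"index": 0, "name": None, "id": None, "args": '{"expression": "'},
    {"index": 0, "name": None, "id": None, "args": "(3+4)*5"},
    {"index": 0, "name": None, "id": None, "args": '"}'},
    {"index": 1, "name": "get_weather", "id": "call_2",
     "args": '{"location": "Beijing"}'},
]
tool_calls = merge_tool_call_chunks(chunks)
print(tool_calls)
```

In practice you rarely need to write this by hand: adding streamed message chunks together, as in the aggregation example for `stream()`, accumulates the fragments for you.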

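Structured output

Using Pydantic models

# Pydantic is used to define and validate data models
# BaseModel is the Pydantic base class that every data model inherits from
# Field defines metadata and validation rules for a field
# List is a type hint for list types
from pydantic import BaseModel, Field
from typing import List

# First data model: Person
# Inheriting from BaseModel gives validation, serialization, and more
class Person(BaseModel):
    """Personal information: stores and validates data about a person."""

    # Name: string, required
    # Field(...) marks the field as required
    name: str = Field(..., description="Name")

    # Age: integer, required
    # ge=0 sets the minimum to 0 (greater than or equal to 0)
    # le=150 sets the maximum to 150 (less than or equal to 150)
    # These rules are checked automatically when a Person is created
    age: int = Field(..., description="Age", ge=0, le=150)
    email: str = Field(..., description="Email address")

    # Hobbies: list of strings, optional, with a default
    # default_factory=list makes the default an empty list
    hobbies: List[str] = Field(default_factory=list, description="Hobbies")

# Second data model: Company
# This class shows a nested structure (a company contains a list of employees)
class Company(BaseModel):
    """Company information: basic company data plus an employee list."""

    # Company name: string, required
    name: str = Field(..., description="Company name")

    # Industry: string, required
    industry: str = Field(..., description="Industry")

    # Employees: list of Person objects, required
    # List[Person] means every element of the list is a Person
    # This gives a nested structure: Company → List[Person] → the fields of Person
    employees: List[Person] = Field(..., description="Employee list")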

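# ------------------------------------------------------------
# Using structured output
# ------------------------------------------------------------

# Turn the model into one that produces structured output
# with_structured_output(Person) tells the model:
# "return data in the shape of the Person class instead of free text"
structured_model = model.with_structured_output(Person)

# Call the model to extract structured information
# The model analyzes the text and extracts what fits the Person structure
response = structured_model.invoke("Extract: Zhang San, 30 years old, zhangsan@email.com, likes programming and reading")

# Print the full Person object
print(response)
# Example output: Person(name='Zhang San', age=30, email='zhangsan@email.com', hobbies=['programming', 'reading'])
# Note: this is a Person object, not a string

# ------------------------------------------------------------
# Accessing and validating structured data
# ------------------------------------------------------------

# Fields are accessed like normal object attributes
print(f"Name: {response.name}")        # Output: Name: Zhang San
print(f"Age: {response.age}")          # Output: Age: 30
print(f"Email: {response.email}")      # Output: Email: zhangsan@email.com
print(f"Hobbies: {response.hobbies}")  # Example output: Hobbies: ['programming', 'reading']

# Validation happens automatically:
# if the model extracts an age of -5, a validation error is raised
# because age has the ge=0 constraint

# ------------------------------------------------------------
# Using nested structures
# ------------------------------------------------------------

# Create a model that returns the Company structure
company_model = model.with_structured_output(Company)

# Call the model to extract company information
# The model has to identify the company name, the industry, and the employee list
# Note: Person requires age and email, which this text does not contain,
# so the model has to guess them or validation may fail
response = company_model.invoke("Apple Inc., technology industry, employees include Tim Cook and others")

# Access the nested data
print(f"Company: {response.name}")                   # Example output: Company: Apple Inc.
print(f"Industry: {response.industry}")              # Example output: Industry: technology
print(f"Employee count: {len(response.employees)}")  # Example output: Employee count: 1

# Access the first employee in the list
if response.employees:
    first_employee = response.employees[0]
    print(f"CEO: {first_employee.name}")      # Example output: CEO: Tim Cook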

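# ------------------------------------------------------------
# More usage examples
# ------------------------------------------------------------

# Example 1: extracting from a longer text
complex_text = """
Company: Microsoft
Industry: software and cloud computing
Employees:
1. Satya Nadella, 56, satya@microsoft.com, hobbies: reading, cricket
2. Brad Smith, 64, brad@microsoft.com, hobbies: writing, charity work
"""

response = company_model.invoke(complex_text)
print(f"Company: {response.name}")
print("Employees:")
for i, emp in enumerate(response.employees, 1):
    print(f"  {i}. {emp.name}, age {emp.age}, {emp.email}")

# Example 2: when validation fails
try:
    # An age above the limit triggers a validation error
    bad_response = structured_model.invoke("Li Si, 200 years old, lisi@test.com")
except Exception as e:
    print(f"Validation error: {e}")
    # Possible output: 1 validation error for Person
    #                  age
    #                    Input should be less than or equal to 150 [type=less_than_equal, ...]

# Example 3: using the default value
text_without_hobbies = "Wang Wu, 25 years old, wangwu@test.com"
response = structured_model.invoke(text_without_hobbies)
print(f"Hobbies (default): {response.hobbies}")  # Example output: Hobbies (default): []

# Example 4: converting to a dict or JSON (handy for storage or transport)
# Pydantic v2 uses model_dump() / model_dump_json(); dict() / json() are deprecated
person_dict = response.model_dump()
print(f"As dict: {person_dict}")
# Example output: {'name': 'Wang Wu', 'age': 25, 'email': 'wangwu@test.com', 'hobbies': []}

person_json = response.model_dump_json()
print(f"As JSON: {person_json}")
# Example output: {"name":"Wang Wu","age":25,"email":"wangwu@test.com","hobbies":[]}

# ------------------------------------------------------------
# Advantages of structured output
# ------------------------------------------------------------

"""
1. Consistency: the output always contains the specified fields
2. Type safety: data types (string, integer, ...) are validated automatically
3. Easy to process: read attributes directly instead of parsing text
4. Validation: ranges, formats, and similar rules are checked automatically
5. Clear documentation: field descriptions help the model understand what to extract
6. Nesting: complex data structures are supported

Compared with unstructured output:
Text output: "Zhang San, 30 years old, zhangsan@email.com, likes programming and reading"
has to be parsed by hand, which is error-prone

Structured output: a Person object
can be read directly as response.name, response.age, ...
"""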

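Using TypedDict

# Typing tools
# TypedDict: a type-hinting tool that describes the structure of a dict (key and value types)
# Annotated: attaches extra metadata (such as a field description) to a type
from typing_extensions import TypedDict, Annotated
from typing import List

# Define the data structure with TypedDict
# TypedDict is part of the standard typing module (Python 3.8+), so it needs no extra dependency
# It suits simple cases that do not need elaborate validation
class PersonDict(TypedDict):
    """
    Personal information, TypedDict version.
    TypedDict only declares types; it does no runtime validation.
    Good for: 1. editor hints 2. type checking 3. simple structure definitions
    Unlike Pydantic, it has no automatic validation.
    """

    # Annotated adds metadata to a field
    # Annotated[type, default, description]
    # 1st argument: the field type (str)
    # 2nd argument: ... (Ellipsis) is a placeholder meaning "no default value"
    # 3rd argument: the field description, passed to the model to explain what the field means
    name: Annotated[str, ..., "Name"]      # string field described as "Name"

    # Age field, integer
    age: Annotated[int, ..., "Age"]        # integer field described as "Age"

    # Email field, string
    email: Annotated[str, ..., "Email"]    # string field described as "Email"

# Bind the model to this output structure
# with_structured_output(PersonDict) tells the model:
# "return a dict that matches the PersonDict type definition"
dict_model = model.with_structured_output(PersonDict)

# Use the model to extract structured information
# The model analyzes the text and organizes the result as a PersonDict-shaped dict
response = dict_model.invoke("Li Si, 25 years old, lisi@test.com")

# Print the result
print(response)
# Example output: {'name': 'Li Si', 'age': 25, 'email': 'lisi@test.com'}
# Note: this is an ordinary Python dict, not a special object


# Differences between the two approaches:
"""
1. TypedDict (PersonDict):
   - Pros: lightweight, built into Python, no extra dependency
   - Cons: no runtime validation, type hints only
   - Output: plain dict
   - Validation: none

2. Pydantic (Person):
   - Pros: full validation, rich features (defaults, validators, ...)
   - Cons: needs an extra dependency
   - Output: object (with attributes and methods)
   - Validation: data ranges and types are validated automatically

When to use which:
- Simple extraction without complex validation → TypedDict
- Production code that needs strict validation → Pydantic
"""

# ------------------------------------------------------------
# Working with a TypedDict response
# ------------------------------------------------------------

# Access dict fields
print(f"Name: {response['name']}")      # Example output: Name: Li Si
print(f"Age: {response['age']}")        # Example output: Age: 25
print(f"Email: {response['email']}")    # Example output: Email: lisi@test.com

# Regular dict operations work
response_copy = response.copy()          # copy the dict
response_keys = response.keys()          # all keys
response_values = response.values()      # all values

# Convert to JSON (requires the json module)
import json
json_str = json.dumps(response, ensure_ascii=False)
print(f"As JSON: {json_str}")  # Example output: {"name": "Li Si", "age": 25, "email": "lisi@test.com"}

# ------------------------------------------------------------
# TypedDict with optional fields
# ------------------------------------------------------------

from typing import Optional

class Product(TypedDict, total=False):
    """
    Product information. total=False makes every key optional for type checkers,
    so the model can handle incomplete information flexibly.
    A default of None (instead of ...) marks each field as optional in the schema.
    """
    name: Annotated[Optional[str], None, "Product name"]
    price: Annotated[Optional[float], None, "Price"]
    category: Annotated[Optional[str], None, "Category"]
    in_stock: Annotated[Optional[bool], None, "Whether it is in stock"]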
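
The claim that TypedDict does no runtime validation is easy to verify with the standard library alone. The short sketch below redefines simplified versions of the two classes (without the `Annotated` metadata) so that it runs on its own; the field values are made up for the demonstration.

```python
from typing import TypedDict

class PersonDict(TypedDict):
    name: str
    age: int
    email: str

class Product(TypedDict, total=False):
    name: str
    price: float

# No error is raised even though age is not an int: TypedDict is a hint only
person = PersonDict(name="Li Si", age="twenty-five", email="lisi@test.com")
print(type(person), person["age"])

# total=False moves every key into the optional set
print(PersonDict.__required_keys__)
print(Product.__optional_keys__)

# A missing price is accepted by a type checker because the key is optional
partial: Product = {"name": "Keyboard"}
print(partial)
```

A static type checker such as mypy would flag the string age, but nothing stops the code at runtime, which is the practical difference from the Pydantic version.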

Using JSON Schema

import json

# Define the JSON Schema
person_schema = {
    "title": "Person",
    "description": "Personal information",
    "type": "object",
    "properties": {
        "name": {
            "type": "string",
            "description": "Name"
        },
        "age": {
            "type": "integer",
            "description": "Age",
            "minimum": 0,
            "maximum": 150
        },
        "email": {
            "type": "string",
            "description": "Email",
            "format": "email"
        }
    },
    "required": ["name", "age", "email"],
    "additionalProperties": False
}

# Bind the schema
schema_model = model.with_structured_output(
    person_schema,
    method="json_schema"  # explicitly use the JSON Schema method
)

# Use it
response = schema_model.invoke("Wang Wu, 28 years old, wangwu@company.com")
print(json.dumps(response, indent=2, ensure_ascii=False))

Getting the raw response together with the parsed result

# Get both the parsed result and the raw message
model_with_raw = model.with_structured_output(
    Person,
    include_raw=True  # also return the raw message
)

result = model_with_raw.invoke("Extract: Zhao Liu, 35 years old, zhaoliu@test.com")

print("Parsed structure:")
print(result["parsed"])  # Person object

print("\nRaw message:")
print(result["raw"])  # AIMessage object

print("\nParsing error (if any):")
print(result["parsing_error"])  # None or the error

# Access the metadata of the raw message
if result["raw"].usage_metadata:
    print(f"Token usage: {result['raw'].usage_metadata}")

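Advanced features

Multimodal support

# Handling image input
from langchain_core.messages import HumanMessage
import base64  # for base64-encoding the image

# Read a local image file in binary mode
with open("cat.jpg", "rb") as f:
    image_data = f.read()

# Build a multimodal message (text plus image)
# The content of a HumanMessage can be a list holding several content types
multimodal_msg = HumanMessage(
    content=[
        # Text part: the instruction for the model
        {"type": "text", "text": "Describe this image"},
        # Image part: the image encoded as a base64 data URL
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64.b64encode(image_data).decode()}"}}
        # "data:image/jpeg;base64," is the data URL prefix: it says the payload is a base64-encoded JPEG
        # base64.b64encode() turns the binary image data into base64 bytes
        # .decode() turns those bytes into a str
    ]
)

# This needs a multimodal model (e.g. GPT-4V, Claude 3.5, Gemini)
# Such models can "see" the image and "read" the text at the same time
response = model.invoke([multimodal_msg])  # invoke expects a list of messages
print(response.text)  # the model's description of the image

# Multimodal output (some models can also generate non-text content)
response = model.invoke("Write a description of a cat picture, then create the picture")
# content_blocks holds the typed content blocks the model produced
for block in response.content_blocks:
    if block["type"] == "text":
        # Text block: the description written by the model
        print(f"Text: {block['text']}")
    elif block["type"] == "image" and block.get("base64"):
        # Image block: in the standard content block format the data is under "base64"
        print(f"Image: {block.get('mime_type')}, base64 length: {len(block['base64'])}")
        # mime_type: the image type, e.g. "image/jpeg" or "image/png"

        # Save the image to a file
        with open("generated_cat.png", "wb") as f:
            # Decode the base64 string to binary data and write it
            f.write(base64.b64decode(block["base64"]))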
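
The data URL handling can be tried without a model or an image file. The sketch below encodes some placeholder bytes and decodes them again; `to_data_url` and `from_data_url` are hypothetical helper names, and the bytes are not a real JPEG.

```python
import base64

def to_data_url(raw: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw bytes as a base64 data URL."""
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

def from_data_url(url: str):
    """Split a base64 data URL back into (mime_type, raw bytes)."""
    header, encoded = url.split(",", 1)
    mime_type = header[len("data:"):].split(";", 1)[0]
    return mime_type, base64.b64decode(encoded)

fake_image = b"\xff\xd8\xff\xe0 placeholder, not a real JPEG"
url = to_data_url(fake_image)
mime, decoded = from_data_url(url)

print(url[:40])
print(mime, decoded == fake_image)
```

Base64 inflates the payload by roughly a third, so for large images a hosted URL, where the provider accepts one, keeps requests smaller.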

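Reasoning

# Show the model's reasoning steps (if the model supports it)
# Some reasoning-capable models can return their reasoning along with the answer,
# so you see not only the answer but also how it was reached

response = model.invoke(
    "A basket has 5 apples. Take away 2, then put in 3. How many are there now?",
    # Some models let you control the reasoning depth with a parameter
    # Whether reasoning_effort is accepted, and in which form, depends on the provider
    reasoning_effort="high"  # or 'low', 'medium', or a token budget such as 1000
    # Meaning of the values:
    # - "low": brief reasoning, possibly only the key steps
    # - "medium": moderate detail
    # - "high": detailed reasoning with the full chain of thought
    # - a number (e.g. 1000): a specific token budget for reasoning
)

# Extract the reasoning content
# The response may contain several types of content blocks
if hasattr(response, 'content_blocks'):
    # Select every content block of type "reasoning"
    # Example content_blocks structure:
    # [
    #     {"type": "reasoning", "reasoning": "First, the basket has 5 apples..."},
    #     {"type": "reasoning", "reasoning": "Then 2 are taken away, leaving 3..."},
    #     {"type": "reasoning", "reasoning": "Next, 3 are added, for a total of 6..."},
    #     {"type": "text", "text": "So there are now 6 apples in the basket."}
    # ]
    reasoning_blocks = [
        b for b in response.content_blocks  # iterate over all content blocks
        if b.get("type") == "reasoning"     # keep only blocks of type "reasoning"
    ]

    # Print all reasoning steps
    print("=== Model reasoning ===")
    for i, block in enumerate(reasoning_blocks, 1):
        # block.get('reasoning', '') reads the text safely and falls back to an empty string
        print(f"Step {i}: {block.get('reasoning', '')}")
    print("=====================")

    # Other block types can be inspected as well
    text_blocks = [b for b in response.content_blocks if b.get("type") == "text"]
    if text_blocks:
        print(f"\nFinal answer: {text_blocks[0].get('text', '')}")

# Streaming the reasoning: watch the model think in real time
print("\n=== Streaming reasoning demo ===")
for chunk in model.stream("Solve the equation 2x + 5 = 15"):
    # chunk is a small piece of output generated by the model in real time
    # It may contain several content blocks

    # Check whether the chunk has a content_blocks attribute
    if hasattr(chunk, 'content_blocks'):
        # Iterate over all content blocks in this chunk
        for block in chunk.content_blocks:
            # Reasoning block
            if block.get("type") == "reasoning":
                # Show the reasoning text as it arrives
                # Note: the reasoning text may be split across several chunks
                reasoning_text = block.get('reasoning', '')
                if reasoning_text:  # only print non-empty content
                    print(f"🤔 Reasoning: {reasoning_text}")

            # Text block (the final answer)
            elif block.get("type") == "text":
                # Show the final answer as it arrives
                text = block.get('text', '')
                if text:
                    print(f"📝 Output: {text}", end="", flush=True)
                    # end="" avoids a newline; flush=True displays immediately
    else:
        # If the model does not provide content_blocks, the chunk may carry text directly
        if hasattr(chunk, 'text') and chunk.text:
            print(chunk.text, end="", flush=True)

print()  # final newline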

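Rate limiting

from langchain.chat_models import init_chat_model
from langchain_core.rate_limiters import InMemoryRateLimiter

# Create the rate limiter
rate_limiter = InMemoryRateLimiter(
    requests_per_second=2,      # at most 2 requests per second
    check_every_n_seconds=0.1,  # check every 100 ms whether a request may proceed
    max_bucket_size=5           # maximum burst size
)

# Apply the rate limiter
limited_model = init_chat_model(
    "openai:gpt-4o",
    rate_limiter=rate_limiter
)

# Batch requests respect the rate limit automatically
responses = limited_model.batch([
    "Question 1", "Question 2", "Question 3", "Question 4", "Question 5"
], config={'max_concurrency': 1})  # limit concurrency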
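The three parameters above describe a token bucket. The following is a minimal sketch of that idea in plain Python, not LangChain's implementation: the class name `TokenBucket`, the `clock` argument, and the choice to start with an empty bucket are all assumptions made for illustration.

```python
import time


class TokenBucket:
    """Toy token bucket (illustrative only, not LangChain's implementation)."""

    def __init__(self, requests_per_second, max_bucket_size, clock=time.monotonic):
        self.rate = requests_per_second
        self.capacity = max_bucket_size
        self.clock = clock
        self.tokens = 0.0  # assumption: start empty
        self.last = clock()

    def try_acquire(self):
        """Refill according to elapsed time, then spend one token if possible."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Drive the bucket with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
bucket = TokenBucket(requests_per_second=2, max_bucket_size=5, clock=lambda: fake_now[0])

first = bucket.try_acquire()  # no time has passed, so the bucket is still empty
fake_now[0] = 10.0            # a long idle period fills the bucket only up to capacity
burst = [bucket.try_acquire() for _ in range(6)]
print(first, burst)
```

After the idle period only five requests pass immediately, which is what `max_bucket_size` caps; a real limiter would then sleep and re-check at the `check_every_n_seconds` interval.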

Token usage tracking

from langchain_core.callbacks import UsageMetadataCallbackHandler

# Create the callback handler
usage_callback = UsageMetadataCallbackHandler()

# Pass the callback when invoking
response = model.invoke(
    "Write a short essay about AI",
    config={"callbacks": [usage_callback]}
)

# Read the usage statistics
print("Token usage:")
print(usage_callback.usage_metadata)

# Using the context manager
from langchain_core.callbacks import get_usage_metadata_callback

with get_usage_metadata_callback() as cb:
    model.invoke("First request")
    model.invoke("Second request")

    print("Aggregated usage:")
    for model_name, stats in cb.usage_metadata.items():
        print(f"{model_name}: {stats['total_tokens']} tokens")

Configuring model behavior

Configurable model fields

# Create a configurable model
configurable_model = init_chat_model(
    "gpt-4o-mini",  # default model
    temperature=0.5,
    configurable_fields=("model", "temperature", "max_tokens")  # fields that can change at runtime
)

# Runtime configuration
response1 = configurable_model.invoke(
    "Tell me a joke",
    config={"configurable": {"model": "gpt-4o"}}  # switch to gpt-4o for this call
)

response2 = configurable_model.invoke(
    "Give a serious answer",
    config={"configurable": {"temperature": 0.1}}  # lower the randomness
)

# Switching between several models
# (a model from another provider needs that provider's integration package installed)
models_config = {
    "fast": {"model": "gpt-4o-mini", "temperature": 0.3},
    "accurate": {"model": "gpt-4o", "temperature": 0.1},
    "creative": {"model": "claude-sonnet-4-5", "temperature": 0.8}
}

for mode, settings in models_config.items():
    response = configurable_model.invoke(
        "Write a poem",
        config={"configurable": settings}
    )
    print(f"{mode} mode: {response.text[:50]}...")

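Full configuration example

# A model call with a full config
response = model.invoke(
    "Analyze market trends",
    config={
        "run_name": "market_analysis",        # run name (used in logs and traces)
        "tags": ["analysis", "finance"],      # tags
        "metadata": {                         # metadata
            "user_id": "u123",
            "project": "market_research",
            "priority": "high"
        },
        "callbacks": [usage_callback],        # callback handlers
        "max_concurrency": 1,                 # maximum concurrency
        "recursion_limit": 50,                # recursion depth limit

        # Configurable fields (these take effect only on a model created with configurable_fields)
        "configurable": {
            "temperature": 0.3,
            "max_tokens": 500
        }
    }
)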

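3. Messages

Overview

Messages are the basic unit of interaction with an LLM in LangChain. They represent the model's input and output, carry content and metadata, and represent the conversation state.

A message is an object that contains:

Role: identifies the message type (for example system, user)
Content: the actual content of the message (for example text, images, audio, documents)
Metadata: optional fields such as response information, message ID, and token usage

LangChain provides a standard message type that works across all model providers, so behavior stays consistent whichever model you call.

Basic usage

The simplest way to use messages is to create message objects and pass them to the model when invoking it.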

from langchain.chat_models import init_chat_model
from langchain.messages import HumanMessage, AIMessage, SystemMessage

model = init_chat_model("openai:gpt-5-nano")

# Create a system message and a user message
system_msg = SystemMessage("You are a helpful assistant.")
human_msg = HumanMessage("Hello, how are you?")

# Use them with a chat model
messages = [system_msg, human_msg]
response = model.invoke(messages)  # returns an AIMessage

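Text prompts

A text prompt is a plain string. It suits simple generation tasks that do not need conversation history.

response = model.invoke("Write a haiku about spring")

When to use a text prompt:

You have a single, standalone request
You do not need conversation history
You want the least code complexity

Message prompts

Alternatively, you can pass the model a list of message objects.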

from langchain.messages import SystemMessage, HumanMessage, AIMessage

messages = [
    SystemMessage("You are a poetry expert"),
    HumanMessage("Write a haiku about spring"),
    AIMessage("Cherry blossoms bloom...")
]
response = model.invoke(messages)

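When to use message prompts:

Managing multi-turn conversations
Handling multimodal content (images, audio, files)
Including system instructions

Dictionary format

You can also specify messages directly in the OpenAI chat completions format.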

messages = [
    {"role": "system", "content": "You are a poetry expert"},
    {"role": "user", "content": "Write a haiku about spring"},
    {"role": "assistant", "content": "Cherry blossoms bloom..."}
]
response = model.invoke(messages)

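Message types

System messages

A SystemMessage is a set of initial instructions that steers the model's behavior. Use it to set the tone, define the model's role, and establish response guidelines.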

system_msg = SystemMessage("You are a helpful coding assistant.")

messages = [
    system_msg,
    HumanMessage("How do I create a REST API?")
]
response = model.invoke(messages)

# A more detailed system message
system_msg = SystemMessage("""
You are a senior Python developer with expertise in web frameworks.
Always provide code examples and explain your reasoning.
Be concise but thorough in your explanations.
""")

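Human messages

A HumanMessage represents user input and interaction. It can contain text, images, audio, files, and any other multimodal content.

# Text content
response = model.invoke([
    HumanMessage("What is machine learning?")
])

# A plain string is shorthand for a single HumanMessage
response = model.invoke("What is machine learning?")

# Message metadata
human_msg = HumanMessage(
    content="Hello!",
    name="alice",  # optional: identifies different users
    id="msg_123",  # optional: unique identifier for tracing
)

# Note: how the name field is handled varies by provider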

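AI messages

An AIMessage represents the output of a model call. It can contain multimodal data, tool calls, and provider-specific metadata.

Providers weigh and contextualize message types differently, so it is sometimes useful to create an AIMessage by hand and insert it into the message history as if it had come from the model.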

from langchain.messages import AIMessage, SystemMessage, HumanMessage

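# Create an AI message by hand (for example, to seed conversation history)
ai_msg = AIMessage("I'd be happy to help you with that question!")

# Add it to the conversation history
messages = [
    SystemMessage("You are a helpful assistant"),
    HumanMessage("Can you help me?"),
    ai_msg,  # inserted as if it came from the model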
    HumanMessage("Great! What's 2+2?")
]

response = model.invoke(messages)

AI message attributes

Tool calls

When the model makes tool calls, they are included in the AIMessage:

from langchain.chat_models import init_chat_model

model = init_chat_model("openai:gpt-5-nano")

def get_weather(location: str) -> str:
    """Get the weather at a location."""
    ...

model_with_tools = model.bind_tools([get_weather])
response = model_with_tools.invoke("What's the weather in Paris?")

for tool_call in response.tool_calls:
    print(f"Tool: {tool_call['name']}")
    print(f"Args: {tool_call['args']}")
    print(f"ID: {tool_call['id']}")

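Other structured data, such as reasoning or citations, can also appear in the message content.

Token usage

An AIMessage can hold token counts and other usage metadata in its usage_metadata field: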

from langchain.chat_models import init_chat_model

model = init_chat_model("openai:gpt-5-nano")

response = model.invoke("Hello!")
response.usage_metadata
# {'input_tokens': 8,
#  'output_tokens': 304,
#  'total_tokens': 312,
#  'input_token_details': {'audio': 0, 'cache_read': 0},
#  'output_token_details': {'audio': 0, 'reasoning': 256}}

Streaming and chunks

During streaming you receive AIMessageChunk objects, which can be added together into a full message object:

chunks = []
full_message = None
for chunk in model.stream("Hi"):
    chunks.append(chunk)
    print(chunk.text)
    full_message = chunk if full_message is None else full_message + chunk

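Tool messages

For models that support tool calling, an AI message can contain tool calls. A tool message passes the result of a single tool execution back to the model.

Tools can produce ToolMessage objects directly. A simple example:

from langchain.messages import AIMessage, HumanMessage, ToolMessage

# After the model makes a tool call
ai_message = AIMessage(
    content=[],
    tool_calls=[{
        "name": "get_weather",
        "args": {"location": "San Francisco"},
        "id": "call_123"
    }]
)

# Execute the tool and create the result message
weather_result = "Sunny, 72°F"
tool_message = ToolMessage(
    content=weather_result,
    tool_call_id="call_123"  # must match the call ID
)

# Continue the conversation
messages = [
    HumanMessage("What's the weather in San Francisco?"),
    ai_message,  # the model's tool call
    tool_message,  # the tool execution result
]
response = model.invoke(messages)  # the model processes the result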

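Message content

You can think of a message's content as the payload sent to the model. Messages have a loosely typed content attribute that accepts a string or a list of untyped objects (for example, dictionaries). This lets LangChain chat models carry provider-native structures, such as multimodal content, directly.

LangChain also provides dedicated content types for text, reasoning, citations, multimodal data, server-side tool calls, and other message content.

LangChain chat models accept message content in the content attribute, which can hold:

A string
A list of content blocks in a provider-native format
A list of LangChain standard content blocks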

from langchain.messages import HumanMessage

# String content
human_message = HumanMessage("Hello, how are you?")

# Provider-native format (OpenAI, for example)
human_message = HumanMessage(content=[
    {"type": "text", "text": "Hello, how are you?"},
    {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
])

# List of standard content blocks
human_message = HumanMessage(content_blocks=[
    {"type": "text", "text": "Hello, how are you?"},
    {"type": "image", "url": "https://example.com/image.jpg"},
])

Tip: specifying content_blocks when initializing a message still populates the message's content, but gives you a type-safe interface.

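Standard content blocks

Different model providers use different output formats:

Anthropic (Claude) output: {"type": "thinking", "thinking": "..."}
OpenAI (GPT) output: {"type": "reasoning", "id": "rs_abc123", ...}
Google (Gemini) output: {"type": "reasoning_content", "parts": [...]}

If you process the raw content directly, you need different code for each model.

The solution: content_blocks
content_blocks is a unified interface that converts every model's output to a standard format.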

from langchain.messages import AIMessage

message = AIMessage(
    content=[
        {"type": "thinking", "thinking": "...", "signature": "WaUjzkyp..."},
        {"type": "text", "text": "..."},
    ],
    response_metadata={"model_provider": "anthropic"}
)
message.content_blocks
# [{'type': 'reasoning',
#   'reasoning': '...',
#   'extras': {'signature': 'WaUjzkyp...'}},
#  {'type': 'text', 'text': '...'}]
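To make the conversion above concrete, here is a toy normalizer in plain Python. It is only a sketch of the idea and not how LangChain implements content_blocks; the function name `to_standard_block` is hypothetical, and it handles just the Anthropic-style "thinking" block from the example.

```python
def to_standard_block(block: dict) -> dict:
    """Map a provider-native block to a standard-style block (toy version)."""
    if block.get("type") == "thinking":  # Anthropic-style reasoning block
        # Keys that have no standard counterpart are kept under "extras"
        extras = {k: v for k, v in block.items() if k not in ("type", "thinking")}
        standard = {"type": "reasoning", "reasoning": block["thinking"]}
        if extras:
            standard["extras"] = extras
        return standard
    return dict(block)  # text blocks pass through unchanged


native = [
    {"type": "thinking", "thinking": "...", "signature": "WaUjzkyp..."},
    {"type": "text", "text": "..."},
]
standard = [to_standard_block(b) for b in native]
print(standard)
```

Code written against the standard shape only needs to check for "reasoning", whichever provider produced the message.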

Multimodal input

Image input

# Option 1: URL (simplest)
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe the image"},
        {"type": "image", "url": "https://example.com/image.jpg"},
    ]
}

# Option 2: base64 (local file)
import base64
with open("cat.jpg", "rb") as f:
    image_data = f.read()
base64_image = base64.b64encode(image_data).decode()

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe the image"},
        {"type": "image", "base64": base64_image, "mime_type": "image/jpeg"},
    ]
}

# Option 3: file ID (a file uploaded to the model provider)
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe the image"},
        {"type": "image", "file_id": "file-abc123"},
    ]
}
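As a small sketch of the base64 option, the helper below builds an image block from raw bytes. The helper name `image_block` and the placeholder bytes are assumptions for illustration; the dictionary keys are the ones used in the examples above.

```python
import base64


def image_block(data: bytes, mime_type: str) -> dict:
    """Build an image content block from raw bytes (illustrative helper)."""
    return {
        "type": "image",
        "base64": base64.b64encode(data).decode("ascii"),
        "mime_type": mime_type,
    }


# Placeholder bytes stand in for the contents of a real image file.
fake_jpeg = b"\xff\xd8\xff\xe0 not a real image"
block = image_block(fake_jpeg, "image/jpeg")

message = {
    "role": "user",
    "content": [{"type": "text", "text": "Describe the image"}, block],
}
print(message["content"][1]["mime_type"])
```

Keeping the encoding in one helper avoids repeating the encode and decode steps wherever a local file is attached.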

PDF / audio / video

# PDF document
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Summarize the document"},
        {"type": "file", "url": "https://example.com/doc.pdf"},
    ]
}

# Audio file
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What does the audio say?"},
        {"type": "audio", "base64": "...", "mime_type": "audio/wav"},
    ]
}

# Video file
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe the video"},
        {"type": "video", "base64": "...", "mime_type": "video/mp4"},
    ]
}

Content block types

Core blocks

# Text block
{"type": "text", "text": "Hello", "annotations": [], "extras": {}}

# Reasoning block (the model's thinking)
{"type": "reasoning", "reasoning": "The user is asking about the weather...", "extras": {}}

Multimodal blocks

# Image
{"type": "image", "url": "...", "mime_type": "image/jpeg"}

# File
{"type": "file", "url": "...", "mime_type": "application/pdf"}

# Plain-text file
{"type": "text-plain", "text": "content", "mime_type": "text/plain"}

Tool-related blocks

# Tool call
{"type": "tool_call", "name": "search", "args": {"query": "weather"}, "id": "call_123"}

# Streaming tool call chunk (sent piece by piece, so args is a partial JSON string)
{"type": "tool_call_chunk", "name": "search", "args": '{"query": "we', "id": "call_123"}

# Server-side tool call
{"type": "server_tool_call", "id": "ws_123", "name": "web_search", "args": {"query": "weather"}}

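4. Tools

Tools are the bridge that lets a model interact with external systems (APIs, databases, file systems, and so on). They wrap functions in a structured interface, and the model decides when to call them and with which arguments.

Three ways to create a tool

1. Basic approach: the @tool decorator

from langchain.tools import tool

@tool
def search_api(query: str, max_results: int = 5) -> str:
    """Search for the query and return up to max_results results."""
    # The code that actually calls the API goes here
    return f"Search for '{query}' returned {max_results} results"

Type hints are required: query: str tells the model the parameter type
The docstring is the tool description: the model uses it to decide whether to call the tool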

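2. Custom attributes

# Custom tool name (defaults to the function name)
@tool("database_lookup")  # the tool name becomes "database_lookup"
def find_in_db(search_term: str) -> str:
    """Look up records in the database."""
    return "query result"

# Custom name and description: the name is the first positional argument
@tool(
    "calculator",
    description="Perform a calculation. Takes an expression such as '2+2*3' and returns the result."
)
def calculate(expression: str) -> str:
    """Evaluate the expression."""
    # eval() is unsafe on untrusted input; use a proper expression parser in real code
    return str(eval(expression))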

3. Complex parameter definitions

When the parameters are complex, define them with a Pydantic model:

from pydantic import BaseModel, Field
from typing import Optional, Literal

class FlightSearchInput(BaseModel):
    """Input parameters for a flight search."""
    departure: str = Field(description="Departure city code, e.g. 'PEK'")
    arrival: str = Field(description="Arrival city code, e.g. 'SHA'")
    date: str = Field(description="Date in 'YYYY-MM-DD' format")
    passengers: int = Field(default=1, description="Number of passengers, 1-9")
    cabin_class: Literal["economy", "premium", "business", "first"] = "economy"
    direct_only: bool = Field(default=False, description="Whether to return direct flights only")

@tool(args_schema=FlightSearchInput)
def search_flights(
    departure: str,
    arrival: str,
    date: str,
    passengers: int = 1,
    cabin_class: str = "economy",
    direct_only: bool = False
) -> str:
    """Search for flights."""
    return f"Found flights from {departure} to {arrival}"

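ToolRuntime in detail: accessing context

Why is context needed?

An ordinary tool sees only the arguments passed to it, but real scenarios also need to:

Know the current conversation history
Get the user's identity
Access long-term memory
Report progress in real time

What does ToolRuntime contain?

# Conceptual view of ToolRuntime; the fields are read as attributes (e.g. runtime.state)
# runtime.state          mutable state (current conversation messages, etc.)
# runtime.context        immutable configuration (user ID, etc.)
# runtime.store          persistent store
# runtime.stream_writer  stream writer
# runtime.config         run configuration
# runtime.tool_call_id   ID of the current tool call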

1. Accessing state

State is the mutable data of the current run:

from langchain.tools import tool, ToolRuntime

@tool
def analyze_conversation(runtime: ToolRuntime) -> str:
    """Analyze the current conversation state."""
    messages = runtime.state.get("messages", [])

    # Count messages by type
    user_msgs = [m for m in messages if m.type == "human"]
    ai_msgs = [m for m in messages if m.type == "ai"]

    return f"The user spoke {len(user_msgs)} times and the AI replied {len(ai_msgs)} times"

@tool
def get_user_preference(pref_key: str, runtime: ToolRuntime) -> str:
    """Get a user preference (theme, language, and so on)."""
    # Read the user_preferences field from state
    preferences = runtime.state.get("user_preferences", {})
    return preferences.get(pref_key, "not set")

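2. Updating state

Use a Command object to modify state:

from langchain.messages import RemoveMessage, ToolMessage
from langchain.tools import tool, ToolRuntime
from langgraph.graph.message import REMOVE_ALL_MESSAGES
from langgraph.types import Command

@tool
def reset_conversation(runtime: ToolRuntime) -> Command:
    """Reset the conversation history."""
    return Command(
        update={
            # An empty list would not clear the history, because message updates
            # are merged into the existing list; remove the messages explicitly
            "messages": [RemoveMessage(id=REMOVE_ALL_MESSAGES)],
            "conversation_count": 0,  # assumes the state schema defines this key
        }
    )

@tool
def set_user_name(new_name: str, runtime: ToolRuntime) -> Command:
    """Set the user name."""
    return Command(
        update={
            "user_name": new_name,  # assumes the state schema defines this key
            # Several keys can be updated at once; the ToolMessage answers the pending tool call
            "messages": [ToolMessage(f"User name set to {new_name}", tool_call_id=runtime.tool_call_id)],
        }
    )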

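Command is a special LangGraph object used to modify the state and flow of a graph run imperatively. With it, a tool can do more than return data: it can actively change the running state of the system.

Why is Command needed?
An ordinary tool only returns a result to the model, but in complex scenarios you also need to:

Modify the conversation state (clear history, add messages)
Control the execution flow (jump to another node)
Update several state fields at once

Command exists for these cases.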

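Core features of Command

1. Update state
Modify the state of the current run:

from datetime import datetime
from langchain.messages import RemoveMessage
from langgraph.graph.message import REMOVE_ALL_MESSAGES

@tool
def start_new_chat(runtime: ToolRuntime) -> Command:
    """Start a new conversation and clear the history."""
    return Command(
        update={
            "messages": [RemoveMessage(id=REMOVE_ALL_MESSAGES)],  # clear all messages
            "conversation_start_time": datetime.now(),  # set the new start time
            "message_count": 0  # reset the counter
        }
    )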

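2. Add messages to the history
A tool's return value is normally recorded as a ToolMessage, but you may want to add messages of your own:

from langchain.messages import AIMessage, ToolMessage

@tool
def add_system_announcement(announcement: str, runtime: ToolRuntime) -> Command:
    """Add a system announcement to the conversation history."""
    return Command(
        update={
            # Message updates are appended to the existing history,
            # so list only the new messages here
            "messages": [
                # Answer the pending tool call
                ToolMessage("Announcement added", tool_call_id=runtime.tool_call_id),
                # The new announcement message
                AIMessage(content=f"[System announcement] {announcement}"),
            ]
        }
    )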

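3. Special message operations
LangGraph provides special operations for removing messages:

from langchain.messages import RemoveMessage, ToolMessage
from langgraph.graph.message import REMOVE_ALL_MESSAGES

@tool
def clear_last_n_messages(n: int, runtime: ToolRuntime) -> Command:
    """Remove the last N messages."""
    # RemoveMessage takes the ID of the message to delete, so look the IDs up in state
    recent = runtime.state["messages"][-n:] if n > 0 else []
    return Command(
        update={
            "messages": [RemoveMessage(id=m.id) for m in recent]
            + [ToolMessage(f"Removed {len(recent)} messages", tool_call_id=runtime.tool_call_id)]
        }
    )

@tool 
def clear_all_messages() -> Command:
    """Remove all messages (the more efficient way)."""
    return Command(
        update={
            "messages": [
                RemoveMessage(id=REMOVE_ALL_MESSAGES)  # removes everything in one step
            ]
        }
    )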

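4. Flow control
Command can also control what runs next:

from langchain.messages import ToolMessage
from langgraph.types import Command, interrupt

@tool
def escalate_to_human(runtime: ToolRuntime) -> Command:
    """Escalate the conversation to a human agent."""
    return Command(
        update={
            "messages": [
                ToolMessage("Transferring you to a human agent...", tool_call_id=runtime.tool_call_id)
            ],
            "need_human": True  # set a flag (assumes the state schema defines it)
        },
        goto="human_agent_node"  # jump to this node (hypothetical name; it must exist in your graph)
    )

@tool
def interrupt_conversation(reason: str) -> str:
    """Pause the conversation until a human responds."""
    # Command has no interrupt argument. To pause, call interrupt(): it stops the run
    # (a checkpointer is required) and returns the value passed to Command(resume=...)
    decision = interrupt({"reason": reason})
    return f"Resumed with: {decision}"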

Command fields

Command(
    # Graph to send the command to (None means the current graph)
    graph: str | None = None,

    # Values to write to the state
    update: Any | None = None,

    # Value to resume an interrupted run with
    resume: Any | None = None,

    # Name of the node (or nodes) to go to next
    goto: str | Sequence[str] = (),
)

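3. Accessing context

Context is immutable configuration, such as the user's identity:

from dataclasses import dataclass
from typing import Optional

from langchain.tools import tool, ToolRuntime

@dataclass
class AppContext:
    user_id: str
    session_id: str
    api_key: Optional[str] = None

@tool
def get_user_profile(runtime: ToolRuntime[AppContext]) -> str:
    """Get the user's profile."""
    user_id = runtime.context.user_id
    session = runtime.context.session_id

    # Look the user up by user_id (query_database is a placeholder for your own function)
    user_data = query_database(user_id)

    return f"Profile of user {user_id} (session {session}): {user_data}"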

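4. Store (Memory): long-term memory

The store holds persistent data that survives across conversations:

@tool
def remember_fact(key: str, value: str, runtime: ToolRuntime) -> str:
    """Remember a fact (for example a user preference)."""
    # Save under the "facts" namespace; the stored value must be a dict
    runtime.store.put(("facts",), key, {"value": value})
    return f"Remembered: {key} = {value}"

@tool
def recall_fact(key: str, runtime: ToolRuntime) -> str:
    """Recall a stored fact."""
    result = runtime.store.get(("facts",), key)
    if result:
        return f"{key}: {result.value['value']}"
    return f"Nothing remembered about {key}"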

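5. Streaming output (Stream Writer)

A tool can send updates in real time while it runs:

@tool
def process_data(data: str, runtime: ToolRuntime) -> str:
    """Process data and report progress."""
    writer = runtime.stream_writer

    writer("Starting data processing...")
    writer(f"Input length: {len(data)} characters")

    # Simulated processing steps (clean_data and analyze_data are placeholders)
    writer("Step 1: cleaning data...")
    cleaned = clean_data(data)

    writer("Step 2: analyzing data...")
    analysis = analyze_data(cleaned)

    writer("Step 3: generating report...")

    return f"Processing complete: {analysis}"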

Putting it together

from langchain.agents import create_agent
from langchain.tools import tool, ToolRuntime
from langchain_openai import ChatOpenAI
from langgraph.store.memory import InMemoryStore

# Define the tools
@tool
def search_web(query: str) -> str:
    """Search the web."""
    return fetch_from_google(query)  # placeholder for your own search function

@tool
def save_to_memory(key: str, value: str, runtime: ToolRuntime) -> str:
    """Save information to memory."""
    runtime.store.put(("memories",), key, {"value": value})  # stored values must be dicts
    return "Saved"

# Create an agent with a store
store = InMemoryStore()
model = ChatOpenAI(model="gpt-4")
agent = create_agent(
    model,
    tools=[search_web, save_to_memory],
    store=store,
    system_prompt="You are an assistant that can use tools"
)

# Use it
result = agent.invoke({
    "messages": [{
        "role": "user", 
        "content": "Search for information about LangChain, then remember it"
    }]
})

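5. Streaming

Overview

Streaming is LangChain's real-time feedback system for LLM applications. It addresses the user-experience cost of LLM latency: instead of waiting for the whole response, the user sees progress and partial results as they arrive.

LangChain's streaming system provides three levels of real-time updates:

Agent progress streaming: sends a state update after each agent step completes
LLM token streaming: sends each token as the model generates it
Custom update streaming: sends user-defined signals during tool execution

These modes can be used alone or combined to fit different applications.

Agent progress

What is agent progress streaming?

As an agent runs it goes through several steps: reasoning, tool call, tool execution, more reasoning, and so on. With stream_mode="updates", an event is emitted after each step completes, so you know which stage the agent has reached.

How it works

An agent is essentially a state machine, and its run can be broken down as:

User input → LLM reasoning → [tool call → tool execution] × N → LLM generates the final response

With stream_mode="updates" you can observe the state change at each → above.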

Code example

from langchain.agents import create_agent

def get_weather(city: str) -> str:
    """Get the weather for a given city."""
    return f"It's always sunny in {city}!"

agent = create_agent(
    model="openai:gpt-5-nano",
    tools=[get_weather],
)

# Key point: call stream() with stream_mode="updates"
for chunk in agent.stream(
    {"messages": [{"role": "user", "content": "What is the weather in SF?"}]},
    stream_mode="updates",  # ← listen to agent progress
):
    # chunk has the shape {step name: state update}
    for step_name, step_data in chunk.items():
        print(f"Current step: {step_name}")

        # The most recent message is usually the one of interest
        last_message = step_data['messages'][-1]

        if hasattr(last_message, 'tool_calls') and last_message.tool_calls:
            print(f"Tool calls: {last_message.tool_calls}")
        else:
            print(f"Message content: {last_message.content}")
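Typical output (when printing each step name and the last message's content_blocks)

step: model
content: [{'type': 'tool_call', 'name': 'get_weather', 'args': {'city': 'San Francisco'}, 'id': 'call_...'}]

↑ Step 1: the LLM decides to call the get_weather tool

step: tools  
content: [{'type': 'text', 'text': "It's always sunny in San Francisco!"}]

↑ Step 2: the tool finishes and returns its result

step: model
content: [{'type': 'text', 'text': "It's always sunny in San Francisco!"}]

↑ Step 3: the LLM generates the final response from the tool result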

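Practical uses

Progress indicators: show "Thinking..." → "Calling the weather API..." → "Generating the answer..."
Debugging and monitoring: check that the agent follows the expected execution path
User feedback: show the user that the system is working rather than stuck

LLM tokens

What is token streaming?

An LLM generates text one token at a time. With stream_mode="messages", each token is sent as soon as it is generated, so you can watch the model's output form in real time.

Tokens vs characters

Token: the basic unit an LLM processes; it may be a word, a subword, or a punctuation mark
For example, "San Francisco" might be split into two tokens: ["San", " Francisco"]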

Code example

from langchain.agents import create_agent

def get_weather(city: str) -> str:
    """Get the weather for a given city."""
    return f"It's always sunny in {city}!"

agent = create_agent(
    model="openai:gpt-5-nano",
    tools=[get_weather],
)

# Key point: use stream_mode="messages"
for token, metadata in agent.stream(
    {"messages": [{"role": "user", "content": "What is the weather in SF?"}]},
    stream_mode="messages",  # ← listen to token-level generation
):
    # token: the message chunk that was just generated
    # metadata: includes the executing node and other metadata
    print(f"Node: {metadata['langgraph_node']}")

    # Check the block types in this chunk
    for block in token.content_blocks:
        if block['type'] == 'tool_call_chunk':
            # A fragment of a tool call
            print(f"Tool call chunk: {block}")
        elif block['type'] == 'text':
            # Ordinary text
            print(f"Text token: {block['text']}")
        else:
            print(f"Other type: {block}")
Reading the output

The output shows the details of the LLM's generation process:

Tool call generation phase:

{'type': 'tool_call_chunk', 'id': 'call_...', 'name': 'get_weather', 'args': '', 'index': 0}
{'type': 'tool_call_chunk', 'id': None, 'name': None, 'args': '{"', 'index': 0}
{'type': 'tool_call_chunk', 'id': None, 'name': None, 'args': 'city', 'index': 0}

The model is generating the tool call arguments as JSON, one fragment at a time
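The fragments only become usable once their args strings are concatenated and parsed. The sketch below shows that accumulation in plain Python; the `assemble` helper, the concrete id "call_1", and the last two fragments are made up for illustration and are not taken from a real run.

```python
import json

chunks = [
    {"type": "tool_call_chunk", "id": "call_1", "name": "get_weather", "args": "", "index": 0},
    {"type": "tool_call_chunk", "id": None, "name": None, "args": '{"', "index": 0},
    {"type": "tool_call_chunk", "id": None, "name": None, "args": "city", "index": 0},
    {"type": "tool_call_chunk", "id": None, "name": None, "args": '": "', "index": 0},
    {"type": "tool_call_chunk", "id": None, "name": None, "args": 'SF"}', "index": 0},
]


def assemble(chunks):
    """Merge chunks that share an index into complete tool calls."""
    calls = {}
    for c in chunks:
        call = calls.setdefault(c["index"], {"id": None, "name": None, "args": ""})
        call["id"] = call["id"] or c["id"]        # id arrives on the first chunk only
        call["name"] = call["name"] or c["name"]  # so does the tool name
        call["args"] += c["args"] or ""           # args arrive as JSON text fragments
    # Parse the accumulated JSON text once every fragment has been seen
    return [{**call, "args": json.loads(call["args"])} for _, call in sorted(calls.items())]


tool_calls = assemble(chunks)
print(tool_calls)
```

Parsing is deferred until the end because any intermediate prefix, such as '{"city', is not valid JSON on its own.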

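Final response generation phase:

{'type': 'text', 'text': 'Here'}
{'type': 'text', 'text': "'s"}
{'type': 'text', 'text': ' what'}

The model is generating the final answer token by token

Technical details

Tool calls are streamed too: the LLM first decides which tool to call, then generates the arguments piece by piece
Empty content blocks: some events carry no new text and only signal a state change
Rich metadata: every event includes fields such as langgraph_node, so you can tell which node produced it

Practical uses

Real-time chat interfaces: display the answer piece by piece, as ChatGPT does
Visualizing the generation process: show how the model builds its answer step by step
Typewriter effect: create a more natural conversational experience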

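Custom updates

What are custom updates?

Besides agent steps and LLM tokens, you may want to send your own status updates while a tool runs. For example:

"Querying the database, 10/100 records fetched"
"Calling the API, about 3 seconds remaining"
"Image processing 50% complete"

How it works

Inside the tool function, call get_stream_writer() to obtain a writer, then call the writer to send arbitrary data.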

Code example

import time

from langchain.agents import create_agent
from langgraph.config import get_stream_writer  # the key import

def get_weather(city: str) -> str:
    """Get the weather for a given city."""
    # Get the stream writer: a function that sends data into the stream
    writer = get_stream_writer()

    # Simulated processing steps, each sending a progress update
    writer(f"Handling request: looking up the weather in {city}")

    # Simulated network latency
    writer("Connecting to the weather API server...")
    time.sleep(0.5)

    writer(f"Sending query: {city}")
    time.sleep(0.5)

    writer("API response received, parsing data...")
    time.sleep(0.3)

    # Final result
    return f"The weather in {city} is sunny, 25°C"

agent = create_agent(
    model="anthropic:claude-sonnet-4-5",
    tools=[get_weather],
)

# Key point: use stream_mode="custom" to receive the custom updates
print("Streaming started; custom progress updates follow:")
for chunk in agent.stream(
    {"messages": [{"role": "user", "content": "What is the weather in SF?"}]},
    stream_mode="custom"  # ← receive custom updates only
):
    # chunk is whatever was passed to writer(), here a string
    print(f"Progress: {chunk}")

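Streaming multiple modes

Why use several modes?

Different parts of an application need information at different granularity:

The user interface may need token-level streaming for a typing effect
Background monitoring may need agent progress to track the execution path
The logging system may need custom updates to record detailed steps

Listening to several modes at once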

from langchain.agents import create_agent
from langgraph.config import get_stream_writer

def get_weather(city: str) -> str:
    """Get the weather for a given city."""
    writer = get_stream_writer()
    writer(f"Looking up the weather in {city}")
    writer("Lookup complete")
    return f"The weather in {city} is sunny"

agent = create_agent(
    model="openai:gpt-5-nano",
    tools=[get_weather],
)

# Key point: pass a list of modes
modes = ["updates", "custom", "messages"]
for stream_mode, chunk in agent.stream(
    {"messages": [{"role": "user", "content": "What is the weather in SF?"}]},
    stream_mode=modes  # ← listen to all three modes
):
    # Every event is tagged with the mode that produced it
    print(f"=== mode: {stream_mode} ===")

    if stream_mode == "custom":
        # custom mode: chunk is whatever the tool passed to writer()
        print(f"Custom update: {chunk}")

    elif stream_mode == "updates":
        # updates mode: chunk is a dict {node name: state update}
        for node_name, node_state in chunk.items():
            print(f"Node {node_name} finished")

    elif stream_mode == "messages":
        # messages mode: chunk is a (message chunk, metadata) tuple
        token, metadata = chunk
        print(f"Token from {metadata['langgraph_node']}: {token.text}")

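Disabling streaming

Why disable streaming?

Streaming improves the user experience, but in some cases you may want to turn it off:

Multi-agent systems: control which agents stream and which do not
Batch processing: background jobs that handle many tasks need no real-time feedback
Performance: streaming adds some overhead, which may matter in performance-critical paths
API constraints: some providers or model endpoints restrict streaming

Disabling streaming for one model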

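from langchain.agents import create_agent
from langchain_openai import ChatOpenAI

# Set the streaming option when creating the model
model = ChatOpenAI(
    model="gpt-4",
    streaming=False,  # ← disable token streaming for this model
    # If an integration has no streaming parameter, chat models also accept disable_streaming=True
    temperature=0
)

# Even when the agent is run with stream(), this model does not stream tokens
agent = create_agent(model, tools=[...])

# Each model response arrives only after it is complete
for chunk in agent.stream(...):
    print(chunk)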

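Controlling streaming at the agent level

You can also control streaming at other levels:

# Case 1: some tools do not stream
def fast_tool():
    # This tool is fast and needs no progress updates
    return "result"

def slow_tool():
    # This tool is slow and reports progress
    writer = get_stream_writer()
    writer("Processing...")
    return "result"

# Case 2: decide by request type
def handle_request(request_type):
    if request_type == "interactive":
        # Interactive request: stream
        return agent.stream(..., stream_mode="messages")
    else:
        # Batch request: no streaming needed
        return agent.invoke(...)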

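6. Middleware

Overview

What is middleware?

Middleware is the hook mechanism in the LangChain agent execution flow. It lets you insert custom logic before and after each key step of the agent. It works like a plugin system for the agent: it can monitor, modify, or control the flow.

Why use middleware?

Common needs in LLM application development:

Monitoring: record the agent's execution for debugging
Security: check inputs and outputs for sensitive information
Optimization: caching, retries, fallbacks
Control: limit the number of calls, add human review
Customization: change prompts or tool selection dynamically

Middleware packages these shared needs into modules, so they are not reimplemented in every agent.

Core execution flow

The basic agent loop: user input → LLM reasoning → tool execution → LLM reasoning again → output

Middleware provides hooks at the key points of this loop:

Start
  ↓
before_agent (before the agent starts)
  ↓
before_model (before each model call)
  ↓
model reasoning ←→ tool call ←→ tool execution
  ↓
after_model (after each model call)
  ↓
after_agent (after the agent finishes)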

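Built-in middleware

1. Summarization middleware (SummarizationMiddleware)

The problem it solves

When the conversation history grows past the model's context window, you either truncate it (and lose information) or make the model work inefficiently. The summarization middleware summarizes older messages automatically and keeps the key information.

How it works

from langchain.agents import create_agent
from langchain.agents.middleware import SummarizationMiddleware

# Configure the summarization middleware
# (parameter names can differ between LangChain versions; check the docs for yours)
middleware = SummarizationMiddleware(
    model="openai:gpt-4o-mini",           # use a cheap model for summaries
    max_tokens_before_summary=4000,       # trigger above 4000 tokens
    messages_to_keep=20,                  # keep the 20 most recent messages verbatim
    summary_prompt="Summarize the following conversation...",   # custom prompt
)

# Usage
agent = create_agent(
    model="openai:gpt-4o",
    tools=[...],
    middleware=[middleware],
)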
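The two thresholds in the configuration above can be pictured with a toy version of the compaction step. This is a sketch under stated assumptions, not the middleware's real logic: `count_tokens` counts whitespace-separated words instead of real tokens, and `summarize` is a stand-in for the LLM call.

```python
def count_tokens(messages):
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return sum(len(m["content"].split()) for m in messages)


def summarize(messages):
    """Placeholder for the LLM call that would write the summary."""
    return f"Summary of {len(messages)} earlier messages"


def compact_history(messages, max_tokens_before_summary, messages_to_keep):
    """Replace older messages with a summary once the history is too long."""
    if count_tokens(messages) <= max_tokens_before_summary:
        return messages  # under the threshold: nothing to do
    old, recent = messages[:-messages_to_keep], messages[-messages_to_keep:]
    if not old:
        return messages  # nothing older than the kept window
    return [{"role": "system", "content": summarize(old)}] + recent


history = [{"role": "user", "content": f"message number {i}"} for i in range(10)]
compacted = compact_history(history, max_tokens_before_summary=20, messages_to_keep=3)
print(compacted[0]["content"], len(compacted))
```

The most recent messages stay verbatim because they usually carry the details the next model call depends on, while older turns tolerate compression better.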

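2. Human-in-the-loop middleware (HumanInTheLoopMiddleware)

The problem it solves

High-risk operations (transferring money, sending email, deleting data) need human review to prevent mistakes by the AI.

Configuration

from langchain.agents import create_agent
from langchain.agents.middleware import HumanInTheLoopMiddleware

middleware = HumanInTheLoopMiddleware(
    interrupt_on={
        # Sending email requires human approval (approve, edit, or reject)
        "send_email": {
            "allowed_decisions": ["approve", "edit", "reject"],
            "description": "About to send an email to a customer"
        },
        # Transfers require human approval
        "transfer_money": {
            "allowed_decisions": ["approve", "reject"],
            # A callable description receives the tool call, the state, and the runtime
            "description": lambda tool_call, state, runtime: (
                f"Transfer {tool_call['args']['amount']} yuan to {tool_call['args']['account']}"
            )
        },
        # Queries are approved automatically
        "search_database": False,
    },
    description_prefix="Operation requiring human review"
)

# Must be used together with a checkpointer
from langgraph.checkpoint.memory import InMemorySaver
agent = create_agent(
    model="openai:gpt-4o",
    tools=[...],
    checkpointer=InMemorySaver(),  # required
    middleware=[middleware],
)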

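Workflow

User: "Please transfer the money to Zhang San"
AI: calls the transfer_money tool
→ The middleware intercepts the call and pauses execution
→ An approval request is sent to the review interface
→ Human: [approve] / [edit arguments] / [reject]
→ Execution continues or stops according to the decision

Important reminders

A checkpointer is required: the state must be restored after the interruption
Timeouts: you need to implement timeout handling for approval requests yourself
User interface: you need a matching approval interface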

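3. Prompt caching middleware (AnthropicPromptCachingMiddleware)

The problem it solves

Repeated system prompts and conversation prefixes waste API cost. Anthropic offers a prompt caching mechanism that caches the repeated part.

How it works

from langchain.agents import create_agent
from langchain_anthropic import ChatAnthropic
from langchain_anthropic.middleware import AnthropicPromptCachingMiddleware

# A long system prompt
LONG_SYSTEM_PROMPT = """
You are a professional customer service assistant and must follow these rules:
1. Always respond politely
2. Verify the user's identity
3. Record every interaction
... (very long)
"""

agent = create_agent(
    model=ChatAnthropic(model="claude-3-5-sonnet"),
    system_prompt=LONG_SYSTEM_PROMPT,
    middleware=[
        AnthropicPromptCachingMiddleware(
            ttl="5m",  # cache for 5 minutes
            min_messages_to_cache=5,  # start caching once there are at least 5 messages
        )
    ],
)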

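Technical details

Anthropic models only: using it with other models produces a warning or an error
Cache granularity: the conversation prefix is cached, not the whole conversation
Cost savings: reduces the token cost of repeated prompts

4. Limit middleware (ModelCallLimitMiddleware / ToolCallLimitMiddleware)

Preventing infinite loops

An LLM sometimes gets stuck in a loop and calls tools over and over. The limit middleware prevents this.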

from langchain.agents.middleware import ModelCallLimitMiddleware, ToolCallLimitMiddleware

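middleware = [
    # Limit model calls: at most 20 per thread and 10 per run
    ModelCallLimitMiddleware(
        thread_limit=20,
        run_limit=10,
        exit_behavior="end",  # end gracefully when the limit is reached
    ),

    # Limit calls to a specific tool
    ToolCallLimitMiddleware(
        tool_name="web_search",
        thread_limit=5,    # at most 5 searches per thread
        run_limit=3,       # at most 3 searches per run
    ),
]

Exit behavior

"end": stop and return a graceful message
"error": raise an exception that the caller must handle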

5. PII detection middleware (PIIMiddleware)

Protecting user privacy

Detects personally identifiable information automatically and sanitizes it.

from langchain.agents.middleware import PIIMiddleware

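middleware = [
    # Email: redact when detected
    PIIMiddleware(
        "email",
        strategy="redact",  # replace with [REDACTED_EMAIL]
        apply_to_input=True,    # check user input
        apply_to_output=False,  # do not check AI output
    ),

    # Credit card: partially mask
    PIIMiddleware(
        "credit_card",
        strategy="mask",  # show the last 4 digits: ****-****-****-1234
        apply_to_input=True,
        apply_to_output=True,  # also check AI output
    ),

    # Custom regex that detects API keys
    PIIMiddleware(
        "api_key",
        detector=r"sk-[a-zA-Z0-9]{32}",  # an OpenAI-style key pattern
        strategy="block",  # raise an exception when detected
    ),
]

Sanitization strategies

"block": reject outright and raise an exception
"redact": replace the whole value with a marker
"mask": hide part of the value (for example a phone number: 138****1234)
"hash": replace with a hash (one-way; the same input always gives the same hash, so values can still be matched)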

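6. Planning middleware (TodoListMiddleware)

Breaking down complex tasks

Lets the AI break a complex task into a to-do list automatically.

from langchain.agents import create_agent
from langchain.agents.middleware import TodoListMiddleware

agent = create_agent(
    model="openai:gpt-4o",
    tools=[...],
    middleware=[TodoListMiddleware()],
)

# Usage
result = agent.invoke({"messages": [{"role": "user", "content": "Help me refactor the codebase"}]})
print(result["todos"])  # inspect the generated to-do items

Built-in features

Adds a write_todos tool automatically: the AI uses it to create to-do items
Status tracking: each item has a status (pending, in progress, completed)
Prioritization: there may be no dedicated priority field, so any priority has to be expressed in the item text or ordering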

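7. Tool selector middleware (LLMToolSelectorMiddleware)

Handling too many tools

When an agent has dozens of tools, sending every tool description to the LLM on each call wastes tokens and lowers accuracy.

from langchain.agents.middleware import LLMToolSelectorMiddleware

middleware = LLMToolSelectorMiddleware(
    model="openai:gpt-4o-mini",  # use a cheap model to select tools
    max_tools=5,                 # select at most 5 relevant tools
    always_include=["search"],   # always include the search tool
)

# The filtering process
# Before: 50 tool descriptions → LLM reasoning (slow, expensive, less accurate)
# After: 5 relevant tool descriptions → LLM reasoning (fast, cheap, more accurate)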

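8. Tool retry middleware (ToolRetryMiddleware)

Handling transient failures

External APIs can fail temporarily because of network problems, rate limiting, and similar causes. The retry middleware handles this automatically.

from langchain.agents.middleware import ToolRetryMiddleware

middleware = ToolRetryMiddleware(
    max_retries=3,          # retry at most 3 times
    backoff_factor=2.0,     # exponential backoff: 1s, 2s, 4s
    initial_delay=1.0,      # wait 1 second before the first retry
    jitter=True,            # add random jitter to avoid a thundering herd
    on_failure="return_message",  # on final failure, report it to the LLM
)

# Retry logic (delays are approximate when jitter is enabled)
# First call → fails → wait 1s → retry 1 → fails → wait 2s → retry 2 → fails → wait 4s → retry 3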
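The delay schedule follows directly from initial_delay and backoff_factor. The sketch below computes it and runs a retry loop in plain Python, without jitter; `backoff_delays` and `call_with_retry` are illustrative helpers, not part of LangChain, and `sleep` is injected so the example runs instantly.

```python
def backoff_delays(max_retries, initial_delay, backoff_factor, max_delay=None):
    """Delay before each retry: initial_delay * backoff_factor ** attempt."""
    delays = []
    for attempt in range(max_retries):
        delay = initial_delay * backoff_factor ** attempt
        if max_delay is not None:
            delay = min(delay, max_delay)  # optional cap on the wait
        delays.append(delay)
    return delays


def call_with_retry(func, delays, sleep):
    """Call func, sleeping for the next delay after each failure."""
    for delay in delays:
        try:
            return func()
        except Exception:
            sleep(delay)
    return func()  # last attempt; an error here propagates to the caller


delays = backoff_delays(max_retries=3, initial_delay=1.0, backoff_factor=2.0)

attempts = []

def flaky():
    """Fail twice, then succeed (simulates a transient outage)."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("temporary failure")
    return "ok"

slept = []
result = call_with_retry(flaky, delays, sleep=slept.append)
print(delays, result, slept)
```

In real code `sleep` would be `time.sleep`, and a small random offset would be added to each delay so that many clients do not retry at the same instant.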

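9. LLM tool emulator (LLMToolEmulator)

A development and testing aid

Use an LLM to emulate a tool's behavior before the real tool is implemented.

from langchain.agents.middleware import LLMToolEmulator

# Emulate all tools
middleware = LLMToolEmulator(
    model="anthropic:claude-3-5-sonnet",  # the model used for emulation
    # tools=None  # emulates all tools by default
)

# Emulate specific tools only
# middleware = LLMToolEmulator(tools=["get_weather", "search"])

How it works

Real: the user asks about the weather → the AI calls get_weather("Beijing") → the real API returns "Beijing 25°C"
Emulated: the user asks about the weather → the AI calls get_weather("Beijing") → the LLM makes up a plausible result such as "Beijing is probably 20-25°C"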

Writing custom middleware

Decorator-based middleware (simple cases)

Basic syntax

from langchain.agents.middleware import before_model, after_model, wrap_model_call
from langgraph.runtime import Runtime
from typing import Any

# 1. Runs before each model call (logging)
@before_model
def log_input(state: dict, runtime: Runtime) -> dict | None:
    print(f"Messages sent to the model: {len(state['messages'])}")
    return None  # do not modify the state

# 2. Runs after each model call (output validation)
@after_model(can_jump_to=["end"])
def validate_output(state: dict, runtime: Runtime) -> dict | None:
    last_msg = state["messages"][-1]
    if "password" in last_msg.content:
        return {
            "messages": [{"role": "assistant", "content": "I cannot provide password information"}],
            "jump_to": "end"  # jump to the end node
        }
    return None

# 3. Wraps the model call (retry logic)
@wrap_model_call
def retry_on_error(request, handler):
    for i in range(3):
        try:
            return handler(request)
        except Exception as e:
            if i == 2:
                raise  # give up after the third attempt
            print(f"Retry {i+1}/3: {e}")

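Available decorators

Decorator	When it runs	Typical use
`@before_agent`	Before the agent starts	Initialization, permission checks
`@before_model`	Before each model call	Logging, input validation
`@after_model`	After each model call	Output validation, state updates
`@after_agent`	After the agent finishes	Cleanup, statistics
`@wrap_model_call`	Around the model call	Retries, caching, swapping the model
`@wrap_tool_call`	Around the tool call	Retries, emulation, monitoring
`@dynamic_prompt`	When the prompt is generated	Dynamic system prompt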

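7. Guardrails

Overview

Guardrails are checking and filtering mechanisms that keep an AI application safe. They validate and filter content at key points of the agent's execution to prevent security risks, data leaks, and inappropriate behavior.

Core purpose: building safe, compliant AI applications

Common problems they guard against:

Preventing leaks of personally identifiable information (PII)
Detecting and blocking prompt injection attacks
Filtering inappropriate or harmful content
Enforcing business rules and compliance requirements
Validating output quality and accuracy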

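Two ways to implement guardrails

1. Deterministic guardrails

Rule-based logic:

Techniques: regular expressions, keyword matching, format checks
Pros: fast, predictable, cheap
Cons: can miss complex or obfuscated violations
Best for: format checks, sensitive-word filtering, basic compliance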
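A deterministic guardrail can be as small as a keyword list plus a regular expression. The sketch below is only an illustration: the keyword list, the email pattern, and the `deterministic_check` helper are all my own assumptions, not part of LangChain.

```python
import re

# Hypothetical rule set for illustration.
BANNED_KEYWORDS = ("password", "api key")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def deterministic_check(text: str) -> list[str]:
    """Return the rule violations found in text; an empty list means it passes."""
    violations = []
    lowered = text.lower()
    for keyword in BANNED_KEYWORDS:
        if keyword in lowered:
            violations.append(f"keyword:{keyword}")
    if EMAIL_RE.search(text):
        violations.append("pattern:email")
    return violations


print(deterministic_check("My password is hunter2"))   # ['keyword:password']
print(deterministic_check("My p@ssw0rd is hunter2"))   # []
```

The second call shows the weakness listed above: a slightly obfuscated spelling passes the rules untouched.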

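2. Model-based guardrails

Use an LLM or a classifier for semantic understanding:

Techniques: LLM content analysis, classification models
Pros: can catch subtle, complex violations and understands context
Cons: slower, more expensive, can misjudge
Best for: content safety evaluation, intent detection, complex compliance checks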

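Built-in guardrails

PII detection guardrail

What it does

Automatically detects and handles personally identifiable information in the conversation:

Detected types: email, credit card, IP address, MAC address, and URL are built in; other types, such as phone numbers, need a custom detector
Handling strategies: redact, mask, hash, block
Where it checks: user input, AI output, tool results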

Code example

from langchain.agents import create_agent
from langchain.agents.middleware import PIIMiddleware

# Add PII guardrails when creating the agent
agent = create_agent(
    model="gpt-4",
    tools=[...],
    middleware=[
        # Email: redact when detected
        PIIMiddleware(
            "email",
            strategy="redact",        # Replace with [REDACTED_EMAIL]
            apply_to_input=True,      # Check user input
            apply_to_output=False,    # Do not check AI output
        ),
        
        # Credit card: show only part of the number
        PIIMiddleware(
            "credit_card", 
            strategy="mask",          # Keep the last 4 digits: ****-****-****-1234
            apply_to_input=True,
            apply_to_output=True,     # Also check AI output
        ),
        
        # API key: stop execution when detected
        PIIMiddleware(
            "api_key",
            detector=r"sk-[a-zA-Z0-9]{32}",  # Custom regular expression
            strategy="block",          # Raise an exception
            apply_to_input=True,
        ),
    ]
)

# Effect
# User input:    "My email is test@example.com"
# After redaction: "My email is [REDACTED_EMAIL]"

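The four handling strategies

Strategy	Description	Best for
`redact`	Replace the whole value with a marker	Strict de-identification, nothing is kept
`mask`	Hide part of the value, keep a key part	Verification flows that need partial visibility
`hash`	Replace the value with a hash	De-identified analytics
`block`	Stop immediately and raise an error	High-risk information, zero tolerance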
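The difference between the four strategies is easiest to see on a single value. The following is a rough sketch for email addresses only, under my own assumptions: the regex, the masking rule (hide the local part), the hash marker format, and the `PIIDetectedError` name are all made up for illustration and will differ from what PIIMiddleware actually produces.

```python
import hashlib
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
STRATEGIES = ("redact", "mask", "hash", "block")


class PIIDetectedError(Exception):
    """Hypothetical error raised by the 'block' strategy in this sketch."""


def handle_email(text: str, strategy: str) -> str:
    """Apply one of the four strategies to every email address in text."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    if strategy == "block":
        if EMAIL_RE.search(text):
            raise PIIDetectedError("email address detected")
        return text

    def replace(match: re.Match) -> str:
        value = match.group(0)
        if strategy == "redact":
            return "[REDACTED_EMAIL]"
        if strategy == "mask":
            # Hide the local part, keep the domain visible.
            local, domain = value.split("@", 1)
            return "*" * len(local) + "@" + domain
        # "hash": the same input always maps to the same short digest.
        digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:8]
        return f"<email_hash:{digest}>"

    return EMAIL_RE.sub(replace, text)


print(handle_email("My email is test@example.com", "redact"))
# My email is [REDACTED_EMAIL]
print(handle_email("My email is test@example.com", "mask"))
# My email is ****@example.com
```

Hashing keeps the ability to tell whether two messages mention the same address without exposing the address itself, which is why it suits analytics.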

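Human-in-the-loop guardrail

What it does

Pauses execution before a high-risk operation and waits for human approval:

Typical operations: financial transactions, data deletion, outgoing email
Decision types: approve, edit the arguments, reject
State management: needs a checkpointer to persist the interrupted state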

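Code example

from langchain.agents import create_agent
from langchain.agents.middleware import HumanInTheLoopMiddleware
from langgraph.checkpoint.memory import InMemorySaver
from langgraph.types import Command

# send_email, transfer_money and search are assumed to be tools defined elsewhere
agent = create_agent(
    model="gpt-4",
    tools=[send_email, transfer_money, search],
    middleware=[
        HumanInTheLoopMiddleware(
            interrupt_on={
                # Sending email needs human review (approve, edit, or reject)
                "send_email": {
                    "allowed_decisions": ["approve", "edit", "reject"],
                    "description": "Send an email to the user"
                },
                # Transfers can only be approved or rejected
                "transfer_money": {
                    "allowed_decisions": ["approve", "reject"],
                    # As I read the API, a callable description receives
                    # (tool_call, state, runtime) rather than the bare arguments
                    "description": lambda tool_call, state, runtime: f"Transfer {tool_call['args']['amount']} yuan"
                },
                # Read-only queries pass automatically
                "search": False,
            }
        )
    ],
    checkpointer=InMemorySaver(),  # Required! Stores the interrupted state
)

# Usage flow
config = {"configurable": {"thread_id": "user123"}}

# First call: triggers the approval step
result = agent.invoke(
    {"messages": "Send the customer a confirmation email"},
    config=config
)
# → The agent pauses and waits for a decision

# Second call: pass in the decision
result = agent.invoke(
    Command(resume={"decisions": [{"type": "approve"}]}),
    config=config  # The same thread_id resumes the conversation
)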

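Key points

A checkpointer is required: the execution state has to be restored after the interrupt
Thread isolation: approvals for different users (different thread_id values) are independent
Timeouts: timeout handling has to be implemented at the application layer
Approval UI: a matching user interface is needed for reviewers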

Custom guardrails

Pre-execution guardrails

Validate before the agent starts running:

Class-based implementation

from langchain.agents.middleware import AgentMiddleware, hook_config
from langgraph.runtime import Runtime

class InputValidationMiddleware(AgentMiddleware):
    """Input validation guardrail"""
    
    def __init__(self, banned_topics=None):
        super().__init__()
        self.banned_topics = banned_topics or ["violence", "pornography", "fraud"]
    
    # Returning "jump_to" requires declaring the allowed targets on the hook
    @hook_config(can_jump_to=["end"])
    def before_agent(self, state: dict, runtime: Runtime) -> dict | None:
        """Check the input before the agent starts"""
        user_input = state["messages"][-1].content
        
        # Check for banned topics
        for topic in self.banned_topics:
            if topic in user_input:
                return {
                    "messages": [{
                        "role": "assistant",
                        "content": f"Sorry, I cannot discuss topics related to '{topic}'"
                    }],
                    "jump_to": "end"  # End immediately
                }
        
        # Check the input length
        if len(user_input) > 1000:
            return {
                "messages": [{
                    "role": "assistant", 
                    "content": "The input is too long, please shorten your question"
                }],
                "jump_to": "end"
            }
        
        return None  # Validation passed

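Decorator-based implementation

import time

from langchain.agents.middleware import before_agent
from langgraph.runtime import Runtime

# get_last_call_time and update_call_time are helpers you have to provide yourself
@before_agent(can_jump_to=["end"])  # Needed because the hook may return "jump_to"
def rate_limit_guard(state: dict, runtime: Runtime) -> dict | None:
    """Rate limiting guardrail"""
    # Assumes the agent is invoked with a dict-like context
    user_id = runtime.context.get("user_id", "anonymous")
    
    # Check the call frequency
    last_call = get_last_call_time(user_id)
    if time.time() - last_call < 1:  # At most 1 call per second
        return {
            "messages": [{
                "role": "assistant",
                "content": "Too many requests, please try again later"
            }],
            "jump_to": "end"
        }
    
    # Record the time of this call
    update_call_time(user_id)
    return None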
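The rate limit guardrail relies on get_last_call_time and update_call_time, which the example leaves undefined. One possible way to fill them in is an in-memory dictionary, sketched below. This is an assumption on my part: it only works inside a single process, is not thread-safe, and forgets everything on restart. The `is_rate_limited` helper and the optional `now` parameter are additions of mine to make the logic easy to test.

```python
import time
from typing import Optional

# user_id -> timestamp of the last accepted call (in-memory, single process only)
_last_call: dict = {}


def get_last_call_time(user_id: str) -> float:
    """Return the last call time, or 0.0 for a user we have not seen yet."""
    return _last_call.get(user_id, 0.0)


def update_call_time(user_id: str, now: Optional[float] = None) -> None:
    """Record a call for this user; now can be injected for testing."""
    _last_call[user_id] = time.time() if now is None else now


def is_rate_limited(user_id: str, min_interval: float = 1.0,
                    now: Optional[float] = None) -> bool:
    """True if the user's previous call was less than min_interval seconds ago."""
    now = time.time() if now is None else now
    return now - get_last_call_time(user_id) < min_interval


update_call_time("alice", now=100.0)
print(is_rate_limited("alice", now=100.5))  # True
print(is_rate_limited("alice", now=101.0))  # False
print(is_rate_limited("bob", now=100.5))    # False
```

For a multi-process deployment the dictionary would have to be replaced by shared storage such as a database or a cache.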

Post-execution guardrails

Validate the output after the agent has finished running:

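from langchain.agents.middleware import after_agent
from langgraph.runtime import Runtime

# contains_harmful_content, needs_fact_checking and fact_check_api are
# placeholder helpers you have to implement yourself
@after_agent
def output_safety_check(state: dict, runtime: Runtime) -> dict | None:
    """Output safety check"""
    last_message = state["messages"][-1]
    ai_response = last_message.content
    
    # Check for harmful content
    if contains_harmful_content(ai_response):
        # Note: this appends a new message; the original reply stays in the history
        return {
            "messages": [{
                "role": "assistant",
                "content": "Sorry, I cannot provide this answer"
            }]
        }
    
    # Check factual accuracy (for example by calling a fact-checking API)
    if needs_fact_checking(ai_response):
        is_accurate = fact_check_api(ai_response)
        if not is_accurate:
            return {
                "messages": [{
                    "role": "assistant", 
                    "content": "This information may need further verification"
                }]
            }
    
    return None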

Combining multiple guardrails

Layered defense strategy

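from langchain.agents import create_agent
from langchain.agents.middleware import HumanInTheLoopMiddleware, PIIMiddleware
from langgraph.checkpoint.memory import InMemorySaver

# InputValidationMiddleware is the class defined earlier. OutputSafetyMiddleware and
# LLMContentReviewMiddleware stand for custom middleware you write yourself;
# they are not built into LangChain.
agent = create_agent(
    model="gpt-4",
    tools=[...],
    middleware=[
        # Layer 1: input validation (before execution)
        InputValidationMiddleware(banned_topics=["illegal content"]),
        
        # Layer 2: PII protection (before and after execution)
        PIIMiddleware("email", strategy="redact", apply_to_input=True),
        PIIMiddleware("credit_card", strategy="mask", apply_to_input=True),
        PIIMiddleware("email", strategy="redact", apply_to_output=True),
        
        # Layer 3: approval for sensitive operations (during execution)
        HumanInTheLoopMiddleware(interrupt_on={
            "send_email": True,
            "transfer_money": True
        }),
        
        # Layer 4: output safety (after execution)
        OutputSafetyMiddleware(),
        
        # Layer 5: model-based content review (after execution)
        LLMContentReviewMiddleware(
            model="gpt-4o-mini",  # Use a cheap model for the review
            check_topics=["safety", "compliance", "factual accuracy"]
        )
    ],
    checkpointer=InMemorySaver(),
)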

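Guardrail execution order

User request
  ↓
Layer 1: pre-execution guardrails (authentication, rate limiting, basic filtering)
  ↓
Layer 2: input processing guardrails (PII detection, format validation)
  ↓
Agent execution
  ↓
Layer 3: in-execution guardrails (human approval, tool call monitoring)
  ↓  
Layer 4: output processing guardrails (second PII pass, formatting)
  ↓
Layer 5: post-execution guardrails (content safety, quality validation)
  ↓
Returned to the user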
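The position of a middleware in the list also decides when its hooks fire. As I understand the LangChain middleware documentation, "before" hooks run in list order and "after" hooks run in reverse list order, like nested layers. The sketch below only simulates that ordering with plain strings; `run_pipeline` is a hypothetical helper and calls no LangChain code.

```python
def run_pipeline(middleware_names):
    """Simulate hook ordering: before hooks first-to-last, after hooks last-to-first."""
    trace = []
    for name in middleware_names:
        trace.append(f"{name}.before")
    trace.append("agent")
    for name in reversed(middleware_names):
        trace.append(f"{name}.after")
    return trace


for step in run_pipeline(["input_validation", "pii", "output_safety"]):
    print(step)
# input_validation.before
# pii.before
# output_safety.before
# agent
# output_safety.after
# pii.after
# input_validation.after
```

Under that assumption, a middleware placed last in the list is the first one to see the agent's output, which is worth keeping in mind when ordering output checks.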