AI Agent Development in Practice ㉘ | Agent Deployment and Operations: The Complete Path from Development to Production

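An agent that runs fine locally often breaks as soon as it goes live: memory leaks, crashes under concurrent load, LLM API timeouts. Deployment is more than a single `docker run`; it needs a systematic operations strategy.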

1. Pre-deployment preparation

1.1 Environment checklist

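## Hardware resources
- [ ] CPU: ≥4 cores
- [ ] Memory: ≥8GB (16GB recommended)
- [ ] Storage: ≥50GB SSD
- [ ] GPU: only if running local models (optional)

## Software environment
- [ ] Python 3.10+
- [ ] Docker 24.0+
- [ ] Redis 7.0+
- [ ] Vector database (Milvus/Qdrant)

## Dependent services
- [ ] LLM API (OpenAI or another provider)
- [ ] Monitoring (Prometheus/Grafana)
- [ ] Logging (ELK/Loki)
- [ ] Configuration center (optional)

## Security configuration
- [ ] API keys stored encrypted
- [ ] HTTPS certificate
- [ ] Network isolation (production)
- [ ] Access control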
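
Part of this checklist can be automated. The following is a minimal sketch of a preflight script covering only the items the Python standard library can verify (interpreter version, CPU cores, disk space). The function names are hypothetical, and reading "≥50GB" as free space on the root path is an assumption; Docker, Redis, and the dependent services would need their own checks.

```python
import os
import shutil
import sys

# Thresholds taken from the checklist above.
MIN_PYTHON = (3, 10)
MIN_CPU_CORES = 4
MIN_FREE_DISK_GB = 50  # assumption: interpreted as free space, not total capacity


def check_python(version=None):
    """Return True if the interpreter (or the given version) is at least 3.10."""
    version = tuple(version) if version is not None else tuple(sys.version_info[:2])
    return version >= MIN_PYTHON


def check_cpu(cores=None):
    """Return True if at least MIN_CPU_CORES logical cores are available."""
    if cores is None:
        cores = os.cpu_count() or 0  # cpu_count() may return None
    return cores >= MIN_CPU_CORES


def check_disk(path="/", free_bytes=None):
    """Return True if the filesystem holding `path` has enough free space."""
    if free_bytes is None:
        free_bytes = shutil.disk_usage(path).free
    return free_bytes >= MIN_FREE_DISK_GB * 1024**3


def run_preflight():
    """Run every check and return a {label: passed} mapping."""
    return {
        "Python 3.10+": check_python(),
        "CPU >= 4 cores": check_cpu(),
        "Disk >= 50GB free": check_disk(),
    }


def format_report(results):
    """Render the results in the same checkbox style as the checklist."""
    return "\n".join(f"[{'x' if ok else ' '}] {name}" for name, ok in results.items())
```

Calling `print(format_report(run_preflight()))` on the target host prints one checkbox line per item, which makes it easy to spot an undersized machine before deploying.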

1.2 Configuration management

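# config.py
import os

from pydantic_settings import BaseSettings, SettingsConfigDict

# Resolve the environment name first so the matching .env file can be chosen.
# The `env` field below is not visible to the settings config, so it cannot
# be interpolated into the file name directly.
ENV = os.getenv("ENV", "development")

class Settings(BaseSettings):
    # Loads .env.development, .env.staging, or .env.production
    model_config = SettingsConfigDict(
        env_file=f".env.{ENV}",
        env_file_encoding="utf-8",
    )

    # Environment
    env: str = ENV  # development/staging/production

    # LLM settings
    openai_api_key: str  # required: no default, must come from the environment
    openai_model: str = "gpt-4"
    openai_temperature: float = 0.7
    openai_max_tokens: int = 2000

    # Vector database settings
    vector_db_type: str = "milvus"
    vector_db_host: str = "localhost"
    vector_db_port: int = 19530

    # Redis settings
    redis_host: str = "localhost"
    redis_port: int = 6379
    redis_db: int = 0

    # Application settings
    app_name: str = "Agent Service"
    debug: bool = False
    workers: int = 4
    max_concurrent_requests: int = 100
    request_timeout: int = 60  # seconds

    # Monitoring settings
    prometheus_port: int = 9090
    log_level: str = "INFO"

# Usage
settings = Settings()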
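
The `max_concurrent_requests` and `request_timeout` values only matter if the application enforces them. Below is a minimal sketch of one way to do that with `asyncio`, assuming request handlers are coroutines. `RequestLimiter` and the two constants are hypothetical names standing in for the settings above; they are not part of the original service.

```python
import asyncio

# Hypothetical stand-ins for settings.max_concurrent_requests and
# settings.request_timeout from config.py.
MAX_CONCURRENT_REQUESTS = 100
REQUEST_TIMEOUT = 60  # seconds


class RequestLimiter:
    """Caps the number of in-flight requests and bounds how long each may run."""

    def __init__(self, max_concurrent, timeout):
        self._max_concurrent = max_concurrent
        self._timeout = timeout
        self._semaphore = None  # created lazily, inside the running event loop

    async def run(self, coro_fn, *args):
        """Await coro_fn(*args) under the concurrency cap and the timeout.

        Raises asyncio.TimeoutError if the call exceeds the timeout.
        """
        if self._semaphore is None:
            self._semaphore = asyncio.Semaphore(self._max_concurrent)
        async with self._semaphore:
            return await asyncio.wait_for(coro_fn(*args), timeout=self._timeout)


# One shared limiter per process, configured from the settings.
limiter = RequestLimiter(MAX_CONCURRENT_REQUESTS, REQUEST_TIMEOUT)
```

A handler would then wrap its LLM call as `await limiter.run(call_llm, prompt)`, where `call_llm` is whatever coroutine the service uses. Requests beyond the cap wait for a free slot instead of piling onto the upstream API, and a hung call is cancelled once the timeout elapses.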

2. Containerized deployment

2.1 Dockerfile

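# Dockerfile
FROM python:3.11-slim AS builder

WORKDIR /app

# Install dependencies into a virtual environment that can be copied as a unit
COPY requirements.txt .
RUN python -m venv /opt/venv \
    && /opt/venv/bin/pip install --no-cache-dir -r requirements.txt

# Production image
FROM python:3.11-slim

WORKDIR /app

# Copy dependencies. /opt/venv is readable by the non-root user below,
# whereas packages under /root/.local would not be.
COPY --from=builder /opt/venv /opt/venv
ENV PATH=/opt/venv/bin:$PATH

# Copy the application
COPY app ./app
COPY config ./config

# Run as a non-root user
RUN useradd -m appuser && chown -R appuser:appuser /app
USER appuser

# Health check. python:3.11-slim does not ship curl, so probe with the standard library.
HEALTHCHECK --interval=30s --timeout=3s \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health', timeout=2)" || exit 1

# Start the service
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]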
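
The `HEALTHCHECK` instruction probes `/health` on port 8000. Slim base images often omit `curl`, so the same probe can be written with Python's standard library. The sketch below is one possible shape; the file name `healthcheck.py` and the function `is_healthy` are hypothetical, and it assumes the service answers `/health` with a 2xx status when it is ready.

```python
# healthcheck.py (hypothetical file name)
import http.client
import urllib.error
import urllib.request


def is_healthy(url="http://localhost:8000/health", timeout=2.0):
    """Return True if the endpoint answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # Covers connection refused, timeouts, and HTTP error statuses (4xx/5xx),
        # which urlopen reports by raising HTTPError.
        return False
```

A container health check needs an exit code rather than a boolean, so a wrapper would finish with `sys.exit(0 if is_healthy() else 1)`. Keeping the timeout below the `--timeout=3s` configured in the Dockerfile lets the script report failure itself instead of being killed by Docker.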

2.2 Docker Compose

# docker-compose.yml
version: '3.8'

services:
  agent-api:
    build: .
    image: agent-api:latest
    ports:
      - "8000:8000"
      - "9090:9090"
    environment:
      - ENV=production
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - VECTOR_DB_HOST=milvus
      - REDIS_HOST=redis
    depends_on:
      - redis
      - milvus
    networks:
      - agent-network
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 4G
        reservations:
          cpus: '1'
          memory: 2G
    restart: unless-stopped
  
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data
    command: redis-server --appendonly yes
    networks:
      - agent-network
  
  milvus:
    image: milvusdb/milvus:v2.3.0