YOLOv10 Object Detection Fine-Tuning

Original article · last modified 2026-05-11 21:13:53

This content is licensed under CC 4.0 BY-SA.

First published 2026-05-10 23:16:10

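Fine-Tuning a YOLOv10 Object Detection Model in Practice: From Data Collection to Real-Time Detection

This article walks through the full workflow of fine-tuning a YOLOv10 object detection model on custom objects (a sword, glasses, and so on): environment setup, data collection, online annotation, model training, and real-time inference.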

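1. Project overview

The goal is to fine-tune the YOLOv10 Nano model so that it recognizes a specific set of items (sword, glasses, cellphone, screwdriver, human, TV). The overall pipeline is:

Data collection (OpenCV webcam) → Online annotation (Roboflow) → Fine-tuning (YOLOv10n) → Real-time inference (webcam)

Environment and dependencies:

Dependency	Version / notes
Python	3.9+
PyTorch	CUDA 12.9 build (cu129)
YOLOv10	THU-MIG/yolov10
Annotation tool	Roboflow (online annotation)
Base model	YOLOv10-Nano (2.3M parameters)

The machine runs Windows 11 with CUDA 13.0 and cuDNN 9.13.0 installed; the GPU is an NVIDIA RTX 5060. The PyTorch cu129 wheels ship their own CUDA runtime, so they do not have to match the system-wide CUDA 13.0 toolkit.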

2. Environment setup

2.1 Install base dependencies

# Install the core dependencies with uv
uv pip install supervision labelme labelme2yolo huggingface_hub google_cloud_audit_log

# Install PyTorch (CUDA 12.9 build)
uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cu129

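2.2 Install YOLOv10

# Install YOLOv10 from GitHub
pip install git+https://github.com/THU-MIG/yolov10.git

YOLOv10 is a real-time, end-to-end object detection model open-sourced by the MIG group at Tsinghua University and published at NeurIPS 2024. Its main features:

NMS-free end-to-end detection: no non-maximum suppression post-processing is needed, which lowers inference latency
High efficiency: YOLOv10-Nano has only 2.3M parameters and 6.7G FLOPs
Multiple scales: six model sizes are available (N/S/M/B/L/X)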

3. Data collection

3.1 Webcam capture script

gen_imgs.py uses OpenCV to grab frames from the webcam in real time:

import os
import sys

import cv2

cap = cv2.VideoCapture(0)
if not cap.isOpened():
    # Stop here: reading from an unopened camera would fail on the first frame
    sys.exit("can't open the webcam")

# Request 1280x720; the driver may fall back to another resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)

output_dir = 'output_images'
os.makedirs(output_dir, exist_ok=True)
img_counter = 0

while True:
    ret, frame = cap.read()
    if not ret:
        break
    cv2.imshow('Webcam', frame)
    k = cv2.waitKey(1)
    if k % 256 == 27:  # ESC quits
        print("Escape hit, closing...")
        break
    elif k % 256 == ord('s'):  # S saves the current frame
        img_name = os.path.join(output_dir, "opencv_frame_{}.png".format(img_counter))
        cv2.imwrite(img_name, frame)
        print("{} saved".format(img_name))
        img_counter += 1

cap.release()
cv2.destroyAllWindows()

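Usage:

Press S to save the current frame
Press ESC to quit
The resolution is set to 1280×720

A total of 12 images were captured for this project (opencv_frame_0.png through opencv_frame_11.png), showing the sword, glasses, cellphone, and other items.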

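4. Data annotation

4.1 Online annotation with Roboflow

Upload the captured images to Roboflow and annotate them online:

Create the project yolo10-os5sm
Upload the captured images
Annotate the objects with bounding boxes
Define 6 classes: cellphone, glasses, human, screwdriver, sword, tv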

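4.2 Export the dataset in YOLO format

Roboflow can export directly to YOLO format. The dataset configuration (yolo10-test/data.yaml) is shown below. Note that val and test point to the same folder as train, so the metrics reported during training are measured on the training images and will overstate how well the model generalizes.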

train: C:\Users\KevinGlaser\Documents\code_s\yolo10\yolo10-test\train
val: C:\Users\KevinGlaser\Documents\code_s\yolo10\yolo10-test\train
test: C:\Users\KevinGlaser\Documents\code_s\yolo10\yolo10-test\train

nc: 6
names: ['cellphone', 'glasses', 'human', 'screwdriver', 'sword', 'tv']

roboflow:
  workspace: yolo10-tvuwx
  project: yolo10-os5sm
  version: 1
  license: CC BY 4.0
  url: https://universe.roboflow.com/yolo10-tvuwx/yolo10-os5sm/dataset/1
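
Because train, val, and test all resolve to the same folder in this configuration, one option is to hold out a few source frames for validation. The sketch below is only an illustration under stated assumptions: the helper name split_by_source, the valid/ folder name, and the 25% ratio are hypothetical and not part of the original project. It groups files by the text before ".rf." so that all augmented copies of one captured frame end up on the same side of the split.

```python
import random
import shutil
from collections import defaultdict
from pathlib import Path


def split_by_source(dataset_dir, val_ratio=0.25, seed=0):
    """Move a share of the source frames (with all their augmented copies)
    from train/ to valid/. Returns the sorted source prefixes that were moved.

    Hypothetical helper: the valid/ folder name and the ratio are assumptions.
    """
    root = Path(dataset_dir)
    groups = defaultdict(list)
    for img in sorted((root / "train" / "images").iterdir()):
        # "opencv_frame_0_png.rf.<hash>.jpg" -> "opencv_frame_0_png"
        groups[img.name.split(".rf.")[0]].append(img)

    sources = sorted(groups)
    random.Random(seed).shuffle(sources)
    n_val = max(1, round(len(sources) * val_ratio))
    held_out = sorted(sources[:n_val])

    for sub in ("images", "labels"):
        (root / "valid" / sub).mkdir(parents=True, exist_ok=True)

    for source in held_out:
        for img in groups[source]:
            label = root / "train" / "labels" / (img.stem + ".txt")
            shutil.move(str(img), str(root / "valid" / "images" / img.name))
            if label.exists():
                shutil.move(str(label), str(root / "valid" / "labels" / label.name))
    return held_out
```

After running something like split_by_source("yolo10-test"), the val entry in data.yaml would need to point at the new valid folder. With only 12 source frames the held-out set is tiny, so the resulting metrics are still a rough indication at best.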

4.3 Dataset structure

yolo10-test/
├── train/
│   ├── images/
│   │   ├── opencv_frame_0_png.rf.55800e8418cf5f6bc10cd76d946653b9.jpg
│   │   ├── opencv_frame_0_png.rf.74391b663b97139eee4186d70db8f92e.jpg
│   │   └── ... (36 images in total)
│   └── labels/
│       ├── opencv_frame_0_png.rf.55800e8418cf5f6bc10cd76d946653b9.txt
│       └── ... (36 label files in total)
├── data.yaml
└── README.roboflow.txt
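
Before training, it can help to confirm that every image in a split has a matching label file and to see how many instances each class has. The following is a minimal sketch that assumes the images/ and labels/ layout shown above; the function name summarize_split is hypothetical.

```python
from collections import Counter
from pathlib import Path


def summarize_split(split_dir, names):
    """Return (image files without a label file, per-class instance counts)."""
    split = Path(split_dir)
    missing = []
    counts = Counter({name: 0 for name in names})
    for img in sorted((split / "images").iterdir()):
        label = split / "labels" / (img.stem + ".txt")
        if not label.exists():
            missing.append(img.name)
            continue
        for line in label.read_text().splitlines():
            if line.strip():
                # The first field of each label line is the class index
                class_id = int(line.split()[0])
                counts[names[class_id]] += 1
    return missing, dict(counts)


# Class order taken from data.yaml
NAMES = ['cellphone', 'glasses', 'human', 'screwdriver', 'sword', 'tv']
# Example call (path assumed relative to the project root):
# missing, counts = summarize_split("yolo10-test/train", NAMES)
```

A class with very few instances in the counts is a candidate for collecting more images before retraining.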

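4.4 Label format

In the YOLO-format label files exported by Roboflow, each line describes one object:

<class_id> <x_center> <y_center> <width> <height>

Example with 4 objects (class IDs 2, 5, 4, and 3, i.e. human, tv, sword, and screwdriver). All four coordinates are normalized to the range 0 to 1 relative to the image width and height: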

2 0.45390625 0.60546875 0.1890625 0.4359375
5 0.678125 0.52890625 0.30703125 0.31640625
4 0.6875 0.74296875 0.353125 0.06640625
3 0.93046875 0.7109375 0.1171875 0.5609375
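
To make the normalized format concrete, here is a small sketch that converts one label line into a pixel-space box for a 640×640 image. The function name yolo_to_pixels is hypothetical; the class list is the one from data.yaml.

```python
def yolo_to_pixels(line, img_w, img_h, names):
    """Convert one YOLO label line to (class_name, x_min, y_min, x_max, y_max) in pixels."""
    class_id, xc, yc, w, h = line.split()
    xc, yc, w, h = float(xc), float(yc), float(w), float(h)
    # The label stores the box center and size; shift by half the size to get the corners
    x_min = (xc - w / 2) * img_w
    y_min = (yc - h / 2) * img_h
    x_max = (xc + w / 2) * img_w
    y_max = (yc + h / 2) * img_h
    return names[int(class_id)], x_min, y_min, x_max, y_max


NAMES = ['cellphone', 'glasses', 'human', 'screwdriver', 'sword', 'tv']

# Third line of the example above, on a 640x640 image
print(yolo_to_pixels("4 0.6875 0.74296875 0.353125 0.06640625", 640, 640, NAMES))
# -> ('sword', 327.0, 454.25, 553.0, 496.75)
```

The wide, flat box (226 px by 42.5 px) matches what one would expect for a sword lying horizontally in the frame.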

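4.5 Preprocessing

Roboflow applied the following preprocessing to the dataset:

Auto-orient: EXIF orientation data is stripped
Resize: stretched to 640×640
Augmentation: 3 augmented versions per source image (random brightness adjustment within ±25%)

This yields 36 annotated images in total (12 source images × 3 versions).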

5. Model training

5.1 Training command

uv run yolo detect train data=yolo10-test/data.yaml model=yolov10n.pt epochs=30 batch=8 imgsz=640 device=0

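Training parameters:

Parameter	Value	Description
`data`	`yolo10-test/data.yaml`	Path to the dataset configuration file
`model`	`yolov10n.pt`	Pretrained Nano model weights
`epochs`	30	Number of training epochs
`batch`	8	Batch size
`imgsz`	640	Input image size
`device`	0	Use GPU 0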
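
The same run can also be launched from Python instead of the CLI. The sketch below mirrors the parameters in the table; the TRAIN_ARGS dictionary and the train wrapper are hypothetical names introduced for illustration, while YOLOv10 is the class exported by the THU-MIG package.

```python
# Same settings as the CLI command, collected in one place
TRAIN_ARGS = {
    "data": "yolo10-test/data.yaml",
    "epochs": 30,
    "batch": 8,
    "imgsz": 640,
    "device": 0,
}


def train(weights="yolov10n.pt", **overrides):
    """Fine-tune from pretrained weights; keyword overrides replace entries in TRAIN_ARGS."""
    # Imported lazily so the settings can be inspected without the package installed
    from ultralytics import YOLOv10

    model = YOLOv10(weights)
    return model.train(**{**TRAIN_ARGS, **overrides})


# Example call (starts a full training run):
# train()
# train(epochs=50)  # same settings, more epochs
```

Keeping the settings in a dictionary makes it easy to log them alongside the results or to change a single value between runs.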

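5.2 YOLOv10-Nano model specifications

Metric	Value
Parameters	2.3M
FLOPs	6.7G
COCO AP	38.5%
Inference latency	1.84 ms

These are the figures reported for the pretrained COCO model, not measurements from this fine-tuning run.

The Nano model was chosen because:

It is small, which suits edge deployment
It is fast at inference, which gives good real-time behavior
It is sufficient for fine-tuning on a simple scene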

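5.3 Training run

When training finishes, the best checkpoint is saved as best.pt and can be used directly for inference. With the default Ultralytics output layout it is written under runs/detect/<run name>/weights/, so copy it next to the inference script or pass its full path when loading the model.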
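
Repeated training runs each write their own best.pt, so it is easy to pick up a stale one. The sketch below returns the most recently modified checkpoint; it assumes the default runs/detect/<run name>/weights/ output layout, and the function name latest_best_weights is hypothetical.

```python
from pathlib import Path


def latest_best_weights(runs_dir="runs/detect"):
    """Return the most recently modified best.pt under runs_dir, or None if there is none."""
    candidates = list(Path(runs_dir).glob("*/weights/best.pt"))
    if not candidates:
        return None
    # Newest file by modification time wins
    return max(candidates, key=lambda p: p.stat().st_mtime)


# Example call:
# weights = latest_best_weights()
# print(weights)
```

The returned path can be passed straight to the model constructor instead of copying best.pt by hand.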

6. Real-time inference

6.1 Inference script

yolov10-detect.py runs real-time object detection on the webcam feed:

import sys

import cv2
import supervision as sv
from ultralytics import YOLOv10

# Load the fine-tuned weights (best.pt is expected in the working directory)
model = YOLOv10('best.pt')

# Set up the annotators
# Note: newer supervision releases rename BoundingBoxAnnotator to BoxAnnotator
bounding_box_annotator = sv.BoundingBoxAnnotator()
label_annotator = sv.LabelAnnotator()

# Open the webcam
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    # Stop here: reading from an unopened camera would fail on the first frame
    sys.exit("can't open the webcam")
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Run the model on the current frame
    results = model(frame)[0]
    detections = sv.Detections.from_ultralytics(results)

    # Draw the boxes, then the labels on top
    annotated_image = bounding_box_annotator.annotate(scene=frame, detections=detections)
    annotated_image = label_annotator.annotate(scene=annotated_image, detections=detections)

    cv2.imshow('Webcam', annotated_image)
    k = cv2.waitKey(1)
    if k % 256 == 27:  # ESC quits
        print("Escape hit, closing...")
        break

cap.release()
cv2.destroyAllWindows()

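6.2 Visualization

The supervision library draws the bounding boxes and labels, and the results are displayed live. Press ESC to quit.

7. Results

After 30 epochs of training, the model detected five of the six classes in real time; glasses were not detected:

Class	Detection result
sword	✅ Detected
glasses	Not detected
cellphone	✅ Detected
screwdriver	✅ Detected
human	✅ Detected
tv	✅ Detected

Inference kept up with the webcam feed in real time, and the displayed frame rate looked smooth.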
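
To put a number on the frame rate rather than judging it by eye, a rolling FPS counter can be added to the inference loop. This is a minimal sketch; the FpsMeter class and its window size are hypothetical and not part of the original script.

```python
import time
from collections import deque


class FpsMeter:
    """Rolling frames-per-second estimate over the last `window` frames."""

    def __init__(self, window=30):
        self.stamps = deque(maxlen=window)

    def tick(self, now=None):
        """Record one frame and return the current FPS estimate (0.0 until two frames are seen)."""
        self.stamps.append(time.perf_counter() if now is None else now)
        if len(self.stamps) < 2:
            return 0.0
        elapsed = self.stamps[-1] - self.stamps[0]
        # N timestamps span N - 1 frame intervals
        return (len(self.stamps) - 1) / elapsed if elapsed > 0 else 0.0


# Usage inside the webcam loop, once per processed frame:
# meter = FpsMeter()
# fps = meter.tick()
# print(f"{fps:.1f} FPS")
```

Calling tick() after the annotation step measures the whole pipeline (capture, inference, and drawing), which is the figure that matters for how smooth the preview feels.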

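8. Workflow summary

YOLOv10 fine-tuning workflow:

1. Environment setup
   uv pip install (dependencies and PyTorch) + pip install git+https://github.com/THU-MIG/yolov10.git
2. Data collection
   gen_imgs.py (OpenCV webcam, press S to save)
3. Data annotation
   Roboflow online annotation (6 classes)
4. Data export
   YOLO format → yolo10-test/train/
5. Model training
   yolov10n.pt + 30 epochs + batch=8
6. Real-time inference
   best.pt + supervision visualization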