Introduction to the PASCAL VOC 2012 Dataset

This content is licensed under CC 4.0 BY-SA.

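This article describes the structure and contents of the VOC2012 dataset in detail, including the properties of the original images and the label images, the class distribution, and how the dataset is split. VOC2012 contains 17125 original images and 2913 segmentation label images, is suitable for semantic segmentation, and comes in two variants: the official dataset and an augmented dataset.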

The dataset can be downloaded from Baidu Cloud: link: https://pan.baidu.com/s/1FTjY-ISsDMu0vIypAQyDpg, extraction code: fyxt

The cloud drive holds three items: VOC2012, VOC2012_test, and SBD.tgz (the SBD dataset; for details on SBD see https://blog.csdn.net/zz2230633069/article/details/89335205).

Further background is available on the official page, http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html, and at https://blog.csdn.net/u013832707/article/details/80060327

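After the VOC2012 folder is extracted, the folders relevant to semantic segmentation are:

JPEGImages folder: the original images used for segmentation
SegmentationClass folder: the label images used for segmentation
SegmentationClass_aug folder: the label images used for segmentation, merged with the augmentation set from the SBD dataset
Segmentation folder under ImageSets: TXT files listing the names of the images in each split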

JPEGImages folder: contains all 17125 original images; shape = h x w x 3, mode = RGB, format = JPEG. Image sizes vary, and pixel values range from 0 to 255.

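SegmentationClass folder: contains the 2913 semantic segmentation label images in their unprocessed form; mode = P (palette-indexed), format = PNG. Image sizes vary. Converted to RGB, each image has shape = h x w x 3, and its pixel values are the RGB colors of the classes listed below. Other values occur as well: some boundary pixels, for example, have the color (224, 224, 192).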
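The label colors follow a fixed palette. As a rough sketch, the palette can be regenerated with the bit-shuffling scheme found in commonly circulated Python ports of the VOC devkit colormap routine. The function name `voc_colormap` is my own, and the snippet only illustrates where values such as (224, 224, 192) come from.

```python
def voc_colormap(n=256):
    """Return a list of n (R, G, B) tuples; entry i is the color of palette index i."""
    cmap = []
    for i in range(n):
        r = g = b = 0
        c = i
        # Spread the bits of the index over the three channels, 3 bits per round,
        # filling each channel from its most significant bit downwards.
        for j in range(8):
            r |= ((c >> 0) & 1) << (7 - j)
            g |= ((c >> 1) & 1) << (7 - j)
            b |= ((c >> 2) & 1) << (7 - j)
            c >>= 3
        cmap.append((r, g, b))
    return cmap


cmap = voc_colormap()
print(cmap[0])    # background -> (0, 0, 0)
print(cmap[1])    # aeroplane  -> (128, 0, 0)
print(cmap[15])   # person     -> (192, 128, 128)
print(cmap[255])  # void       -> (224, 224, 192)
```

Index 255 maps to (224, 224, 192), which is the boundary color mentioned above.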

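SegmentationClass_aug folder: contains all of the semantic label images after processing, 12031 in total. These are grayscale images with shape = h x w and mode = L, whose pixel values are the class labels themselves (0 to 20, 21 classes in total, with 0 as background). The processing is simple: start from an all-zero image, and wherever a pixel has the RGB value of an object class, write that class's label at the same position.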
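The conversion just described can be sketched in plain Python. This is an illustration rather than the script actually used to build the folder: `COLOR_TO_LABEL` is a hypothetical three-entry excerpt of the palette, `rgb_to_label` is a name chosen here, and the image is a nested list instead of a real PNG so that the example runs without extra libraries.

```python
# Hypothetical excerpt of the palette: RGB color -> class label.
COLOR_TO_LABEL = {
    (128, 0, 0): 1,        # aeroplane
    (192, 128, 128): 15,   # person
    (0, 64, 128): 20,      # tv/monitor
}


def rgb_to_label(rgb_image):
    """Convert rows of (R, G, B) tuples into rows of integer class labels."""
    height = len(rgb_image)
    width = len(rgb_image[0]) if height else 0
    # Start from an all-zero (background) label image.
    label = [[0] * width for _ in range(height)]
    for y, row in enumerate(rgb_image):
        for x, pixel in enumerate(row):
            # Only pixels whose color matches a known class get a non-zero label.
            if pixel in COLOR_TO_LABEL:
                label[y][x] = COLOR_TO_LABEL[pixel]
    return label


toy_image = [
    [(0, 0, 0), (128, 0, 0), (224, 224, 192)],
    [(192, 128, 128), (0, 64, 128), (0, 0, 0)],
]
label_map = rgb_to_label(toy_image)
print(label_map)  # [[0, 1, 0], [15, 20, 0]]
```

Under this procedure any color missing from the table, including the (224, 224, 192) boundary color, ends up as 0. A pipeline that wants to keep those pixels as void would map that color to 255 instead.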

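ImageSets/Segmentation/train.txt: 1464 lines in total, i.e. the names of 1464 training images

ImageSets/Segmentation/val.txt: 1449 lines in total, i.e. the names of 1449 validation images

ImageSets/Segmentation/trainval.txt: 2913 lines in total, i.e. the names of 2913 training and validation images, the union of the two files above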
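A quick sanity check of these split files might look like the sketch below. It writes three tiny stand-in files into a temporary directory so that it runs anywhere; the IDs are placeholders in the VOC naming style, and `read_ids` is a helper name chosen here. Pointed at the real ImageSets/Segmentation folder, the counts should come out as 1464, 1449 and 2913.

```python
import tempfile
from pathlib import Path


def read_ids(path):
    """Read one image ID per line, ignoring blank lines."""
    text = Path(path).read_text()
    return [line.strip() for line in text.splitlines() if line.strip()]


# Toy stand-in for ImageSets/Segmentation (placeholder IDs).
with tempfile.TemporaryDirectory() as tmp:
    seg_dir = Path(tmp)
    (seg_dir / "train.txt").write_text("2007_000032\n2007_000039\n")
    (seg_dir / "val.txt").write_text("2007_000033\n")
    (seg_dir / "trainval.txt").write_text("2007_000032\n2007_000033\n2007_000039\n")

    train = read_ids(seg_dir / "train.txt")
    val = read_ids(seg_dir / "val.txt")
    trainval = read_ids(seg_dir / "trainval.txt")

# train and val should not overlap, and trainval should be exactly their union.
disjoint = not (set(train) & set(val))
is_union = (set(train) | set(val)) == set(trainval)
print(len(train), len(val), len(trainval), disjoint, is_union)  # 2 1 3 True True
```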

ImageSets/Segmentation/train_aug.txt = voc_train + sbd_train - duplicate images

8829 lines in total, i.e. 8829 training images

ImageSets/Segmentation/train_aug_val.txt = voc_val - sbd_train (that is, voc_val with the images already in train_aug removed)

904 lines in total, i.e. 904 validation images

ImageSets/Segmentation/val_aug.txt = voc_val + sbd_val - duplicate images - train_aug

3202 lines in total, i.e. 3202 validation images
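Read as set operations, the three formulas above can be sketched as follows. The single-letter IDs are invented for illustration, and treating "+" as a set union (which drops duplicates automatically) is my interpretation of the notation.

```python
# Invented IDs standing in for the four source lists.
voc_train = {"a", "b", "c"}
voc_val = {"d", "e", "f"}
sbd_train = {"b", "d", "g"}
sbd_val = {"c", "e", "h"}

# train_aug = voc_train + sbd_train - duplicates
train_aug = voc_train | sbd_train

# train_aug_val = voc_val - sbd_train
train_aug_val = voc_val - sbd_train

# val_aug = voc_val + sbd_val - duplicates - train_aug
val_aug = (voc_val | sbd_val) - train_aug

print(sorted(train_aug))      # ['a', 'b', 'c', 'd', 'g']
print(sorted(train_aug_val))  # ['e', 'f']
print(sorted(val_aug))        # ['e', 'f', 'h']
print(train_aug & val_aug)    # set(), the two augmented splits do not overlap
```

With the real lists, the sizes reported above are 8829 for train_aug and 3202 for val_aug, which together give the 12031 label images in SegmentationClass_aug.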

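In summary: to use the official dataset, take train.txt and val.txt; to use the augmented dataset, take train_aug.txt and val_aug.txt. In both cases the original images all come directly from JPEGImages, and the label images all come from SegmentationClass_aug.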
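Pairing each image with its label then comes down to joining paths. The sketch below assumes the image folder is named JPEGImages as in the official release, that images end in .jpg and labels in .png, and that the dataset root is a folder called VOC2012; `build_pairs` is a hypothetical helper, not part of any toolkit.

```python
from pathlib import Path


def build_pairs(root, ids, label_dir="SegmentationClass_aug"):
    """Return (image_path, label_path) tuples for the given image IDs."""
    root = Path(root)
    return [
        (root / "JPEGImages" / f"{image_id}.jpg", root / label_dir / f"{image_id}.png")
        for image_id in ids
    ]


# The IDs would normally be read from train.txt / val.txt or the *_aug.txt files.
pairs = build_pairs("VOC2012", ["2007_000032", "2007_000033"])
for image_path, label_path in pairs:
    print(image_path.as_posix(), label_path.as_posix())
```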

There are 20 classes in total:

Person: person
Animal: bird, cat, cow, dog, horse, sheep
Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train
Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor

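The distribution is as follows:

[Figure: class distribution]

The correspondence between classes and colors is given below. A label image contains up to 22 distinct values (0-20 and 255). Values 0 and 255 are both treated as black here, RGB = (0, 0, 0), so the semantic maps have 21 colors in total: the 20 classes plus black. Note that the palette of the original SegmentationClass PNGs instead draws 255 as the boundary color (224, 224, 192).

Below are training examples for the segmentation taster, each consisting of: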

the training image
the object segmentation: pixel indices correspond to the first, second, third object, etc.
the class segmentation: pixel indices correspond to classes in alphabetical order (0=background, 1=aeroplane, 2=bicycle, 3=bird, 4=boat, 5=bottle, 6=bus, 7=car, 8=cat, 9=chair, 10=cow, 11=diningtable, 12=dog, 13=horse, 14=motorbike, 15=person, 16=potted plant, 17=sheep, 18=sofa, 19=train, 20=tv/monitor, 255='void' or unlabelled)
For both types of segmentation image, index 0 corresponds to background and index 255 corresponds to 'void' or unlabelled.
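In code, the index-to-class mapping is just a list lookup, and the 255 index is usually skipped when scoring predictions. The sketch below shows both. Excluding void pixels from the score is a common convention rather than something stated in the text above, and `pixel_accuracy` and the toy arrays are made up for illustration.

```python
VOC_CLASSES = [
    "background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus",
    "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike",
    "person", "potted plant", "sheep", "sofa", "train", "tv/monitor",
]
IGNORE_INDEX = 255  # 'void' or unlabelled


def pixel_accuracy(pred, target, ignore_index=IGNORE_INDEX):
    """Fraction of correctly labelled pixels, skipping void pixels in the target."""
    correct = 0
    total = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if t == ignore_index:
                continue  # void pixels do not count for or against the prediction
            total += 1
            correct += int(p == t)
    return correct / total if total else 0.0


target = [[0, 15, 255], [15, 15, 0]]
pred = [[0, 15, 7], [15, 0, 0]]
print(VOC_CLASSES[15])               # person
print(pixel_accuracy(pred, target))  # 0.8 (4 of the 5 non-void pixels match)
```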