Training Faster R-CNN with Python 3.5 and TensorFlow on Ubuntu for Surface Defect Detection (Part 1)

This content is licensed under CC 4.0 BY-SA.

Original link: https://blog.csdn.net/ganwenbo2011/article/details/89713124

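This post describes how, on Ubuntu 18.04 with TensorFlow, 510 original images were expanded to more than 2,000 through data augmentation (horizontal flipping, translation, and added Gaussian noise). It also covers annotating the images with labelImg and modifying the XML files to match the augmented images.

Environment: Ubuntu 18.04 + Python 3.5 (I use Anaconda3) + TensorFlow + GTX 1060 + CUDA 9.0 + cuDNN 7.3

I won't go into setting up the environment; there are plenty of tutorials online. It is full of pitfalls, though, and I reinstalled the system many times. The main problem is Ubuntu's support for NVIDIA cards, which left the machine stuck at the login screen on boot. If boot hangs on a purple screen, see my other post on installing the graphics driver, "Ubuntu 18 hangs on a purple screen at boot: installing the NVIDIA driver online".

My task is defect detection on rivets. I labeled the dataset by hand: 510 source images (labeling them one by one nearly killed me!). Because that is rather little data, I augmented it with horizontal flipping, translation, and added Gaussian noise, expanding it to 2,040 images.

I. Building the VOC2007 dataset (this preparation is platform-independent and can be done on Windows)

0. Make copies of the original images for augmentation (if your dataset is large enough, augmentation is unnecessary). Individual images should not be too large; resize them down as appropriate. Mine were too large, over 1 MB each.

I applied three transforms (flip, translation, added noise), so I made three copies, going from 510 to 2,040 images.

1. Name the images with six-digit numbers, 000000.jpg to 002039.jpg (this is the common convention, so there is no need to be different). Mind the image format: don't simply rename a .png file to .jpg, because the file has to be converted. Conversion is simple: read the image with cv2.imread() and save it as .jpg with cv2.imwrite().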

    import os

    def imgRename(path):
        """Rename every .jpg in `path` to a six-digit sequence number."""
        # Sort first: os.listdir() returns files in arbitrary order.
        imgList = sorted(os.path.join(path, f) for f in os.listdir(path) if f.endswith('.jpg'))
        for i in range(len(imgList)):
            print(i)
            newname = os.path.join(path, '%06d.jpg' % i)
            # print(newname)
            os.rename(imgList[i], newname)

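2. Create folders, each holding 510 images: resource holds 0-509, imgflip holds 510-1019, Image_shift holds 1020-1529, and gasuss_noise holds 1530-2039.

3. Annotate the original images in the resource folder. (Labels for the augmented images are generated by program; labeling 2K+ images purely by hand would be exhausting.) I did the annotation on Windows with labelImg; see the guide on installing the image annotation tool labelImg.

4. Create a resource-Annotations folder and save the annotations there; each image gets a corresponding XML file.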

    <annotation>
       <folder>resource-Annotations</folder>
       <filename>000021.jpg</filename>
       <path>L:\DataSet\20190311\resource-Annotations\000021.jpg</path>
       <source>
           <database>Unknown</database>
       </source>
       <size>
           <width>2550</width>
           <height>2488</height>
           <depth>3</depth>
       </size>
       <segmented>0</segmented>
       <object>
           <name>cashang</name>
           <pose>Unspecified</pose>
           <truncated>0</truncated>
           <difficult>0</difficult>
           <bndbox>
               <xmin>608</xmin>
               <ymin>1963</ymin>
               <xmax>883</xmax>
               <ymax>2209</ymax>
           </bndbox>
       </object>
       <object>
           <name>cashang</name>
           <pose>Unspecified</pose>
           <truncated>0</truncated>
           <difficult>0</difficult>
           <bndbox>
               <xmin>895</xmin>
               <ymin>2122</ymin>
               <xmax>1111</xmax>
               <ymax>2296</ymax>
           </bndbox>
       </object>
    </annotation>
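
Before moving on, it can be useful to check what has been labeled so far. The following is only a minimal sketch, assuming the labelImg XML layout shown above; the helper name `count_labels` is hypothetical and not part of the original scripts. It counts how many objects of each class appear in a folder of XML files.

```python
import os
from collections import Counter
from xml.etree import ElementTree as ET


def count_labels(xml_dir):
    """Count the <object><name> values over all XML files in xml_dir."""
    counts = Counter()
    for fname in sorted(os.listdir(xml_dir)):
        if not fname.endswith('.xml'):
            continue
        root = ET.parse(os.path.join(xml_dir, fname)).getroot()
        for obj in root.findall('object'):
            counts[obj.find('name').text] += 1
    return counts
```

Calling `count_labels('resource-Annotations')` returns a `Counter` keyed by class name (for this dataset, names such as `cashang` and `huaheng`), which makes a strongly unbalanced class easy to spot.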

5. A few tags in the generated XML files need to be changed, mainly the file path, because training later reads the images from this path.

    from xml.etree import ElementTree as ET

    def processXml(path):
        """Point <folder>, <path> and <database> at the VOC2007 layout."""
        tree = ET.parse(path)
        root = tree.getroot()
        # Look tags up by name; getchildren() was removed in Python 3.9.
        root.find('folder').text = 'VOC2007'
        # <filename> stays as it is; <path> is rebuilt from it.
        root.find('path').text = './VOC2007/JPEGImages/' + root.find('filename').text
        root.find('source/database').text = 'pascalvoc'

        tree.write(path, 'UTF-8')

After the change:

    <folder>VOC2007</folder>
    <filename>000021.jpg</filename>
    <path>./VOC2007/JPEGImages/000021.jpg</path>
    <source>
        <database>pascalvoc</database>
    </source>

. . . . . .

II. Data augmentation

0. Horizontal flip

    import cv2

    def imgflip(imgpath):
        """
        Horizontal mirror
        """
        img = cv2.imread(imgpath)
        dst = cv2.flip(img, 1)  # flipCode=1 mirrors left to right
        #newpath=imgpath[:-10]+str('%06d' % int(int(imgpath[-10:-4])))+'.jpg'
        #newpath=r'./imgflip/'+imgpath[-10:]    # your own save path
        cv2.imwrite(imgpath, dst)  # overwrites the copied image in place

1. Translation

    import cv2
    import numpy as np

    def Image_shift(imgpath):
        """
        Translation
        :param imgpath:
        :return:
        """
        img = cv2.imread(imgpath)
        rows = img.shape[0]
        cols = img.shape[1]
        dw = 100  # shift left by 100 pixels (negative x offset in M)
        dh = 200  # shift up by 200 pixels (negative y offset in M)
        M = np.float32([[1, 0, -dw], [0, 1, -dh]])
        dst = cv2.warpAffine(img, M, (cols, rows))

        # The shift leaves the bottom dh rows and the right dw columns empty;
        # fill them by repeating the last valid row and column.
        dst[rows - dh:, :] = dst[rows - dh - 1, :]
        dst[:, cols - dw:] = dst[:, cols - dw - 1:cols - dw]
        #newpath = r'./Image_shift/'+imgpath[-10:]
        cv2.imwrite(imgpath, dst)

2. Adding Gaussian noise

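    import cv2
    from skimage.util import random_noise

    def addNoise(imgpath):
        """
        Add Gaussian noise
        :param imgpath: path ending in a six-digit name such as 001530.jpg
        :return:
        """
        img = cv2.imread(imgpath)
        # Seed with the image number so the noise is reproducible.
        # Newer scikit-image releases call this argument `rng` instead of `seed`.
        img_ = random_noise(img, mode='gaussian', seed=int(imgpath[-10:-4]), mean=0.005)
        # cv2.namedWindow("salt", 0)
        # cv2.imshow("salt", img_)
        # cv2.waitKey(0)
        # random_noise returns floats in [0, 1]; stretch them back to 8 bits.
        dst = cv2.normalize(img_, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8U)
        # cv2.imread returns BGR, so the conversion code is BGR2GRAY.
        out = cv2.cvtColor(dst, cv2.COLOR_BGR2GRAY)
        cv2.imwrite(imgpath, out)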

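3. Annotate the augmented images: copy the XML files generated for the original images, then modify them accordingly.

(1) Create the folders and copy the XML files from resource-Annotations into the other three folders.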
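
The copy in step (1) can be scripted. The sketch below is an illustration under stated assumptions, not the script actually used: it assumes each copied XML file should carry the number of its augmented image (source number plus 510, 1020, or 1530, following the folder layout described earlier), and that the target folders use the "-Annotations" suffix. `OFFSETS` and `copy_annotations` are hypothetical names.

```python
import os
import shutil

# Index offset of each augmented set (510 source images per set).
# The folder names are assumptions based on the names used in this post.
OFFSETS = {
    'imgflip-Annotations': 510,
    'Image_shift-Annotations': 1020,
    'gasuss_noise-Annotations': 1530,
}


def copy_annotations(src_dir, root_dir, offsets=OFFSETS):
    """Copy NNNNNN.xml from src_dir into each target folder as (NNNNNN + offset).xml."""
    for folder, offset in offsets.items():
        dst_dir = os.path.join(root_dir, folder)
        os.makedirs(dst_dir, exist_ok=True)
        for fname in sorted(os.listdir(src_dir)):
            stem, ext = os.path.splitext(fname)
            if ext != '.xml':
                continue
            # Assumes every XML file has a purely numeric name.
            newname = '%06d.xml' % (int(stem) + offset)
            shutil.copy(os.path.join(src_dir, fname),
                        os.path.join(dst_dir, newname))
```

With this numbering, 000021.xml is copied to imgflip-Annotations/000531.xml, so the later steps can derive the image name from the XML file name.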

(2) Modify the XML files in gasuss_noise-Annotations

The objects did not move, since only noise was added, so only the <filename> and <path> tags in each XML file need to change.
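
For the noise set, a sketch of that update could look like the following. It assumes the XML file is already named after its image (for example 001551.xml for 001551.jpg); the function name `update_name_and_path` is hypothetical.

```python
import os
from xml.etree import ElementTree as ET


def update_name_and_path(xmlpath):
    """Set <filename> and <path> from the XML file's own name; boxes are untouched."""
    tree = ET.parse(xmlpath)
    root = tree.getroot()
    stem = os.path.splitext(os.path.basename(xmlpath))[0]
    root.find('filename').text = stem + '.jpg'
    root.find('path').text = './VOC2007/JPEGImages/' + stem + '.jpg'
    tree.write(xmlpath, 'UTF-8')
```

Taking the name with `os.path.basename` rather than slicing the path string keeps the function working even if the file name is not exactly ten characters long.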

(3) Modify the XML files in imgflip-Annotations

The image was mirrored horizontally, so the <filename>, <path>, <xmin>, and <xmax> tags in each XML file need to change.

    from xml.etree import ElementTree as ET

    def processXml(xmlpath):
        tree = ET.parse(xmlpath)
        root = tree.getroot()
        # The XML file name (e.g. 000510.xml) gives the image name.
        root.find('filename').text = xmlpath[-10:-4] + '.jpg'
        root.find('path').text = './VOC2007/JPEGImages/' + root.find('filename').text
        width = int(root.find('size/width').text)  # 2550 for my images
        for obj in root.findall('object'):
            box = obj.find('bndbox')
            xmin = int(box.find('xmin').text)
            xmax = int(box.find('xmax').text)
            # Mirroring about the vertical centre line (x = width / 2) maps
            # x to width - x, so the old right edge becomes the new left edge.
            box.find('xmin').text = str(width - xmax)
            box.find('xmax').text = str(width - xmin)
        # Write once, after all objects have been updated.
        tree.write(xmlpath, 'UTF-8')
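
As a worked check of the mirror arithmetic, here is a small sketch (the helper `flip_box_x` is hypothetical) applied to the first box of the sample annotation shown earlier, in an image 2550 pixels wide.

```python
def flip_box_x(xmin, xmax, width):
    """Mirror a box horizontally in an image of the given width."""
    # A point x maps to width - x, so the two edges swap roles.
    return width - xmax, width - xmin


# First box of the sample annotation: xmin=608, xmax=883, width=2550.
print(flip_box_x(608, 883, 2550))  # (1667, 1942)
```

The box keeps its width (883 - 608 = 1942 - 1667 = 275). Note that cv2.flip maps pixel column x to width - 1 - x, so this convention is off by one pixel, which is negligible at this image size.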

(4) Modify the XML files in Image_shift-Annotations

The image was translated, so the <filename>, <path>, <xmin>, <ymin>, <xmax>, and <ymax> tags in each XML file need to change.

    from xml.etree import ElementTree as ET

    def processXml(xmlpath):
        dw, dh = 100, 200  # must equal dw and dh in Image_shift
        tree = ET.parse(xmlpath)
        root = tree.getroot()
        root.find('filename').text = xmlpath[-10:-4] + '.jpg'
        root.find('path').text = './VOC2007/JPEGImages/' + root.find('filename').text
        for obj in root.findall('object'):
            box = obj.find('bndbox')
            for tag, offset in (('xmin', dw), ('ymin', dh), ('xmax', dw), ('ymax', dh)):
                # The image moved left by dw and up by dh; clamp at the image border.
                box.find(tag).text = str(max(int(box.find(tag).text) - offset, 0))
        # Write once, after all objects have been updated.
        tree.write(xmlpath, 'UTF-8')
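
Clamping at the border has one side effect worth checking: a box that is pushed entirely out of the image collapses to zero width or height. The sketch below illustrates this with hypothetical helpers `shift_box` and `is_degenerate`, using the offsets dw=100 and dh=200 from Image_shift.

```python
def shift_box(box, dw, dh):
    """Move (xmin, ymin, xmax, ymax) left by dw and up by dh, clamping at 0."""
    xmin, ymin, xmax, ymax = box
    return (max(xmin - dw, 0), max(ymin - dh, 0),
            max(xmax - dw, 0), max(ymax - dh, 0))


def is_degenerate(box):
    """True if clamping collapsed the box to zero width or height."""
    xmin, ymin, xmax, ymax = box
    return xmax <= xmin or ymax <= ymin


# First box of the sample annotation stays intact:
print(shift_box((608, 1963, 883, 2209), 100, 200))  # (508, 1763, 783, 2009)
# A box near the top-left corner collapses and should be dropped:
print(is_degenerate(shift_box((10, 20, 60, 150), 100, 200)))  # True
```

Whether to delete such collapsed boxes from the XML is a judgment call; the point is only that they should not reach training as valid objects.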

Finally, put all the images into a single folder, JPEGImages, and all the XML files into a single folder, Annotations, for later use.
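
After merging, every image should have exactly one XML file with the same number. A minimal sketch of that check follows; `unmatched_files` is a hypothetical helper name.

```python
import os


def unmatched_files(jpg_dir, xml_dir):
    """Return (images without an XML, XMLs without an image) as sorted name stems."""
    jpgs = {os.path.splitext(f)[0] for f in os.listdir(jpg_dir) if f.endswith('.jpg')}
    xmls = {os.path.splitext(f)[0] for f in os.listdir(xml_dir) if f.endswith('.xml')}
    return sorted(jpgs - xmls), sorted(xmls - jpgs)
```

Calling `unmatched_files('JPEGImages', 'Annotations')` should return two empty lists if nothing was lost or misnumbered along the way.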

Open any of the XML files to check its contents:

    <annotation>
       <folder>VOC2007</folder>
       <filename>000710.jpg</filename>
       <path>./VOC2007/JPEGImages/000710.jpg</path>
       <source>
           <database>pascalvoc</database>
       </source>
       <size>
           <width>2550</width>
           <height>2488</height>
           <depth>3</depth>
       </size>
       <segmented>0</segmented>
       <object>
           <name>huaheng</name>
           <pose>Unspecified</pose>
           <truncated>0</truncated>
           <difficult>0</difficult>
           <bndbox>
               <xmin>904</xmin>
               <ymin>1328</ymin>
               <xmax>1027</xmax>
               <ymax>1520</ymax>
           </bndbox>
       </object>
       <object>
           <name>huaheng</name>
           <pose>Unspecified</pose>
           <truncated>0</truncated>
           <difficult>0</difficult>
           <bndbox>
               <xmin>1897</xmin>
               <ymin>1369</ymin>
               <xmax>2032</xmax>
               <ymax>1469</ymax>
           </bndbox>
       </object>
       <object>
           <name>cashang</name>
           <pose>Unspecified</pose>
           <truncated>0</truncated>
           <difficult>0</difficult>
           <bndbox>
               <xmin>1317</xmin>
               <ymin>1415</ymin>
               <xmax>1784</xmax>
               <ymax>1538</ymax>
           </bndbox>
       </object>
    </annotation>
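
Inspecting one file by eye does not scale to 2,040 of them. The following sketch, assuming the XML layout shown above, lists the boxes in a file that are empty or fall outside the image size recorded in <size>; `invalid_boxes` is a hypothetical helper name.

```python
from xml.etree import ElementTree as ET


def invalid_boxes(xmlpath):
    """List boxes that are empty or fall outside the <size> given in the file."""
    root = ET.parse(xmlpath).getroot()
    width = int(root.find('size/width').text)
    height = int(root.find('size/height').text)
    bad = []
    for obj in root.findall('object'):
        box = obj.find('bndbox')
        xmin, ymin, xmax, ymax = (int(box.find(t).text)
                                  for t in ('xmin', 'ymin', 'xmax', 'ymax'))
        if not (0 <= xmin < xmax <= width and 0 <= ymin < ymax <= height):
            bad.append((obj.find('name').text, xmin, ymin, xmax, ymax))
    return bad
```

Running this over every file in Annotations and printing the non-empty results is a cheap way to catch mistakes in the flip and shift arithmetic before training starts.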

The dataset is now complete.

Next post: modifying the Faster R-CNN source code (to be continued).

Update, May 14: the follow-up post is at https://blog.csdn.net/ganwenbo2011/article/details/90169980
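
Copyright notice: this is an original article by CSDN blogger "ganwenbo2011", licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.
Original link: https://blog.csdn.net/ganwenbo2011/article/details/89713124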