Object Detection: YOLOv3 (with a Detailed TensorFlow Code Walkthrough)

Latest recommended article published on 2024-04-09 18:03:42


This content is licensed under CC 4.0 BY-SA.


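This post walks through the prediction process of the YOLOv3 object detection model, including the network structure, the anchor box definition, weight loading, and non-maximum suppression, and gives TensorFlow code for each step. It also covers the training process (sample preparation, label construction, and loss computation) and looks closely at how the YOLOv3 loss is calculated.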


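YOLOv3 code walkthrough:

I. Prediction process:

1. Defining the network structure:

The network ends with three outputs: detect_1, detect_2, and detect_3.

For a 416x416 input, the three scales have the shapes [1, 507 (13x13x3), 5+c], [1, 2028 (26x26x3), 5+c], and [1, 8112 (52x52x3), 5+c], where c is the number of classes.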
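The row counts above follow directly from the grid sizes. As a quick check, here is a small sketch that assumes a 416x416 input, strides of 32, 16, and 8, and three anchors per scale; the function name is illustrative only and does not appear in the original code.

```python
def predictions_per_scale(img_size=416, strides=(32, 16, 8), anchors_per_scale=3):
    """Return the number of predicted boxes at each detection scale."""
    counts = []
    for stride in strides:
        grid = img_size // stride  # 13, 26, 52 for a 416 input
        counts.append(grid * grid * anchors_per_scale)
    return counts


counts = predictions_per_scale()
print(counts, sum(counts))  # [507, 2028, 8112] 10647
```

The total of 10647 is the number of candidate boxes that later has to be filtered by non-maximum suppression.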

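Yolo_block is a module built from ordinary convolutions (which do not change the spatial size of the feature map). It produces two results, route and inputs: route is combined with the features of the next scale, and inputs is fed to the detection layer to compute the bbox_attrs units.

detect_layer is the detection layer. It uses anchor boxes (candidate boxes), which are derived from the samples of the training dataset. Before training, the annotated boxes in the dataset are clustered to obtain a set of concrete sizes that represent the most common object sizes. Supplying these sizes to the model as prior knowledge during training and testing can improve accuracy.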

# Anchor boxes (width, height), taken from the COCO dataset

_ANCHORS = [(10., 13.), (16., 30.), (33., 23.), (30., 61.), (62., 45.), (59., 119.), (116., 90.), (156., 198.), (373., 326.)]
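The clustering step itself is not shown in this post. A minimal sketch of one common approach, k-means over (width, height) pairs with IoU as the similarity measure, might look like the following. The function names, the toy boxes, and the use of the mean as the centroid update are all assumptions made for illustration, not the procedure that produced the anchors above.

```python
import random


def wh_iou(box, centroid):
    """IoU of two boxes given as (w, h), assuming they share the same top-left corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iterations=100, seed=0):
    """Cluster (w, h) pairs into k anchors, assigning each box to the centroid with the highest IoU."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)  # start from k distinct training boxes
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda j: wh_iou(box, centroids[j]))
            clusters[best].append(box)
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                w = sum(m[0] for m in members) / len(members)
                h = sum(m[1] for m in members) / len(members)
                new_centroids.append((w, h))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster where it was
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    return sorted(centroids, key=lambda c: c[0] * c[1])  # smallest area first


toy_boxes = [(10, 10), (12, 12), (100, 100), (110, 110)]  # made-up sizes
print(kmeans_anchors(toy_boxes, k=2))
```

Sorting by area mirrors the ordering of _ANCHORS, where the smallest boxes come first and the largest last.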

This part deserves close attention:

How is the number of anchor boxes chosen, and how does that number affect the results?

# Assumes TensorFlow 1.x with: import tensorflow as tf; slim = tf.contrib.slim
def _detection_layer(inputs, num_classes, anchors, img_size, data_format):
    """Produce predictions of size h*w with num_anchors*(5+num_classes) channels."""
    print(inputs.get_shape())
    num_anchors = len(anchors)  # number of anchor boxes at this scale
    predictions = slim.conv2d(inputs, num_anchors * (5 + num_classes), 1, stride=1,
                              normalizer_fn=None, activation_fn=None,
                              biases_initializer=tf.zeros_initializer())
    shape = predictions.get_shape().as_list()  # [batch, H, W, C], C = num_anchors*(5+num_classes)
    # The three scales give [1, 13, 13, 3*(5+c)], [1, 26, 26, 3*(5+c)], [1, 52, 52, 3*(5+c)]
    print("shape", shape)
    grid_size = shape[1:3]  # take H and W from NHWC
    dim = grid_size[0] * grid_size[1]  # number of grid cells, h*w
    bbox_attrs = 5 + num_classes

    # We want bbox_attrs for every anchor in every cell, so first reshape to
    # [batch, num_anchors * dim, bbox_attrs], then split bbox_attrs into 2, 2, 1, num_classes.
    predictions = tf.reshape(predictions, [-1, num_anchors * dim, bbox_attrs])
    stride = (img_size[0] // grid_size[0], img_size[1] // grid_size[1])  # downsampling factor, e.g. 32 (416/13)
    anchors = [(a[0] / stride[0], a[1] / stride[1]) for a in anchors]  # scale the anchors down to grid units

    # Split the per-box attributes
    box_centers, box_sizes, confidence, classes = tf.split(predictions, [2, 2, 1, num_classes], axis=-1)

    # Decode each attribute in turn; box_centers and box_sizes are mapped back to
    # pixel values in the input image, and the decoded attributes are concatenated at the end.
    box_centers = tf.nn.sigmoid(box_centers)
    confidence = tf.nn.sigmoid(confidence)

    # Cell indices 0, 1, 2, ..., each of shape (13,) at the coarsest scale (a square grid is assumed)
    grid_x = tf.range(grid_size[0], dtype=tf.float32)
    grid_y = tf.range(grid_size[1], dtype=tf.float32)
    a, b = tf.meshgrid(grid_x, grid_y)  # a holds the x index of every cell, b holds the y index
    x_offset = tf.reshape(a, (-1, 1))
    y_offset = tf.reshape(b, (-1, 1))
    x_y_offset = tf.concat([x_offset, y_offset], axis=-1)  # shape [dim, 2]
    # Repeat the offsets once per anchor ([1, n] means once along axis 0, n times along axis 1)
    x_y_offset = tf.reshape(tf.tile(x_y_offset, [1, num_anchors]), [1, -1, 2])

    # box_centers lies in (0, 1) and x_y_offset is the cell index, so the sum is the position
    # in grid units (e.g. 0.1 + 4 = 4.1: an offset of 0.1 inside cell 4)
    box_centers = box_centers + x_y_offset
    box_centers = box_centers * stride  # convert to pixels in the input image

    anchors = tf.tile(anchors, [dim, 1])
    box_sizes = tf.exp(box_sizes) * anchors  # width and height in grid units
    box_sizes = box_sizes * stride  # width and height in pixels

    detections = tf.concat([box_centers, box_sizes, confidence], axis=-1)
    classes = tf.nn.sigmoid(classes)
    predictions = tf.concat([detections, classes], axis=-1)  # reassemble the decoded attributes
    print(predictions.get_shape())
    return predictions  # decoded predictions
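To make the decoding arithmetic concrete, here is a plain-Python sketch for a single anchor in a single cell. It mirrors the formulas in the layer above (sigmoid offset plus cell index times stride for the center, exp times anchor for the size); the helper names and the sample values are made up for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(t_xy, t_wh, cell, anchor, stride):
    """Decode raw outputs (tx, ty) and (tw, th) into a pixel-space center and size."""
    cx = (sigmoid(t_xy[0]) + cell[0]) * stride
    cy = (sigmoid(t_xy[1]) + cell[1]) * stride
    # The layer divides the anchors by the stride and multiplies the result back,
    # which nets out to exp(t) * anchor in pixels.
    w = math.exp(t_wh[0]) * anchor[0]
    h = math.exp(t_wh[1]) * anchor[1]
    return cx, cy, w, h


# Raw outputs of zero place the center in the middle of cell (4, 6) and keep the anchor size.
print(decode_box((0.0, 0.0), (0.0, 0.0), cell=(4, 6), anchor=(116.0, 90.0), stride=32))
# (144.0, 208.0, 116.0, 90.0)
```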

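2. Loading the pretrained weights:

The weights come in a binary .weights file. The first five int32 values are header information, and the network weights follow the header.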

# Load the weights (assumes: import numpy as np)
def load_weights(var_list, weights_file):
    with open(weights_file, "rb") as fp:
        _ = np.fromfile(fp, dtype=np.int32, count=5)  # skip the five int32 header values
        weights = np.fromfile(fp, dtype=np.float32)
    ptr = 0
    i = 0
    assign_ops = []
    while i < len(var_list) - 1:
        var1 = var_list[i]
        var2 = var_list[i + 1]
        # Only convolution layers are processed
        if 'Conv' in var1.name.split('/')[-2]:
            # Is the next variable a batch-norm parameter?
            if 'BatchNorm' in var2.name.split('/')[-2]:
                # Load the batch-norm parameters; the file stores them as beta, gamma, mean, var
                gamma, beta, mean, var = var_list[i + 1:i + 5]
                batch_norm_vars = [beta, gamma, mean, var]
                for var in batch_norm_vars:
                    shape = var.shape.as_list()
                    num_params = np.prod(shape)
                    var_weights = weights[ptr:ptr + num_params].reshape(shape)
                    ptr += num_params
                    assign_ops.append(tf.assign(var, var_weights, validate_shape=True))
                i += 4  # four variables were loaded, so advance the index by 4
            elif 'Conv' in var2.name.split('/')[-2]:
                # No batch norm: the next variable is the bias
                bias = var2
                bias_shape = bias.shape.as_list()
                bias_params = np.prod(bias_shape)
                bias_weights = weights[ptr:ptr + bias_params].reshape(bias_shape)
                ptr += bias_params
                assign_ops.append(tf.assign(bias, bias_weights, validate_shape=True))
                i += 1  # one variable was loaded
            shape = var1.shape.as_list()
            num_params = np.prod(shape)
            # Load the convolution kernel: the file stores it as (out, in, h, w),
            # while TensorFlow expects (h, w, in, out)
            var_weights = weights[ptr:ptr + num_params].reshape((shape[3], shape[2], shape[0], shape[1]))
            var_weights = np.transpose(var_weights, (2, 3, 1, 0))
            ptr += num_params
            assign_ops.append(tf.assign(var1, var_weights, validate_shape=True))
            i += 1
    return assign_ops

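The returned assign_ops is a list of tf.assign() operations. Running that list in a session, e.g. sess.run(load_ops) where load_ops holds the returned list, assigns the weight values to the corresponding variables in the model.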
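The file layout described above (five int32 header values followed by float32 weights) can be checked without TensorFlow. The sketch below writes a tiny toy file and reads it back with the standard library. The function name is hypothetical, the header values and weights are made up, and little-endian byte order is an assumption.

```python
import os
import struct
import tempfile


def read_weights_header(path):
    """Return (header, num_floats): five int32 header values and the count of float32 values after them."""
    with open(path, "rb") as fp:
        header = struct.unpack("<5i", fp.read(20))  # 5 x int32 = 20 bytes
        payload = fp.read()
    return header, len(payload) // 4  # each float32 takes 4 bytes


# Build a toy file: five placeholder header values followed by three float32 "weights".
with tempfile.NamedTemporaryFile(suffix=".weights", delete=False) as tmp:
    tmp.write(struct.pack("<5i", 1, 2, 3, 4, 5))
    tmp.write(struct.pack("<3f", 0.5, -1.0, 2.0))
    toy_path = tmp.name

header, num_floats = read_weights_header(toy_path)
os.remove(toy_path)
print(header, num_floats)  # (1, 2, 3, 4, 5) 3
```

On a real file, num_floats should equal the total number of parameters that load_weights consumes, which makes it a cheap sanity check on the final value of ptr.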

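3. Removing duplicate results with NMS

The detection module outputs boxes as a center point, width, and height, so they must first be converted to the corner form [x0, y0, x1, y1]: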

def detections_boxes(detections):
    center_x, center_y, width, height, attrs = tf.split(detections, [1, 1, 1, 1, -1], axis=-1)
    w2 = width / 2
    h2 = height / 2
    x0 = center_x - w2
    y0 = center_y - h2
    x1 = center_x + w2
    y1 = center_y + h2
    boxes = tf.concat([x0, y0, x1, y1], axis=-1)
    detections = tf.concat([boxes, attrs], axis=-1)
    return detections

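For one image, the detection layers produce [1, 507 (13x13x3), 5+c] + [1, 2028, 5+c] + [1, 8112, 5+c], i.e. 10647 results across the three scales. The same object is likely to be detected more than once, so to keep each detection unique the 10647 results are de-duplicated with non-maximum suppression:

1) Among all detection boxes whose confidence exceeds a confidence threshold, pick the one with the highest confidence.
2) Compute its overlap (intersection over union, IoU) with each of the remaining boxes.
3) Filter by the IoU threshold: any box whose IoU exceeds the threshold (too much overlap) is removed. Then repeat from step 1 with the boxes that remain.

The confidence threshold and the IoU threshold must both be specified in advance.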

# Remove duplicate results with NMS
def non_max_suppression(predictions_with_boxes, confidence_threshold, iou_threshold=0.4):
    # Zero out every prediction whose confidence is not above the threshold
    conf_mask = np.expand_dims((predictions_with_boxes[:, :, 4] > confidence_threshold), -1)
    predictions = predictions_with_boxes * conf_mask
    result = {}
    for i, image_pred in enumerate(predictions):
        shape = image_pred.shape
        print("shape1", shape)
        non_zero_idxs = np.nonzero(image_pred)
        # Keep each surviving row once (a row index repeats once per non-zero element)
        image_pred = image_pred[np.unique(non_zero_idxs[0])]
        print("shape2", image_pred.shape)
        image_pred = image_pred.reshape(-1, shape[-1])
        bbox_attrs = image_pred[:, :5]  # x0, y0, x1, y1, confidence
        classes = image_pred[:, 5:]
        classes = np.argmax(classes, axis=-1)
        unique_classes = list(set(classes.reshape(-1)))
        for cls in unique_classes:
            cls_mask = classes == cls
            cls_boxes = bbox_attrs[np.nonzero(cls_mask)]
            cls_boxes = cls_boxes[cls_boxes[:, -1].argsort()[::-1]]  # sort by confidence, descending
            cls_scores = cls_boxes[:, -1]
            cls_boxes = cls_boxes[:, :-1]
            while len(cls_boxes) > 0:
                box = cls_boxes[0]
                score = cls_scores[0]
                if cls not in result:
                    result[cls] = []
                result[cls].append((box, score))
                cls_boxes = cls_boxes[1:]
                cls_scores = cls_scores[1:]  # keep the scores aligned with the remaining boxes
                ious = np.array([_iou(box, x) for x in cls_boxes])
                iou_mask = ious < iou_threshold
                cls_boxes = cls_boxes[np.nonzero(iou_mask)]
                cls_scores = cls_scores[np.nonzero(iou_mask)]
    return result

The IoU function used above:

# Compute the overlap (IoU) of two boxes; box1 and box2 are given as
# top-left and bottom-right corners [x0, y0, x1, y1]
def _iou(box1, box2):
    b1_x0, b1_y0, b1_x1, b1_y1 = box1
    b2_x0, b2_y0, b2_x1, b2_y1 = box2
    int_x0 = max(b1_x0, b2_x0)
    int_y0 = max(b1_y0, b2_y0)
    int_x1 = min(b1_x1, b2_x1)
    int_y1 = min(b1_y1, b2_y1)
    # Clamp at 0 so that boxes that do not overlap get an intersection of 0
    int_area = max(int_x1 - int_x0, 0) * max(int_y1 - int_y0, 0)
    b1_area = (b1_x1 - b1_x0) * (b1_y1 - b1_y0)
    b2_area = (b2_x1 - b2_x0) * (b2_y1 - b2_y0)
    # Add 1e-05 to the denominator to avoid division by zero
    iou = int_area / (b1_area + b2_area - int_area + 1e-05)
    return iou
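The three NMS steps can be tried on a toy input with a compact pure-Python sketch. It is a simplified stand-in for the NumPy version above (single class, no confidence filtering), with illustrative function names and made-up boxes. Note the clamping in the intersection: without it, two boxes that are disjoint along both axes would multiply two negative lengths into a positive "overlap".

```python
def iou(a, b):
    """IoU of two boxes in [x0, y0, x1, y1] form, with the intersection clamped at zero."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / (union + 1e-5)


def greedy_nms(boxes, scores, iou_threshold=0.4):
    """Return the indices of the boxes that survive, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # step 1: highest remaining confidence
        keep.append(best)
        # steps 2 and 3: drop everything that overlaps the chosen box too much
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68 and is suppressed; the third is far away and is kept.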

4. Displaying the detection results:

# Draw the detection results on the image (assumes: from PIL import ImageDraw)
def draw_boxes(boxes, img, cls_names, detection_size):
    draw = ImageDraw.Draw(img)
    for cls, bboxs in boxes.items():
        color = tuple(np.random.randint(0, 256, 3))  # one random color per class
        for box, score in bboxs:
            # Map the box from the detection input size back to the original image size
            box = convert_to_original_size(box, np.array(detection_size), np.array(img.size))
            draw.rectangle(box, outline=color)
            draw.text(box[:2], '{} {:.2f}%'.format(cls_names[cls], score * 100), fill=color)
            print('{} {:.2f}%'.format(cls_names[cls], score * 100), box[:2])

def convert_to_original_size(box, size, original_size):
    ratio = original_size / size
    box = box.reshape(2, 2) * ratio  # rows are (x0, y0) and (x1, y1)
    return list(box.reshape(-1))
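As a worked example of the rescaling in convert_to_original_size, the sketch below does the same arithmetic in plain Python. The function name and the sizes are made up for illustration; sizes are given as (width, height), matching PIL's img.size.

```python
def scale_box(box, detection_size, original_size):
    """Map [x0, y0, x1, y1] from detection-input pixels to original-image pixels."""
    rx = original_size[0] / detection_size[0]  # horizontal ratio
    ry = original_size[1] / detection_size[1]  # vertical ratio
    x0, y0, x1, y1 = box
    return [x0 * rx, y0 * ry, x1 * rx, y1 * ry]


# A box found on the 416x416 network input, mapped back to an 832x624 image.
print(scale_box([104, 104, 208, 208], (416, 416), (832, 624)))
# [208.0, 156.0, 416.0, 312.0]
```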

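II. Training process:

1. Preparing the training samples:

1.1 Reading the sample information

The dataset here is VOC2007. The .jpg image files provide the image data, and parsing the .xml files provides the position and class of every annotated box.

First we build a list containing the paths of all the .xml files: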

# Sample paths (assumes: import os, glob)
ann_dir = os.path.join("../../VOC2007/VOCdevkit/VOC2007/Annotations/", "*.xml")
img_dir = "../../VOC2007/VOCdevkit/VOC2007/JPEGImages"
train_ann_names = glob.glob(ann_dir)  # all .xml files under that path

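Each .xml file contains the file name of its .jpg image; combined with the image folder path, this gives the path of the image that the .xml file describes. One image contains multiple annotated boxes (boxes), and each box carries position and class information.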

image_name, boxes, coded_labels = parse_annotation(self.ann_names[self._index], self.img_dir, self.lable_names)
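The body of parse_annotation is not shown in this post. A minimal sketch of what such a parser might do, using the standard VOC annotation layout (filename, object/name, object/bndbox), is given below. The function name parse_voc_xml, its signature, and the sample XML are assumptions for illustration; the real helper takes a file path and an image directory.

```python
import xml.etree.ElementTree as ET


def parse_voc_xml(xml_text, label_names):
    """Return (filename, boxes, label_ids) from the text of one VOC annotation file."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    boxes, labels = [], []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        if name not in label_names:
            continue  # skip classes we are not training on
        bndbox = obj.find("bndbox")
        boxes.append([int(float(bndbox.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")])
        labels.append(label_names.index(name))
    return filename, boxes, labels


# Made-up annotation in the VOC layout, for illustration only.
sample = """<annotation>
  <filename>000001.jpg</filename>
  <object><name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object><name>person</name>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
  </object>
</annotation>"""

print(parse_voc_xml(sample, ["person", "dog"]))
```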

The image and its annotated boxes then go through image augmentation and resizing, which yields the image and box information at the new size.

def imread(self, img_file, boxes):
    image = cv2.imread(img_file)
    boxes_ = np.copy(boxes)
    if self._jitter:  # whether to apply data augmentation
        image, boxes_ = make_jitter_on_image(image, boxes_)
    image, boxes_ = resize_image(image, boxes_, self._w, self._h)
    return image, boxes_

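At this point, reading and preprocessing one image is complete, and we have three pieces of information:

1) one image

2) the positions of its annotated boxes

3) the classes of the objects in those boxes

1.2 Building the labels

The next task is to convert these three pieces of information into the training data that YOLOv3 needs, chiefly the label format.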

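The YOLOv3 network outputs information at three scales. Taking a 416x416 image as an example, feeding it through the network yields three outputs of sizes [batch_size, 13x13x3, 5+20], [batch_size, 26x26x3, 5+20], and [batch_size, 52x52x3, 5+20]. We therefore need three label tensors whose sizes match the predictions. Here "size" means the product of all dimensions, because reshape can be used to make the label and prediction shapes agree when the loss is computed.

1) The labels are three matrices, one per scale. Each scale has three anchor boxes of different sizes, nine in total; they are the cluster centers obtained by clustering the sizes of all annotated boxes in the dataset.

2) The height and width of each matrix match the three output scales of the model: 13x13, 26x26, and 52x52.

3) Each point of a matrix can be viewed as a grid cell.

4) Each cell holds the information of the three anchor boxes of that scale.

5) Each anchor entry holds the center coordinates, the height and width (stored as the scaling ratio of the annotated box relative to the anchor), the confidence that an object is present, and the one-hot encoding of the class.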

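The procedure in detail:

1) Construct three matrices as containers for the labels, with the sizes given above, and fill them with 0 as the initial value. For the 13x13 scale, shape = [13, 13, 3, 5+20].

2) For each annotated box, use the object's height and width to find the closest anchor box. The index of that anchor tells us which matrix it belongs to (each matrix corresponds to three anchors) and which anchor slot it occupies within that matrix.

3) Compute the object's center position on that matrix and its scaling ratio relative to the anchor.

4) Using these indices, locate the corresponding position in the corresponding matrix and write the box information there.

Final result:

Suppose one image has 4 annotated boxes. In the resulting labels, along the anchor dimension of the three matrices (9 anchor slots in total), only 4 entries are filled with annotated-box information; all other entries keep the initial value 0.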
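The four steps can be sketched for a single box as follows. Several details here are assumptions rather than facts from the post: "closest anchor" is taken to mean highest IoU between (w, h) pairs, the 13x13 matrix is paired with the three largest anchors (consistent with how loss_fn slices the anchors further below), the center is stored in grid units, and the size is stored as log(size / anchor) so that anchor * exp(t) recovers it. The function names are illustrative.

```python
import math

ANCHORS = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
           (59, 119), (116, 90), (156, 198), (373, 326)]
GRIDS = [13, 26, 52]  # matrix 0 uses anchors 6-8, matrix 1 uses 3-5, matrix 2 uses 0-2


def wh_iou(wh_a, wh_b):
    """IoU of two (w, h) pairs that share the same center."""
    inter = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    return inter / (wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - inter)


def encode_box(box, class_id, num_classes, img_size=416):
    """Return (matrix, row, col, slot, target) for one box given as (cx, cy, w, h) in pixels."""
    cx, cy, w, h = box
    best = max(range(len(ANCHORS)), key=lambda i: wh_iou((w, h), ANCHORS[i]))  # step 2
    matrix = 2 - best // 3  # which of the three label matrices
    slot = best % 3         # which anchor slot inside that matrix
    grid = GRIDS[matrix]
    gx = cx / img_size * grid  # step 3: center in grid units
    gy = cy / img_size * grid
    tw = math.log(w / ANCHORS[best][0])  # scaling ratio relative to the anchor
    th = math.log(h / ANCHORS[best][1])
    one_hot = [1.0 if k == class_id else 0.0 for k in range(num_classes)]
    target = [gx, gy, tw, th, 1.0] + one_hot
    return matrix, int(gy), int(gx), slot, target  # step 4: where to write the target


matrix, row, col, slot, target = encode_box((208, 208, 116, 90), class_id=2, num_classes=20)
print(matrix, row, col, slot, target[:5])  # 0 6 6 0 [6.5, 6.5, 0.0, 0.0, 1.0]
```

A full label builder would allocate the zero-filled matrices from step 1 and write target at [row, col, slot] of the selected matrix for every annotated box.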

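1.3 Building the YOLOv3 model

The model was described in the prediction section and produces predictions at three scales. Unlike in prediction, there is no need to concatenate the results of the three scales or to run NMS on them.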

1.4 Computing the loss function:

import tensorflow as tf

def _create_mesh_xy(batch_size, grid_h, grid_w, n_box):  # build a grid of cell indices
    mesh_x = tf.cast(tf.reshape(tf.tile(tf.range(grid_w), [grid_h]), (1, grid_h, grid_w, 1, 1)), tf.float32)
    mesh_y = tf.transpose(mesh_x, (0, 2, 1, 3, 4))
    mesh_xy = tf.tile(tf.concat([mesh_x, mesh_y], -1), [batch_size, 1, 1, n_box, 1])
    return mesh_xy

def adjust_pred_tensor(y_pred):  # add the grid offsets to the coordinates, apply sigmoid to the confidence, and reassemble
    grid_offset = _create_mesh_xy(*y_pred.shape[:4])
    pred_xy = grid_offset + tf.sigmoid(y_pred[..., :2])  # position on this scale's grid: sigma(t_xy) + c_xy
    pred_wh = y_pred[..., 2:4]  # raw predicted size t_wh
    pred_conf = tf.sigmoid(y_pred[..., 4])  # apply sigmoid to the objectness confidence
    pred_classes = y_pred[..., 5:]  # class logits
    # Reassemble
    preds = tf.concat([pred_xy, pred_wh, tf.expand_dims(pred_conf, axis=-1), pred_classes], axis=-1)
    return preds

# Build a grid in which every cell holds the 3 anchor boxes of this scale
def _create_mesh_anchor(anchors, batch_size, grid_h, grid_w, n_box):
    mesh_anchor = tf.tile(anchors, [batch_size * grid_h * grid_w])
    mesh_anchor = tf.reshape(mesh_anchor, [batch_size, grid_h, grid_w, n_box, 2])  # each anchor has 2 values
    mesh_anchor = tf.cast(mesh_anchor, tf.float32)
    return mesh_anchor

def conf_delta_tensor(y_true, y_pred, anchors, ignore_thresh):
    pred_box_xy, pred_box_wh, pred_box_conf = y_pred[..., :2], y_pred[..., 2:4], y_pred[..., 4]
    # Grid with the anchor boxes in every cell
    anchor_grid = _create_mesh_anchor(anchors, *y_pred.shape[:4])  # y_pred.shape is e.g. (2, 13, 13, 3, 15)
    true_wh = y_true[:, :, :, :, 2:4]
    true_wh = anchor_grid * tf.exp(true_wh)
    true_wh = true_wh * tf.expand_dims(y_true[:, :, :, :, 4], 4)  # recover the true width and height; zero where there is no object
    anchors_ = tf.constant(anchors, dtype='float', shape=[1, 1, 1, y_pred.shape[3], 2])  # y_pred.shape[3] is the number of anchors
    true_xy = y_true[..., 0:2]  # true center
    true_wh_half = true_wh / 2.
    true_mins = true_xy - true_wh_half  # top-left corner
    true_maxes = true_xy + true_wh_half  # bottom-right corner
    pred_xy = pred_box_xy
    pred_wh = tf.exp(pred_box_wh) * anchors_
    pred_wh_half = pred_wh / 2.
    pred_mins = pred_xy - pred_wh_half  # top-left corner
    pred_maxes = pred_xy + pred_wh_half  # bottom-right corner
    intersect_mins = tf.maximum(pred_mins, true_mins)
    intersect_maxes = tf.minimum(pred_maxes, true_maxes)
    # Intersection area
    intersect_wh = tf.maximum(intersect_maxes - intersect_mins, 0.)
    intersect_areas = intersect_wh[..., 0] * intersect_wh[..., 1]
    true_areas = true_wh[..., 0] * true_wh[..., 1]
    pred_areas = pred_wh[..., 0] * pred_wh[..., 1]
    # Union area
    union_areas = pred_areas + true_areas - intersect_areas
    best_ious = tf.truediv(intersect_areas, union_areas)  # IoU
    # Predictions whose IoU is below the threshold contribute to the negative (no-object) loss
    conf_delta = pred_box_conf * tf.cast(best_ious < ignore_thresh, tf.float32)
    return conf_delta

def wh_scale_tensor(true_box_wh, anchors, image_size):
    image_size_ = tf.reshape(tf.cast(image_size, tf.float32), [1, 1, 1, 1, 2])
    anchors_ = tf.constant(anchors, dtype='float', shape=[1, 1, 1, 3, 2])
    # Width and height as a fraction of the image size
    wh_scale = tf.exp(true_box_wh) * anchors_ / image_size_
    # 2 minus the fraction of the image area covered by the object, so small objects get a larger weight
    wh_scale = tf.expand_dims(2 - wh_scale[..., 0] * wh_scale[..., 1], axis=4)
    return wh_scale
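The effect of this weighting is easy to see with numbers. The plain-Python sketch below computes the same quantity, 2 minus the fraction of the image area covered by the box, for one box at a time; the function name and the sample values are made up for illustration.

```python
import math


def wh_scale(t_wh, anchor, image_size):
    """Weight 2 - (w * h) / (W * H) for a box encoded as t = log(size / anchor)."""
    w = math.exp(t_wh[0]) * anchor[0]
    h = math.exp(t_wh[1]) * anchor[1]
    return 2 - (w * h) / (image_size[0] * image_size[1])


# A box covering a quarter of a 416x416 image gets weight 1.75.
print(wh_scale((0.0, 0.0), (208, 208), (416, 416)))  # 1.75
# A tiny box gets a weight close to 2.
print(wh_scale((0.0, 0.0), (10, 13), (416, 416)))
```

The weight therefore ranges from about 2 for very small objects down to 1 for an object that fills the whole image, which boosts the coordinate loss of small boxes.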

# The coordinate loss is the box difference multiplied by the scale factors, then squared and summed
def loss_coord_tensor(object_mask, pred_box, true_box, wh_scale, xywh_scale):
    xy_delta = object_mask * (pred_box - true_box) * wh_scale * xywh_scale
    loss_xy = tf.reduce_sum(tf.square(xy_delta), list(range(1, 5)))  # reduce over axes 1-4 (grid h, grid w, anchor, xywh)
    return loss_xy

def loss_conf_tensor(object_mask, pred_box_conf, true_box_conf, obj_scale, noobj_scale, conf_delta):
    object_mask_ = tf.squeeze(object_mask, axis=-1)
    conf_delta = object_mask_ * (pred_box_conf - true_box_conf) * obj_scale + (1 - object_mask_) * conf_delta * noobj_scale
    loss_conf = tf.reduce_sum(tf.square(conf_delta), list(range(1, 4)))  # reduce over axes 1-3 (grid h, grid w, anchor); axis 0 is the batch
    return loss_conf

def loss_class_tensor(object_mask, pred_box_class, true_box_class, class_scale):
    true_box_class_ = tf.cast(true_box_class, tf.int64)
    class_delta = object_mask * \
        tf.expand_dims(tf.nn.softmax_cross_entropy_with_logits_v2(labels=true_box_class_, logits=pred_box_class), 4) * \
        class_scale
    loss_class = tf.reduce_sum(class_delta, list(range(1, 5)))
    return loss_class

ignore_thresh = 0.5
grid_scale = 1
obj_scale = 5
noobj_scale = 1
xywh_scale = 1
class_scale = 1

def lossCalculator(y_true, y_pred, anchors, image_size):  # image_size is [h, w]
    y_pred = tf.reshape(y_pred, y_true.shape)  # e.g. (2, 13, 13, 3, 15)
    object_mask = tf.expand_dims(y_true[..., 4], 4)  # e.g. (2, 13, 13, 3, 1)
    preds = adjust_pred_tensor(y_pred)  # transform the box and confidence values and reassemble
    conf_delta = conf_delta_tensor(y_true, preds, anchors, ignore_thresh)
    wh_scale = wh_scale_tensor(y_true[..., 2:4], anchors, image_size)
    loss_box = loss_coord_tensor(object_mask, preds[..., :4], y_true[..., :4], wh_scale, xywh_scale)
    loss_conf = loss_conf_tensor(object_mask, preds[..., 4], y_true[..., 4], obj_scale, noobj_scale, conf_delta)
    loss_class = loss_class_tensor(object_mask, preds[..., 5:], y_true[..., 5:], class_scale)
    loss = loss_box + loss_conf + loss_class
    return loss * grid_scale

def loss_fn(list_y_trues, list_y_preds, anchors, image_size):
    # anchors is a flat list of 18 values; each scale takes 3 anchors (6 values), largest first
    inputanchors = [anchors[12:], anchors[6:12], anchors[:6]]
    losses = [lossCalculator(list_y_trues[i], list_y_preds[i], inputanchors[i], image_size) for i in range(len(list_y_trues))]
    return tf.sqrt(tf.reduce_sum(losses))  # sum the losses of the three scales, then take the square root
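Reading the functions above together, the quantity they compute can be written out as follows. This is a transcription of the code as shown, not the formulation from the YOLOv3 paper, and the notation is introduced here for illustration: $m$ is object_mask, $\hat{b}$ and $b$ are the four box values of the adjusted prediction and of y_true, $s$ is wh_scale, $\hat{c}$ and $c$ are the predicted and true confidence, $\hat{p}$ and $p$ are the class logits and the one-hot class label, $\tau$ is ignore_thresh, and the $\lambda$ terms are the scale constants. The sums run over grid rows $i$, grid columns $j$, and anchors $a$.

```latex
\begin{aligned}
L_{\text{box}} &= \sum_{i,j,a}\sum_{k=1}^{4}
  \bigl(m_{ija}\,(\hat{b}_{ijak}-b_{ijak})\,s_{ija}\,\lambda_{\text{xywh}}\bigr)^{2},
  \qquad s_{ija} = 2-\frac{w_{ija}\,h_{ija}}{W H},\\
L_{\text{conf}} &= \sum_{i,j,a}
  \Bigl(m_{ija}\,(\hat{c}_{ija}-c_{ija})\,\lambda_{\text{obj}}
  +(1-m_{ija})\,\hat{c}_{ija}\,\mathbb{1}\!\left[\mathrm{IoU}_{ija}<\tau\right]\lambda_{\text{noobj}}\Bigr)^{2},\\
L_{\text{class}} &= \sum_{i,j,a} m_{ija}\,
  \mathrm{CE}\bigl(\mathrm{softmax}(\hat{p}_{ija}),\,p_{ija}\bigr)\,\lambda_{\text{class}},\\
L_{\text{scale}} &= \lambda_{\text{grid}}\,\bigl(L_{\text{box}}+L_{\text{conf}}+L_{\text{class}}\bigr),
  \qquad
L = \sqrt{\sum_{\text{scales}}\;\sum_{\text{batch}} L_{\text{scale}}}.
\end{aligned}
```

With the constants defined above, $\lambda_{\text{obj}} = 5$, $\tau = 0.5$, and every other $\lambda$ equals 1.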
