Object Detection in Deep Learning (Code Snippets)

黑桃不带K  2023-01-04

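An algorithm's name usually says a lot about its design: object localization and detection is exactly localization plus detection. In deep learning, the most commonly used approaches are the RCNN family and the YOLO family. In the RCNN family, localization and detection are separate stages: candidate objects are located first, and each located region is then classified. This design saves computation, because the classifier only runs on the candidate regions, but it is poorly suited to real-time localization and detection. The YOLO family takes the opposite approach: because it predicts class information and box coordinates jointly from the image in a single pass, it can be used in real-time scenarios.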

1. Introduction to the RCNN family of algorithms

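The RCNN family refers to RCNN, Fast-RCNN, Faster-RCNN, and other algorithms derived from RCNN. These algorithms typically localize and detect objects in two steps: localization, then detection. Several localization methods are used with RCNN-style algorithms, chiefly sliding windows and selective search. The feature classification network is usually a ResNet or VGG model; GoogLeNet or other Inception models can also be tried to improve classification accuracy in complex scenes. RCNN-family models are also known as two-stage models.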
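To make the sliding-window idea concrete, here is a minimal sketch that enumerates fixed-size candidate windows over an image. The function name `sliding_windows` and its parameters are illustrative assumptions, not taken from any RCNN codebase; real systems also vary the window size and aspect ratio.

```python
def sliding_windows(img_w, img_h, win_w, win_h, step):
    """Return (x, y, w, h) windows that fit fully inside the image.

    Hypothetical helper for illustration only.
    """
    boxes = []
    for y in range(0, img_h - win_h + 1, step):
        for x in range(0, img_w - win_w + 1, step):
            boxes.append((x, y, win_w, win_h))
    return boxes


# An 8x6 image scanned with a 4x4 window and a step of 2 pixels
boxes = sliding_windows(img_w=8, img_h=6, win_w=4, win_h=4, step=2)
print(len(boxes))  # 6
```

The number of windows grows quickly as the step shrinks or more window sizes are added, which is why this approach is expensive when every window must be classified.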

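Fast-RCNN paper link
The rough implementation steps are:
a. Input the image
b. Search for candidate object regions
c. Extract features
d. Classify the image regions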


# Basic Python implementation of candidate regions based on selective search
import cv2  # cv2.ximgproc requires the opencv-contrib-python package

if __name__ == '__main__':
    # enable OpenCV's optimized code paths; this script runs on a single thread
    cv2.setUseOptimized(True)
    cv2.setNumThreads(1)

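    # read image (cv2.imread returns None if the file cannot be read)
    im = cv2.imread('6.tif')
    if im is None:
        raise FileNotFoundError("could not read '6.tif'")
    ResizeValue = 2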

    # resize image
    newHeight = int((im.shape[0])/ResizeValue)
    newWidth = int((im.shape[1])/ResizeValue)
    #newWidth = int(im.shape[1] * 200 / im.shape[0])
    #print(int(im.shape[1]))
    im = cv2.resize(im, (newWidth, newHeight))

    # create Selective Search Segmentation Object using default parameters
    ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()

    # set input image on which we will run segmentation
    ss.setBaseImage(im)
    ss.switchToSelectiveSearchFast()
    rects = ss.process()
    print('Total Number of Region Proposals: {}'.format(len(rects)))

    # number of region proposals to show
    numShowRects = len(rects)
    # increment to increase/decrease total number
    # of region proposals to be shown
    increment = 50

    # (x, y, w, h) tuples of proposals that pass the size filter
    rec_value = []

    while True:
        # create a copy of original image
        imOut = im.copy()

        # iterate over all the region proposals
        for i, rect in enumerate(rects):
            # draw rectangle for region proposal till numShowRects
            if (i < numShowRects):
                x, y, w, h = rect
                # keep only proposals whose size matches the expected target
                if 400>w>350 and 600>h>500:
                    box = (int(x), int(y), int(w), int(h))
                    # the loop reruns on every redraw, so skip boxes already recorded
                    if box not in rec_value:
                        rec_value.append(box)
                    cv2.rectangle(imOut, (x, y), (x + w, y + h), (0, 255, 0), 1, cv2.LINE_AA)
            else:
                break
        # show output
        cv2.imshow("Output", imOut)
        # record key press
        k = cv2.waitKey(0) & 0xFF
        # m is pressed
        if k == 109:
            # increase total number of rectangles to show by increment
            numShowRects += increment
        # l is pressed
        elif k == 108 and numShowRects > increment:
            # decrease total number of rectangles to show by increment
            numShowRects -= increment
        # q is pressed
        elif k == 113:
            break
    # close image show window
    cv2.destroyAllWindows()
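
The size filter inside the display loop above can also be pulled out into a small standalone function, which makes it easier to test without opening a window. This is only a sketch: `filter_by_size` and its range arguments are hypothetical names, and the thresholds simply reuse the ones hard-coded in the script.

```python
def filter_by_size(rects, w_range, h_range):
    """Keep unique (x, y, w, h) boxes whose width and height lie strictly
    inside the given (low, high) ranges. Hypothetical helper for illustration."""
    seen, kept = set(), []
    for x, y, w, h in rects:
        box = (int(x), int(y), int(w), int(h))
        in_range = w_range[0] < w < w_range[1] and h_range[0] < h < h_range[1]
        if in_range and box not in seen:
            seen.add(box)
            kept.append(box)
    return kept


# Made-up proposals: one duplicate and one box that is too small
rects = [(0, 0, 360, 520), (5, 5, 100, 100), (0, 0, 360, 520), (10, 20, 399, 599)]
kept = filter_by_size(rects, w_range=(350, 400), h_range=(500, 600))
print(kept)  # [(0, 0, 360, 520), (10, 20, 399, 599)]
```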

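Summary:
RCNN-family algorithms are essentially the same; each successive version optimizes one of the two stages. The first thing to optimize is candidate-region generation: neither sliding windows nor selective search is efficient. To address this, Faster-RCNN introduced a Region Proposal Network (RPN) that produces candidate boxes by bbox regression, which turned the two-stage approach into a detector based entirely on deep learning and improved its efficiency. The second is the feature extraction model. Feature extraction models are generally of three kinds: deep models, wide models, and hybrid models. Deep models keep stacking more layers, typified by the ResNet family; wide models extract image features with different convolution windows, typified here by the VGG family; hybrid models attend to both depth and width of feature extraction, such as GoogLeNet and the other Inception models.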
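
For reference, bbox regression is usually expressed as offsets relative to a reference box rather than as raw coordinates. The sketch below uses the parameterization described in the R-CNN paper (center offsets scaled by the reference size, log-scale size ratios); the function names `encode` and `decode` are illustrative assumptions, and boxes are assumed to be given as (cx, cy, w, h).

```python
import math


def encode(ref, gt):
    """Regression targets (tx, ty, tw, th) that move reference box `ref` onto `gt`.
    Both boxes are (cx, cy, w, h). Hypothetical helper for illustration."""
    px, py, pw, ph = ref
    gx, gy, gw, gh = gt
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))


def decode(ref, deltas):
    """Inverse of encode: apply predicted deltas to the reference box."""
    px, py, pw, ph = ref
    tx, ty, tw, th = deltas
    return (px + tx * pw, py + ty * ph, pw * math.exp(tw), ph * math.exp(th))


ref = (50.0, 50.0, 20.0, 40.0)   # made-up proposal
gt = (55.0, 60.0, 40.0, 20.0)    # made-up ground-truth box
deltas = encode(ref, gt)
restored = decode(ref, deltas)   # equals gt up to floating-point rounding
```

Normalizing by the reference size keeps the targets in a similar numeric range for small and large boxes, which makes them easier for a network to regress.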

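2. Introduction to the YOLO family of algorithms

YoloV1 paper link

The YOLO family is a typical one-stage approach. Its design likewise covers both detecting object regions and classifying features, but here region detection is done by dividing the image into grid regions and, for each region, predicting the class and the box location together.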
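
As a small illustration of the grid idea, the sketch below maps an object's center (in pixels) to the grid cell responsible for it and to the offset inside that cell. The name `assign_to_cell` is a hypothetical helper, and the stride of 32 is an assumption chosen to match the `self.stride = 32` used in the code that follows.

```python
def assign_to_cell(cx, cy, stride):
    """Map a box center in pixels to (col, row) of its grid cell and the
    fractional offset of the center inside that cell.
    Hypothetical helper for illustration."""
    col, row = int(cx // stride), int(cy // stride)
    off_x = cx / stride - col
    off_y = cy / stride - row
    return col, row, off_x, off_y


# A made-up object centered at pixel (200, 120) on a stride-32 grid
cell = assign_to_cell(cx=200.0, cy=120.0, stride=32)
print(cell)  # (6, 3, 0.25, 0.75)
```

During training, only the cell that contains the center is asked to predict that object, and the offsets (which lie in [0, 1)) are what the sigmoid outputs are trained to match.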

Code implementation:

import torch
import torch.nn as nn
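# utils, backbone and tools are modules local to the author's project;
# they are not part of PyTorch and are not shown in this snippet
from utils import SPP, SAM, BottleneckCSP, Conv
from backbone import resnet18
import numpy as np
import tools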

class myYOLO(nn.Module):
    # input_size is (height, width); it is required despite the None default
    def __init__(self, device, input_size=None, num_classes=20, trainable=False, conf_thresh=0.01, nms_thresh=0.5, hr=False):
        super(myYOLO, self).__init__()
        self.device = device
        self.num_classes = num_classes
        self.trainable = trainable
        self.conf_thresh = conf_thresh
        self.nms_thresh = nms_thresh
        self.stride = 32
        self.grid_cell = self.create_grid(input_size)
        self.input_size = input_size
        self.scale = np.array([[[input_size[1], input_size[0], input_size[1], input_size[0]]]])
        self.scale_torch = torch.tensor(self.scale.copy(), device=device).float()

        # we use resnet18 as backbone
        self.backbone = resnet18(pretrained=True)

        # neck
        self.SPP = nn.Sequential(
            Conv(512, 256, k=1),
            SPP(),
            BottleneckCSP(256*4, 512, n=1, shortcut=False)
        )
        self.SAM = SAM(512)
        self.conv_set = BottleneckCSP(512, 512, n=3, shortcut=False)

        self.pred = nn.Conv2d(512, 1 + self.num_classes + 4, 1)
    
    def create_grid(self, input_size):
        w, h = input_size[1], input_size[0]
        # generate grid cells
        ws, hs = w // self.stride, h // self.stride
        # indexing='ij' (PyTorch >= 1.10) keeps the original row/column order explicitly
        grid_y, grid_x = torch.meshgrid([torch.arange(hs), torch.arange(ws)], indexing='ij')
        grid_xy = torch.stack([grid_x, grid_y], dim=-1).float()
        grid_xy = grid_xy.view(1, hs*ws, 2).to(self.device)
        return grid_xy

    def set_grid(self, input_size):
        self.input_size = input_size
        self.grid_cell = self.create_grid(input_size)
        self.scale = np.array([[[input_size[1], input_size[0], input_size[1], input_size[0]]]])
        self.scale_torch = torch.tensor(self.scale.copy(), device=self.device).float()

    def decode_boxes(self, pred):
        """
        input box :  [tx, ty, tw, th]
        output box : [xmin, ymin, xmax, ymax]
        """
        output = torch.zeros_like(pred)
        pred[:, :, :2] = torch.sigmoid(pred[:, :, :2]) + self.grid_cell
        pred[:, :, 2:] = torch.exp(pred[:, :, 2:])

        # [c_x, c_y, w, h] -> [xmin, ymin, xmax, ymax]
        output[:, :, 0] = pred[:, :, 0] * self.stride - pred[:, :, 2] / 2
        output[:, :, 1] = pred[:, :, 1] * self.stride - pred[:, :, 3] / 2
        output[:, :, 2] = pred[:, :, 0] * self.stride + pred[:, :, 2] / 2
        output[:, :, 3] = pred[:, :, 1] * self.stride + pred[:, :, 3] / 2
        
        return output

    def nms(self, dets, scores):
        """NumPy NMS baseline; dets is (N, 4) and scores is (N,), both NumPy arrays."""
        x1 = dets[:, 0]  #xmin
        y1 = dets[:, 1]  #ymin
        x2 = dets[:, 2]  #xmax
        y2 = dets[:, 3]  #ymax

        areas = (x2 - x1) * (y2 - y1)                 # the size of bbox
        order = scores.argsort()[::-1]                        # sort bounding boxes by decreasing order

        keep = []                                             # store the final bounding boxes
        while order.size > 0:
            i = order[0]                                      #the index of the bbox with highest confidence
            keep.append(i)                                    #save it to keep
            xx1 = np.maximum(x1[i], x1[order[1:]])
            yy1 = np.maximum(y1[i], y1[order[1:]])
            xx2 = np.minimum(x2[i], x2[order[1:]])
            yy2 = np.minimum(y2[i], y2[order[1:]])

            w = np.maximum(1e-28, xx2 - xx1)
            h = np.maximum(1e-28, yy2 - yy1)
            inter = w * h

            # Cross Area / (bbox + particular area - Cross Area)
            ovr = inter / (areas[i] + areas[order[1:]] - inter)
            #reserve all the boundingbox whose ovr less than thresh
            inds = np.where(ovr <= self.nms_thresh)[0]
            order = order[inds + 1]

        return keep

    def postprocess(self, all_local, all_conf, exchange=True, im_shape=None):
        """
        bbox_pred: (HxW, 4), bsize = 1
        prob_pred: (HxW, num_classes), bsize = 1
        """
        bbox_pred = all_local
        prob_pred = all_conf

        cls_inds = np.argmax(prob_pred, axis=1)
        prob_pred = prob_pred[(np.arange(prob_pred.shape[0]), cls_inds)]
        scores = prob_pred.copy()
        
        # threshold
        keep = np.where(scores >= self.conf_thresh)
        bbox_pred = bbox_pred[keep]
        scores = scores[keep]
        cls_inds = cls_inds[keep]

        # NMS
        keep = np.zeros(len(bbox_pred), dtype=int)  # np.int was removed in NumPy 1.24
        for i in range(self.num_classes):
            inds = np.where(cls_inds == i)[0]
            if len(inds) == 0:
                continue
            c_bboxes = bbox_pred[inds]
            c_scores = scores[inds]
            c_keep = self.nms(c_bboxes, c_scores)
            keep[inds[c_keep]] = 1

        keep = np.where(keep > 0)
        bbox_pred = bbox_pred[keep]
        scores = scores[keep]
        cls_inds = cls_inds[keep]

        if im_shape is not None:
            # clip (clip_boxes is not defined in this snippet and must be provided)
            bbox_pred = self.clip_boxes(bbox_pred, im_shape)

        return bbox_pred, scores, cls_inds

    def forward(self, x, target=None):
        # backbone
        _, _, C_5 = self.backbone(x)

        # head
        C_5 = self.SPP(C_5)
        C_5 = self.SAM(C_5)
        C_5 = self.conv_set(C_5)

        # pred
        prediction = self.pred(C_5)
        prediction = prediction.view(C_5.size(0), 1 + self.num_classes + 4, -1).permute(0, 2, 1)
        B, HW, C = prediction.size()

        # Divide prediction to obj_pred, txtytwth_pred and cls_pred   
        # [B, H*W, 1]
        conf_pred = prediction[:, :, :1]
        # [B, H*W, num_cls]
        cls_pred = prediction[:, :, 1 : 1 + self.num_classes]
        # [B, H*W, 4]
        txtytwth_pred = prediction[:, :, 1 + self.num_classes:]

        # test
        if not self.trainable:
            with torch.no_grad():
                # batch size = 1
                all_conf = torch.sigmoid(conf_pred)[0]           # index 0 because the batch holds a single image
                all_bbox = torch.clamp((self.decode_boxes(txtytwth_pred) / self.scale_torch)[0], 0., 1.)
                all_class = (torch.softmax(cls_pred[0, :, :], 1) * all_conf)
                
                # separate box pred and class conf
                all_conf = all_conf.to('cpu').numpy()
                all_class = all_class.to('cpu').numpy()
                all_bbox = all_bbox.to('cpu').numpy()
                
                bboxes, scores, cls_inds = self.postprocess(all_bbox, all_class)

                return bboxes, scores, cls_inds
        else:
            conf_loss, cls_loss, txtytwth_loss, total_loss = tools.loss(pred_conf=conf_pred, pred_cls=cls_pred,
                                                                        pred_txtytwth=txtytwth_pred,
                                                                        label=target)

            return conf_loss, cls_loss, txtytwth_loss, total_loss
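
To see what the `nms` method above does on concrete numbers, here is a dependency-free sketch of the same greedy procedure: take the highest-scoring box, drop every remaining box whose IoU with it exceeds the threshold, and repeat. The standalone functions `iou` and `nms` below are illustrative rewrites in plain Python, not the author's code, and the boxes are made up.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression; returns indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)          # highest remaining score
        keep.append(best)
        # keep only boxes that do not overlap `best` too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= thresh]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)  # [0, 2]
```

The second box overlaps the first with an IoU of 81/119 (about 0.68), so it is suppressed at a threshold of 0.5, while the third box does not overlap at all and survives.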

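Basic implementation approach:
A. Obtain the object regions

Divide the image into an S x S grid
Predict bounding boxes and confidences
Map the class probabilities onto the objects
Obtain the final bounding boxes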
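
The last step, going from raw network outputs to a final box, can be sketched for a single grid cell as follows. It mirrors the arithmetic of `decode_boxes` in the code above (sigmoid offset plus cell index, scaled by the stride; exponentiated width and height), but `decode_cell` itself is a hypothetical plain-Python helper and the input values are made up.

```python
import math


def decode_cell(tx, ty, tw, th, grid_x, grid_y, stride=32):
    """Turn raw outputs (tx, ty, tw, th) for one grid cell into
    (xmin, ymin, xmax, ymax) in pixels. Hypothetical helper for illustration."""
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    # the sigmoid keeps the predicted center inside its own cell
    cx = (sigmoid(tx) + grid_x) * stride
    cy = (sigmoid(ty) + grid_y) * stride
    # exp keeps width and height positive
    w, h = math.exp(tw), math.exp(th)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# Raw outputs of 0 put the center in the middle of cell (6, 3)
box = decode_cell(0.0, 0.0, math.log(64), math.log(32), grid_x=6, grid_y=3)
# box is approximately (176, 96, 240, 128)
```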

B. Classify the objects

Resize each object region to a fixed size
Feed the resized image into the feature extraction network
Classify the input object

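Summary:
YOLO is a fairly mature object detection framework. Algorithms built on it are still being iterated on, and the problems they address are becoming more specific, such as candidate-region accuracy and small-object detection. In general, YoloV3 and later versions can be put to practical use in many scenarios. Problems will of course keep emerging and being addressed, and we look forward to more efficient and accurate new algorithms derived from the YOLO design.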
