SSD-Tensorflow In-Depth Walkthrough, Part 1: Loading the Model and Testing on Images

SSD-Tensorflow — GitHub repository: SSD-Tensorflow

A quick run of the object detector

After the download finishes, open the project and take a look at the file layout.

First, go into the checkpoints folder and unzip ssd_300_vgg.ckpt.zip into the checkpoints directory. Note: extract directly into the checkpoints folder; do not create a subfolder.

Then open the ssd_tests.ipynb file under notebooks (with Jupyter or IPython). For easier debugging, save it as a .py file, or simply create a new test.py under notebooks and copy in the following code:

# coding: utf-8
# In[1]:
import os
import math
import random
import numpy as np
import tensorflow as tf
import cv2
slim = tf.contrib.slim
# In[2]:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
# In[3]:
import sys
sys.path.append('./')
# In[4]:
from nets import ssd_vgg_300, ssd_common, np_methods
from preprocessing import ssd_vgg_preprocessing
from notebooks import visualization
# In[5]:
# TensorFlow session: grow memory when needed. TF, DO NOT USE ALL MY GPU MEMORY!!!
gpu_options = tf.GPUOptions(allow_growth = True)
config = tf.ConfigProto(log_device_placement = False, gpu_options = gpu_options)
isess = tf.InteractiveSession(config = config)
# ## SSD 300 Model
#
# The SSD 300 network takes 300x300 image inputs. In order to feed any image, the latter is resized to this input shape (i.e. `Resize.WARP_RESIZE`). Note that even though it may change the width/height ratio, the SSD model performs well on resized images (and it is the default behaviour in the original Caffe implementation).
#
# SSD anchors correspond to the default bounding boxes encoded in the network. The SSD net output provides offset on the coordinates and dimensions of these anchors.
# In[6]:
# Input placeholder.
net_shape = (300, 300)
data_format = 'NHWC'
img_input = tf.placeholder(tf.uint8, shape = (None, None, 3))
# Evaluation pre-processing: resize to SSD net shape.
image_pre, labels_pre, bboxes_pre, bbox_img = ssd_vgg_preprocessing.preprocess_for_eval(
    img_input, None, None, net_shape, data_format, resize=ssd_vgg_preprocessing.Resize.WARP_RESIZE)
image_4d = tf.expand_dims(image_pre, 0)
# Define the SSD model.
reuse = True if 'ssd_net' in locals() else None
ssd_net = ssd_vgg_300.SSDNet()
with slim.arg_scope(ssd_net.arg_scope(data_format=data_format)):
    predictions, localisations, _, _ = ssd_net.net(image_4d, is_training=False, reuse=reuse)
# Restore SSD model.
ckpt_filename = 'D:\py project\SSD-Tensorflow\checkpoints/ssd_300_vgg.ckpt'
# ckpt_filename = '../checkpoints/VGG_VOC0712_SSD_300x300_ft_iter_120000.ckpt'
isess.run(tf.global_variables_initializer())
saver = tf.train.Saver()
saver.restore(isess, ckpt_filename)
# SSD default anchor boxes.
ssd_anchors = ssd_net.anchors(net_shape)
# ## Post-processing pipeline
#
# The SSD outputs need to be post-processed to provide proper detections. Namely, we follow these common steps:
#
# * Select boxes above a classification threshold;
# * Clip boxes to the image shape;
# * Apply the Non-Maximum-Selection algorithm: fuse together boxes whose Jaccard score > threshold;
# * If necessary, resize bounding boxes to original image shape.
# In[7]:
# Main image processing routine.
def process_image(img, select_threshold=0.5, nms_threshold=.45, net_shape=(300, 300)):
    # Run SSD network.
    rimg, rpredictions, rlocalisations, rbbox_img = isess.run(
        [image_4d, predictions, localisations, bbox_img],
        feed_dict={img_input: img})
    # Get classes and bboxes from the net outputs.
    rclasses, rscores, rbboxes = np_methods.ssd_bboxes_select(
        rpredictions, rlocalisations, ssd_anchors,
        select_threshold=select_threshold, img_shape=net_shape, num_classes=21, decode=True)
    rbboxes = np_methods.bboxes_clip(rbbox_img, rbboxes)
    rclasses, rscores, rbboxes = np_methods.bboxes_sort(rclasses, rscores, rbboxes, top_k=400)
    rclasses, rscores, rbboxes = np_methods.bboxes_nms(rclasses, rscores, rbboxes, nms_threshold=nms_threshold)
    # Resize bboxes to original image shape. Note: useless for Resize.WARP!
    rbboxes = np_methods.bboxes_resize(rbbox_img, rbboxes)
    return rclasses, rscores, rbboxes
# In[21]:
# Test on some demo image and visualize output.
path = 'D:\py project\SSD-Tensorflow\demo/'
image_names = sorted(os.listdir(path))
img = mpimg.imread(path + image_names[-1])
rclasses, rscores, rbboxes = process_image(img)
# visualization.bboxes_draw_on_img(img, rclasses, rscores, rbboxes, visualization.colors_plasma)
visualization.plt_bboxes(img, rclasses, rscores, rbboxes)

Change ckpt_filename in the code to the ckpt file under checkpoints in your project directory, and set path to the directory of images you want to test. Note that the code only tests the last image in that directory; to run several images, or to build a video pipeline, you will need to modify the code yourself (a minimal sketch for the multi-image case follows).
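
For instance, a minimal sketch (not part of the original notebook) that runs the detector on every image in the directory instead of only the last one:

# Loop over all images in `path`; reuses process_image and the
# visualization helper defined in the code above.
for name in sorted(os.listdir(path)):
    img = mpimg.imread(path + name)
    rclasses, rscores, rbboxes = process_image(img)
    visualization.plt_bboxes(img, rclasses, rscores, rbboxes)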

The SSD network structure

In theory the code is now ready to run. Still, it is worth studying how it actually implements SSD-style object detection.

Open the ssd_vgg_300.py file in nets: it implements the entire VGG-based SSD detection network, following the architecture of the SSD paper.

The code builds the following chain of layers:

input-conv-conv-block1-pool-conv-conv-block2-pool-conv-conv-conv-block3-pool-conv-conv-conv-block4-pool-conv-conv-conv-block5-pool-conv-block6-dropout-conv-block7-dropout-conv-conv-block8-conv-conv-block9-conv-conv-block10-conv-conv-block11.

Every block's feature maps are stored in the end_points dict; that is the design of the basic network skeleton.
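
Abridged from nets/ssd_vgg_300.py, the bookkeeping looks like this:

# Each block's output is recorded in end_points before the next pooling step.
end_points = {}
net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3], scope='conv1')
end_points['block1'] = net
net = slim.max_pool2d(net, [2, 2], scope='pool1')
net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv2')
end_points['block2'] = net
net = slim.max_pool2d(net, [2, 2], scope='pool2')
# ... and so on through block11.

With the skeleton covered, look next at the definition of SSDParams: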

    default_params = SSDParams(
        img_shape=(300, 300),            # input size
        num_classes=21,                  # number of classes (20 VOC classes + background)
        no_annotation_label=21,
        feat_layers=['block4', 'block7', 'block8', 'block9', 'block10', 'block11'],  # layers tapped (with extra convs) for detection
        feat_shapes=[(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)],          # feature-map sizes of those layers
        anchor_size_bounds=[0.15, 0.90],
        # anchor_size_bounds=[0.20, 0.90],
        anchor_sizes=[(21., 45.),        # default-box sizes, used later when computing the anchors
                      (45., 99.),
                      (99., 153.),
                      (153., 207.),
                      (207., 261.),
                      (261., 315.)],
        # anchor_sizes=[(30., 60.),
        #               (60., 111.),
        #               (111., 162.),
        #               (162., 213.),
        #               (213., 264.),
        #               (264., 315.)],
        anchor_ratios=[[2, .5],          # aspect ratios, the paper's (2, 1/2, 3, 1/3)
                       [2, .5, 3, 1./3],
                       [2, .5, 3, 1./3],
                       [2, .5, 3, 1./3],
                       [2, .5],
                       [2, .5]],
        anchor_steps=[8, 16, 32, 64, 100, 300],  # grid steps from the Caffe anchor initialization, discussed below
        anchor_offset=0.5,               # offset used when computing anchor centers
        normalizations=[20, -1, -1, -1, -1, -1],
        prior_scaling=[0.1, 0.1, 0.2, 0.2]
        )

Here feat_layers holds the feature layers tapped for detection, six in total; their feature-map sizes correspond to the values in feat_shapes.

Next, how does SSD predict object classes and box locations? Picking up right after the network construction in ssd_vgg_300.py:

A for loop walks through feat_layers, from block4 through blocks 7, 8, 9, 10 and 11, computing class and location predictions for each layer and appending them to the corresponding lists. For the details, step into ssd_multibox_layer, shown below:

def ssd_multibox_layer(inputs,
                       num_classes,
                       sizes,
                       ratios=[1],
                       normalization=-1,
                       bn_normalization=False):
    """Construct a multibox layer, return a class and localization predictions.
    """
    net = inputs
    if normalization > 0:
        net = custom_layers.l2_normalization(net, scaling=True)  # L2-normalize the feature layer
    # Number of anchors.
    num_anchors = len(sizes) + len(ratios)  # default boxes per cell: 4, 6, 6, 6, 4, 4
    # Location.
    num_loc_pred = num_anchors * 4          # 4 offset values per anchor (decoded later into ymin, xmin, ymax, xmax)
    loc_pred = slim.conv2d(net, num_loc_pred, [3, 3], activation_fn=None,  # 3x3 conv with 4*num_anchors output channels
                           scope='conv_loc')
    loc_pred = custom_layers.channel_to_last(loc_pred)  # ensure NHWC data format
    loc_pred = tf.reshape(loc_pred,
                          tensor_shape(loc_pred, 4)[:-1] + [num_anchors, 4])  # reshape to [N, H, W, num_anchors, 4]
    # Class prediction.
    num_cls_pred = num_anchors * num_classes  # class predictions, e.g. 4 * 21 = 84 channels for block4
    cls_pred = slim.conv2d(net, num_cls_pred, [3, 3], activation_fn=None,
                           scope='conv_cls')
    cls_pred = custom_layers.channel_to_last(cls_pred)
    cls_pred = tf.reshape(cls_pred,
                          tensor_shape(cls_pred, 4)[:-1] + [num_anchors, num_classes])  # [N, H, W, num_anchors, num_classes]
    return cls_pred, loc_pred

This code produces the location and class predictions for one chosen feature layer. It first computes the number of anchors per cell. For the location information it outputs a feature map with 4 × num_anchors channels (16 for block4) and reshapes it to [N, H, W, num_anchors, 4]; for the class information it outputs num_anchors × num_classes channels (84 for block4) and reshapes that to [N, H, W, num_anchors, num_classes]. Both predictions are returned.
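
To make the bookkeeping concrete, here is a small worked example (plain Python, values taken from SSDParams above) counting the default boxes per cell and over the whole image:

# Count default boxes per cell and in total for the six feature layers.
sizes  = [(21., 45.), (45., 99.), (99., 153.), (153., 207.), (207., 261.), (261., 315.)]
ratios = [[2, .5], [2, .5, 3, 1./3], [2, .5, 3, 1./3], [2, .5, 3, 1./3], [2, .5], [2, .5]]
shapes = [(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)]

total = 0
for sz, r, sh in zip(sizes, ratios, shapes):
    n = len(sz) + len(r)      # anchors per cell: 4, 6, 6, 6, 4, 4
    total += sh[0] * sh[1] * n
print(total)                  # 8732, the default-box count quoted in the SSD paper for the 300x300 model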

Anchor box initialization

The idea of the initialization is roughly this: given the anchor sizes, ratios (aspect ratios), offset and so on, compute the relative anchor scale for each feature layer. The paper derives these scales as s_k = s_min + (s_max − s_min) / (m − 1) × (k − 1), k ∈ [1, m]. With s_min = 0.2, the smallest boxes would be 300 × 0.2 = 60 pixels. The anchor sizes defined in SSDNet, however, are (21., 45.), (45., 99.), (99., 153.), (153., 207.), (207., 261.), (261., 315.): they are hand-set rather than produced by the formula.
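
For comparison, the commented-out anchor_sizes in SSDParams can be reproduced with the size derivation from the original Caffe implementation; the following is my reconstruction of that derivation (an assumption, not code from this repo):

# Caffe-style size computation: s_min = 0.2, s_max = 0.9 spread over the
# five layers after conv4_3, which gets s = 0.1 by hand.
min_ratio, max_ratio = 20, 90
step = (max_ratio - min_ratio) // (6 - 2)                   # 17
percents = [10] + [min_ratio + step * k for k in range(5)]  # [10, 20, 37, 54, 71, 88]
min_sizes = [300 * p / 100. for p in percents]              # [30, 60, 111, 162, 213, 264]
max_sizes = min_sizes[1:] + [300 * (percents[-1] + step) / 100.]
print(list(zip(min_sizes, max_sizes)))
# [(30.0, 60.0), (60.0, 111.0), (111.0, 162.0), (162.0, 213.0),
#  (213.0, 264.0), (264.0, 315.0)] -- the commented-out anchor_sizes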

In the code, take the first feat_layer (block4, 38×38) as an example: a mesh grid is generated for y and x, each a 38×38 matrix whose entries run from 0 to 37.

The grid is then normalized. The simple way is (y + offset) / feat_shape; for the first layer that is (y + 0.5) / 38, which gives the normalized center coordinates. (The code actually follows the Caffe variant using anchor_steps, as seen below.)

Next the anchor widths and heights are defined. Take the first layer, which has 4 anchors per cell. The first box is the ratio-1 box with h = w = s_k / 300 = 21/300. The second is the extra ratio-1 box with h = w = sqrt(s_k × s_{k+1}) / 300 = sqrt(21 × 45) / 300, matching the paper's s'_k = sqrt(s_k × s_{k+1}). The third and fourth use the aspect ratios a_r from anchor_ratios: h = s_k / 300 / sqrt(a_r) and w = s_k / 300 × sqrt(a_r), corresponding to the paper's height and width formulas h = s_k / sqrt(a_r), w = s_k × sqrt(a_r).
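
A quick numeric check of these formulas for block4 (size pair (21., 45.), ratios [2, .5]); all values are fractions of the 300-pixel input:

import math

s, s_next = 21., 45.
h = [s / 300., math.sqrt(s * s_next) / 300.]  # the two ratio-1 boxes
w = list(h)
for r in [2, .5]:                             # the extra aspect ratios
    h.append(s / 300. / math.sqrt(r))
    w.append(s / 300. * math.sqrt(r))
print([round(v, 3) for v in h])               # [0.07, 0.102, 0.049, 0.099]
print([round(v, 3) for v in w])               # [0.07, 0.102, 0.099, 0.049]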

The initialization is carried out by three functions, calling from top to bottom: ssd_net.anchors → ssd_anchors_all_layers → ssd_anchor_one_layer. The innermost one, which does the actual work, is reproduced here:

def ssd_anchor_one_layer(img_shape,
                         feat_shape,
                         sizes,
                         ratios,
                         step,
                         offset=0.5,
                         dtype=np.float32):
    """Computer SSD default anchor boxes for one feature layer.
    Determine the relative position grid of the centers, and the relative
    width and height.
    Arguments:
      feat_shape: Feature shape, used for computing relative position grids;
      size: Absolute reference sizes;
      ratios: Ratios to use on these features;
      img_shape: Image shape, used for computing height, width relatively to the
        former;
      offset: Grid offset.
    Return:
      y, x, h, w: Relative x and y grids, and height and width.
    """
    # Compute the position grid: simple way (normalized center coordinates).
    # y, x = np.mgrid[0:feat_shape[0], 0:feat_shape[1]]
    # y = (y.astype(dtype) + offset) / feat_shape[0]
    # x = (x.astype(dtype) + offset) / feat_shape[1]
    # Weird SSD-Caffe computation using steps values...
    y, x = np.mgrid[0:feat_shape[0], 0:feat_shape[1]]  # two [38, 38] grids: y runs 0-37 top to bottom, x runs 0-37 left to right
    y = (y.astype(dtype) + offset) * step / img_shape[0]  # e.g. first element: the simple way gives (0 + 0.5) / 38; SSD-Caffe uses (0 + 0.5) * step / img_shape
    x = (x.astype(dtype) + offset) * step / img_shape[1]
    # Expand dims to support easy broadcasting.
    y = np.expand_dims(y, axis=-1)  # [38, 38, 1]
    x = np.expand_dims(x, axis=-1)
    # Compute relative height and width.
    # Tries to follow the original implementation of SSD for the order.
    num_anchors = len(sizes) + len(ratios)
    h = np.zeros((num_anchors, ), dtype=dtype)
    w = np.zeros((num_anchors, ), dtype=dtype)
    # Add first anchor boxes with ratio=1.
    h[0] = sizes[0] / img_shape[0]
    w[0] = sizes[0] / img_shape[1]
    di = 1
    if len(sizes) > 1:
        h[1] = math.sqrt(sizes[0] * sizes[1]) / img_shape[0]
        w[1] = math.sqrt(sizes[0] * sizes[1]) / img_shape[1]
        di += 1
    for i, r in enumerate(ratios):
        h[i+di] = sizes[0] / img_shape[0] / math.sqrt(r)
        w[i+di] = sizes[0] / img_shape[1] * math.sqrt(r)
    return y, x, h, w

It finally returns the numeric values of y, x, h and w.
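
For example, calling it for block4 with the parameters from SSDParams (a usage sketch):

y, x, h, w = ssd_anchor_one_layer(img_shape=(300, 300),
                                  feat_shape=(38, 38),
                                  sizes=(21., 45.),
                                  ratios=[2, .5],
                                  step=8)
print(y.shape, x.shape)  # (38, 38, 1) (38, 38, 1): normalized center grids
print(h.shape, w.shape)  # (4,) (4,): one height/width per default box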

Selecting the bounding boxes


So far we have covered the SSDNet structure and the anchor box initialization, which together yield the raw class and location predictions. Now let's step into np_methods.ssd_bboxes_select to see how SSD selects the corresponding bboxes.

The per-layer function is shown below (ssd_bboxes_select itself just calls it for each feature layer, top to bottom, and concatenates the results):

def ssd_bboxes_select_layer(predictions_layer,
                            localizations_layer,
                            anchors_layer,
                            select_threshold=0.5,
                            img_shape=(300, 300),
                            num_classes=21,
                            decode=True):
    """Extract classes, scores and bounding boxes from features in one layer.
    Return:
      classes, scores, bboxes: Numpy arrays...
    """
    # First decode localizations features if necessary.
    if decode:
        localizations_layer = ssd_bboxes_decode(localizations_layer, anchors_layer)  # decode into ymin, xmin, ymax, xmax form
    # Reshape features to: Batches x N x N_labels | 4.
    p_shape = predictions_layer.shape  # for the first layer and a single image this is (1, 38, 38, 4, 21)
    batch_size = p_shape[0] if len(p_shape) == 5 else 1
    predictions_layer = np.reshape(predictions_layer,
                                   (batch_size, -1, p_shape[-1]))  # (1, 5776, 21) after the reshape
    l_shape = localizations_layer.shape  # (1, 38, 38, 4, 4): the box coordinate information
    localizations_layer = np.reshape(localizations_layer,
                                     (batch_size, -1, l_shape[-1]))  # (1, 5776, 4) after the reshape
    # Boxes selection: use threshold or score > no-label criteria.
    if select_threshold is None or select_threshold == 0:
        # Class prediction and scores: assign 0. to 0-class
        classes = np.argmax(predictions_layer, axis=2)
        scores = np.amax(predictions_layer, axis=2)
        mask = (classes > 0)
        classes = classes[mask]
        scores = scores[mask]
        bboxes = localizations_layer[mask]
    else:
        sub_predictions = predictions_layer[:, :, 1:]  # drop the background class
        idxes = np.where(sub_predictions > select_threshold)  # indices where a class score exceeds select_threshold
        classes = idxes[-1] + 1  # dropping the background shifted every class index down by 1, so add 1 back
        scores = sub_predictions[idxes]  # scores of all selected detections (usually far fewer than 5776)
        bboxes = localizations_layer[idxes[:-1]]  # coordinates of all selected detections
    return classes, scores, bboxes

For each feat_layer and its anchor boxes, the localizations are first decoded, giving each box's ymin, xmin, ymax, xmax. Classes and boxes whose score exceeds the given threshold are then selected, and the per-layer results are appended into one array and returned.
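
The decode step is worth spelling out. Here is a minimal sketch of the math inside np_methods.ssd_bboxes_decode (my paraphrase, assuming the prior_scaling = [0.1, 0.1, 0.2, 0.2] defined above; not the repo code verbatim):

import numpy as np

def decode_layer(loc, anchors_layer, prior_scaling=(0.1, 0.1, 0.2, 0.2)):
    """Turn predicted offsets back into [ymin, xmin, ymax, xmax] boxes."""
    yref, xref, href, wref = anchors_layer              # the y, x, h, w from ssd_anchor_one_layer
    cx = loc[..., 0] * wref * prior_scaling[0] + xref   # predicted center x
    cy = loc[..., 1] * href * prior_scaling[1] + yref   # predicted center y
    w = wref * np.exp(loc[..., 2] * prior_scaling[2])   # predicted width
    h = href * np.exp(loc[..., 3] * prior_scaling[3])   # predicted height
    return np.stack([cy - h / 2., cx - w / 2.,
                     cy + h / 2., cx + w / 2.], axis=-1)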

Clipping the bounding boxes

At this point some of the values in the rbboxes array are greater than 1, while in our normalized representation the whole image has width and height exactly 1. The boxes therefore need to be clipped so that no value exceeds 1 or falls below 0. The code:

def bboxes_clip(bbox_ref, bboxes):
    """Clip bounding boxes with respect to reference bbox.
    """
    bboxes = np.copy(bboxes)
    bboxes = np.transpose(bboxes)
    bbox_ref = np.transpose(bbox_ref)  # bbox_ref is the [0, 0, 1, 1] array
    bboxes[0] = np.maximum(bboxes[0], bbox_ref[0])
    bboxes[1] = np.maximum(bboxes[1], bbox_ref[1])
    bboxes[2] = np.minimum(bboxes[2], bbox_ref[2])
    bboxes[3] = np.minimum(bboxes[3], bbox_ref[3])
    bboxes = np.transpose(bboxes)
    return bboxes

Keeping a fixed number of boxes

Given the clipped boxes and classes, we sort them by descending score and keep only the best ones. The parameters are adjustable; here at most 400 boxes are kept. The code:

def bboxes_sort(classes, scores, bboxes, top_k=400):
    """Sort bounding boxes by decreasing order and keep only the top_k
    """
    # if priority_inside:
    #     inside = (bboxes[:, 0] > margin) & (bboxes[:, 1] > margin) & \
    #         (bboxes[:, 2] < 1-margin) & (bboxes[:, 3] < 1-margin)
    #     idxes = np.argsort(-scores)
    #     inside = inside[idxes]
    #     idxes = np.concatenate([idxes[inside], idxes[~inside]])
    idxes = np.argsort(-scores)
    classes = classes[idxes][:top_k]
    scores = scores[idxes][:top_k]
    bboxes = bboxes[idxes][:top_k]
    return classes, scores, bboxes

Non-maximum suppression (NMS)

With the fixed number of boxes in hand, one object may still be covered by several boxes of the same class. To keep the best box and drop the redundant ones, non-maximum suppression is applied, implemented as follows:

def bboxes_nms(classes, scores, bboxes, nms_threshold=0.45):
    """Apply non-maximum selection to bounding boxes.
    """
    keep_bboxes = np.ones(scores.shape, dtype=np.bool)
    for i in range(scores.size-1):
        if keep_bboxes[i]:
            # Compute overlap with bboxes which are following.
            overlap = bboxes_jaccard(bboxes[i], bboxes[(i+1):])  # IoU of the current box with all later boxes
            # Overlap threshold for keeping + checking part of the same class
            keep_overlap = np.logical_or(overlap < nms_threshold, classes[(i+1):] != classes[i])  # True wherever IoU < 0.45 or the class differs
            keep_bboxes[(i+1):] = np.logical_and(keep_bboxes[(i+1):], keep_overlap)  # fold that mask into keep_bboxes
    idxes = np.where(keep_bboxes)
    return classes[idxes], scores[idxes], bboxes[idxes]

Here bboxes_jaccard computes the IoU (Jaccard index) of two boxes.
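
For completeness, a sketch of what that helper computes, for one box against an array of boxes in the same normalized [ymin, xmin, ymax, xmax] layout (not the repo code verbatim):

import numpy as np

def bboxes_jaccard(bbox, bboxes):
    """IoU between one box and each row of an Nx4 array of boxes."""
    # Intersection rectangle.
    int_ymin = np.maximum(bboxes[:, 0], bbox[0])
    int_xmin = np.maximum(bboxes[:, 1], bbox[1])
    int_ymax = np.minimum(bboxes[:, 2], bbox[2])
    int_xmax = np.minimum(bboxes[:, 3], bbox[3])
    int_vol = np.maximum(int_ymax - int_ymin, 0.) * np.maximum(int_xmax - int_xmin, 0.)
    # Union = vol1 + vol2 - intersection.
    vol1 = (bbox[2] - bbox[0]) * (bbox[3] - bbox[1])
    vol2 = (bboxes[:, 2] - bboxes[:, 0]) * (bboxes[:, 3] - bboxes[:, 1])
    return int_vol / (vol1 + vol2 - int_vol)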

Resizing the bounding boxes

One remaining step resizes the bboxes back to the original image shape. Since the boxes are in normalized coordinates here (bbox_ref is [0, 0, 1, 1]), the step changes little, but here is a sketch of what it does (my reconstruction of the np_methods helper, not the repo code verbatim):
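
def bboxes_resize(bbox_ref, bboxes):
    """Map boxes from the bbox_ref frame back to the image frame (sketch).
    With bbox_ref = [0, 0, 1, 1], as in the WARP_RESIZE pipeline here,
    this is an identity transform."""
    bboxes = np.copy(bboxes)
    # Translate by the reference box's top-left corner.
    bboxes[:, 0] -= bbox_ref[0]
    bboxes[:, 1] -= bbox_ref[1]
    bboxes[:, 2] -= bbox_ref[0]
    bboxes[:, 3] -= bbox_ref[1]
    # Rescale by the reference box's height and width.
    h_ref = bbox_ref[2] - bbox_ref[0]
    w_ref = bbox_ref[3] - bbox_ref[1]
    bboxes[:, 0] /= h_ref
    bboxes[:, 1] /= w_ref
    bboxes[:, 2] /= h_ref
    bboxes[:, 3] /= w_ref
    return bboxes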

Visualizing the detections

After all of the above we have the final classes, scores and localizations. To visualize them, every class, score and bounding box is drawn onto the original image by the following code:

def plt_bboxes(img, classes, scores, bboxes, figsize=(10,10), linewidth=1.5):
    """Visualize bounding boxes. Largely inspired by SSD-MXNET!
    """
    fig = plt.figure(figsize=figsize)
    plt.imshow(img)
    height = img.shape[0]
    width = img.shape[1]
    colors = dict()
    for i in range(classes.shape[0]):
        cls_id = int(classes[i])
        if cls_id >= 0:
            score = scores[i]
            if cls_id not in colors:
                colors[cls_id] = (random.random(), random.random(), random.random())
            # Recover pixel coordinates from the normalized box.
            ymin = int(bboxes[i, 0] * height)
            xmin = int(bboxes[i, 1] * width)
            ymax = int(bboxes[i, 2] * height)
            xmax = int(bboxes[i, 3] * width)
            rect = plt.Rectangle((xmin, ymin), xmax - xmin,
                                 ymax - ymin, fill=False,
                                 edgecolor=colors[cls_id],
                                 linewidth=linewidth)
            plt.gca().add_patch(rect)
            class_name = str(cls_id)
            plt.gca().text(xmin, ymin - 2,
                           '{:s} | {:.3f}'.format(class_name, score),
                           bbox=dict(facecolor=colors[cls_id], alpha=0.5),
                           fontsize=12, color='white')
    plt.show()

Note that because the bounding boxes are normalized, recovering the original image scale only requires multiplying the coordinates by the image's height and width.

The final result looks like this:

That's it for this post. The next one will cover training SSD-Tensorflow on your own dataset.