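I Used YOLOv5 for Emotion Recognition!

Posted by Graviti (格物钛) on 2022-03-30

Original article: https://www.jiqizhixin.com/articles/2022-01-24-2

Author: Chen Xinda (陳信達), ShanghaiTech University, Datawhale member

AI is now part of nearly every aspect of daily life, and object detection is one of its most widely used algorithms: temperature-screening devices during the pandemic, inspection robots, and even He Tongxue's (何同學) airdesk all rely on it. The figure below shows airdesk: He uses an object detection algorithm to locate the phone, then moves a wireless charging coil beneath it so the phone charges automatically. Behind this seemingly simple application lie complex theory and continually iterated AI algorithms. In this article I show how to get started quickly with the object detection model YOLOv5 and apply it to emotion recognition.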

[Figure: He Tongxue's airdesk]

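1. Background

Today's material comes from a paper published in T-PAMI in 2019 [1]. Many researchers had already used AI algorithms to recognize human emotion, but the paper's authors argue that a person's emotion depends not only on facial expression and body movement but also on the surrounding environment. For example, the boy in the figure below appears to be surprised: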

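[Figure: the boy shown without his surroundings]

Once the surrounding scene is added, however, the emotion we just inferred no longer matches the real one:

[Figure: the same boy shown with his surroundings]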

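The paper's main idea is to combine the background image with the person information found by an object detection model to recognize emotion.

The authors describe emotion in two ways: discrete categories and continuous dimensions. Both are explained below; readers already familiar with them can skip ahead.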

Continuous dimension	Description
Valence (V)	measures how positive or pleasant an emotion is, ranging from negative to positive (how pleased the person is)
Arousal (A)	measures the agitation level of the person, ranging from non-active / in calm to agitated / ready to act (how agitated the person is)
Dominance (D)	measures the level of control a person feels of the situation, ranging from submissive / non-control to dominant / in-control (how commanding the person's presence is)

Discrete category	Description
Affection	fond feelings; love; tenderness
Anger	intense displeasure or rage; furious; resentful
Annoyance	bothered by something or someone; irritated; impatient; frustrated
Anticipation	state of looking forward; hoping on or getting prepared for possible future events
Aversion	feeling disgust, dislike, repulsion; feeling hate
Confidence	feeling of being certain; conviction that an outcome will be favorable; encouraged; proud
Disapproval	feeling that something is wrong or reprehensible; contempt; hostile
Disconnection	feeling not interested in the main event of the surrounding; indifferent; bored; distracted
Disquietment	nervous; worried; upset; anxious; tense; pressured; alarmed
Doubt/Confusion	difficulty to understand or decide; thinking about different options
Embarrassment	feeling ashamed or guilty
Engagement	paying attention to something; absorbed into something; curious; interested
Esteem	feelings of favourable opinion or judgement; respect; admiration; gratefulness
Excitement	feeling enthusiasm; stimulated; energetic
Fatigue	weariness; tiredness; sleepy
Fear	feeling suspicious or afraid of danger, threat, evil or pain; horror
Happiness	feeling delighted; feeling enjoyment or amusement
Pain	physical suffering
Peace	well being and relaxed; no worry; having positive thoughts or sensations; satisfied
Pleasure	feeling of delight in the senses
Sadness	feeling unhappy, sorrow, disappointed, or discouraged
Sensitivity	feeling of being physically or emotionally wounded; feeling delicate or vulnerable
Suffering	psychological or emotional pain; distressed; anguished
Surprise	sudden discovery of something unexpected
Sympathy	state of sharing others emotions, goals or troubles; supportive; compassionate
Yearning	strong desire to have something; jealous; envious; lust

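2. Setup and Model Inference

2.1 Quick start

Five steps are all it takes to recognize emotions!

1. Clone the project, or download it as an archive: git clone https://github.com/chenxindaaa/emotic.git

2. Put the unzipped model files in emotic/debug_exp/models. (Model files can be downloaded from: https://gas.graviti.com/dataset/datawhale/Emotic/discussion)

3. Create a virtual environment (optional):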

conda create -n emotic python=3.7
conda activate emotic

4. Install the dependencies:

python -m pip install -r requirement.txt

5. cd into the emotic folder and run:

python detect.py

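When it finishes, the results are saved under emotic/runs/detect.

2.2 How it works

At this point some readers may ask: what if I want to process other images? Are videos and cameras supported? And how should the YOLOv5 code be modified in a real application?

YOLOv5 already handles the first two questions; we only need to change line 158 of detect.py: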

parser.add_argument('--source', type=str, default='./testImages', help='source')  # file/folder, 0 for webcam

Change './testImages' to the path of the image or video you want to process; a folder path works too. To use a camera, change './testImages' to '0', which runs recognition on camera 0.
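Because `--source` is an ordinary argparse option, it can normally be overridden on the command line instead of editing the default. The sketch below is a minimal illustration that assumes only the `--source` definition shown above; the `parse_source` helper and the `./my_video.mp4` path are hypothetical.

```python
import argparse

def parse_source(argv):
    # Mirrors the --source option shown above; this helper is hypothetical.
    parser = argparse.ArgumentParser()
    parser.add_argument('--source', type=str, default='./testImages',
                        help='source')  # file/folder, 0 for webcam
    return parser.parse_args(argv).source

print(parse_source([]))                              # no flag: the default is used
print(parse_source(['--source', '0']))               # camera index, passed as a string
print(parse_source(['--source', './my_video.mp4']))  # hypothetical video path
```

With the real script this would correspond to running something like `python detect.py --source ./my_video.mp4`.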

Modifying YOLOv5:

The most important code in detect.py is the following:

for *xyxy, conf, cls in reversed(det):
    c = int(cls)  # integer class
    if c != 0:
        continue
    pred_cat, pred_cont = inference_emotic(im0, (int(xyxy[0]), int(xyxy[1]), int(xyxy[2]), int(xyxy[3])))
    if save_img or opt.save_crop or view_img:  # Add bbox to image
        label = None if opt.hide_labels else (names[c] if opt.hide_conf else f'{names[c]} {conf:.2f}')
        plot_one_box(xyxy, im0, pred_cat=pred_cat, pred_cont=pred_cont, label=label, color=colors(c, True), line_thickness=opt.line_thickness)
        if opt.save_crop:
            save_one_box(xyxy, imc, file=save_dir / 'crops' / names[c] / f'{p.stem}.jpg', BGR=True)

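Here det is YOLOv5's detection result. For example, tensor([[121.00000, 7.00000, 480.00000, 305.00000, 0.67680, 0.00000], [278.00000, 166.00000, 318.00000, 305.00000, 0.66222, 27.00000]]) means two objects were detected.

xyxy holds the coordinates of the object's bounding box. For the first object in the example above, xyxy = [121.00000, 7.00000, 480.00000, 305.00000] corresponds to the points (121, 7) and (480, 305), the top-left and bottom-right corners; two points determine a rectangle, which is the bounding box. conf is the object's confidence, 0.67680 for the first object. cls is the object's class; here 0 means "person". Since we only recognize human emotions, any detection whose cls is not 0 is skipped. I used the official YOLOv5 inference model, which covers many classes; you can also train your own model with only the person class. For the detailed procedure, see:

Once an object's coordinates are known, they are passed to the emotic model to get the corresponding emotions: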

pred_cat, pred_cont = inference_emotic(im0, (int(xyxy[0]), int(xyxy[1]), int(xyxy[2]), int(xyxy[3])))
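The unpacking and person-only filtering described above can be tried in isolation. This is a minimal sketch, not the project's code: plain Python lists stand in for the torch tensor, and `person_boxes` is a hypothetical helper name.

```python
# Rows copied from the example above: x1, y1, x2, y2, confidence, class id.
det = [
    [121.0, 7.0, 480.0, 305.0, 0.67680, 0.0],
    [278.0, 166.0, 318.0, 305.0, 0.66222, 27.0],
]

def person_boxes(det, person_class=0):
    """Return (box, conf) pairs for rows whose class id equals person_class."""
    kept = []
    for *xyxy, conf, cls in det:
        if int(cls) != person_class:
            continue  # not a person, so no emotion inference is needed
        box = tuple(int(v) for v in xyxy)  # integer pixel coordinates
        kept.append((box, conf))
    return kept

people = person_boxes(det)
print(people)  # [((121, 7, 480, 305), 0.6768)]
```

Only the first row survives, because the second row has class id 27 rather than 0.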

I changed the original image visualization slightly so that the emotic results are drawn on the image:

def plot_one_box(x, im, pred_cat, pred_cont, color=(128, 128, 128), label=None, line_thickness=3):
    # Plots one bounding box on image 'im' using OpenCV
    assert im.data.contiguous, 'Image not contiguous. Apply np.ascontiguousarray(im) to plot_on_box() input image.'
    tl = line_thickness or round(0.002 * (im.shape[0] + im.shape[1]) / 2) + 1  # line/font thickness
    c1, c2 = (int(x[0]), int(x[1])), (int(x[2]), int(x[3]))
    cv2.rectangle(im, c1, c2, color, thickness=tl, lineType=cv2.LINE_AA)
    if label:
        tf = max(tl - 1, 1)  # font thickness
        t_size = cv2.getTextSize(label, 0, fontScale=tl / 3, thickness=tf)[0]
        c2 = c1[0] + t_size[0], c1[1] - t_size[1] - 3
        cv2.rectangle(im, c1, c2, color, -1, cv2.LINE_AA)  # filled
        #cv2.putText(im, label, (c1[0], c1[1] - 2), 0, tl / 3, [225, 255, 255], thickness=tf, lineType=cv2.LINE_AA)
        for id, text in enumerate(pred_cat):
            cv2.putText(im, text, (c1[0], c1[1] + id*20), 0, tl / 3, [225, 255, 255], thickness=tf, lineType=cv2.LINE_AA)

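Result:

[Figure: detection result with the predicted emotions drawn on the image]

With the steps above complete, we can start having some fun. Trump is well known for winning over many voters with his distinctive speaking style, so let's see how his speeches look through the eyes of AI:

[Figure: emotion recognition results on Trump giving a speech]

As you can see, confidence is one of the prerequisites for being convincing.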

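3. Model Training

3.1 Data preprocessing

First, preprocess the data through Graviti (格物鈦). Before processing, find your AccessKey (Developer Tools > AccessKey > Create AccessKey):

[Figure: creating an AccessKey on the Graviti site]

We can preprocess through Graviti without downloading the dataset and save the results locally. (The code below is not part of the project; create a .py file for it yourself, and remember to fill in your AccessKey.)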

from tensorbay import GAS
from tensorbay.dataset import Dataset
import numpy as np
from PIL import Image
import cv2
from tqdm import tqdm
import os

def cat_to_one_hot(y_cat):
    cat2ind = {'Affection': 0, 'Anger': 1, 'Annoyance': 2, 'Anticipation': 3, 'Aversion': 4,
               'Confidence': 5, 'Disapproval': 6, 'Disconnection': 7, 'Disquietment': 8,
               'Doubt/Confusion': 9, 'Embarrassment': 10, 'Engagement': 11, 'Esteem': 12,
               'Excitement': 13, 'Fatigue': 14, 'Fear': 15, 'Happiness': 16, 'Pain': 17,
               'Peace': 18, 'Pleasure': 19, 'Sadness': 20, 'Sensitivity': 21, 'Suffering': 22,
               'Surprise': 23, 'Sympathy': 24, 'Yearning': 25}
    one_hot_cat = np.zeros(26)
    for em in y_cat:
        one_hot_cat[cat2ind[em]] = 1
    return one_hot_cat

gas = GAS('YOUR_ACCESS_KEY')  # fill in your own AccessKey here
dataset = Dataset("Emotic", gas)
segments = dataset.keys()
save_dir = './data/emotic_pre'
if not os.path.exists(save_dir):
    os.makedirs(save_dir)
for seg in ['test', 'val', 'train']:
    segment = dataset[seg]
    context_arr, body_arr, cat_arr, cont_arr = [], [], [], []
    for data in tqdm(segment):
        with data.open() as fp:
            context = np.asarray(Image.open(fp))
        if len(context.shape) == 2:
            context = cv2.cvtColor(context, cv2.COLOR_GRAY2RGB)
        context_cv = cv2.resize(context, (224, 224))
        for label_box2d in data.label.box2d:
            xmin = label_box2d.xmin
            ymin = label_box2d.ymin
            xmax = label_box2d.xmax
            ymax = label_box2d.ymax
            body = context[ymin:ymax, xmin:xmax]
            body_cv = cv2.resize(body, (128, 128))
            context_arr.append(context_cv)
            body_arr.append(body_cv)
            cont_arr.append(np.array([int(label_box2d.attributes['valence']), int(label_box2d.attributes['arousal']), int(label_box2d.attributes['dominance'])]))
            cat_arr.append(np.array(cat_to_one_hot(label_box2d.attributes['categories'])))
    context_arr = np.array(context_arr)
    body_arr = np.array(body_arr)
    cat_arr = np.array(cat_arr)
    cont_arr = np.array(cont_arr)
    np.save(os.path.join(save_dir, '%s_context_arr.npy' % (seg)), context_arr)
    np.save(os.path.join(save_dir, '%s_body_arr.npy' % (seg)), body_arr)
    np.save(os.path.join(save_dir, '%s_cat_arr.npy' % (seg)), cat_arr)
    np.save(os.path.join(save_dir, '%s_cont_arr.npy' % (seg)), cont_arr)

When the program finishes, a new folder emotic_pre appears; if it contains .npy files, preprocessing succeeded.
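A quick sanity check on the saved files is to load the four arrays for one split and confirm they have the same number of samples. The sketch below assumes the file naming used by the script above; `summarize_split` is a hypothetical helper, and the demo runs on tiny stand-in arrays so it works without the dataset.

```python
import os
import tempfile
import numpy as np

def summarize_split(save_dir, seg):
    """Load the four arrays saved for one split and return their shapes."""
    shapes = {}
    for name in ('context', 'body', 'cat', 'cont'):
        arr = np.load(os.path.join(save_dir, '%s_%s_arr.npy' % (seg, name)))
        shapes[name] = arr.shape
    # Each person crop should come with one context image and one label of each kind.
    counts = {shape[0] for shape in shapes.values()}
    assert len(counts) == 1, 'array lengths differ: %s' % shapes
    return shapes

# Demo on stand-in arrays (2 samples) written to a temporary folder.
with tempfile.TemporaryDirectory() as demo_dir:
    fake = {'context': np.zeros((2, 224, 224, 3), np.uint8),
            'body': np.zeros((2, 128, 128, 3), np.uint8),
            'cat': np.zeros((2, 26)),
            'cont': np.zeros((2, 3), np.int64)}
    for name, arr in fake.items():
        np.save(os.path.join(demo_dir, 'train_%s_arr.npy' % name), arr)
    demo_shapes = summarize_split(demo_dir, 'train')
    print(demo_shapes)
```

On the real output you would call `summarize_split('./data/emotic_pre', 'train')`. If every image is three-channel RGB, the expected shapes are (N, 224, 224, 3), (N, 128, 128, 3), (N, 26) and (N, 3).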

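3.2 Model training

Open main.py; the training arguments start at line 35. Run the file to start training.

4. The Emotic Model in Detail

4.1 Model structure

[Figure: flow diagram of the Emotic model]

The idea behind the model is simple. The two networks at the top and bottom of the diagram are both resnet18. The upper network extracts body features: its input is a 128×128 color image and its output is 512 feature maps of size 1×1. The lower network extracts features from the image background, using the scene classification model places365 as its pretrained model: its input is a 224×224 color image and its output is likewise 512 feature maps of size 1×1. The two outputs are flattened and concatenated into a 1024-dimensional vector, which passes through two fully connected layers to produce a 26-dimensional vector and a 3-dimensional vector. The 26-dimensional vector handles classification over the 26 discrete emotions, and the 3-dimensional vector handles regression of the 3 continuous dimensions.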

import torch 
import torch.nn as nn 

class Emotic(nn.Module):
  ''' Emotic Model'''
  def __init__(self, num_context_features, num_body_features):
    super(Emotic,self).__init__()
    self.num_context_features = num_context_features
    self.num_body_features = num_body_features
    self.fc1 = nn.Linear((self.num_context_features + num_body_features), 256)
    self.bn1 = nn.BatchNorm1d(256)
    self.d1 = nn.Dropout(p=0.5)
    self.fc_cat = nn.Linear(256, 26)
    self.fc_cont = nn.Linear(256, 3)
    self.relu = nn.ReLU()

    
  def forward(self, x_context, x_body):
    context_features = x_context.view(-1, self.num_context_features)
    body_features = x_body.view(-1, self.num_body_features)
    fuse_features = torch.cat((context_features, body_features), 1)
    fuse_out = self.fc1(fuse_features)
    fuse_out = self.bn1(fuse_out)
    fuse_out = self.relu(fuse_out)
    fuse_out = self.d1(fuse_out)    
    cat_out = self.fc_cat(fuse_out)
    cont_out = self.fc_cont(fuse_out)
    return cat_out, cont_out
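As a rough check on the layer sizes in the class above, the number of learnable parameters in the fusion head can be worked out by hand. This sketch assumes both feature sizes are 512, as the text describes for the two resnet18 branches; `linear_params` is a hypothetical helper.

```python
def linear_params(n_in, n_out):
    # A fully connected layer has one weight per input-output pair plus one bias per output.
    return n_in * n_out + n_out

num_context_features = 512  # assumed from the text: 512 features from the context branch
num_body_features = 512     # assumed from the text: 512 features from the body branch

fused = num_context_features + num_body_features  # length of the concatenated vector
params = {
    'fc1': linear_params(fused, 256),
    'bn1': 2 * 256,  # learnable scale and shift of BatchNorm1d(256)
    'fc_cat': linear_params(256, 26),
    'fc_cont': linear_params(256, 3),
}
total = sum(params.values())
print(fused, params, total)
```

Nearly all of the head's parameters sit in fc1; the two output layers are small by comparison.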

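The discrete emotions form a multi-label classification task, since one person may show several emotions at once. The author's approach is to use 26 thresholds, one per emotion; when an output value exceeds its threshold, the person is considered to have that emotion. The thresholds are listed below. Note that the threshold for engagement is 0, which means every person in every prediction will include this emotion: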

>>> import numpy as np
>>> np.load('./debug_exp/results/val_thresholds.npy')
array([0.0509765 , 0.02937193, 0.03467856, 0.16765128, 0.0307672 ,
       0.13506265, 0.03581731, 0.06581657, 0.03092133, 0.04115443,
       0.02678059, 0.        , 0.04085711, 0.14374524, 0.03058549,
       0.02580678, 0.23389584, 0.13780132, 0.07401864, 0.08617007,
       0.03372583, 0.03105414, 0.029326  , 0.03418647, 0.03770866,
       0.03943525], dtype=float32)
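To show how such thresholds turn 26 scores into a list of emotion names, here is a minimal sketch. It assumes the thresholds follow the same category order as `cat2ind` in the preprocessing script and that the comparison is strictly greater-than, as described above; `decode_emotions` and the example scores are hypothetical.

```python
CATEGORIES = ['Affection', 'Anger', 'Annoyance', 'Anticipation', 'Aversion',
              'Confidence', 'Disapproval', 'Disconnection', 'Disquietment',
              'Doubt/Confusion', 'Embarrassment', 'Engagement', 'Esteem',
              'Excitement', 'Fatigue', 'Fear', 'Happiness', 'Pain', 'Peace',
              'Pleasure', 'Sadness', 'Sensitivity', 'Suffering', 'Surprise',
              'Sympathy', 'Yearning']

# Values copied from the val_thresholds.npy printout above.
THRESHOLDS = [0.0509765, 0.02937193, 0.03467856, 0.16765128, 0.0307672,
              0.13506265, 0.03581731, 0.06581657, 0.03092133, 0.04115443,
              0.02678059, 0.0, 0.04085711, 0.14374524, 0.03058549,
              0.02580678, 0.23389584, 0.13780132, 0.07401864, 0.08617007,
              0.03372583, 0.03105414, 0.029326, 0.03418647, 0.03770866,
              0.03943525]

def decode_emotions(scores, thresholds=THRESHOLDS, names=CATEGORIES):
    """Return the names whose score is strictly above the matching threshold."""
    return [n for n, s, t in zip(names, scores, thresholds) if s > t]

scores = [0.0] * 26   # made-up scores for one person
scores[5] = 0.20      # Confidence: above its threshold of 0.13506265
scores[11] = 0.01     # Engagement: above its threshold of 0.0
scores[16] = 0.10     # Happiness: below its threshold of 0.23389584
print(decode_emotions(scores))  # ['Confidence', 'Engagement']
```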

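4.2 Loss functions

For the classification task the author provides two loss functions: a plain mean squared error loss (self.weight_type == 'mean') and a weighted squared error loss (self.weight_type == 'static'). The weighted version is shown below; the weights for the 26 categories are [0.1435, 0.1870, 0.1692, 0.1165, 0.1949, 0.1204, 0.1728, 0.1372, 0.1620, 0.1540, 0.1987, 0.1057, 0.1482, 0.1192, 0.1590, 0.1929, 0.1158, 0.1907, 0.1345, 0.1307, 0.1665, 0.1698, 0.1797, 0.1657, 0.1520, 0.1537]. The code also contains a 'dynamic' branch that recomputes the weights from each batch's labels.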

class DiscreteLoss(nn.Module):
  ''' Class to measure loss between categorical emotion predictions and labels.'''
  def __init__(self, weight_type='mean', device=torch.device('cpu')):
    super(DiscreteLoss, self).__init__()
    self.weight_type = weight_type
    self.device = device
    if self.weight_type == 'mean':
      self.weights = torch.ones((1,26))/26.0
      self.weights = self.weights.to(self.device)
    elif self.weight_type == 'static':
      self.weights = torch.FloatTensor([0.1435, 0.1870, 0.1692, 0.1165, 0.1949, 0.1204, 0.1728, 0.1372, 0.1620,
         0.1540, 0.1987, 0.1057, 0.1482, 0.1192, 0.1590, 0.1929, 0.1158, 0.1907,
         0.1345, 0.1307, 0.1665, 0.1698, 0.1797, 0.1657, 0.1520, 0.1537]).unsqueeze(0)
      self.weights = self.weights.to(self.device)
    
  def forward(self, pred, target):
    if self.weight_type == 'dynamic':
      self.weights = self.prepare_dynamic_weights(target)
      self.weights = self.weights.to(self.device)
    loss = (((pred - target)**2) * self.weights)
    return loss.sum() 

  def prepare_dynamic_weights(self, target):
    target_stats = torch.sum(target, dim=0).float().unsqueeze(dim=0).cpu()
    weights = torch.zeros((1,26))
    weights[target_stats != 0 ] = 1.0/torch.log(target_stats[target_stats != 0].data + 1.2)
    weights[target_stats == 0] = 0.0001
    return weights
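To make the weighting concrete, the sketch below reproduces the same arithmetic in plain Python on a toy batch with 3 classes instead of 26. It is an illustration under that simplification, not the project's code; the function names and the numbers are made up.

```python
import math

def weighted_squared_error(pred, target, weights):
    # Sum over samples and classes of weight * (pred - target)^2.
    return sum(w * (p - t) ** 2
               for p_row, t_row in zip(pred, target)
               for p, t, w in zip(p_row, t_row, weights))

def dynamic_weights(target):
    # Per class: 1 / ln(count + 1.2) if the class occurs in the batch, else 0.0001.
    counts = [sum(col) for col in zip(*target)]
    return [1.0 / math.log(c + 1.2) if c != 0 else 0.0001 for c in counts]

# Toy batch: 2 samples, 3 classes.
target = [[1, 0, 0], [1, 1, 0]]
pred = [[0.5, 0.0, 0.0], [1.0, 0.0, 0.0]]

mean_w = [1.0 / 3] * 3  # 'mean' style: every class weighted by 1 / number of classes
loss_mean = weighted_squared_error(pred, target, mean_w)
dyn_w = dynamic_weights(target)  # 'dynamic' style: rarer classes get larger weights
loss_dyn = weighted_squared_error(pred, target, dyn_w)
print(round(loss_mean, 4), [round(w, 4) for w in dyn_w], round(loss_dyn, 4))
```

In the dynamic case a class that appears once in the batch gets a larger weight than one that appears twice, so a miss on the rarer class costs more.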

For the regression task the author likewise provides two loss functions, an L2 loss and a smooth L1 loss. The L2 loss:

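$$L_{2} = \sum_{k} v_k\,(\hat{y}_k - y_k)^2, \qquad v_k = \begin{cases} 0, & |\hat{y}_k - y_k| < \theta \\ 1, & \text{otherwise} \end{cases}$$

Here $\hat{y}_k$ is a predicted value, $y_k$ is its label, and $\theta$ is the margin (margin=1 by default in the code below); the sum runs over every continuous dimension of every sample in the batch.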

class ContinuousLoss_L2(nn.Module):
  ''' Class to measure loss between continuous emotion dimension predictions and labels. Using l2 loss as base. '''
  def __init__(self, margin=1):
    super(ContinuousLoss_L2, self).__init__()
    self.margin = margin
  
  def forward(self, pred, target):
    labs = torch.abs(pred - target)
    loss = labs ** 2 
    loss[ (labs < self.margin) ] = 0.0
    return loss.sum()


class ContinuousLoss_SL1(nn.Module):
  ''' Class to measure loss between continuous emotion dimension predictions and labels. Using smooth l1 loss as base. '''
  def __init__(self, margin=1):
    super(ContinuousLoss_SL1, self).__init__()
    self.margin = margin
  
  def forward(self, pred, target):
    labs = torch.abs(pred - target)
    loss = 0.5 * (labs ** 2)
    loss[ (labs > self.margin) ] = labs[ (labs > self.margin) ] - 0.5
    return loss.sum()
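The two regression losses differ in how they treat small and large errors, which is easiest to see on a few numbers. The sketch below mirrors the element-wise logic of the two classes above using flat Python lists in place of tensors; the function names and values are illustrative.

```python
def margin_l2(pred, target, margin=1.0):
    # Squared error, but errors smaller than the margin contribute nothing.
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        if diff >= margin:
            total += diff ** 2
    return total

def smooth_l1(pred, target, margin=1.0):
    # Quadratic for small errors, linear (diff - 0.5) once the error exceeds the margin.
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        total += diff - 0.5 if diff > margin else 0.5 * diff ** 2
    return total

pred = [5.0, 7.5, 2.0]
target = [5.5, 5.5, 2.0]   # absolute errors: 0.5, 2.0, 0.0
print(margin_l2(pred, target), smooth_l1(pred, target))  # 4.0 1.625
```

With the margin L2 loss the 0.5 error is ignored entirely, while smooth L1 still penalizes it lightly and grows only linearly for the 2.0 error.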

Dataset link: https://gas.graviti.com/dataset/datawhale/Emotic

[1]Kosti R, Alvarez J M, Recasens A, et al. Context based emotion recognition using emotic dataset[J]. IEEE transactions on pattern analysis and machine intelligence, 2019, 42(11): 2755-2766.

YOLOv5 project: https://github.com/ultralytics/yolov5

Emotic project: https://github.com/Tandon-A/emotic
