Loss Functions in PyTorch
- 0. Preface
- 1. Loss Function

0. Preface

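In deep learning, the loss function is the object that an optimization method acts on directly. It measures the gap between predicted and true values, and the goal of the optimization problem is to minimize it. For classification, this intuitively means classifying as many samples correctly as possible; for regression, it means making the error between predicted and actual values as small as possible.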

Loss function:
$Loss=f(\hat{y}, y)$
Cost function:
$Cost=\frac{1}{N}\sum_{i=1}^{N}f(\hat{y}_{i}, y_{i})$
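
The distinction between the per-sample loss and the averaged cost can be sketched in a few lines of plain Python. The squared error used as $f$ and the sample values are illustrative assumptions, not something the formulas above prescribe.

```python
def squared_error(y_hat, y):
    """Per-sample loss f(y_hat, y); squared error is an illustrative choice."""
    return (y_hat - y) ** 2


def cost(y_hats, ys, loss_fn=squared_error):
    """Cost: the average of the per-sample losses over N samples."""
    n = len(ys)
    return sum(loss_fn(y_hat, y) for y_hat, y in zip(y_hats, ys)) / n


# Hypothetical predictions and ground-truth values
predictions = [2.5, 0.0, 2.0]
targets = [3.0, -0.5, 2.0]

per_sample = [squared_error(p, t) for p, t in zip(predictions, targets)]
total_cost = cost(predictions, targets)
print(per_sample, total_cost)
```

The loss is a value per sample, while the cost collapses those values into one number for the whole dataset.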
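The nn module in PyTorch provides many ready-to-use loss functions, such as cross entropy and mean squared error, so an existing loss can be called directly for most problems. The table below lists commonly used loss functions and the problems they suit.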

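Class	Loss name	Suitable problems
torch.nn.L1Loss()	Mean absolute error loss	Regression
torch.nn.MSELoss()	Mean squared error loss	Regression
torch.nn.CrossEntropyLoss()	Cross-entropy loss	Multi-class classification
torch.nn.CTCLoss()	Connectionist Temporal Classification (CTC) loss	
torch.nn.NLLLoss()	Negative log-likelihood loss	Multi-class classification
torch.nn.KLDivLoss()	KL divergence loss	Regression
torch.nn.BCELoss()	Binary cross-entropy loss	Binary classification
torch.nn.MarginRankingLoss()	Margin ranking (similarity) loss	Ranking
torch.nn.MultiLabelMarginLoss()	Multi-label margin loss	Multi-label classification
torch.nn.SmoothL1Loss()	Smooth L1 loss	Regression
torch.nn.SoftMarginLoss()	Two-class logistic loss	Binary classification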

The following sections go through some of these loss functions and their APIs in the PyTorch framework.

1.Loss Function

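1.1 The _Loss base class

The PyTorch source defines two base classes for the losses in the nn module: _Loss, the plain base class, and _WeightedLoss, which additionally carries a weight tensor.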

from .module import Module
from .. import functional as F
from .. import _reduction as _Reduction

from torch import Tensor
from typing import Optional


class _Loss(Module):
    reduction: str

    def __init__(self, size_average=None, reduce=None, reduction: str = 'mean') -> None:
        super(_Loss, self).__init__()
        if size_average is not None or reduce is not None:
            self.reduction = _Reduction.legacy_get_string(size_average, reduce)
        else:
            self.reduction = reduction


class _WeightedLoss(_Loss):
    def __init__(self, weight: Optional[Tensor] = None, size_average=None, reduce=None, reduction: str = 'mean') -> None:
        super(_WeightedLoss, self).__init__(size_average, reduce, reduction)
        self.register_buffer('weight', weight)

1.2 nn.CrossEntropyLoss

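1.2.1 Basic concepts: cross entropy, information entropy, and relative entropy

Cross entropy measures the difference between two probability distributions: the lower the cross entropy, the closer the two distributions.
$\text{cross entropy} = \text{information entropy} + \text{relative entropy}$
1. Cross entropy
$H(P,Q) = -\sum_{i=1}^N P(x_{i})\log Q(x_{i})$
2. Self-information, which measures the uncertainty of a single event
$I(x) = -\log[p(x)]$
3. Entropy (information entropy): put simply, the more uncertain an event, the larger the entropy. It is the expectation of the self-information
$H(P) = E_{x \sim P}[I(x)] = -\sum_{i=1}^N P(x_{i})\log P(x_{i})$
4. Relative entropy (KL divergence), which measures the difference between two distributions and is not symmetric.
$D_{KL}(P,Q) = E_{x \sim P}[\log\frac{P(x)}{Q(x)}]\\=E_{x \sim P}[\log P(x)-\log Q(x)]\\=\sum_{i=1}^N P(x_{i})[\log P(x_{i})-\log Q(x_{i})]\\=\sum_{i=1}^N P(x_{i})\log P(x_{i})-\sum_{i=1}^N P(x_{i})\log Q(x_{i})\\=H(P,Q)-H(P)$

Combining the formulas above gives the cross entropy $H(P, Q) = D_{KL}(P,Q)+H(P)$, where P is the distribution of the actual samples and Q is the distribution of the predictions.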
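
The identity $H(P, Q) = D_{KL}(P,Q)+H(P)$ can be checked numerically. The sketch below uses only the Python standard library; the two distributions P and Q are made-up values chosen for illustration.

```python
import math


def entropy(p):
    """H(P) = -sum_i P(x_i) * log P(x_i)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """H(P, Q) = -sum_i P(x_i) * log Q(x_i)."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """D_KL(P, Q) = sum_i P(x_i) * (log P(x_i) - log Q(x_i))."""
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical distributions: P plays the "true" role, Q the "predicted" role
P = [0.7, 0.2, 0.1]
Q = [0.5, 0.3, 0.2]

h_pq = cross_entropy(P, Q)
h_p = entropy(P)
d_kl = kl_divergence(P, Q)
d_kl_reversed = kl_divergence(Q, P)

print(h_pq, d_kl + h_p)      # the two values agree
print(d_kl, d_kl_reversed)   # the two values differ: KL is not symmetric
```

Since $H(P)$ does not depend on the prediction Q, minimizing the cross entropy with respect to Q is equivalent to minimizing the KL divergence.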

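1.2.2 Cross entropy in PyTorch

Function: combines nn.LogSoftmax() and nn.NLLLoss() to compute the cross entropy. This loss differs from the textbook cross-entropy formula in that the raw inputs are first normalized with softmax into probabilities in [0, 1], whose logarithm is then taken (this is what nn.LogSoftmax does).

The formulas given in the official documentation are:

Without weights
$loss(x, class)=-\log(\frac{\exp(x[class])}{\sum_{j}\exp(x[j])}) \\=-x[class] + \log(\sum_{j}\exp(x[j]))$
With weights
$loss(x, class)=weight[class](-x[class] + \log(\sum_{j}\exp(x[j])))$

Here $x$ is the raw, unnormalized output (logits) of the model and $class$ is the index of the target class.
Compared with the original cross-entropy formula $-\sum_{i=1}^N P(x_{i})\log Q(x_{i})$, the PyTorch definition lacks the sum and the factor $P(x_{i})$. PyTorch computes the cross entropy of a single sample whose target class is already known, so the target distribution is one-hot: $P(x_{i})=1$ for the target class and 0 elsewhere, and only one term of the sum survives. The PyTorch cross entropy therefore simplifies to $H(P,Q)=-\log(Q(x_{i}))$
Main parameters: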


torch.nn.CrossEntropyLoss(weight: Optional[torch.Tensor] = None,  # weight assigned to the loss of each class
                        size_average=None,                          
                        ignore_index: int = -100,                   # target class to ignore
                        reduce=None, 
                        reduction: str = 'mean')                    # reduction mode: none/sum/mean. none: per-element loss; sum: sum over all elements; mean: weighted average, returns a scalar.

The code example below illustrates how these parameters behave.

import torch
import torch.nn as nn

import numpy as np
#------fake data

inputs =torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
targets = torch.tensor([0, 1, 1], dtype=torch.long)

# ------------
flag = 1
if flag:
    
    loss_f_none = nn.CrossEntropyLoss(weight=None, reduction='none')
    loss_f_sum = nn.CrossEntropyLoss(weight=None, reduction='sum')
    loss_f_mean = nn.CrossEntropyLoss(weight=None, reduction='mean')

    # forward
    loss_none = loss_f_none(inputs, targets)
    loss_sum = loss_f_sum(inputs, targets)
    loss_mean = loss_f_mean(inputs, targets)

    # view
    print(f'Cross Entropy loss: \n{loss_none, loss_sum, loss_mean}')
>>>
Cross Entropy loss: 
(tensor([1.3133, 0.1269, 0.1269]), tensor(1.5671), tensor(0.5224))
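
The three reduction modes can also be reproduced by hand, which makes the "weighted average" meaning of `mean` explicit. The sketch below is plain Python rather than a PyTorch call; the class weights `[1.0, 2.0]` are a hypothetical choice, and the rule that `mean` divides by the summed weights of the targets is how the weighted average is assumed to work here.

```python
import math


def cross_entropy(logits, target):
    """Per-sample loss: -x[class] + log(sum_j exp(x[j]))."""
    return -logits[target] + math.log(sum(math.exp(v) for v in logits))


def reduce_losses(inputs, targets, class_weights):
    """Return the (none, sum, mean) reductions for class-weighted cross entropy."""
    weighted = [class_weights[t] * cross_entropy(x, t) for x, t in zip(inputs, targets)]
    total = sum(weighted)
    # 'mean' is a weighted average: divide by the summed weights of the targets
    weight_sum = sum(class_weights[t] for t in targets)
    return weighted, total, total / weight_sum


inputs = [[1.0, 2.0], [1.0, 3.0], [1.0, 3.0]]
targets = [0, 1, 1]

# Unit weights reproduce the values printed above
none_u, sum_u, mean_u = reduce_losses(inputs, targets, [1.0, 1.0])
# Hypothetical weights: class 1 counts twice as much as class 0
none_w, sum_w, mean_w = reduce_losses(inputs, targets, [1.0, 2.0])

print(none_u, sum_u, mean_u)
print(none_w, sum_w, mean_w)
```

With unit weights the weighted average is an ordinary mean over the three samples; with weights `[1.0, 2.0]` the denominator becomes 1 + 2 + 2 = 5 rather than 3.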

To become more familiar with how CrossEntropyLoss is computed in PyTorch, the calculation is written out by hand below:

##--------------compute by hand
flag = 1
if flag:
    idx = 0
    #inputs =torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
    #targets = torch.tensor([0, 1, 1], dtype=torch.long)

    inputs_1 = inputs.detach().numpy()[idx]
    targets_1 = targets.numpy()[idx]

    # first term: x[class]
    x_class = inputs_1[targets_1]
    
    # second term: log(sum_j exp(x[j]))
    sigma_exp_x = np.sum(list(map(np.exp, inputs_1)))
    log_sigma_exp_x = np.log(sigma_exp_x)

    # resulting loss
    loss_1 = -x_class + log_sigma_exp_x
    print('loss of the first sample:', loss_1)
>>>
'''
How it is computed: take the first input sample [1, 2]; loss = -x[class] + log(sum_j exp(x[j])), where log is the natural logarithm ln
 log(sum_j exp(x[j])) = ln(e+e^2)
 x[class] = 1
 >>>loss = ln(e+e^2) -1 
'''
   loss of the first sample: 1.3132617

Comparing this with the output of the previous code block shows that the two results agree.

1.3 nn.NLLLoss

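Function: implements only the negation step of the negative log-likelihood. It selects the input value at the target class, applies the class weight, and flips the sign. No logarithm is taken, so the input is expected to hold log-probabilities. Formula:
$l(x, y)=L=(l_{1},\dots,l_{N})^T,\quad l_{n}=-w_{y_{n}}x_{n,y_{n}}$
Main parameters: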


nn.NLLLoss(weight=None, # weight assigned to the loss of each class
    size_average=None, 
    ignore_index=-100,  # target class to ignore
    reduce=None,
    reduction='mean')   # reduction mode: none/sum/mean

The code below shows this loss function in action.


import torch
import torch.nn as nn

import numpy as np
#------fake data

inputs =torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
targets = torch.tensor([0, 1, 1], dtype=torch.long)

flag = 1
if flag:
    weights = torch.tensor([1, 1], dtype=torch.float)
    
    loss_f_none_w =nn.NLLLoss(weight=weights, reduction='none')
    loss_f_sum = nn.NLLLoss(weight=weights, reduction='sum')
    loss_f_mean = nn.NLLLoss(weight=weights, reduction='mean')

    # forward
    loss_none_w = loss_f_none_w(inputs, targets)
    loss_sum = loss_f_sum(inputs, targets)
    loss_mean = loss_f_mean(inputs, targets)

    # view
    print('\nweights:', weights)
    print('nll loss', loss_none_w, loss_sum, loss_mean)
>>>>
weights: tensor([1., 1.])
nll loss tensor([-1., -3., -3.]) tensor(-7.) tensor(-2.3333)
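
Because nn.NLLLoss only selects and negates, feeding it raw scores gives the negative values seen above. A minimal sketch of the intended pairing follows, assuming the same fake data: applying nn.LogSoftmax first and then nn.NLLLoss should reproduce nn.CrossEntropyLoss.

```python
import torch
import torch.nn as nn

inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
targets = torch.tensor([0, 1, 1], dtype=torch.long)

# Step 1: turn raw scores into log-probabilities along the class dimension
log_probs = nn.LogSoftmax(dim=1)(inputs)

# Step 2: NLLLoss picks -log_probs[n, targets[n]] for every sample
loss_nll = nn.NLLLoss(reduction='none')(log_probs, targets)

# Reference: CrossEntropyLoss applied directly to the raw scores
loss_ce = nn.CrossEntropyLoss(reduction='none')(inputs, targets)

print('LogSoftmax + NLLLoss:', loss_nll)
print('CrossEntropyLoss    :', loss_ce)
```

Both lines should show the same per-sample losses as the CrossEntropyLoss example in section 1.2.2.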

1.4 nn.BCELoss

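Function: cross-entropy loss for binary classification. Note: input values must lie in [0, 1].
$l_{n}=-w_{n}[y_{n}*\log x_{n} + (1-y_{n})*\log(1-x_{n})]$

Here $x_{n}$ is the probability output by the model and $y_{n}$ is the label; because this is a binary classification task, $y_{n}$ can only be 0 or 1.

Main parameters: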

    nn.BCELoss(weight=None,  # weights applied to the per-element losses
            size_average=None,
            reduce=None,
            reduction='mean') # reduction mode: none/sum/mean

Code example

flag =1
if flag:
    inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
    target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)
    
    target_bce = target

    # squash the raw inputs into [0, 1], as BCELoss requires
    inputs = torch.sigmoid(inputs)
    
    weights = torch.tensor([1, 1], dtype=torch.float)
    
    loss_f_none = nn.BCELoss(weights, reduction='none')
    loss_f_sum = nn.BCELoss(weights, reduction='sum')
    loss_f_mean = nn.BCELoss(weights, reduction='mean')

    # forward
    loss_none_w = loss_f_none(inputs, target_bce)
    loss_sum = loss_f_sum(inputs, target_bce)
    loss_mean = loss_f_mean(inputs, target_bce)

    print(f'\nweights: {weights}')
    print(f'BCELoss ', loss_none_w, loss_sum, loss_mean)
    >>>>
weights: tensor([1., 1.])
BCELoss  tensor([[0.3133, 2.1269],
        [0.1269, 2.1269],
        [3.0486, 0.0181],
        [4.0181, 0.0067]]) tensor(11.7856) tensor(1.4732)

1.5 nn.BCEWithLogitsLoss

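Function: combines sigmoid with binary cross entropy. Note: do not add a sigmoid at the end of the network. The formula is:
$l_{n} = -w_{n}[y_{n}*\log\sigma(x_{n}) + (1-y_{n})*\log(1-\sigma(x_{n}))]$

Main parameters and example code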

'''
nn.BCEWithLogitsLoss()
'''
flag =1
if flag:
    inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
    target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)
    
    target_bce = target
    weights = torch.tensor([1], dtype=torch.float)
    pos_w = torch.tensor([3],dtype=torch.float)
    
    loss_f_none = nn.BCEWithLogitsLoss(weights, reduction='none',pos_weight=pos_w)
    loss_f_sum = nn.BCEWithLogitsLoss(weights, reduction='sum', pos_weight=pos_w)
    loss_f_mean = nn.BCEWithLogitsLoss(weights, reduction='mean', pos_weight=pos_w)

    # forward
    loss_none_w = loss_f_none(inputs, target_bce)
    loss_sum = loss_f_sum(inputs, target_bce)
    loss_mean = loss_f_mean(inputs, target_bce)

    print(f'\npos_w: {pos_w}')
    print(f'BCEWithLogitsLoss ', loss_none_w, loss_sum, loss_mean)

>>>
pos_w: tensor([3.])
BCEWithLogitsLoss  tensor([[0.9398, 2.1269],
        [0.3808, 2.1269],
        [3.0486, 0.0544],
        [4.0181, 0.0201]]) tensor(12.7158) tensor(1.5895)
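# For comparison, the output with pos_w = torch.tensor([1], dtype=torch.float) follows. With pos_w = 3 the loss of the positive samples is multiplied by 3, so the model pays more attention to positive samples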
>>>>pos_w: tensor([1.])
BCEWithLogitsLoss  tensor([[0.3133, 2.1269],
        [0.1269, 2.1269],
        [3.0486, 0.0181],
        [4.0181, 0.0067]]) tensor(11.7856) tensor(1.4732)
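
The effect of pos_weight can be verified by hand for single elements. The sketch below is plain Python, not the PyTorch implementation; it assumes that pos_weight multiplies only the positive ($y_{n}=1$) term of the formula.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce_with_logits(x, y, pos_weight=1.0):
    """-[pos_weight * y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))]."""
    s = sigmoid(x)
    return -(pos_weight * y * math.log(s) + (1.0 - y) * math.log(1.0 - s))


# First row of the example above: input [1, 2], target [1, 0]
pos_plain = bce_with_logits(1.0, 1.0, pos_weight=1.0)   # positive sample, no extra weight
pos_scaled = bce_with_logits(1.0, 1.0, pos_weight=3.0)  # positive sample, weighted by 3
neg_scaled = bce_with_logits(2.0, 0.0, pos_weight=3.0)  # negative sample, unaffected

print(pos_plain, pos_scaled, neg_scaled)
```

The positive element grows from about 0.3133 to about 0.9398, while the negative element stays at about 2.1269, matching the two printed tensors.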

1.6 nn.L1Loss (regression)

Function: computes the absolute value of the difference between inputs and target. The formula is:
$l_{n}=|x_{n}-y_{n}|$
Main parameters and code example

'''
nn.L1Loss(reduction='none')
'''
flag =1
if flag:
    inputs = torch.ones((2, 2))
    target = torch.ones((2, 2)) * 3
    
    # reduction='none' keeps the per-element losses (the legacy `reduce` argument expects a bool)
    loss_f = nn.L1Loss(reduction='none')
    loss = loss_f(inputs, target)

    print(f'input:{inputs}\ntarget:{target}\nL1Loss:{loss}')
#>>> the result below agrees with the formula

input:tensor([[1., 1.],
        [1., 1.]])
target:tensor([[3., 3.],
        [3., 3.]])
L1Loss:tensor([[2., 2.],
        [2., 2.]])

1.7 nn.MSELoss (regression)

Function: computes the square of the difference between inputs and target. The formula is:
$l_{n}=(x_{n}-y_{n})^2$
Main parameters and code example:

flag =1
if flag:
    inputs = torch.ones((2, 2))
    target = torch.ones((2, 2)) * 3
    
    loss_f = nn.MSELoss(reduction='none')
    loss = loss_f(inputs, target)

    print(f'input:{inputs}\ntarget:{target}\nMSELoss:{loss}')
>>>>
input:tensor([[1., 1.],
        [1., 1.]])
target:tensor([[3., 3.],
        [3., 3.]])
MSELoss:tensor([[4., 4.],
        [4., 4.]])
#>>> with nn.MSELoss(reduction='sum')
MSELoss:16.0

1.8 nn.SmoothL1Loss (regression)

Function: a smoothed L1Loss. First, the formula for SmoothL1Loss:
$loss(x, y)=\frac{1}{n}\sum_{i}z_{i}$
$z_{i}=\begin{cases} 0.5(x_{i}-y_{i})^2, & \text{if } |x_{i}-y_{i}|<1 \\ |x_{i}-y_{i}|-0.5, & \text{otherwise} \end{cases}$
SmoothL1Loss is shown in Figure 1:
[Figure 1: SmoothL1Loss compared with L1Loss (./out_imgs/loss/l1_smooth_l1.png)]
Main parameters and code example:

import matplotlib.pyplot as plt

flag = 1

if flag:
    inputs = torch.linspace(-3, 3, steps=500)
    target = torch.zeros_like(inputs)

    loss_f = nn.SmoothL1Loss(reduction='none')
    loss_smooth = loss_f(inputs, target)
    loss_l1 = np.abs(inputs.numpy())
    plt.plot(inputs.numpy(), loss_smooth.numpy(), label='smooth_l1_loss')
    plt.plot(inputs.numpy(), loss_l1, label='l1 loss')
    plt.xlabel('x_i - y_i')
    plt.ylabel('loss')
    plt.legend()
    plt.grid()
    plt.savefig('../out_imgs/loss/l1_smooth_l1.png') ## saves the figure shown above
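
For readers who cannot see the figure, the piecewise definition can be evaluated at a few points. This is a plain-Python sketch of the formula with the threshold fixed at 1; the sample differences are arbitrary.

```python
def smooth_l1(x, y):
    """Quadratic near zero, linear once |x - y| reaches 1."""
    diff = abs(x - y)
    if diff < 1.0:
        return 0.5 * diff ** 2
    return diff - 0.5


def l1(x, y):
    return abs(x - y)


# Arbitrary differences x_i - y_i, with the target fixed at 0
diffs = [0.0, 0.5, 1.0, 2.0, 3.0]
smooth_values = [smooth_l1(d, 0.0) for d in diffs]
l1_values = [l1(d, 0.0) for d in diffs]

print(smooth_values)
print(l1_values)
```

The two branches meet at a difference of 1, where both give 0.5, so the curve has no jump. For small differences the loss is below L1; for large ones it runs parallel to L1, offset by 0.5.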

1.9 nn.PoissonNLLLoss

Function: negative log-likelihood loss for a Poisson distribution. The formulas are:

When log_input = True:
$loss(input, target)=\exp(input) - target * input$

When log_input = False:
$loss(input, target)= input- target * \log(input+eps)$

Related parameters and code example:

'''---------------------------PoissonNLLLoss
nn.PoissonNLLLoss(log_input=True,   # whether the input is in log form; selects the formula
                full=False,         # whether to compute the full loss (adds the Stirling approximation term); default False
                reduction='mean',
                eps=1e-8            # small constant that avoids log(0) when log_input=False
                )
'''
flag = 1
if flag:
    inputs = torch.randn((2, 2))
    target = torch.randn((2, 2))
    # the other reduction modes are not repeated in the remaining loss examples
    loss_f = nn.PoissonNLLLoss(log_input=True, full=False, reduction='none')
    loss = loss_f(inputs, target)
    print('inputs :{}\ntarget is{}\nPoissonNLLLoss :{}'.format(inputs, target, loss))

#---------------compute by hand 
flag = 1
if flag:
    idx = 0
    # formula used when log_input=True and full=False
    loss_1 = torch.exp(inputs[idx, idx]) - target[idx, idx]* inputs[idx, idx]
    print('loss of the first element', loss_1)
#>>>> the output shows that the hand computation matches the result of the PyTorch API call
inputs :tensor([[ 0.0553,  0.2444],
        [-0.5864,  0.1678]])
target istensor([[-1.1071, -0.4799],
        [ 1.1683, -1.4043]])
PoissonNLLLoss :tensor([[1.1180, 1.3942],
        [1.2415, 1.4185]])
loss of the first element tensor(1.1180)
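
Both branches of the formula can be written as one small function. The sketch below is plain Python covering only the full=False case; the first call reuses the rounded values printed above, and the second call uses a made-up rate and count to exercise the log_input=False branch.

```python
import math


def poisson_nll(inp, target, log_input=True, eps=1e-8):
    """Poisson negative log-likelihood without the Stirling term (full=False)."""
    if log_input:
        # the input is log(rate)
        return math.exp(inp) - target * inp
    # the input is the rate itself; eps guards against log(0)
    return inp - target * math.log(inp + eps)


# First element of the printed example (values rounded to 4 decimals)
first = poisson_nll(0.0553, -1.1071, log_input=True)

# Hypothetical rate 2.0 and observed count 3.0 for the other branch
second = poisson_nll(2.0, 3.0, log_input=False)

print(first, second)
```

The first value lands on the printed 1.1180 up to rounding of the displayed inputs.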

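1.10 nn.KLDivLoss

Function: computes the KL divergence (KLD). As mentioned in the cross-entropy section above, the KLD is the relative entropy, which measures the distance between two distributions. Note: the input must be converted to log-probabilities beforehand, e.g. with nn.LogSoftmax. The formula is:
$D_{KL}(P||Q) = E_{x \sim P}[\log\frac{P(x)}{Q(x)}]\\=E_{x \sim P}[\log P(x)-\log Q(x)]\\=\sum_{i=1}^N P(x_{i})(\log P(x_{i})-\log Q(x_{i}))$
where P is the true data distribution and Q is the fitted distribution. In practice, however, PyTorch uses the following formula:
$l_{n} = y_{n}*(\log y_{n}-x_{n})$
where $y_{n}$ is the label and $x_{n}$ is the model output.
Comparing the two formulas term by term, the input (the model prediction) subtracted inside the parentheses is not passed through a logarithm as in the formula above. Because the KL divergence compares two distributions, the input must already hold log-probabilities, as stated in the note above.
Related parameters and code example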

flag =1 
if flag:
    # input tensor size: (2, 3); think of it as the final fully connected output with 3 neurons for a batch of 2 samples
    inputs = torch.tensor([[0.5, 0.3, 0.2], [0.2, 0.2, 0.5]])  
    inputs_log = torch.log(inputs)  # log-probabilities; computed for reference but not passed to the loss below
    target = torch.tensor([[0.9, 0.05, 0.05], [0.1, 0.7, 0.2]], dtype=torch.float)

    loss_f_none = nn.KLDivLoss(reduction='none')
    loss_f_mean = nn.KLDivLoss(reduction='mean')
    # batchmean divides the summed loss by the batch size, which is 2 for these inputs
    loss_f_batch_mean = nn.KLDivLoss(reduction='batchmean')

    loss_none = loss_f_none(inputs, target)
    loss_mean = loss_f_mean(inputs, target)
    loss_bs_mean = loss_f_batch_mean(inputs, target)

    print('loss_none:{}\nloss_mean:{}\nloss_bs_mean:{}'.format(loss_none, loss_mean, loss_bs_mean))
#-----------------compute by hand
flag = 1
if flag:
    idx = 0
    
    # In theory the inputs[idx, idx] term in the parentheses should be a log-probability. Here the raw values in [0, 1] stand in for it directly, which mirrors the formula PyTorch actually applies.
    loss_1 = target[idx, idx]*(torch.log(target[idx, idx])-inputs[idx, idx])
    print('loss_1', loss_1)
# >>> the hand-computed loss of the first element matches the API result
loss_none:tensor([[-0.5448, -0.1648, -0.1598],
        [-0.2503, -0.3897, -0.4219]])
loss_mean:-0.3218694031238556
loss_bs_mean:-0.9656082391738892
loss_1 tensor(-0.5448)
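
The negative values above arise because raw probabilities were passed where log-probabilities are expected. The sketch below shows the intended usage under two assumptions that differ from the example: the prediction rows are adjusted so that each sums to 1, and their logarithm is what gets passed to the loss.

```python
import torch
import torch.nn as nn

# Assumed predictions: each row is a proper distribution (sums to 1)
pred = torch.tensor([[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]])
target = torch.tensor([[0.9, 0.05, 0.05], [0.1, 0.7, 0.2]])

# Pass log-probabilities as the input; the target stays as probabilities
loss_f = nn.KLDivLoss(reduction='batchmean')
loss_api = loss_f(torch.log(pred), target)

# By hand: sum_i P(x_i) * (log P(x_i) - log Q(x_i)) per sample, averaged over the batch
loss_hand = (target * (torch.log(target) - torch.log(pred))).sum(dim=1).mean()

print('KLDivLoss (batchmean):', loss_api)
print('compute by hand      :', loss_hand)
```

With log-probabilities as input, the result is non-negative and matches the textbook $D_{KL}(P||Q)$ averaged over the batch.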

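1.11 nn.MarginRankingLoss

Function: compares two inputs under a ranking label; used for ranking tasks. The formula is:
$loss(x, y) = \max(0, -y * (x_{1}-x_{2}) + margin)$
$y$ is the label and can only be 1 or -1; $x_{1}$ and $x_{2}$ are corresponding elements of the two inputs. This leads to the following conclusions:

When y = 1, we want $x_{1}>x_{2}$; when $x_{1}>x_{2}$, no loss is produced (with margin = 0)
When y = -1, we want $x_{2}>x_{1}$; when $x_{2}>x_{1}$, no loss is produced (with margin = 0)

Note: in the example below, x1 and x2 have shape (3, 1) while target has shape (3,), so broadcasting yields an n*n loss matrix in which entry [i][j] applies target[j] to the pair (x1[i], x2[i]).
Main parameters and code example: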

flag = 1
if flag:
    x1 = torch.tensor([[1], [2], [3]],dtype=torch.float)
    x2 = torch.tensor([[2], [2], [2]], dtype=torch.float)

    target = torch.tensor([1, 1, -1], dtype=torch.float)

    loss_f_none = nn.MarginRankingLoss(margin=0, reduction='none')
    
    loss =loss_f_none(x1, x2, target)
    print('MarginRankingLoss', loss)
#>>>
MarginRankingLoss tensor([[1., 1., 0.],
        [0., 0., 0.],
        [0., 0., 1.]])
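'''
1. About the result: x1 and x2 each have shape (3, 1) and target has shape (3,), so broadcasting produces a 3*3 loss in which loss[i][j] = max(0, -target[j] * (x1[i] - x2[i]) + margin).
2. Take the first row as an example: x1[0] = 1 and x2[0] = 2. With target[0] = 1 the loss is 0 when x1 > x2 and x2 - x1 + margin (0) otherwise. Since 1 < 2, loss[0][0] = 2 - 1 = 1.
'''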
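
If one loss per (x1, x2, target) triple is wanted instead of a matrix, all three tensors can share the same shape. The sketch below assumes 1-D tensors holding the same values as the example, so no broadcasting takes place.

```python
import torch
import torch.nn as nn

# Same values as above, but all three tensors have shape (3,)
x1 = torch.tensor([1., 2., 3.])
x2 = torch.tensor([2., 2., 2.])
target = torch.tensor([1., 1., -1.])

loss_f = nn.MarginRankingLoss(margin=0.0, reduction='none')
loss = loss_f(x1, x2, target)

# By hand: max(0, -y * (x1 - x2) + margin), element by element
loss_hand = torch.clamp(-target * (x1 - x2), min=0)

print('MarginRankingLoss', loss)
print('compute by hand  ', loss_hand)
```

The first pair violates y = 1 (1 < 2) and the third violates y = -1 (3 > 2), so both incur a loss of 1; the tied middle pair incurs none.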

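1.12 nn.MultiLabelMarginLoss (multi-label classification)

Function: multi-label margin loss, where multi-label means that one image belongs to several classes.
Example: in a four-class task where sample x belongs to classes 0 and 3, the label is [0, 3, -1, -1], not [1, 0, 0, 1]
The formula is:
$loss(x, y)=\sum_{ij}\frac{\max(0, 1-(x[y[j]]-x[i]))}{x.size(0)}$

where $i = 0$ to $x.size(0)-1$, $j = 0$ to $y.size(0)-1$, $y[j] \geq 0$, and $i \neq y[j]$ for all $i$ and $j$
The parenthesized term in the numerator is the output of a label neuron minus the output of a non-label neuron. The design follows from the goal of multi-label classification: each label output should exceed every non-label output by a margin of at least 1, hence $\max(0, 1 - (x[y[j]] - x[i]))$
Main parameters and code example: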

flag = 1
if flag:
    x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
    y = torch.tensor([[0, 3, -1, -1]], dtype=torch.long)

    loss_f = nn.MultiLabelMarginLoss(reduction='none')
    loss = loss_f(x, y)
    print('MultiLabelMarginLoss', loss)
# ------------compute by hand
flag = 1
if flag:
    x = x[0]

    item_1 = (1-(x[0]-x[1])) + (1 - (x[0]-x[2]))
    item_2 = (1-(x[3]-x[1])) + (1-(x[3]-x[2]))
    loss_h = (item_1 + item_2) / x.shape[0]
    print('compute by hand ', loss_h)
# >>>
MultiLabelMarginLoss tensor([0.8500])
compute by hand  tensor(0.8500)
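
The hand computation above hard-codes which indices are labels. A more general plain-Python sketch of the same formula is given below; reading the label list only up to the first -1 is an assumption about how the padding is meant to be interpreted.

```python
def multilabel_margin(x, y):
    """Sum of max(0, 1 - (x[label] - x[non_label])) over all pairs, divided by len(x)."""
    # Labels are the entries of y before the first negative padding value
    labels = []
    for idx in y:
        if idx < 0:
            break
        labels.append(idx)

    total = 0.0
    for j in labels:
        for i in range(len(x)):
            if i not in labels:
                total += max(0.0, 1.0 - (x[j] - x[i]))
    return total / len(x)


loss = multilabel_margin([0.1, 0.2, 0.4, 0.8], [0, 3, -1, -1])
print('compute by hand', loss)
```

For this sample the four label/non-label pairs contribute 1.1, 1.3, 0.4 and 0.6, and dividing their sum by 4 gives the 0.85 shown above.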

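1.13 nn.SoftMarginLoss (binary classification)

Function: computes the logistic loss for binary classification. The formula is:
$loss(x, y)=\sum_{i}\frac{\log(1+\exp(-y[i] * x[i]))}{x.nelement()}$
Main parameters and code example: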

flag = 1
if flag:
    
    inputs = torch.tensor([[0.3, 0.7], [0.5, 0.5]])
    target = torch.tensor([[-1, 1], [1, -1]], dtype=torch.float)

    loss_f = nn.SoftMarginLoss(reduction='none')
    loss = loss_f(inputs, target)

    print('SoftMarginLoss', loss)

#-----------compute by hand
flag = 1
if flag:
    idx = 0

    inputs_i = inputs[idx, idx]
    target_i = target[idx, idx]

    loss_h = np.log(1+ np.exp(-target_i * inputs_i))
    
    print('compute by hand', loss_h)
# >>>
SoftMarginLoss tensor([[0.8544, 0.4032],
        [0.4741, 0.9741]])
compute by hand tensor(0.8544)

1.14 MultiLabelSoftMarginLoss

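Function: the multi-label version of SoftMarginLoss. The formula is:
$loss(x, y)=-\frac{1}{C} * \sum_{i}y[i]*\log((1+\exp(-x[i]))^{-1})+(1-y[i])*\log(\frac{\exp(-x[i])}{1+\exp(-x[i])})$
C is the number of classes, $y[i]$ is the label, and $x[i]$ is the model output. Taking four-class classification as an example, y must have the form [1, 0, 0, 1]. The formula shows that the first term is used when $y[i]$ marks a label (1), and the second term otherwise.
Main parameters and code example: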

flag = 1
if flag:
    # three-class task
    inputs = torch.tensor([[0.3, 0.7, 0.8]])
    target = torch.tensor([[0, 1, 1]], dtype=torch.float)

    loss_f = nn.MultiLabelSoftMarginLoss(reduction='none')
    loss = loss_f(inputs, target)
    print('MultiLabelSoftMarginLoss', loss)
# --------------compute by hand
flag = 1
if flag:
    # MultiLabelSoftMarginLoss is computed for every output neuron

    # non-label neuron: use the second term of the formula
    i_0 = torch.log(torch.exp(-inputs[0, 0])/ (1+torch.exp(-inputs[0, 0])))

    # label neurons: use the first term of the formula
    i_1 = torch.log(1 / (1+ torch.exp(-inputs[0, 1])))
    i_2 = torch.log(1 / (1+ torch.exp(-inputs[0, 2])))

    loss_h = (i_0 + i_1 + i_2) / -3
    print('compute by hand', loss_h)
>>>>
MultiLabelSoftMarginLoss tensor([0.5429])
compute by hand tensor(0.5429)

1.15 nn.MultiMarginLoss (multi-class classification)

Function: computes the multi-class hinge loss. The formula is:
$loss(x, y)=\frac{\sum_{i}\max(0, margin-x[y]+x[i])^p}{x.size(0)}$

where $i \in \{0, ..., x.size(0)-1\}$, $0 \leq y \leq x.size(0)-1$, and $i \neq y$
Here $x[y]$ is the output of the label neuron and $x[i]$ is the output of a non-label neuron.
Main parameters and code example:

# nn.MultiMarginLoss(p=1,    # 1 or 2
#                 margin=1.0,  
#                 weight=None,  # weight assigned to the loss of each class
#                 reduction='mean')  # reduction mode: none/sum/mean

flag = 1
if flag:
    x = torch.tensor([[0.1, 0.2, 0.7], [0.2, 0.5, 0.3]])
    y = torch.tensor([1, 2], dtype=torch.long)
    
    loss_f = nn.MultiMarginLoss(reduction='none')
    
    loss = loss_f(x, y)
    print('MultiMarginLoss', loss)

#--------compute by hand
flag = 1
if flag:
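    # Take the first input sample, [0.1, 0.2, 0.7]: these are the final scores of a three-class prediction, and the label is 1, so 0.2 belongs to the target class.
    # Following the formula, take margin minus the difference between 0.2 (label score) and each of 0.1 and 0.7 (non-label scores), add the results, and divide by the number of classes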
    x = x[0]

    margin = 1

    i_0 = margin - (x[1] -x[0])

    i_2 = margin - (x[1] - x[2])
    
    loss_h = (i_0 + i_2) / x.shape[0]
    print('compute by hand',loss_h)
>>>>
MultiMarginLoss tensor([0.8000, 0.7000])
compute by hand tensor(0.8000)

1.16 nn.TripletMarginLoss (triplet loss)

Function: computes the triplet loss, commonly used in face verification. The formula is:
$L(a,p,n)=\max\{d(a_{i}, p_{i}) - d(a_{i}, n_{i}) + margin, 0\} \\d(x_{i}, y_{i}) = ||x_{i}-y_{i}||_{p}$

Main parameters and code example:

# --------------
# nn.TripletMarginLoss(margin=1.0, # margin value
#                     p=2.0,   # order of the norm, default 2
#                     eps=1e-6,
#                     swap=False,
#                     reduction='mean')  # reduction mode: none/sum/mean
flag = 1
if flag:
    anchor =torch.tensor([[1.]])
    pos = torch.tensor([[2.]])
    neg = torch.tensor([[0.5]])

    loss_f = nn.TripletMarginLoss(margin=1.0, p=1)
    loss = loss_f(anchor, pos, neg)

    print('TripletMarginLoss:', loss)
>>>>
TripletMarginLoss: tensor(1.5000)
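
The value 1.5 can be traced through the formula by hand. The sketch below is plain Python and ignores the small eps that the PyTorch distance computation adds; the "easy" triplet at the end is a made-up case in which the negative is already far enough away.

```python
def p_norm_distance(u, v, p=2.0):
    """d(u, v) = ||u - v||_p for two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def triplet_margin(anchor, positive, negative, margin=1.0, p=2.0):
    """max(d(a, p) - d(a, n) + margin, 0)."""
    d_ap = p_norm_distance(anchor, positive, p)
    d_an = p_norm_distance(anchor, negative, p)
    return max(d_ap - d_an + margin, 0.0)


# Same triplet as above: d(a, p) = 1.0, d(a, n) = 0.5, margin = 1.0
loss = triplet_margin([1.0], [2.0], [0.5], margin=1.0, p=1.0)

# Hypothetical easy triplet: the negative is much farther than the positive
easy = triplet_margin([1.0], [1.1], [5.0], margin=1.0, p=1.0)

print(loss, easy)
```

The loss reaches zero only once the negative is farther from the anchor than the positive by at least the margin.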

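1.17 nn.HingeEmbeddingLoss (nonlinear embeddings and semi-supervised learning)

Function: measures the similarity of two inputs. Note: the input x should be the absolute value of the difference between the two inputs. The formula is:
$l_{n} = \begin{cases} x_{n}, & \text{if } y_{n} = 1,\\ \max\{0, \Delta-x_{n}\}, & \text{if } y_{n} = -1 \end{cases}$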
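
This piecewise form is the one PyTorch documents for nn.HingeEmbeddingLoss, with $\Delta$ as the margin. A plain-Python sketch of the formula follows; the distances and labels are made-up values, and the margin of 1.0 is an assumed setting.

```python
def hinge_embedding(x, y, margin=1.0):
    """x is a distance between two inputs; y is 1 (similar) or -1 (dissimilar)."""
    if y == 1:
        # similar pairs are penalized by their distance
        return x
    # dissimilar pairs are penalized only while closer than the margin
    return max(0.0, margin - x)


# Hypothetical distances and labels
distances = [0.2, 1.5, 0.4]
labels = [1, -1, -1]

losses = [hinge_embedding(d, t) for d, t in zip(distances, labels)]
print(losses)
```

The similar pair contributes its distance of 0.2, the dissimilar pair at distance 1.5 is already beyond the margin and contributes nothing, and the dissimilar pair at distance 0.4 contributes the remaining 0.6.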