




  • Loss function:
    $Loss = f(\hat{y}, y)$
  • Cost function:
    $Cost = \frac{1}{N}\sum_{i=1}^{N} f(\hat{y}_{i}, y_{i})$
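To make the difference between the two definitions concrete, here is a minimal sketch in plain Python. It assumes a squared-error loss for f, which is only an illustrative choice, and the names `sample_loss` and `cost` are hypothetical helpers rather than PyTorch functions.

```python
def sample_loss(y_hat, y):
    """Loss of a single sample; squared error is an assumed example of f."""
    return (y_hat - y) ** 2


def cost(predictions, labels):
    """Cost: the average of the per-sample losses over all N samples."""
    losses = [sample_loss(p, t) for p, t in zip(predictions, labels)]
    return sum(losses) / len(losses)


predictions = [2.0, 0.0, 4.0]
labels = [1.0, 0.0, 2.0]

print([sample_loss(p, t) for p, t in zip(predictions, labels)])  # [1.0, 0.0, 4.0]
print(cost(predictions, labels))  # average of the three losses
```

The loss is a per-sample quantity, while the cost collapses the whole dataset into one number.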


1.Loss Function

1.1 The _Loss base class


from .module import Module
from .. import functional as F
from .. import _reduction as _Reduction

from torch import Tensor
from typing import Optional

class _Loss(Module):
    reduction: str

    def __init__(self, size_average=None, reduce=None, reduction: str = 'mean') -> None:
        super(_Loss, self).__init__()
        if size_average is not None or reduce is not None:
            # legacy arguments take precedence when either one is given
            self.reduction = _Reduction.legacy_get_string(size_average, reduce)
        else:
            self.reduction = reduction

class _WeightedLoss(_Loss):
    def __init__(self, weight: Optional[Tensor] = None, size_average=None, reduce=None, reduction: str = 'mean') -> None:
        super(_WeightedLoss, self).__init__(size_average, reduce, reduction)
        self.register_buffer('weight', weight)

1.2 nn.CrossEntropyLoss

1.2.1 Basic concepts: cross entropy, information entropy, and relative entropy

Cross entropy = information entropy + relative entropy.
Cross entropy: $H(P,Q) = -\sum_{i=1}^{N} P(x_{i}) \log Q(x_{i})$
Self-information: $I(x) = -\log[p(x)]$
Information entropy: $H(P) = E_{x \sim p}[I(x)] = -\sum_{i=1}^{N} P(x_{i}) \log P(x_{i})$
Relative entropy (KL divergence):
$$\begin{aligned} D_{KL}(P,Q) &= E_{x \sim p}\left[\log\frac{P(x)}{Q(x)}\right] \\ &= E_{x \sim p}[\log P(x) - \log Q(x)] \\ &= \sum_{i=1}^{N} P(x_{i})[\log P(x_{i}) - \log Q(x_{i})] \\ &= \sum_{i=1}^{N} P(x_{i}) \log P(x_{i}) - \sum_{i=1}^{N} P(x_{i}) \log Q(x_{i}) \\ &= H(P,Q) - H(P) \end{aligned}$$

  Combining the formulas above gives the cross entropy $H(P,Q) = D_{KL}(P,Q) + H(P)$, where P is the distribution of the actual samples and Q is the distribution of the predictions.
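The identity $H(P,Q) = D_{KL}(P,Q) + H(P)$ can be checked numerically. The sketch below uses only the standard library; the two distributions P and Q are made-up values chosen for illustration.

```python
import math


def entropy(p):
    """H(P) = -sum_i P(x_i) * log P(x_i)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """H(P, Q) = -sum_i P(x_i) * log Q(x_i)."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """D_KL(P, Q) = sum_i P(x_i) * (log P(x_i) - log Q(x_i))."""
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)


P = [0.7, 0.2, 0.1]  # assumed distribution of the actual samples
Q = [0.5, 0.3, 0.2]  # assumed distribution of the predictions

h_p = entropy(P)
h_pq = cross_entropy(P, Q)
d_kl = kl_divergence(P, Q)

# Both printed values agree up to floating-point rounding.
print(h_pq, h_p + d_kl)
```

Because H(P) does not depend on the model, minimizing the cross entropy with respect to Q is the same as minimizing the KL divergence.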

1.2.2 Cross entropy in PyTorch



  1. Without weights
    $loss(x, class) = -\log\left(\frac{\exp(x[class])}{\sum_{j}\exp(x[j])}\right) = -x[class] + \log\left(\sum_{j}\exp(x[j])\right)$
  2. With weights
    $loss(x, class) = weight[class]\left(-x[class] + \log\left(\sum_{j}\exp(x[j])\right)\right)$

Here $x$ is the raw, unnormalized model output (the softmax is applied inside the loss), and $class$ is the index of the target class.
  Compared with the original cross-entropy formula $H(P,Q) = -\sum_{i=1}^{N}P(x_{i})\log Q(x_{i})$, the PyTorch definition has no summation and no $P(x_{i})$. PyTorch computes the loss of one sample whose class is already known, so $P(x_{i}) = 1$ for the target class and 0 for every other class. Only one term of the sum survives, and the formula simplifies to $H(P,Q) = -\log(Q(x_{i}))$.

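torch.nn.CrossEntropyLoss(weight: Optional[torch.Tensor] = None,  # per-class weights applied to the loss
                        ignore_index: int = -100,                   # class index to ignore
                        reduction: str = 'mean')                    # reduction mode, one of none/sum/mean: none returns the loss of each element; sum adds all elements; mean returns the weighted average as a scalar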


import torch
import torch.nn as nn

import numpy as np
#------fake data

inputs =torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
targets = torch.tensor([0, 1, 1], dtype=torch.long)

# ------------
flag = 1
if flag:
    loss_f_none = nn.CrossEntropyLoss(weight=None, reduction='none')
    loss_f_sum = nn.CrossEntropyLoss(weight=None, reduction='sum')
    loss_f_mean = nn.CrossEntropyLoss(weight=None, reduction='mean')

    # forward
    loss_none = loss_f_none(inputs, targets)
    loss_sum = loss_f_sum(inputs, targets)
    loss_mean = loss_f_mean(inputs, targets)

    # view
    print(f'Cross Entropy loss: \n{loss_none, loss_sum, loss_mean}')
Cross Entropy loss: 
(tensor([1.3133, 0.1269, 0.1269]), tensor(1.5671), tensor(0.5224))


##--------------compute by hand
flag = 1
if flag:
    idx = 0
    #inputs =torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
    #targets = torch.tensor([0, 1, 1], dtype=torch.long)

    inputs_1 = inputs.detach().numpy()[idx]
    targets_1 = targets.numpy()[idx]

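    # first term: the score of the target class
    x_class = inputs_1[targets_1]
    # second term: log of the summed exponentials
    sigma_exp_x = np.sum(np.exp(inputs_1))
    log_sigma_exp_x = np.log(sigma_exp_x)

    # resulting loss
    loss_1 = -x_class + log_sigma_exp_x
    print('loss of the first sample:', loss_1)
Walkthrough: take the first input [1, 2] with class 0 and apply loss = -x[class] + log(sum_j exp(x[j])), where log is the natural logarithm ln.
 log(sum_j exp(x[j])) = ln(e + e^2)
 x[class] = 1
 >>>loss = ln(e + e^2) - 1
   loss of the first sample: 1.3132617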
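The weighted formula and the 'mean' reduction can be reproduced in the same way. The following sketch is plain Python rather than the PyTorch implementation; `cross_entropy` and `reduce_mean` are hypothetical helper names, and the weights [1, 2] are an assumed example. It follows the documented behaviour that, with weights, the mean divides by the sum of the weights of the target classes.

```python
import math


def cross_entropy(logits, target, weight=None):
    """Return (weighted loss, weight used) for one sample."""
    log_sum_exp = math.log(sum(math.exp(v) for v in logits))
    loss = -logits[target] + log_sum_exp
    w = 1.0 if weight is None else weight[target]
    return w * loss, w


inputs = [[1.0, 2.0], [1.0, 3.0], [1.0, 3.0]]
targets = [0, 1, 1]


def reduce_mean(weight=None):
    """Weighted average: sum of weighted losses / sum of the weights used."""
    pairs = [cross_entropy(x, t, weight) for x, t in zip(inputs, targets)]
    return sum(l for l, _ in pairs) / sum(w for _, w in pairs)


print(round(reduce_mean(), 4))            # 0.5224, the unweighted mean shown above
print(round(reduce_mean([1.0, 2.0]), 4))  # weighted mean with the assumed weights
```

With unit weights the denominator is simply the number of samples, which is why the unweighted result equals the plain average.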


1.3 nn.NLLLoss

$l(x, y) = L = (l_{1}, \dots, l_{N})^{T}, \quad l_{n} = -w_{y_{n}} x_{n,y_{n}}$

nn.NLLLoss(weight=None, # per-class weights applied to the loss
    ignore_index=-100,  # class index to ignore
    reduction='mean')   # reduction mode


import torch
import torch.nn as nn

import numpy as np
#------fake data

inputs =torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
targets = torch.tensor([0, 1, 1], dtype=torch.long)

flag = 1
if flag:
    weights = torch.tensor([1, 1], dtype=torch.float)
    loss_f_none_w =nn.NLLLoss(weight=weights, reduction='none')
    loss_f_sum = nn.NLLLoss(weight=weights, reduction='sum')
    loss_f_mean = nn.NLLLoss(weight=weights, reduction='mean')

    # forward
    loss_none_w = loss_f_none_w(inputs, targets)
    loss_sum = loss_f_sum(inputs, targets)
    loss_mean = loss_f_mean(inputs, targets)

    # view
    print('\nweights:', weights)
    print('nll loss', loss_none_w, loss_sum, loss_mean)
weights: tensor([1., 1.])
nll loss tensor([-1., -3., -3.]) tensor(-7.) tensor(-2.3333)
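The negative values above appear because the example feeds raw scores to the loss, whereas NLLLoss only picks out and negates the entry of the target class. A minimal plain-Python sketch (the helper names `log_softmax` and `nll` are hypothetical, and unit weights are assumed) shows that applying a log-softmax first recovers the cross-entropy values from section 1.2:

```python
import math


def log_softmax(logits):
    """log(exp(x_i) / sum_j exp(x_j)) for every entry of one sample."""
    log_sum_exp = math.log(sum(math.exp(v) for v in logits))
    return [v - log_sum_exp for v in logits]


def nll(row, target):
    """Negative log-likelihood with unit weights: negate the target entry."""
    return -row[target]


inputs = [[1.0, 2.0], [1.0, 3.0], [1.0, 3.0]]
targets = [0, 1, 1]

raw = [nll(x, t) for x, t in zip(inputs, targets)]
proper = [nll(log_softmax(x), t) for x, t in zip(inputs, targets)]

print(raw)                            # [-1.0, -3.0, -3.0]
print([round(v, 4) for v in proper])  # [1.3133, 0.1269, 0.1269]
```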

1.4 nn.BCELoss

  Purpose: cross-entropy loss for binary classification. Note that every input value must lie in [0, 1].
$l_{n} = -w_{n}\left[y_{n} \cdot \log x_{n} + (1-y_{n}) \cdot \log(1-x_{n})\right]$

Here $x_{n}$ is the probability output by the model and $y_{n}$ is the label; because this is a binary classification task, $y_{n}$ can only be 0 or 1.


    nn.BCELoss(weight=None,  # rescaling weights applied to the loss
            reduction='mean') # reduction mode


flag =1
if flag:
    inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
    target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)
    target_bce = target

    # BCELoss expects probabilities, so map the raw scores into [0, 1] with sigmoid
    inputs = torch.sigmoid(inputs)
    weights = torch.tensor([1, 1], dtype=torch.float)
    loss_f_none = nn.BCELoss(weights, reduction='none')
    loss_f_sum = nn.BCELoss(weights, reduction='sum')
    loss_f_mean = nn.BCELoss(weights, reduction='mean')

    # forward
    loss_none_w = loss_f_none(inputs, target_bce)
    loss_sum = loss_f_sum(inputs, target_bce)
    loss_mean = loss_f_mean(inputs, target_bce)

    print(f'\nweights: {weights}')
    print(f'BCELoss ', loss_none_w, loss_sum, loss_mean)
weights: tensor([1., 1.])
BCELoss  tensor([[0.3133, 2.1269],
        [0.1269, 2.1269],
        [3.0486, 0.0181],
        [4.0181, 0.0067]]) tensor(11.7856) tensor(1.4732)

1.5 nn.BCEWithLogitsLoss

$l_{n} = -w_{n}\left[y_{n} \cdot \log\sigma(x_{n}) + (1-y_{n}) \cdot \log(1-\sigma(x_{n}))\right]$
Here $\sigma$ is the sigmoid function, which the loss applies to the raw input $x_{n}$ internally.


flag =1
if flag:
    inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
    target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)
    target_bce = target
    weights = torch.tensor([1], dtype=torch.float)
    pos_w = torch.tensor([3],dtype=torch.float)
    loss_f_none = nn.BCEWithLogitsLoss(weights, reduction='none',pos_weight=pos_w)
    loss_f_sum = nn.BCEWithLogitsLoss(weights, reduction='sum', pos_weight=pos_w)
    loss_f_mean = nn.BCEWithLogitsLoss(weights, reduction='mean', pos_weight=pos_w)

    # forward
    loss_none_w = loss_f_none(inputs, target_bce)
    loss_sum = loss_f_sum(inputs, target_bce)
    loss_mean = loss_f_mean(inputs, target_bce)

    print(f'\npos_w: {pos_w}')
    print(f'BCEWithLogitsLoss ', loss_none_w, loss_sum, loss_mean)

pos_w: tensor([3.])
BCEWithLogitsLoss  tensor([[0.9398, 2.1269],
        [0.3808, 2.1269],
        [3.0486, 0.0544],
        [4.0181, 0.0201]]) tensor(12.7158) tensor(1.5895)
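# For comparison, the output with pos_w = torch.tensor([1], dtype=torch.float) follows. With pos_w = 3 the loss of every positive sample (label 1) is multiplied by 3, so the model pays more attention to positive samples.
>>>>pos_w: tensor([1.])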
BCEWithLogitsLoss  tensor([[0.3133, 2.1269],
        [0.1269, 2.1269],
        [3.0486, 0.0181],
        [4.0181, 0.0067]]) tensor(11.7856) tensor(1.4732)
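The effect of pos_weight can be verified element by element. The sketch below is plain Python, not the PyTorch implementation; it assumes unit `weight` and that pos_weight multiplies only the positive term $y_{n} \cdot \log\sigma(x_{n})$, which is consistent with the two outputs printed above. The name `bce_with_logits` is a hypothetical helper.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce_with_logits(x, y, pos_weight=1.0):
    """Binary cross entropy on a raw score x; pos_weight scales the y = 1 term."""
    p = sigmoid(x)
    return -(pos_weight * y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


print(round(bce_with_logits(1.0, 1.0), 4))                  # 0.3133
print(round(bce_with_logits(1.0, 1.0, pos_weight=3.0), 4))  # 0.9398
print(round(bce_with_logits(2.0, 0.0, pos_weight=3.0), 4))  # 2.1269
```

The first two lines match the first element of the two result tensors, and the third shows that a negative sample (label 0) is unaffected by pos_weight.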

1.6 nn.L1Loss (regression)

$l_{n} = |x_{n} - y_{n}|$

flag =1
if flag:
    inputs = torch.ones((2, 2))
    target = torch.ones((2, 2)) * 3
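    # the keyword is `reduction`; the legacy `reduce` argument expects a bool
    loss_f = nn.L1Loss(reduction='none')
    loss = loss_f(inputs, target)
    print('input:{}\ntarget:{}\nL1Loss:{}'.format(inputs, target, loss))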


input:tensor([[1., 1.],
        [1., 1.]])
target:tensor([[3., 3.],
        [3., 3.]])
L1Loss:tensor([[2., 2.],
        [2., 2.]])

1.7 nn.MSELoss (regression)

$l_{n} = (x_{n} - y_{n})^{2}$

flag =1
if flag:
    inputs = torch.ones((2, 2))
    target = torch.ones((2, 2)) * 3
    loss_f = nn.MSELoss(reduction='none')
    loss = loss_f(inputs, target)
    print('input:{}\ntarget:{}\nMSELoss:{}'.format(inputs, target, loss))

input:tensor([[1., 1.],
        [1., 1.]])
target:tensor([[3., 3.],
        [3., 3.]])
MSELoss:tensor([[4., 4.],
        [4., 4.]])
# >>> With nn.MSELoss(reduction='sum') the four element losses are added up, giving tensor(16.)

1.8 nn.SmoothL1Loss (regression)

$loss(x, y) = \frac{1}{n}\sum_{i} z_{i}$
$z_{i} = \begin{cases} 0.5(x_{i}-y_{i})^{2}, & \text{if } |x_{i}-y_{i}| < 1 \\ |x_{i}-y_{i}| - 0.5, & \text{otherwise} \end{cases}$

import matplotlib.pyplot as plt

flag = 1

if flag:
    inputs = torch.linspace(-3, 3, steps=500)
    target = torch.zeros_like(inputs)

    loss_f = nn.SmoothL1Loss(reduction='none')
    loss_smooth = loss_f(inputs, target)
    loss_l1 = np.abs(inputs.numpy())
    plt.plot(inputs.numpy(), loss_smooth.numpy(), label='smooth_l1_loss')
    plt.plot(inputs.numpy(), loss_l1, label='l1 loss')
    plt.xlabel('x_i - y_i')
    plt.legend()
    plt.savefig('../out_imgs/loss/l1_smooth_l1.png')  # save the plot comparing the two losses
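The piecewise definition of $z_{i}$ is easy to evaluate by hand. The sketch below is a plain-Python version of the formula (the helper name `smooth_l1` is hypothetical) that prints the smooth L1 value next to the plain L1 value for a few differences.

```python
def smooth_l1(x, y):
    """z_i from the formula above: quadratic near zero, linear elsewhere."""
    diff = abs(x - y)
    if diff < 1:
        return 0.5 * diff ** 2
    return diff - 0.5


for d in (0.0, 0.5, 1.0, 3.0):
    # columns: difference, smooth L1 loss, plain L1 loss
    print(d, smooth_l1(d, 0.0), abs(d))
```

Near zero the quadratic branch gives a smaller, smoother penalty than L1, while for large differences the two curves differ only by the constant 0.5.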

1.9 nn.PoissonNLLLoss


With log_input = True:
$loss(input, target) = \exp(input) - target \cdot input$

With log_input = False:
$loss(input, target) = input - target \cdot \log(input + eps)$


nn.PoissonNLLLoss(log_input=True,   # whether the input is in log form; selects which formula is used
                full=False,         # whether to compute the full loss (adds the Stirling approximation term); default False
                eps=1e-8)           # small constant that keeps log(input) from being nan
flag = 1
if flag:
    inputs = torch.randn((2, 2))
    target = torch.randn((2, 2))
    # the other reduction modes are not repeated in the remaining loss examples
    loss_f = nn.PoissonNLLLoss(log_input=True, full=False, reduction='none')
    loss = loss_f(inputs, target)
    print('inputs :{}\ntarget is{}\nPoissonNLLLoss :{}'.format(inputs, target, loss))

#---------------compute by hand 
flag = 1
if flag:
    idx = 0
    # formula used when log_input=True and full=False
    loss_1 = torch.exp(inputs[idx, idx]) - target[idx, idx] * inputs[idx, idx]
    print('loss of the first element', loss_1)
#>>>> The output shows that the hand-computed value matches the result of the PyTorch API call
inputs :tensor([[ 0.0553,  0.2444],
        [-0.5864,  0.1678]])
target istensor([[-1.1071, -0.4799],
        [ 1.1683, -1.4043]])
PoissonNLLLoss :tensor([[1.1180, 1.3942],
        [1.2415, 1.4185]])
loss of the first element tensor(1.1180)

1.10 nn.KLDivLoss

$$\begin{aligned} D_{KL}(P||Q) &= E_{x \sim p}\left[\log\frac{P(x)}{Q(x)}\right] \\ &= E_{x \sim p}[\log P(x) - \log Q(x)] \\ &= \sum_{i=1}^{N} P(x_{i})\left(\log P(x_{i}) - \log Q(x_{i})\right) \end{aligned}$$
$l_{n} = y_{n} \cdot (\log y_{n} - x_{n})$
Here $y_{n}$ is the label and $x_{n}$ is the model output, which the formula treats as a log-probability.

flag =1 
if flag:
    # input tensor size: (2, 3); think of it as a fully connected layer with 3 output neurons and a batch of 2 samples
    inputs = torch.tensor([[0.5, 0.3, 0.2], [0.2, 0.2, 0.5]])  
    inputs_log = torch.log(inputs)
    target = torch.tensor([[0.9, 0.05, 0.05], [0.1, 0.7, 0.2]], dtype=torch.float)

    loss_f_none = nn.KLDivLoss(reduction='none')
    loss_f_mean = nn.KLDivLoss(reduction='mean')
    # batchmean divides the summed loss by the batch size, which is 2 for these inputs
    loss_f_batch_mean = nn.KLDivLoss(reduction='batchmean')

    loss_none = loss_f_none(inputs, target)
    loss_mean = loss_f_mean(inputs, target)
    loss_bs_mean = loss_f_batch_mean(inputs, target)

    print('loss_none:{}\nloss_mean:{}\nloss_bs_mean:{}'.format(loss_none, loss_mean, loss_bs_mean))
#-----------------compute by hand
flag = 1
if flag:
    idx = 0
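    # Strictly speaking, inputs[idx, idx] in the second term should be a log-probability. Here values in [0, 1] are used directly as stand-ins, which mirrors exactly the formula PyTorch evaluates.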
    loss_1 = target[idx, idx]*(torch.log(target[idx, idx])-inputs[idx, idx])
    print('loss_1', loss_1)
# >>> The hand-computed loss of the first element matches the API result
loss_none:tensor([[-0.5448, -0.1648, -0.1598],
        [-0.2503, -0.3897, -0.4219]])
loss_1 tensor(-0.5448)
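When the inputs really are log-probabilities, the element-wise terms add up to the KL divergence of each sample. The sketch below is plain Python under stated assumptions: P reuses the targets from the example, Q is a made-up prediction whose rows sum to 1, and `kl_pointwise` is a hypothetical helper implementing $l_{n} = y_{n} \cdot (\log y_{n} - x_{n})$.

```python
import math


def kl_pointwise(log_q, p):
    """Element-wise term y * (log y - x), with x = log_q and y = p."""
    return p * (math.log(p) - log_q)


P = [[0.9, 0.05, 0.05], [0.1, 0.7, 0.2]]  # targets from the example above
Q = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]    # assumed predictions, rows sum to 1

terms = [[kl_pointwise(math.log(q), p) for q, p in zip(q_row, p_row)]
         for q_row, p_row in zip(Q, P)]

per_sample = [sum(row) for row in terms]              # D_KL(P || Q) per sample
batchmean = sum(per_sample) / len(per_sample)         # divide by the batch size
elementwise_mean = sum(per_sample) / (len(P) * len(P[0]))  # divide by all elements

print(per_sample)
print(batchmean, elementwise_mean)
```

Dividing by the batch size gives the average KL divergence per sample, whereas dividing by the number of elements yields a value that is smaller by a factor equal to the number of classes.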

1.11 nn.MarginRankingLoss

$loss(x, y) = \max(0, -y \cdot (x_{1} - x_{2}) + margin)$
$y$ is the label and can only be 1 or -1; $x_{1}$ and $x_{2}$ are the elements of the two input vectors. This leads to the following conclusions:

  • When y = 1, we want $x_{1} > x_{2}$; if $x_{1}$ exceeds $x_{2}$ by at least margin, no loss is produced.
  • When y = -1, we want $x_{2} > x_{1}$; if $x_{2}$ exceeds $x_{1}$ by at least margin, no loss is produced.


flag = 1
if flag:
    x1 = torch.tensor([[1], [2], [3]],dtype=torch.float)
    x2 = torch.tensor([[2], [2], [2]], dtype=torch.float)

    target = torch.tensor([1, 1, -1], dtype=torch.float)

    loss_f_none = nn.MarginRankingLoss(margin=0, reduction='none')
    loss =loss_f_none(x1, x2, target)
    print('MarginRankingLoss', loss)
MarginRankingLoss tensor([[1., 1., 0.],
        [0., 0., 0.],
        [0., 0., 1.]])
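Take the first element of x1, which is 1, as an example. Because x1 and x2 have shape (3, 1) and target has shape (3,), the inputs broadcast to a 3x3 result in which loss[i][j] scores the pair (x1[i], x2[i]) against target[j]. With target[0] = 1, the formula gives a loss of 0 when x1 > x2 and x2 - x1 + margin (margin = 0 here) otherwise. Since 1 < 2, loss[0][0] = 2 - 1 = 1.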
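If one loss per (x1, x2, y) triple is wanted instead of the broadcast 3x3 matrix, the three sequences must have the same shape. The plain-Python sketch below applies the formula element by element; `margin_ranking` is a hypothetical helper, not the PyTorch function.

```python
def margin_ranking(x1, x2, y, margin=0.0):
    """max(0, -y * (x1 - x2) + margin) for a single pair and label."""
    return max(0.0, -y * (x1 - x2) + margin)


x1 = [1.0, 2.0, 3.0]
x2 = [2.0, 2.0, 2.0]
target = [1, 1, -1]

losses = [margin_ranking(a, b, t) for a, b, t in zip(x1, x2, target)]
print(losses)  # [1.0, 0.0, 1.0]
```

These three values are the diagonal of the matrix printed above, that is, the entries where each pair is scored against its own label.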

1.12 nn.MultiLabelMarginLoss (multi-label classification)

Example: in a four-class task where sample x belongs to classes 0 and 3, the label is [0, 3, -1, -1], not [1, 0, 0, 1].
$loss(x, y) = \sum_{ij}\frac{\max(0, 1 - (x[y[j]] - x[i]))}{x.size(0)}$

where $i = 0, \dots, x.size(0)-1$, $j = 0, \dots, y.size(0)-1$, $y[j] \geq 0$, and $i \neq y[j]$ for all $i$ and $j$.
The term in the numerator subtracts the output of a non-label neuron from the output of a label neuron. The reason for this design is that in multi-label classification every label output should be larger than every non-label output, hence $\max(0, 1 - (x[y[j]] - x[i]))$.

flag = 1
if flag:
    x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
    y = torch.tensor([[0, 3, -1, -1]], dtype=torch.long)

    loss_f = nn.MultiLabelMarginLoss(reduction='none')
    loss = loss_f(x, y)
    print('MultiLabelMarginLoss', loss)
# ------------compute by hand
flag = 1
if flag:
    x = x[0]

    item_1 = (1-(x[0]-x[1])) + (1 - (x[0]-x[2]))
    item_2 = (1-(x[3]-x[1])) + (1-(x[3]-x[2]))
    loss_h = (item_1 + item_2) / x.shape[0]
    print('compute by hand ', loss_h)
# >>>
MultiLabelMarginLoss tensor([0.8500])
compute by hand  tensor(0.8500)

1.13 nn.SoftMarginLoss (binary classification)

$loss(x, y) = \sum_{i}\frac{\log(1 + \exp(-y[i] \cdot x[i]))}{x.nelement()}$

flag = 1
if flag:
    inputs = torch.tensor([[0.3, 0.7], [0.5, 0.5]])
    target = torch.tensor([[-1, 1], [1, -1]], dtype=torch.float)

    loss_f = nn.SoftMarginLoss(reduction='none')
    loss = loss_f(inputs, target)

    print('SoftMarginLoss', loss)

#-----------compute by hand
flag = 1
if flag:
    idx = 0

    inputs_i = inputs[idx, idx]
    target_i = target[idx, idx]

    loss_h = torch.log(1 + torch.exp(-target_i * inputs_i))
    print('compute by hand', loss_h)
# >>>
SoftMarginLoss tensor([[0.8544, 0.4032],
        [0.4741, 0.9741]])
compute by hand tensor(0.8544)

1.14 MultiLabelSoftMarginLoss

$loss(x, y) = -\frac{1}{C} \sum_{i}\left[y[i] \cdot \log\left((1+\exp(-x[i]))^{-1}\right) + (1-y[i]) \cdot \log\left(\frac{\exp(-x[i])}{1+\exp(-x[i])}\right)\right]$
C is the number of classes, y[i] is the label, and x[i] is the model output. Taking four-class classification as an example, y must be given in the form [1, 0, 0, 1]. The formula shows that when y[i] marks a label (y[i] = 1) the first term is used, and otherwise the second term is used.

flag = 1
if flag:
    # three-class task
    inputs = torch.tensor([[0.3, 0.7, 0.8]])
    target = torch.tensor([[0, 1, 1]], dtype=torch.float)

    loss_f = nn.MultiLabelSoftMarginLoss(reduction='none')
    loss = loss_f(inputs, target)
    print('MultiLabelSoftMarginLoss', loss)
# --------------compute by hand
flag = 1
if flag:
    # MultiLabelSoftMarginLoss evaluates every output neuron

    # non-label neuron: use the second term of the formula
    i_0 = torch.log(torch.exp(-inputs[0, 0])/ (1+torch.exp(-inputs[0, 0])))

    # label neurons: use the first term of the formula
    i_1 = torch.log(1 / (1+ torch.exp(-inputs[0, 1])))
    i_2 = torch.log(1 / (1+ torch.exp(-inputs[0, 2])))

    loss_h = (i_0 + i_1 + i_2) / -3
    print('compute by hand', loss_h)
MultiLabelSoftMarginLoss tensor([0.5429])
compute by hand tensor(0.5429)

1.15 nn.MultiMarginLoss (multi-class classification)

$loss(x, y) = \frac{\sum_{i}\max(0, margin - x[y] + x[i])^{p}}{x.size(0)}$

where $i \in \{0, \dots, x.size(0)-1\}$, $0 \leq y \leq x.size(0)-1$, and $i \neq y$.
Here $x[y]$ is the output of the neuron for the label class and $x[i]$ is the output of a non-label neuron.

# nn.MultiMarginLoss(p=1,    # 1 or 2
#                 margin=1.0,
#                 weight=None,  # per-class weights applied to the loss
#                 reduction='mean')  # reduction mode, one of none/sum/mean

flag = 1
if flag:
    x = torch.tensor([[0.1, 0.2, 0.7], [0.2, 0.5, 0.3]])
    y = torch.tensor([1, 2], dtype=torch.long)
    loss_f = nn.MultiMarginLoss(reduction='none')
    loss = loss_f(x, y)
    print('MultiMarginLoss', loss)

#--------compute by hand
flag = 1
if flag:
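    # Take the first input [0.1, 0.2, 0.7] as an example: these are the predicted scores of a three-class task and the label is 1, so 0.2 is the score of the target class.
    # Following the formula, compute margin - (0.2 - s) for each non-label score s (0.1 and 0.7), add the results, and divide by the number of classes.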
    x = x[0]

    margin = 1

    i_0 = margin - (x[1] -x[0])

    i_2 = margin - (x[1] - x[2])
    loss_h = (i_0 + i_2) / x.shape[0]
    print('compute by hand',loss_h)
MultiMarginLoss tensor([0.8000, 0.7000])
compute by hand tensor(0.8000)

1.16 TripletMarginLoss (triplet loss)

  Purpose: computes the triplet loss, which is commonly used in face verification. The formula is:
$L(a, p, n) = \max\{d(a_{i}, p_{i}) - d(a_{i}, n_{i}) + margin, 0\}$
$d(x_{i}, y_{i}) = \|x_{i} - y_{i}\|_{p}$


# --------------
# nn.TripletMarginLoss(margin=1.0, # margin
#                     p=2.0,   # order of the norm, default 2
#                     eps=1e-6,
#                     swap=False,
#                     reduction='mean')  # reduction mode, one of none/sum/mean
flag = 1
if flag:
    anchor =torch.tensor([[1.]])
    pos = torch.tensor([[2.]])
    neg = torch.tensor([[0.5]])

    loss_f = nn.TripletMarginLoss(margin=1.0, p=1)
    loss = loss_f(anchor, pos, neg)

    print('TripletMarginLoss:', loss)
TripletMarginLoss: tensor(1.5000)
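The result 1.5 can be reproduced from the formula. The sketch below is plain Python with hypothetical helper names; it ignores the small `eps` term that PyTorch adds inside the distance, so it should be read as an approximation of the library behaviour rather than a copy of it. The second call uses made-up values to show the case where no loss is produced.

```python
def p_norm_distance(x, y, p=2.0):
    """d(x, y) = ||x - y||_p for two equal-length sequences."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def triplet_margin(anchor, pos, neg, margin=1.0, p=2.0):
    """max(d(a, p) - d(a, n) + margin, 0)."""
    d_pos = p_norm_distance(anchor, pos, p)
    d_neg = p_norm_distance(anchor, neg, p)
    return max(d_pos - d_neg + margin, 0.0)


# Values from the example above: d(a, p) = 1, d(a, n) = 0.5.
print(triplet_margin([1.0], [2.0], [0.5], margin=1.0, p=1))  # 1.5

# Assumed values: the negative is already more than margin farther away.
print(triplet_margin([1.0], [1.1], [3.0], margin=1.0, p=1))  # 0.0
```

The loss only vanishes once the negative is farther from the anchor than the positive by at least the margin.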

1.17 HingeEmbeddingLoss (nonlinear embedding and semi-supervised learning)

$l_{n} = \begin{cases} x_{n}, & \text{if } y_{n} = 1 \\ \max\{0, \Delta - x_{n}\}, & \text{if } y_{n} = -1 \end{cases}$


# nn.HingeEmbeddingLoss(margin=1.0,  # margin
#                 reduction='mean'  # reduction mode, one of none/sum/mean
#                 )

flag = 1
if flag:
    inputs = torch.tensor([[1., 0.8, 0.5]])
    target = torch.tensor([[1, 1, -1]])
    loss_f = nn.HingeEmbeddingLoss(margin=1.0, reduction='none')

    loss = loss_f(inputs, target)
    print('HingeEmbeddingLoss:', loss)
# >>> When the label is 1 the loss is x itself; when the label is -1 the loss is max(0, margin - x)
HingeEmbeddingLoss: tensor([[1.0000, 0.8000, 0.5000]])

1.18 CosineEmbeddingLoss (embedding and semi-supervised learning)

$loss(x, y) = \begin{cases} 1 - \cos(x_{1}, x_{2}), & \text{if } y = 1 \\ \max(0, \cos(x_{1}, x_{2}) - margin), & \text{if } y = -1 \end{cases}$

$\cos(\theta) = \frac{A \cdot B}{\|A\| \|B\|} = \frac{\sum_{i=1}^{n} A_{i} \times B_{i}}{\sqrt{\sum_{i=1}^{n}(A_{i})^{2}} \times \sqrt{\sum_{i=1}^{n}(B_{i})^{2}}}$

flag = 1
if flag:

    x1 = torch.tensor([[0.3, 0.5, 0.7], [0.3, 0.5, 0.7]])
    x2 = torch.tensor([[0.1, 0.3, 0.5], [0.1, 0.3, 0.5]])

    target = torch.tensor([[1, -1]], dtype=torch.float)

    loss_f = nn.CosineEmbeddingLoss(margin=0., reduction='none')
    loss = loss_f(x1, x2,target)
    print('CosineEmbeddingLoss:', loss)

# --------------------compute by hand
flag = 1
if flag:
    margin = 0.

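    def cosine(a, b):
        # cosine similarity: dot product divided by the product of the L2 norms
        numerator = torch.dot(a, b)
        denominator = torch.norm(a, 2) * torch.norm(b, 2)
        return float(numerator / denominator)

    # the first pair has label 1, the second pair has label -1
    l_1 = 1 - cosine(x1[0], x2[0])
    l_2 = max(0, cosine(x1[1], x2[1]) - margin)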

    print(l_1, l_2)
CosineEmbeddingLoss: tensor([[0.0167, 0.9833]])
0.016662120819091797 0.9833378791809082

1.19 nn.CTCLoss

  Purpose: computes the CTC (Connectionist Temporal Classification) loss, which is used to classify sequence data.

    flag = 1
    if flag:
        T = 50   # input sequence length
        C = 20   # number of classes (including blank)
        N =16    # batch size 
        S = 30   # target sequence length of longest target in batch
        S_min =10  # minimum target length, for demonstration purposes

        # initialize random batch of input vector for *size = (T, N,C)  
        inputs = torch.randn(T,N, C).log_softmax(2).detach().requires_grad_()

        # initialize random batch of target (0 = blank, 1:c = classes)
        target = torch.randint(low=1, high=C, size=(N, S), dtype=torch.long)
        input_lengths = torch.full(size=(N,), fill_value=T, dtype=torch.long)

        target_lengths = torch.randint(low=S_min, high=S, size=(N,), dtype=torch.long)

        ctc_loss = nn.CTCLoss()
        loss = ctc_loss(inputs, target, input_lengths, target_lengths)
        print('ctc loss:',loss)
ctc loss: tensor(6.6770, grad_fn=<MeanBackward0>)
