A Complete and Concise Derivation of the Linear Regression Model

Posted by Ladisson-blog on 2024-03-13


To keep the model simple, assume there are 4 samples, each with 3 features, and use the squared error as the loss function; we then derive the gradient needed for backpropagation step by step.

Let the training samples be

\[X = \left[ \begin{matrix} x_{1}^{(1)} & x_{2}^{(1)} & x_{3}^{(1)}\\ x_{1}^{(2)} & x_{2}^{(2)} & x_{3}^{(2)}\\ x_{1}^{(3)} & x_{2}^{(3)} & x_{3}^{(3)}\\ x_{1}^{(4)} & x_{2}^{(4)} & x_{3}^{(4)}\\ \end{matrix} \right] \]

where there are 4 samples, and the three feature values of each sample are \([x_1, x_2, x_3]\).

Let the label data be

\[\hat{y} = \left[ \begin{matrix} y^{(1)} \\ y^{(2)} \\ y^{(3)} \\ y^{(4)} \\ \end{matrix} \right] \]

Let the parameters to be learned be \(w\) (a 3-dimensional vector) and \(b\) (a scalar):

\[w = \left[ \begin{matrix} w_{1} \\ w_{2} \\ w_{3} \\ \end{matrix} \right] \]

Then the loss function is

\[Loss(w,b) =\frac{1}{2n} || Xw + b - \hat{y} ||^{2} \]
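As a quick numerical sketch (my own illustration, not from the original post), the loss above can be evaluated directly with NumPy on random data of the stated shapes: \(n = 4\) samples, 3 features.

```python
import numpy as np

# Illustrative example with the post's shapes: n = 4 samples, d = 3 features.
rng = np.random.default_rng(0)
n, d = 4, 3
X = rng.normal(size=(n, d))   # training samples
y = rng.normal(size=n)        # labels (written y-hat in the post)
w = rng.normal(size=d)        # weight vector
b = 0.5                       # scalar bias, broadcast over all samples

# Loss(w, b) = 1/(2n) * ||Xw + b - y||^2, with ||r||^2 computed as r^T r.
residual = X @ w + b - y
loss = (residual @ residual) / (2 * n)
```

The `@` operator computes the matrix-vector product \(Xw\), and NumPy broadcasting adds the scalar `b` to every component of the result.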

Writing the loss function out with explicit matrices:

\[Loss(w,b) = \frac{1}{2n} \left\| \left[ \begin{matrix} x_{1}^{(1)} & x_{2}^{(1)} & x_{3}^{(1)}\\ x_{1}^{(2)} & x_{2}^{(2)} & x_{3}^{(2)}\\ x_{1}^{(3)} & x_{2}^{(3)} & x_{3}^{(3)}\\ x_{1}^{(4)} & x_{2}^{(4)} & x_{3}^{(4)}\\ \end{matrix} \right] \left[ \begin{matrix} w_{1} \\ w_{2} \\ w_{3} \\ \end{matrix} \right] + b - \left[ \begin{matrix} y^{(1)} \\ y^{(2)} \\ y^{(3)} \\ y^{(4)} \\ \end{matrix} \right] \right\|^{2} \]

Absorbing \(b\) into the matrix simplifies the formula:

\[Loss(w,b) = \frac{1}{2n} \left\| \left[ \begin{matrix} x_{1}^{(1)} & x_{2}^{(1)} & x_{3}^{(1)} & 1\\ x_{1}^{(2)} & x_{2}^{(2)} & x_{3}^{(2)} & 1\\ x_{1}^{(3)} & x_{2}^{(3)} & x_{3}^{(3)} & 1\\ x_{1}^{(4)} & x_{2}^{(4)} & x_{3}^{(4)} & 1\\ \end{matrix} \right] \left[ \begin{matrix} w_{1} \\ w_{2} \\ w_{3} \\ b \end{matrix} \right] - \left[ \begin{matrix} y^{(1)} \\ y^{(2)} \\ y^{(3)} \\ y^{(4)} \\ \end{matrix} \right] \right\|^{2} \]

Now let

\[X = \left[ \begin{matrix} x_{1}^{(1)} & x_{2}^{(1)} & x_{3}^{(1)} & 1\\ x_{1}^{(2)} & x_{2}^{(2)} & x_{3}^{(2)} & 1\\ x_{1}^{(3)} & x_{2}^{(3)} & x_{3}^{(3)} & 1\\ x_{1}^{(4)} & x_{2}^{(4)} & x_{3}^{(4)} & 1\\ \end{matrix} \right] \qquad w = \left[ \begin{matrix} w_{1} \\ w_{2} \\ w_{3} \\ b \end{matrix} \right] \]

The formula then simplifies to

\[Loss(w) = \frac{1}{2n} \left\| Xw - \hat{y} \right\|^{2} = \frac{1}{2n} (Xw - \hat{y})^{T}(Xw - \hat{y}) \]
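The bias-absorption trick above is easy to check numerically. The following sketch (an assumed example, not from the post) appends a column of ones to \(X\), stacks \(b\) onto \(w\), and confirms that the augmented form, the squared-norm form, and the quadratic form \((Xw-\hat{y})^{T}(Xw-\hat{y})\) all agree with the original \(Xw + b\) loss:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
X = rng.normal(size=(n, 3))
y = rng.normal(size=n)
w3 = rng.normal(size=3)
b = -0.7

# Absorb b: append a column of ones to X and b to w.
X_aug = np.hstack([X, np.ones((n, 1))])   # 4x4: rows are [x1 x2 x3 1]
w = np.append(w3, b)                      # 4-vector: [w1 w2 w3 b]

r = X_aug @ w - y
loss_norm = np.linalg.norm(r) ** 2 / (2 * n)          # ||Xw - y||^2 form
loss_quad = (r @ r) / (2 * n)                          # (Xw - y)^T (Xw - y) form
loss_orig = np.sum((X @ w3 + b - y) ** 2) / (2 * n)   # original Xw + b form
```

All three quantities are identical up to floating-point error, which is exactly what the rewritten formula claims.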

Differentiating \(Loss(w)\) with respect to \(w\):

\[\frac{\partial Loss(w)}{\partial w} = \frac{1}{2n} \space \frac{\partial }{\partial w}(Xw - \hat{y})^{T}(Xw - \hat{y}) \tag{1} \]

By the rule for differentiating the scalar \(y = x^{T}x\) with respect to the vector \(x\) (in numerator layout, so the result is a row vector):

\[\frac{\partial y}{\partial x} = \frac{\partial\, x^{T}x}{\partial x} = 2x^{T} \]
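This identity can be sanity-checked with central finite differences (my own illustration): each component of the numerical gradient of \(x^{T}x\) should match the corresponding component of \(2x\).

```python
import numpy as np

# Numerically verify d(x^T x)/dx = 2x via central finite differences.
x = np.array([1.0, -2.0, 3.0])
eps = 1e-6
grad_fd = np.empty_like(x)
for i in range(len(x)):
    e = np.zeros_like(x)
    e[i] = eps
    # Central difference of f(x) = x^T x along coordinate i.
    grad_fd[i] = ((x + e) @ (x + e) - (x - e) @ (x - e)) / (2 * eps)
```

The transpose in \(2x^{T}\) only records that the numerator-layout derivative is a row vector; componentwise it is the same as \(2x\).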

Therefore, by the chain rule, formula (1) simplifies to

\[\frac{\partial Loss(w)}{\partial w} = \frac{1}{n} \space (Xw - \hat{y})^{T} \space \frac{\partial (Xw-\hat{y})}{\partial {w}} \\ =\frac{1}{n} (Xw-\hat{y})^{T} X \]
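The final result can be verified numerically as well. In the sketch below (an assumed example), the analytic gradient is computed as \(\frac{1}{n}X^{T}(Xw-\hat{y})\), the column-vector transpose of the row form \(\frac{1}{n}(Xw-\hat{y})^{T}X\) derived above, and compared against finite differences of the loss:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
# Augmented design matrix [x1 x2 x3 1], as in the derivation.
X = np.hstack([rng.normal(size=(n, 3)), np.ones((n, 1))])
y = rng.normal(size=n)
w = rng.normal(size=4)

def loss(w):
    r = X @ w - y
    return (r @ r) / (2 * n)

# Analytic gradient: (1/n) X^T (Xw - y), i.e. the transpose of (1/n)(Xw - y)^T X.
grad = X.T @ (X @ w - y) / n

# Independent check via central finite differences along each coordinate.
eps = 1e-6
grad_fd = np.array([
    (loss(w + eps * e) - loss(w - eps * e)) / (2 * eps)
    for e in np.eye(4)
])
```

Agreement between `grad` and `grad_fd` confirms the derivation; this gradient is exactly what a gradient-descent update for linear regression would use.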

The main thing is to keep the dimensions straight after differentiating a scalar with respect to a vector; refer to the figure below for the layout conventions.

[Figure: matrix-calculus layout conventions (original image not available)]
