Linear Regression: A Complete, Concise Derivation of the Model Formulas
Here we simplify the model: suppose there are 4 samples, each with 3 features, and use the squared error as the loss function. We then derive the gradient needed for backpropagation.
Let the training samples be
\[X = \left[
\begin{matrix}
x_{1}^{(1)} & x_{2}^{(1)} & x_{3}^{(1)}\\
x_{1}^{(2)} & x_{2}^{(2)} & x_{3}^{(2)}\\
x_{1}^{(3)} & x_{2}^{(3)} & x_{3}^{(3)}\\
x_{1}^{(4)} & x_{2}^{(4)} & x_{3}^{(4)}\\
\end{matrix}
\right]
\]
That is, there are 4 samples, and each sample's three feature values are \([x_1, x_2, x_3]\).
Let the label data be
\[\hat{y} =
\left[
\begin{matrix}
y^{(1)} \\
y^{(2)} \\
y^{(3)} \\
y^{(4)} \\
\end{matrix}
\right]
\]
Let the parameters to be learned be \(w\) (a 3-dimensional vector) and \(b\) (a scalar):
\[w =
\left[
\begin{matrix}
w_{1} \\
w_{2} \\
w_{3} \\
\end{matrix}
\right]
\]
Then the loss function is
\[Loss(w,b) =\frac{1}{2n} || Xw + b - \hat{y} ||^{2}
\]
Writing the loss function with explicit matrices:
\[Loss(w,b) =
\frac{1}{2n}
\left\|
\left[
\begin{matrix}
x_{1}^{(1)} & x_{2}^{(1)} & x_{3}^{(1)}\\
x_{1}^{(2)} & x_{2}^{(2)} & x_{3}^{(2)}\\
x_{1}^{(3)} & x_{2}^{(3)} & x_{3}^{(3)}\\
x_{1}^{(4)} & x_{2}^{(4)} & x_{3}^{(4)}\\
\end{matrix}
\right]
\left[
\begin{matrix}
w_{1} \\
w_{2} \\
w_{3} \\
\end{matrix}
\right]
+ b -
\left[
\begin{matrix}
y^{(1)} \\
y^{(2)} \\
y^{(3)} \\
y^{(4)} \\
\end{matrix}
\right]
\right\|^{2}
\]
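The loss above can be evaluated directly. A minimal NumPy sketch for the 4×3 case, using hypothetical random data (all array values here are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 4, 3                      # 4 samples, 3 features, as in the text
X = rng.normal(size=(n, d))      # training samples (hypothetical random data)
y_hat = rng.normal(size=(n, 1))  # labels, written \hat{y} in the text
w = rng.normal(size=(d, 1))      # weight vector
b = 0.5                          # scalar bias (broadcast over the 4 rows)

# Loss(w, b) = 1/(2n) * ||Xw + b - y_hat||^2
residual = X @ w + b - y_hat
loss = float(residual.T @ residual) / (2 * n)
```

Note that the scalar \(b\) is added to every row of \(Xw\) by NumPy broadcasting, matching the formula.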
Absorbing \(b\) into the matrices simplifies the formula: append a column of ones to \(X\) and append \(b\) to \(w\).
\[Loss(w,b) =
\frac{1}{2n}
\left\|
\left[
\begin{matrix}
x_{1}^{(1)} & x_{2}^{(1)} & x_{3}^{(1)} & 1\\
x_{1}^{(2)} & x_{2}^{(2)} & x_{3}^{(2)} & 1\\
x_{1}^{(3)} & x_{2}^{(3)} & x_{3}^{(3)} & 1\\
x_{1}^{(4)} & x_{2}^{(4)} & x_{3}^{(4)} & 1\\
\end{matrix}
\right]
\left[
\begin{matrix}
w_{1} \\
w_{2} \\
w_{3} \\
b
\end{matrix}
\right]
-
\left[
\begin{matrix}
y^{(1)} \\
y^{(2)} \\
y^{(3)} \\
y^{(4)} \\
\end{matrix}
\right]
\right\|^{2}
\]
Now let
\[X = \left[
\begin{matrix}
x_{1}^{(1)} & x_{2}^{(1)} & x_{3}^{(1)} & 1\\
x_{1}^{(2)} & x_{2}^{(2)} & x_{3}^{(2)} & 1\\
x_{1}^{(3)} & x_{2}^{(3)} & x_{3}^{(3)} & 1\\
x_{1}^{(4)} & x_{2}^{(4)} & x_{3}^{(4)} & 1\\
\end{matrix}
\right]
\qquad
w =
\left[
\begin{matrix}
w_{1} \\
w_{2} \\
w_{3} \\
b
\end{matrix}
\right]
\]
The formula then simplifies to
\[Loss(w) =\frac{1}{2n} \left\| Xw - \hat{y} \right\|^{2}
=\frac{1}{2n} (Xw - \hat{y})^{T}(Xw - \hat{y})
\]
\]
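The equivalence of the augmented form to the original can be checked numerically; a sketch with hypothetical random data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 3
X = rng.normal(size=(n, d))      # original 4x3 sample matrix
y_hat = rng.normal(size=(n, 1))  # labels
w = rng.normal(size=(d, 1))      # original weight vector
b = 0.5                          # scalar bias

# Append a column of ones to X and append b to w,
# so that X_aug @ w_aug == X @ w + b.
X_aug = np.hstack([X, np.ones((n, 1))])   # shape (4, 4)
w_aug = np.vstack([w, [[b]]])             # shape (4, 1)

loss_orig = np.linalg.norm(X @ w + b - y_hat) ** 2 / (2 * n)
loss_aug = np.linalg.norm(X_aug @ w_aug - y_hat) ** 2 / (2 * n)
```

The two losses agree, so from here on \(X\) and \(w\) can be taken to mean the augmented versions.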
Differentiating \(Loss(w)\) with respect to \(w\):
\[\frac{\partial Loss(w)}{\partial w} =
\frac{1}{2n} \space \frac{\partial }{\partial w}(Xw - \hat{y})^{T}(Xw - \hat{y})
\tag{1}
\]
By the formula for the derivative of a scalar with respect to a vector (numerator layout), with \(y = x^{T}x\),
\[\frac{\partial y}{\partial x} =
\frac{\partial (x^{T}x)}{\partial x}=2x^{T}
\]
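This identity can be sanity-checked with a central finite difference; a sketch on a hypothetical 3-vector:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=3)    # hypothetical test point

# y = x^T x; the numerator-layout gradient is 2 x^T,
# whose components are simply 2 * x.
analytic = 2 * x

# Central finite difference for each component of the gradient.
eps = 1e-6
numeric = np.empty_like(x)
for i in range(x.size):
    e = np.zeros_like(x)
    e[i] = eps
    numeric[i] = ((x + e) @ (x + e) - (x - e) @ (x - e)) / (2 * eps)
```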
Therefore, by the chain rule, formula (1) simplifies to
\[\frac{\partial Loss(w)}{\partial w} =
\frac{1}{n} \space (Xw - \hat{y})^{T} \space \frac{\partial (Xw-\hat{y})}{\partial {w}} \\
=\frac{1}{n} (Xw-\hat{y})^{T} X
\]
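The final gradient formula can likewise be verified against finite differences of the loss; a sketch using the augmented \(X\) and \(w\) with hypothetical random data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
# Augmented 4x4 sample matrix (last column of ones absorbs the bias).
X = np.hstack([rng.normal(size=(n, 3)), np.ones((n, 1))])
y_hat = rng.normal(size=(n, 1))
w = rng.normal(size=(4, 1))   # [w1, w2, w3, b]

def loss(w):
    r = X @ w - y_hat
    return float(r.T @ r) / (2 * n)

# Analytic gradient (1/n) (Xw - y)^T X is a 1x4 row vector;
# transpose it back to a column to compare with w's shape.
grad = ((X @ w - y_hat).T @ X).T / n

# Central finite difference over each parameter.
eps = 1e-6
numeric = np.zeros_like(w)
for i in range(w.size):
    e = np.zeros_like(w)
    e.flat[i] = eps
    numeric.flat[i] = (loss(w + e) - loss(w - e)) / (2 * eps)
```

The shapes are the point: \((Xw - \hat{y})^{T}\) is \(1 \times 4\) and \(X\) is \(4 \times 4\), so the gradient row vector is \(1 \times 4\), one entry per parameter.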
The main point is to keep the dimensions straight after differentiating a scalar with respect to a vector; refer to this figure.