Thoughts on Second Derivatives of Multivariate Composite Functions and Vector Calculus

Posted by Katoumegumi on 2021-01-21


Introduction

For multivariate composite functions of the form \(z=f(u_1,u_2,...,u_n)\), where each \(u_i=g_i(x_1,x_2,...,x_m)\), computing second derivatives typically involves tedious, repetitive calculations, and it is easy to make mistakes when applying the chain rule repeatedly. This article presents a general solution for this class of problems, together with its theoretical derivation, for reference.

Example 1: Let \(z=f(x^2-y^2,e^{xy})\), where \(f\) has continuous second-order partial derivatives. Find \(\frac{\partial ^2z}{ \partial x \partial y}\).

Applying the chain rule, we obtain \(\frac{\partial ^2z}{ \partial x \partial y}=-4xyf^{''}_{11}+2(x^2-y^2)e^{xy}f^{''}_{12}+xye^{2xy}f^{''}_{22}+e^{xy}(1+xy)f^{'}_2\).
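This result can be sanity-checked numerically. The sketch below is a hypothetical test, not part of the original derivation: it picks an arbitrary concrete \(f(u,v)=u^2+uv\) (so \(f_{11}=2\), \(f_{12}=1\), \(f_{22}=0\), \(f_2=u\)) and compares the chain-rule formula against a central-difference approximation of \(\frac{\partial^2 z}{\partial x\,\partial y}\):

```python
import math

# Arbitrary concrete test function f(u, v) = u^2 + u*v (an assumption,
# chosen only for checking): f1 = 2u+v, f2 = u, f11 = 2, f12 = 1, f22 = 0.
def z(x, y):
    u, v = x**2 - y**2, math.exp(x * y)
    return u**2 + u * v

def mixed_partial_fd(x, y, h=1e-4):
    # Central-difference approximation of d^2 z / (dx dy).
    return (z(x + h, y + h) - z(x + h, y - h)
            - z(x - h, y + h) + z(x - h, y - h)) / (4 * h * h)

def mixed_partial_formula(x, y):
    u, e = x**2 - y**2, math.exp(x * y)
    f11, f12, f22, f2 = 2.0, 1.0, 0.0, u
    return (-4 * x * y * f11 + 2 * (x**2 - y**2) * e * f12
            + x * y * math.exp(2 * x * y) * f22 + e * (1 + x * y) * f2)

print(mixed_partial_fd(0.5, 0.3), mixed_partial_formula(0.5, 0.3))
```

The two values agree to several decimal places at any well-conditioned test point.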

The subscripts in \(f^{''}_{11}\) and \(f^{''}_{12}\) are reminiscent of matrix indices, which motivates us to look for a more compact form of this expression, and indeed for a general solution to this class of problems.


The Gradient Matrix

Following [1], for a function \(f: ℝ^n\rightarrow ℝ,\ \pmb{x} \rightarrow f(\pmb{x}),\ \pmb{x}\in ℝ^n\), i.e. \(\pmb{x}=[x_1,x_2,x_3,...,x_n]^T\), we define the partial derivatives as:

\[\frac{\partial f}{\partial x_1}= \lim_{h \rightarrow 0} \frac{f(x_1+h,x_2,...,x_n)-f(\pmb{x})}{h}\\\vdots\\\frac{\partial f}{\partial x_n}= \lim_{h \rightarrow 0}\frac{f(x_1,x_2,...,x_n+h)-f(\pmb{x})}{h} \tag{2.1} \]

Collecting these into a row vector, we write:

\[∇_{\pmb{x}}f=grad\ f=\left[\begin{matrix}\frac{\partial f(\pmb{x})}{\partial x_1} & \frac{\partial f(\pmb{x})}{\partial x_2} & ... & \frac{\partial f(\pmb{x})}{\partial x_n}\\\end{matrix} \right] \in ℝ^{1×n} \tag{2.2} \]

For example, for the function \(f(x,y)=(x+2y^3)^2\), we have:

\[∇f=\left[\begin{matrix}2(x+2y^3) & 12(x+2y^3)y^2\end{matrix} \right] \inℝ^{1×2} \tag{2.3} \]
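As a quick numerical check of (2.3) (a hypothetical sketch, not from the original post), we can compare the analytic gradient against central differences:

```python
def f(x, y):
    return (x + 2 * y**3) ** 2

def grad_analytic(x, y):
    # Row vector from (2.3): [2(x + 2y^3), 12(x + 2y^3) y^2]
    return [2 * (x + 2 * y**3), 12 * (x + 2 * y**3) * y**2]

def grad_fd(x, y, h=1e-6):
    # Central differences along each coordinate.
    return [(f(x + h, y) - f(x - h, y)) / (2 * h),
            (f(x, y + h) - f(x, y - h)) / (2 * h)]

print(grad_analytic(1.0, 0.5), grad_fd(1.0, 0.5))
```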

To derive the general solution to the problem posed at the beginning of this article, one unavoidable step is differentiating the gradient matrix \(∇f\) itself; we will analyze this separately in the course of the derivation.


Second Derivatives of Multivariate Composite Functions and the Hessian Matrix

Let \(z=f(u_1,u_2,...,u_n)\), where \(u_i=g_i(x_1,x_2,...,x_m)\). Find \(\frac{\partial ^2z}{ \partial x_i \partial x_j}\).

\[\frac{\partial z}{ \partial x_i}=\frac{\partial z}{ \partial \pmb{u} }·\frac{\partial \pmb{u}}{ \partial x_i} =\left[\begin{matrix}\frac{\partial f}{\partial u_1} & \frac{\partial f}{\partial u_2} & ... & \frac{\partial f}{\partial u_n}\end{matrix} \right] \left[\begin{matrix}\frac{\partial u_1}{\partial x_i} \\ \frac{\partial u_2}{\partial x_i} \\ ... \\ \frac{\partial u_n}{\partial x_i}\end{matrix} \right] \tag{3.1} \]

To simplify the notation, let:

\[\pmb{X_i}=\left[\begin{matrix}\frac{\partial u_1}{\partial x_i} & \frac{\partial u_2}{\partial x_i} & ... & \frac{\partial u_n}{\partial x_i}\end{matrix} \right]^T \tag{3.2} \]

Then:

\[\frac{\partial z}{ \partial x_i}=∇_{\pmb{u}}f·\pmb{X_i} \tag{3.3} \]

Next, we need to compute

\[\frac{\partial {}}{ \partial x_j}\left(∇_{\pmb{u}}f·\pmb{X_i}\right) \tag{3.4} \]

\[\frac{\partial {}}{ \partial x_j}(∇_{\pmb{u}}f·\pmb{X_i} )=\left(\frac{\partial {}}{ \partial x_j}∇_{\pmb{u}}f\right)·\pmb{X_i} + ∇_{\pmb{u}}f·\frac{\partial {}}{ \partial x_j}\pmb{X_i} \tag{3.5} \]

The term \(\frac{\partial {}}{ \partial x_j}\pmb{X_i}\) is easy to obtain, so we focus on \(\frac{\partial {}}{ \partial x_j}∇_{\pmb{u}}f·\pmb{X_i}\), and in particular on the result of \(\frac{\partial {}}{ \partial x_j}∇_{\pmb{u}}f\).

By the chain rule:

\[\frac{\partial {}}{ \partial x_j}∇_{\pmb{u}}f=\frac{\partial {\pmb{u}^T}}{ \partial x_j}·\frac{\partial {(∇_{\pmb{u}}f)}}{ \partial \pmb{u}}=\pmb{X_j}^T·\frac{\partial {(∇_{\pmb{u}}f)}}{ \partial \pmb{u}} \tag{3.6} \]

The problem thus reduces to differentiating the vector \(∇_{\pmb{u}}f\) with respect to the vector \(\pmb{u}\).

Analyzing this operation further: it amounts to differentiating each element of the gradient matrix with respect to each \(u_i\) in turn, so the result is clearly an \(n×n\) square matrix. This matrix is known in mathematics as the Hessian matrix, written \(H(f)\); its explicit form is:

\[H(f)= \left[\begin{matrix} \frac{\partial^2 f}{\partial x_1\partial x_1} & \frac{\partial^2 f}{\partial x_1\partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1\partial x_n}\\ \frac{\partial^2 f}{\partial x_2\partial x_1} & \frac{\partial^2 f}{\partial x_2\partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_2\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n\partial x_1} & \frac{\partial^2 f}{\partial x_n\partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n\partial x_n} \end{matrix} \right]\tag{3.7} \]

The pattern is evident: entry \((i,j)\) is \(\frac{\partial^2 f}{\partial x_i\partial x_j}\), and when \(f\) has continuous second-order partial derivatives the matrix is symmetric.
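The Hessian in (3.7) can also be approximated numerically with nested central differences. The helper below is an illustrative sketch (not from the original article) for any scalar function of a coordinate list:

```python
def hessian_fd(f, x, h=1e-4):
    # Approximate the n x n Hessian H[i][j] = d^2 f / (dx_i dx_j)
    # via the stencil (f(++) - f(+-) - f(-+) + f(--)) / (4 h^2).
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            xpp = list(x); xpp[i] += h; xpp[j] += h
            xpm = list(x); xpm[i] += h; xpm[j] -= h
            xmp = list(x); xmp[i] -= h; xmp[j] += h
            xmm = list(x); xmm[i] -= h; xmm[j] -= h
            H[i][j] = (f(xpp) - f(xpm) - f(xmp) + f(xmm)) / (4 * h * h)
    return H

# For f(x1, x2) = x1^2 * x2 the exact Hessian at (1, 2) is [[4, 2], [2, 0]].
H = hessian_fd(lambda v: v[0] ** 2 * v[1], [1.0, 2.0])
```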

With \(H(f)\) introduced, we can continue the simplification:

\[\frac{\partial {\pmb{u}^T}}{ \partial x_j}·\frac{\partial {(∇_{\pmb{u}}f)}}{ \partial \pmb{u}}=\frac{\partial {\pmb{u}^T}}{ \partial x_j}·\left[\begin{matrix} \frac{\partial^2 f}{\partial u_1\partial u_1} & \frac{\partial^2 f}{\partial u_1\partial u_2} & \cdots & \frac{\partial^2 f}{\partial u_1\partial u_n}\\ \frac{\partial^2 f}{\partial u_2\partial u_1} & \frac{\partial^2 f}{\partial u_2\partial u_2} & \cdots & \frac{\partial^2 f}{\partial u_2\partial u_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial u_n\partial u_1} & \frac{\partial^2 f}{\partial u_n\partial u_2} & \cdots & \frac{\partial^2 f}{\partial u_n\partial u_n} \end{matrix} \right]=\pmb{X_j}^T·H_{\pmb{u}}(f) \]

Therefore

\[\frac{\partial ^2z}{ \partial x_i \partial x_j}=\pmb{X_j}^T·H_{\pmb{u}}(f)·\pmb{X_i}+∇_{\pmb{u}}f·\frac{\partial {}}{ \partial x_j}\pmb{X_i}=\pmb{X_j}^T·H_{\pmb{u}}(f)·\pmb{X_i}+∇_{\pmb{u}}f·\pmb{X_{ij}}\tag{3.8} \]

where

\[\pmb{X_{ij}}=\left[\begin{matrix}\frac{\partial^2 u_1}{\partial x_i\partial x_j} & \frac{\partial^2 u_2}{\partial x_i\partial x_j} & ... & \frac{\partial^2 u_n}{\partial x_i\partial x_j}\end{matrix} \right]^T\tag{3.9} \]

Of course, in actual computation, since the value of \(\pmb{X_i}\) has already been obtained, it may be more convenient to compute \(\frac{\partial {}}{ \partial x_j}\pmb{X_i}\) directly.


Summary

Given \(z=f(u_1,u_2,...,u_n)\), where \(u_i=g_i(x_1,x_2,...,x_m)\), find \(\frac{\partial ^2z}{ \partial x_i \partial x_j}\).

\[\frac{\partial z}{ \partial x_i}=∇_{\pmb{u}}f·\pmb{X_i} \\\frac{\partial ^2z}{ \partial x_i \partial x_j}=\pmb{X_j}^T·H_{\pmb{u}}(f)·\pmb{X_i}+∇_{\pmb{u}}f·\pmb{X_{ij}}\tag{end} \]
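The summary formula can be sketched in code. The choices below are illustrative assumptions, not part of the original post: \(f(u_1,u_2)=u_1u_2\) together with the substitutions from Example 1, so the mixed partial is assembled as \(\pmb{X_j}^T H_{\pmb{u}}(f)\pmb{X_i}\) plus \(∇_{\pmb{u}}f·\pmb{X_{ij}}\):

```python
import math

# Illustrative instance of the summary formula, assuming f(u1, u2) = u1*u2
# and u1 = x^2 - y^2, u2 = e^{xy} as in Example 1.
def mixed_partial(x, y):
    u1, u2 = x**2 - y**2, math.exp(x * y)
    grad = [u2, u1]                 # nabla_u f = [f_1, f_2]
    H = [[0.0, 1.0], [1.0, 0.0]]    # Hessian of f in u
    Xi = [2 * x, y * u2]            # X_i = du/dx  (i <-> x)
    Xj = [-2 * y, x * u2]           # X_j = du/dy  (j <-> y)
    Xij = [0.0, u2 * (1 + x * y)]   # X_ij = d^2 u / (dx dy)
    quad = sum(Xj[a] * H[a][b] * Xi[b] for a in range(2) for b in range(2))
    lin = sum(grad[k] * Xij[k] for k in range(2))
    return quad + lin
```

Comparing this against a direct finite-difference estimate of \(\partial^2 z/\partial x\partial y\) for \(z=(x^2-y^2)e^{xy}\) confirms the two agree.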


References

  • [1] Marc Peter Deisenroth, A. Aldo Faisal, Cheng Soon Ong, *Mathematics for Machine Learning*.
