# 深度學習/機器學習入門基礎數學知識整理（二）：梯度與導數，矩陣求導，泰勒展開等

## 導數與梯度

f′(a)=limh→0f(a h)−f(a)h

f'(a) = \lim_{h \rightarrow 0} \frac{f(a h)-f(a)}{h}

∇f(X)=∂f(X)∂X=⎡⎣⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢∂f(X)∂x1∂f(X)∂x2⋮∂f(X)∂xn⎤⎦⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥

\nabla f(\bf{X}) = \frac{\partial f(\bf{X})}{\partial \bf{X}} = \begin{bmatrix}
\frac{\partial f(\bf{X})}{\partial {x_1}} \\
\frac{\partial f(\bf{X})}{\partial {x_2}} \\
\vdots\\
\frac{\partial f(\bf{X})}{\partial {x_n}} \\
\end{bmatrix}

• 二階導數，Hessian矩陣：
H(x)=∇2f(X)=⎡⎣⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢∂2f(X)∂x12∂2f(X)∂x2∂x1⋮∂2f(X)∂xn∂x1∂2f(X)∂x1∂x2∂2f(X)∂x22⋮∂2f(X)∂xn∂x2⋯⋯⋱⋯∂2f(X)∂x1∂xn∂2f(X)∂x2∂xn⋮∂2f(X)∂xn2⎤⎦⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥

\bf{H}(x)= \nabla^2f(\bf{X}) = \begin{bmatrix}
\frac{\partial ^2 f(\bf{X})}{\partial {x_1}^2} & \frac{\partial ^2 f(\bf{X})}{\partial {x_1}\partial {x_2}} & \cdots & \frac{\partial ^2 f(\bf{X})}{\partial {x_1}\partial {x_n}} &\\
\frac{\partial ^2 f(\bf{X})}{\partial {x_2}\partial {x_1}} & \frac{\partial ^2 f(\bf{X})}{\partial {x_2}^2} & \cdots & \frac{\partial ^2 f(\bf{X})}{\partial {x_2}\partial {x_n}} &\\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial ^2 f(\bf{X})}{\partial {x_n}\partial {x_1}} & \frac{\partial ^2 f(\bf{X})}{\partial {x_n}\partial {x_2}} & \cdots & \frac{\partial ^2 f(\bf{X})}{\partial {x_n}^2} &\\
\end{bmatrix}

f(xk δ)≈f(xk) f′(xk)δ 12f′′(xk)δ2 ⋯ 1n!f(n)(xk)δn

f(x_k \delta) \approx f(x_k) f'(x_k)\delta \frac{1}{2}f”(x_k)\delta^2 \cdots \frac{1}{n!}f^{(n)}(x_k)\delta^n

f(xk δ)≈f(xk) ∇Tf(xk)δ 12δTf′′(xk)δ

f(\bf{x}_k \bf{\delta}) \approx f(x_k) \nabla^Tf(\bf{x}_k) \bf{\delta} \frac{1}{2}\bf{\delta^T}f”(\bf{x}_k)\bf{\delta}

## 矩陣求導總結

（1）對標量求導

• 標量關於標量x的求導：
∂y∂x

\frac {\partial y}{\partial x}

• 向量關於標量x的求導：
向量y=⎡⎣⎢⎢⎢⎢⎢y1y2⋮yn⎤⎦⎥⎥⎥⎥⎥{\bf y} = \begin {bmatrix} y_1 \\ y_2\\ \vdots \\ y_n\end{bmatrix}關於標量x 的求導就是 y 的每一個元素分別對x求導，可以表示為

∂y∂x=⎡⎣⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢∂y1∂x∂y2∂x⋮∂yn∂x⎤⎦⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥

\frac {\partial \bf y}{\partial x} = \begin {bmatrix} \frac{\partial y_1}{\partial x} \\ \frac{\partial y_2}{\partial x} \\ \vdots \\ \frac{\partial y_n}{\partial x} \end{bmatrix}

• 矩陣·關於標量x的求導：
矩陣對標量的求導類似於向量關於標量的求導，也就是矩陣的每個元素分別對標量x求導

∂Y∂x=⎡⎣⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢∂y11∂x∂y21∂x⋮∂yn1∂x∂y12∂x∂y22∂x⋮∂yn2∂x⋯⋯⋱⋯∂y1n∂x∂y2n∂x⋮∂ynn∂x⎤⎦⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥

\frac {\partial \bf Y}{\partial x} = \begin {bmatrix} \frac{\partial y_{11} }{\partial x } & \frac{\partial y_{12} }{\partial x }& \cdots & \frac{\partial y_{1n} }{\partial x } \\ \frac{\partial y_{21}}{\partial x } & \frac{\partial y_{22}}{\partial x } & \cdots & \frac{\partial y_{2n}}{\partial x } \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_{n1} }{\partial x } & \frac{\partial y_{n2} }{\partial x } & \cdots & \frac{\partial y_{nn}}{\partial x } \end{bmatrix}

（2）對向量求導

• 標量關於向量x的導數
標量y 關於向量 x=⎡⎣⎢⎢⎢⎢x1x2⋮xn⎤⎦⎥⎥⎥⎥{\bf x } = \begin {bmatrix} x_1 \\ x_2\\ \vdots \\ x_n\end{bmatrix} 的求導可以表示為

=[∂y∂x1 ∂y∂x2 ⋯ ∂y∂xn]

= \begin {bmatrix} \frac{\partial y}{\partial x_{1} }\ \frac{\partial y}{\partial x_{2} } \ \cdots \ \frac{\partial y}{\partial x_{n} } \end{bmatrix}

• 向量關於向量 x 的導數
向量函式（即函式組成的向量）y=⎡⎣⎢⎢⎢⎢⎢y1y2⋮yn⎤⎦⎥⎥⎥⎥⎥{\bf y} = \begin {bmatrix} y_1 \\ y_2\\ \vdots \\ y_n\end{bmatrix}關於x=⎡⎣⎢⎢⎢⎢x1x2⋮xn⎤⎦⎥⎥⎥⎥{\bf x } = \begin {bmatrix} x_1 \\ x_2\\ \vdots \\ x_n\end{bmatrix}的導數

∂y∂x=⎡⎣⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢∂y1∂x1∂y2∂x1⋮∂yn∂x1∂y1∂x2∂y2∂x2⋮∂yn∂x2⋯⋯⋱⋯∂y1∂xn∂y2∂xn⋮∂yn∂xn⎤⎦⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥

\frac {\partial \bf y}{\partial \bf x} = \begin {bmatrix} \frac{\partial y_{1} }{\partial x_{1} } & \frac{\partial y_{1} }{\partial x_{2} }& \cdots & \frac{\partial y_{1} }{\partial x_{n} } \\ \frac{\partial y_{2}}{\partial x_{1} } & \frac{\partial y_{2}}{\partial x_{2} } & \cdots & \frac{\partial y_{2}}{\partial x_{n} } \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_{n} }{\partial x_{1} } & \frac{\partial y_{n} }{\partial x_{2} } & \cdots & \frac{\partial y_{n}}{\partial x_{n} } \end{bmatrix}

• 矩陣關於向量的導數
矩陣Y=⎡⎣⎢⎢⎢⎢⎢y11y21⋮yn1y12y22⋮yn2⋯⋯⋱⋯y1ny2n⋮ynn⎤⎦⎥⎥⎥⎥⎥{\bf Y} = \begin {bmatrix} y_{11} & y_{12} & \cdots & y_{1n} \\ y_{21} & y_{22} & \cdots & y_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ y_{n1} & y_{n2} & \cdots & y_{nn} \end{bmatrix}關於 x=⎡⎣⎢⎢⎢⎢x1x2⋮xn⎤⎦⎥⎥⎥⎥{\bf x } = \begin {bmatrix} x_1 \\ x_2\\ \vdots \\ x_n\end{bmatrix}的導數是推導中最複雜的一種，表示為

∂Y∂x=⎡⎣⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢∂y11∂x1∂y21∂x1⋮∂yn1∂x1∂y1n∂x2∂y22∂x2⋮∂yn2∂x2⋯⋯⋱⋯∂y1n∂xn∂y2n∂xn⋮∂ynn∂xn⎤⎦⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥

\frac {\partial \bf Y}{\partial \bf x} = \begin {bmatrix} \frac{\partial y_{11} }{\partial x_{1} } & \frac{\partial y_{1n} }{\partial x_{2} }& \cdots & \frac{\partial y_{1n} }{\partial x_{n} } \\ \frac{\partial y_{21}}{\partial x_{1} } & \frac{\partial y_{22}}{\partial x_{2} } & \cdots & \frac{\partial y_{2n}}{\partial x_{n} } \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_{n1} }{\partial x_{1} } & \frac{\partial y_{n2} }{\partial x_{2} } & \cdots & \frac{\partial y_{nn}}{\partial x_{n} } \end{bmatrix}
（3）對矩陣求導

一般只考慮標量關於矩陣的導數，即標量y 對矩陣 X 的導數，此時的導數是梯度矩陣，可以表示為下式：

∂y∂X=⎡⎣⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢∂y∂x11∂y∂x12⋮∂y∂x1n∂y∂x21∂y∂x22⋮∂y∂x2n⋯⋯⋱⋯∂y∂xn1∂y∂xn2⋮∂y∂xnn⎤⎦⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥

\frac {\partial y}{\partial \bf X} =\begin {bmatrix} \frac{\partial y }{\partial x_{11} } & \frac{\partial y }{\partial x_{21} }& \cdots & \frac{\partial y }{\partial x_{n1} } \\ \frac{\partial y}{\partial x_{12} } & \frac{\partial y}{\partial x_{22} } & \cdots & \frac{\partial y}{\partial x_{n2} } \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y }{\partial x_{1n} } & \frac{\partial y }{\partial x_{2n} } & \cdots & \frac{\partial y}{\partial x_{nn} } \end{bmatrix}