Introduction

h_\theta(x) = \frac{1}{1 + \exp(-\theta^T x)}
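The logistic hypothesis above can be sketched in a few lines of NumPy. This is a minimal illustration; the function names are mine, not from the original post.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    """Logistic hypothesis h_theta(x) = sigmoid(theta^T x)."""
    return sigmoid(theta @ x)
```

With `theta` at zero the hypothesis is 0.5 for every input, which matches the formula.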

J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^m y^{(i)}\log h_\theta(x^{(i)}) + (1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]
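The cost is the average negative log-likelihood over the $m$ training examples. A vectorized sketch (names and array layout are illustrative assumptions):

```python
import numpy as np

def logistic_cost(theta, X, y):
    """Logistic-regression cost J(theta).

    X: (m, n) design matrix, y: (m,) labels in {0, 1}.
    """
    p = 1.0 / (1.0 + np.exp(-X @ theta))   # h_theta(x^(i)) for each row of X
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
```

At `theta = 0` every prediction is 0.5, so the cost is exactly $\log 2$ regardless of the labels.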

For softmax regression over $k$ classes, the corresponding hypothesis outputs a vector of class probabilities:

h_{\theta}(x^{(i)})=\left[\begin{matrix}p(y^{(i)} = 1\mid x^{(i)};\theta) \\
p(y^{(i)} = 2\mid x^{(i)};\theta) \\
\vdots \\ p(y^{(i)} = k\mid x^{(i)};\theta)\end{matrix}\right]=\frac{1}{\sum_{j=1}^ke^{\theta_j^Tx^{(i)}}}\left[\begin{matrix}e^{\theta_1^Tx^{(i)}}\\e^{\theta_2^Tx^{(i)}}\\ \vdots\\ e^{\theta_k^Tx^{(i)}}\end{matrix}\right]
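The hypothesis is just exponentiated scores normalized to sum to 1. A small sketch (the max-subtraction trick is a standard stability measure I've added, not something the post discusses):

```python
import numpy as np

def softmax_h(Theta, x):
    """Class-probability vector for softmax regression.

    Theta: (k, n) matrix whose rows are theta_j^T; x: (n,) feature vector.
    """
    logits = Theta @ x        # theta_j^T x for j = 1..k
    logits = logits - logits.max()   # shift logits; probabilities are unchanged
    e = np.exp(logits)
    return e / e.sum()        # normalize so the entries sum to 1
```

With all parameters zero, every class gets probability $1/k$.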

Softmax model parameters
The parameters of the softmax model form a matrix built from $k$ vectors $\theta_j$, each of dimension $n+1$; the model's output is a vector.

\theta=\left[\begin{matrix}\theta_1^T\\\theta_2^T\\\vdots \\ \theta_k^T \end{matrix}\right]

Cost function

$1\{\cdot\}$ is the indicator function: it equals 1 when the expression inside the braces is true, and 0 otherwise.

J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{j=1}^{k} 1\left\{y^{(i)} = j\right\} \log \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^k e^{\theta_l^T x^{(i)}}}\right]
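The double sum picks out, for each example, the log-probability of its true class. A vectorized sketch using integer labels in `{0..k-1}` (an indexing convention I'm assuming; the formula itself numbers classes from 1):

```python
import numpy as np

def softmax_cost(Theta, X, y):
    """Softmax cost J(Theta) = -(1/m) * sum_i log p(y_i | x_i).

    X: (m, n), Theta: (k, n), y: (m,) integer class labels in {0..k-1}.
    """
    logits = X @ Theta.T                           # (m, k): theta_j^T x^(i)
    logits = logits - logits.max(axis=1, keepdims=True)   # stabilize exp
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    m = X.shape[0]
    return -log_probs[np.arange(m), y].mean()      # mean over true-class log-probs
```

With zero parameters each class has probability $1/k$, so the cost is $\log k$.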

The logistic cost can be written in the same indicator form (the special case $k=2$):

J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^m (1-y^{(i)})\log(1-h_\theta(x^{(i)})) + y^{(i)}\log h_\theta(x^{(i)})\right] \\
= -\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{j=0}^{1} 1\left\{y^{(i)} = j\right\}\log p(y^{(i)} = j \mid x^{(i)};\theta)\right]

p(y^{(i)} = j \mid x^{(i)};\theta) = \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^k e^{\theta_l^T x^{(i)}}}

\nabla_{\theta_j} J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[x^{(i)}\left(1\{y^{(i)} = j\} - p(y^{(i)} = j \mid x^{(i)};\theta)\right)\right]
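Stacking the $k$ gradients $\nabla_{\theta_j} J(\theta)$ as rows gives one matrix expression, since the indicator terms form a one-hot matrix. A sketch under the same label convention as above (names are illustrative):

```python
import numpy as np

def softmax_grad(Theta, X, y):
    """Gradient of J w.r.t. each theta_j, stacked as a (k, n) matrix.

    Row j implements -(1/m) sum_i x^(i) (1{y^(i)=j} - p(y^(i)=j | x^(i))).
    """
    m = X.shape[0]
    logits = X @ Theta.T
    logits = logits - logits.max(axis=1, keepdims=True)
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)        # (m, k) class probabilities
    Y = np.zeros_like(P)
    Y[np.arange(m), y] = 1.0                 # one-hot indicator 1{y^(i) = j}
    return -(Y - P).T @ X / m                # (k, n) gradient matrix
```

It can be checked against a finite-difference approximation of `softmax_cost` if in doubt.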

$\nabla_{\theta_j} J(\theta)$ is a vector whose $l$-th element $\frac{\partial J(\theta)}{\partial \theta_{jl}}$ is the partial derivative of $J(\theta)$ with respect to the $l$-th component of $\theta_j$.

Weight decay

J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{j=1}^{k} 1\left\{y^{(i)} = j\right\}\log\frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^k e^{\theta_l^T x^{(i)}}}\right] + \frac{\lambda}{2}\sum_{i=1}^k\sum_{j=0}^n \theta_{ij}^2

Taking the derivative, the gradient becomes:

\nabla_{\theta_j} J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[x^{(i)}\left(1\{y^{(i)} = j\} - p(y^{(i)} = j \mid x^{(i)};\theta)\right)\right] + \lambda\theta_j
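Weight decay only adds the term $\lambda\theta_j$ to each row of the gradient. A sketch (note the formula above decays every component including the bias term $j=0$, so this code decays the full matrix; names are illustrative):

```python
import numpy as np

def softmax_grad_reg(Theta, X, y, lam):
    """Gradient of the weight-decayed cost: data term plus lambda * theta_j."""
    m = X.shape[0]
    logits = X @ Theta.T
    logits = logits - logits.max(axis=1, keepdims=True)
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)        # (m, k) class probabilities
    Y = np.zeros_like(P)
    Y[np.arange(m), y] = 1.0                 # one-hot indicator 1{y^(i) = j}
    return -(Y - P).T @ X / m + lam * Theta  # weight-decay term lambda * theta_j
```

With `lam = 0` this reduces to the unregularized gradient; when the data term vanishes, the gradient is exactly `lam * Theta`.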

• 2018.07.21
