# [Machine Learning] Using and Deriving the Cross-Entropy Function

## 1. Starting from the Quadratic Cost Function

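For a single sigmoid neuron with output $a = \sigma(z)$, $z = wx + b$, and quadratic cost $C = \frac{(y-a)^2}{2}$, the partial derivatives are $\frac{\partial C}{\partial w} = (a-y)\sigma'(z)x$ and $\frac{\partial C}{\partial b} = (a-y)\sigma'(z)$. With input $x = 1$ and target $y = 0$, the gradient-descent updates become

$$w \leftarrow w - \eta\frac{\partial C}{\partial w} = w - \eta\, a\,\sigma'(z)$$

$$b \leftarrow b - \eta\frac{\partial C}{\partial b} = b - \eta\, a\,\sigma'(z)$$

Both updates are proportional to $\sigma'(z)$, which is close to 0 when the neuron saturates ($a$ near 0 or 1). A neuron that starts out badly wrong therefore learns slowly under the quadratic cost, which motivates the cross-entropy cost below.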
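A minimal Python sketch of the slowdown. It assumes the single-neuron setup above ($x = 1$, $y = 0$) and a hypothetical saturated starting point $w = b = 2$. For comparison it also computes the gradient of the per-sample cross-entropy cost $C = -[y\log a + (1-y)\log(1-a)]$, for which the $\sigma'(z)$ factor cancels and $\partial C/\partial w = (a-y)x$.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function sigma(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))


def quadratic_grad_w(w: float, b: float, x: float = 1.0, y: float = 0.0) -> float:
    """dC/dw for the quadratic cost C = (y - a)^2 / 2, i.e. (a - y) * sigma'(z) * x."""
    a = sigmoid(w * x + b)
    sigma_prime = a * (1.0 - a)  # sigma'(z) = sigma(z) * (1 - sigma(z))
    return (a - y) * sigma_prime * x


def cross_entropy_grad_w(w: float, b: float, x: float = 1.0, y: float = 0.0) -> float:
    """dC/dw for C = -[y log a + (1 - y) log(1 - a)]; sigma'(z) cancels, leaving (a - y) * x."""
    a = sigmoid(w * x + b)
    return (a - y) * x


if __name__ == "__main__":
    # Hypothetical saturated start: w = b = 2, so z = 4 and a = sigma(4) is close to 1, far from y = 0.
    print("quadratic    :", quadratic_grad_w(2.0, 2.0))
    print("cross-entropy:", cross_entropy_grad_w(2.0, 2.0))
```

With these values the quadratic-cost gradient is about 0.017 while the cross-entropy gradient is about 0.98, even though the neuron's output is equally wrong in both cases.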

## Cross-Entropy Loss Function

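For logistic regression with $m$ training samples, the cross-entropy loss is

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(h_\theta(x^{(i)})\right) + \left(1-y^{(i)}\right)\log\left(1-h_\theta(x^{(i)})\right)\right],$$

and its partial derivative with respect to $\theta_j$ is

$$\frac{\partial}{\partial\theta_j}J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}.$$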
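To sanity-check both formulas, here is a minimal pure-Python sketch (no NumPy, to stay dependency-free). It assumes each sample already carries the constant feature $x_0 = 1$, so that $\theta_0$ acts as the intercept; the function names and the toy data are hypothetical. The analytic gradient is compared against central finite differences of $J(\theta)$.

```python
import math
from typing import List, Sequence


def hypothesis(theta: Sequence[float], x: Sequence[float]) -> float:
    """h_theta(x) = 1 / (1 + e^{-theta^T x}); callers put the constant 1 in x[0]."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))


def cost(theta: Sequence[float], X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """J(theta) = -(1/m) * sum_i [y log h + (1 - y) log(1 - h)]."""
    total = 0.0
    for xi, yi in zip(X, y):
        h = hypothesis(theta, xi)
        total += yi * math.log(h) + (1 - yi) * math.log(1.0 - h)
    return -total / len(X)


def gradient(theta: Sequence[float], X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """dJ/dtheta_j = (1/m) * sum_i (h - y) * x_j, for every j."""
    m = len(X)
    grad = [0.0] * len(theta)
    for xi, yi in zip(X, y):
        err = hypothesis(theta, xi) - yi
        for j, xij in enumerate(xi):
            grad[j] += err * xij / m
    return grad


def numerical_gradient(theta, X, y, eps: float = 1e-6) -> List[float]:
    """Central finite differences of cost(); used only to check gradient()."""
    grad = []
    for j in range(len(theta)):
        plus, minus = list(theta), list(theta)
        plus[j] += eps
        minus[j] -= eps
        grad.append((cost(plus, X, y) - cost(minus, X, y)) / (2.0 * eps))
    return grad


if __name__ == "__main__":
    # Hypothetical toy data: column 0 is the bias feature x_0 = 1.
    X = [[1.0, 0.5, -1.2], [1.0, -0.3, 0.8], [1.0, 2.0, 0.1], [1.0, -1.5, -0.4]]
    y = [1, 0, 1, 0]
    theta = [0.1, -0.2, 0.3]
    print("analytic:", gradient(theta, X, y))
    print("numeric: ", numerical_gradient(theta, X, y))
```

The two printed vectors should agree to roughly six decimal places. At $\theta = 0$ every $h_\theta$ equals 0.5, so $J$ equals $\log 2$ regardless of the labels.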

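• In logistic regression (a yes/no problem), $y^{(i)}$ is 0 or 1;
• In softmax regression (a multi-class problem), $y^{(i)}$ is one of $1, 2, \dots, k$, the label of the class (assuming there are $k$ classes in total).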

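Here the linear score is

$$\theta^T x^{(i)} := \theta_0 + \theta_1 x_1^{(i)} + \cdots + \theta_p x_p^{(i)},$$

and the hypothesis applies the sigmoid function to it:

$$h_\theta(x^{(i)}) = \frac{1}{1+e^{-\theta^T x^{(i)}}}.$$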
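A minimal Python transcription of these two definitions, keeping the intercept $\theta_0$ separate from the $p$ features exactly as written above. The function names `linear_term` and `h` are hypothetical.

```python
import math
from typing import Sequence


def linear_term(theta: Sequence[float], x: Sequence[float]) -> float:
    """theta^T x = theta_0 + theta_1 x_1 + ... + theta_p x_p.

    `theta` has p + 1 entries (theta[0] is the intercept); `x` holds the p features.
    """
    if len(theta) != len(x) + 1:
        raise ValueError("theta must have exactly one more entry than x")
    return theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))


def h(theta: Sequence[float], x: Sequence[float]) -> float:
    """Sigmoid hypothesis h_theta(x) = 1 / (1 + e^{-theta^T x})."""
    return 1.0 / (1.0 + math.exp(-linear_term(theta, x)))


if __name__ == "__main__":
    print(h([0.0, 1.0, -1.0], [2.0, 2.0]))  # score is 0, so h is 0.5
```

For a very negative score, `math.exp(-z)` raises `OverflowError`; a production version would branch on the sign of `z`, which this sketch does not do.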

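The model interprets $h_\theta(x^{(i)})$ as the probability that the predicted label $\hat{y}^{(i)}$ is 1:

$$P(\hat{y}^{(i)}=1 \mid x^{(i)};\theta) = h_\theta(x^{(i)}),$$

$$P(\hat{y}^{(i)}=0 \mid x^{(i)};\theta) = 1-h_\theta(x^{(i)}).$$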

Taking logarithms:

$$\log P(\hat{y}^{(i)}=1 \mid x^{(i)};\theta) = \log h_\theta(x^{(i)}) = \log\frac{1}{1+e^{-\theta^T x^{(i)}}},$$

$$\log P(\hat{y}^{(i)}=0 \mid x^{(i)};\theta) = \log\left(1-h_\theta(x^{(i)})\right) = \log\frac{e^{-\theta^T x^{(i)}}}{1+e^{-\theta^T x^{(i)}}}.$$
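A direct Python transcription of the two log-probabilities, assuming the score $z = \theta^T x^{(i)}$ has already been computed; the function name `log_probs` is hypothetical. Exponentiating the two results should give probabilities that sum to 1.

```python
import math
from typing import Tuple


def log_probs(z: float) -> Tuple[float, float]:
    """Return (log P(y_hat=1), log P(y_hat=0)) for a logistic model with score z = theta^T x.

    Mirrors the expressions above:
      log P(y_hat=1) = log(1 / (1 + e^{-z}))
      log P(y_hat=0) = log(e^{-z} / (1 + e^{-z}))
    """
    e = math.exp(-z)
    return math.log(1.0 / (1.0 + e)), math.log(e / (1.0 + e))


if __name__ == "__main__":
    lp1, lp0 = log_probs(0.7)
    print(lp1, lp0, math.exp(lp1) + math.exp(lp0))  # the last value should be 1 up to rounding
```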

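For a single sample, the two cases combine into one log-likelihood term:

$$
\begin{aligned}
&I\{y^{(i)}=1\}\log P(\hat{y}^{(i)}=1 \mid x^{(i)};\theta) + I\{y^{(i)}=0\}\log P(\hat{y}^{(i)}=0 \mid x^{(i)};\theta) \\
&= y^{(i)}\log P(\hat{y}^{(i)}=1 \mid x^{(i)};\theta) + \left(1-y^{(i)}\right)\log P(\hat{y}^{(i)}=0 \mid x^{(i)};\theta) \\
&= y^{(i)}\log\left(h_\theta(x^{(i)})\right) + \left(1-y^{(i)}\right)\log\left(1-h_\theta(x^{(i)})\right),
\end{aligned}
$$

where $I\{\cdot\}$ is the indicator function: it equals 1 when the condition inside the braces holds and 0 otherwise. Because $y^{(i)}$ is 0 or 1, $I\{y^{(i)}=1\} = y^{(i)}$ and $I\{y^{(i)}=0\} = 1-y^{(i)}$, which gives the second line.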
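Because $y^{(i)}$ only takes the values 0 and 1, the indicator form and the compact form are the same number. A small Python sketch that checks this on a few hypothetical probabilities $h = h_\theta(x^{(i)})$ (the helper names are hypothetical):

```python
import math


def indicator(condition: bool) -> int:
    """I{condition}: 1 if the condition holds, else 0."""
    return 1 if condition else 0


def loglik_indicator(y: int, h: float) -> float:
    """I{y=1} log P(y_hat=1) + I{y=0} log P(y_hat=0), with P(y_hat=1) = h."""
    return indicator(y == 1) * math.log(h) + indicator(y == 0) * math.log(1.0 - h)


def loglik_compact(y: int, h: float) -> float:
    """y log h + (1 - y) log(1 - h): the same quantity whenever y is 0 or 1."""
    return y * math.log(h) + (1 - y) * math.log(1.0 - h)


if __name__ == "__main__":
    for y in (0, 1):
        for h in (0.1, 0.5, 0.9):
            assert math.isclose(loglik_indicator(y, h), loglik_compact(y, h))
    print("indicator form == compact form for y in {0, 1}")
```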

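Summing over all $m$ samples (assuming they are independent) gives the log-likelihood of the training set:

$$\sum_{i=1}^{m}\left[y^{(i)}\log\left(h_\theta(x^{(i)})\right) + \left(1-y^{(i)}\right)\log\left(1-h_\theta(x^{(i)})\right)\right]$$

Maximizing this log-likelihood is equivalent to minimizing its negative average, which is the cross-entropy loss:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(h_\theta(x^{(i)})\right) + \left(1-y^{(i)}\right)\log\left(1-h_\theta(x^{(i)})\right)\right]$$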

## Deriving the Gradient of the Cross-Entropy Loss

The derivation uses three logarithm identities, where $\log$ is the natural logarithm:

① $\log\frac{a}{b} = \log a - \log b$

② $\log a + \log b = \log(ab)$

③ $a = \log e^{a}$

Start from the loss

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(h_\theta(x^{(i)})\right) + \left(1-y^{(i)}\right)\log\left(1-h_\theta(x^{(i)})\right)\right]$$

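First rewrite the two logarithms inside the sum:

$$\log h_\theta(x^{(i)}) = \log\frac{1}{1+e^{-\theta^T x^{(i)}}} = -\log\left(1+e^{-\theta^T x^{(i)}}\right),$$

$$
\begin{aligned}
\log\left(1-h_\theta(x^{(i)})\right) &= \log\left(1-\frac{1}{1+e^{-\theta^T x^{(i)}}}\right) = \log\frac{e^{-\theta^T x^{(i)}}}{1+e^{-\theta^T x^{(i)}}} \\
&= \log e^{-\theta^T x^{(i)}} - \log\left(1+e^{-\theta^T x^{(i)}}\right) \\
&= -\theta^T x^{(i)} - \log\left(1+e^{-\theta^T x^{(i)}}\right),
\end{aligned}
$$

where the last two steps use identities ① and ③.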
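The two rewrites can be spot-checked numerically. A minimal Python sketch, working directly with a score $z = \theta^T x^{(i)}$ (the function name is hypothetical), that returns how far each log term is from its rewritten form:

```python
import math
from typing import Tuple


def log_term_gaps(z: float) -> Tuple[float, float]:
    """Absolute differences between each log term and its rewritten form, for z = theta^T x.

      log h        vs  -log(1 + e^{-z})
      log(1 - h)   vs  -z - log(1 + e^{-z})
    Both should be ~0 up to floating-point rounding.
    """
    h = 1.0 / (1.0 + math.exp(-z))
    softplus_neg = math.log(1.0 + math.exp(-z))  # log(1 + e^{-z})
    gap_pos = abs(math.log(h) - (-softplus_neg))
    gap_neg = abs(math.log(1.0 - h) - (-z - softplus_neg))
    return gap_pos, gap_neg


if __name__ == "__main__":
    for z in (-3.0, -0.5, 0.0, 1.2, 4.0):
        print(z, log_term_gaps(z))
```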

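Substituting both into $J(\theta)$ and simplifying:

$$
\begin{aligned}
J(\theta) &= -\frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log\left(1+e^{-\theta^T x^{(i)}}\right) + \left(1-y^{(i)}\right)\left(-\theta^T x^{(i)} - \log\left(1+e^{-\theta^T x^{(i)}}\right)\right)\right] \\
&= -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\theta^T x^{(i)} - \theta^T x^{(i)} - \log\left(1+e^{-\theta^T x^{(i)}}\right)\right] \\
&= -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\theta^T x^{(i)} - \log e^{\theta^T x^{(i)}} - \log\left(1+e^{-\theta^T x^{(i)}}\right)\right] \qquad \text{(by ③)} \\
&= -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\theta^T x^{(i)} - \left(\log e^{\theta^T x^{(i)}} + \log\left(1+e^{-\theta^T x^{(i)}}\right)\right)\right] \\
&= -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\theta^T x^{(i)} - \log\left(1+e^{\theta^T x^{(i)}}\right)\right] \qquad \text{(by ②)}
\end{aligned}
$$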
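Working with precomputed scores $z^{(i)} = \theta^T x^{(i)}$, a minimal Python sketch can confirm that the first and last forms of $J(\theta)$ agree. The helper names and the five scores and labels are hypothetical; the code does not compute $\theta^T x$ itself.

```python
import math
from typing import Sequence


def j_original(scores: Sequence[float], labels: Sequence[int]) -> float:
    """J = -(1/m) * sum[y log h + (1 - y) log(1 - h)], with h = sigmoid(z)."""
    total = 0.0
    for z, y in zip(scores, labels):
        h = 1.0 / (1.0 + math.exp(-z))
        total += y * math.log(h) + (1 - y) * math.log(1.0 - h)
    return -total / len(scores)


def j_simplified(scores: Sequence[float], labels: Sequence[int]) -> float:
    """J = -(1/m) * sum[y z - log(1 + e^{z})], the last line of the derivation."""
    total = sum(y * z - math.log(1.0 + math.exp(z)) for z, y in zip(scores, labels))
    return -total / len(scores)


if __name__ == "__main__":
    zs = [-2.0, -0.5, 0.0, 1.3, 3.0]  # hypothetical scores theta^T x
    ys = [0, 1, 1, 0, 1]              # hypothetical labels
    print(j_original(zs, ys), j_simplified(zs, ys))
```

The simplified form never evaluates $\log(1-h_\theta)$, which loses precision as $h_\theta \to 1$. Note that `math.exp(z)` still overflows for large `z` (above roughly 709 for IEEE doubles), so a robust implementation would use a numerically stable softplus; this sketch does not.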

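Now differentiate with respect to $\theta_j$, moving the minus sign inside the sum:

$$
\begin{aligned}
\frac{\partial}{\partial\theta_j}J(\theta) &= \frac{\partial}{\partial\theta_j}\left(\frac{1}{m}\sum_{i=1}^{m}\left[\log\left(1+e^{\theta^T x^{(i)}}\right) - y^{(i)}\theta^T x^{(i)}\right]\right) \\
&= \frac{1}{m}\sum_{i=1}^{m}\left[\frac{\partial}{\partial\theta_j}\log\left(1+e^{\theta^T x^{(i)}}\right) - \frac{\partial}{\partial\theta_j}\left(y^{(i)}\theta^T x^{(i)}\right)\right] \\
&= \frac{1}{m}\sum_{i=1}^{m}\left(\frac{x_j^{(i)}e^{\theta^T x^{(i)}}}{1+e^{\theta^T x^{(i)}}} - y^{(i)}x_j^{(i)}\right) \\
&= \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}
\end{aligned}
$$

The third line uses $\frac{\partial}{\partial\theta_j}\theta^T x^{(i)} = x_j^{(i)}$ together with the chain rule, and the last line uses $\frac{e^{\theta^T x^{(i)}}}{1+e^{\theta^T x^{(i)}}} = \frac{1}{1+e^{-\theta^T x^{(i)}}} = h_\theta(x^{(i)})$.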
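The only non-trivial step is the derivative of $\log(1+e^{\theta^T x^{(i)}})$. A minimal Python sketch that compares the analytic expression $x_j e^{z}/(1+e^{z})$ with a central finite difference; the parameters, the sample, and the helper names are hypothetical, and the sample is assumed to include $x_0 = 1$.

```python
import math
from typing import List, Sequence


def score(theta: Sequence[float], x: Sequence[float]) -> float:
    """theta^T x (x is assumed to include x_0 = 1 for the intercept)."""
    return sum(t * xi for t, xi in zip(theta, x))


def softplus_term(theta: Sequence[float], x: Sequence[float]) -> float:
    """log(1 + e^{theta^T x})."""
    return math.log(1.0 + math.exp(score(theta, x)))


def analytic_partial(theta: Sequence[float], x: Sequence[float], j: int) -> float:
    """x_j * e^{z} / (1 + e^{z}) with z = theta^T x, which equals x_j * h_theta(x)."""
    ez = math.exp(score(theta, x))
    return x[j] * ez / (1.0 + ez)


def numeric_partial(theta: Sequence[float], x: Sequence[float], j: int, eps: float = 1e-6) -> float:
    """Central finite difference of softplus_term with respect to theta_j."""
    plus: List[float] = list(theta)
    minus: List[float] = list(theta)
    plus[j] += eps
    minus[j] -= eps
    return (softplus_term(plus, x) - softplus_term(minus, x)) / (2.0 * eps)


if __name__ == "__main__":
    theta = [0.3, -0.7, 1.1]  # hypothetical parameters
    x = [1.0, 2.0, -0.5]      # hypothetical sample, x_0 = 1
    for j in range(len(theta)):
        print(j, analytic_partial(theta, x, j), numeric_partial(theta, x, j))
```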

Therefore:

$$\frac{\partial}{\partial\theta_j}J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$
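With this gradient, the parameters can be fitted by batch gradient descent, $\theta_j \leftarrow \theta_j - \eta\,\frac{\partial}{\partial\theta_j}J(\theta)$. The pure-Python sketch below is one straightforward way to do it; the learning rate, epoch count, and toy data are hypothetical choices, not values from the source, and each row of `X` is assumed to start with the bias feature $x_0 = 1$.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def cost(theta: Sequence[float], X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Cross-entropy loss J(theta)."""
    total = 0.0
    for xi, yi in zip(X, y):
        h = sigmoid(dot(theta, xi))
        total += yi * math.log(h) + (1 - yi) * math.log(1.0 - h)
    return -total / len(X)


def train(X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 0.5, epochs: int = 1000) -> List[float]:
    """Batch gradient descent: theta_j <- theta_j - lr * (1/m) * sum_i (h - y) * x_j."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(epochs):
        grad = [0.0] * n
        for xi, yi in zip(X, y):
            err = sigmoid(dot(theta, xi)) - yi  # h_theta(x^(i)) - y^(i)
            for j in range(n):
                grad[j] += err * xi[j] / m
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    # Hypothetical 1-D data (not linearly separable); column 0 is the bias feature x_0 = 1.
    X = [[1.0, -2.0], [1.0, -1.0], [1.0, -0.5], [1.0, 0.5], [1.0, 1.0], [1.0, 2.0]]
    y = [0, 0, 1, 0, 1, 1]
    theta = train(X, y)
    print("theta:", theta)
    print("J before:", cost([0.0, 0.0], X, y), "J after:", cost(theta, X, y))
```

The toy data are deliberately not linearly separable, so the minimizer of $J$ is finite and the parameters settle instead of growing without bound.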

Sources:

https://blog.csdn.net/jasonzzj/article/details/52017438

https://blog.csdn.net/u012162613/article/details/44239919