当前位置:文档之家› 最大熵值法

最大熵值法

~ LP (P ) ≡ log ∏ P(w | h ) h,w ~ P (h , w )
~ = ∑ P (h, w) log P(w | h )
h,w
指数型代入
~ ~ ) ~ LP (P ) = ∑ P (h, w) ∑ λi f i (h, w) ∑ P (h, w)log ∑ exp ∑ λi f i (h, w h ,w w i h ,w i ~ ~ = ∑ P (h, w)∑ λi f i (h, w) ∑ P (h )log ∑ exp ∑ λi f i (h, w) h ,w i h w i
h ,w
~ P( f ) = P ( f )
(任意的机率分布)
– 称为限制(方程式)
特徵与限制
~ ∑ P(h, w) f (h, w) = ∑ P (h, w) f (h, w)
h,w h,w
P(h, w) P (w | h ) = P(h )
~ P (h ) ≈ P (h )
~ ~ P (h )P(w | h ) f (h, w) = ∑ P (h, w) f (h, w) ∑
∵ (A) h, ∑ P(w | h ) = 1
w
γ
γ ∑ P(w | h ) = ∑ exp ∑ λi f i (h, w) exp ~ 1 = 1 P (h ) w w i exp exp γ ~ 1 ∑ exp ∑ λi f i (h, w) = 1 P (h ) w i γ ~ 1 = P (h ) 1 ∑ exp ∑ λ f (h, w)
简介
差补法(interpolation)与最大熵值法的差别
– 个别与整体训练
掷骰子问题
P(1) + P(2) + P(3) + P(4) + P(5) + P(6) = 1 1 P (1) + P(2 ) = 2
P(2 ) + P(5) = 3 10
– 满足限制的组合有无穷多种
简介
熵值计算:
h,w i h
∑ exp ∑ (λ
w
+ δ i ) f i (h, w) i ∑ exp ∑ λi f i (h, w) w i
i i i i i
~ ~ = ∑ P (h, w)∑ δ i f i (h, w) + 1 ∑ P (h )
h,w i h
∑ exp ∑ λ f (h, w) exp ∑ δ f (h, w)
i i w i
指数型
经推导后
P * (w | h ) = exp ∑ λi f i (h, w) i exp ∑ λi f i (h, w) ∑ i w 1 = exp ∑ λi f i (h, w) Z (h ) i 1
最大熵值法与最大相似法
训练语料之对数相似值
(
#
(h, w))
IIS演算法
~ 输入 : n个特徵 f 1 , f 2 , , f n 与训练语料的机率分布 P (h, w ) 输出 : 最佳参数 ∧ = (λ1 , λ 2 , , λ n ) 2. 对每一个 λi 进行以下运算 a.由下式中解得 δ i ~ ~ P (h )P∧ (w|h ) f i (h,w ) exp δi f # (h,w ) = ∑ P (h,w ) f i (h,w ) ∑ 1. 所有 λi 的初始值设为 0
~ ~ Z (h ) ≥ ∑ P (h, w)∑ δ i f i (h, w) + ∑ P (h )1 ∧ + Z ∧ (h ) h,w i h
Z (h ) ~ ~ ~ ~ LP (∧ + ) LP (∧ ) = ∑ P (h, w)∑ δ i f i (h, w) ∑ P (h )log ∧ + Z ∧ (h ) h,w i h
~ = arg max ∑ P (h )P(w | h )log P(w | h ) P∈C h,w
指数型
A. B. C.
0 ≤ P(w | h ) ≤ 1 h, w
h,w h ,w
∑ P (w | h ) = 1
w
h
~ ~ P (h )P(w | h ) f i (h, w) = ∑ P (h, w) f i (h, w) ∑
–当 – 熵值:
H (P ) = log 2 (1 / N )
H (P ) = ∑ P (wi ) log 2 P(wi )
i =1 N
P(w1 ) = P(w2 ) = ... = P(w N ) = 1
N
– 平均分布←→最大熵值
特徵与限制
『交通』
昨天 今天 台北 高雄
二连语言模型
W=很好
日期 地点
~ H (P ) = ∑ P (h )P(w | h ) log P(w | h )
h,w
for i = 1,..., n
使用Lagrange multiplier ~ ξ (P, ∧, γ ) = ∑ P (h )P(w | h ) log P(w | h )
h ,w
~ ~ ∑ P (h )P (w | h ) f i (h, w) ∑ P (h, w) f i (h, w) + ∑ λi i h ,w h ,w γ ∑ P (w | h ) 1 w
w i i
∑ exp ∑ λ f (h, w)
i i w i
exp ∑ λi f i (h, w) ~ ~ i exp δ f (h, w) = ∑ P (h, w)∑ δ i f i (h, w) + 1 ∑ P (h )∑ ∑ i i h,w i h w i ∑ exp ∑ λi f i (h, w) w i ~ ~ = ∑ P (h, w)∑ δ i f i (h, w) + 1 ∑ P (h )∑ p ∧ (w | h )exp ∑ δ i f i (h, w) h,w i h w i
IIS 演算法
~ A( | ∧ ) = ∑ P (h, w)∑ δ i f i (h, w)
h,w i
f (h, w) ~ + 1 ∑ P (h )∑ p ∧ (w | h ) exp f # (h, w)∑ δ i i# f (h, w) h w i
A( | ∧ )
f (h, w) ~ ~ = ∑ P (h, w)∑ δ i f i (h, w) + 1 ∑ P (h )∑ P∧ (w | h )exp f # (h, w)∑ δ i i# f (h, w) h,w i h w i f (h, w) ~ ~ = ∑ P (h, w)∑ δ i f i (h, w) + 1 ∑ P (h )∑ P∧ (w | h )exp ∑ i# δ i f # (h, w) h,w i h w i f (h, w)
h,w h,w
~ = P ( f i ) P( f i )
找到ㄧ个机率函数 P (w | h) 满足所有的特徵, 并且使预测训练语料之对数相似值为最大
*
IIS (Improved Iterative Scaling)演算法
~ ~ LP (∧ + ) LP (∧ ) ~ ~ = ∑ P (h, w) log P∧ + (w | h ) ∑ P (h, w) log P∧ (w | h ) h,w h,w
指数型
ξ ~ ~ = P (h )(1 + log P(w | h )) + ∑ λi P (h ) f i (h, w) γ = 0 P(w | h ) i ~ ~ P (h ) 1 + log P * (w | h ) = ∑ λi P (h ) f i (h, w) γ
(
)
i
log P * (w | h ) = ∑ λi f i (h, w) ~ 1 P (h ) i γ P * (w | h ) = exp ∑ λi f i (h, w) exp ~ 1 P (h ) i
h,w h,w
指数型
满足n个特徵的机率模型C
~ C ≡ P ∈ P | P( f i ) = P ( f i ) for i = 1,2,..., n
{
}
欲求条件熵值: ~ H (P ) ≡ ∑ P (h )∑ P (w | h ) log P (w | h )
h w
从集合C找出最大熵值机率模型 P * = arg max H (P ) P∈C
(
)
IIS 演算法
f (h, w) ~ ~ = ∑ P (h, w)∑ δ i f i (h, w) + 1 ∑ P (h )∑ P∧ (w | h ) ∑ i# exp δ i f # (h, w) h,w i h w i f (h, w) B( | ∧ )
(
)
B( | ∧ ) ~ ~ = ∑ P (h, w) f i (h, w) ∑ P (h )∑ P∧ (w | h ) f i (h, w) exp δ i f δ i h,w h w
exp ∑ P( x )Q( x ) ≤ ∑ P( x )exp(Q( x )) Jensen不等式: x x
(
)
f (h, w) ~ ~ ≥ ∑ P (h, w)∑ δ i f i (h, w) + 1 ∑ P (h )∑ P∧ (w | h ) ∑ i# exp δ i f # (h, w) h,w i h w i f (h, w)
IIS 演算法
log a ≥ 1 a, a > 0
Байду номын сангаас
~ ~ Z (h ) = ∑ P (h, w)∑ δ i f i (h, w) + 1 ∑ P (h ) ∧ + Z (h ) h,w i h ∧ ~ ~ = ∑ P (h, w)∑ δ i f i (h, w) + 1 ∑ P (h )
相关主题