
Advanced Bayesian Methods in Machine Learning

• The subjective nature of the conclusions, through their dependence on the choice of prior, is seen by some as a source of difficulty.
• Reducing the dependence on the prior is one motivation for so-called noninformative priors. However, these lead to difficulties when comparing different models, and indeed Bayesian methods based on poor choices of prior can give poor results with high confidence.
• Frequentist evaluation methods offer some protection from such problems, and techniques such as cross-validation remain useful in areas such as model comparison.
• More recently, highly efficient deterministic approximation schemes such as variational Bayes and expectation propagation have been developed.
• These offer a complementary alternative to sampling methods and have allowed Bayesian techniques to be used in large-scale applications.
Graphical Models (directed acyclic graphs)
Modeling
Question (for the classic "student" network: Difficulty → Grade ← Intelligence, Intelligence → SAT, Grade → Letter):
1. P(Letter) = P(Letter | Grade)?
2. P(Letter) = P(Letter | Difficulty)?
3. P(Letter | Grade) = P(Letter | Grade, Difficulty)?
4. P(Grade | Difficulty) = P(Grade | Difficulty, Intelligence)?
5. P(Grade | Intelligence) = P(Grade | SAT, Intelligence)?
6. P(Difficulty) = P(Difficulty | Intelligence)?
7. P(Difficulty | Grade) = P(Difficulty | Grade, Intelligence)?
Answers: No, No, Yes, No, Yes, Yes, No (a numerical check follows below).
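These identities follow from d-separation in the DAG. A brute-force numerical check of two of them; the CPT numbers below are invented for illustration, and only the graph structure D → G ← I, I → S, G → L matters:

```python
import itertools

# Hypothetical CPTs for the student network (all variables binary).
P_D = [0.6, 0.4]
P_I = [0.7, 0.3]
P_G_given_DI = {(0, 0): [0.3, 0.7], (0, 1): [0.05, 0.95],
                (1, 0): [0.7, 0.3], (1, 1): [0.3, 0.7]}   # P(G | D, I)
P_S_given_I = {0: [0.95, 0.05], 1: [0.2, 0.8]}             # P(S | I)
P_L_given_G = {0: [0.9, 0.1], 1: [0.4, 0.6]}               # P(L | G)

def joint(d, i, g, s, l):
    return (P_D[d] * P_I[i] * P_G_given_DI[(d, i)][g]
            * P_S_given_I[i][s] * P_L_given_G[g][l])

def prob(**fixed):
    """Probability of a partial assignment, summing out the rest."""
    total = 0.0
    for d, i, g, s, l in itertools.product([0, 1], repeat=5):
        v = dict(d=d, i=i, g=g, s=s, l=l)
        if all(v[k] == x for k, x in fixed.items()):
            total += joint(d, i, g, s, l)
    return total

def cond(target, given):
    return prob(**target, **given) / prob(**given)

# Q3: P(L | G) == P(L | G, D)?  Yes: D is d-separated from L given G.
print(cond({'l': 1}, {'g': 1}), cond({'l': 1}, {'g': 1, 'd': 1}))
# Q7: P(D | G) == P(D | G, I)?  No: conditioning on the collider G
# makes D and I dependent.
print(cond({'d': 1}, {'g': 1}), cond({'d': 1}, {'g': 1, 'i': 1}))
```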
Conjugate prior
• If the likelihood is binomial and the prior is a Beta distribution, the posterior is also a Beta distribution.
• If the likelihood is Poisson, normal (for its precision), or exponential, and the prior is a Gamma distribution, the posterior is also a Gamma distribution.
• If the likelihood is normal and the prior on its mean is normal, the posterior is also normal.
• If the likelihood is multinomial and the prior is a Dirichlet distribution, the posterior is also a Dirichlet distribution.
• Loopy Belief Propagation. Belief Propagation, as introduced earlier, applies to graphs without cycles; Loopy BP simply runs the same algorithm whether or not the graph has cycles. (A small sketch of the underlying message passing follows below.)
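A minimal sum-product sketch on a tiny chain, with invented pairwise potentials; on a tree these messages give exact marginals, and LBP runs the identical updates on loopy graphs:

```python
import numpy as np

# Sum-product belief propagation on a 3-node chain x1 - x2 - x3 (binary).
psi12 = np.array([[1.0, 0.5],
                  [0.5, 2.0]])        # psi(x1, x2), hypothetical values
psi23 = np.array([[1.5, 0.2],
                  [0.2, 1.0]])        # psi(x2, x3), hypothetical values

# Messages into x2: m_{1->2}(x2) = sum_{x1} psi12(x1, x2), etc.
m12 = psi12.sum(axis=0)
m32 = psi23.sum(axis=1)
belief2 = m12 * m32                   # unnormalized marginal of x2
print("BP marginal of x2:", belief2 / belief2.sum())

# Brute-force check against the full joint psi12(x1,x2) * psi23(x2,x3).
joint = psi12[:, :, None] * psi23[None, :, :]
p2 = joint.sum(axis=(0, 2))
print("exact marginal of x2:", p2 / p2.sum())
```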
Approximate Inference
Learning
• Learning covers several settings: estimating the parameters, estimating the structure of the model, or both. Learning methods can be classified by whether all variables are observed and whether the structure is known (see the table below).
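The four standard cases, following the usual taxonomy (e.g. Koller & Friedman):

| Structure \ Data  | Fully observed                       | Partially observed |
|-------------------|--------------------------------------|--------------------|
| Known structure   | MLE / Bayesian parameter estimation  | EM algorithm       |
| Unknown structure | structure search / model selection   | structural EM      |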
Probabilistic Graphical Models
• Modeling (how to encode knowledge as a graph)
• Inference (given data and a known graph)
• Learning (given data, learn the parameters and/or the structure of the graph)
• Advantage of PGMs: knowledge meets data
• A Bayes network is a kind of probabilistic graphical model
《PRML》
Difficulty In Bayes
• MCMC (Markov Chain Monte Carlo)
Difficulty In Bayes
• Monte Carlo methods are very flexible and can be applied to a wide range of models. However, they are computationally intensive and have mainly been used for small-scale problems.
• For many Bayesian inference problems, however, the posterior distribution is so complex that the integrals have no closed-form solution and numerical methods are hard to apply; sometimes multiple integrals must be computed (e.g., when the posterior is multivariate).
Difficulty In Bayes
• In review: the practical application of Bayesian methods was for a long time severely limited by the difficulty of carrying through the full Bayesian procedure, particularly the need to marginalize (sum or integrate) over the whole parameter space, which is required in order to make predictions or to compare different models.
Conjugate prior
• The Beta distribution:

$$\mathrm{Beta}(x;\alpha,\beta)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,x^{\alpha-1}(1-x)^{\beta-1}$$

(Figure: the Beta(20, 20) density.)
• If the prior on the heads probability p is Beta(α, β) and we toss a coin, observing 10 heads and 5 tails, the posterior is Beta(α + 10, β + 5).
• Likelihood: binomial; prior: Beta (here α = β = 1, i.e. the uniform prior); the posterior is again a Beta distribution.

Prior: p ∼ Beta(α, β)
Likelihood: #heads in n tosses ∼ Binomial(n, p)
Posterior: p | data ∼ Beta(α + #heads, β + #tails)

So what happens as the sample size n → ∞? (The data overwhelm the prior, and the posterior concentrates at the empirical frequency.)
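A minimal sketch of this update in Python, assuming scipy is available; it reproduces the 10-heads/5-tails example above:

```python
from scipy import stats

# Conjugate Beta-Binomial update for the coin example on the slide.
alpha, beta = 1.0, 1.0          # uniform Beta(1, 1) prior
heads, tails = 10, 5

post = stats.beta(alpha + heads, beta + tails)   # posterior Beta(11, 6)
print("posterior mean:", post.mean())            # (alpha + heads) / 17 = 11/17
print("MLE (raw frequency):", heads / (heads + tails))
print("95% credible interval:", post.interval(0.95))
```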
Bayes Review
Why do we use Bayes in machine learning?
《Machine Learning》 [Mitchell]
Bayes' Theorem
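The theorem itself, for reference:

$$p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)}, \qquad p(D) = \int p(D \mid \theta)\, p(\theta)\, d\theta$$

That is, posterior ∝ likelihood × prior; the normalizing marginalization p(D) is what makes exact Bayesian computation hard.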
Content
• Bayes Review
• Conjugate Prior
• Difficulty In Bayes
• Bayes Network
Bayes Review
• Toss a coin three times; the result is heads, heads, heads.
• Frequentist: the maximum-likelihood estimate is P(heads) = 1, i.e. the coin is judged completely biased.
• Bayesian: place a prior on P(heads) and reason about the whole posterior (see below).
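A worked version of the Bayesian answer, assuming a uniform Beta(1, 1) prior (the slide leaves the prior choice open):

$$p(\theta \mid \text{HHH}) \propto \theta^{3} \;\Rightarrow\; \theta \mid \text{HHH} \sim \mathrm{Beta}(4, 1), \qquad \mathbb{E}[\theta \mid \text{HHH}] = \frac{4}{5}$$

Unlike the frequentist point estimate of 1, the posterior mean is 0.8, and the full posterior retains uncertainty about the coin.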
• Variational Inference. Variational inference takes a different approach: by restricting the family of approximating distributions, it obtains an approximate posterior that is only locally optimal but can be computed deterministically. The simplest variant is the mean-field approximation: fully decouple the nodes of the graphical model, treat them as mutually independent, introduce a variational parameter for each node, and iterate over the parameters to minimize the KL divergence between the approximate and the true distribution. (A sketch follows below.)
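A minimal mean-field sketch on the bivariate-Gaussian example from PRML (Section 10.1.2): the target p(z) = N(μ, Λ⁻¹) is approximated by a factorized q(z₁)q(z₂), whose coordinate updates have closed form. The μ and Λ values here are invented for illustration:

```python
import numpy as np

# Mean-field (CAVI) approximation of a 2-D Gaussian p(z) = N(mu, Lambda^-1)
# by a factorized q(z1) q(z2).  For a Gaussian target the optimal factors
# are Gaussian with variances 1/Lambda_ii and coupled means.
mu = np.array([1.0, -1.0])                   # target mean (example values)
Lam = np.array([[2.0, 0.8],                  # target precision matrix
                [0.8, 1.0]])

m1, m2 = 0.0, 0.0                            # initialize variational means
for _ in range(20):                          # coordinate ascent updates
    m1 = mu[0] - (Lam[0, 1] / Lam[0, 0]) * (m2 - mu[1])
    m2 = mu[1] - (Lam[1, 0] / Lam[1, 1]) * (m1 - mu[0])

print("variational means:", m1, m2)          # converge to the true means
# The variational variances 1/Lambda_ii underestimate the true marginal
# variances diag(Lambda^-1) -- a well-known mean-field artifact.
print("variational variances:", 1 / Lam[0, 0], 1 / Lam[1, 1])
print("true marginal variances:", np.diag(np.linalg.inv(Lam)))
```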
《PRML》
Difficulty In Bayes
• As we saw in the previous section, Bayesian statistics draws inferences about θ from the posterior distribution. In many cases this requires evaluating integrals. For example, to compute the expectation of a function g(θ):

$$E\left[g(\theta) \mid x\right] = \int g(\theta)\, f_{\theta \mid x}(\theta \mid x)\, d\theta$$

• where f_{θ|x} denotes the posterior density. When g(θ) = θ, this yields a point estimate of θ (the posterior mean).
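A minimal Monte Carlo sketch of this integral, reusing (as an assumption, for concreteness) the Beta(11, 6) posterior from the coin example with g(θ) = θ:

```python
import numpy as np

rng = np.random.default_rng(0)

# Approximate E[g(theta) | x] by averaging g over posterior samples.
# Here the posterior is Beta(11, 6) and g(theta) = theta, so the exact
# answer is the posterior mean 11/17.
samples = rng.beta(11, 6, size=100_000)
print("Monte Carlo estimate:", samples.mean())
print("exact posterior mean:", 11 / 17)
```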
Inference
• Exact Inference
• Approximate Inference
  – Loopy Belief Propagation
  – Variational Inference (e.g. Mean Field Approximation)
  – Monte Carlo Methods (e.g. Markov Chain Monte Carlo)
Approximate Inference
• Sampling (Monte Carlo) methods. The sampling approach approximates the true distribution by drawing a large number of samples. The simplest variant is importance sampling, which corrects samples drawn from a proposal distribution by weighting them. In high-dimensional spaces a more effective family is Markov Chain Monte Carlo (MCMC), which uses the properties of Markov chains to generate samples from a target distribution; examples include the Metropolis-Hastings algorithm and Gibbs sampling. (A minimal Metropolis-Hastings sketch follows below.)
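A minimal Metropolis-Hastings sketch with a Gaussian random-walk proposal; the target here is, as an assumption to stay with the running example, the unnormalized Beta(11, 6) posterior of the coin:

```python
import numpy as np

rng = np.random.default_rng(0)

def unnorm_post(theta):
    """Unnormalized posterior p^heads (1-p)^tails under a flat prior."""
    if theta <= 0.0 or theta >= 1.0:
        return 0.0
    return theta**10 * (1 - theta)**5

theta = 0.5                                   # initial state
samples = []
for _ in range(50_000):
    prop = theta + 0.1 * rng.standard_normal()        # propose a move
    accept = unnorm_post(prop) / unnorm_post(theta)   # MH ratio (symmetric proposal)
    if rng.random() < accept:                          # accept or reject
        theta = prop
    samples.append(theta)

burned = np.array(samples[5_000:])            # discard burn-in
print("MCMC posterior mean:", burned.mean())  # ~ 11/17 ~ 0.647
```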