
Understanding Machine Learning from One Perspective: Mathematical Engineering - Zhihua Zhang

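➢ Feature engineering: going from data to a representation requires deep domain knowledge.
➢ Natural language processing (NLP) requires a strong background in linguistics;
➢ Vision and image tasks rely on cognitive science, neuroscience, and related fields to obtain representations.
➢ Rule-based models are effective for shallow reasoning but cannot support deep reasoning.
➢ Searching a large rule space often leads to the "curse of dimensionality".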
Statistical Machine Learning
[Diagram labels: data]
Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. NIPS 2012.
Deep Residual Network
• Easy to optimize • Enable very deep structures
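A minimal sketch of why a residual structure can be stacked very deep: a residual block returns x + F(x), so when the residual branch F outputs zero the block is exactly the identity map. The two-layer form of F, the toy weights, and the function names are illustrative assumptions, not taken from the slides.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, a) for a in v]


def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def residual_block(x, w1, w2):
    """Return x + F(x), with an assumed residual branch F(x) = W2 relu(W1 x)."""
    fx = matvec(w2, relu(matvec(w1, x)))
    return [xi + fi for xi, fi in zip(x, fx)]


x = [1.0, -2.0, 3.0]
w1 = [[0.5, -1.0, 0.2], [0.1, 0.3, -0.7], [1.0, 0.0, 0.4]]  # arbitrary toy weights
w2_zero = [[0.0] * 3 for _ in range(3)]  # residual branch switched off

# Stack 100 blocks whose residual branch is zero: the signal passes through unchanged.
h = x
for _ in range(100):
    h = residual_block(h, w1, w2_zero)
print(h)
```

With a plain (non-residual) layer, zero weights would instead wipe out the input, which is one intuition for why the skip connection helps depth.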
➢ Back-Propagation for Gradient computation (Rumelhart et al., 1986)
➢ Stochastic Gradient Descent with Momentum (Polyak, 1964) and Adam (Kingma and Ba, 2014)
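The two update rules named above can be sketched as follows. This is an illustration under stated assumptions: the objective is a toy quadratic, the gradient is exact rather than stochastic, and the hyperparameter values and function names are chosen only for the example.

```python
import math


def f(x):
    """Toy objective f(x) = 0.5 * (x0^2 + 10 * x1^2), minimized at the origin."""
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)


def grad(x):
    """Exact gradient of f (a real SGD run would use a noisy minibatch estimate)."""
    return [x[0], 10.0 * x[1]]


def sgd_momentum(x, lr=0.05, beta=0.9, steps=300):
    """Heavy-ball momentum: v <- beta * v - lr * g, then x <- x + v."""
    v = [0.0] * len(x)
    for _ in range(steps):
        g = grad(x)
        v = [beta * vi - lr * gi for vi, gi in zip(v, g)]
        x = [xi + vi for xi, vi in zip(x, v)]
    return x


def adam(x, lr=0.05, b1=0.9, b2=0.999, eps=1e-8, steps=300):
    """Adam: bias-corrected running averages of the gradient and its square."""
    m = [0.0] * len(x)
    s = [0.0] * len(x)
    for t in range(1, steps + 1):
        g = grad(x)
        m = [b1 * mi + (1 - b1) * gi for mi, gi in zip(m, g)]
        s = [b2 * si + (1 - b2) * gi * gi for si, gi in zip(s, g)]
        m_hat = [mi / (1 - b1 ** t) for mi in m]
        s_hat = [si / (1 - b2 ** t) for si in s]
        x = [xi - lr * mh / (math.sqrt(sh) + eps)
             for xi, mh, sh in zip(x, m_hat, s_hat)]
    return x


x0 = [3.0, -2.0]
print(f(x0), f(sgd_momentum(x0)), f(adam(x0)))
```

Adam rescales each coordinate by its own gradient magnitude, so the badly scaled x1 direction does not force a smaller step for x0.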
◆ “Machine learning explores the study and construction of algorithms that can learn from and make predictions on data.”
[Diagram labels: data; representation and features; prediction and decision]
Machine Learning: Rule-Based Learning
[Diagram labels: semantics; data; cognition; decision]
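Deep Reinforcement Learning
Idea: use rules, or rewards obtained by interacting with the environment, to formulate a learning and optimization problem
1. Stochastic programming, optimal control, and reinforcement learning
2. Reinforcement learning, combined with deep learning, has been revived under the name "deep reinforcement learning"
3. MDPs or optimal control theory are a potential mathematical tool for studying deep learning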
➢ Markov Decision Processes
➢ Bellman Optimality Equation
➢ Banach Fixed Point Theory
➢ Value Iteration and Policy Iteration
➢ Q-Learning
➢ Deep Q-Networks
➢ Policy Gradient Methods (Actor-Critic)
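Several of the topics listed above fit together in one small example: value iteration repeatedly applies the Bellman optimality operator, which is a contraction for a discount factor below 1, so by the Banach fixed point theorem the iterates converge to the unique optimal value function. The sketch below assumes a hypothetical two-state, two-action MDP invented for illustration.

```python
def value_iteration(transitions, gamma=0.9, tol=1e-10):
    """Iterate V <- max_a sum_s' P(s'|s,a) * (r + gamma * V(s')) to a fixed point.

    transitions[s][a] is a list of (probability, next_state, reward) outcomes.
    Returns the value function and a greedy policy.
    """
    v = [0.0] * len(transitions)
    while True:
        q = [[sum(p * (r + gamma * v[s2]) for p, s2, r in outcomes)
              for outcomes in actions]
             for actions in transitions]
        new_v = [max(qs) for qs in q]
        delta = max(abs(a - b) for a, b in zip(new_v, v))
        v = new_v
        if delta < tol:
            break
    policy = [max(range(len(qs)), key=qs.__getitem__) for qs in q]
    return v, policy


# Hypothetical MDP: state 1 pays reward 2 per step for staying;
# state 0 pays nothing for staying and reward 1 for moving to state 1.
toy_mdp = [
    [[(1.0, 0, 0.0)], [(1.0, 1, 1.0)]],  # state 0: action 0 = stay, action 1 = go
    [[(1.0, 1, 2.0)], [(1.0, 0, 0.0)]],  # state 1: action 0 = stay, action 1 = back
]
values, policy = value_iteration(toy_mdp)
print(values, policy)
```

Here the fixed point can be checked by hand: V(1) = 2 / (1 - 0.9) = 20 and V(0) = 1 + 0.9 * 20 = 19.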
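Generative Adversarial Networks
Structure of the generative adversarial model (Goodfellow et al., 2014)
Generator and Discriminator: a discriminator is introduced to judge whether an input image is real or generated. The generator's goal is to make the discriminator unable to tell real images from generated ones, and it adjusts its parameters toward that goal; the discriminator's goal is to distinguish real images from generated ones as well as possible and avoid mistakes.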
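The two competing goals described above are usually written as a minimax game over the value V(D, G) = E_data[log D(x)] + E_gen[log(1 - D(x))]. The sketch below evaluates this value on small discrete distributions, which are an assumption made purely for illustration (real GANs work with images and neural networks). For a fixed generator the best discriminator is D*(x) = p_data(x) / (p_data(x) + p_gen(x)), and the value at D* bottoms out at -log 4 exactly when the generator matches the data.

```python
import math


def optimal_discriminator(p_data, p_gen):
    """Best response of the discriminator to a fixed generator distribution."""
    return [pd / (pd + pg) for pd, pg in zip(p_data, p_gen)]


def gan_value(p_data, p_gen, d):
    """V(D, G) = sum_x p_data(x) log D(x) + p_gen(x) log(1 - D(x))."""
    total = 0.0
    for pd, pg, dx in zip(p_data, p_gen, d):
        if pd > 0:
            total += pd * math.log(dx)
        if pg > 0:
            total += pg * math.log(1.0 - dx)
    return total


p_data = [0.1, 0.2, 0.3, 0.4]  # hypothetical "real" distribution
p_gen = [0.4, 0.3, 0.2, 0.1]   # hypothetical generator distribution

v_mismatch = gan_value(p_data, p_gen, optimal_discriminator(p_data, p_gen))
v_matched = gan_value(p_data, p_data, optimal_discriminator(p_data, p_data))
v_blind = gan_value(p_data, p_gen, [0.5] * 4)  # discriminator that always says 0.5
print(v_mismatch, v_matched, v_blind)
```

When the generator matches the data, the optimal discriminator outputs 0.5 everywhere, which is the "cannot tell real from generated" state the generator aims for.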
equivariant representation
Bayesian Perspective: Hierarchical Bayesian Model
➢ Syntactic pattern recognition: syntactic pattern recognition, or structural pattern recognition, is a form of pattern recognition in which each object can be represented by a variable-cardinality set of symbolic, nominal features. (King-Sun Fu 傅京孙)
➢ Deep Architecture for Representation (2006)
➢ Convolutional Structure (LeCun, 1989) and Pooling (Zhou and Chellappa, 1988)
➢ Rectified Linear Unit (ReLU) (2009, 2010)
➢ Deep models
➢ Computer vision: ImageNet
➢ GPU implementation
Geoffrey E. Hinton and R R Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks. Science, 2006.
AlexNet
[Diagram labels: rules; representation and features; logic; prediction and decision]
Idea: formalize human cognition of the target so that rule-based reasoning arises naturally
Traditional approach: rule-based learning
➢ Expert systems: an expert system is divided into two subsystems, the knowledge base and the inference engine. The knowledge base represents facts and rules. The inference engine applies the rules to the known facts to deduce new facts. (Edward Feigenbaum)
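The split between knowledge base and inference engine can be sketched with forward chaining, one common way to apply rules to known facts until nothing new can be deduced. The facts and rules below are hypothetical toy data, and `forward_chain` is an illustrative name rather than part of any real expert-system shell.

```python
def forward_chain(facts, rules):
    """Inference engine: fire every rule whose premises are all known,
    add its conclusion, and repeat until no new fact appears."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known


# Knowledge base: each rule is (set of premises, conclusion).
RULES = [
    (frozenset({"has_fur"}), "mammal"),
    (frozenset({"mammal", "eats_meat"}), "carnivore"),
    (frozenset({"carnivore", "has_stripes"}), "tiger"),
]

derived = forward_chain({"has_fur", "eats_meat", "has_stripes"}, RULES)
print(sorted(derived))
```

Each pass scans the whole rule list, which hints at why searching a large rule space becomes expensive.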
⚫ Computability / Tractability
⚫ Stability: model robustness, adversarial robustness, algorithmic well-posedness, data privacy
⚫ Interpretability
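Machine Learning: Mathematical Engineering
➢ Statistics provides a data-driven modeling approach to problem solving
➢ Probability theory, stochastic analysis, differential equations, differentiable manifolds, and other tools…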
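Representation Learning
➢ The key to machine learning is representation learning
➢ A representation needs to be suited to prediction
➢ A representation needs to be suited to computation
➢ Challenges of deep representations
➢ The need for big data may lead to over-parameterization
➢ Multi-layer representations make the problem highly non-convex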
Fundamental Principles of Machine Learning
⚫ Predictability / Generalization: model generalization, algorithm generalization
[Diagram labels: semantics and cognition; model; representation; computation; rules; logic; prediction and decision]
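Idea: use a powerful nonlinear learning model to reduce the role played by the data-to-representation step
Statistical Machine Learning: A Golden Decade
➢ Statistical learning: statistical modeling + algorithmic computation
➢ SVM, Boosting, sparse learning, kernel machines, nonparametric Bayes, etc.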
➢ Different layers can implement different tasks such as dimensionality reduction, local approximation, local average, etc.
➢ Feedback can avoid saturation.
➢ Layer-by-layer computation fits GPU computing very well.
➢ Convolution: sparse interaction, parameter sharing, and equivariant representation
➢ Batch Normalization (Ioffe and Szegedy, 2015)
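The convolution properties listed above can be checked on a tiny one-dimensional example. The sketch assumes a circular (wrap-around) boundary so that shift equivariance holds exactly; the signal, the kernel, and the function names are illustrative only.

```python
def circular_conv(x, w):
    """Cross-correlate signal x with kernel w, wrapping around at the ends.
    Each output uses only len(w) inputs (sparse interaction) and the same
    kernel weights at every position (parameter sharing)."""
    n = len(x)
    return [sum(w[k] * x[(i + k) % n] for k in range(len(w))) for i in range(n)]


def shift(x, s):
    """Cyclically shift a signal s positions to the right."""
    n = len(x)
    return [x[(i - s) % n] for i in range(n)]


x = [1, 4, 2, 8, 5, 7, 3, 6]
w = [1, -2, 1]  # 3 shared weights, versus 8 * 8 = 64 for a dense layer

# Equivariance: shifting the input and then convolving gives the same
# result as convolving and then shifting the output.
left = circular_conv(shift(x, 2), w)
right = shift(circular_conv(x, w), 2)
print(left == right)
```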
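Unsupervised Deep Learning: Generative Models
[Diagram labels: machine learning; data; prediction and decision]
Idea: formulate the unsupervised problem as a learning and optimization process similar to the supervised one
Unsupervised Deep Learning: Generative Models
Real data (sample points)
Generative model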
-- Over 100 layers for ImageNet model
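He K, Zhang X, Ren S, et al. Deep residual learning for image recognition.
Idea: the multigrid method
Neural Machine Translation: Encoder-Decoder Architecture
1. The basic units can be words, subword units, or characters, e.g. English words or Chinese characters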
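As a rough illustration of the choice of basic unit, the sketch below splits an English sentence into words and a Chinese phrase into characters, then maps the units to the integer ids an encoder would consume. The function names are hypothetical, and subword segmentation is left out because it needs a vocabulary learned from data.

```python
def word_units(sentence):
    """Word-level units: split on whitespace (suits space-delimited languages)."""
    return sentence.split()


def char_units(sentence):
    """Character-level units: one unit per non-space character."""
    return [ch for ch in sentence if not ch.isspace()]


def to_ids(units):
    """Map each distinct unit to an integer id, in order of first appearance."""
    vocab = {}
    for u in units:
        vocab.setdefault(u, len(vocab))
    return [vocab[u] for u in units], vocab


english = word_units("machine learning is fun")
chinese = char_units("机器学习")
english_ids, english_vocab = to_ids(english)
print(english, chinese, english_ids)
```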
➢ In machine learning, one cares not only about fitting performance but also about performance on the test set (generalization).
➢ For pure optimization, minimizing the objective function is a goal in and of itself.
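The contrast between the two goals above can be made concrete with a toy regression problem. The data, the noise values, and both models are assumptions chosen for illustration: a "memorizer" that returns the label of the nearest training point drives the training objective to zero, while a least-squares line fits the training set less well but does better on held-out points.

```python
def mse(predict, xs, ys):
    """Mean squared error of a predictor on a dataset."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def fit_memorizer(xs, ys):
    """Return the label of the nearest training point (zero training error)."""
    return lambda x: min(zip(xs, ys), key=lambda p: abs(p[0] - x))[1]


# Training labels follow y = 2x plus fixed, hand-picked noise.
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3]
train_x = [float(k) for k in range(8)]
train_y = [2.0 * x + e for x, e in zip(train_x, noise)]
# Test points lie between the training points and use the noise-free target.
test_x = [k + 0.25 for k in range(8)]
test_y = [2.0 * x for x in test_x]

line = fit_line(train_x, train_y)
memo = fit_memorizer(train_x, train_y)
line_train, line_test = mse(line, train_x, train_y), mse(line, test_x, test_y)
memo_train, memo_test = mse(memo, train_x, train_y), mse(memo, test_x, test_y)
print(line_train, line_test, memo_train, memo_test)
```

Judged purely as optimization, the memorizer wins on the training objective; judged as learning, the line wins on the test set.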
Boosting, Bagging: variance reduction, acceleration
⚫ Adaptive techniques
Why Deep?
➢ Shallow nets cannot provide localized approximation, but deep nets provide localized approximation [Chui et al. 1994]
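Understanding Machine Learning from One Perspective:
Mathematical Engineering
Zhihua Zhang
School of Mathematical Sciences, Peking University
Beijing Academy of Artificial Intelligence, 2019.05.09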
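The science of machine learning, neuroscience
➢ Cognitive science
➢ Machine learning: an intersection of computer science, mathematics, statistics, and other fields
Machine Learning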
◆ “Unlike artificial intelligence, machine learning aims to not mimic human thoughts and behaviors but to improve experience and interaction.”
Image source: /xavigiro/deep-learning-for-computer-vision-generative-models-and-adversarial-training-upc-2016
Deep Reinforcement Learning
[Diagram labels: machine learning; rules; reinforcement learning; supervised learning; data; prediction and decision]
➢ Deep nets (at least two hidden layers) have universal approximation with finitely many neurons, but shallow nets have universal approximation with possibly infinitely many neurons [Maiorov and Pinkus, 1999; Ismailov, 2016]
