Video Big Data: Technology Trends and Practice
Surprising results are happening
The intersection of deep and reinforcement learning continues
If you aren't using batch normalization you should
Source: Brad Neuberg, 10 Deep Learning Trends at NIPS 2015, December 2015
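Source: Bidirectional Long-Short Term Memory for Video Description, June 2016
Deep Learning Technology Trends
Basic network data
Online video data contains more than program content: it also includes personal identity information, location information, contact lists, application preferences, and possibly account information.
Physical space
Real world
Cyber world
Information space
Source: Wu Hequan, "Development and Application of Video Big Data", March 2016
Conventional Processing of Video Data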
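Image recognition, image description, reference frames, image localization, image generation, speech-to-text, subtitle separation, OCR, feature transformation, spatio-temporal information, file information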
Model                        METEOR
S2VT-unidirectional          28.7
S2VT-bidirectional           28.6
S2VT-BiLSTM reinforced       29.5
Joint-LSTM unidirectional    29.5
Joint-LSTM bidirectional     29.9
Joint-BiLSTM reinforced      30.3
Uni: a person is cutting a potato. Bi: a person is slicing a potato. Re: a person is peeling a potato.
Uni: a man is playing. Bi: a man is playing a piano. Re: a man is playing the piano.
Architectures more complex and sophisticated
All the cool kids are using LSTMs
Attention models are showing up
Neural Turing Machines remain interesting
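[Figure: video description network. Labels: CNN (VGG); Forward Pass; Backward Pass; Forward Unit (FU); Backward Unit (BU); Merge (MU); Sentence Unit (SU); Visual Model; Language Model. The sentence units emit the caption word by word: <BOS> a man is riding … <EOS>.]
METEOR (%) of the six compared models, in listed order (S2VT-unidirectional through Joint-BiLSTM reinforced): 28.7, 28.6, 29.5, 29.5, 29.9, 30.3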
Uni: a man is riding a bike. Bi: a man is riding on a motorcycle. Re: a man is riding a motorcycle.
Uni: a man is playing on a stage. Bi: a group of a man is dancing. Re: a group of people are dancing.
Deep Learning for Video Analysis
Source: Translating Videos to Natural Language Using Deep Recurrent Neural Networks, April 2015
Deep Learning for Video Analysis
Legend: FU = Forward Unit; MU = Merge Unit; SU = Sentence Unit
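The forward, backward, and merge units named in the legend can be illustrated with a small sketch. It is not the paper's implementation: it assumes a plain tanh recurrent cell in place of an LSTM, random weights, and merging by concatenation. The function names (`rnn_pass`, `bidirectional_encode`, `random_params`) and the 4-dimensional frame features are hypothetical stand-ins for CNN (VGG) features.

```python
import math
import random

def rnn_pass(frames, w_in, w_rec):
    """Run a toy tanh recurrent cell over the frames; return one hidden state per frame."""
    hidden = [0.0] * len(w_rec)
    states = []
    for x in frames:
        # New state depends on the current frame and the previous hidden state.
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * hi for w, hi in zip(w_rec[j], hidden))
            )
            for j in range(len(w_rec))
        ]
        states.append(hidden)
    return states

def bidirectional_encode(frames, fwd_params, bwd_params):
    """Forward pass, backward pass, then a merge step (assumed here: concatenation)."""
    forward = rnn_pass(frames, *fwd_params)
    # Process frames last-to-first, then flip back so states align with frame order.
    backward = rnn_pass(frames[::-1], *bwd_params)[::-1]
    return [f + b for f, b in zip(forward, backward)]

def random_params(rng, input_dim, hidden_dim):
    """Hypothetical random weights; a trained model would learn these."""
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(input_dim)] for _ in range(hidden_dim)]
    w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)] for _ in range(hidden_dim)]
    return w_in, w_rec

rng = random.Random(0)
frames = [[rng.random() for _ in range(4)] for _ in range(5)]  # 5 frames, 4 features each
fwd = random_params(rng, 4, 3)
bwd = random_params(rng, 4, 3)
merged = bidirectional_encode(frames, fwd, bwd)
print(len(merged), len(merged[0]))  # 5 6
```

Each merged state combines context from earlier frames (forward half) and later frames (backward half), which is the motivation for reading the video in both directions before generating the sentence.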
Contents
01 Overview of Technology Development
02 General Processing Framework
03 Practical Examples
04 Live Q&A
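Classification of Video Resources in the Ternary Space
Ternary data
Network level
  Self-media data: social networks, blogs, microblogs
  Log data: search engines, online shopping, financial payments
  Rich-media data: text, audio and video, pictures, photos
  Signaling, positioning, billing
Physical level: sensors, barcodes, QR codes
Social level: government, enterprises and public institutions, print media
Human cognition
Social space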
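Title, keywords, username, IP address
Video content mining
Video association analysis
Capture date, video duration, running time, surveillance camera ID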
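As a rough illustration of how the file-level metadata above could feed association analysis, the sketch below stores each video as a record and links videos that share a keyword. The record layout, field names, sample values, and the keyword-matching rule are all assumptions made for this example; the slides do not specify a schema or an algorithm.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List

@dataclass
class VideoRecord:
    # Hypothetical field names mirroring the metadata items listed above.
    title: str
    keywords: List[str]
    username: str
    ip_address: str
    capture_date: date
    duration_s: float
    camera_id: str

def link_by_keyword(records: List[VideoRecord]) -> Dict[str, List[str]]:
    """Map each keyword to the titles of the videos that carry it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for rec in records:
        for kw in rec.keywords:
            index[kw].append(rec.title)
    # Keep only keywords shared by at least two videos.
    return {kw: titles for kw, titles in index.items() if len(titles) > 1}

# Made-up sample records for illustration only.
records = [
    VideoRecord("clip-a", ["traffic", "night"], "user1", "192.0.2.1", date(2016, 3, 1), 42.0, "cam-01"),
    VideoRecord("clip-b", ["traffic"], "user2", "192.0.2.2", date(2016, 3, 2), 17.5, "cam-02"),
    VideoRecord("clip-c", ["sports"], "user1", "192.0.2.1", date(2016, 3, 2), 90.0, "cam-01"),
]
links = link_by_keyword(records)
print(links)  # {'traffic': ['clip-a', 'clip-b']}
```

The same grouping could be keyed on username, IP address, or camera ID instead of keywords; content-level mining (recognition, OCR, speech-to-text) would add further fields to link on.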
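Source: Microsoft, HAL, NBA, Karen's, etc.
Deep Learning for Image Description
Machine learning algorithms developed by engineers at Google and Microsoft can automatically generate image captions that accurately describe the content of an image.
Source: Google, 2014; Microsoft, 2015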
Table 2: Comparing with several state-of-the-art models (reported in percentage, higher is better).
Model                               METEOR
LSTM                                26.9
S2VT [23]
  -RGB (VGG)                        29.2
  -RGB (VGG)+Flow (AlexNet)         29.8
LSTM-E (VGG) [15]                   29.5
LSTM-E (C3D) [15]                   29.9
Yao et al. [32]                     29.6
Joint-LSTM unidirectional (ours)    29.5
Joint-BiLSTM reinforced (ours)      30.3
Deep learning for CV and NLP are cross-hybridizing each other
Neural network research and productionisation go hand in hand
Symbolic differentiation is becoming even more important