Oozie Workflow Framework User Guide
October 2013
◦ System Installation ◦ Job Design ◦ Oozie Overview ◦ WorkFlow Design ◦ Coordinator Design ◦ Reference
Requirements
◦ Cloudera Manager ◦ CDH
Oozie
Job Designer Job Design Example
◦ Fs Action ◦ MapReduce Action
Oozie
WorkFlow
◦ A workflow scheduler system to manage Apache Hadoop jobs
◦ Workflow jobs are Directed Acyclic Graphs (DAGs) of actions
◦ Coordinator jobs are recurrent workflow jobs triggered by time (frequency) and data availability
◦ Action types: MR / FS / Email / Shell / Ssh / Hive / Pig / Sqoop / Distcp / Java
Coordinator
Integrated
Scalable Reliable Extensible
Does not support cycles
Workflow Nodes
◦ Control Flow Nodes ◦ Workflow Action Nodes
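The "no cycles" rule above can be illustrated with a small check. This is a minimal sketch in Python, not Oozie's own validation code: it assumes the workflow is given as a plain dictionary mapping each node name to the nodes it transitions to, and the node names (`start`, `mr1`, `check`, `mr2`, `end`) are hypothetical.

```python
from typing import Dict, List


def find_cycle(transitions: Dict[str, List[str]]) -> List[str]:
    """Return one cycle as a list of node names, or [] if the graph is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in transitions}
    path: List[str] = []

    def visit(node: str) -> List[str]:
        color[node] = GREY
        path.append(node)
        for nxt in transitions.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path, so we walked in a circle
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return []

    for node in list(transitions):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return []


# Hypothetical workflow graphs used only for illustration
valid = {"start": ["mr1"], "mr1": ["check"], "check": ["mr2", "end"],
         "mr2": ["end"], "end": []}
looping = {"start": ["mr1"], "mr1": ["mr2"], "mr2": ["mr1"]}

print(find_cycle(valid))    # []
print(find_cycle(looping))  # ['mr1', 'mr2', 'mr1']
```

A definition like `looping` is the kind of graph a DAG-based workflow cannot express; repetition is handled by the Coordinator instead.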
WorkFlow Recovery
Oozie
◦ Home page ◦ Documents
Hue
◦ Home page ◦ Tutorials
Expression Language
◦ Tutorials
◦ oozie.wf.rerun.failnodes ◦ oozie.wf.rerun.skip.nodes
Workflow definitions can be parameterized.
When a workflow node is executed by Oozie, all the ELs are resolved into concrete values.
EL expressions can be used in the configuration values of action and decision nodes.
◦ Workflow Job Properties (or Parameters) ◦ Expression Language Functions
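To make the idea of resolving parameters into concrete values more tangible, here is a minimal sketch in Python. It is not Oozie's EL evaluator: it only substitutes simple `${name}` references from a dictionary of job properties and does not handle EL functions. The property names `nameNode` and `inputDir` and their values are hypothetical.

```python
import re
from typing import Dict

# Matches simple ${name} references; EL function calls are out of scope here
_PARAM = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_.]*)\}")


def resolve(value: str, job_properties: Dict[str, str]) -> str:
    """Replace each ${name} in value with job_properties[name]."""
    def substitute(match):
        name = match.group(1)
        if name not in job_properties:
            # Fail loudly rather than leave an unresolved placeholder behind
            raise KeyError(f"unresolved parameter: {name}")
        return job_properties[name]

    return _PARAM.sub(substitute, value)


# Hypothetical job properties used only for illustration
props = {"nameNode": "hdfs://localhost:8020", "inputDir": "/user/demo/input"}
print(resolve("${nameNode}${inputDir}", props))
# hdfs://localhost:8020/user/demo/input
```

The same definition can then be reused with different property values, which is the point of parameterizing it.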
Example workflow (flowchart): Start → Map Reduce1 → decision: output > 500 lines? → (yes) Map Reduce2 → End
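The example flow above (Start, Map Reduce1, a check on whether the output exceeds 500 lines, Map Reduce2, End) can be simulated with a short Python sketch. It only models the control flow and does not run Hadoop jobs. The two callables are hypothetical stand-ins for the MapReduce actions, and routing the "no" branch straight to End is an assumption, since the source shows only the "yes" branch.

```python
from typing import Callable, List


def run_workflow(map_reduce1: Callable[[], int],
                 map_reduce2: Callable[[], None],
                 threshold: int = 500) -> List[str]:
    """Run the two-step flow and return the names of the nodes visited."""
    visited = ["Start"]
    output_lines = map_reduce1()      # first job reports its output line count
    visited.append("Map Reduce1")
    if output_lines > threshold:      # decision: output > 500 lines?
        map_reduce2()
        visited.append("Map Reduce2")
    # Assumption: when the check fails, the flow goes directly to End
    visited.append("End")
    return visited


print(run_workflow(lambda: 501, lambda: None))
# ['Start', 'Map Reduce1', 'Map Reduce2', 'End']
print(run_workflow(lambda: 500, lambda: None))
# ['Start', 'Map Reduce1', 'End']
```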
The Coordinator executes workflow jobs that are:
◦ Recurrent ◦ Interdependent
Coordinator triggers are based on:
◦ Time intervals ◦ Data availability ◦ Time intervals and/or data availability
◦ Datetime, Frequency and Time-Period ◦ Coordinator Action ◦ Parameterization of Coordinator
Coordinator Design
Periodically create a folder
◦ Start time:
◦ End time:
◦ Frequency: once every 3 minutes
◦ Folder name: the nominal time formatted as yyyy-MM-ddTHH-mm
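The folder-naming rule in this example can be sketched in Python to show which names a 3-minute frequency would produce. This is an illustration of the schedule arithmetic only, not a coordinator definition. The start and end times below are hypothetical because the source leaves them blank, and treating the end time as exclusive is an assumption of this sketch.

```python
from datetime import datetime, timedelta
from typing import Iterator


def nominal_times(start: datetime, end: datetime,
                  every_minutes: int = 3) -> Iterator[datetime]:
    """Yield one nominal time per interval, from start up to (not including) end."""
    step = timedelta(minutes=every_minutes)
    current = start
    while current < end:  # assumption: the end time is exclusive
        yield current
        current += step


def folder_name(nominal: datetime) -> str:
    """Format a nominal time as yyyy-MM-ddTHH-mm."""
    return nominal.strftime("%Y-%m-%dT%H-%M")


# Hypothetical time window used only for illustration
start = datetime(2013, 10, 1, 0, 0)
end = datetime(2013, 10, 1, 0, 10)
for t in nominal_times(start, end):
    print(folder_name(t))
# 2013-10-01T00-00
# 2013-10-01T00-03
# 2013-10-01T00-06
# 2013-10-01T00-09
```

The names are derived from the nominal (scheduled) time rather than the wall-clock time at which the action actually runs, so they stay stable even if an action is delayed.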
◦ Install ◦ Config
Hue
◦ Install ◦ Config (2 items) ◦ Basic Operation
Action Overview
◦ An execution/computation task (a Map-Reduce job, a Pig job, a shell command). It can also be referred to as a task or an 'action node'.