
Oozie Workflow Framework User Guide

October 2013



◦ System Installation
◦ Job Design
◦ Oozie Overview
◦ WorkFlow Design
◦ Coordinator Design
◦ Reference

Requirements
◦ Cloudera Manager ◦ CDH

Oozie

Job Designer Job Design Example
◦ Fs Action ◦ MapReduce Action

Oozie

WorkFlow
◦ A workflow scheduler system to manage Apache Hadoop jobs
◦ Workflow jobs are Directed Acyclic Graphs (DAGs) of actions
◦ Coordinator jobs are recurrent workflow jobs triggered by time (frequency) and data availability
◦ Supported action types: MR, FS, Email, Shell, Ssh, Hive, Pig, Sqoop, Distcp, Java

Coordinator


Integrated
Scalable Reliable Extensible


Does not support cycles
Workflow Nodes
◦ Control Flow Nodes ◦ Workflow Action Nodes
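
Since a workflow may not loop back on itself, a definition can be checked for cycles before it is submitted. The following is a minimal sketch of such a check, assuming the workflow is modelled as a plain dict that maps each node name to the nodes it can transition to. The `has_cycle` helper and the node names are hypothetical and not part of Oozie.

```python
def has_cycle(transitions):
    """Return True if the transition graph contains a cycle (depth-first search)."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    color = {node: WHITE for node in transitions}

    def visit(node):
        color[node] = GREY
        for nxt in transitions.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # We reached a node that is still on the current path: a cycle.
                return True
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(transitions))


# Hypothetical workflows: one valid DAG, one that loops back on itself.
valid = {
    "start": ["mr1"],
    "mr1": ["decision"],
    "decision": ["mr2", "end"],
    "mr2": ["end"],
    "end": [],
}
looping = {"start": ["mr1"], "mr1": ["mr2"], "mr2": ["mr1"]}

print(has_cycle(valid), has_cycle(looping))
```

The recursive search is adequate for workflows of ordinary size; a very deep graph would call for an iterative variant.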

WorkFlow Recovery

Oozie
◦ Home page ◦ Documents
Hue
◦ Home page ◦ Tutorials

Expression Language
◦ Tutorials
◦ oozie.wf.rerun.failnodes ◦ oozie.wf.rerun.skip.nodes





◦ Workflow definitions can be parameterized
◦ When a workflow node is executed by Oozie, all EL expressions are resolved into concrete values
◦ EL expressions can be used in the configuration values of action and decision nodes
◦ Workflow Job Properties (or Parameters)
◦ Expression Language Functions
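
The idea of resolving parameters into concrete values can be illustrated with a small stand-in. This is only a sketch, assuming simple `${name}` references looked up in a dict of job properties; it does not model Oozie's real EL engine or its EL functions, and the `resolve` helper, property names, and host name are hypothetical.

```python
import re

# Matches simple references such as ${nameNode}; EL functions are not handled.
_EL_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_.]*)\}")


def resolve(value, properties):
    """Replace every ${name} in value with properties[name]; fail on unknown names."""
    def substitute(match):
        name = match.group(1)
        if name not in properties:
            raise KeyError(f"unresolved parameter: {name}")
        return properties[name]

    return _EL_PATTERN.sub(substitute, value)


# Hypothetical job properties, as they might appear in a job.properties file.
job_properties = {"nameNode": "hdfs://namenode.example:8020", "outputDir": "out"}
resolved = resolve("${nameNode}/user/demo/${outputDir}", job_properties)
print(resolved)
```

Raising an error on an unknown name, rather than leaving the placeholder in place, is a design choice of this sketch so that a missing property is noticed early.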
Example workflow: Start → Map Reduce1 → decision (output > 500 lines) → Map Reduce2 → End

Coordinators execute workflow jobs that are:
◦ Recurrent ◦ Interdependent

Coordinators are triggered based on:
◦ Time intervals ◦ Data availability ◦ Time intervals and/or data availability



◦ Datetime, Frequency and Time-Period
◦ Coordinator Action
◦ Parameterization of Coordinator
◦ Coordinator Design

Create a folder periodically
◦ Start time:
◦ End time:
◦ Frequency: once every 3 minutes
◦ Folder name: the nominal time formatted as yyyy-MM-ddTHH-mm
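
The folder names such a coordinator would produce can be previewed with a short sketch. It assumes a hypothetical start and end time (the source leaves both unspecified) and treats the end time as exclusive, which is an assumption of this example; `nominal_times` is an illustrative helper, not an Oozie API.

```python
from datetime import datetime, timedelta


def nominal_times(start, end, frequency):
    """Yield each nominal time from start (inclusive) up to end (exclusive)."""
    current = start
    while current < end:
        yield current
        current += frequency


# Hypothetical window; the source does not give the start or end time.
start = datetime(2013, 10, 1, 0, 0)
end = datetime(2013, 10, 1, 0, 10)

# %Y-%m-%dT%H-%M mirrors the yyyy-MM-ddTHH-mm pattern from the example.
folders = [
    t.strftime("%Y-%m-%dT%H-%M")
    for t in nominal_times(start, end, timedelta(minutes=3))
]
print(folders)
```

With this window the sketch yields four names, one for each 3-minute tick from 00:00 through 00:09.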
◦ Install ◦ Config

Hue
◦ Install ◦ Config (2 items) ◦ Basic Operation

Action Overview
◦ An execution/computation task (Map-Reduce job, Pig job, a shell command). It can also be referred to as a task or 'action node'.