CloudWF: A computational workflow system for clouds based on Hadoop

Chen Zhang, Hans De Sterck

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

35 Citations (Scopus)

Abstract

This paper describes the design and implementation of CloudWF, a scalable and lightweight computational workflow system for clouds, built on top of Hadoop. CloudWF can run workflow jobs composed of multiple Hadoop MapReduce or legacy programs. Its novelty lies in several aspects: a simple workflow description language that encodes workflow blocks and block-to-block dependencies separately as standalone executable components; a new workflow storage method that uses Hadoop HBase sparse tables to store workflow information internally and to reconstruct workflow block dependencies implicitly for efficient workflow execution; transparent file staging with Hadoop DFS; and decentralized workflow execution management that relies on the MapReduce framework for task scheduling and fault tolerance.
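To make the sparse-table idea concrete, the Java sketch below shows one plausible way a workflow block and its dependencies could be stored in HBase: one row per block, with a sparse "dep" column family holding one column per predecessor block, so scanning a row's "dep" columns recovers that block's dependencies without a separate graph structure. This is an illustrative assumption only, not the paper's actual schema; the table name, column families, row keys, and file names are hypothetical, and the sketch uses the modern HBase client API rather than the 2009-era one.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical sketch of HBase-backed workflow storage: one row per
// workflow block; a sparse "dep" column family with one column per
// predecessor block encodes the dependency structure implicitly.
public class WorkflowStoreSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("workflow"))) {
      // Block B of workflow wf1 runs a MapReduce job and depends on block A.
      Put blockB = new Put(Bytes.toBytes("wf1.blockB"));
      blockB.addColumn(Bytes.toBytes("info"), Bytes.toBytes("cmd"),
          Bytes.toBytes("hadoop jar analyze.jar"));
      // Sparse dependency column: the qualifier names the predecessor
      // block; the value names the file staged from A's output to B's
      // input through HDFS.
      blockB.addColumn(Bytes.toBytes("dep"), Bytes.toBytes("wf1.blockA"),
          Bytes.toBytes("output.dat"));
      table.put(blockB);
    }
  }
}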

Original language: English
Title of host publication: Cloud Computing - First International Conference, CloudCom 2009, Proceedings
Pages: 393-404
Number of pages: 12
Volume: 5931 LNCS
DOIs
Publication status: Published - 2009
Externally published: Yes
Event: IEEE International Conference on Cloud Computing Technology and Science 2009 - Beijing, China
Duration: 1 Dec 2009 - 4 Dec 2009
Conference number: 1st

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5931 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: IEEE International Conference on Cloud Computing Technology and Science 2009
Abbreviated title: CloudCom 2009
Country: China
City: Beijing
Period: 1/12/09 - 4/12/09