TY - JOUR
T1 - Efficient data consolidation in grid networks and performance analysis
AU - Kokkinos, P.
AU - Christodoulopoulos, K.
AU - Varvarigos, E.
PY - 2011/2
Y1 - 2011/2
AB - We examine a task scheduling and data migration problem for grid networks, which we refer to as the Data Consolidation (DC) problem. DC arises when a task concurrently requests multiple pieces of data, possibly scattered throughout the grid network, that must be present at a selected site before the task's execution starts. In such a case, the scheduler and the data manager must select (i) the data replicas to be used, (ii) the site where these data will be gathered for the task to be executed, and (iii) the routing paths to be followed, assuming that the selected datasets are transferred concurrently to the execution site. The algorithms or policies for selecting the data replicas, the data consolidation site and the corresponding paths comprise a Data Consolidation scheme. We propose and experimentally evaluate several DC schemes that require a polynomial number of operations and that attempt to estimate the cost of the concurrent data transfers, to avoid the congestion that may arise due to these transfers, and to provide fault tolerance. Our simulation results strengthen our belief that DC is an important problem that needs to be addressed in the design of data grids and, if performed efficiently, can lead to significant benefits in terms of task delay, network load and other performance parameters.
KW - Data migration
KW - Grids
KW - Routing
KW - Task scheduling
UR - http://www.scopus.com/inward/record.url?scp=77958008748&partnerID=8YFLogxK
U2 - 10.1016/j.future.2010.08.005
DO - 10.1016/j.future.2010.08.005
M3 - Article
AN - SCOPUS:77958008748
SN - 0167-739X
VL - 27
SP - 182
EP - 194
JO - Future Generation Computer Systems
JF - Future Generation Computer Systems
IS - 2
ER -