TY - JOUR
T1 - A scheduling algorithm for computational grids that minimizes centralized processing in genome assembly of next-generation sequencing data
AU - Lima, Jakelyne
AU - Cerdeira, Louise Teixeira
AU - Bol, Erick
AU - Schneider, Maria Paula Cruz
AU - Silva, Artur
AU - Azevedo, Vasco
AU - Abelém, Antônio Jorge Gomes
PY - 2012/12/1
Y1 - 2012/12/1
N2 - Improvements in genome sequencing techniques have resulted in the generation of huge volumes of data. As a consequence of this progress, the genome assembly stage demands even more computational power, since the incoming sequence files contain large amounts of data. To speed up the process, it is often necessary to distribute the workload among a group of machines. However, this requires hardware and software solutions specially configured for this purpose. Grid computing tries to simplify this process of aggregating resources, but does not always offer the best possible performance because of the heterogeneity and decentralized management of its resources. Thus, it is necessary to develop software that takes these peculiarities into account. To this end, we developed an algorithm that optimizes the operation of the de novo assembly software ABySS in grids. We ran ABySS with and without our algorithm in the grid simulator SimGrid. Tests showed that our algorithm is viable, flexible, and scalable even in a heterogeneous environment, and that it improved genome assembly time in computational grids without changing assembly quality.
AB - Improvements in genome sequencing techniques have resulted in the generation of huge volumes of data. As a consequence of this progress, the genome assembly stage demands even more computational power, since the incoming sequence files contain large amounts of data. To speed up the process, it is often necessary to distribute the workload among a group of machines. However, this requires hardware and software solutions specially configured for this purpose. Grid computing tries to simplify this process of aggregating resources, but does not always offer the best possible performance because of the heterogeneity and decentralized management of its resources. Thus, it is necessary to develop software that takes these peculiarities into account. To this end, we developed an algorithm that optimizes the operation of the de novo assembly software ABySS in grids. We ran ABySS with and without our algorithm in the grid simulator SimGrid. Tests showed that our algorithm is viable, flexible, and scalable even in a heterogeneous environment, and that it improved genome assembly time in computational grids without changing assembly quality.
KW - Computational grids
KW - Genome assembly
KW - NGS
KW - Task scheduling
UR - http://www.scopus.com/inward/record.url?scp=84876039509&partnerID=8YFLogxK
U2 - 10.3389/fgene.2012.00038
DO - 10.3389/fgene.2012.00038
M3 - Article
AN - SCOPUS:84876039509
VL - 3
JO - Frontiers in Genetics
JF - Frontiers in Genetics
SN - 1664-8021
IS - MAR
M1 - Article 38
ER -