Abstract

Motivation: With the rapid development of next-generation sequencing, a large amount of data is now available for bioinformatics research. Meanwhile, many pipeline frameworks make it possible to analyse these data. However, these tools concentrate mainly on their syntax and design paradigms, and dispatch jobs based on users’ experience of the resources needed to execute a given step in a protocol. As a result, it is difficult for these tools to make full use of the available computing resources and to avoid errors caused by overload, such as memory overflow.

Results: Here, we have developed BioQueue, a web-based framework that contains a checkpoint before each step to automatically estimate the system resources (CPU, memory and disk) needed by the step and then dispatch jobs accordingly. BioQueue uses a shell command-like syntax instead of introducing a new scripting language, which means that most biologists without a programming background can use the efficient queue system with ease.

Accepted manuscript

Notes

  • The BioQueue open platform has migrated to open.bioqueue.org.
  • In newer versions of BioQueue, we modified the syntax, and wildcards are now encapsulated in {{xxx}} instead of {xxx}. For instance, the parameter for hisat2 in Table 1 should now be written as -p {{ThreadN}} --dta -x {{HISAT2_HG38}} -1 {{InputFile:1}} -2 {{InputFile:2}} -S {{EXP}}.sam.
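
For protocols written with the older single-brace wildcards, the change described in the note above can be applied mechanically. The following is a minimal sketch, not part of BioQueue itself: the helper name `upgrade_wildcards` is hypothetical, and the old-style string in the usage example is an assumption, obtained by removing one layer of braces from the hisat2 parameter quoted above.

```python
import re

# Matches a single-brace wildcard such as {ThreadN} or {InputFile:1},
# but skips wildcards that are already written as {{...}}.
_SINGLE_BRACE = re.compile(r"(?<!\{)\{([^{}]+)\}(?!\})")


def upgrade_wildcards(parameter: str) -> str:
    """Rewrite {xxx} wildcards as {{xxx}} (hypothetical helper).

    Strings that already use the double-brace form are returned unchanged,
    so the function can safely be applied more than once.
    """
    return _SINGLE_BRACE.sub(r"{{\1}}", parameter)


if __name__ == "__main__":
    # Assumed old-style parameter, derived from the example in the note.
    old = "-p {ThreadN} --dta -x {HISAT2_HG38} -1 {InputFile:1} -2 {InputFile:2} -S {EXP}.sam"
    print(upgrade_wildcards(old))
```

The lookbehind and lookahead in the pattern keep the conversion idempotent, which matters when a protocol mixes steps written for older and newer versions. Shell constructs that use single braces for other purposes, such as `${VAR}`, would also be rewritten, so the result is worth checking by hand.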