Trinity is a software package that combines three independent software modules (Inchworm, Chrysalis, and Butterfly) to process large volumes of RNA-seq reads. On large data sets, running Trinity from beginning to end may exceed the walltime limit for a single job. Trinity provides a mechanism to run the workflow in four separate steps. Each step can be run as its own job, which works around the single-job walltime limit. This page describes how to run Trinity in this manner under the SLURM scheduler and provides example submit scripts.
Generally, the same Trinity command is run for each step, aside from one option that determines how far Trinity will progress before stopping. On the last step, the Trinity command is run as normal, with no stop option. For example:
# Step 1
Trinity.pl <options> --no_run_chrysalis
# Step 2
Trinity.pl <options> --no_run_quantifygraph
# Step 3
Trinity.pl <options> --no_run_butterfly
# Step 4
Trinity.pl <options>
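Because the four commands differ only in the trailing stop option, the mapping from step number to option can be kept in one place. The following is a minimal sketch of that idea. The function name stop_flag is hypothetical (it is not part of Trinity), and the option names are the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper: map a step number (1-4) to the Trinity option
# that stops the run at the end of that step.
stop_flag() {
    case "$1" in
        1) echo "--no_run_chrysalis" ;;
        2) echo "--no_run_quantifygraph" ;;
        3) echo "--no_run_butterfly" ;;
        4) echo "" ;;   # last step: run Trinity as normal, no stop option
        *) echo "unknown step: $1" >&2; return 1 ;;
    esac
}
```

A submit script could then end its Trinity command with $(stop_flag "$STEP"), left unquoted so that the empty result for step 4 expands to nothing.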
SLURM submit scripts that will request 16 CPUs and 200GB of RAM for each step are given as examples.
#!/bin/sh
#SBATCH --job-name=trinity_step1
#SBATCH --time=168:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --mem=200gb
#SBATCH --output=trinity_step1.stdout
#SBATCH --error=trinity_step1.stderr

module load trinity/r2013-02-25 bowtie/1.0.0

Trinity.pl --output trinity_out --seqType fq --JM 200G --left leftreads.fastq \
    --right rightreads.fastq --CPU $SLURM_NTASKS_PER_NODE --inchworm_cpu $SLURM_NTASKS_PER_NODE \
    --bflyCPU $SLURM_NTASKS_PER_NODE --no_run_chrysalis
#!/bin/sh
#SBATCH --job-name=trinity_step2
#SBATCH --time=168:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --mem=200gb
#SBATCH --output=trinity_step2.stdout
#SBATCH --error=trinity_step2.stderr

module load trinity/r2013-02-25 bowtie/1.0.0

Trinity.pl --output trinity_out --seqType fq --JM 200G --left leftreads.fastq \
    --right rightreads.fastq --CPU $SLURM_NTASKS_PER_NODE --inchworm_cpu $SLURM_NTASKS_PER_NODE \
    --bflyCPU $SLURM_NTASKS_PER_NODE --no_run_quantifygraph
#!/bin/sh
#SBATCH --job-name=trinity_step3
#SBATCH --time=168:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --mem=200gb
#SBATCH --output=trinity_step3.stdout
#SBATCH --error=trinity_step3.stderr

module load trinity/r2013-02-25 bowtie/1.0.0

Trinity.pl --output trinity_out --seqType fq --JM 200G --left leftreads.fastq \
    --right rightreads.fastq --CPU $SLURM_NTASKS_PER_NODE --inchworm_cpu $SLURM_NTASKS_PER_NODE \
    --bflyCPU $SLURM_NTASKS_PER_NODE --no_run_butterfly
#!/bin/sh
#SBATCH --job-name=trinity_step4
#SBATCH --time=168:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --mem=200gb
#SBATCH --output=trinity_step4.stdout
#SBATCH --error=trinity_step4.stderr

module load trinity/r2013-02-25 bowtie/1.0.0

Trinity.pl --output trinity_out --seqType fq --JM 200G --left leftreads.fastq \
    --right rightreads.fastq --CPU $SLURM_NTASKS_PER_NODE --inchworm_cpu $SLURM_NTASKS_PER_NODE \
    --bflyCPU $SLURM_NTASKS_PER_NODE
The job dependency feature of SLURM can be used to run each step sequentially as the previous step completes. All four jobs can be submitted at once and they will run in the proper order without needing any further interaction from the user. The job ID of each step is used in the submit command for the next to order the jobs. Assuming the four scripts above are saved in the working directory with the input dataset, they would be submitted as follows:
$ sbatch trinity_step1.submit
Submitted batch job 366910
$ sbatch -d afterok:366910 trinity_step2.submit
Submitted batch job 366911
$ sbatch -d afterok:366911 trinity_step3.submit
Submitted batch job 366912
$ sbatch -d afterok:366912 trinity_step4.submit
Submitted batch job 366913
The -d afterok:<jobid> option instructs SLURM to run the submitted job only if the specified job completes successfully, that is, with an exit code of zero. If Trinity exits with an error code on one step, SLURM will not run the next step.
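Copying each job ID by hand into the next submit command can be avoided with a small wrapper. The sketch below assumes an sbatch that supports the --parsable option, which prints only the job ID (optionally followed by ";cluster") instead of the "Submitted batch job" message. The function name submit_chain is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: submit the given scripts in order, making each job
# depend (afterok) on the job submitted just before it.
submit_chain() {
    prev=""
    for script in "$@"; do
        if [ -z "$prev" ]; then
            # First job in the chain has no dependency
            jobid=$(sbatch --parsable "$script") || return 1
        else
            jobid=$(sbatch --parsable -d "afterok:$prev" "$script") || return 1
        fi
        # --parsable may print "jobid;cluster"; keep only the job ID
        jobid=${jobid%%;*}
        echo "$script -> job $jobid"
        prev=$jobid
    done
}
```

With the four scripts from this page, the call would be: submit_chain trinity_step1.submit trinity_step2.submit trinity_step3.submit trinity_step4.submit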
Tips: Checking Job Status
1. Check the status of your jobs:
$ squeue -u <username>
Output:
[@login.tusker ~]$ squeue -u
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
 426290     batch trinity_          PD       0:00      1 (Dependency)
 426291     batch trinity_          PD       0:00      1 (Dependency)
 426289     batch trinity_           R   10:33:59      1 c2417
2. Check a specific job, for example the job with ID 426289:
$ scontrol show job 426289
[@login.tusker ~]$ scontrol show job 426289
JobId=426289 Name=trinity_step2
   UserId= (3557) GroupId= (11156)
   Priority=30208 Account= QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
   RunTime=10:38:38 TimeLimit=7-00:00:00 TimeMin=N/A
   SubmitTime=2013-08-19T15:12:44 EligibleTime=2013-08-21T00:36:51
   StartTime=2013-08-21T00:37:09 EndTime=2013-08-28T00:37:09
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=batch AllocNode:Sid=login:62036
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=c2417
   BatchHost=c2417
   NumNodes=1 NumCPUs=16 CPUs/Task=1 ReqS:C:T=*:*:*
   MinCPUsNode=16 MinMemoryNode=250G MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/lustre/work/entomology/hwang4/WCR_RNAseq_2013/Fallarmyworm/trinity_step2.submit
   WorkDir=/lustre/work/entomology/hwang4/WCR_RNAseq_2013/Fallarmyworm
3. Check your job history since a given date, for example all jobs run since 08-14-2013 (the -S start time is given here in MMDDYY form):
$ sacct -u <username> -S 081413 -o JobID,JobName%30,State,ExitCode,Start,End,Elapsed
Output:
       JobID                        JobName      State ExitCode               Start                 End    Elapsed
------------ ------------------------------ ---------- -------- ------------------- ------------------- ----------
      382339                  trinity_step1  COMPLETED      0:0 2013-08-13T09:47:18 2013-08-13T22:03:39   12:16:21
382339.batc+                          batch  COMPLETED      0:0 2013-08-13T09:47:18 2013-08-13T22:03:39   12:16:21
      382846                  trinity_step2 CANCELLED+      0:0 2013-08-13T22:03:39 2013-08-14T15:40:45   17:37:06
      426288                  trinity_step1    RUNNING      0:0 2013-08-20T15:24:23             Unknown   00:14:21
      426289                  trinity_step2    PENDING      0:0             Unknown             Unknown   00:00:00
      426290                  trinity_step3    PENDING      0:0             Unknown             Unknown   00:00:00
      426291                  trinity_step4    PENDING      0:0             Unknown             Unknown   00:00:00
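When a chain has many entries in the history, it can help to list only the jobs that neither completed nor are still queued or running. The sketch below assumes sacct is run with -P (pipe-delimited output), -n (no header) and -X (one line per job, without job steps), and that the State field is the third column. The function name unfinished_jobs is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: read pipe-delimited "JobID|JobName|State" lines and
# print the job ID and state of every job that is not COMPLETED, RUNNING,
# or PENDING (for example CANCELLED, FAILED, or TIMEOUT).
unfinished_jobs() {
    awk -F'|' '$3 != "COMPLETED" && $3 != "RUNNING" && $3 != "PENDING" { print $1, $3 }'
}

# Intended use (not run here):
#   sacct -u <username> -S 081413 -X -n -P -o JobID,JobName,State | unfinished_jobs
```

Any job reported by this filter marks the point where the dependency chain stopped, so the chain would need to be resubmitted from that step.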