sag-mg-recruit
Metagenomic read recruitment workflow developed by the Stepanauskas Group, used in Pachiadaki et al. 2017.
Available on C1 and C2 via SCGC's anaconda module.
You'll also need to load the dependencies flash and bwa.
To load everything into your environment:
module use /mod/scgc/
module load anaconda
module load flash
module load bwa
For instructions on how to run the workflow, type:
sag-mg-recruit --help
This should return something like:
Usage: sag-mg-recruit [OPTIONS] INPUT_MG_TABLE INPUT_SAG_TABLE
Options:
--outdir TEXT directory location to place output files
--cores INTEGER number of cores to run on [default: 8]
--mmd FLOAT for join step: mismatch density [default: 0.05]
--mino INTEGER for join step: minimum overlap [default: 35]
--maxo INTEGER for join step: maximum overlap [default: 150]
--minlen INTEGER for alignment and mg read count: minimum alignment
length to include; minimum read size to include
[default: 150]
--pctid INTEGER for alignment: minimum percent identity to keep
within overlapping region [default: 95]
--overlap INTEGER for alignment: percent read that must overlap with
reference sequence to keep [default: 0]
--log TEXT name of log file, else, log sent to standard out
--concatenate BOOLEAN include concatenated SAG in analysis [default: True]
--checkm BOOLEAN should checkm be run on the SAGs? [default: True]
--keep_coverage if you want to keep the genome coverage table (large)
-h, --help Show this message and exit.
Each run requires a table listing input metagenomes and a table listing input SAGs. Example input tables can be found here. Make sure you also specify a new directory for output files using the --outdir parameter.
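As a rough illustration of how the two tables and the output directory fit together, the sketch below assembles a command line using only the --outdir and --cores options listed in the help text. The file names mg_table.csv and sag_table.csv, the directory recruit_out, and the helper function recruit_cmd are hypothetical placeholders, not part of the tool. The function only prints the command so it can be reviewed before it is run.

```shell
#!/bin/bash
# Sketch only: recruit_cmd is a hypothetical helper, not part of sag-mg-recruit.
# It prints a command line built from options shown in `sag-mg-recruit --help`.
recruit_cmd() {
    local mg_table="$1"    # table listing input metagenomes
    local sag_table="$2"   # table listing input SAGs
    local outdir="$3"      # new directory for output files
    local cores="${4:-8}"  # falls back to the tool's documented default of 8
    printf 'sag-mg-recruit --outdir %s --cores %s %s %s\n' \
        "$outdir" "$cores" "$mg_table" "$sag_table"
}

# Placeholder names (assumptions); substitute your own tables and directory.
recruit_cmd mg_table.csv sag_table.csv recruit_out 16
```

Once the printed command looks right, it can be pasted into a shell in which the anaconda, flash and bwa modules are loaded.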
This workflow is not necessarily optimized for our current HPC environment, as it was written before the scheduler was installed. It runs metagenomic read recruitment to SAGs one pair at a time. Reasonable settings are 12 to 30 cores and a walltime between 24 hours and a week, depending on how many metagenomes and SAGs you are comparing and on the size of your input metagenomes.
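Because recruitment runs one metagenome-SAG pair at a time, the number of pairs is a useful first input when choosing a walltime. The sketch below counts the pairs implied by two input tables. It assumes each table has one header line and one metagenome or SAG per row, which may not match the real table layout. The function names and file names are hypothetical.

```shell
#!/bin/bash
# Sketch only: assumes one item per row and a single header line per table.

# Count data rows: non-blank lines minus header lines (default: 1 header line).
count_rows() {
    local table="$1" header="${2:-1}"
    local n
    n=$(grep -c '[^[:space:]]' "$table")
    echo $(( n > header ? n - header : 0 ))
}

# Number of metagenome-SAG pairs that would be processed one after another.
count_pairs() {
    echo $(( $(count_rows "$1") * $(count_rows "$2") ))
}

# Placeholder file names (assumptions); only runs if both tables exist.
if [ -f mg_table.csv ] && [ -f sag_table.csv ]; then
    echo "pairs to process: $(count_pairs mg_table.csv sag_table.csv)"
fi
```

The pair count says nothing about time per pair, which depends on metagenome size, so treat it as a scaling factor and not as a walltime estimate.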
Note that this workflow was designed for recruiting metagenomic reads generated by Illumina sequencers.