sag-mg-recruit
Metagenomic read recruitment workflow developed by the Stepanauskas Group, used in Pachiadaki et al. 2017.
Extensive instructions can be found on the package's GitHub page: sag-mg-recruit
Available on C1 and C2 via SCGC's anaconda module.
You'll also need to load the dependencies flash and bwa.
To load everything into your environment:
module use /mod/scgc/
module load anaconda
module load flash
module load bwa
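After loading the modules, it can help to confirm that the tools are actually on your PATH before starting a long run. The following is a minimal sketch; the helper name `check_tools` is hypothetical, and it assumes the executables are named `sag-mg-recruit`, `flash`, and `bwa`, matching the module and package names.

```shell
# Report which of the given executables are available on PATH.
# Returns non-zero if any of them is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Executable names are assumed to match the module/package names.
check_tools sag-mg-recruit flash bwa || echo "Load the modules listed above and try again." >&2
```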
For usage instructions, run sag-mg-recruit --help, which should return something like:
Usage: sag-mg-recruit [OPTIONS] INPUT_MG_TABLE INPUT_SAG_TABLE
Options:
  --outdir TEXT          directory location to place output files
  --cores INTEGER        number of cores to run on [default: 8]
  --mmd FLOAT            for join step: mismatch density [default: 0.05]
  --mino INTEGER         for join step: minimum overlap [default: 35]
  --maxo INTEGER         for join step: maximum overlap [default: 150]
  --minlen INTEGER       for alignment and mg read count: minimum alignment
                         length to include; minimum read size to include
                         [default: 150]
  --pctid INTEGER        for alignment: minimum percent identity to keep
                         within overlapping region [default: 95]
  --overlap INTEGER      for alignment: percent read that must overlap with
                         reference sequence to keep [default: 0]
  --log TEXT             name of log file, else, log sent to standard out
  --concatenate BOOLEAN  include concatenated SAG in analysis [default: True]
  --checkm BOOLEAN       should checkm be run on the SAGs? [default: True]
  --keep_coverage        if you want to keep the genome coverage table (large)
  -h, --help             Show this message and exit.
Each run requires a table listing input metagenomes and a table listing input SAGs. Example input tables can be found here. Make sure you also specify a new directory for output files using the --outdir parameter.
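As a rough illustration of how the two tables and the --outdir parameter fit together, the sketch below assembles a full command line and prints it for review. The helper name `recruit_cmd` and the file names in the usage comment are hypothetical; only the options shown in the help text above are used.

```shell
# Print a sag-mg-recruit command line for review.
#   $1 = metagenome table, $2 = SAG table,
#   $3 = new output directory, $4 = number of cores
# The log file is written next to (not inside) the output directory.
recruit_cmd() {
    printf 'sag-mg-recruit --outdir %s --cores %s --log %s %s %s\n' \
        "$3" "$4" "$3.log" "$1" "$2"
}

# Example with placeholder file names (replace with your own tables):
#   recruit_cmd metagenomes.csv sags.csv recruitment_out 12
```

Once the printed command looks right, it can be pasted into a shell or a job script and run as-is.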
This workflow was written before the scheduler was installed, so it is not necessarily optimized for our current HPC environment. It recruits metagenomic reads to SAGs one metagenome-SAG pair at a time. A reasonable starting point is 12 to 30 cores, with a walltime between 24 hours and a week, depending on how many metagenomes and SAGs you are comparing and on the size of your input metagenomes.
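Because pairs are processed sequentially, run time grows roughly with the number of metagenomes multiplied by the number of SAGs. The sketch below counts the pairs in a planned run to help pick a walltime. The helper names are hypothetical, and it assumes each table has one header line followed by one entry per row; check the example tables for the actual layout.

```shell
# Count data rows in a table, skipping the first (header) line and blank lines.
# The one-header-line layout is an assumption, not confirmed by the docs.
count_rows() {
    awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "$1"
}

# Number of metagenome-SAG pairs that would be processed one after another.
#   $1 = metagenome table, $2 = SAG table
count_pairs() {
    echo $(( $(count_rows "$1") * $(count_rows "$2") ))
}

# Example with placeholder file names:
#   count_pairs metagenomes.csv sags.csv
```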
Note that this workflow was designed for recruiting metagenomic reads generated by Illumina sequencers.