AAI Calculation
CompareM is a toolbox for comparative genomics. We use it to calculate the average amino acid identity (AAI) between genomes. It is installed in a conda environment within SCGC's anaconda3 module, so to access it, run:
module use /mod/scgc
module load anaconda3
source activate comparem
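After activating the environment, it can be useful to confirm that the `comparem` executable is actually on your `PATH` before starting a long job. The following is a minimal sketch; `check_tool` is a hypothetical helper name introduced here for illustration, not part of CompareM or the SCGC modules.

```shell
# Hypothetical helper: report whether a command is available on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at: $(command -v "$1")"
    else
        echo "$1 not found; is the conda environment active?" >&2
        return 1
    fi
}

# Check for comparem; print a hint if it is missing.
check_tool comparem || echo "Re-run the module and activate commands, then try again."
```

If the check fails, the most likely cause is that the `source activate comparem` step was skipped or run in a different shell session.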
To see what comparem can do, type:
$ comparem -h
Output:
              ...::: CompareM v0.0.23 :::...

  Common workflows:
    aai_wf        -> Calculate AAI between all pairs of genomes
                       (runs call_genes => similarity => aai)
    classify_wf   -> Identify similar genomes based on AAI values
                       (runs call_genes => similarity => classify)

  Gene prediction:
    call_genes    -> Identify genes within genomes

  Gene homology and genome similarity:
    similarity    -> Perform reciprocal sequence similarity search between proteins
    aai           -> Calculate AAI between all pairs of genomes
    classify      -> Identify similar genomes based on AAI value

  Usage profiles:
    aa_usage      -> Calculate amino acid usage within each genome
    codon_usage   -> Calculate codon usage within each genome
    kmer_usage    -> Calculate kmer usage within each genome
    stop_usage    -> Calculate stop codon usage within each genome

  Lateral gene transfer:
    lgt_di        -> Calculate dinuceotide (3rd,1st) usage of genes to identify putative LGT events
    lgt_codon     -> Calculate codon usage of genes to identify putative LGT events

  Visualization and exploration:
    diss          -> Calculate the dissimilarity between usage profiles
    hclust        -> Perform hierarchical clustering

  Use: comparem <command> -h for command specific help.

  Feature requests or bug reports can be sent to Donovan Parks ([email protected])
    or posted on GitHub (https://github.com/dparks1134/comparem).
For instructions on CompareM's AAI calculation workflow, type:
$ comparem aai_wf -h
Output:
usage: comparem aai_wf [-h] [-e EVALUE] [-p PER_IDENTITY] [-a PER_ALN_LEN]
                       [-x FILE_EXT] [--proteins] [--force_table FORCE_TABLE]
                       [--blastp] [--sensitive] [--keep_headers] [--keep_rbhs]
                       [--tmp_dir TMP_DIR] [-c CPUS] [--silent]
                       input_files output_dir

Calculate AAI between all pairs of genomes

positional arguments:
  input_files           genome files
  output_dir            output directory

optional arguments:
  -h, --help            show this help message and exit
  -e, --evalue EVALUE   e-value cutoff for identifying initial blast hits
                        (default: 0.001)
  -p, --per_identity PER_IDENTITY
                        percent identity for defining homology (default: 30.0)
  -a, --per_aln_len PER_ALN_LEN
                        percent alignment length of query sequence for
                        defining homology (default: 70.0)
  -x, --file_ext FILE_EXT
                        extension of files to process (default: fna)
  --proteins            indicates the input files contain protein sequences
  --force_table FORCE_TABLE
                        force use of specific translation table
  --blastp              use blastp instead of diamond
  --sensitive           use sensitive mode of DIAMOND
  --keep_headers        indicates FASTA headers already have the format
                        <genome_id>~<gene_id>
  --keep_rbhs           create file with reciprocal best hits
  --tmp_dir TMP_DIR     specify alternative directory for temporary files
                        (default: /tmp)
  -c, --cpus CPUS       number of CPUs to use (default: 1)
  --silent              suppress output
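Putting the options above together, a typical run might look like the sketch below. The directory names `genomes` and `aai_output` and the CPU count are assumptions chosen for illustration, and `run` is a hypothetical dry-run wrapper, not a CompareM feature. It prints the command first and only executes it when `RUN=1` is set, so you can review the flags before committing compute time. The sketch also assumes `input_files` can be given as a directory of genome FASTA files; check `comparem aai_wf -h` on your install if that does not work.

```shell
# Assumed, illustrative values: adjust to your own data.
genome_dir="genomes"      # directory of nucleotide FASTA files ending in .fna
out_dir="aai_output"      # directory where CompareM writes its results
cpus=8                    # number of CPUs to pass to -c

# Hypothetical dry-run wrapper: echo the command, run it only if RUN=1.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# -x and -c are taken from the aai_wf help text; fna is already the default
# extension and is spelled out here only for clarity.
run comparem aai_wf -x fna -c "$cpus" "$genome_dir" "$out_dir"
```

To execute rather than preview, save the snippet as a script and invoke it with `RUN=1` in the environment. If your inputs are protein FASTA files instead of nucleotide sequences, the help text indicates adding `--proteins` and setting `-x` to the matching extension.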