6 Metabolic pathway

1. BlastKOALA

BlastKOALA is an automatic annotation servers for genome and metagenome sequences, which perform KO (KEGG Orthology) assignments to characterize individual gene functions and reconstruct KEGG pathways, BRITE hierarchies and KEGG modules to infer high-level functions of the organism or the ecosystem.

BlastKOALA takes as input amino acid sequences in FASTA format. We therefore use prodigal to translate nucleic acid sequences to the corresponding peptide sequences.By default prodigal takes as input FASTA files with the .fna extension.

Use perl to add the name of the sample to the beginning of every file and change the file type to .fna.

for i in *.fa ; do  perl -lne 'if(/^>(\S+)/){ print ">$ARGV $1"} else{ print }' $i > $i.fna ; done

Replace space between the name of you bins and the scaffhold to underscores

for i in *.fna ; do sed -i 's/ /_/g' $i ; done

Run prodigal

for i in *.fna ; do prodigal -i $i -o output.txt -a $i.faa ; done

Remove all the characters after the first space in the header

for i in *.faa; do sed -i 's/\s.*$//' $i; done

Transfer the files to your local computer and follow the instructions on the BlastKOALA website for submission.

2. kofamscan

Move or copy the generated amino acids (.faa) to the user genomics2 and log in to titan under this user
Activate the environnement

conda activate kofamscan

Execute the annotation using a foor loop and nohup

for i in *.faa; do exec_annotation -p /home/SCRIPT/kofamscan/profiles -k /home/SCRIPT/kofamscan/ko_list \
--cpu 40 -f detail-tsv --tmp-dir kofamscan-tmp -o ~/$i $i ; done

Resulting file will have the .tsv extension and can easily be parsed using R.

** 3. Metabolic **

User genomics2

conda activate METABOLIC_v4.0
# Run this command using nohup with a script savec in the metabolic directory
perl METABOLIC-G.pl -in-gn /home/genomics2/karine/ARC_NANO_FAA/ -o /home/genomics2/karine/arc_metabolic_out

3. Other programs we aren’t using anymore

DRAM

DRAM (Distilled and Refined Annotation of Metabolism) is a tool for annotating metagenomic assembled genomes and VirSorter identified viral contigs. DRAM annotates MAGs and viral contigs using KEGG (if provided by the user), UniRef90, PFAM, dbCAN, RefSeq viral, VOGDB and the MEROPS peptidase database as well as custom user databases.

Activate environment

conda activate DRAM

Execute DRAM. Create a bash script called run_dram.sh with the following commands and execute the script using nohup.Note : DRAM requires the full path (home/genomics/…) and won’t accept path starting from the home directory (~).

DRAM.py annotate -i '/home/genomics/yourname/samples/*.fa' -o /home/genomics/yourname/samples/annotation --threads 40

Once annotation is done, the following command will summarize all the results from the folder annotate

DRAM.py distill -i annotation/annotations.tsv -o summaries --trna_path annotation/trnas.tsv --rrna_path annotation/rrnas.tsv

Metabolic

METABOLIC (METabolic And BiogeOchemistry anaLyses In miCrobes) enables the prediction of metabolic and biogeochemical functional trait profiles to any given genome datasets.

Activate environment

conda activate metabolic

METABOLIC takes as input amino acid sequences in FASTA format. Create a bash script called run_metabolic.sh with the following commands and execute the script using nohup. Before running update the path to reflect the actual path where your amino acid FASTA files are located and where you want METABOLIC to store the output files.

#!/bin/bash
perl /home/genomics/METABOLIC/METABOLIC-G.pl -in /home/genomics/user/07_metabolic_pathway/ -o /home/genomics/user/07_metabolic_pathway/metabolic_outut/

MEBS

[MEBS] (https://github.com/valdeanda/mebs)(Multigenomic Entropy-Based Score) allows the user to synthesizes genomic information into a single informative value. This entropy score can be used to infer the likelihood that microbial taxa perform specific metabolic-biogeochemical pathways.

For the moment MEBS shoudl be installed on the user’s local computer. See manual for installation and instructions.