4 Taxonomic profiles

Barrnap

1. Running Barrnap

Barrnap (BAsic Rapid Ribosomal RNA Predictor) predicts the location of ribosomal RNA genes in genomes. It supports bacteria (5S, 23S, 16S), archaea (5S, 5.8S, 23S, 16S), metazoan mitochondria (12S, 16S) and eukaryotes (5S, 5.8S, 28S, 18S). Barrnap takes DNA sequences in FASTA format as input and can be run on both the assembled community contigs (directory 04_contigs) and the MAGs (directory 06_mags).

  1. Add the sample name to the beginning of every scaffold ID and write the result to a new file with the extension .fna (equivalent to .fasta; the different extension simply distinguishes the new files, whose scaffold IDs carry the sample name, from the original files). The prefix is taken from the file name, so it includes the .fasta extension, and any description after the first space in a header is dropped.
for i in *.fasta ; do perl -lne 'if(/^>(\S+)/){ print ">$ARGV $1" } else { print }' "$i" > "$i.fna" ; done
  2. Change the space between the sample name and the scaffold ID to an underscore.
sed -i 's/ /_/g' *.fna
  3. Run Barrnap. Create a bash script called run_barrnap.sh with the following commands and execute it with nohup (for example, nohup bash run_barrnap.sh &) so that it keeps running after you log out.
#!/bin/bash
for i in *.fna ; do barrnap --threads 20 "$i" > "$i.barrnap_hits.gff" ; done
  • Output: for each input file, Barrnap generates a GFF file containing the coordinates of the predicted rRNA genes.
  • To extract the ribosomal sequences as FASTA, we use the getfasta command from bedtools.

2. Extract sequences

The bedtools utilities are a Swiss-army knife of tools for a wide range of genomics analysis tasks. Here we use the getfasta command to extract the sequence of each interval defined in the GFF3 file generated by Barrnap from the corresponding FASTA file.

#!/bin/bash
for i in *.fna ; do bedtools getfasta -fi "$i" -bed "$i.barrnap_hits.gff" -fo "$i.out_rRNA.fasta" ; done
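
With the options used here, getfasta is expected to label each extracted sequence by scaffold and coordinates only, so the rRNA type (5S, 16S, 23S, ...) has to be looked up in the GFF file. The following sketch counts hits per rRNA type from the Name= attribute in column 9. It assumes the attribute layout Barrnap normally writes (Name=16S_rRNA;product=...); the toy GFF lines are made up for illustration.

```shell
# Toy GFF in Barrnap's layout; scaffold names and coordinates are hypothetical.
gff=$(mktemp)
printf '##gff-version 3\n' > "$gff"
printf 'sampleA_scaffold_1\tbarrnap:0.9\trRNA\t100\t1640\t0\t+\t.\tName=16S_rRNA;product=16S ribosomal RNA\n' >> "$gff"
printf 'sampleA_scaffold_1\tbarrnap:0.9\trRNA\t2000\t4900\t0\t+\t.\tName=23S_rRNA;product=23S ribosomal RNA\n' >> "$gff"
printf 'sampleA_scaffold_7\tbarrnap:0.9\trRNA\t50\t160\t1.5e-12\t-\t.\tName=5S_rRNA;product=5S ribosomal RNA\n' >> "$gff"

# Skip comment lines, take the value of the first attribute (Name=...),
# and count how often each rRNA type occurs.
counts=$(awk -F'\t' '!/^#/ { split($9, a, /[=;]/); n[a[2]]++ }
                     END { for (k in n) print k "\t" n[k] }' "$gff" | sort)
echo "$counts"
```

On real data, replace "$gff" with one of the .barrnap_hits.gff files. Columns 1, 4 and 5 of the same file give the scaffold and coordinates needed to match a BLAST result back to its rRNA type.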

Transfer the .out_rRNA.fasta files to your local computer and use NCBI BLAST to classify the rRNA sequences identified by Barrnap.

MetaPhlAn

conda activate mpa

metaphlan metagenome.fastq --input_type fastq -o profiled_metagenome.txt
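
The profile written by MetaPhlAn lists every taxonomic level in one table. As a rough sketch, the species-level rows can be pulled out with awk. This assumes the tab-separated layout of recent MetaPhlAn releases (clade name in column 1, relative abundance in column 3, optional t__ strain-level rows); the toy profile below uses abbreviated lineages and invented abundances.

```shell
# Toy profile; lineages are shortened and all values are hypothetical.
profile=$(mktemp)
printf '#clade_name\tNCBI_tax_id\trelative_abundance\tadditional_species\n' > "$profile"
printf 'k__Bacteria\t2\t100.0\t\n' >> "$profile"
printf 'k__Bacteria|g__Escherichia\t2|561\t60.0\t\n' >> "$profile"
printf 'k__Bacteria|g__Escherichia|s__Escherichia_coli\t2|561|562\t60.0\t\n' >> "$profile"
printf 'k__Bacteria|g__Escherichia|s__Escherichia_coli|t__SGB_example\t2|561|562|\t60.0\t\n' >> "$profile"
printf 'k__Bacteria|g__Streptococcus|s__Streptococcus_mitis\t2|1301|28037\t40.0\t\n' >> "$profile"

# Keep rows that reach species level (s__) but not strain level (t__),
# then print the last lineage element and its relative abundance.
species=$(awk -F'\t' '!/^#/ && $1 ~ /[|]s__/ && $1 !~ /[|]t__/ {
                        n = split($1, lin, /[|]/); print lin[n] "\t" $3 }' "$profile")
echo "$species"
```

On real data, replace "$profile" with profiled_metagenome.txt.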