4 Taxonomic profiles
Barrnap
1. Running Barrnap
Barrnap (BAsic Rapid Ribosomal RNA Predictor) predicts the location of ribosomal RNA genes in genomes. It supports bacteria (5S, 23S, 16S), archaea (5S, 5.8S, 23S, 16S), metazoan mitochondria (12S, 16S) and eukaryotes (5S, 5.8S, 28S, 18S). Barrnap takes a FASTA file of DNA sequences as input and can be run on both the assembled community contigs (directory 04_contigs) and the MAGs (directory 06_mags).
- Add the name of the sample at the beginning of every scaffold ID and change the extension to .fna (the same format as .fasta; the different extension simply distinguishes the new file, which has the sample name in front of each scaffold ID, from the old file).
for i in *.fasta ; do perl -lne 'if(/^>(\S+)/){ print ">$ARGV $1"} else{ print }' "$i" > "$i.fna" ; done
- Change the space between the name and the scaffold number to an underscore.
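The command for the underscore step is not shown here. A minimal sketch, assuming the headers produced by the previous command contain exactly one space (between the file name and the scaffold ID); the function name `underscore_headers` is hypothetical:

```shell
#!/bin/bash
# Replace the first space on every FASTA header line with an underscore,
# e.g. ">sample.fasta scaffold_1" becomes ">sample.fasta_scaffold_1".
# Sequence lines (those not starting with ">") are left untouched.
underscore_headers() {
  local fna="$1"
  # Write to a temporary file, then replace the original on success
  sed '/^>/ s/ /_/' "$fna" > "$fna.tmp" && mv "$fna.tmp" "$fna"
}

for i in *.fna; do
  [ -e "$i" ] || continue   # skip if no .fna files are present
  underscore_headers "$i"
done
```

Writing to a temporary file avoids `sed -i`, whose syntax differs between GNU and BSD sed.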
- Run Barrnap. Create a bash script called run_barrnap.sh with the following commands and execute the script using nohup.
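The contents of the script are not reproduced here. One possible version, as a sketch: it assumes bacterial genomes (`--kingdom bac`, Barrnap's default) and four threads, and it names the output `<file>.barrnap_hits.gff` to match the file name expected by the bedtools step. The function name `run_barrnap` and the log file name are illustrative.

```shell
#!/bin/bash
# run_barrnap.sh (sketch): predict rRNA genes in every .fna file.
# Launch in the background with, for example:
#   nohup bash run_barrnap.sh > run_barrnap.log 2>&1 &
run_barrnap() {
  local fna="$1"
  # Barrnap prints GFF3 to standard output, so redirect it to a file.
  # --kingdom and --threads values are assumptions; adjust to your data.
  barrnap --kingdom bac --threads 4 "$fna" > "$fna.barrnap_hits.gff"
}

for i in *.fna; do
  [ -e "$i" ] || continue   # skip if no .fna files are present
  run_barrnap "$i"
done
```

For archaeal, mitochondrial or eukaryotic sequences, `--kingdom` takes `arc`, `mito` or `euk` instead.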
- Output: for each input file, Barrnap generates a GFF3 file containing the coordinates of the predicted rRNA genes.
- To extract the ribosomal sequences as FASTA, we use the getfasta function from bedtools.
2. Extract sequences
The bedtools utilities are a Swiss Army knife of tools for a wide range of genomics analysis tasks. Here we use the getfasta command to extract sequences from a FASTA file for each of the intervals defined in the GFF3 file generated by Barrnap.
#!/bin/bash
for i in *.fna ; do bedtools getfasta -fi "$i" -bed "$i.barrnap_hits.gff" -fo "$i.out_rRNA.fasta" ; done
Transfer the .out_rRNA.fasta files to your local computer and use NCBI-BLAST to classify the rRNA sequences identified by Barrnap.
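By default, getfasta reports every interval in the orientation of the reference sequence, so rRNA genes predicted on the minus strand come out as their reverse complement. This does not matter for a BLAST search, but if you want each gene in its own orientation, a strand-aware variant of the extraction loop could look like the sketch below. The `-s` flag is a standard getfasta option; the function name and the `.out_rRNA.stranded.fasta` suffix are hypothetical.

```shell
#!/bin/bash
# Strand-aware extraction: -s reverse-complements features on the minus strand,
# using the strand column of the Barrnap GFF3 file.
extract_rrna_stranded() {
  local fna="$1"
  bedtools getfasta -s -fi "$fna" -bed "$fna.barrnap_hits.gff" \
    -fo "$fna.out_rRNA.stranded.fasta"
}

for i in *.fna; do
  [ -e "$i.barrnap_hits.gff" ] || continue   # skip files without Barrnap output
  extract_rrna_stranded "$i"
done
```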
MetaPhlAn
metaphlan metagenome.fastq --input_type fastq -o profiled_metagenome.txt
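To profile several samples, the same command can be wrapped in a loop. A minimal sketch, assuming one uncompressed `.fastq` file per sample in the current directory; the `_profile.txt` suffix, the function name and the thread count passed to `--nproc` are illustrative choices, not part of the original workflow.

```shell
#!/bin/bash
# Profile one FASTQ file; the output name is derived from the input name,
# e.g. sampleA.fastq -> sampleA_profile.txt
profile_sample() {
  local fq="$1"
  local sample="${fq%.fastq}"
  # --nproc 4 is an assumed thread count; adjust to your machine
  metaphlan "$fq" --input_type fastq --nproc 4 -o "${sample}_profile.txt"
}

for fq in *.fastq; do
  [ -e "$fq" ] || continue   # skip if no .fastq files are present
  profile_sample "$fq"
done
```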