Protocol for metagenomic analyses
2026-01-28
1 Introduction
This guide assumes the following:
The user has access to the Lazar lab servers.
The user has active VPN access.
The paired-end fastq files from Illumina MiSeq sequencing have been transferred to the user’s home directory.
The user has followed the Introduction to Linux guide and is comfortable with basic command-line functions such as:
- listing the files in the current directory (ls);
- moving from one directory to another (cd);
- creating a new directory (mkdir); and
- moving or copying files from one directory to another (mv/cp).
1. Setting up your environment
Keeping your files organized is a skill that has a high long-term payoff. As you are in the thick of an analysis, you may underestimate how many files/folders you have floating around. But a short time later, you may return to your files and realize your organization was not as clear as you hoped, which can ultimately lead to significantly slower research progress. Furthermore, one must keep in mind that someone unfamiliar with your project should be able to look at your computer files and understand in detail what you did and why.
While there are many ways to keep your files organized and no “one size fits all” solution, we propose below a simple organizational scheme that is project-oriented, maintainable and follows consistent patterns for metagenome sequence processing. Please note that the proposed workflow assumes this organization.
┌─ ~ -------------------------------- Your home directory
│ ├── chapter1_metagenomics_aquifer ------ Project with a short but meaningful name
│ ├── 01_raw_data -------------------- Raw files (i.e. fastq files generated by sequencing)
│ ├── 02_preprocess ------------------ Intermediate files from the trimming and interleaving process
│ ├── interleave
│ └── fastqc
│ └── single
│ ├── 03_trim_interleave ------------- Trimmed and interleaved fastq and fasta files used for assembly
│ └── spades_out
│ ├── 04_contigs --------------------- Assembled contigs
│ ├── 05_binning --------------------- Intermediate files from the binning process
│ ├── metabat2
│ ├── sample_01
│ ├── sample_02
│ ├── sample_03
│ ├── sorted_bam
│ ├── 06_mags ------------------------ Clean and complete metagenome-assembled genomes (MAGs)
│ ├── 07_metabolic_pathway ----------- metabolic_pathways
│ └── 08_phylo_tree ------------------ Project-specific scripts
└──────────────────────────────────────────────────────────────────
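The top-level folders of the tree above could be created in one go from the terminal. This is a minimal sketch, assuming the project lives directly in your home directory and uses the example name chapter1_metagenomics_aquifer; only the numbered folders are created, and sub-folders such as fastqc or metabat2 are left to be added as each step needs them.

```shell
# Project root; the name is the example used in the tree above (change it to your own).
project="$HOME/chapter1_metagenomics_aquifer"

# mkdir -p creates missing parent folders and does not complain if a folder already exists.
for dir in 01_raw_data 02_preprocess 03_trim_interleave 04_contigs \
           05_binning 06_mags 07_metabolic_pathway 08_phylo_tree; do
    mkdir -p "$project/$dir"
done

# List the result to check that the numbered folders are there.
ls "$project"
```

Because of the -p option, running the block a second time is harmless and leaves existing files untouched.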
Further reading about organizing files and folders:
Organizing your project by the Johns Hopkins Data Science Lab
A Quick Guide to Organizing Computational Biology Projects by William Stafford Noble, 2009
Organizing your data by The Max Delbrück Center
2. Download fastq files from Illumina BaseSpace Sequence Hub
Open the link found in the email sent by the sequencing center.
If this is your first time downloading your fastqs, create a new BaseSpace Sequence Hub account using the email address to which the email from the sequencing center was sent (normally your UQAM email address).
On the pop-up window informing you that the sequencing center has shared the following item with you, click ACCEPT.
Click on the PROJECTS tab in the upper section of the page.
Select your project, then click on the second round logo from the left, which looks like a blank page, and in the drop-down menu select DOWNLOAD then PROJECT.
If required, download the Illumina BaseSpace downloader by clicking INSTALL DOWNLOAD and follow the instructions. Otherwise, simply click DOWNLOAD to begin downloading your fastqs.
Once the download is complete, the downloaded folder contains one folder per sample, and inside each of these the forward and reverse reads sit together in a further folder. Instead of going into each folder individually and copying the fastqs manually, we can use the terminal to do the job for us. From a new local terminal window, navigate to the folder containing all the sample folders and execute the following command after changing /path/to/directory/where/to/move/fastqs to the actual path where you wish to move your fastqs.
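One way to gather the files is sketched below. It assumes the downloaded reads end in .fastq.gz and wraps the search in a small helper called collect_fastqs, a name made up for this example; adapt the file pattern if your files are named differently.

```shell
# Hypothetical helper: move every fastq found below the current folder into one destination.
# Choose a destination outside the folder you are searching.
collect_fastqs() {
    dest="$1"
    # Create the destination first so mv never renames a file to the destination path itself.
    mkdir -p "$dest"
    # Search all sub-folders for files ending in .fastq.gz (assumed extension) and move each one.
    find . -type f -name "*.fastq.gz" -exec mv {} "$dest"/ \;
}
```

After navigating to the downloaded folder, the helper would be called as collect_fastqs /path/to/directory/where/to/move/fastqs, with the path replaced by your own.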
Finally, you can transfer your fastqs to the folder 01_raw_data under your folder on the server using any File Transfer Protocol (FTP) client (such as FileZilla or Cyberduck) or the SCP (secure copy) command-line utility.
For SCP, you can copy an entire folder by opening a new local terminal window and navigating to the directory that contains the folder with the fastqs. From that directory, execute the following command. You will then be asked to enter the password for your user on the server.
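A possible form of the SCP transfer is sketched below. The helper name upload_fastqs is made up for this example, and the remote path assumes the project folder chapter1_metagenomics_aquifer sits directly in your home directory on the server; the server address and user name must come from your own lab account.

```shell
# Hypothetical helper: copy a whole local folder into 01_raw_data on the server.
upload_fastqs() {
    local_folder="$1"    # folder that holds the fastqs, e.g. fastqs
    remote_login="$2"    # your login on the server, written as user@server
    # -r copies the folder and everything inside it.
    # A remote path without a leading / is taken relative to your home directory on the server.
    scp -r "$local_folder" "${remote_login}:chapter1_metagenomics_aquifer/01_raw_data/"
}
```

It would be called as upload_fastqs fastqs your_username@server_address, where both arguments are placeholders. The folder is copied as a whole, so the files end up in 01_raw_data/fastqs on the server.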
Most processes take a long time to complete, so nohup should be used to execute the commands. For more details and examples, see the section Nohup.
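As a quick illustration of the general pattern, the sketch below runs a short stand-in command under nohup. The quoted command and the log file name nohup_example.log are placeholders chosen for this example, not part of the workflow.

```shell
# Stand-in for a long-running step; replace the quoted command with the real one.
# Output and error messages go to the log file, and the trailing & frees the terminal.
nohup sh -c 'sleep 1; echo "analysis finished"' > nohup_example.log 2>&1 &

# $! holds the process ID of the job just sent to the background.
echo "Job started with PID $!"
```

The job keeps running after you log out of the server, and its progress can be followed by reading the log file.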