1 Introduction

This guide assumes the following:

The user has access to servers from the Lazar lab.
The user has active VPN access.
The paired-end fastq files from Illumina MiSeq sequencing were transferred to the user’s home directory.
The user has followed the Introduction to Linux guide and is comfortable with basic command-line functions such as:
- listing files in the current directory (ls);
- moving from one directory to another (cd);
- creating a new directory (mkdir); and
- moving / copying (mv/cp) files from one directory to another.

1. Setting up your environment

Keeping your files organized is a skill that has a high long-term payoff. As you are in the thick of an analysis, you may underestimate how many files/folders you have floating around. But a short time later, you may return to your files and realize your organization was not as clear as you hoped, which can ultimately lead to significantly slower research progress. Furthermore, one must keep in mind that someone unfamiliar with your project should be able to look at your computer files and understand in detail what you did and why.

While there are many ways to keep your files organized, and there is no “one size fits all” solution, below we propose a simple organizational scheme that is project-oriented, maintainable, and follows consistent patterns for metagenome sequence processing. Please note that the proposed workflow assumes this organization.

┌─ ~ -------------------------------- Your home directory
│   ├── chapter1_metagenomics_aquifer ------ Project with a short but meaningful name
│       ├── 01_raw_data -------------------- Raw files (i.e. fastq files generated by sequencing)
│       ├── 02_preprocess ------------------ Intermediate files from the trimming and interleaving process
│           ├── interleave
│               └── fastqc
│           └── single
│       ├── 03_trim_interleave ------------- Trimmed and interleaved fastq and fasta files used for assembly
│           └── spades_out
│       ├── 04_contigs --------------------- Assembled contigs
│       ├── 05_binning --------------------- Intermediate files from the binning process
│           ├── metabat2
│           ├── sample_01
│           ├── sample_02
│           ├── sample_03 
│           ├── sorted_bam 
│       ├── 06_mags ------------------------ Clean and complete metagenome-assembled genomes (MAGs)
│       ├── 07_metabolic_pathway ----------- metabolic_pathways
│       └── 08_phylo_tree ------------------ Project-specific scripts
└──────────────────────────────────────────────────────────────────
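
The layout above can be created in one pass rather than folder by folder. The following is a minimal sketch, assuming the project lives directly under your home directory and uses the example project name from the tree; the variable PROJECT_DIR is a name introduced here for illustration, and the sample-specific folders (sample_01, sample_02, ...) are left out because their names depend on your data.

```shell
#!/bin/sh
# Assumption: the project folder sits in your home directory.
# Override PROJECT_DIR to use a different project name or location.
PROJECT_DIR="${PROJECT_DIR:-$HOME/chapter1_metagenomics_aquifer}"

# One entry per folder shown in the tree; mkdir -p also creates parents
# and does not complain if a folder already exists.
for sub in \
    01_raw_data \
    02_preprocess/interleave/fastqc \
    02_preprocess/single \
    03_trim_interleave/spades_out \
    04_contigs \
    05_binning/metabat2 \
    05_binning/sorted_bam \
    06_mags \
    07_metabolic_pathway \
    08_phylo_tree
do
    mkdir -p "$PROJECT_DIR/$sub"
done

# List the top-level folders to confirm the result.
ls "$PROJECT_DIR"
```

Because mkdir -p leaves existing folders untouched, the script can be re-run safely if you add a folder to the list later.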

Further reading about organizing files and folders:

Organizing your project by the Johns Hopkins Data Science Lab
A Quick Guide to Organizing Computational Biology Projects by William Stafford Noble, 2009
Reddit post
Organizing your data by The Max Delbrück Center

2. Download fastq files from Illumina BaseSpace Sequence Hub

Open the link found in the email sent by the sequencing center.

If this is your first time downloading your fastqs, create a new BaseSpace Sequence Hub account using the email address to which the sequencing center's email was sent (normally your UQAM email).
In the pop-up window informing you that the sequencing center has shared an item with you, click ACCEPT.
Click on the PROJECTS tab in the upper section of the page.
Select your project, then click the second round icon from the left, which looks like a blank page. In the drop-down menu, select DOWNLOAD, then PROJECT.
If required, install the Illumina BaseSpace downloader by clicking INSTALL DOWNLOAD and following the instructions. Otherwise, simply click DOWNLOAD to begin downloading your fastqs.
Once the download is complete, the downloaded folder contains one folder per sample, and each sample folder holds a further folder containing the forward and reverse reads. Instead of opening each folder and copying the fastqs manually, we can use the terminal to do the job for us. From a new local terminal window, navigate to the folder containing all the sample folders and execute the following command, after replacing /path/to/directory/where/to/move/fastqs with the actual path where you want to copy your fastqs.

find ./ -name "*.gz" -exec cp -prv "{}" "/path/to/directory/where/to/move/fastqs" ";"
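
After the copy, it can be worth confirming that every forward read has its reverse counterpart before transferring anything. The sketch below is one way to do that, assuming the usual Illumina file naming in which paired files end in _R1_001.fastq.gz and _R2_001.fastq.gz; check_pairs is a hypothetical helper name introduced here, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: report forward-read files lacking a reverse-read file.
# Assumption: files follow the *_R1_001.fastq.gz / *_R2_001.fastq.gz pattern.
check_pairs() {
    dir="$1"
    missing=0
    for r1 in "$dir"/*_R1_001.fastq.gz; do
        # If the glob matched nothing, the literal pattern is returned; skip it.
        [ -e "$r1" ] || continue
        # Build the expected reverse-read name from the forward-read name.
        r2="${r1%_R1_001.fastq.gz}_R2_001.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "Missing reverse read: $r2"
            missing=$((missing + 1))
        fi
    done
    echo "Samples missing a reverse read: $missing"
    # Return success only when every forward read has a partner.
    [ "$missing" -eq 0 ]
}
```

For example, running check_pairs "/path/to/directory/where/to/move/fastqs" prints one line per unpaired sample followed by a summary count. If your files use a different suffix, adjust the two patterns accordingly.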

Finally, you can transfer your fastqs to the 01_raw_data folder under your project folder on the server using any File Transfer Protocol (FTP) client (such as FileZilla or Cyberduck) or the SCP (secure copy) command-line utility.

For SCP you can copy an entire folder by opening a new local terminal window and navigating to the directory containing the folder with the fastqs. From that directory execute the following command. You will then be asked to enter the password for your user on the server.

scp -r name_of_folder_with_fastqs username@server.bio.uqam.ca:/path/to/copy/folder

Most processes take a long time to complete, so nohup should be used to run the commands. For more details and examples, see the Nohup section.
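
As a rough illustration of the pattern, the sketch below starts a command under nohup in the background, redirects its output to a log file, and records the process ID. The command (a short sleep followed by a message) and the log file name nohup_example.log are placeholders chosen for this example; substitute your own long-running command.

```shell
#!/bin/sh
# Placeholder log file name, introduced for this example only.
LOG_FILE="nohup_example.log"

# nohup keeps the command running after you log out; the trailing '&'
# sends it to the background. Standard output and errors go to the log.
# The quoted command stands in for a real long-running step.
nohup sh -c 'sleep 1; echo "finished"' > "$LOG_FILE" 2>&1 &
job_pid=$!
echo "Started background job with PID $job_pid"

# Only for this demonstration: wait for the job so the log can be shown.
# In real use you would log out and inspect the log later.
wait "$job_pid"
cat "$LOG_FILE"
```

Redirecting output explicitly avoids everything piling up in the default nohup.out file, which becomes hard to read once several jobs have been launched from the same directory.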

Protocol for metagenomic analyses
