Workshop

eDNA metabarcoding: From raw data to RDA (Currently in development)

Description

[English]
With the democratisation of high-throughput sequencing, the use of DNA as an identification method has become standard practice. Marker genes (16S rRNA, 18S rRNA, ITS, etc.) can be compared to reference databases to give an overview of the communities present in your samples. No more cultivating, isolating and identifying based on morphology and chemical reactions! However much time DNA sequencing saves, the full process of turning raw DNA reads into exploitable taxonomic units can be tricky. Because bioinformatics techniques turn over quickly and many methods are available, it can be hard to decide on a pipeline. In this workshop, we present a complete, reproducible workflow in R dedicated to processing raw DNA sequences with the DADA2 package, followed by community analysis and visualisation with the Phyloseq package.

[Español]
Desde la democratización de la secuenciación de alto rendimiento, el uso del ADN como método de identificación se ha convertido en una práctica habitual. Gracias a los genes marcadores (16S rRNA, 18S rRNA, ITS, etc.) y a las bases de datos taxonómicas, es posible identificar las comunidades presentes en sus muestras. No es necesario cultivar, aislar e identificar basándose en la morfología y las reacciones químicas. Sin embargo, todo el proceso de identificación de las secuencias de ADN en unidades taxonómicas utilizables puede ser difícil debido a la rápida evolución de las técnicas bioinformáticas y a la riqueza de los métodos disponibles. En este taller, presentaremos un flujo operativo accesible y reproducible en lenguaje R dedicado al procesamiento de secuencias de ADN en bruto utilizando la biblioteca DADA2. A continuación, se llevará a cabo un análisis de la comunidad y una visualización con la biblioteca Phyloseq.

[Français]
Depuis la démocratisation du séquençage à haut débit, l’utilisation de l’ADN comme méthode d’identification est devenue une pratique courante. Grâce à des gènes marqueurs (ARNr 16S, ARNr 18S, ITS, etc.) et à des bases de données taxonomiques, il est possible d’identifier les communautés présentes dans vos échantillons. Plus besoin de cultiver, isoler et identifier en fonction de la morphologie et des réactions chimiques ! Cependant, le processus complet d’identification des séquences d’ADN en unités taxonomiques exploitables peut s’avérer difficile en raison de l’évolution rapide des techniques bioinformatiques et de la profusion de méthodes disponibles. Dans cet atelier, nous vous présenterons un flux opérationnel accessible et reproductible en langage R dédié au traitement de séquences d’ADN brutes à l’aide de la librairie DADA2. Cette étape sera suivie d’une analyse et d’une visualisation de la communauté à l’aide de la librairie Phyloseq.

Install packages

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("dada2")
BiocManager::install("phyloseq")

install.packages(c('ggplot2', 'vegan', 'gtools'))
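
Once the installation commands have run, it can help to confirm that every package is actually available before starting the tutorials. The snippet below is a minimal sketch, not part of the original workshop material; the variable names (`workshop_pkgs`, `is_installed`, `missing_pkgs`) are illustrative.

```r
# Packages the workshop asks you to install (taken from the commands above).
workshop_pkgs <- c("dada2", "phyloseq", "ggplot2", "vegan", "gtools")

# requireNamespace() returns TRUE when a package can be loaded, FALSE otherwise.
is_installed <- vapply(workshop_pkgs, requireNamespace, logical(1), quietly = TRUE)

# Report anything that still needs to be installed.
missing_pkgs <- workshop_pkgs[!is_installed]
if (length(missing_pkgs) == 0) {
  message("All workshop packages are installed.")
} else {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
}
```

If a package is listed as missing, rerunning the corresponding install command above is the first thing to try.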

Tutorials

[Bilingual Français-English]
Dada: https://alexiscarter.github.io/metab/Dada_script.html
Phyloseq: https://alexiscarter.github.io/metab/Phyloseq_script.html

[Français]
Dada: https://alexiscarter.github.io/metab/Dada_script_FR.html

[English]
Dada: https://alexiscarter.github.io/metab/Dada_script_EN.html

[Español]
Dada: https://alexiscarter.github.io/metab/Dada_script_ES.html

To download the repository

https://github.com/alexiscarter/metab/archive/master.zip
It includes data and scripts.

Set the working directory before starting the tutorials:

setwd("YourPath/metab-master")
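
A wrong path is a common source of errors at this step. As a small, optional sketch (the `repo_dir` variable is illustrative, and `"YourPath/metab-master"` remains a placeholder to replace with your own path), you can check that the folder exists before switching to it:

```r
# "YourPath/metab-master" is the placeholder from the text; replace it with
# the folder where you unzipped the repository.
repo_dir <- "YourPath/metab-master"

if (dir.exists(repo_dir)) {
  setwd(repo_dir)
  print(list.files())  # quick look at the data and scripts that were unzipped
} else {
  message("Folder not found: ", repo_dir,
          " (current directory is ", getwd(), ")")
}
```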

Authors

Simon Morvan and Alexis Carteron, Université de Montréal

Data

The original data come from Carteron A, Beigas M, Joly S, Turner B L, Laliberté E. 2020. Temperate forests dominated by arbuscular or ectomycorrhizal fungi are characterized by strong shifts from saprotrophic to mycorrhizal fungi with increasing soil depth. Microbial Ecology.

Amplicons were obtained through DNA amplification targeting the ITS region (fungal specific) and sequenced on an Illumina MiSeq platform (paired-end 300 bp).

The original sequence data were subsampled by randomly selecting 1,000 paired-end reads per sample (i.e. 128,000 reads in total) to reduce running time on a local machine. Seqtk was used for the subsampling step.
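
The key point when subsampling paired-end data is that the same reads must be kept in the forward and reverse files, otherwise the pairs no longer match. Seqtk handles this when the same random seed is used for both files. The sketch below is not the Seqtk command used for this dataset; it only illustrates the idea in base R on made-up read identifiers (`fwd_ids`, `rev_ids`, the seed and the read counts are all hypothetical).

```r
# Toy example: 5,000 hypothetical read pairs from one sample.
n_reads <- 5000
fwd_ids <- sprintf("read%04d/1", seq_len(n_reads))  # forward reads
rev_ids <- sprintf("read%04d/2", seq_len(n_reads))  # reverse reads

# Fix the seed so the random draw is reproducible.
set.seed(100)

# Draw 1,000 positions once, then apply the SAME positions to both files
# so that every forward read keeps its reverse mate.
keep <- sort(sample.int(n_reads, size = 1000))
fwd_sub <- fwd_ids[keep]
rev_sub <- rev_ids[keep]

# Sanity check: after stripping the /1 and /2 suffixes, the IDs must match.
stopifnot(identical(sub("/1$", "", fwd_sub), sub("/2$", "", rev_sub)))
```

Drawing the indices once and reusing them is what keeps the pairs in sync; sampling each file independently would break the pairing.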

The bioinformatics pipeline for ITS sequences using DADA2 and the multivariate analyses of soil fungal communities in R used for the article can be found here.

Acknowledgements

Largely inspired by the tutorials of DADA2 and Phyloseq

Corresponding articles: