Described here is a simplified standard operating procedure for microbiome profiling using 16S rRNA metagenomic sequencing and analysis using freely available tools. This protocol will help researchers who are new to the microbiome field as well as those requiring updates on methods to achieve bacterial profiling at a higher resolution.
The human gut is colonized by trillions of bacteria that support physiologic functions such as food metabolism, energy harvesting, and regulation of the immune system. Perturbation of the healthy gut microbiome has been suggested to play a role in the development of inflammatory diseases, including multiple sclerosis (MS). Environmental and genetic factors can influence the composition of the microbiome; therefore, identification of microbial communities linked with a disease phenotype has become the first step towards defining the microbiome’s role in health and disease. Use of 16S rRNA metagenomic sequencing for profiling bacterial community has helped in advancing microbiome research. Despite its wide use, there is no uniform protocol for 16S rRNA-based taxonomic profiling analysis. Another limitation is the low resolution of taxonomic assignment due to technical difficulties such as smaller sequencing reads, as well as use of only forward (R1) reads in the final analysis due to low quality of reverse (R2) reads. There is need for a simplified method with high resolution to characterize bacterial diversity in a given biospecimen. Advancements in sequencing technology with the ability to sequence longer reads at high resolution have helped to overcome some of these challenges. Present sequencing technology combined with a publicly available metagenomic analysis pipeline such as R-based Divisive Amplicon Denoising Algorithm-2 (DADA2) has helped advance microbial profiling at high resolution, as DADA2 can assign sequence at the genus and species levels. Described here is a guide for performing bacterial profiling using two-step amplification of the V3-V4 region of the 16S rRNA gene, followed by analysis using freely available analysis tools (i.e., DADA2, Phyloseq, and METAGENassist). It is believed that this simple and complete workflow will serve as an excellent tool for researchers interested in performing microbiome profiling studies.
Microbiota refers to a collection of microorganisms (bacteria, viruses, archaea, bacteriophages, and fungi) living in a particular environment, and the microbiome refers to the collective genome of resident microorganisms. As bacteria are one of the most abundant microbes in humans and mice, this study is focused only on bacterial profiling. The human gut is colonized by trillions of bacteria and hundreds of bacterial strains1. The normal gut microbiota plays a vital role in maintaining a healthy state in the host by regulating functions (i.e., maintenance of an intact intestinal barrier, food metabolism, energy homeostasis, inhibition of colonization by pathogenic organisms, and regulation of immune responses)2,3,4,5. Compositional perturbations of the gut microbiota (gut dysbiosis) have been linked to a number of human diseases, including gastrointestinal disorders6, obesity7,8, stroke9, cancer10, diabetes8,11, rheumatoid arthritis12, allergies13, and central nervous system-related diseases such as multiple sclerosis (MS)14,15 and Alzheimer's disease (AD)8,16. Therefore, in recent years, there has been growing interest in tools for identifying bacterial composition at different body sites. A reliable method should have characteristics such as being high-throughput and easy-to-use, having the ability to classify bacterial microbiota with high resolution, and being low-cost.
Culture-based microbiological techniques are not sensitive enough to identify and characterize the complex gut microbiome due to the failure of several gut bacteria to grow in culture. The advent of the sequencing-based technology, especially 16S rRNA-based metagenomic sequencing, has overcome some of these challenges and transformed microbiome research17. Advanced 16S rRNA-based sequencing technology has helped in establishing a critical role for the gut microbiome in human health. The Human Microbiome Project, a National Institutes of Health initiative18, and the MetaHIT project (a European initiative)19 have both helped in establishing a basic framework for microbiome analysis. These initiatives helped kick-start multiple studies to determine the role of the gut microbiome in human health and disease.
A number of groups have shown gut dysbiosis in patients with inflammatory diseases12,14,15,20,21,22. Despite being widely used for taxonomic profiling due to the ability to multiplex and low costs, there are no uniform protocols for 16S rRNA-based taxonomic profiling. Another limitation is the low resolution of taxonomic assignment owing to smaller sequencing reads (150 bp or 250 bp) and use of only forward sequencing read (R1) due to low quality reverse sequencing reads (R2). However, advances in sequencing technology have helped to overcome some of these challenges, such as the ability to sequence longer reads using paired-end reads (e.g., Illumina MiSeq 2x300bp).
The present sequencing technology can sequence 600 bp good quality reads, which allows merging of R1 and R2 reads. These merged longer R1 and R2 reads allow better taxonomic assignments, especially with open-access R-based Divisive Amplicon Denoising Algorithm-2 (DADA2) platform. DADA2 utilizes amplicon sequence variant (ASV)-based assignments instead of operational taxonomic unit (OTU) assignments based on 97% similarity utilized by QIIME23. ASV matches result in an exact sequence match in the database within 1–2 nucleotides, which leads to assignment at genus and species levels. Thus, the combination of longer, good quality paired-end reads and better taxonomic assignment tools (such as DADA2) have transformed microbiome studies.
Provided here is a step-by-step guide for performing bacterial profiling using two-step amplification of the V3–V4 region of 16S rRNA and data analysis using DADA2, Phyloseq, and METAGENassist pipelines. For this study, human leukocyte antigen (HLA) class II transgenic mice are used, as certain HLA class II alleles are linked with a predisposition to autoimmune diseases such as MS20,24,25. However, the importance of HLA class II genes in regulating the composition of gut microbiota is unknown. It is hypothesized that the HLA class II molecule will influence gut microbial community by selecting for specific bacteria. Major histocompatibility complex (MHC) class II knockout mice (AE.KO) or mice expressing human HLA-DQ8 molecules (HLA-DQ8)24,25,26 were used in order to understand the importance of HLA class II molecules in shaping the gut microbial community. It is believed that this complete and simplified workflow with R-based data analysis will serve as an excellent tool for researchers interested in performing microbiome profiling studies.
The generation of mice lacking endogenous murine MHC class II genes (AE.KO) and AE-/-.HLA-DQA1*0103, DQB1*0302 (HLA-DQ8) transgenic mice with a C57BL/6J background has been described previously26. Fecal samples are collected from mice of both sexes (8–12 weeks of age). Mice were previously bred and maintained in the University of Iowa animal facility as per the NIH and institutional guidelines. Contamination control strategies such as weaning of the mice inside a laminar flow cabinet, changing of gloves between different strains of mice, and proper maintenance of mice are critical steps for profiling of gut microbiome.
Proper personal protective equipment (PPE) are highly recommended during the entire procedure. Appropriate negative controls should be included when performing DNA isolation, PCR1 and PCR2 amplification, and sequencing steps. Use of sterile, DNase-free, RNase-free, and pyrogen-free supplies is recommended. Designated pipettor for microbiome work and filtered pipette tips should be used throughout the protocol. Microbiota analysis consists of seven steps: 1) fecal sample collection and processing; 2) extraction of DNA; 3) 16S rRNA gene amplification; 4) DNA library construction using indexed PCR; 5) clean-up and quantification of indexed PCR (library); 6) MiSeq sequencing; and 7) data processing and sequence analysis. A schematic diagram of all protocol steps is shown in Figure 1.
The protocol was approved by the Institutional Animal Care and Use Committee of the University of Iowa.
1. Fecal Sample Collection and Handling
2. Extraction of DNA
3. 16S rRNA Gene Amplification (PCR1)
4. DNA Library Construction Using Indexed PCR (PCR2)
5. Clean-up of Indexed PCR (PCR2) and Quantification
6. MiSeq Sequencing
7. Data Processing and Sequence Analysis
NOTE: For detailed statistical tests performed during microbiome analysis, refer to the works of Chen et al. and Hugerth et al.14,30.
As MHC class II molecules (HLA in humans) are central players in the adaptive immune response and show strong associations with a predisposition to MS24,25,26, it was hypothesized that the HLA class II molecule would influence gut microbial composition. Therefore, mice lacking the MHC class II gene (AE.KO) or expressing human HLA-DQ8 gene (HLA-DQ8) were utilized to understand the importance of HLA class II molecules in shaping the gut microbial community.
Fecal samples were collected from AE.KO (n = 16) and HLA-DQ8 (n = 12) transgenic mice, bacterial DNA was extracted, and the V3-V4 region of the 16S rRNA gene was amplified. The amplicon size (550 bp) was confirmed by running the samples on a 1.5% agarose gel (Figure 2A, lanes 1–6). Further confirmation of 16S rRNA amplicon size (550 bp) was performed by loading 1 μL of the PCR1 product on a high sensitivity electrophoresis chip (Figure 2B, lanes 1–7).
An electropherogram was generated from 16S rRNA PCR product, which showed peak regions with fragments sized ~550 bp (Figure 2C). Dual indices and sequencing adapters were attached using indexed PCR (PCR2) that assigned a unique identity to each sample and allowed multiplexing of many samples in a single MiSeq sequencing run. Confirmation of indexed PCR was performed by agarose gel electrophoresis (Figure 2A, lanes 7–12) and a high sensitivity electrophoresis chip (Figure 2B, lanes 8–12), Figure 2D]. All the samples from PCR2 were pooled, purified, and loaded onto a next-generation sequencer that yielded forward R1 and reverse R2 reads of good quality (Figure 3). The median obtained reads after quality filtering and trimming were 88,125 (range of 9,597–111,848).
Community ecology analysis was performed using the DADA2 analysis pipeline and visualized with Phyloseq and METAGENassist to demonstrate differences in alpha diversity (Figure 4) and beta diversity (Figure 5), as well as differences at the genus and species levels between groups. DADA2 analysis generated an abundance table with comma-separated-values in csv format, which was used for further downstream analysis using a web-based platform (i.e., Phyloseq and/or METAGENassist). The alpha and beta diversity analyses were performed based on user-defined categories listed in a mapping file.
Shannon diversity analysis revealed an overall lower alpha diversity for AE.KO mice compared to HLA-DQ8 transgenic mice (Figure 3). Ordination with principal coordinates analysis showed a distinct spatial clustering between AE.KO mice and HLA-DQ8 transgenic mice (Figure 5). An abundance table from DADA2 was also used to perform a comprehensive metagenomic analysis using open-access software METAGENassist29. Heat map-based clustering of bacterial abundance (genus level) (Figure 6A) and a box plot for specific bacteria showing the differences between two groups (Figure 6B) were generated utilizing a METAGENassist pipeline.
Heat map analysis showed that certain bacteria such as Allobacullum, Desufovibrio, and Rikenella were more abundant in HLA-DQ8 transgenic mice. In contrast, Biolophila was more abundant in AE.KO mice (Figure 6B). Relative abundances of individual bacteria (Bilophila and Rikenella) are shown in a representative box plot (Figure 6B). Altogether, the data demonstrate that AE.KO mice possess a distinct microbial community compared to that of HLA-DQ8 transgenic mice, with an absence of specific bacteria in AE.KO mice. The data also suggest that MHC class II molecules play an important role in the abundance of certain bacteria. In summary, this simple and detailed protocol will help researchers who are new to the microbiome field as well as those who need updates on the methods for achieving higher taxonomic resolution.
Figure 1: Flow diagram of gut microbiome sequencing. All steps of the microbiome sequencing (sample collection to microbiome data analysis) are displayed. Please click here to view a larger version of this figure.
Figure 2: 16S rRNA gene amplification and quality control analysis of V3–V4 region. (A) Representative agarose gel electrophoresis image of the 16S amplicon (PCR1, size 550 bp) and indexed PCR (PCR2, size = 630 bp) from AE.KO mice (lanes 1–3 and 7–9) and HLA-DQ8 transgenic mice (lanes 4–6 and 10–12). (B) Representative gel image of the 16S amplicon (PCR1, size = 550 bp) and indexed PCR (PCR2, size = 630 bp) of AE.KO mice (lanes 1–3 and 8–10) and HLA-DQ8 transgenic mice (lanes 4–7 and 11–12) resolved by electrophoresis. (C) Representative electropherogram of the 16S amplicon (PCR1) showed a peak region containing fragments that were sized ~550 bp. (D) Representative electropherogram of indexed PCR (PCR2) showed a peak region comprising of fragments sized ~630 bp. Please click here to view a larger version of this figure.
Figure 3: Quality profile of forward reads (R1, top) for two representative samples and corresponding reverse reads (R2, bottom) for the same samples. The analysis was performed using a DADA2 pipeline, in which the x-axis shows read length (0–300 bases) and y-axis shows quality of the reads. Green line represents the median quality score, whereas the orange line represents quartiles of the quality score distribution at each base position. Forward reads (R1) always showed better quality than reverse reads (R2). Please click here to view a larger version of this figure.
Figure 4: Alpha diversity measures (Shannon diversity) of AE.KO mice and HLA-DQ8 transgenic mice. Each dot represents α-diversity (Shannon diversity) in a sample from a single mouse. Shannon diversity was overall lower for AE.KO mice compared to HLA-DQ8 transgenic mice. Please click here to view a larger version of this figure.
Figure 5: Ordination with partial least squares-based dimension analysis plot. The plot shows a clear separation between AE.KO mice and HLA-DQ8 transgenic mice. Each dot represents bacterial composition within a sample, and dotted eclipses indicate 80% confidence intervals. The PLS-DA plots were generated using METAGENassist. Please click here to view a larger version of this figure.
Figure 6: AE.KO mice showing distinct microbial community compared to HLA-DQ8 transgenic mice, with an absence of specific bacteria in AE.KO mice. (A) Heat map combined with agglomerative hierarchical clustering showing the relative abundance of bacteria (genus level). (B) Box plot showing a normalized relative abundance of two representative bacteria (Bilophila and Rikenella) in AE.KO and HLA-DQ8 transgenic mice. Both plots were generated using METAGENassist. Please click here to view a larger version of this figure.
Supplementary Figure 1: Representative picture of the divider box for the collection of fecal samples from multiple mice at a time. Please click here to view a larger version of this figure.
Supplementary File 2: DADA2 R-script for generating abundance table from raw sequences. Please click here to download this file.
Supplementary File 3: Representative genus abundance table (CSV format). Please click here to download this file.
Supplementary File 4: Representative mapping file (CSV format). Please click here to download this file.
The described protocol is simple, with easy-to-follow steps to perform microbiome profiling using 16S rRNA metagenomic sequencing from a large number of biospecimens of interest. Next-generation sequencing has transformed microbial ecology studies, especially in human and pre-clinical disease models31,32. The main advantage of this technique is its ability to successfully analyze complex microbial compositions (culturable and non-culturable microbes) in a given biospecimen at a high throughput level and at a low cost32. However, several factors (i.e., batch effects, selection of primers for 16S rRNA gene, and sequence data analysis) remain a major obstacle in the widespread use of this technology.
Advanced 16S rRNA-based MiSeq sequencing technologies (2x300bp)33 allow sequencing of ~600 nucleotides out of the 1,500 nucleotide-long 16S rRNA gene of bacteria, which overcame the earlier bottleneck of short sequencing reads. Primers specific for a different region of rRNA such as V1–V2, V3–V5, or V6–V9 can be used with each region-specific primers showing some bias for over- or under-detection of particular taxa34,35. Some groups prefer the V1–V2 region21, which shows an increased bias for Clostridium but underdetection for certain Bacteroidetes species. In contrast, other groups prefer the V4, V3–V4, and V3–V5 regions, which demonstrate the least biased classification of bacterial taxa15,20,36,37.
The present protocol uses primers specific for V3–V4 regions of the 16S rRNA gene, as it covers the longer region of 16S rRNA with two hypervariable regions (V3 and V4) compared to V4 alone. Additionally, V3–V4 specific amplicon allows merging of both forward (R1) and reverse (R2) reads, leading to better resolution of bacterial classification. Although the V3–V5 region provides longer reads and covers more hypervariable regions (V3, V4, and V5), it is challenging to merge R1 and R2 reads due to no/little overlapping regions between R1 and R2 reads. Therefore, a number of studies have used data only from R1 reads for bacterial classification when performing 16S rRNA metagenomic sequencing using V3–V5 region14.
Proper biospecimen storage and handling are critical for microbiome analysis to prevent degradation of bacterial DNA or environmental contamination38. Long-term storage of biospecimen at RT or 4 °C can lead to overgrowth of certain bacteria or fungi. Samples can be either frozen immediately or transported at 4 °C (for 1–3 days), then frozen. Biospecimen can also be stored directly in preservatives such as nucleic acid stabilization solution (e.g., RNA-later), 95% ethanol, or a commercial storage kit. The general consensus is that either of these storage methods does not cause significant differences in bacterial community profiles39,40. Although a solution with preservatives allows for storage and transportation at RT, these samples cannot be used for RNA-based assays, metabolite analyses, or fecal transplant experiments. These issues have been discussed in detail elsewhere39,40.
One of the critical steps in this protocol is use of a bead-beating-based method for the mechanical disruption of gram-positive bacteria and Archaea. An earlier study showed the highest bacterial diversity using the bead-beating-based method compared to other methods of cell lysis41. An incomplete bacterial lysis or contamination during DNA extraction can introduce bias in gut microbiome data42. Another important point to consider is contamination due to laboratory reagents included in kits called the kitome43,44,45. Samples with large biomass and rich bacterial diversity such as soil or feces show less of these problems compared to samples with lower biomass such as skin. Therefore, a water extraction control should be included with each extraction and processed with other samples to identify the introduction of potential contamination due to DNA extraction43,44,45.
In microbiome analysis, diversity within a sample can be measured by alpha diversity, a measure of the richness of species, or beta diversity, which estimates dissimilarities between samples/groups. Popular methods for measuring α-diversity include UniFrac (weighted and unweighted) coupled with multivariate statistical techniques such as principal coordinate analysis (PCoA) or Brey-Curtis dissimilarity. While UniFrac is based on phylogenetic distances, Brey-Curtis dissimilarity analysis utilizes bacterial abundance for generating plots. In depth descriptions about α- and β-diversity iare discussed elsewhere30,46,47. A number of statistical methods can be utilized to compare differences in microbial communities between groups14,48. It is advised to use adjusted p-values instead of raw p-values to correct for multiple testing14.
There is a variety of bioinformatics software to analyze targeted sequencing data independently49. The proposed protocol uses R-based open-source software packages, which allows user-friendly and fast profiling of bacterial taxa through R-based DADA2 pipelines. The abundance table generated from DADA2 can be used for downstream analysis using phyloseq and METAGENassist. DADA2 pipeline is advantageous over QIIME because it does not require special features (i.e., installation of virtual machines or Docker containers), which need relatively large computational resources and special technical expertise. Especially for beginners to the 16S analysis, R is appealing, as it is free and allows users to take advantage of accessible online tutorials and analysis scripts that are easy to execute. Importantly, these analysis tools require relatively small computational resources and can be run on a PC, Macintosh, or Linux-based platform. Additionally, METAGENassist can use abundance tables generated from DADA2 pipelines as well as biological observation matrix (BIOM) files generated from QIIME/MG-RAST to perform analysis such as PCA, partial least squares discriminant analysis (PLS-DA), volcano plots, t-tests (comparing two groups), ANOVA (comparing three or more groups), heat map plots, random forest analysis, etc. METAGENassist was found be very user-friendly.
In summary, this protocol describes a simple 16S rRNA metagenomic profiling pipeline, with a detailed guide on sample collection, DNA extraction, metagenomic library preparation, sequencing on Illumina MiSeq, and user-friendly data analysis using freely available platforms (i.e., DADA2, phyloseq, and METAGENassist). Although 16S rRNA metagenomic-based taxonomic profiling is reliable for characterization of bacteria present in particular biospecimens, shotgun metagenomic sequencing may be a better approach for detailed metabolic pathway analysis and strain-specific bacterial identification.
The authors have nothing to disclose.
The authors acknowledge funding from the NIAID/NIH (1R01AI137075-01), the Carver Trust Medical Research Initiative Grant, and the University of Iowa Environmental Health Sciences Research Center, NIEHS/NIH (P30 ES005605).
1.5 ml Natural Microcentriguge Tube | USA, Scientific | 1615-5500 | Fecal collection |
3M hand applicator squeegee PA1-G | 3M, MN, US | 7100038651 | Squeeger for proper sealing of PCR Plate |
Agencourt AMPure XP | Beckman Coulter, IN, USA | A63880 | PCR Purification, NGS Clean-up, PCR clean-up |
Agilent DNA 1000 REAGENT | Agilent Technologies, CA, USA | 5067-1504 | DNA quantification and quality control |
Bioanalyzer DNA 1000 chip | Agilent Technologies, CA, USA | 5067-1504 | DNA quantification and quality control |
Index Adopter Replacement Caps | Illumina, Inc., CA, USA | 15026762 | New cap for Index 1 and 2 adopter primer |
DNeasy PowerLyzer PowerSoil Kit | MoBio now part of QIAGEN, Valencia, CA, USA | 12855-100 | DNA isolation |
KAPA HiFi HotStart ReadyMiX (2X) | Kapa Biosystem, MA, USA | KK2602 | PCR ready mix for Amplicon PCR1 and Indexed PCR2 |
Lewis Divider Boxes | Lewis Bins, WI, US | ND03080 | Fecal collection |
Magnetic stand | New England BioLabs, MA, USA | S1509S | For PCR clean-up |
MicroAmp Fast Optical 96-Well Reaction Plate | Applied Biosystems, Thermo Fisher Scientific, CA, USA | 4346906 | PCR Plate |
MicroAmp Optical Adhesive Film | Applied Biosystems, Thermo Fisher Scientific, CA, USA | 4311971 | PCR Plate Sealer |
Microfuge 20 Centrifuge | Beckman Coulter, IN, USA | B30154 | Centrifuge used for DNA isolation |
MiSeq Reagent Kit (600 cycles)v.3 | Illumina, Inc., CA, USA | MS-102-3003 | For MiSeq Sequencing |
Nextera XT DNA Library Preparation Kit | Illumina, Inc., CA, USA | FC-131-1001 | 16S rRNA DNA Library Preparation |
Reagent Reservoirs Multichannel Trays | ASI, FL,USA | RS71-1 | For Pooling of PCR2 Product |
Plate Cetrifuge | Thermo Fisher Scientific, CA, USA | 75004393 | For PCR reagent mixing and removing air bubble from Plate |
PhiX Control | Illumina, Inc., CA, USA | FC-110-3001 | For MiSeq Sequencing control |
Microbiome DNA Purification Kit | Thermo Fisher Scientific, CA, USA | A29789 | For purification of PCR1 product |
PowerLyzer 24 Homogenizer | Omni International, GA, USA | 19-001 | Bead beater for DNA Isolation |
Qubit dsDNA HS (High Sensitivity) assay kit | Thermo Fisher Scientific, CA, USA | Q32854 | DNA quantification |
TruSeq Index Plate Fixture | Illumina, Inc., CA, USA | FC-130-1005 | For Arranging of the index primers |
Vertical Dividers (large) | Lewis Bins, WI, US | DV-2280 | Fecal collection |
Vertical Dividers (small) | Lewis Bins, WI, US | DV-1780 | Fecal collection |