Here we present a next-generation sequencing protocol for 16S rRNA sequencing which enables identification and characterization of microbial communities within vectors. This method involves DNA extraction, amplification and barcoding of samples through PCR, sequencing on a flow-cell, and bioinformatics to match sequence data to phylogenetic information.
In recent decades, vector-borne diseases have re-emerged and expanded at alarming rates, causing considerable morbidity and mortality worldwide. Effective and widely available vaccines are lacking for a majority of these diseases, necessitating the development of novel disease mitigation strategies. To this end, a promising avenue of disease control involves targeting the vector microbiome, the community of microbes inhabiting the vector. The vector microbiome plays a pivotal role in pathogen dynamics, and manipulations of the microbiome have led to reduced vector abundance or pathogen transmission for a handful of vector-borne diseases. However, translating these findings into disease control applications requires a thorough understanding of vector microbial ecology, historically limited by insufficient technology in this field. The advent of next-generation sequencing approaches has enabled rapid, highly parallel sequencing of diverse microbial communities. Targeting the highly-conserved 16S rRNA gene has facilitated characterizations of microbes present within vectors under varying ecological and experimental conditions. This technique involves amplification of the 16S rRNA gene, sample barcoding via PCR, loading samples onto a flow cell for sequencing, and bioinformatics approaches to match sequence data with phylogenetic information. Species or genus-level identification for a high number of replicates can typically be achieved through this approach, thus circumventing challenges of low detection, resolution, and output from traditional culturing, microscopy, or histological staining techniques. Therefore, this method is well-suited for characterizing vector microbes under diverse conditions but cannot currently provide information on microbial function, location within the vector, or response to antibiotic treatment. Overall, 16S next-generation sequencing is a powerful technique for better understanding the identity and role of vector microbes in disease dynamics.
The resurgence and spread of vector-borne diseases in recent decades pose a serious threat to global human and wildlife health. Effective vaccines are lacking for a majority of these diseases, and control efforts are hindered by the complex biological nature of vectors and vector-host interactions. Understanding the role of microbial interactions within a vector in pathogen transmission can allow for the development of novel strategies which circumvent these challenges. In particular, interactions between vector-associated microbial commensals, symbionts, and pathogens, referred to as the microbiome, may have important consequences for pathogen transmission. Overwhelming evidence now supports this assertion, with examples demonstrating a link between the vector microbiome and competence for diseases such as malaria, Zika virus, and Lyme disease1,2,3. However, translating these findings into strategies for disease control requires a far more detailed understanding of the structure, function, and origin of vector microbiomes. Identification and characterization of the vector microbial community under varying ecological and experimental conditions constitute an important path forward in this field.
A procedure for identifying the microbial residents of a pathogen vector is provided here by utilizing the Western black-legged tick, Ixodes pacificus, a vector species of the Lyme disease pathogen Borrelia burgdorferi. While ticks harbor more types of human pathogens than any other arthropod, relatively little is known about the biology and community ecology of tick microbiomes4. It is evident that ticks harbor a diverse array of viruses, bacteria, fungi, and protozoans which include commensals, endosymbionts, and transient microbial residents5,4. Prior work has demonstrated strong variations in Ixodes microbiomes associated with geography, species, sex, life stage, and blood meal source6,7,8. However, the mechanisms underlying this variation remain unknown and warrant more detailed investigations of the origin and assembly of these microbial communities. Ticks can acquire microbes through vertical transmission, contact with hosts, and uptake from the environment through the spiracles, mouth, and anal pore9. Understanding the factors shaping the initial formation and development of the tick microbiome, specifically the relative contribution of vertical and environmental transmission, is important for understanding the natural patterns and variations in tick microbiome diversity and how these communities interact during pathogen transmission, with possible applications to disease or vector control.
Powerful molecular techniques, such as next-generation sequencing, now exist for identifying microbial communities and can be employed to characterize vector microbiomes under diverse environmental or experimental conditions. Prior to the advent of these high-throughput sequencing approaches, the identification of microbes relied predominantly on microscopy and culture. While microscopy is a rapid and easy technique, morphological methods for identifying microbes are inherently subjective and coarse and limited by low sensitivity and detection10. Culture-based methods are broadly used for microbial identification and can be used to determine susceptibility of microbes to drug treatments11. However, this method also suffers from low sensitivity, as it has been estimated that fewer than 2% of environmental microbes can be cultured in a laboratory setting12. Histological staining approaches have also been employed to detect and localize specific microbes within vectors, enable investigations of various taxa distributions within the tick, and study hypotheses about microbial interactions. However, prior knowledge of microbial identity is required for selecting the appropriate stains, making this approach ill-suited for microbial characterization and identification. Furthermore, histological staining is a highly time-intensive, laborious process and does not scale well for large sample sizes. Traditional molecular approaches such as Sanger sequencing are similarly limited in their sensitivity and detection of diverse microbial communities.
Next-generation sequencing allows for the rapid identification of microbes from a large number of samples. The presence of standard marker genes and reference databases further enables enhanced taxonomic resolution, often to the genus or species level. Small subunit ribosomal RNAs are frequently used to achieve this goal, with 16S rRNA being the most common due to the presence of conserved and variable regions within the gene, allowing for the creation of universal primers with unique amplicons for each bacterial species13,14. This report details a procedure for identifying taxa in the tick microbiome through 16S rRNA next-generation sequencing. In particular, this protocol emphasizes the steps involved in preparing samples for sequencing. More generalized details on the sequencing and bioinformatics steps are provided, as there are a variety of sequencing platforms and analysis programs currently available, each with extensive existing documentation. The overall feasibility of this next-generation sequencing approach is demonstrated by applying it to an investigation of microbial community assembly within a key disease vector.
1. Tick Collection and Surface Sterilization
2. DNA Purification
3. 16S rRNA Gene Amplification
4. 16S Amplicon Purification
5. Sample Barcoding and Purification
6. Library Quantification and Normalization
7. Library Denaturation and Dilution, and Sequencing Run (perform on the same day)
8. Amplicon Sequence Analysis
A total of 42 ticks from three separate egg clutches and two environmental exposure periods, 0 and 2 weeks in soil, were processed for microbiome sequencing. Each treatment group, considered to be a single clutch and exposure time, contained 6-8 replicate tick samples. These processed tick extracts were loaded onto a next-generation sequencer and yielded 12,885,713 paired-end reads passing filter. Included in this run were 3 negative controls from the extraction step, yielding a total of 211,214 reads (included in previous count). Further run quality metrics along with optimal values for each metric are detailed in Table 2. Rarefaction curves, which relate sequencing effort to number of OTUs per sample, indicated that a sequencing depth, or number of unique sequence reads per sample, of 2,129 reads would be sufficient to adequately capture the diversity of the microbial community (Figure 2). Rarefaction levels will vary based on sample type and must be determined individually for each sequencing run. After rarefying to this depth, 1,714 OTUs were identified across all samples with an average of 93.3 ± 4.3 OTUs per sample. To avoid downstream analysis issues arising from sparse matrices and to remove potential contaminants, all OTUs not found in at least one sample at ≥ 1% abundance were pooled into a rare general category. Further, the decontam package in R was utilized to identify OTUs over-represented in negative controls relative to real samples, and these OTUs were removed from downstream analysis.
Community ecology analyses were performed using the QIIME diversity analyses workflow to demonstrate the types of alpha diversity, beta diversity, and taxonomy composition diversity output generated using a simple bioinformatics pipeline. For example, weighted and unweighted Unifrac principal coordinates analyses were produced using the "core_diversity_analyses.py" script, and they revealed spatial clustering of larval ticks based on clutch identity (Figure 3). Boxplots of OTU counts at varying environmental exposure times were also generated through this script, demonstrating differences in microbiome species richness over time (Figure 4). These alpha and beta diversity analyses are performed based on the user-defined categories listed in the mapping file, but figures and statistics on general taxonomic information are also generated for all samples (Figure 5).
Figure 1: Microbiome sequencing workflow. Major steps of the microbiome sequencing workflow are displayed with call-outs for the library preparation and sequencing analysis steps. Please click here to view a larger version of this figure.
Figure 2: Rarefaction curves for sequence count normalization. Rarefaction curves, shown for each cohort of ticks, relate observed OTU counts to sampling effort. Error bars are present due to replicate samples present within each clutch, and they denote one standard deviation. A sequencing depth should be selected at or beyond the point where the curve becomes stable to adequately capture the full diversity of OTUs. Here, a sequencing depth of 2,000 reads appears appropriate. Please click here to view a larger version of this figure.
Figure 3: Principal coordinate analysis by clutch. Unweighted Unifrac principal coordinate analysis (PCoA) shows variation in microbiome composition between larval ticks from different clutches, and from adults. Each data point denotes an individual tick. This figure is automatically generated through the QIIME "core_diversity_analyses.py" script. Please click here to view a larger version of this figure.
Figure 4: Alpha diversity boxplots by exposure time. Boxplots depicting OTU counts for environmental exposure groups show variation in microbial diversity over time. This figure is automatically generated through the QIIME "core_diversity_analyses.py" script. Please click here to view a larger version of this figure.
Figure 5: Phylum-level microbial identification for clutch one. Microbiome composition for individual tick samples from clutch one is shown at the phylum level. This figure, as well as summary information at lower taxonomic levels, is automatically generated through the QIIME "core_diversity_analyses.py" script. Please click here to view a larger version of this figure.
Metric | Definition | Our results (MiSeq V3) | Optimal for MiSeq V3 | Optimal for HiSeq Rapid Mode | |
Reads PF | Number of sequencing reads passing the chastity filter. A read passes filter if no more than 1 base call has a chastity value below 0.6 in the first 25 cycles | 12,885,713 | 14-16 million | 600 million | |
Error Rate | Percentage of base pairs called incorrectly during a cycle, based on reads aligned to PhiX control | 3.28 ±0.10 | *Depends on values for other metrics | *Depends on values for other metrics | |
% ≥Q30 | Percentage of bases with a Q score ≥ 30, indicating a base call accuracy of 99.9% | 68.94 | > 70% | > 80% | |
Cluster PF (%) | Percentage of clusters passing the chastity filter. | 85.61 ±0.81 | 90% | 90% | |
Density | Density of clusters on the flowcell – a key metric for data quality and output | 552 ± 35 cluster/mm^2 | 1200-1400 cluster/mm^2 | 850-1000 cluster/mm^2 |
Table 1: Key performance parameters for NGS output. User data and target values reported based on the sequencing platform and reagent kit used in this study (Table of Materials) and a 25% sequencing control addition.
Next-generation sequencing of 16S rRNA has become a standard approach for microbial identification and enabled the study of how vector microbiomes affect pathogen transmission. The protocol outlined here details the use of this method to investigate microbial community assembly in I. pacificus, a vector species for Lyme disease; however, it can easily be applied to study other tick species or arthropod vector species.
Indeed, 16S rRNA sequencing for microbiome analysis has been used broadly to study the microbiomes of vectors including mosquitoes, psyllids, and tsetse flies29,30,31. Other methods available for microbial identification within vectors include microscopy, culturing, and histological staining. These methods may be more appropriate than sequencing if the goal is to identify and describe a novel microbe (microscopy), evaluate the effect of antimicrobial drugs (culture), or localize specific and known microbes within the vector (histological staining). However, these methods suffer from low specificity, detection, and scalability, and thus are less appropriate for identifying the full community of microbes within a vector or characterizing the vector microbiome under varying ecological and experimental conditions. Conversely, high-throughput sequencing of the 16S rRNA gene enables identification of low-abundance and non-culturable bacteria, provides high resolution and detection given the comprehensive reference databases, and can provide high replication depending on coverage needs.
While 16S rRNA sequencing is now widely used for microbial identification, this technique is not without limitations. Principally, microbial contamination can obscure interpretation of the sequencing results and confound biological meaning32. Furthermore, given the use of universal bacterial primers and sensitivity to low starting concentrations that are inherent to 16S rRNA sequencing, microbial contamination is common32. Sources of contamination include PCR reagents, DNA extraction kits, laboratory surfaces, and the skin and clothing of researchers33,34,35. The effects of microbial contaminants can be minimized by working in a sterile lab environment, using negative controls and technical replicates, and keeping a record of all kits and reagents used36.
In addition to microbial contamination, low-quality sequencing output can greatly hinder the usability of microbiome sequencing data. Data quality can be assessed by a number of run metrics including the number of reads passing filter, the percentage of reads above a Q-score (a measure of the predicted probability of error in base-calling) of 30, and cluster density. Values for these metrics will vary based on the number of samples run, the sequencing system, the reagent cartridge, and the percent sequencing control used, but optimal values based on run conditions are available online. In particular, cluster density, the number of library clusters on a given plane of the flow cell prior to sequencing, is a key parameter for optimizing data quality and yield. Both over and under-clustering can reduce data output and result in samples being excluded from analysis due to insufficient coverage. Poor cluster density often reflects inaccurate library quantification; thus, care should be taken to perform proper DNA clean-up, individual sample quantification, and whole library quantification.
Assuming high data quality and yield are achieved, divergent approaches in data analysis used to overcome statistical challenges of large datasets pose another limitation of this approach. For example, multiple methods exist to address variation in read counts between samples. Rarefaction, which involves sub-sampling reads to achieve an equal sequencing depth across all samples, is frequently used but has been subject to critique recently for wasting large amounts of data and the subjective selection of minimum sequencing depth37,38,39. Cumulative sum scaling (CSS), which keeps all sequence reads but weighs them based on the cumulative sum of counts within a given percentile, has been developed as an alternative technique. However, CSS has not been widely adopted due to its relative novelty and the confirmed utility of rarefaction for normalization prior to presence/absence analyses.
Standard data analysis procedures are also lacking for handling sequence reads in negative controls and distinguishing these from low abundance reads. As mentioned, sequencing negative controls generated from the DNA extraction step is recommended to help differentiate true vector microbiome residents from microbial contaminants during analysis. Yet, a standard and statistically rigorous approach for identifying and filtering suspected contaminants is lacking. A common approach is to remove OTUs present in the negative controls from all samples. However, this method may be overly conservative since many of these microbes likely originate from real samples rather than kit reagents40. Another common technique involves grouping microbes present at < 1% into a "rare" category under the assumption that microbial contaminants are rare in real samples8,41. However, this method may remove true vector microbes that are present at low abundances, particularly when the microbiome is dominated by an endosymbiont as seen in I. pacificus and I. holocyclus7,42. In these cases, detecting rare microbes would require deeper sequencing, developing primers that inhibit the amplification of the endosymbiont during the amplicon PCR42, or computationally removing the endosymbiont during sequence analysis7. Selecting among the various methods to address rare and contaminant OTUs creates the opportunity for subjectivity and bias in microbiome data analysis, which may limit the ability to compare results across studies.
The choice of reference database, necessary for assigning taxonomy and phylogenetic information to sequence reads, presents another opportunity for divergence and subjectivity. While the task of relating sequence reads from a variable gene region to an identified parent genome is inherently challenging, a reliable reference database is crucial for accurate phylogenetic assignment. Multiple databases have emerged to meet this challenge, such as SILVA43, Greengenes44, RDP45, and NCBI46. SILVA is the largest of the 16S-based taxonomies, but SILVA, as well as RDP, only provides taxonomic information down to the genus level. Greengenes provides species-level information and is included in metagenomic analyses packages like QIIME, but it has not been updated in over four years. NCBI, while not a primary source for taxonomic information, provides daily updated classifications from user-submitted sequences, but it is uncurated. While selection among the databases typically depends on the users' needs regarding resolution, coverage, and currency, it has a significant impact on phylogenetic assignment and downstream analysis47,48. Consensus among investigators regarding which reference database to use is thus imperative for avoiding additional bias in analyses.
Standard approaches to these data processing challenges will likely emerge as 16S rRNA sequencing becomes an increasingly common technique. Continued reductions in sequencing costs and time will further popularize this method. The increased usage of this technology will enable deeper investigations into the composition and ecology of vector microbiomes under diverse conditions. As knowledge of vector microbiome biology is still in its infancy, these types of descriptive studies are a critical first step preceding attempts to leverage the microbiome as a means of vector control. However, to truly understand the role of the microbiome in pathogen transmission, microbial identification must be coupled with knowledge of the functional role of these microbes. RNA sequencing and transcriptomic approaches, which involve mapping and quantifying gene expression, enable inferences into the functional role of microbes but require deep sequencing and fully assembled genomes of target species. Circumventing these challenges, computational tools to predict functional composition from 16S rRNA sequencing data have recently been developed but are not widely adopted yet49. The development of such tools, as well as ecological theory, linking microbial identity and the functional role within vectors will increase the utility of 16S rRNA sequencing data in vector microbiome studies.
The authors have nothing to disclose.
This work was supported by National Science Foundation grants to A.S. (DEB #1427772, 1745411, 1750037).
Item | Name of Material/Equipment | Company | Catalog # |
1 | DNeasy Blood & Tissue Kit | Qiagen | 69504 |
2 | Qubit 4 Fluorometer | ThermoFisher Scientific | Q3326 |
3 | NanoDrop 8000 Spectrophotometer | ThermoFisher Scientific | ND-8000-GL |
4 | 2x KAPA HiFi HotStart ReadyMix | Kapa Biosystems | KK2501 |
5 | AMPure XP beads | Agen Court | A63880 |
6 | Magnetic Rack | ThermoFisher Scientific | MR02 |
6 | TE buffer | Teknova | T0223 |
7 | Nextera Index Kit | Illumina | FC-121-1011 |
8 | KAPA Library Quantification Kit | Roche | KK4824 |
9 | MiSeq System | Illumina | SY-410-1003 |
10 | MiSeq Reagent Kit v3 | Illumina | MS-102-3001 |
11 | 10 mM Tris-HCl with 0.1% Tween 20 | Teknova | T7724 |