This manuscript describes ER-Seq, a technique for detecting low-frequency mutations in ctDNA. The method is distinguished by bidirectional error correction, a dedicated background filter, and efficient recovery of DNA molecules.
The analysis of circulating tumor DNA (ctDNA) using next-generation sequencing (NGS) has become a valuable tool in clinical oncology. However, its application is challenging because of limited sensitivity when analyzing the trace amounts of ctDNA in the blood. Furthermore, sequencing and the subsequent analysis may generate false-positive and false-negative results. To improve the feasibility and reliability of ctDNA detection in the clinic, we present a technique that enriches rare mutations for sequencing, Enrich Rare Mutation Sequencing (ER-Seq). ER-Seq can distinguish a single mutation out of 1 × 10^7 wild-type nucleotides, which makes it a promising tool for detecting extremely low-frequency genetic alterations and thus very useful for studying disease heterogeneity. Through ligation of unique sequencing adapters, the method enables efficient recovery of ctDNA molecules while correcting errors bidirectionally (sense and antisense). Our selection of 1021 kb probes enriches target regions that cover over 95% of the tumor-related driver mutations in 12 tumors. This cost-effective and universal method enables efficient accumulation of genetic data. After efficiently filtering out background error, ER-Seq can precisely detect rare mutations. Using a case study, we present a detailed protocol covering probe design, library construction, and target DNA capture, together with the data analysis workflow. The procedure typically takes 1-2 days.
Next-generation sequencing (NGS), a powerful tool to investigate the mysteries of the genome, can provide a large quantity of information, which may reveal genetic alterations. The application of NGS analysis in the clinic has become more common, especially for personalized medicine. One of the greatest limitations of NGS, however, is a high error rate. Although it is deemed suitable for studying inherited mutations, the analysis of rare mutations is greatly limited1,2, especially when analyzing DNA obtained from a "liquid biopsy".
Circulating tumor DNA (ctDNA) is cell-free DNA (cfDNA) in the blood that is shed from tumor cells. In most cases, the quantity of ctDNA is extremely low, which makes its detection and analysis very challenging. However, ctDNA has many attractive features: its isolation is minimally invasive, it can be detected in the early stages of tumor growth, the ctDNA level reflects therapeutic efficiency, and ctDNA contains DNA mutations found in both primary and metastatic lesions3,4,5. Therefore, given the rapid development of NGS techniques and analysis, the application of ctDNA detection has become more attractive.
Different massively parallel sequencing approaches have been utilized for ctDNA detection but none of these approaches have been accepted for routine use in clinics due to their limitations: low sensitivity, lack of versatility, and a relatively high cost6,7,8. For example, duplex sequencing, based on a unique identifier tag (UID), repeatedly corrects errors in the consensus bidirectionally, rectifying most sequencing errors. However, the feasibility of this method is lost due to its high cost and low data utilization9,10. Similarly, CAPP-Seq and its improved iteration, CAPP-IDES11,12, have greater practicality in cfDNA detection, though the accuracy and universality of these methods need improvement.
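The idea behind UID-based bidirectional error correction can be illustrated with a minimal sketch. This is a generic duplex-style consensus, not the authors' ER-Seq implementation: the function names, the `min_fraction` threshold, and the assumption that reads are already grouped by UID, oriented to the reference strand, and of equal length are all hypothetical simplifications.

```python
from collections import Counter, defaultdict


def strand_consensus(reads, min_fraction=0.9):
    """Collapse equal-length reads into one consensus; 'N' where no base dominates."""
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_fraction else "N")
    return "".join(consensus)


def duplex_consensus(tagged_reads, min_fraction=0.9):
    """tagged_reads: iterable of (uid, strand, sequence) with strand '+' or '-'.

    Returns {uid: consensus} only for UIDs observed on both strands.
    A base is kept only if both strand consensuses agree on it.
    """
    families = defaultdict(lambda: {"+": [], "-": []})
    for uid, strand, seq in tagged_reads:
        families[uid][strand].append(seq)

    result = {}
    for uid, by_strand in families.items():
        if not by_strand["+"] or not by_strand["-"]:
            continue  # no bidirectional support for this molecule
        plus = strand_consensus(by_strand["+"], min_fraction)
        minus = strand_consensus(by_strand["-"], min_fraction)
        result[uid] = "".join(
            p if p == m and p != "N" else "N" for p, m in zip(plus, minus)
        )
    return result


# Toy reads: UID1 has one erroneous sense read; UID2 lacks antisense support.
reads = [
    ("UID1", "+", "ACGT"), ("UID1", "+", "ACGT"), ("UID1", "+", "ACTT"),
    ("UID1", "-", "ACGT"), ("UID1", "-", "ACGT"),
    ("UID2", "+", "ACTT"), ("UID2", "+", "ACTT"),
]
consensus = duplex_consensus(reads, min_fraction=0.6)
print(consensus)
```

In this toy input the single discordant read in UID1 is outvoted within its strand family, while UID2 is discarded because it is seen on one strand only. That discarding step is one reason strict duplex approaches use sequencing data inefficiently.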
To meet the current need for accurate ctDNA detection and analysis, we developed a new strategy, Enrich Rare Mutation Sequencing (ER-Seq). This approach combines the following: unique sequencing adapters to efficiently recover ctDNA molecules, with bidirectional error correction and the ability to distinguish a single mutation out of > 1 × 10^7 wild-type nucleotides; 1021 kb probes that enrich target regions covering over 95% of the tumor-related mutations from 12 tumors, including lung cancer, colorectal cancer, gastric cancer, breast cancer, kidney cancer, pancreatic cancer, liver cancer, thyroid cancer, cervical cancer, esophageal cancer, and endometrial carcinoma (Table 1); and baseline database screening, which makes it efficient and easy to precisely detect rare mutations in ctDNA.
To build a baseline database, first identify all gene mutations called by ER-Seq in a large number of samples of the same type (~1000 at the beginning). Real mutations must be verified by several other reliable detection and analysis methods. Next, summarize the pattern of the false mutations and cluster them to build the initial baseline database. Continue adding false mutations found in subsequent sequencing experiments to this database. In this way, the baseline database grows dynamically, which significantly improves sequencing accuracy.
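The baseline screening described above can be sketched roughly as follows. This is an illustrative simplification, not the authors' pipeline: the variant key `(chrom, pos, ref, alt)`, the `min_recurrence` threshold, the function names, and the artefact coordinates are all hypothetical. The clustering of false-mutation patterns mentioned in the text is reduced here to simple recurrence counting.

```python
from collections import Counter


def build_baseline(sample_calls, verified_true, min_recurrence=2):
    """Collect recurrent unverified calls as the initial baseline.

    sample_calls: list of sets of variant keys, one set per sample.
    verified_true: set of variant keys confirmed by orthogonal methods.
    """
    counts = Counter()
    for calls in sample_calls:
        counts.update(calls - verified_true)
    return {variant for variant, n in counts.items() if n >= min_recurrence}


def filter_calls(calls, baseline):
    """Remove calls that match known background errors."""
    return calls - baseline


def update_baseline(baseline, new_false_positives):
    """Grow the baseline with false positives from later runs."""
    return baseline | set(new_false_positives)


# Variant keys are (chrom, pos, ref, alt); the artefact positions are made up.
egfr_l858r = ("7", 55259515, "T", "G")
artefact_a = ("1", 100, "C", "A")
artefact_b = ("2", 200, "G", "T")

training = [
    {egfr_l858r, artefact_a},
    {artefact_a, artefact_b},
    {artefact_a},
]
baseline = build_baseline(training, verified_true={egfr_l858r})
kept = filter_calls({egfr_l858r, artefact_a}, baseline)
baseline = update_baseline(baseline, [artefact_b])
print(kept, len(baseline))
```

A set-based lookup keeps the filter cheap even as the database grows with each sequencing run.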
To promote progress in tumor diagnosis and monitoring, we present ER-Seq, a low cost and feasible method for the acquisition of universal data. We present a case study which underwent ER-Seq analysis, demonstrating its accuracy for detecting rare mutations and feasibility for use in the clinic.
Tumor specimens and blood samples were obtained according to a protocol approved by the Ethics Committee of Peking University People's Hospital. Written informed consent was obtained from the patients to use their samples. Participants were screened according to the following criteria: female, advanced non-small cell lung cancer (NSCLC), EGFR p.L858R mutation indicated by previous Sanger sequencing, and disease progression following two sessions of EGFR-targeted therapy with Erlotinib. ER-Seq was applied to ctDNA to analyze the cause of resistance and to find new target drugs.
1. DNA Extraction from Peripheral Blood for cfDNA and genomic DNA (gDNA)
2. Library Preparation
NOTE: Library construction is based on fragmented DNA. cfDNA already exists as fragments with a peak size of ~170 bp and thus does not need to be fragmented.
3. Targeted DNA capture
NOTE: Target enrichment was performed using a custom sequence capture-probe which is specifically designed for a 1021 kb target enrichment region covering known tumor-associated driver mutations from 12 different types of tumors. Modifications to the manufacturer's protocol are detailed in the following steps.
4. Sequencing
5. Data analysis workflow
NOTE: Figure 3 displays the general work flow and data analysis process. Data analysis parameters and commands are shown below.
The 1021 kb probes enriched target regions used in ER-Seq are shown in Table 1, which covers over 95% of the gene mutations in 12 common tumors. The wide range of these probes makes this process applicable to a majority of cancer patients. Additionally, our unique sequencing adapters and baseline database screening make it possible to detect rare mutations precisely.
Due to the different properties of gDNA and cfDNA extracted from peripheral blood, there is a range of data quality, which is determined by the different instruments and quality controls shown in Table 2. Successfully extracted cfDNA should display a peak size of ~170 bp, as analyzed by the bioanalyzer QC and shown in Figure 1. Large fragments indicate contamination from genomic DNA, which should not appear in the final product. DNA extracted from this patient sample was of sufficient quality and quantity for ER-Seq (cfDNA – 42.6 ng; gDNA – 3.426 µg).
Target DNA captured with the custom probe was then amplified by PCR. The average fragment size was evaluated by the bioanalyzer; representative data for the patient's cfDNA in Figure 2A show a peak at ~320 bp and a unique band on the agarose gel. gDNA should appear as a smear on an agarose gel, as shown in Figure 2B. The quality of both libraries was acceptable for subsequent sequencing.
After sequencing, the data were analyzed following the work flow in Figure 3B. To illustrate the advantages of ER-Seq over the traditional method, we performed both analyses on the same sample. Results from both analyses are shown in Table 3 and Table 4. ER-Seq raised the coverage depth from 2475 reads in the traditional analysis to 3214.3 reads (Table 3). This gain of 739.3 reads corresponds to 23% of the ER-Seq depth, or about 30% relative to the traditional depth. It results from efficient recovery of cfDNA molecules and greatly enhances the analysis of rare mutations. The unique sequencing adapters used in ER-Seq also enable easy differentiation of natural and PCR-induced duplicates (Table 3, 0 PCR-induced duplicate reads in ER-Seq). Additionally, the calling results showed that analysis based on ER-Seq was 100% consistent for EGFR p.L858R detection and that the detected frequency was higher than with traditional analysis (2.7% vs 1.2%, Table 4). Importantly, ER-Seq analysis enabled the detection of other relatively low-frequency mutations, including EGFR p.T790M (variation frequency 0.53%) (Table 4), which was not recognized by traditional analysis because of high background noise.
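The depth figures quoted from Table 3 can be checked with a few lines of arithmetic. The sketch below expresses the gain relative to each method's depth. It then estimates, purely as an illustration, how many supporting reads a T790M-level variant would have if the locus were covered at the mean depth. That last assumption is ours: the per-locus depth is not reported in the text.

```python
# Mean coverage depths reported in Table 3.
er_seq_depth = 3214.3
traditional_depth = 2475.0

gain = er_seq_depth - traditional_depth
gain_vs_traditional = gain / traditional_depth  # gain as a share of the traditional depth
gain_vs_er_seq = gain / er_seq_depth            # gain as a share of the ER-Seq depth

print(f"gain: {gain:.1f} reads")
print(f"relative to traditional depth: {gain_vs_traditional:.1%}")
print(f"relative to ER-Seq depth: {gain_vs_er_seq:.1%}")

# Illustrative only: expected variant-supporting reads for EGFR p.T790M,
# ASSUMING the locus is covered at the mean depth (not stated in the source).
expected_alt_er_seq = 0.0053 * er_seq_depth
expected_alt_traditional = 0.0013 * traditional_depth
print(f"expected alt reads: {expected_alt_er_seq:.1f} vs {expected_alt_traditional:.1f}")
```

Under that assumption, the variant would be supported by roughly 17 reads in ER-Seq but only about 3 in the traditional analysis, which is consistent with it being hard to separate from background noise there.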
Figure 1. Representative QC Result of Successfully Isolated cfDNA
For the left graphs, the X axis shows fragment size (bp) and the Y axis indicates relative fluorescence units (FU). The primary size of cfDNA is ~170 bp and there are no large fragments of contamination. A simulated electrophoresis gel is shown on the right.
Figure 2. Example of Libraries QC Analyzed.
The X axis shows the fragment size (bp) and the Y axis indicates the relative fluorescence units (FU). A simulated electrophoresis gel is shown on the right of each graph. (A) Patient cfDNA sample showed a sharp peak after it was amplified with adapters. (B) Control gDNA sample showed evenly distributed fragments with no sharp peak and no bias toward one side. Both samples were acceptable for subsequent sequencing.
Figure 3. ER-Seq Work Flow.
(A) General work flow. (B) Data analysis work flow.
2735 exons from all exon regions of 170 genes | |||||||||
ABL1 | ABL2 | AKT1 | AKT2 | AKT3 | ALK | APC | AR | ARAF | ATM |
ATR | AURKA | AURKB | AXL | BAP1 | BCL2 | BRAF | BRCA1 | BRCA2 | BRD2 |
BRD3 | BRD4 | BTK | C11orf30 | C1QA | C1S | CBL | CCND1 | CCND2 | CCND3 |
CCNE1 | CD274 | CDH1 | CDK13 | CDK4 | CDK6 | CDK8 | CDKN1A | CDKN1B | CDKN2A |
CDKN2B | CHEK1 | CHEK2 | CRKL | CSF1R | CTNNB1 | DDR1 | DDR2 | DNMT3A | EGFR |
EPHA2 | EPHA3 | EPHA5 | ERBB2 | ERBB3 | ERBB4 | ERCC1 | ERG | ESR1 | EZH2 |
FAT1 | FBXW7 | FCGR2A | FCGR2B | FCGR3A | FGFR1 | FGFR2 | FGFR3 | FGFR4 | FLCN |
FLT1 | FLT3 | FLT4 | FOXA1 | FOXL2 | GAB2 | GATA3 | GNA11 | GNAQ | GNAS |
HDAC1 | HDAC4 | HGF | HRAS | IDH1 | IDH2 | IGF1R | IL7R | INPP4B | IRS2 |
JAK1 | JAK2 | JAK3 | KDR | KIT | KRAS | MAP2K1 | MAP2K2 | MAPK1 | MAPK3 |
MCL1 | MDM2 | MDM4 | MED12 | MET | MITF | MLH1 | MLH3 | MPL | MS4A1 |
MSH2 | MSH3 | MSH6 | MTOR | MYC | MYD88 | NF1 | NF2 | NOTCH1 | NOTCH2 |
NOTCH3 | NOTCH4 | NRAS | NTRK1 | NTRK3 | PALB2 | PDGFRA | PDGFRB | PDK1 | PIK3CA |
PIK3CB | PIK3R1 | PIK3R2 | PMS1 | PMS2 | PRKAA1 | PSMB1 | PSMB5 | PTCH1 | PTCH2 |
PTEN | PTPN11 | RAF1 | RARA | RB1 | RET | RHEB | RHOA | RICTOR | RNF43 |
ROCK1 | ROS1 | RPS6KB1 | SMARCA4 | SMARCB1 | SMO | SRC | STAT1 | STAT3 | STK11 |
SYK | TMPRSS2 | TOP1 | TP53 | TSC1 | TSC2 | VEGFA | VHL | XPO1 | XRCC1 |
introns, promoters and breakpoints or fusion region from the following 24 genes | |||||||||
ALK | FGFR1 | FGFR2 | FGFR3 | NTRK1 | NTRK3 | PDGFRA | PDGFRB | ROS1 | RET |
MET | BRAF | ABL1 | BRD3 | BRD4 | EGFR | RAF1 | BCR | ERG | TMPRSS2 |
RARA | KIF5B | BCL2L11 | TERT | ||||||
other related genes: 1122 exons from 847 genes |
Table 1. 1021 kb Probes Enriched Target Regions.
These regions included: 2735 exons from 170 genes; introns, promoter regions and breakpoint regions from 24 genes; 1122 exons from 847 related genes. These cover over 95% of the tumor related driver mutations and target mutation sites from the 12 most common tumors (lung cancer, colorectal cancer, gastric cancer, breast cancer, kidney cancer, pancreatic cancer, liver cancer, thyroid cancer, cervical cancer, esophageal cancer, and endometrial carcinoma).
DNA | Result | Indication | Ideal Range |
cfDNA | Fragment size | Identification of DNA fragment distribution | 168 ± 20 bp (a) |
cfDNA | Concentration | More accurate DNA quantification | Total cfDNA ≥ 30 ng (b) |
gDNA | A260/A230 | Identification of chemical contaminants (e.g., ethanol) | 1.50 – 2.2 |
gDNA | A260/A280 | Identification of protein contaminants | 1.60 – 2.2 |
gDNA | Concentration | DNA quantification | Total gDNA > 3 µg |
Table 2. Quality Analysis of cfDNA and gDNA Extracted from Peripheral Blood.
The peak size of cfDNA is around 170 bp. Quality control should be performed by a standardized method for the 2100 Bioanalyzer to determine levels of sample degradation and/or gDNA contamination. Peak analysis of the quality of the cfDNA extraction should appear similar to Figure 1. Note that an increased DNA amount will result in a higher-quality library for the effective detection of rare mutations in cfDNA. We recommend using at least 30 ng cfDNA (30,000 copies).
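The acceptance ranges in Table 2 can be expressed as a small checking routine. This is a minimal sketch under stated assumptions: the function names and dictionary keys are hypothetical, and the absorbance ratios in the usage example are placeholders because the text does not report them for this patient. Only the amounts (42.6 ng cfDNA, 3.426 µg gDNA) and the ~170 bp peak come from the text.

```python
def cfdna_qc(peak_bp, total_ng):
    """Check cfDNA against Table 2: peak at 168 ± 20 bp and at least 30 ng in total."""
    return {
        "fragment_size_ok": 148 <= peak_bp <= 188,
        "amount_ok": total_ng >= 30,
    }


def gdna_qc(a260_230, a260_280, total_ug):
    """Check gDNA against Table 2: purity ratios in range and more than 3 µg in total."""
    return {
        "a260_230_ok": 1.50 <= a260_230 <= 2.2,
        "a260_280_ok": 1.60 <= a260_280 <= 2.2,
        "amount_ok": total_ug > 3,
    }


# Amounts and peak size from the case study; the two ratios are PLACEHOLDERS.
cfdna_result = cfdna_qc(peak_bp=170, total_ng=42.6)
gdna_result = gdna_qc(a260_230=1.8, a260_280=1.9, total_ug=3.426)

print(cfdna_result, all(cfdna_result.values()))
print(gdna_result, all(gdna_result.values()))
```

Returning one flag per criterion, rather than a single pass/fail value, makes it easier to see which QC step a failing sample should be repeated from.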
Table 3. The QC of ER-Seq Information Analysis and Traditional Information Analysis.
The dark gray highlight indicates a coverage depth increase in ER-Seq compared with traditional methods; the light gray highlight indicates no PCR-induced duplicate reads in ER-Seq.
Gene | Chr | Start | End | cHGVS | pHGVS | Function | caseAltIsHot | ER-Seq var_freq | Traditional var_freq |
TP53 | 17 | 7577534 | 7577535 | c.746G>T | p.R249M | Missense | HighFreq | 0.0400 | 0.0378 |
EGFR | 7 | 55259514 | 55259515 | c.2573T>G | p.L858R | Missense | Actionable | 0.0270 | 0.0120 |
ATM | 11 | 108203618 | 108203619 | c.7919delC | p.T2640Ifs*6 | Frameshift | ND | 0.0169 | 0.0180 |
PTCH1 | 9 | 98221917 | 98221918 | c.2851G>T | p.D951Y | Missense | ND | 0.0154 | 0.0260 |
EGFR | 7 | 55249070 | 55249071 | c.2369C>T | p.T790M | Missense | Actionable | 0.0053 | ——(0.0013) |
RB1 | 13 | 49039151 | 49039152 | c.2230A>T | p.I744F | Missense | ND | 0.0036 | ——(0.0014) |
Table 4. The Calling Result of ER-Seq Information Analysis and Traditional Information Analysis.
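The comparison in Table 4 can be summarized programmatically. The sketch below transcribes the frequency columns and lists the variants reported only by ER-Seq. It assumes that a traditional entry written as a dash followed by a value in parentheses means the variant was observed at that raw frequency but not called. The text confirms this for EGFR p.T790M; for RB1 p.I744F it is our reading of the table notation.

```python
# (gene, pHGVS, ER-Seq var_freq, traditional var_freq, called by traditional analysis)
calls = [
    ("TP53", "p.R249M", 0.0400, 0.0378, True),
    ("EGFR", "p.L858R", 0.0270, 0.0120, True),
    ("ATM", "p.T2640Ifs*6", 0.0169, 0.0180, True),
    ("PTCH1", "p.D951Y", 0.0154, 0.0260, True),
    ("EGFR", "p.T790M", 0.0053, 0.0013, False),  # parenthesized value in Table 4
    ("RB1", "p.I744F", 0.0036, 0.0014, False),   # parenthesized value in Table 4
]

# Variants reported by ER-Seq but not by the traditional analysis.
er_seq_only = [(gene, p, freq) for gene, p, freq, _, called in calls if not called]

# Lowest frequency that the traditional analysis still called in this sample.
lowest_traditional_call = min(trad for _, _, _, trad, called in calls if called)

for gene, p, freq in er_seq_only:
    print(f"{gene} {p}: {freq:.2%} (ER-Seq only)")
print(f"lowest traditional call: {lowest_traditional_call:.2%}")
```

In this sample, both ER-Seq-only variants fall below 1% frequency, while every variant called by the traditional analysis is above 1%.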
The existence of circulating tumor DNA (ctDNA) was discovered more than 30 years ago; however, ctDNA analysis is still not routine in clinical practice. Interest in the practical application of ctDNA methods has increased with the development of technologies for ctDNA detection and analysis. Tumor monitoring with ctDNA offers a minimally invasive approach for assessing microscopic residual disease, response to therapy, and tumor molecular profiles against the background of tumor evolution and intratumoral heterogeneity13,14. Improvements in the sensitivity and specificity of analysis will facilitate all these applications15,16.
In this manuscript, we introduced ER-Seq, a promising method aimed at improving the sensitivity of ctDNA detection, using an NSCLC case as an example. Compared with traditional analysis, the application of our unique sequencing adapters in ER-Seq significantly improved sensitivity and specificity, as indicated by an increase in coverage depth and the ability to detect low-frequency mutations such as EGFR p.T790M (0.53%). The presence of T790M in this patient sample may be a contributing factor to Erlotinib resistance. It also gave insight into how to better treat this patient, suggesting the use of a third-generation EGFR inhibitor specific for T790M, such as AZD9291 (refs. 17-20). Another critical point in ER-Seq that contributes to its sensitivity and specificity is the baseline database screening, which facilitates the precise detection of low-frequency mutations in ctDNA by filtering out background noise.
Although ER-Seq has many advantages over traditional methods, some challenges remain. One major challenge is the extremely low level of ctDNA present in the blood. For low-frequency mutations, acquiring sufficient template ctDNA is critical. Our unique adapters significantly increase the recovery rate of ctDNA, though they still need to be improved. Another challenge for ER-Seq, and for all other sequencing techniques, is the limited coverage of target regions. Our selected 1021 kb probe-enriched regions, which cover about 95% of tumor-related mutations, are more comprehensive than those of other small-panel methods21,22,23. However, as tumor heterogeneity becomes better recognized and more tumor biomarkers are discovered, the relative coverage of our probe-enriched regions decreases. To better serve as a tool for early diagnosis and disease monitoring of tumor patients, more comprehensive sequencing and analysis techniques are needed. Currently, ER-Seq still relies on a high volume of data for detecting low-frequency mutations. Further improvement of data utilization would favor the application of ER-Seq.
As discussed above, many limitations associated with ER-Seq exist. However, our unique analysis technique still has many advantages compared with other top-ranking methods in this field. The pros and cons of this technique merit further research. The potential of ER-Seq for routine clinical use will be of great importance for clinical physicians, providing them with a powerful tool to diagnose tumors and monitor tumor dynamics and response to therapy. Novel mutations associated with resistance to conventional and targeted therapy found by our analysis might offer new avenues of treatment for cancer patients with advanced disease.
The authors have nothing to disclose.
This work is supported by Geneplus–Beijing Institute.
QIAamp Circulating Nucleic Acid Kit | Qiagen | 55114 | DNA Extraction from Peripheral Blood for cfDNA |
QIAamp DNA Blood Mini Kit | Qiagen | 51105 | DNA Extraction from Peripheral Blood for gDNA |
Quant-iT dsDNA HS Assay Kit | Invitrogen | Q32854 | Measure cfDNA concentration |
Quant-iT dsDNA BR Assay Kit | Invitrogen | Q32853 | Measure library concentration |
Agilent DNA 1000 Reagents | Agilent | 5067-1504 | Measure cfDNA and library fragments |
The NEBNext UltraII DNA Library Prep Kit for Illumina | NEB | E7645L | Library Preparation |
Agencourt SPRIselect Reagent | Beckman | B23317 | DNA fragment screening and purification |
Tris-HCl (10 mM, pH 8.0)-100ML | Sigma | 93283 | Dissolution |
xGen Lockdown Probes | IDT | —— | xGen Custom Probe |
Human Cot-1 DNA | Life | 15279-011 | Targeted DNA capture |
Dynabeads M-270 Streptavidin | Life | 65305 | Targeted DNA capture |
xGen Lockdown Reagents | IDT | 1072281 | Targeted DNA capture |
KAPA HiFi HotStart ReadyMix | KAPA | KK2602 | post-capture PCR enrichment libraries |
KAPA Library Quantification Kit | KAPA | KK4602 | Measure library concentration |
NextSeq 500 High Output Kit v2 (150 cycles) | illumina | FC-404-2002 | Sequence |
Centrifuge5810 | eppendorf | 5810 | |
Nanodrop8000 | Thermo Scientific | 8000 | Measure gDNA concentration |
Qubit 2.0 | Invitrogen | | Quantify |
Agilent 2100 Bioanalyzer | Agilent | ||
ThermoMixer C | eppendorf | | Incubation |
16-tube DynaMag™-2 Magnet | Life | 12321D | |
Concentrator plus | eppendorf | ||
PCR | AB | simplyamp | |
QPCR | AB | 7500Dx | |
NextSeq 500 | illumina |