This protocol describes an approach to interrogate the recombined immunoglobulin heavy chain VDJ regions of lymphomas by deep-sequencing and retrieve VDJ rearrangement and somatic hypermutation status to delineate clonal architecture of individual tumor. Comparing clonal architecture between paired diagnosis and relapse samples reveals lymphoma relapse clonal evolution modes.
Understanding tumor clonality is critical to understanding the mechanisms involved in tumorigenesis and disease progression. In addition, understanding the clonal composition changes that occur within a tumor in response to certain micro-environment or treatments may lead to the design of more sophisticated and effective approaches to eradicate tumor cells. However, tracking tumor clonal sub-populations has been challenging due to the lack of distinguishable markers. To address this problem, a VDJ-seq protocol was created to trace the clonal evolution patterns of diffuse large B cell lymphoma (DLBCL) relapse by exploiting VDJ recombination and somatic hypermutation (SHM), two unique features of B cell lymphomas.
In this protocol, Next-Generation sequencing (NGS) libraries with indexing potential were constructed from amplified rearranged immunoglobulin heavy chain (IgH) VDJ region from pairs of primary diagnosis and relapse DLBCL samples. On average more than half million VDJ sequences per sample were obtained after sequencing, which contain both VDJ rearrangement and SHM information. In addition, customized bioinformatics pipelines were developed to fully utilize sequence information for the characterization of IgH-VDJ repertoire within these samples. Furthermore, the pipeline allows the reconstruction and comparison of the clonal architecture of individual tumors, which enables the examination of the clonal heterogeneity within the diagnosis tumors and deduction of clonal evolution patterns between diagnosis and relapse tumor pairs. When applying this analysis to several diagnosis-relapse pairs, we uncovered key evidence that multiple distinctive tumor evolutionary patterns could lead to DLBCL relapse. Additionally, this approach can be expanded into other clinical aspects, such as identification of minimal residual disease, monitoring relapse progress and treatment response, and investigation of immune repertoires in non-lymphoma contexts.
Cancer is a clonal disease. Since thirty years ago when Peter C. Nowell proposed the cancer clonal evolution model1, many studies have tried to dissect clonal populations within tumor samples and reconstruct clonal expansion and evolution patterns that underlie the tumorigenesis process2. Recently, whole-genome sequencing has enabled investigators to take a deep look at the clonal heterogeneity and evolution3,4. However, due to the lack of tractable markers in many cell types, it is difficult to infer the precise clonal architecture and evolutionary path. Fortunately there is a natural clonality marker in mature B cells from which many lymphoid malignancies, including DLBCL, originate. In response to antigen stimulation, each B-cell can form a single productive IgH VDJ sequence by joining a VH (variable), a D (diversity), and a JH (joining) segment together from a large pool of these segments. During this process, small portions of the original sequence may be deleted and additional non-templated nucleotides may be added to create a unique VDJ rearrangement. This specific VDJ rearrangement can be inherited in all the progeny of this B-cell, therefore tagging individual mature B-cell and its offspring5. Furthermore, SHM occurs on the recombined VDJ sequences in the subsequent germinal center (GC) reaction to introduce additional mutations for the expansion of the antibody pool and the enhancement of antibody affinity6. Therefore, by comparing and contrasting VDJ and SHM patterns of lymphoma samples that have undergone these processes, intra-tumor heterogeneity could be delineated and clonal evolution path of the disease may be deduced.
Previously, VDJ rearrangement and SHM could be identified by PCR amplifying the recombined regions, cloning the PCR products, and subsequently Sanger sequencing to obtain sequence information. This approach is low-throughput and low yield, retrieving only a very small portion of the entire recombined VDJ repertoire, and hindering the characterization of the overall representation of the clonal population within a given sample. A modified approach was created by generating NGS indexed sequencing libraries from VDJ PCR products and performing PE 2×150 bp sequencing to obtain more than half a million recombined VDJ sequences per sample. In addition, a custom pipeline was developed to perform quality control (QC), align, filter VDJ sequencing reads to identify rearrangements and SHMs of each read, and perform phylogenetic analysis on the clonal architecture of each sample. In addition, a new approach has been established to further characterize the clonal evolution patterns for samples collected at various disease stages.
We have applied this technique to DLBCL patient samples. DLBCL is an aggressive form of non-Hodgkin lymphoma with frequent relapse in up to one third of the patients7. DLBCL relapses normally occur early, within 2 to 3 years of the initial diagnosis, although some do occur after 5 years8. Prognosis for relapsed patients is poor, with only 10% achieving 3 year progression-free survival due to limited treatment options. This is the basis to the urgent need for novel approaches to treat DLBCL relapse9,10. However, molecular mechanisms associated with DLBCL relapse are still largely unknown. Particularly, the role of clonal heterogeneity at diagnosis and clonal evolution during DLBCL relapse development are presently uncharacterized, making it difficult to define an accurate and useful biomarker to predict relapse. To address these questions, we applied our VDJ-sequencing approach on multiple pairs of matched primary diagnosis-relapse DLBCL sample pairs. Two distinct clonal evolutionary scenarios of relapse emerged from the comparison of the clonal architectures between the diagnosis and relapse samples that suggests multiple molecular mechanisms may be involved in DLBCL relapse.
1. VDJ Amplification
1.1) DNA Extraction from Tumor Samples
1.2) DNA Quality Assessment
1.3) VDJ PCR
1.3.1) Amplify Recombined IgH VDJ Segment from Framework Region 1 (IgVHFR1)
1.3.2) Amplify Recombined IgH VDJ Segment from Framework Region 2 (IgVHFR2)
1.4) Optimize VDJ PCR
2. VDJ Amplicon Library Preparation and Sequencing
2.1) Library Preparation
2.1.1) End-repair
2.1.2) A-tailing
2.1.3) Adaptor Ligation
2.1.4) Amplify DNA Fragments
2.1.5) Library Validation
2.2) VDJ-PCR Library Pooling and Sequencing
3. Data Analysis
Note: A summary of the bioinformatics scripts used in this section can be found as a Supplementary Code File.
3.1) Alignment and QC
3.2) SHM Profile Identification
3.3) Graphical Representation of Results
3.4) Heterogeneity (Entropy) Measurement
The overall procedure of VDJ sequencing (VDJ-seq), including DNA extraction, recombined VDJ region amplification and purification, sequencing library construction, reads processing, and phylogenetic analysis, is represented in Figure 1. Routinely 5-200 μg DNA can be retrieved from frozen solid tissue sections or 0.5-20 μg DNA from formalin-fixed paraffin-embedded tissue sections. Depending on the quality, rearrangement pattern, and SHM degree of individual samples, a variety of VDJ PCR products from different samples could be obtained (Figure 2), including IGVHFR1 (310-380 bp), FR2 (250-295 bp), and FR3 (70-120 bp). Samples from which only FR3 fragment can be obtained from VDJ PCR were excluded from the remaining study due to the short product that does not provide sufficient amount of sequence information for downstream data analysis purpose. Only samples of which a major FR1 or FR2 PCR product can be visualized on the agarose gel are processed to generate sequencing libraries. The yield of the major PCR product after gel purification ranges from 10 to 50 ng in total.
The entire purified PCR product was used for the subsequent library construction for sequencing library. After adaptor ligation, each sample obtains its own unique index. During library amplification, 10 cycles of PCR were performed which could yield more than 30 ng as final library products. The library quality was accessed by a bioanalyzer. As shown in Figure 3, there is one single product in the library sample with a size shift of ~125 bp form the original VDJ PCR amplicon product indicating the additional of adaptor. For each sequencing run, 5-10 libraries were pooled together at 7 µM. In addition, due to the high sequence similarity between the VDJ PCR products, the library pool was mixed with 50% PhiX spike-in to ensure the complexity of the run and correct identification of base additional after each sequencing cycle. Library DNA fragments were sequenced for 150 bp on both ends, which allows the retrieval of sufficient amount of sequence information spanning VD and DJ junctions for the FR1 fragments, and SHM mutation patterns for phylogenetic analysis.
Previously, we performed VDJ sequencing on 32 DLBCL samples collected at diagnosis and relapse from 14 patients (several patients had multiple samples collected at either diagnosis or relapse stage) using the condition described above12. By performing the phylogenetic analysis of the SHM profiles of the major VHDJH rearrangements between the paired samples, an “Early-divergent” and a “Late-divergent” mode of clonal evolution associated with DLBCL relapse were identified12. Herein, the same approach was applied to a newly collected DLBCL diagnosis and relapse pair. The FR1 regions were obtained in both samples after PCR amplification. A major PCR product with the size around 344 bp (measured by bioanalyzer) was obtained from both the diagnosis and the relapse samples. After sequencing, a total of 0.70 million paired-end reads were obtained for the diagnosis sample and 0.73 million paired-end reads for the relapse sample, which were within the range of the number of reads obtained per sample in the previous study (0.38-1.42 million, average 0.75 ± 0.26 million)12. After mapping paired-end reads against IMGT database, reads that do not have hits in all three V, D and J regions were discarded. VHDJH arrangement was assigned to each paired-end read. A total of 0.64 million and 0.67 million VHDJH junctions were identified in the diagnosis and relapse samples respectively, representing greater than 90% alignment rate. By counting how many times each combination of VHDJH was found in individual samples, 808 distinct VHDJH junctions were found in the diagnosis sample and 581 distinct VHDJH junctions in the relapse sample. The dominant VHDJH junction in both sample is IGHV4-34*2 IGHD5-5*01 IGHJ2*01, confirming that they are clonally related. This dominant VHDJH junction accounted for 97% of the entire mapped VHDJH junction reads in both the diagnosis sample and the relapse sample.
To trace the clonal evolution of this DLBCL relapse case, phylogenetic analysis of the SHM profiles of the major VHDJH rearrangements between this pair of diagnosis and relapse samples was performed. We observed that the clonal evolution pattern of this pair was following the “Late-divergent” path (Figure 4), that the subclones from the diagnosis and relapse tumors clustered together on the same branch of the phylogenetic tree, and the dominant diagnosis subclone and the dominant relapse subclones shared similar SHM pattern with a few additional mutations occurred in the relapse subclone. These results suggest that potentially there was a subclone in the diagnosis tumor that was either chemo-resistant or escaped from the treatment, and then eventually developed into the relapse tumor.
To further characterize and visualize clonal architecture of each samples and clonal evolution patterns during relapse process, two new analyses have been developed. In these analyses, the entire set of subclones of the major VJ rearrangement in each sample were used to calculate the pairwise string distance between all subclones as defined by their patterns of SHM. Then by using multi-dimensional scaling (Figure 5A, 5B) or by building an undirected graph (Figure 5C, 5D) as described in Data Analysis, plots representing the distribution and heterogeneity of subclones were created. As illustrated in Figure 4, the MDS plot exhibits far less sequence diversity between major subclones in the late divergent cases (Figure 5A) than that in the early divergent cases (Figure 5B). Furthermore, the sub-clones of the diagnosis sample and the sub-clones of the relapse sample in the early divergent case separate completely from each other, indicating the lack of similarity of the SHM patterns between these tumors even though they carry the same VDJ rearrangement. Likewise, the undirected graphs in (Figure 5C) and (Figure 5D) show that the distribution of subclone counts in the late divergent case (Figure 5C) differs from the early divergent (Figure 5D). The latter has more prominent but genetically diverse major clones from diagnosis to relapse. In addition, the lack of edges and smaller subclones connections between the major diagnosis and relapse clones in the early divergent case (Figure 5D), demonstrate the sequence diversity nature of the early divergent relapse case.
Figure 1: Step by step flow chart of the VDJ-seq approach.The overall procedure of VDJ sequencing is shown. Please click here to view a larger version of this figure.
Figure 2: Representative images of VDJ PCR amplicon products. Gel images showing the representative IGVH-FR1 (left panel) and IGVH-FR2 (right panel) PCR products from DNA extracted from lymphoma patient samples. A PCR product could not be obtained in Sample 5 probably due to either the low sample DNA quality or SHM at primer sites that impaired the PCR reaction. Please click here to view a larger version of this figure.
Figure 3: QC of VDJ PCR amplicon and subsequent sequencing library by bioanalyzer. Representative Bioanalyzer traces of the same VDJ amplicon before library construction (top panel) and after library construction (bottom panel). Note the size shift from 334 bp to 459 bp indicating the addition of the adaptor. Please click here to view a larger version of this figure.
Figure 4: Clonal evolution analysis of a pair of DLBCL diagnosis and relapse samples. Phylogenetic analysis of a diagnosis-relapse DLBCL sample pair showing the “Late-divergent” clonal evolution mode. The color tickers represent the mutation status. Please click here to view a larger version of this figure.
Figure 5: New graphical methods to visualize clonal evolution patterns. (A, B) Multidimensional scaling plots integrating SHM profiles and subclone counts of sample pair A showing the late-divergent relapse mode (A) and sample pair B showing early-divergent relapse mode (B). The radius of the circles indicate the count of subclones corresponding to a particular SHM profile and colors corresponding to the diagnosis (blue) sample or relapse (red) sample. The X- and Y-axes represent the top 2 components of MDA. (C, D) Undirected graphs of sample pair A showing the late-divergent relapse mode (C) and sample pair B showing early-divergent relapse mode (D). Edges correspond to two SHM profiles having a string distance of one letter separation and graduated shading (color scale bar) indicating the proportion of sequences mapping to a particular SHM profile corresponding to the diagnosis (red) sample or relapse (yellow) sample. Please click here to view a larger version of this figure.
Because of the nearly unlimited number of iterations of sequence information coded by VDJ rearrangement and SHM at the IGH locus of human B cells, examination of the entire IGH repertoire by high-throughput deep-sequencing proved to be an efficient and comprehensive way to delineate clonal and sub-clonal B-cell populations. Furthermore, this strategy can be used to study the clonal evolution path of B cell tumor development, remission, and relapse by comparing the clonal and sub-clonal architectures of patient samples collected along various stages of the disease. Although amplification of rearranged VDJ sequences from primary sample DNA is readily performed in a laboratory setting using commercially available primers, the efficiency of amplification is largely dependent on the quality of sample DNA. In particular, DNA extracted from FFPE samples exhibits low amplification efficiency, and often only the small FR3 fragment can be successfully amplified from these samples, making them unsuitable for the subsequent analysis. Since our study described herein was designed to understand the clonality of B cell lymphoma, which is a clonal disease with one or several major clones dominating the entire tumor, we only collected and sequenced the major VDJ-PCR product. For studies interested in cataloging the entire B cell clonal population of, for example, peripheral blood, a larger portion of the gel that contains multiple VDJ-PCR products (multiple bands or a smear on the gel) may be excised and sequenced. However, it is important to point out that it is not possible to capture the entire rearranged IGH repertoire because certain SHM may occur at where primers target and impair the amplification of these VDJs. Moreover, recently, a commercial assay directly coupling IGH rearrangement amplification and MiSeq sequencing has been introduced. This assay streamlines sample preparation process. However, since not every tumor samples would be able to generate the FR1 fragment, we still recommend including the gel electrophoresis step to check the product of the PCR reaction before pooling samples for sequencing.
Sequencing 150 bp of both ends of the VDJ fragment (total of 300 bp sequence information) has the following advantages: 1) 300 bp sequencing can cover the majority of the FR1 and the entire FR2 fragment, providing ample sequence information for identifying VDJ rearrangement and somatic hyper-mutations; 2) The platform provides fast sequencing turn-around time that completes the entire 300 cycles of VDJ-sequencing under 48 hr; 3) Each run allows multiplexing up to 12 samples and still achieves around one million paired-end reads per sample output. Because the high sequence similarity between clones carrying the same VDJ rearrangement, especially in B cell lymphoma samples, it is critical to spike-in 20-50% PhiX during the sequencing run to allow accurately define base calling matrix. Recently, the MiSeq software has been updated to address this question. It is recommended to use as little as 5% PhiX spike-in for low complexity library sequencing. However, our sequencing facility has experienced inconsistent concentration of the PhiX spike-in control DNA provided by the manufacturer that would compromise the outcome of low complexity sequencing. Therefore, in current practice, we still use 10-20% PhiX spike-in for the VDJ sequencing. For sequencing of the entire peripheral blood B-cell repertoire, which normally contains a large number of different VDJ arrangements, it is possible to reduce the amount of PhiX spike-in. Although the number of clones and subclones identified of each sample is positively correlated with the number of reads sequenced, half to one million of paired-end reads per sample is sufficient to compare clonal structures between samples and deduce clonal evolution patterns between samples collected at different disease stages of the same patient12. It is possible that PE 150 bp sequencing might not achieve sufficient coverage of those long FR1 fragments (i.e. 350-380 bp). In some cases, sequence information of the D segment might not be collected or there might not be enough sequence information on the V segment to differentiate subclones by SHMs. However, with the advance of the sequencing technologies, it has become possible to sequence longer and cover the entire FR1 amplicon.
Through the sequencing of the IGH repertoire of B-cells it was possible to track the expansion of individual B cell clones and infer clonal evolution over the course from diagnosis to relapse. With the use of analysis on the populations of subclones generated by SHM patterns, we were able to differentiate two modes of cancer progression to relapse by examining their phylogenies. The latest graphical representations using multidimensional scaling plots and undirected graphs based on the Levenshtein distance further allow us to understand 1) the subclonal composition of each tumor, 2) how individual subclones are related to one another, and 3) how each subclone evolves during disease progression. Furthermore, statistical analysis showed that these phylogenetic structures were clearly different and that the subclonal populations differed as regards to their heterogeneity. Finally, the clonal evolution trajectory obtained by VDJ-sequencing can be well correlated with genetic evolution and characterized by either exome or targeted resequencing12. Therefore, combining these two approaches has the potential to identify driver mutations during lymphomagenesis and lymphoma relapse
In addition to defining the B cell IGH repertoire and trace clonal evolution of lymphoma progression, the VDJ-seq approach can be potentially used as a highly sensitive method for tracking minimal residue disease, monitoring tumor response to therapies, and potentially identifying the presence of drug-resistant clones. A similar approach can also be applied to T cells to map T cell receptor repertoire, and may be used to probe immune surveillance response to cancer. Moreover, VDJ-seq can also be performed in murine models using primers available from Hanna et al.13 It is foreseeable that information collected through this approach in the next few years can result in a better understanding of the dynamics of tumor development as well as expand our knowledge of the immune system and its roles in defending our body from tumor invasion.
The authors have nothing to disclose.
The authors would like to thank Dr Rita Shaknovich and members of Elemento lab, Melnick lab, and Tam lab for thoughtful discussions. We would also like to thank the Genomics Resources Core Facility at Weill Cornell Medical College for performing the VDJ-sequencing. YJ was supported by ASH Scholar Award. WT and OE are supported by Weill Cornell Cancer Center Pilot Grant. OE is supported by the NSF CAREER award, the Starr Cancer Consortium and the Hirschl Trust. We would also like to thank Katherine Benesch, JD, MPH for her generous support to this project.
Ethanol | VWR | 89125-170 | 200 proof, for molecular biology |
TE buffer | Life Technologies | 12090-015 | 10 mM Tris·Cl, pH 8.0; 10mM EDTA |
Xylenes | VWR | EM-XX0055-6 | 500 ml |
Proteinase K | Life Technonogies | 25530-015 | 100 mg |
Deoxynucleotide triphosphate (dNTP) Solution Mix | Promega | U1515 | 10 mM each nucleotide |
10x PCR Buffer | Roche | 11699105001 | Without MgCl2 |
AmpliTaq Gold DNA Polymerase with Gold Buffer and MgCl2 | Life Technonogies | 4311806 | 50 µl at 5 U/µl |
Specimen Control Size Ladder | Invivoscribe Technologies | 2-096-0020 | 33 reactions |
IGH Somatic Hypermutation Assay v2.0 – Gel Detection | Invivoscribe Technologies | 5-101-0030 | 33 reactions, Mix 2 for IGVHFR1 detection |
IGH Gene Clonality Assay – Gel Detection | Invivoscribe Technologies | 1-101-0020 | 33 reactions, Tube B for IGVHFR2 detection |
GoTaq Flexi DNA Polymerase | Promega | M8291 | 20 µl at 5 U/µl |
10X Tris-Borate-EDTA (TBE) buffer | Corning (cellgro) | 46-011-CM | 6×1 L |
50X TAE BUFFER | VWR | 101414-298 | 1 L |
Ethidium bromide solution | Sigma-Aldrich | E1510 | 10 mg/ml |
25 bp DNA Ladder | Life Technologies | 10597011 | 1 µg/µl |
100 bp DNA Ladder | Life Technologies | 15628-019 | 1 µg/µl |
UltraPure Agarose | Life Technologies | 16500-500 | 500 g |
40% Acrylamide/Bis Solution | Bio-Rad Laboratories | 1610144 | 500 ml |
QIAquick Gel Extraction Kit | Qiagen | 28704 | 50 columns |
Agencourt AMPure XP | Beckman Coulter | A63881 | |
Qubit dsDNA High Sensitivity Assay Kit | Life Technologies | Q32854 | |
High Sensitivity DNA Kit | Agilent Technologies | 5067-4626 | |
2100 Bioanalyzer | Agilent Technologies | ||
PhiX Control v3 | Illumina | FC-110-3001 | |
MiSeq | Illumina | ||
Qubit 2.0 Fluorometer | Life Technologies | Q32872 | |
Resuspension buffer (Illumina TruSeq DNA Sample Preparation Kit v2) | Illumina | FC-121-2001 | |
End repair mix (Illumina TruSeq DNA Sample Preparation Kit v2) | Illumina | FC-121-2001 | |
A-tailing mix (Illumina TruSeq DNA Sample Preparation Kit v2) | Illumina | FC-121-2001 | |
Ligation mix (Illumina TruSeq DNA Sample Preparation Kit v2) | Illumina | FC-121-2001 | |
DNA adaptor index (Illumina TruSeq DNA Sample Preparation Kit v2) | Illumina | FC-121-2001 | |
Stop ligation buffer (Illumina TruSeq DNA Sample Preparation Kit v2) | Illumina | FC-121-2001 | |
PCR primer cocktail (Illumina TruSeq DNA Sample Preparation Kit v2) | Illumina | FC-121-2001 | |
PCR master mix (Illumina TruSeq DNA Sample Preparation Kit v2) | Illumina | FC-121-2001 | |
Magnetic stand | Life Technologies | 4457858 | |
Gel imaging system | Bio-Rad Laboratories | 170-8370 |