We describe a protocol for amplifying retroviral integration sites from the genomic DNA of infected cells, sequencing the amplified virus-host junctions, and then mapping these sequences to a reference genome. We also describe techniques to quantify the distribution of integration sites relative to various genomic annotations using BEDTools.
Retroviruses exhibit signature integration preferences on both the local and global scales. Here, we present a detailed protocol for (1) generation of diverse libraries of retroviral integration sites using ligation-mediated PCR (LM-PCR) amplification and next-generation sequencing (NGS), (2) mapping the genomic location of each virus-host junction using BEDTools, and (3) analyzing the data for statistical relevance. Genomic DNA extracted from infected cells is fragmented by digestion with restriction enzymes or by sonication. After suitable DNA end-repair, double-stranded linkers are ligated onto the DNA ends, and semi-nested PCR is conducted using primers complementary to both the long terminal repeat (LTR) end of the virus and the ligated linker DNA. The PCR primers carry sequences required for DNA clustering during NGS, negating the requirement for separate adapter ligation. Quality control (QC) is conducted to assess DNA fragment size distribution and adapter DNA incorporation prior to NGS. Sequence output files are filtered for LTR-containing reads, and the sequences defining the LTR and the linker are cropped away. Trimmed host cell sequences are mapped to a reference genome using BLAT and are filtered for minimally 97% identity to a unique point in the reference genome. Unique integration sites are scrutinized for adjacent nucleotide (nt) sequence and distribution relative to various genomic features. Using this protocol, integration site libraries of high complexity can be constructed from genomic DNA in three days. The entire protocol that encompasses exogenous viral infection of susceptible tissue culture cells to integration site analysis can therefore be conducted in approximately one to two weeks. Recent applications of this technology pertain to longitudinal analysis of integration sites from HIV-infected patients.
Integration of viral DNA (vDNA) into the host cell genome is an essential step in the retroviral life cycle. Integration is accomplished by the viral enzyme integrase (IN), which carries out two distinct catalytic processes that lead to the establishment of the stably inserted provirus 1. IN subunits engage the ends of the linear vDNA that is generated through reverse transcription, forming the higher-order intasome with vDNA ends held together by an IN multimer 2-4. IN cleaves the 3' ends of the vDNA downstream from invariant 5'-CA-3' sequences in a process known as 3'-processing, leaving recessed 3' ends with reactive hydroxyl groups at each vDNA terminus 5-8. The intasome is subsequently imported into the nucleus as part of a large assembly of host and viral proteins known as the preintegration complex (PIC) 9-11. After encountering cellular target DNA (tDNA), IN uses the vDNA 3'-hydroxyl groups to cleave the tDNA top and bottom strands in a staggered fashion and simultaneously joins the vDNA to tDNA 5' phosphate groups through the process of strand transfer 12,13.
Retroviruses exhibit integration site preferences on the local and global scales. Locally, consensus integration sites consist of weakly conserved palindromic tDNA sequences that span from approximately five to ten bp upstream and downstream from the vDNA insertion sites 14,15. Globally, retroviruses target specific chromatin annotations 16. There are seven different retroviral genera – alpha through epsilon, lenti, and spuma. The lentiviruses, which include HIV-1, favor integration within the bodies of actively transcribed genes 17, while the gammaretroviruses preferentially integrate into transcriptional start sites (TSSs) and active enhancer regions 18-20. In sharp contrast, spumavirus is strongly biased towards heterochromatic regions, such as gene-poor lamina-associated domains 21. Local tDNA base preferences are in large part dictated by specific networks of nucleoprotein contacts between IN and tDNA 13,22,23. For the lentiviruses and gammaretroviruses, integration relative to genomic annotations is in large part governed by interactions between IN and cognate cellular factors 24-27. Altering the specifics of the IN-tDNA interaction network 13,22,23,28 and disrupting or re-engineering IN-host factor interactions 25-27,29-32 are proven strategies to retarget integration on the local and global levels, respectively.
The power of DNA sequencing procedures used to catalogue retroviral integration sites has increased immensely over the past decades. Integration sites were recovered in pioneering work using laborious purification and manual cloning techniques to yield just a handful of unique sites per study 33,34. The combination of LM-PCR amplification of LTR-host DNA junctions with the ability to map individual integration sites to human and mouse draft genomes transformed the field, with the number of sites recovered from exogenous tissue culture cell infections increasing to several hundred to thousands 17,18. The more recent combination of LM-PCR with NGS methodology has sent library depth skyrocketing. Specifically, pyrosequencing yielded on the order of tens of thousands of unique integration sites 30,35-38, while libraries sequenced through the use of DNA clustering can yield millions of unique sequences 19-21,39. Here we describe an optimized LM-PCR protocol for amplifying and sequencing retroviral integration sites using DNA clustering NGS. The method incorporates required adapter sequences into the PCR primers and hence directly into the amplified DNA molecules, thereby precluding the requirement for an additional adapter ligation step prior to sequencing 40. The bioinformatic analysis pipeline, from the parsing of raw sequencing data for LTR-host DNA junctions to the mapping of unique integration sites to pertinent genomic features, is also generally described. In accordance with the precedent established by prior methodological protocols in this field 36,38,41-43, custom scripts can be developed to aid the completion of specific steps in the bioinformatics pipeline.
The utility and sensitivity of the protocol are illustrated with representative data by amplifying, sequencing, and mapping HIV-1 integration sites from tissue culture cells infected at the approximate multiplicity of infection (MOI) of 1.0, as well as a titration series of this DNA diluted through uninfected cellular DNA in 5-fold steps to a maximum dilution of 1:15,625 to yield the approximate equivalent MOI of 6.4 × 10⁻⁵.
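The arithmetic behind the titration series can be checked with a few lines of Python. This is only an illustrative sketch: the starting MOI, the 5-fold step, and the maximum dilution come from the text, while the function and variable names are our own.

```python
# Five-fold serial dilution of infected-cell DNA (MOI ~1.0) into uninfected DNA.
START_MOI = 1.0  # approximate MOI of the neat sample (from the text)
FOLD = 5         # dilution factor per step (from the text)
STEPS = 6        # 5**6 = 15,625, the maximum dilution quoted in the text


def dilution_series(start_moi=START_MOI, fold=FOLD, steps=STEPS):
    """Return (dilution_factor, equivalent_moi) for neat DNA and each step."""
    return [(fold ** i, start_moi / fold ** i) for i in range(steps + 1)]


series = dilution_series()
for factor, moi in series:
    label = "neat" if factor == 1 else f"1:{factor:,}"
    print(f"{label:>9}  equivalent MOI ~ {moi:.2e}")
```

Six 5-fold steps give a dilution factor of 15,625, and 1/15,625 equals 6.4 × 10⁻⁵, which matches the equivalent MOI quoted above.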
1. Generate Virus Stocks
Note: A flow chart of the wet bench aspect of this protocol is depicted in Figure 1. The details of viral stock production and subsequent infection of tissue culture cells will generally apply to different types of retroviruses. For some experiments, the target cell may not express the endogenous viral receptor(s), and in such cases the construction of pseudotyped retroviral particles harboring heterologous viral envelope glycoprotein, e.g. the G glycoprotein from vesicular stomatitis virus (VSV-G), will be required for infection 44,45.
Note: Precautions should be taken when working with HIV-1. Though specific guidelines will vary from institution to institution, all virus-based work should be conducted in a dedicated, operator-restricted biological safety cabinet (typically referred to as a tissue culture hood). Proper personal protective equipment that includes face protection, shoe covers, a double glove layer, and a full-body coverall suit should be worn at all times. All liquid waste resulting from virus-related experiments should be inactivated with bleach (10% final concentration), and all waste including solids should be autoclaved prior to disposal.
2. Infect Cells with Virus
3. Fragment Genomic DNA by Sonication or by Restriction Enzyme Digest
Note: Sonication fragments genomic DNA in a virtually sequence-independent manner and is thus the preferable mode of fragmentation when sequencing samples with a low expected recovery rate (e.g., infected patient cells or infections initiated at relatively low MOI). Furthermore, sonication allows one to distinguish PCR duplicates of a particular integration site sequence from unique integrations at the same site, which is critical to distinguish the clonal expansion of provirus-containing cells in infected patients (see Step 11 below) 39,52-54.
Note: The DNA should be cleaved immediately downstream from the upstream LTR to diminish amplification of internal viral sequences during LM-PCR. BglII, whose recognition site lies 43 bp downstream from the upstream U5 sequence and whose ends are incompatible with MseI-generated DNA ends in the subsequent ligation, works well with many HIV-1 strains (Figure 1B). When preparing DNA by sonication, the internal-cleaving restriction enzyme should be applied after linker ligation (see Figure 1C–E and Step 4.3 below).
4. Anneal Linker Oligonucleotides and Ligate to Fragmented Genomic DNA
Note: Prepare an asymmetric linker containing an overhang that is compatible with the above DNA fragments (see Table 1 for the sequences of oligonucleotides utilized in this protocol). The linker to be used with sonicated DNA must contain a compatible T-3' overhang, while the linker for MseI-digested DNA must contain a compatible 5'-TA overhang (Figure 1). The short linker strand must additionally contain a non-extendable chemical modification, such as 3'-amine, to constrain the subsequent amplification reactions toward the DNA of interest.
Note: When preparing multiple different integration site libraries in parallel and/or when multiplexing unique samples on the same sequencing run, it is recommended to use unique linkers for each sample to limit the potential for sample cross-contamination during PCR. This additionally implies the use of unique linker primers for each sample during semi-nested PCR (described below). Unique linker strands and linker primers may be designed by scrambling the linker oligonucleotide sequences listed in Table 1 while maintaining similar overall %GC content and applicable overhang positions.
5. Amplify Viral LTR-Host Genomic DNA Junctions by Semi-nested PCR
Note: To ensure optimal library diversity, at least 4-8 parallel PCRs, depending on the DNA concentration of the recovered ligation reaction, should be prepared for each sample for both PCR rounds. DNA template concentration should be quantified by spectrophotometry. In this protocol the first and second rounds of PCR employ nested LTR-specific primers, but the same linker-specific primer is used for both rounds (Table 1). The second round LTR-specific primer and the linker-specific primer encode adapter sequences for DNA clustering as well as sequencing primer-binding sites. The nested LTR-specific primer also encodes a 6 nt index sequence, which can be varied among different primers for multiplexing libraries within the same sequencing run.
6. Perform QC and NGS (Typically Completed by a Sequencing Facility)
7. Use a Customized Python or PERL Script to Parse Sequencing Data for LTR-containing Sequences, Crop away LTR and Linker Sequences, and Map to Reference Genome with BLAT
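The parsing and filtering logic of this step might be sketched in Python as follows. This is a minimal illustration, not the authors' script: the LTR and linker strings are hypothetical placeholders (the real sequences depend on the primers in Table 1), the minimum host length is an assumed cutoff, and the mapping hits are represented as simple tuples rather than as parsed BLAT output. Reads with more than one qualifying hit are assumed to be discarded.

```python
LTR_END = "GGAAAATCTCTAGCA"     # hypothetical LTR terminus expected in a read
LINKER_START = "AGTCCCTTAAGC"   # hypothetical start of linker read-through
MIN_HOST_LEN = 20               # hypothetical minimum host length to keep


def crop_read(read, ltr_end=LTR_END, linker=LINKER_START, min_len=MIN_HOST_LEN):
    """Return the host sequence after the LTR end, or None if the read fails."""
    idx = read.find(ltr_end)
    if idx == -1:
        return None                      # no LTR: not a virus-host junction
    host = read[idx + len(ltr_end):]     # crop away the LTR
    link_idx = host.find(linker)
    if link_idx != -1:
        host = host[:link_idx]           # crop away linker read-through
    return host if len(host) >= min_len else None


def unique_hits(hits, min_identity=97.0):
    """Keep reads with exactly one hit at or above the identity cutoff.

    hits: iterable of (read_id, chrom, pos, strand, percent_identity).
    Returns {read_id: (chrom, pos, strand)}.
    """
    by_read = {}
    for read_id, chrom, pos, strand, identity in hits:
        if identity >= min_identity:
            by_read.setdefault(read_id, []).append((chrom, pos, strand))
    return {rid: locs[0] for rid, locs in by_read.items() if len(locs) == 1}


host_seq = "ACGTTGCAAGGCTTACCGATTGACC"
example_read = "NNNN" + LTR_END + host_seq + LINKER_START + "TTTT"
print(crop_read(example_read))
```

In practice the cropped sequences would be written to a FASTA file for mapping, and the tuples passed to `unique_hits` would be derived from the mapper's output.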
8. Create .bed Files Containing 15-Nt Intervals Surrounding Integrations, Convert These to FASTA Files, and Construct Sequence Logos to Display Base Preferences Surrounding Integration Sites
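A minimal Python sketch of the two ideas in this step is given below: building a 15-nt interval around each site in BED coordinates (0-based start, exclusive end), and tallying per-position base frequencies, which is the quantity a sequence logo displays. The symmetric 7-nt flank, the coordinate convention for `pos`, and all names are assumptions for illustration; the interval actually used should match the consensus positions of interest.

```python
from collections import Counter

FLANK = 7  # assumed: 7 nt either side of a central base gives a 15-nt window


def site_to_bed(chrom, pos, strand, flank=FLANK):
    """Return a BED6 line for the window around `pos` (0-based central base).

    Returns None when the window would run off the start of the chromosome.
    """
    start = pos - flank
    end = pos + flank + 1  # BED end coordinate is exclusive
    if start < 0:
        return None
    return f"{chrom}\t{start}\t{end}\tsite\t0\t{strand}"


def base_frequencies(seqs):
    """Per-position A/C/G/T frequencies for a list of equal-length sequences."""
    length = len(seqs[0])
    freqs = []
    for i in range(length):
        counts = Counter(s[i].upper() for s in seqs)
        freqs.append({b: counts.get(b, 0) / len(seqs) for b in "ACGT"})
    return freqs


print(site_to_bed("chr1", 1000, "+"))
```

The strand column is carried through so that minus-strand windows can be reverse-complemented when the intervals are converted to FASTA.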
9. Create Central Base Pair .bed Files, Check for Sample Cross-Contamination, and Map the Distribution of Unique Integration Sites Relative to Pertinent Genomic Features
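The protocol performs these comparisons with BEDTools. The pure-Python sketch below only illustrates the underlying logic on toy data: collapsing to unique sites, counting sites within gene intervals or within ±2.5 kb of a TSS, and flagging sites shared between two libraries as possible cross-contamination. The data structures and function names are hypothetical.

```python
from bisect import bisect_left


def within_window(site_pos, feature_positions, window=2500):
    """True if the sorted feature list has an entry within +/- window of the site."""
    i = bisect_left(feature_positions, site_pos - window)
    return i < len(feature_positions) and feature_positions[i] <= site_pos + window


def in_any_interval(site_pos, intervals):
    """True if the site falls in any half-open (start, end) interval."""
    return any(start <= site_pos < end for start, end in intervals)


def summarize(sites, genes, tss, window=2500):
    """sites: [(chrom, pos)]; genes: {chrom: [(start, end)]}; tss: {chrom: sorted [pos]}."""
    unique = sorted(set(sites))
    n = len(unique)
    if n == 0:
        raise ValueError("no integration sites supplied")
    in_gene = sum(in_any_interval(p, genes.get(c, [])) for c, p in unique)
    near_tss = sum(within_window(p, tss.get(c, []), window) for c, p in unique)
    return {
        "unique_sites": n,
        "pct_in_genes": 100 * in_gene / n,
        "pct_near_tss": 100 * near_tss / n,
    }


def shared_sites(sites_a, sites_b):
    """Sites present in both libraries (candidates for cross-contamination)."""
    return set(sites_a) & set(sites_b)


genes = {"chr1": [(1000, 5000)], "chr2": [(20000, 30000)]}
tss = {"chr1": [1000], "chr2": [20000]}
sites = [("chr1", 1500), ("chr1", 1500), ("chr1", 9000), ("chr2", 25000), ("chr2", 100)]
print(summarize(sites, genes, tss))
```

A shared site is not proof of contamination, since independent integrations at the same position are possible, so flagged sites deserve inspection rather than automatic removal.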
10. Statistically Compare Integration Site Distributions among Samples Using Two-tailed Fisher's Exact Test and Two-tailed Wilcoxon Rank Sum Test in R
Note: Use Fisher's exact test for comparing the proportion of integration sites within RefSeq genes or within a window of CpG islands or TSSs, but use the Wilcoxon rank sum test for comparing the distribution in gene density surrounding the integration sites. The R program is available at http://www.r-project.org/.
Two-tailed Fisher's exact test:
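The protocol runs this test in R. For readers who want to see what the test computes, the following Python sketch implements a two-tailed Fisher's exact test for a 2 × 2 table using only the standard library. It sums the hypergeometric probabilities of all tables that are no more likely than the observed one, with a small relative tolerance for floating-point ties. The function name is our own, and the counts in the example are made up. It is intended for modest counts, not as a replacement for a statistics package.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test P value for the table [[a, b], [c, d]].

    For example, a/b = sites inside/outside genes in library 1 and
    c/d = sites inside/outside genes in library 2.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    p_obs = prob(a)
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical counts, for illustration only.
p_value = fisher_exact_two_tailed(8, 2, 1, 5)
print(f"P = {p_value:.5f}")
```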
11. Examine Raw Sequencing Data for Evidence of Clonal Expansion of Cells Containing Integrated Viral DNA
Note: A small potential exists for more than one integration at the exact same nt in the reference genome. Alternatively, a single integration event may become redundantly present in sequencing data due to the use of PCR during library preparation and/or by cell duplication prior to DNA preparation. Recent analyses of genomic DNA from HIV-infected patients have distinguished these possibilities by identifying unique sonication shear points/linker attachment points (which can only arise prior to PCR) within DNA sequences containing identical integration sites 52-54. There is currently a debate as to whether proviruses harbored within clonally expanded cells contribute to the latent viral reservoir, and thus it is of particular interest to characterize their level of expansion when studying integration sites in human patients.
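The distinction drawn above can be sketched in Python by grouping reads on integration site and counting distinct shear points per site. The input tuples and names below are hypothetical. In real data the shear point would be taken from the linker-proximal end of each mapped fragment, which requires sonicated DNA and a read that reaches the linker junction.

```python
from collections import defaultdict


def clonal_summary(reads):
    """Summarize reads per integration site.

    reads: iterable of (chrom, site_pos, strand, shear_pos).
    Returns {(chrom, site_pos, strand): {"reads": n, "distinct_shear_points": k}}.
    """
    shear_points = defaultdict(set)
    read_counts = defaultdict(int)
    for chrom, site_pos, strand, shear_pos in reads:
        key = (chrom, site_pos, strand)
        shear_points[key].add(shear_pos)
        read_counts[key] += 1
    return {
        key: {"reads": read_counts[key], "distinct_shear_points": len(points)}
        for key, points in shear_points.items()
    }


example_reads = [
    ("chr1", 5000, "+", 5150),
    ("chr1", 5000, "+", 5150),   # same shear point: likely a PCR duplicate
    ("chr1", 5000, "+", 5420),   # different shear point: independent fragment
    ("chr7", 900, "-", 700),
    ("chr7", 900, "-", 700),
]
for site, stats in clonal_summary(example_reads).items():
    print(site, stats)
```

A site with many reads but a single shear point is consistent with PCR duplication, whereas several distinct shear points at one site indicate that the same provirus was present in multiple cells before library preparation.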
Table 4 lists the results of a representative experiment to illustrate the sensitivity of NGS for recovering integration sites from a culture of infected cells. Uninfected cellular DNA was utilized to serially dilute genomic DNA from an infection in which every cell on average contained one integration 40. Dilutions were prepared in steps of five to a maximal dilution of 1:15,625. Genomic DNA in the titration series was then fragmented by sonication or by digestion with restriction endonucleases MseI and BglII, followed by LM-PCR. The numbers of unique integration sites, as well as the number of sites mapping proximal to selected genomic annotations, were calculated according to the above protocol. Data analysis revealed dozens of unique integration sites (1-2% of the amount recovered from neat genomic DNA) recovered from libraries prepared from cells where in theory only one in 15,625 was infected.
When analyzing integration site datasets, it is critical to compare the data to a matched set of random genomic sites, which is called a matched random control or MRC. As the genomic DNA in the representative experiment was sheared by restriction enzyme digestion or by sonication, two different MRC datasets were constructed. MRCenz contained 50,000 unique genomic sites generated by randomly selecting sites from hg19 in proximity to the sites of MseI and BglII restriction enzyme digestion, whereas MRCrandom harbored 10,000 sites generated without normalization for distance from set genomic markers. Only the sites that can be mapped back to a unique genomic location should be used in MRC datasets. As sonication shears genomic DNA essentially free from sequence bias, MRCrandom may be viewed as more applicable to datasets produced by fragmentation of DNA by sonication. An alternative style of control integration site dataset can be generated in vitro by reacting recombinant IN protein, intasome nucleoprotein complex 21, or PICs extracted from acutely infected cells 17 with deproteinized genomic DNA, and then following the LM-PCR and NGS protocols 21.
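A set of random sites in the style of MRCrandom could be drawn as in the Python sketch below. The chromosome sizes are toy values rather than hg19, the names are hypothetical, and the sketch omits the mappability check (keeping only sites that map back to a unique location) and the restriction-site proximity matching needed for an MRCenz-style control.

```python
import random

# Toy chromosome sizes for illustration only (not hg19).
CHROM_SIZES = {"chrA": 1_000_000, "chrB": 500_000, "chrC": 250_000}


def random_sites(chrom_sizes, n_sites, seed=0):
    """Draw n_sites unique (chrom, pos, strand) tuples, uniform over the genome."""
    rng = random.Random(seed)  # fixed seed so the control set is reproducible
    names = list(chrom_sizes)
    weights = [chrom_sizes[c] for c in names]  # longer chromosomes get more sites
    sites = set()
    while len(sites) < n_sites:
        chrom = rng.choices(names, weights=weights, k=1)[0]
        pos = rng.randrange(chrom_sizes[chrom])  # 0-based position
        strand = rng.choice("+-")
        sites.add((chrom, pos, strand))
    return sorted(sites)


mrc = random_sites(CHROM_SIZES, 1000)
print(len(mrc), mrc[:3])
```

Weighting by chromosome length makes every base equally likely to be chosen, which is the property that makes such a set a reasonable comparator for sonicated libraries.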
P values for comparison of the distribution of integration sites recovered by sonication versus restriction digest (comparison is between the neat samples), as well as for comparison to the MRCenz and MRCrandom, are displayed in Figure 2. The distribution of integration sites recovered following sonication was similar to those recovered by restriction enzyme digest for all annotations examined, with the greatest variance evident in terms of proximity to CpG islands. As expected 18,65 both datasets differed significantly from the MRCs in terms of integrations within RefSeq genes and gene density surrounding the average integration site, while both datasets were similar to the MRCs in terms of distribution relative to CpG islands and TSSs. Since relatively few HIV-1 integration sites map within 2.5 kb of a CpG island or TSS, increasing the total number of sites recovered is likely to decrease the variability that can arise between datasets (Table 4 and Figure 2). Sequence logos to confirm the authenticity of the integration site data are shown in Figure 3. The consensus HIV-1 integration site 14,22 (-3)TDG(G/V)TWA(C/B)CHA(+7) (written using International Union of Biochemistry base codes; the backslash indicates the position of vDNA plus-strand joining, and the underline indicates the 5-bp sequence duplicated following HIV-1 integration and DNA repair) is apparent for libraries prepared by both fragmentation techniques, although the degree of certainty decreases with increasing dilution of infected cell DNA. The random sites aligned from the MRC dataset by contrast failed to generate appreciable levels of base preferences.
Figure 1: Flow Chart Illustration of Integration Site Library Preparations. (A) Generate virus stocks by transfecting HEK293T cells, harvesting and filtering supernatant 48 hr later, concentrating by ultracentrifugation, and infecting target cells with appropriate concentration of virus. At least five days after infection, extract genomic DNA. Refer to Sections 1 and 2 of main text for additional experimental details. (B and C) Fragment purified genomic DNA by digestion with restriction enzymes or by sonication. The restriction enzyme cocktail should include an enzyme (e.g. BglII) that cleaves downstream from the upstream viral LTR to counter-select for LM-PCR amplification of internal vDNA sequences. Green asterisk and branched arrow in (C) denote that BglII should be applied after linker ligation. Red highlights viral sequence, while black highlights host cellular sequence. Implied DNA break points (not to scale) are marked by "X." HIV-1 contains numerous MseI and BglII sites; only those relevant to the protocol are shown. The brackets above the maps denote the U5-cellular DNA regions preferentially amplified by LM-PCR. (D) Purify fragmented DNA (then end-repair and A-tail in the case of sonication) and ligate to (E) compatible asymmetric linker molecules (colored blue). Magenta circles in (D) indicate the integration site that will be amplified. Asterisks at the 3' ends of the linker short strands denote amino blocking modifications. (F) Conduct first round of semi-nested PCR using first round LTR primer (red) and linker primer (blue). In this PCR round, the linker primer encodes for DNA clustering and NGS primer binding sequences (grouped as a green appendage to the blue linker primer), while the LTR primer lacks such sequences. (G) Purify first round PCR product and conduct second round of semi-nested PCR. 
In this round of PCR, use the same linker primer as in the first round (blue + green appendage), together with the second round LTR primer (red) that carries DNA clustering and NGS primer binding sequences as well as a barcode for multiplexing (grouped as a green appendage to the red LTR primer). (H) Purify second round PCR product as the final integration site library (boxed in magenta, with integration site marked by magenta circle). Submit aliquot to sequencing facility for QC and NGS.
Figure 2: P Values for Comparison of Integration Sites Amplified Following DNA Fragmentation by Sonication or by Restriction Enzyme Digestion versus Respective MRCs. Numbers of integration sites within RefSeq genes and nearby CpG islands and TSSs, as well as regional gene density profiles, are listed in Table 4. P values ≥0.05 are highlighted in bold and italic text. (a) P values calculated by Fisher's exact test. (b) P values calculated by Wilcoxon rank sum test. (c) MRCenz: matched random control; a set of 50,000 unique integration sites was produced by randomly selecting positions in proximity to MseI/BglII restriction sites in hg19. (d) MRCrandom: matched random control containing 10,000 unique integration sites produced by randomly selecting positions in hg19 without normalization to restriction site proximity.
Figure 3: Sequence Logos Depicting HIV-1 Base Preferences from Representative Experiment Libraries. Integration sites from libraries prepared by (A) digestion with restriction enzymes or (B) sonication were aligned using WebLogo software. Each dilution in the titration series is depicted, from neat DNA at the top of the figure to the maximum dilution of 1:15,625 at the bottom. (C) Sequence logo for the MRC of 50,000 unique genomic sites. Error bars essentially represent the standard deviation in base incorporation at any particular position. More specifically, the total height of each error bar is equivalent to twice the small sample correction 66, which controls for underestimation of entropy present in relatively small datasets. The x-axis represents host cell genomic DNA nt positions relative to the site of integration at point zero.
Table 1: Oligonucleotide Sequences for Linker Construction and PCR Amplification. Linker-specific and second round LTR primers encode DNA clustering adapter sequences, which are color-coded as follows: black, bases complementary to the linker or to the HIV-1 LTR; red, unique index or barcode; green, sequencing primer binding sites; blue, adapter sequences for DNA clustering. Single-end (SE) sequencing reactions will utilize the sequencing primer that anneals to the second round LTR primer read1 (green) sequence, while paired-end (PE) reactions will use both (read1 and read2) sequencing primers. (a) Linker short strands contain 3' amino blocking modification.
Reagent | To Add per Reaction |
First Round LTR primer (15 µM) | 2.5 µl |
Linker-specific primer (15 µM) | 0.5 µl |
10x PCR buffer | 2.5 µl |
dNTPs (2.5 mM each) | 0.5 µl |
DNA polymerase mix | 0.5 µl |
Ligation reaction | 100 ng |
Nuclease-free water | up to 25 µl |
Table 2: Recipe for First Round PCR. The amount of each specified reagent to be added to each individual PCR tube is indicated.
Reagent | To Add per Reaction |
Second Round LTR primer (15 µM) | 2.5 µl |
Linker-specific primer (15 µM) | 0.5 µl |
10x PCR buffer | 2.5 µl |
dNTPs (2.5 mM each) | 0.5 µl |
DNA polymerase mix | 0.5 µl |
First round PCR | 100 ng |
Nuclease-free water | up to 25 µl |
Table 3: Second Round PCR Recipe. The amount of each reagent to be added to each PCR tube is indicated.
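Because the template is specified by mass (100 ng) rather than by volume, the water volume depends on the measured template concentration. The short Python sketch below works through that arithmetic using the fixed reagent volumes from Tables 2 and 3. The function name and the example concentration are hypothetical.

```python
# Fixed reagent volumes per 25 µl reaction, in µl (from Tables 2 and 3).
FIXED_UL = {
    "LTR primer (15 uM)": 2.5,
    "Linker-specific primer (15 uM)": 0.5,
    "10x PCR buffer": 2.5,
    "dNTPs (2.5 mM each)": 0.5,
    "DNA polymerase mix": 0.5,
}
TOTAL_UL = 25.0      # final reaction volume
TEMPLATE_NG = 100.0  # template mass per reaction


def reaction_volumes(template_ng_per_ul):
    """Return (template_ul, water_ul) for one reaction."""
    template_ul = TEMPLATE_NG / template_ng_per_ul
    water_ul = TOTAL_UL - sum(FIXED_UL.values()) - template_ul
    if water_ul < 0:
        raise ValueError("template too dilute to fit 100 ng in a 25 µl reaction")
    return template_ul, water_ul


# Hypothetical template concentration of 20 ng/µl.
template_ul, water_ul = reaction_volumes(20.0)
print(f"template: {template_ul:.1f} µl, water: {water_ul:.1f} µl")
```

The fixed reagents total 6.5 µl, so at most 18.5 µl remains for template. Templates below about 5.4 ng/µl therefore cannot supply 100 ng in a single 25 µl reaction.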
Library | #Unique Sites | %RefSeq (a) | %CpG +/- 2.5 kb (b) | %TSS +/- 2.5 kb (c) | Avg. Gene Density +/- 500 kb (d) |
Sonication, neat | 3,169 | 71.2 | 5.1 | 3.7 | 15.8 |
Sonication, 1:5 | 366 | 75.1 | 2.7 | 3 | 16.3 |
Sonication, 1:25 | 254 | 74 | 7.1 | 5.1 | 16.7 |
Sonication, 1:125 | 430 | 69.8 | 6.9 | 6 | 14.6 |
Sonication, 1:625 | 314 | 65.6 | 5.6 | 6.7 | 13.5 |
Sonication, 1:3,125 | 116 | 73.6 | 3.5 | 2.5 | 13.1 |
Sonication, 1:15,625 | 72 | 62.5 | 0 | 1.4 | 14.7 |
Digest, neat | 7,428 | 69.8 | 3.6 | 2.9 | 15.2 |
Digest, 1:5 | 1,460 | 71.4 | 4.4 | 3.4 | 14.9 |
Digest, 1:25 | 394 | 68.8 | 4.3 | 3.3 | 15.8 |
Digest, 1:125 | 172 | 71 | 0 | 3 | 14 |
Digest, 1:625 | 134 | 73.9 | 3.7 | 3.7 | 14.1 |
Digest, 1:3,125 | 100 | 83.1 | 6.4 | 5.2 | 19.1 |
Digest, 1:15,625 | 73 | 74 | 4.1 | 1.4 | 9.7 |
MRCenz (e) | 50,000 | 44.7 | 4.2 | 4 | 8.7 |
MRCrandom (f) | 10,000 | 41.3 | 5.3 | 4.2 | 8.6 |
Table 4: Genomic Distribution of Integration Sites from Representative Titration Series. The percentage of total integration sites that fall (a) within RefSeq genes, (b) within 2.5 kb of CpG islands, and (c) within 2.5 kb of TSSs. (d) The gene density within 1 Mb surrounding the average integration site. (e) MRCenz: matched random control; a set of 50,000 unique integration sites was produced by randomly selecting positions in proximity to MseI/BglII restriction sites in hg19. (f) MRCrandom: matched random control containing 10,000 unique integration sites produced by randomly selecting positions in hg19 without normalization to fixed positions.
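As a quick check of the sensitivity figures, the recovery at the maximum dilution can be expressed as a percentage of the neat library using the unique-site counts in Table 4. The Python sketch below does only that arithmetic, and the names are our own.

```python
# Unique integration site counts taken from Table 4.
UNIQUE_SITES = {
    "Sonication": {"neat": 3169, "1:15,625": 72},
    "Digest": {"neat": 7428, "1:15,625": 73},
}


def percent_of_neat(counts):
    """Sites recovered at the maximum dilution as a percentage of the neat library."""
    return 100 * counts["1:15,625"] / counts["neat"]


recovery = {method: percent_of_neat(c) for method, c in UNIQUE_SITES.items()}
for method, pct in recovery.items():
    print(f"{method}: {pct:.2f}% of neat")
```

The two values come out at roughly 2.3% for sonication and 1.0% for digestion, in line with the approximate 1-2% recovery described in the representative results.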
A protocol for the analysis of retroviral integration sites, from the initial virus infection step through mapping of genomic distribution patterns, is described. This protocol is applicable to any retrovirus and any infectable cell type. Furthermore, the assay pipeline is quite sensitive, with the potential to recover a satisfactory number of unique integration sites from serial dilutions of genomic DNA equivalent to that of an infection initiated with an MOI of 6.4 × 10⁻⁵. This sensitivity makes the protocol especially useful when applied to samples from infected patients that may contain a low viral load, where only a small fraction of cells will harbor an integrated provirus. Consistent with prior methodology papers in this field 36,38,41-43, multiple steps in the bioinformatics portion of this protocol will benefit from the development of customized scripts for processing large files of sequence data. While BLAT 58 is the mapping utility described in this protocol, users may find Bowtie 67 (http://bowtie-bio.sourceforge.net/index.shtml) to be a suitable alternative.
An alternative bioinformatics pipeline was recently reported for determination of Moloney murine leukemia virus (MoMLV) integration sites 19. That pipeline is useful in that it was developed into standalone software that is publicly available, and is quite powerful in that it was originally used to map hundreds of thousands of unique MoMLV integration sites. However, the available software was originally designed to specifically re-analyze the reported MoMLV dataset, and so reprogramming would be necessary to customize the pipeline to alternate experimental designs (the functionality of the tool was recently expanded to include adeno-associated virus and Tol2 and Ac/Ds transposon vectors 68). Furthermore, that protocol described the generation of the preliminary integration site .bed file, but did not lay out specific steps necessary to map sites to pertinent genomic annotations. Readers may find the "Vector Integration Site Analysis" server 69, which was released during the review of the current manuscript, useful to analyze the NGS sequences generated using the protocol described here.
Certain points should be emphasized when using any protocol to analyze retroviral integration site datasets. When preparing multiple libraries in tandem, a significant potential exists for sample cross-contamination. Even a very small level of sample crosstalk can obscure results to the level of rendering an NGS run unusable. Therefore, all wet-bench work should be completed in a sterilized, dedicated laminar flow hood or PCR workstation. A set of pipettes and reagents such as nuclease-free water should be dedicated solely to integration site amplification. The use of unique linkers for each library preparation can limit the potential for cross-amplification and also allow for identification of crossover reads within each library in the raw FASTA files.
It is important to consider the pros and cons of using sonication versus restriction endonuclease digestion to fragment genomic DNA. On the one hand, sonication provides a relatively random distribution of shear points, but the subsequently required DNA repair and A-tailing steps consistently reduce the yield of linker ligation products as compared to ligations performed with restriction enzyme-generated sticky ends. On the other hand, restriction enzyme digestion provides a less-dispersed population of shear points, which will invariably introduce some bias in the recovered data. Utilizing a restriction endonuclease to discard upstream LTR sequences will in both cases (Figure 1) result in the loss of a small fraction of integration sites that lie upstream of that site in the genome. Any data bias that may result can be addressed by omitting the enzymatic digestion from the protocol during library preparation and filtering out the multitude of resulting upstream LTR sequences from the sequencing data.
Though the current protocol is quite sensitive and capable of generating millions of unique integration sites 21,40, only about one-third of all available integrations might be expected to be amplified in a given experiment even with the best of library preparations (ref. 70 and unpublished observations). This can cause complications when analyzing samples from low MOI infections or patients that harbor low viral load. This limitation can be overcome in part by repeatedly sequencing the same library preparation and/or sequencing multiple libraries derived from the same DNA sample in parallel. Future increases in assay sensitivity will accordingly be very beneficial to furthering the translational applications of retroviral integration site sequencing.
The authors have nothing to disclose.
We are grateful to our colleagues Stephen Hughes and Henry Levin for advice that was critical to establish the NGS protocol for retroviral integration site sequencing in the Engelman lab. This work was supported by US National Institutes of Health grants AI039394 and AI052014 (to A.N.E.) and AI060354 (Harvard University Center for AIDS Research).
Name | Company | Catalog Number | Comments |
DMEM | Gibco | 11965-084 | Standard cell culture medium, compatible with HEK293T cells |
Fetal Bovine Serum | Thermo Scientific | SH 30088.03 | Different lots of serum may need to be pre-screened for optimal viral production |
Penicillin/Streptomycin | Corning | 30-002-Cl | Antibiotics to be added to DMEM |
Phosphate-buffered saline | Mediatech | 21-040-CV | Used to wash cells |
Trypsin EDTA | Corning | 25-053-CI | Used to detach adherent cells from tissue culture plates |
PolyJet | SignaGen Laboratories | SL100688 | DNA transfection reagent |
0.45 µm Filters | Thermo Scientific | 09-740-35B | Used to filter virus particle-containing cell culture media |
Turbo DNase | Ambion | AM2239 | Used to degrade carryover plasmid DNA from virus stocks |
HIV-1 p24 Antigen Capture Assay | ABL Inc. | 5447 | Used to quantify yield of virus production |
DNeasy Blood & Tissue Kit | Qiagen | 69506 | Used to purify genomic DNA from cells |
Sonicator | Covaris | S2 | With this model of sonicator perform two rounds of duty cycle, 5%; intensity, 3; cycles per burst, 200; time, 80 sec |
Nuclease-Free Water | GeneMate | G-3250-125 | Commercially-available water is recommended to reduce the possibility of sample cross-contamination |
QIAQuick PCR Purification Kit | Qiagen | 28106 | Used to purify DNA during library construction |
End-It DNA End-Repair Kit | Epicentre | ER81050 | Used to repair DNA ends of sonicated DNA samples |
Klenow Fragment (3'-5' exo–) | New England Biolabs (NEB) | M0212S | Used with dATP to A-tail repaired DNA fragments |
dATP | Thermo Scientific | R0141 | Deoxyadenosine triphosphate |
MseI | NEB | R0525L | Restriction endonuclease for genomic DNA cleavage |
BglII | NEB | R0144L | Restriction endonuclease to suppress amplification of upstream HIV-1 U5 sequence |
T4 DNA Ligase | NEB | M0202L/6218 | Enzyme for covalent joining of compatible DNA ends |
DNA Oligonucleotides | Integrated DNA Technologies | custom | Have the company purify the oligos. HPLC purification suffices for DNAs <30 nucleotides; PAGE purify longer DNAs |
Advantage 2 Polymerase Mix | Clontech | 639202 | Commercial mix containing DNA polymerase for PCR |
dNTPs (100 mM solutions) | Thermo Scientific | R0181 | Dilute the four chemicals on ice with sterile water to reach the intermediate working concentrations of 2.5 mM each dNTP |
NanoDrop | Thermo Scientific | NanoDrop 2000 | Spectrophotometer for determination of DNA concentration |
Qubit Fluorometer | Life Technologies | Qubit® 3.0 | Fluorometer used to confirm integration site library DNA concentration |
2200 TapeStation System | Agilent | G2964AA | Tape-based assay to confirm integration site library DNA size distribution |
MiSeq | Illumina | SY-410-1003 | Used for NGS |