ATAC-seq is a DNA sequencing method that uses the hyperactive mutant transposase, Tn5, to map changes in chromatin accessibility mediated by transcription factors. ATAC-seq enables the discovery of the molecular mechanisms underlying phenotypic alterations in cancer cells. This protocol outlines optimization procedures for ATAC-seq in epithelial cell types, including cancer cells.
The assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq) probes deoxyribonucleic acid (DNA) accessibility using the hyperactive Tn5 transposase. Tn5 cuts and ligates adapters for high-throughput sequencing within accessible chromatin regions. In eukaryotic cells, genomic DNA is packaged into chromatin, a complex of DNA, histones, and other proteins, which acts as a physical barrier to the transcriptional machinery. In response to extrinsic signals, transcription factors recruit chromatin remodeling complexes to give the transcriptional machinery access for gene activation. Therefore, identifying open chromatin regions is useful when monitoring enhancer and gene promoter activities during biological events such as cancer progression. Since this protocol is easy to use and has a low cell input requirement, ATAC-seq has been widely adopted to define open chromatin regions in various cell types, including cancer cells. For successful data acquisition, several parameters need to be considered when preparing ATAC-seq libraries. Among them, the choice of cell lysis buffer, the titration of the Tn5 enzyme, and the starting number of cells are crucial for ATAC-seq library preparation in cancer cells. Optimization is essential for generating high-quality data. Here, we provide a detailed description of the ATAC-seq optimization methods for epithelial cell types.
Chromatin accessibility is a key requirement for the regulation of gene expression on a genome-wide scale1. Changes in chromatin accessibility are frequently associated with several disease states, including cancer2,3,4. Over the years, numerous techniques have been developed to enable researchers to probe the chromatin landscape by mapping regions of chromatin accessibility. These include DNase-seq (DNase I hypersensitive sites sequencing)5, FAIRE-seq (formaldehyde-assisted isolation of regulatory elements)6, MAPit (methyltransferase accessibility protocol for individual templates)7, and the focus of this paper, ATAC-seq (assay for transposase-accessible chromatin)8. DNase-seq maps accessible regions by employing a key feature of DNase, namely the preferential digestion of naked DNA free from histones and other proteins such as transcription factors5. FAIRE-seq, similar to ChIP-seq, utilizes formaldehyde crosslinking and sonication, except no immunoprecipitation is involved, and the nucleosome-free regions are isolated by phenol-chloroform extraction6. The MAPit method uses a GC methyltransferase to probe chromatin structure at single-molecule resolution7. ATAC-seq relies on the hyperactive Tn5 transposase8. The Tn5 transposase preferentially binds to open chromatin regions and inserts sequencing adapters into accessible regions. Tn5 operates through a DNA-mediated "cut and paste" mechanism, whereby the transposase preloaded with adapters binds to open chromatin sites, cuts DNA, and ligates the adapters8. Tn5-bound regions are recovered by PCR amplification using primers that anneal to these adapters. FAIRE-seq and DNase-seq require a large amount of starting material (~100,000 cells to 225,000 cells) and a separate library preparation step before sequencing9. On the other hand, the ATAC-seq protocol is relatively simple and requires a small number of cells (<50,000 cells)10.
Unlike the FAIRE-seq and DNase-seq techniques, the sequencing library preparation of ATAC-seq is relatively easy, as the isolated DNA sample is already tagged with the sequencing adapters by Tn5. Therefore, only the PCR amplification step with appropriate primers is needed to complete the library preparation, and prior processing steps such as end-repair and adapter ligation need not be performed, thus saving time11. In addition, ATAC-seq avoids the need for bisulfite conversion, cloning, and amplification with region-specific primers required for MAPit7. Due to these advantages, ATAC-seq has become a hugely popular method for defining open chromatin regions. Although the ATAC-seq method is simple, multiple steps require optimization to obtain high-quality and reproducible data. This manuscript discusses optimization procedures for standard ATAC-seq library preparation, especially highlighting three parameters: (1) lysis buffer composition, (2) Tn5 transposase concentration, and (3) cell number. In addition, this paper provides example data from the optimization conditions using both cancerous and non-cancerous adherent epithelial cells.
1. Preparations before beginning the experiment
2. Cell harvest
3. Cell lysis
4. Tn5 tagmentation
5. DNA purification
NOTE: Purification is required before amplification. DNA purification is done using the MinElute PCR purification kit (Table of Materials).
6. PCR amplification
NOTE: PCR amplification of transposed (tagmented) DNAs is necessary for sequencing. Nextera kit adapters (Table of Materials) were used in this example. The primers used in this study are listed in Table 1.
7. Beads purification
NOTE: Here, AMPure XP (Table of Materials) beads were used.
8. DNA concentration and quality check
9. Sequencing
10. Data analysis
To obtain successful and high-quality ATAC-seq data, it is important to optimize the experimental conditions. ATAC-seq library preparation can be separated into five major steps (Figure 1), namely cell lysis, tagmentation (fragmentation and adapter insertion by Tn5), genomic DNA purification, PCR amplification, and data analysis. As an initial process, the cell lysis (nuclear isolation) buffer must first be optimized for each cell type. Either the hypotonic buffer described in the original ATAC-seq paper8 or the CSK buffer12,13 was used to lyse breast cancer cells. Trypan blue staining can be used to confirm nuclei isolation18. For ATAC-seq library preparation in human cancer cell lines such as MDA-MB-231, T47D, MCF7, and A375 cells, CSK buffer has been used by multiple groups12,13,19,20. CSK buffer was also used for ATAC-seq library preparation in other cell types such as embryonic stem cells21,22 and Drosophila S2 cells23. For the lysis buffer optimization, trypan blue staining is useful to assess the cell lysis efficiency. When NMuMG, non-cancerous mouse mammary gland epithelial cells, were treated with the hypotonic buffer8 (see step 1) and CSK buffer, higher cell lysis efficiency was observed in CSK-treated cells (Figure 2). For frozen tissues, Omni-ATAC has been shown to improve ATAC-seq signals24. In Omni-ATAC, a buffer containing 0.1% NP40, 0.1% Tween-20, and 0.01% Digitonin is used for cell lysis. Although the Digitonin-based buffer is known to decrease the fraction of mitochondrial DNA, the signal-to-noise ratios and TSS read counts from ATAC-seq with CSK buffer were shown to be better than those from Omni-ATAC in breast cancer cells12. Therefore, the rest of this manuscript is mainly focused on the results from ATAC-seq data with CSK buffer.
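As a minimal sketch of how the trypan blue readout could be expressed as a number, the snippet below computes the stained fraction from manual counts. It assumes that trypan-blue-positive objects are permeabilized (lysed) nuclei and that unstained objects are intact cells. The function name and the counts are hypothetical and are not data from this study.

```python
def lysis_efficiency(stained: int, unstained: int) -> float:
    """Return the percentage of counted objects that took up trypan blue."""
    total = stained + unstained
    if total == 0:
        raise ValueError("no cells counted")
    return 100.0 * stained / total


# Hypothetical counts per buffer as (stained, unstained); illustrative only.
counts = {"hypotonic": (120, 80), "CSK": (190, 10)}
for buffer_name, (stained, unstained) in counts.items():
    print(f"{buffer_name}: {lysis_efficiency(stained, unstained):.1f}% lysed")
```

Counting several fields per condition and pooling the counts before calling the function would reduce sampling noise in such a comparison.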
Following the determination of the best cell lysis buffer (nuclear isolation) conditions, it is important to optimize the ratio between input cell number and Tn5 transposase concentration12,25. Typically, the Tn5 concentration can be titrated by varying the volume of Tn5 (1.25 µL, 2.5 µL, or 5 µL) in a total volume of 25 µL reaction mixture. Alternatively, input cell numbers can be changed (e.g., from 10,000 to 100,000) while keeping the Tn5 volume fixed at 2.5 µL to optimize the Tn5 tagmentation reaction. To evaluate the quality of ATAC-seq libraries and the efficiency of Tn5-induced chromatin digestion, the library PCR products can be analyzed by agarose gel electrophoresis. Figure 3 shows examples of successful ATAC-seq libraries. Typically, PCR amplification of 8 or 9 cycles (total PCR cycles including initial PCR) results in 3-6 ng/µL ATAC-seq library concentration. As Tn5 preferentially "attacks" nucleosome-free regions at open chromatin (Figure 1), the nucleosome ladders should be obvious on the gel image (Figure 3). Ethidium bromide or SYBR Gold can be used to detect amplified DNAs on the agarose gel.
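The two titration designs described above can be laid out as a small grid. The sketch below is illustrative only: it lists the conditions and computes the nominal amount of Tn5 per 10,000 cells. The helper name is hypothetical, and the fixed cell number (25,000) and the cell series (25,000 to 100,000) are taken from the example data discussed for Figure 5 and Figure 6.

```python
REACTION_VOLUME_UL = 25.0  # total tagmentation reaction volume from the text


def tn5_per_10k_cells(tn5_ul: float, cells: int) -> float:
    """Microliters of Tn5 per 10,000 input cells (nominal ratio)."""
    if cells <= 0:
        raise ValueError("cell number must be positive")
    return tn5_ul / (cells / 10_000)


# Design 1: vary the Tn5 volume at a fixed cell number.
tn5_series = [(v, 25_000) for v in (1.25, 2.5, 5.0)]
# Design 2: vary the cell number at a fixed 2.5 uL of Tn5.
cell_series = [(2.5, n) for n in (25_000, 50_000, 75_000, 100_000)]

for tn5_ul, cells in tn5_series + cell_series:
    ratio = tn5_per_10k_cells(tn5_ul, cells)
    fraction = 100.0 * tn5_ul / REACTION_VOLUME_UL
    print(f"{tn5_ul:>5} uL Tn5 ({fraction:.0f}% of reaction), "
          f"{cells:>7,} cells -> {ratio:.3f} uL per 10,000 cells")
```

Some conditions share the same nominal ratio (for example, 1.25 µL with 25,000 cells and 2.5 µL with 50,000 cells). Whether such conditions behave equivalently is an empirical question that this arithmetic does not answer.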
Next, qPCR analysis using ATAC-seq libraries can be used to briefly check the quality of the libraries and the fragment enrichment at open chromatin regions such as promoters of active genes (Figure 4). Primers can be designed to generate <80 bp amplicons using promoters of known active genes as positive controls and known "closed" (inactive) chromatin regions as negative controls (Figure 4A). Among the tested conditions, higher enrichment was observed in the T47D ATAC-seq libraries with CSK buffer (Figure 4B). As expected, we found more enrichment using primers K and L at the Estrogen Receptor alpha (ESR1) gene promoter region relative to a closed region upstream of ESR1 with primer C (Figure 4B)26. Figure 5 shows examples of ATAC-seq data with different Tn5 concentrations (25,000 cells were used). In this case, higher background noise was detected in the lowest (1.25 µL of Tn5) Tn5 condition (Figure 5, vertical bars). ATAC-seq data from 2.5 µL or 5 µL of Tn5 showed similar levels of ATAC-seq signals at open chromatin regions with lower background signals. Considering the potential over-digestion by Tn5 at the higher concentration, it can be concluded that 2.5 µL of Tn5 is suitable for this cell type.
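The text does not state how primer amplicon enrichment was calculated. One common approach, assumed here, is a delta-Ct comparison of an open-region amplicon against a closed-region control using equal library input per reaction. The following sketch uses hypothetical Ct values and a hypothetical function name, and it assumes identical amplification efficiency for all primer pairs.

```python
def fold_enrichment(ct_open: float, ct_closed: float, efficiency: float = 2.0) -> float:
    """Relative signal at an open region versus a closed control region.

    Assumes equal template input per reaction and the same amplification
    efficiency for both primer pairs (2.0 means perfect doubling per cycle).
    """
    return efficiency ** (ct_closed - ct_open)


# Hypothetical Ct values (illustrative only, not data from this study).
ct_values = {
    "primer C (closed)": 27.0,
    "primer K (open)": 23.0,
    "primer L (open)": 22.5,
}
closed = ct_values["primer C (closed)"]
for name, ct in ct_values.items():
    print(f"{name}: {fold_enrichment(ct, closed):.1f}-fold relative to primer C")
```

If primer efficiencies have been measured from standard curves, the `efficiency` argument can be lowered accordingly. With differing efficiencies between primer pairs, this simple form is no longer appropriate.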
We also performed ATAC-seq using different cell numbers in NMuMG cells, which are frequently used to study the epithelial-mesenchymal transition (Figure 6). To optimize the starting cell number, four different cell numbers (25,000, 50,000, 75,000 and 100,000) were used for the ATAC-seq library optimization. The cells were lysed with CSK buffer, and nuclei were incubated with 2.5 µL of Tn5 transposase in 25 µL of the reaction mixture. To explore the efficacy of Tn5 digestion and nucleosomal ladders, the size distribution of inserted DNAs was bioinformatically analyzed (Figure 6A). When lower cell numbers were used, nucleosome-free DNA fragments (<100 bp) were more enriched in the sequencing data. On the other hand, the fraction of mono-nucleosomal DNA fragments (175-225 bp) was lower in the low cell input samples (25,000 cells and 50,000 cells) than in the higher cell input conditions (75,000 cells and 100,000 cells). Although differential DNA fragment patterns were observed between the samples, input cell number appears to have minimal impact on the enrichment of ATAC-seq signals at promoters and other open chromatin regions (Figure 6B). To further investigate the impact of input cell numbers on downstream analysis, ATAC-seq peaks were defined. The ATAC-seq peaks were determined by the PeaKDEck peak calling program27 using the following parameters: -sig 0.0001 -bin 300 -back 3,000 -npBack 2,500,000. From ~4.9 million uniquely mapped reads, 38,878, 43,832, 41,509, and 45,530 peaks were observed in ATAC-seq data from 25,000, 50,000, 75,000, and 100,000 cell inputs, respectively. The 100,000-cell input condition showed the highest number of peaks. Since the Transcription Start Site (TSS) enrichment score has been used to assess the quality of ATAC-seq data, the TSS score of each ATAC-seq condition was calculated using the GitHub program: Jiananlin/TSS_enrichment_score_calculation28.
The TSS scores for the 25,000, 50,000, 75,000, and 100,000 cell input conditions were 9.03, 9.02, 8.82, and 8.28, respectively (a TSS enrichment score greater than 7 is considered ideal28). This indicated a gradual decrease in the TSS enrichment score with increasing cell numbers. While the largest peak number was observed in the 100,000 cell input condition, the 25,000 cell input condition gave a better TSS score. Peak annotation analysis by Homer29 indicated that the majority of the increased peaks are categorized as promoter distal peaks (Figure 6C). Based on these results, it can be concluded that higher cell number inputs can produce a higher number of peaks and more frequently detect non-promoter peaks compared to lower cell number inputs under these conditions.
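The trade-off between peak number and TSS enrichment score can be made explicit by tabulating the reported values side by side. The short sketch below uses only the numbers stated above. The variable names are illustrative, and the threshold of 7 is the cutoff quoted in the text.

```python
TSS_IDEAL_THRESHOLD = 7.0  # cutoff quoted in the text

# Peak counts and TSS scores reported in the text for NMuMG cells.
results = {
    25_000: {"peaks": 38_878, "tss": 9.03},
    50_000: {"peaks": 43_832, "tss": 9.02},
    75_000: {"peaks": 41_509, "tss": 8.82},
    100_000: {"peaks": 45_530, "tss": 8.28},
}

for cells, r in results.items():
    status = "pass" if r["tss"] > TSS_IDEAL_THRESHOLD else "below threshold"
    print(f"{cells:>7,} cells: {r['peaks']:,} peaks, "
          f"TSS score {r['tss']:.2f} ({status})")

# Identify the input with the most peaks and the input with the best TSS score.
most_peaks = max(results, key=lambda n: results[n]["peaks"])
best_tss = max(results, key=lambda n: results[n]["tss"])
print(f"Most peaks: {most_peaks:,} cells; highest TSS score: {best_tss:,} cells")
```

All four conditions exceed the cutoff, so the choice of input cell number here depends on whether peak number or TSS enrichment is prioritized.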
Figure 1: Outline of ATAC-seq method. (A) ATAC-seq measures chromatin accessibility using a hyperactive Tn5 transposase preloaded with forward and reverse adapters. The various steps in the process include (B) cell lysis (including optimization of lysis buffer composition, titration of Tn5, and the number of cells used); (C) transposition (binding of Tn5 to open/accessible chromatin); (D) tagmentation of chromatin; (E) DNA fragment purification and amplification of libraries, and (F) sequencing of libraries and data analysis. Fragment length distribution analysis of ATAC-seq libraries typically shows an initial peak around ~50 bp (nucleosome-free regions) and another peak at ~200 bp (mono-nucleosome). Therefore, without computational or experimental size filtration, ATAC-seq signals contain both nucleosome-free and nucleosomal regions in open chromatin. Created with BioRender.com.
Figure 2: Comparison of cell lysis efficiency using trypan blue. (A) NMuMG cells were treated with the hypotonic buffer8 and stained with trypan blue. (B) NMuMG cells were treated with CSK buffer and stained with trypan blue. The scale bars indicate 50 µm.
Figure 3: Amplified DNA fragment analysis of ATAC-seq libraries. ATAC-seq libraries were prepared from MDA-MB-231 breast cancer cells. (A) After PCR amplification, PCR products were analyzed on a 0.5x TBE, 1.5% agarose gel before and after beads purification. Primers are mostly removed by the initial beads purification. DNA band patterns from (B) 1x TAE, 1% agarose gel electrophoresis and (C) an automated electrophoresis are also indicated. SYBR Gold was used to visualize the DNA fragments shown in A and B.
Figure 4: Enrichment analysis of ATAC-seq libraries. (A) Genome browser track showing the ESR1 locus. The genome coverage of the ATAC-seq data in T47D cells20 is shown. Primers from a previous publication were used26, and their target regions are highlighted in yellow. (B) Bar graph depicting primer amplicon enrichment in ATAC-seq libraries. ATAC-seq libraries were prepared by the hypotonic buffer or CSK buffer. Different concentrations of Tn5 were also used to generate ATAC-seq libraries. 1 ng of each library was used to perform qPCR.
Figure 5: Genome browser visualization for ATAC-seq optimization. Different amounts of Tn5 transposase were added during library preparation. The 1.25 µL of Tn5 condition shows higher background signals (highlighted in yellow), while the other two conditions look similar.
Figure 6: DNA fragment size distribution of ATAC-seq libraries. (A) ATAC-seq libraries were prepared by using the indicated cell numbers. The ATAC-seq data from the 25,000 cell input condition showed the highest enrichment of nucleosome-free fragments, while the 100,000 cell input condition presented the highest mono-nucleosomal DNAs. (B) Genome browser tracks showing ATAC-seq data from different cell numbers. (C) Homer peak annotation analysis. Each peak was classified as a promoter-proximal or distal peak by Homer29.
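The fragment-size comparison in Figure 6A rests on two size classes given in the text: nucleosome-free (<100 bp) and mono-nucleosomal (175-225 bp). The sketch below shows one way such fractions could be computed from a list of insert sizes. It is not the analysis code used for the figure; the function names and example lengths are hypothetical, and in practice the lengths would be taken from paired-end alignments, which is outside this sketch.

```python
from collections import Counter


def classify_fragment(length_bp: int) -> str:
    """Assign a fragment length to the size classes used in the text."""
    if length_bp < 100:
        return "nucleosome-free"
    if 175 <= length_bp <= 225:
        return "mono-nucleosomal"
    return "other"


def class_fractions(lengths):
    """Return the fraction of fragments falling in each size class."""
    counts = Counter(classify_fragment(n) for n in lengths)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no fragments provided")
    return {name: counts[name] / total
            for name in ("nucleosome-free", "mono-nucleosomal", "other")}


# Hypothetical insert sizes in bp (illustrative only, not data from this study).
example_lengths = [45, 60, 72, 98, 130, 180, 195, 210, 240, 380]
for name, fraction in class_fractions(example_lengths).items():
    print(f"{name}: {fraction:.0%}")
```

Comparing these fractions across cell input conditions would give a compact numerical counterpart to the distribution plots.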
Name | Sequence |
ESR1 C Forward | TGGTGACTCATATTTGAACAAGCC |
ESR1 C Reverse | CTCCTCCGTTGAATGTGTCTCC |
ESR1 K Forward | TGTGGCTGGCTGCGTATG |
ESR1 K Reverse | TGTCTCTCTTTCTGTTTGATTCCC |
ESR1 L Forward | TGTGCCTGGAGTGATGTTTAAG |
ESR1 L Reverse | CATTACAAAGGTGCTGGAGGAC |
Table 1: List of primers used in this study.
Steps | Temperature | Time | Cycle |
Gap filling | 72 °C | 5 min | 1 |
Initial Denaturation | 98 °C | 30 s | 1 |
Denaturation | 98 °C | 10 s | 5 |
Annealing | 63 °C | 30 s | |
Extension | 72 °C | 1 min | |
Hold | 4 °C | forever |
Table 2: PCR amplification program part 1.
Steps | Temperature | Time | Cycle |
Initial Denaturation | 98 °C | 30 s | 1 |
Denaturation | 98 °C | 10 s | 20 |
Annealing | 63 °C | 30 s | |
Extension | 72 °C | 1 min | |
Plate read |
Table 3: qPCR settings.
Steps | Temperature | Time | Cycle |
Initial Denaturation | 98 °C | 30 s | 1 |
Denaturation | 98 °C | 10 s | Total cycles calculated at step 6.4 |
Annealing | 63 °C | 30 s | |
Extension | 72 °C | 1 min | |
Hold | 4 °C | forever |
Table 4: PCR amplification program part 2.
ATAC-seq has been widely used for mapping open and active chromatin regions. Cancer cell progression is frequently driven by genetic alterations and epigenetic reprogramming, resulting in altered chromatin accessibility and gene expression. An example of this reprogramming is seen during the epithelial-to-mesenchymal transition (EMT) and its reverse process, mesenchymal-to-epithelial transition (MET), which are known to be key cellular reprogramming processes during tumor metastasis30. Another example is the acquisition of drug resistance against hormone therapies, which is often observed in luminal breast tumors31. ATAC-seq is useful to monitor such cell phenotypic alterations at the chromatin level and can be used to predict a cell of origin and cancer subtypes4.
To precisely measure chromatin remodeling by ATAC-seq, the establishment of a reproducible and consistent protocol is necessary. In this manuscript, a standard strategy for ATAC-seq library optimization is described. The ATAC-seq data from MDA-MB-231 and NMuMG cell lines are used as examples of optimization processes. NMuMG cells have been used to study EMT, and MDA-MB-231 cells have been used to study its reverse process, MET. These cellular models have been previously used to study epigenetic alterations during EMT and MET13,32. An appropriate cell lysis buffer and Tn5 concentration are the key components of successful ATAC-seq library preparation. In addition, high cell viability (>90%), appropriate concentrations of detergent in the lysis buffer, and the preparation of a single-cell suspension are also very important. When Tn5 digestion efficiency differs significantly across samples, it is challenging to correct for the resulting variation in the data because there are no internal or external controls to quantitatively assess digestion efficiency. Cellular characteristics or properties might also differ between the cells from the control group and the cells from the test group. Therefore, careful consideration of the experimental conditions, including the choice of lysis buffer, is necessary to obtain high-quality ATAC-seq data from both experimental groups (Figure 5 and Figure 6).
Besides open chromatin profiling, ATAC-seq has been used to identify transcription factor footprints and nucleosome positioning8,33. It is important to note that such information is mostly derived from open chromatin regions (Figure 1F). Therefore, results would be biased toward open chromatin regions. Nevertheless, ATAC-seq is a powerful tool for studying chromatin architecture. This revolutionary tool has more recently been adopted for single-cell genomics and continues to contribute to improving our understanding of gene regulation and chromatin biology.
DATA AVAILABILITY:
The data presented in this protocol are available at Gene Expression Omnibus under Accession Number GSE202791. The ATAC-seq data from GSE72141 and GSE99479 were also used in the analysis.
The authors have nothing to disclose.
We gratefully acknowledge the UND Genomics Core facility for outstanding technical assistance.
This work was funded by the National Institutes of Health [P20GM104360 to M.T., P20 GM104360 to A.D.] and start-up funds provided by the University of North Dakota School of Medicine and Health Sciences, Department of Biomedical Sciences [to M.T.].
Name | Company | Catalog Number | Comments |
1.5 mL microcentrifuge tubes | USA Scientific | 1615-5500 | Natural |
10 µL XL TipOne tips | USA Scientific | 1120-3810 | Filtered and low-retention |
100 µL XL TipOne RPT tips | USA Scientific | 1182-1830 | Filtered and low-retention |
100 µL XL TipOne tips | USA Scientific | 1120-1840 | Filtered and low-retention. Beveled Grade |
15 mL Conical Centrifuge Tubes | Corning | 352096 | |
20 µL TipOne RPT tips | USA Scientific | 1183-1810 | Filtered and low-retention |
200 µL TipOne RPT tips | USA Scientific | 1180-8810 | Filtered and low-retention |
50 mL Centrifuge Tubes | Fisherbrand | 06-443-19 | |
Agarose | ThermoFisher Scientific | YBP136010 | Genetic Analysis Grade |
All the cell lines used in this study are obtained from ATCC | ATCC | ||
Allegra X-30R Centrifuge | Beckman Coulter | 364658 | SX2415 |
AMPure XP beads | Beckman Coulter | A63881 | Bead purification kit |
CellDrop Cell Counter | DeNovix | CellDrop FL | Cell counter |
EDTA | MilliporeSigma | EDS | BioUltra, anhydrous, ≥99% (titration) |
EGTA | MilliporeSigma | E3889 | |
Ethanol 100% | ThermoFisher Scientific | AC615100020 | Anhydrous; Fisher Scientific – Decon Labs Sterilization Products |
Fetal Bovine Serum – TET Tested | R&D Systems | S10350 | Triple 0.1 µm filtered |
Gibco DMEM 1x | ThermoFisher Scientific | 11965092 | [+] 4.5 g/L D-glucose; [+] L-Glutamine; [-] Sodium pyruvate |
Gibco PBS 1x | ThermoFisher Scientific | 10010023 | pH 7.4 |
Gibco Trypsin-EDTA 1x | ThermoFisher Scientific | 25200056 | (0.25%), phenol red |
Glycerol | IBI Scientific | 56-81-5 | |
Glycine | MilliporeSigma | G8898 | |
HCl | MilliporeSigma | H1758 | |
HEPES | MilliporeSigma | H3375 | |
Invitrogen Qubit Fluorometer | ThermoFisher Scientific | Q32857 | |
MgCl2 | MilliporeSigma | M3634 | |
MinElute PCR Purification kit | Qiagen | 28004 | DNA purification kit |
NaCl | IBI Scientific | 7647-14-5 | |
NaOH | MilliporeSigma | S8045 | BioXtra, ≥98% (acidimetric), pellets (anhydrous) |
NEBNext High-Fidelity 2x PCR Master Mix | New England Biolabs | M0541 | |
Nextera DNA Sample Preparation Kit | Illumina | FC-121-1030 | 2x TD and Tn5 Transposase |
NP-40 (IGEPAL CA-630) | MilliporeSigma | I8896 | for molecular biology |
PCR Detection System | BioRad | 1855484 | CFX384 Real-Time System. C1000 Touch Thermal Cycler |
PIPES | MilliporeSigma | P1851 | BioPerformance Certified, suitable for cell culture |
Qubit dsDNA HS Assay kit | ThermoFisher Scientific | Q32854 | Invitrogen; Nucleic acid quantitation kit |
Qubit Assay Tubes | ThermoFisher Scientific | Q32856 | Invitrogen |
SDS | MilliporeSigma | L3771 | |
Sodium Acetate | Homemade | – | pH 5.2 |
Sucrose | IBI Scientific | 57-50-1 | |
SYBR Gold | ThermoFisher Scientific | S11494 | |
SYBR Green Supermix, 1.25 mL | BioRad | 1708882 | |
T100 Thermal Cycler | BioRad | 1861096 | |
TempAssure 0.2 mL PCR 8-Tube Strips | USA Scientific | 1402-4700 | Flex-free, natural, polypropylene |
TempPlate 384-WELL PCR PLATE | USA Scientific | 1438-4700 | Single notch. Natural polypropylene |
Tris Base | MilliporeSigma | 648311 | ULTROL Grade |
Triton X-100 | IBI Scientific | 9002-93-1 | |
TruSeq Dual Index Sequencing Primer Kit | Illumina | PE-121-1003 | paired-end |
Trypan Blue Stain | ThermoFisher Scientific | Q32851 | |
Tween-20 | MilliporeSigma | P7949 | BioXtra, viscous liquid |
Water | MilliporeSigma | W3500 | sterile-filtered, BioReagent, suitable for cell culture |