Described is a two-step labeling process using β-glucosyltransferase (β-GT) to transfer an azide-glucose to 5-hmC, followed by click chemistry to transfer a biotin linker for easy and density-independent enrichment. This efficient and specific labeling method enables enrichment of 5-hmC with extremely low background and high-throughput epigenomic mapping via next-generation sequencing.
5-methylcytosine (5-mC) constitutes ~2-8% of the total cytosines in human genomic DNA and impacts a broad range of biological functions, including gene expression, maintenance of genome integrity, parental imprinting, X-chromosome inactivation, regulation of development, aging, and cancer1. Recently, an oxidized form of 5-mC, 5-hydroxymethylcytosine (5-hmC), was discovered in mammalian cells, in particular in embryonic stem (ES) cells and neuronal cells2-4. 5-hmC is generated by oxidation of 5-mC catalyzed by TET family iron (II)/α-ketoglutarate-dependent dioxygenases2, 3. 5-hmC has been proposed to play a role in the maintenance of mouse embryonic stem (mES) cells, in normal hematopoiesis and malignancies, and in zygote development2, 5-10. To better understand the function of 5-hmC, a reliable and straightforward sequencing system is essential. Traditional bisulfite sequencing cannot distinguish 5-hmC from 5-mC11. To unravel the biology of 5-hmC, we have developed a highly efficient and selective chemical approach to label and capture 5-hmC, taking advantage of a bacteriophage enzyme that specifically adds a glucose moiety to 5-hmC12.
Here we describe a straightforward two-step procedure for selective chemical labeling of 5-hmC. In the first step, labeling, β-GT, a glucosyltransferase from T4 bacteriophage, transfers a 6-azide-glucose from the modified cofactor UDP-6-N3-Glc (6-N3UDPG) to 5-hmC in genomic DNA. In the second step, biotinylation, a disulfide biotin linker is attached to the azide group by click chemistry. Both steps are highly specific and efficient, leading to complete labeling regardless of the abundance of 5-hmC in genomic regions and giving extremely low background. Following biotinylation, the 5-hmC-containing DNA fragments are selectively captured using streptavidin beads in a density-independent manner. The resulting 5-hmC-enriched DNA fragments can be used for downstream analyses, including next-generation sequencing.
Our selective labeling and capture protocol confers high sensitivity and is applicable to any source of genomic DNA, whatever its 5-hmC abundance. Although the protocol is intended mainly for next-generation sequencing to map the distribution of 5-hmC in the genome, it is also compatible with single-molecule, real-time (SMRT) DNA sequencing, which is capable of delivering single-base resolution sequencing of 5-hmC.
1. Genomic DNA Fragmentation
Fragment genomic DNA by sonication to a size range suited to the genome-wide sequencing platform. (We usually sonicate to ~300 bp.) Verify the size distribution of the fragmented genomic DNA on a 1% agarose gel (Figure 1).
2. DNA Preparation
Determine the starting amount of DNA from the abundance of 5-hmC in the sample. Because 5-hmC levels vary significantly among tissue types, samples with lower 5-hmC levels require more input DNA. Please refer to Table 1 for examples.
3. β-GT Catalyzed Reaction (Glucose Transfer Reaction)
4. Biotinylation Reaction (Click Chemistry)
5. Capture of 5-hmC-containing DNA
6. Representative Results
If the quality of genomic DNA is high, typical recovery yields after the β-GT and biotinylation reactions are ~60-70%. However, the capture efficiency varies significantly among tissue types, depending on the 5-hmC levels of the samples. Typically, the capture efficiency for brain genomic DNA is ~4-9%, and in some extreme cases, the efficiency may reach up to 12%. For ES cells, the average capture efficiency is ~2-4%, in contrast to ~0.5% for neural stem cells. The lowest efficiency seen so far was for genomic DNA from cancer cells. All enriched DNA is ready for standard next-generation library preparation protocols. In addition, the captured DNA can be used as a template for real-time PCR to detect the enrichment of specific fragments relative to the input DNA, provided that suitable primers are available.
Figure 1. Sonicated human genomic DNA fragments on a 1% agarose gel. 10 μg of genomic DNA isolated from human iPS cells in 120 μl of 1X TE buffer was sonicated using a sonication device (Covaris). After sonication, 2 μl of the sonicated DNA was loaded onto a 1% agarose gel alongside a 100 bp DNA marker to compare the sizes of the sonicated DNA fragments.
Component | Volume | Final Concentration |
Water | _ μl | |
10 X β-GT Reaction Buffer | 2 μl | 1 X |
Up to 10 μg genomic DNA | _ μl | Up to 500 ng/μl |
UDP-6-N3-Glc (3 mM) | 0.67 μl | 100 μM |
β-GT (40 μM) | 1 μl | 2 μM |
Total volume | 20 μl | |
i) For tissue genomic DNA (high 5-hmC content > 0.1%)
Component | Volume | Final Concentration |
Water | _ μl | |
10 X β-GT Reaction Buffer | 4 μl | 1 X |
Up to 20 μg genomic DNA | _ μl | Up to 500 ng/μl |
UDP-6-N3-Glc (3 mM) | 1.33 μl | 100 μM |
β-GT (40 μM) | 2 μl | 2 μM |
Total volume | 40 μl | |
ii) For stem cell genomic DNA (medium 5-hmC content ~0.05%)
Component | Volume | Final Concentration |
Water | _ μl | |
10 X β-GT Reaction Buffer | 10 μl | 1 X |
Up to 50 μg genomic DNA | _ μl | Up to 500 ng/μl |
UDP-6-N3-Glc (3 mM) | 3.33 μl | 100 μM |
β-GT (40 μM) | 5 μl | 2 μM |
Total volume | 100 μl | |
iii) For cancer cell genomic DNA (low 5-hmC content ~0.01%)
Table 1. Example input DNA amounts and labeling reaction setups for samples with different 5-hmC levels, using the selective chemical labeling method.
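The component volumes in these reaction setups follow from the dilution relation C1V1 = C2V2. As a minimal sketch, the Python snippet below computes the volumes for one reaction of a chosen total volume. The function name `beta_gt_reaction_mix`, its parameters, and the dictionary keys are illustrative assumptions and not part of the protocol; the default stock and final concentrations are those listed in Table 1, and the DNA stock concentration must be measured by the user.

```python
def beta_gt_reaction_mix(total_ul, dna_ug, dna_stock_ng_per_ul,
                         buffer_stock_x=10.0,
                         udp_stock_um=3000.0, udp_final_um=100.0,
                         enzyme_stock_um=40.0, enzyme_final_um=2.0):
    """Return component volumes (in microliters) for one beta-GT reaction.

    Hypothetical helper: every stock volume is derived from C1*V1 = C2*V2.
    """
    buffer_ul = total_ul / buffer_stock_x                      # 10 X buffer diluted to 1 X
    udp_ul = total_ul * udp_final_um / udp_stock_um            # 3 mM stock to 100 uM final
    enzyme_ul = total_ul * enzyme_final_um / enzyme_stock_um   # 40 uM stock to 2 uM final
    dna_ul = dna_ug * 1000.0 / dna_stock_ng_per_ul             # ug to ng, divided by ng/ul
    water_ul = total_ul - (buffer_ul + udp_ul + enzyme_ul + dna_ul)
    if water_ul < 0:
        raise ValueError("DNA stock is too dilute to fit in this reaction volume")
    return {
        "water": water_ul,
        "buffer_10x": buffer_ul,
        "dna": dna_ul,
        "udp_6_n3_glc": udp_ul,
        "beta_gt": enzyme_ul,
    }


if __name__ == "__main__":
    # Example: 10 ug of DNA at an assumed stock concentration of 1000 ng/ul in 20 ul
    mix = beta_gt_reaction_mix(total_ul=20, dna_ug=10, dna_stock_ng_per_ul=1000)
    for name, volume in mix.items():
        print(f"{name}: {volume:.2f} ul")
```

The water volume is whatever remains after the other components, so the function raises an error when the DNA stock is too dilute to reach the requested amount within the reaction volume.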
Sample | 5-hmC level | Starting DNA (μg) | Recovery after labeling (input to beads) (μg) | Recovery yield | Pull-down DNA (ng) | Pull-down yield |
Adult mouse cerebellum | 0.4% | 10 | 7.5 | 75% | 236 | 3.1% |
Postnatal day 7 mouse cerebellum | 0.1% | 11 | 9 | 82% | 140 | 1.6% |
Mouse ES cell E14 | 0.05% | 60 | 42 | 70% | 350 | 0.8% |
Table 2. Representative results from mouse brain tissues and ES cells.
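The two yield columns in Table 2 are simple ratios: recovery yield is the DNA recovered after labeling divided by the starting DNA, and pull-down yield is the captured DNA divided by the DNA loaded onto the beads. A minimal Python sketch of that arithmetic, using the values from Table 2 (the function name `labeling_yields` is an illustrative assumption):

```python
def labeling_yields(starting_ug, recovered_ug, pulldown_ng):
    """Return (recovery_yield, pulldown_yield) as fractions.

    recovery_yield = DNA recovered after labeling / starting DNA
    pulldown_yield = captured DNA / DNA loaded onto the beads
    """
    recovery_yield = recovered_ug / starting_ug
    pulldown_yield = pulldown_ng / (recovered_ug * 1000.0)  # convert ug to ng
    return recovery_yield, pulldown_yield


# Rows of Table 2: (sample, starting ug, recovered ug, pull-down ng)
samples = [
    ("Adult mouse cerebellum", 10, 7.5, 236),
    ("Postnatal day 7 mouse cerebellum", 11, 9, 140),
    ("Mouse ES cell E14", 60, 42, 350),
]

if __name__ == "__main__":
    for name, start, recovered, pulldown in samples:
        rec, pd = labeling_yields(start, recovered, pulldown)
        print(f"{name}: recovery {rec:.0%}, pull-down {pd:.1%}")
```

The same function can be applied to a new sample to compare its yields against the representative values.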
5-hydroxymethylcytosine (5-hmC) is a recently identified epigenetic modification present in substantial amounts in certain mammalian cell types. The method presented here is for determining the genome-wide distribution of 5-hmC. We use T4 bacteriophage β-glucosyltransferase to transfer an engineered glucose moiety containing an azide group onto the hydroxyl group of 5-hmC. The azide group can be chemically modified with biotin for detection, affinity enrichment, and sequencing of 5-hmC-containing DNA fragments in mammalian genomes. This protocol has advantages over 5-hmC antibody-based hydroxymethylated DNA immunoprecipitation (hMeDIP-Seq)13-15: although hMeDIP is simple, it has a strong bias towards high-density 5-hmC regions and usually gives inconsistent results from independent pull-downs. Our method does not introduce such bias.
There are, however, two possible concerns regarding our labeling and capture method. The first is the specificity of the labeling and capture. Since a recent report indicated that 5-hydroxymethyluracil (5-hmU) can also serve as a substrate of β-GT, false-positive signals might be introduced, leading to enriched fragments that do not contain 5-hmC16. This concern might be unfounded, though, since highly active 5-hydroxymethyluracil-DNA glycosylases constantly remove this DNA damage, resulting in essentially undetectable 5-hmU in the mammalian genome17,18. The second concern is the efficiency of the labeling and the capture. Since 5-hmC levels vary significantly among tissues and cells, ranging from 0.5% in brain tissues to 0.05% in mES cells and 0.01% in cancer cells, it is essential that our method be applicable to all these samples for the downstream applications of the labeled and captured genomic DNA. Our experience shows that the method described here is indeed sensitive and specific enough to analyze all these samples, either for quantification-based comparison of 5-hmC levels among tissues or for downstream applications such as deep sequencing to map the distribution of 5-hmC in the genome19-21.
The key to our method is the use of an appropriate amount of starting genomic DNA. If the genomic DNA concentration is too low, the labeling efficiency and specificity will decrease accordingly. In our experience, even when the abundance of 5-hmC is high, as in DNA from brain tissues, a concentration lower than 25 ng/μl in a 20 μl reaction significantly reduces the labeling and capture efficiency, as well as the specificity. For low-abundance DNA samples, in addition to the concentration requirement, a sufficient total amount of DNA is essential for high and specific capture efficiency. Thus, although one needs only 25 ng of 5-hmC-enriched DNA fragments for the downstream standard library generation protocols employed by next-generation sequencing platforms, the starting amount of genomic DNA from brain tissues should not be less than 2 μg (and the ideal starting amount is 5-10 μg). Through trial and error, we have optimized the starting amount of genomic DNA from different tissues with variable 5-hmC abundances to get high labeling efficiency and specificity, as detailed in Table 2.
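The input requirements above can be checked with a few lines of arithmetic. The Python sketch below is an illustration only: the function names are hypothetical, the 25 ng/μl threshold and 25 ng library requirement come from the text, and the recovery and capture efficiencies are taken from the ranges reported in Representative Results. The computed starting amount is an arithmetic lower bound; the empirically recommended amounts in the text are higher and should take precedence.

```python
MIN_CONC_NG_PER_UL = 25.0   # minimum DNA concentration stated in the text
LIBRARY_INPUT_NG = 25.0     # enriched DNA needed for library generation


def meets_concentration(dna_ng, reaction_ul=20.0):
    """True if the DNA concentration in the reaction reaches the stated minimum."""
    return dna_ng / reaction_ul >= MIN_CONC_NG_PER_UL


def minimum_start_ug(target_ng, recovery_yield, capture_efficiency):
    """Arithmetic lower bound on starting DNA (ug) needed to capture target_ng.

    captured = start * recovery_yield * capture_efficiency, solved for start.
    """
    if not (0 < recovery_yield <= 1 and 0 < capture_efficiency <= 1):
        raise ValueError("yields must be fractions in (0, 1]")
    return target_ng / (recovery_yield * capture_efficiency) / 1000.0


if __name__ == "__main__":
    # Brain DNA, assuming the low end of the reported ranges:
    # ~60% recovery after labeling and ~4% capture efficiency.
    print(f"{minimum_start_ug(LIBRARY_INPUT_NG, 0.60, 0.04):.2f} ug")
    # 500 ng in a 20 ul reaction is exactly 25 ng/ul.
    print(meets_concentration(500))
```

Under these assumed efficiencies the bound for brain DNA is about 1 μg, which is consistent with the recommendation to start with no less than 2 μg to leave a margin.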
In summary, our method precisely labels genomic 5-hmC and specifically captures the 5-hmC-containing fragments, providing the sensitivity and specificity required for downstream next-generation sequencing assays.
The authors have nothing to disclose.
This study was supported in part by the National Institutes of Health (GM071440 to C.H. and NS051630/MH076090/MH078972 to P.J.).
Name | Company | Catalog # | Comment |
Reagents | |||
5M Sodium chloride (NaCl) | Promega | V4221 | |
0.5M pH8.0 Ethylenediaminetetraacetic acid (EDTA) | Promega | V4231 | |
1M Trizma base (Tris) pH7.5 | Invitrogen | 15567-027 | |
HEPES 1M, pH7.4 | Invitrogen | 15630 | |
Magnesium chloride (MgCl2) 1M | Ambion | AM9530G | |
Dimethyl sulfoxide (DMSO) | Sigma | D8418 | |
Tween 20 | Fisher BioReagents | BP337-100 | |
DBCO-S-S-PEG3-Biotin conjugate | Click Chemistry Tools | A112P3 | |
1,4-Dithiothreitol, ultrapure (DTT) Superpure | Invitrogen | 15508-013 | |
QIAquick Nucleotide Removal Kit | Qiagen | 28304 | |
Micro Bio-Spin 6 Column | Bio-Rad | 732-6222 | |
Dynabeads MyOne Streptavidin C1 | Invitrogen | 650-01 | |
Qiagen MinElute PCR Purification Kit | Qiagen | 28004 | |
UltraPure Agarose | Invitrogen | 16500500 | |
UDP-6-N3-glucose | Active Motif | 55013 | |
Enzyme | |||
β-glucosyltransferase (β-GT) | New England Biolabs | M0357 | |
Equipment | |||
Sonication device | Covaris | ||
Desktop centrifuge | |||
Water bath | Fisher Scientific | ||
Gel running apparatus | Bio-Rad | ||
NanoDrop1000 | Thermo Scientific | ||
Labquake Tube Shaker | Barnstead/Thermolyne | ||
Magnetic Separation Stand | Promega | Z5342 | |
Qubit 2.0 Fluorometer | Invitrogen | ||
Reagent setup
10 X β-GT Reaction Buffer: 500 mM HEPES pH 7.9, 250 mM MgCl2
2 X Binding and washing (B&W) buffer: 10 mM Tris pH 7.5, 1 mM EDTA, 2 M NaCl, 0.02% Tween 20
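The stock volumes for these two buffers can be worked out with C1V1 = C2V2. The Python sketch below is illustrative only. It assumes the stock concentrations in the materials list (1 M HEPES, 1 M MgCl2, 1 M Tris, 0.5 M EDTA, 5 M NaCl), treats Tween 20 as a neat 100% stock, and uses arbitrary batch volumes of 1 ml and 10 ml; the function name `recipe` is hypothetical. pH adjustment is not modeled, and the listed HEPES stock is pH 7.4 whereas the buffer is specified at pH 7.9.

```python
def recipe(final_volume_ml, components):
    """Return the volume (ml) of each stock plus water for one buffer batch.

    components maps a name to (final_conc, stock_conc); the two values of
    each pair must share the same unit (for example M and M, or % and %).
    """
    volumes = {
        name: final_conc * final_volume_ml / stock_conc
        for name, (final_conc, stock_conc) in components.items()
    }
    water = final_volume_ml - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks are too dilute for the requested final volume")
    volumes["water"] = water
    return volumes


# 1 ml of 10 X beta-GT reaction buffer (500 mM HEPES, 250 mM MgCl2)
beta_gt_buffer_10x = recipe(1.0, {
    "HEPES": (0.5, 1.0),
    "MgCl2": (0.25, 1.0),
})

# 10 ml of 2 X B&W buffer (10 mM Tris, 1 mM EDTA, 2 M NaCl, 0.02% Tween 20)
bw_buffer_2x = recipe(10.0, {
    "Tris": (0.010, 1.0),
    "EDTA": (0.001, 0.5),
    "NaCl": (2.0, 5.0),
    "Tween 20": (0.02, 100.0),
})

if __name__ == "__main__":
    for label, batch in (("10 X beta-GT buffer", beta_gt_buffer_10x),
                         ("2 X B&W buffer", bw_buffer_2x)):
        print(label)
        for name, volume_ml in batch.items():
            print(f"  {name}: {volume_ml * 1000:.1f} ul")
```

Scaling to a different batch size only requires changing the first argument to `recipe`.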