Nucleosome fibre topology guides transcription factor binding to enhancers

Inclusion and ethics

All animal experiments for the iPS cell and iTS cell generation from MEFs were approved by the University of Edinburgh Animal Welfare and Ethical Review Body, performed at the University of Edinburgh, and carried out according to regulations specified by the Home Office and Project License. All reprogramming experiments have been approved by the University of Edinburgh SBS ethics committee (asoufi-0001). This research was performed in compliance with the joint ethics committee (IACUC) of the Hebrew University and Hadassah Medical Center and the National ethic committee (Israel health ministry) and NIH, which approved the study protocol for animal welfare. The Hebrew University is an AAALAC international accredited institute.

MEF isolation

Primary MEFs were generated from 129 and 129/C57BL/6 mouse embryos at E12.5–13.5 after removing internal organs and heads. The remaining body of each embryo was incubated in 200 µl of trypsin-EDTA (0.25%, Gibco) for 15 min at 37 °C. The trypsin was then inactivated by adding 800 µl MEF medium and the embryos were quickly dissociated with an 18 gauge needle fixed to 1 ml syringe. The embryo suspension was passed through the syringe several times (4 to 6) until becoming homogeneously cloudy and then transferred, drop-wise, to a 15 ml falcon tube containing 9 ml of warm MEF medium (GMEM (Sigma-Aldrich, G5154), 10% FCS, 1 mM sodium pyruvate, 1 mM l-glutamine, 1× non-essential amino acids (Thermo Fisher Scientific, 11140035)). The suspension was sedimented by gravity until forming a cell debris pellet. The majority of the supernatant (10 ml), containing single cells, was gently removed and plated onto a 10 cm dish containing warm MEF medium. The cells were monitored daily under a microscope and, if not confluent after 2 days, the cells were discarded. The confluent cells (passage 0) were collected by trypsin digestion and cryopreserved or used immediately.

Chimeric embryo

Blastocyst injections were performed using CB6F1 host embryos. After priming with PMSG (M.I.P. Veterinary) and hCG (Merck) hormones and mating with CB6F1 males, embryos were obtained at 3.5 days post-coitum (blastocyst stage), and then injected with 10–20 PB-integrated ES cells, tdTomato-marked ES cells or TMR-iPS cells with a flat tip microinjection pipette with an internal diameter of 16 mm (Origio) in a drop of FHM medium (Zenith Biotech, ZEHP-050) covered by mineral oil. Shortly after injection, blastocysts were transferred to 2.5 days post-coitum pseudopregnant CD1/ICR females (10–15 blastocysts per female). Chimeric embryos and placentas were isolated at E13.5 and observed under the fluorescence microscope (Nikon Eclipse T!). Gonads were excised from chimeric embryos at E13.5 and observed by fluorescence microscope (Nikon Eclipse T!).

Cell culture

MEFs were maintained in MEF medium (GMEM (Sigma-Aldrich, G5154), 10% FCS, 1 mM sodium pyruvate, 1 mM l-glutamine, 1× non-essential amino acids (Thermo Fisher Scientific, 11140035), 0.1 mM β-mercaptoethanol, penicillin (50 U ml⁻¹) and streptomycin (50 µg ml⁻¹)) at 37 °C and 5% CO₂. Human embryonic kidney 293T (HEK293T) cells (Lenti-X TAKARA, 632180) were maintained in HEK medium (GMEM, 10% FCS, 1 mM sodium pyruvate, 1 mM l-glutamine) at 37 °C and 5% CO₂. Mouse ES cells were grown on 0.2% gelatine and maintained in ES cell medium (GMEM, 10% FCS, 1 mM l-glutamine, 0.1 mM β-mercaptoethanol, 1× non-essential amino acids, 100 U ml⁻¹ leukaemia inhibitory factor (LIF)) at 37 °C and 5% CO₂. TS cells were maintained on γ-irradiated feeder MEFs on 0.2% gelatin in TS cell medium (RPMI-1640 (Thermo Fisher Scientific, 21875034), 20% FCS, 0.1 mM β-mercaptoethanol, 1× non-essential amino acids, penicillin–streptomycin (100 µg ml⁻¹), 25 ng ml⁻¹ hFGF4 (R&D, 235-F4-025), 1 µg ml⁻¹ heparin (Sigma-Aldrich, H3149)). TS cells cultured without feeders were maintained on Matrigel-coated plates (Corning) in TX medium (DMEM/F12 without HEPES and l-glutamine (Life Technologies), 64 mg l^–1 l-ascorbic acid-2-phosphate magnesium, 14 μg l^–1 sodium selenite, 19.4 mg l^–1 insulin, 543 mg l^–1 NaHCO₃, 10.7 mg l^–1 holo-transferrin (all Sigma-Aldrich), 2 mM l-glutamine, 1% penicillin and streptomycin), freshly supplemented with 25 ng ml⁻¹ hFGF4, 2 ng ml^–1 hTGF-ß1 (PeproTech) and 1 µg ml⁻¹ heparin (Sigma-Aldrich)⁴⁹. All ChIP–seq, ATAC–seq, RNA-seq, Micro-C and MNase–seq experiments were performed under feeder-free conditions.

Reprogramming to iPS cells and iTS cells

Reprogramming MEFs to iPS cells and iTS cells was performed as previously described^5,7. All infections were performed on MEFs (passage 0 or 1) that were seeded at 60–80% confluency 2 days before the first infection. For infection, replication-incompetent lentivirus expressing vectors encoding for reprogramming TFs and ratios (GETM, 3:3:3:1; TMR, 4.5:1:4.5; GETMR, 2:2.5:2:1:2.5; OSKM, 3:3:3:1; and BS₉G₄M, 3:3:3:1) were packaged with a lentiviral packaging mix (5.1 μg psPAX2 and 2.4 μg pMD2.G) in 10 cm dishes containing HEK293T cells and collected at 48 h after transfection. The supernatants were filtered through a 0.45 µm filter, supplemented with 8 µg ml⁻¹ of polybrene (Sigma-Aldrich), and then used to infect MEFs. Then, 24 h after the infection, the medium was replaced with fresh GMEM (Thermo Fisher Scientific) containing 10% FBS. To initiate reprogramming, 2 µg ml⁻¹ doxycycline was added to the culture medium (GMEM containing 10% FBS) for the first 48 h before switching into the relevant reprogramming medium. For iPS cell reprogramming, the medium was replaced to ES cell medium supplemented with LIF at a final concentration of 200 U ml⁻¹ and 2 µg ml⁻¹ doxycycline for a further 12 days before withdrawing doxycycline. For iTS cell reprogramming, medium was replaced to TS cell reprogramming medium and 2 µg ml⁻¹ doxycycline. For reprogramming to iTS cells or iPS cells with GETMR, reprogramming medium was replaced every other day for 20 days with doxycycline, followed by 10 days culture without doxycycline. The plates were monitored for primary iPS cell and iTS cell colonies. For iPS cell clone isolation, single-iPS-cell colonies were trypsinized (0.25%), and individually plated in separate wells in a six-well plate on feeder cells. The morphology of the isolated colonies was monitored under the microscope and medium was replaced every other day for five to ten passages, until stable iPS cell colonies were developed.

Early reprogramming

Owing to the large chromatin amounts required to carry out ChIP–seq in early reprogramming, large-scale concentrated lentiviruses encoding for each TF were generated. First, HEK293T cells were seeded at a density of 2 × 10⁶ cells per 15 cm plate and grown in 30 ml HEK medium for 24 h, before being transfected with the relevant lentivirus plasmids. Each virus was prepared in a separate dish. For transfection, 2.4 µg pMD.G, 5.1 µg psPAX2 and 7.5 µg of the corresponding FUW-tet-O-TF vector were dissolved in 1,710 µl Opti-MEM medium (Thermo Fisher Scientific, 31985062) and 90 µl Fugene 6 reagent (Promega, E2692), thoroughly mixed by vortexing and incubated for 15 min at room temperature, before adding to the 15 cm plate containing HEK293T cells, which were incubated for 16 h. The transfection medium was replaced with fresh HEK medium and the transfected cells were cultured for a further 60 h. The lentiviruses were collected by collecting the 30 ml supernatant, which was passed through a 0.45 µm polyethersulfone filter-fitted syringe and incubated for 16 h at 4 °C with 10 ml Lenti-X reagent (Clontech, 631232). The virus was then pelleted by centrifugation at 1,500g for 1 h at 4 °C. The supernatants were removed, and the viral pellet was dissolved in 200 µl GMEM overnight at 4 °C and then aliquoted and stored at −80 °C. On average, the titre of each virus was identified as around 7 × 10⁸ infection units per ml.

For early-reprogramming ChIP–seq analysis, 4.8 × 10⁶ MEFs (passage 1) were cultured in MEF medium on a 15 cm dish for 16 h. The next morning, the cells were infected by replacing the medium to MEF medium containing the Tet-ON OSKM, GETM, GETMR or BS₉G₄M lentiviruses at a multiplicity of infection (MOI) of 5 for each TF plus 5 MOI of rtTA2M2 lentivirus and 8 µg ml⁻¹ polybrene. After 24 h, the medium was changed to MEF medium without polybrene. The next day, the infected cells reached around 90% confluency and were split 1 in 2 and incubated for a further 16 h, before TF induction by adding 2 μg ml⁻¹ doxycycline to the medium and incubating for 48 h. The cells were then cross-linked to collect the chromatin (see the ‘ChIP-seq’ section).

ChIP-seq

Chromatin fragments were prepared from approximately 1.5 × 10⁷ cells per TF. For cell cross-linking, 3 ml of formaldehyde cross-linking buffer (50 mM HEPES-KOH, pH 7.5, 100 mM NaCl, 1 mM EDTA, 0.5 mM EGTA, 11% formaldehyde) was added to 15 cm dishes (Corning, 430599) containing 30 ml medium and incubated at room temperature for 10 min with swirling every 2 min. Cross-linking was blocked by adding 1.65 ml 2.5 M glycine and incubating for 5 min with swirling at room temperature. Cells were collected in their medium using a silicon scraper (Thermo Fisher Scientific, 08100240) and centrifuged for 5 min at 1,350 rcf at 4 °C. The cross-linked pellet was washed three times with 10 ml ice cold PBS by resuspension and subsequent centrifugation for 5 min at 1,350 rcf at 4 °C. Five 15 cm dishes of ES cells or iTS cells and seven 15 cm dishes of infected MEFs were combined into single pellets for processing. The pellets were subsequently flash-frozen with liquid nitrogen and stored at −80 °C.

For efficient lysis, MEF samples were flash-frozen in liquid nitrogen and thawed in ice three times before thawing on ice for 1 h. Cell pellets were resuspended in 10 ml lysis buffer 1 (50 mM HEPES-KOH pH 7.5, 140 mM NaCl, 1 mM EDTA, 10% glycerol, 0.5% NP-40 substitute (Sigma-Aldrich, 74385), 0.25% Triton X-100 and cOmplete Ultra Protease Inhibitor (Roche, 5892970001)) with rotation for 10 min at 4 °C. Nuclei were extracted by passing the cell lysates through a tight 7 ml Dounce homogeniser with 40 strokes on ice. Nuclei were collected by centrifugation at 1,350 rcf for 5 min at 4 °C. The nuclei were washed in 10 ml lysis buffer 2 (10 mM Tris-HCl pH 8, 200 mM NaCl, 1 mM EDTA, 0.5 mM EGTA and cOmplete Ultra Protease Inhibitor) for 10 min with rotation at room temperature. The nuclei were then collected by centrifugation at 1,350 rcf for 5 min at 4 °C and resuspended in 5 ml lysis buffer 3 (10 mM Tris-HCl pH 8, 100 mM NaCl, 1 mM EDTA, 0.5 mM EGTA 0.1% Na-deoxycholate, 0.5% N-lauroylsarcosine and cOmplete Ultra Protease Inhibitor).

The resuspended nuclei were split into five aliquots in prechilled 1 ml millitubes containing AFA Fibre (Covaris, 520130) and sonicated using the Covaris M220 focused ultrasonicator (Covaris) (peak power, 75 W; duty factor, 10; cycles per burst, 200; minimum temperature, 5 °C; set temperature, 7 °C; maximum temperature, 9 °C). The millitubes were each sonicated for 10 min intervals sequentially and kept on ice. Sonicated chromatin was transferred to Protein Lobind tubes (Eppendorf). Then, 100 µl of 10% Triton X-100 was added to each 1 ml sonicated chromatin to increase chromatin solubility. Chromatin samples were then centrifuged (20,000g at 4 °C for 10 min) and the supernatants transferred into fresh tubes. The optimum sonication time was determined by taking 50 µl aliquots in 10 min intervals and checking DNA fragment size distribution by agarose gel electrophoresis until predominantly generating a 150–350 bp band. Early reprogramming samples were sonicated for 60–70 min, ES cells samples for 30–40 min and iTS cells for 50–60 min. Another 50 µl aliquot from the final sonication was retained to be used as an input DNA control for ChIP analysis. The sonicated chromatin and the input DNA samples were snap-frozen in liquid nitrogen and stored at −80 °C.

For each ChIP replicate, 30 μl of Protein G Dynabeads (Thermo Fisher Scientific, 10004D) was washed three times in blocking solution (PBS, 0.5% (w/v) BSA). The beads were saturated with 10 μg antibody raised against the appropriate TF (Supplementary Table 2) diluted in 200 μl blocking solution by rotating for 6 h at 4 °C. The beads were then washed three more times in blocking solution. ChIP was performed by incubating the beads with 40 μg of chromatin (based on DNA content) on a rotator for 20 h at 4 °C. The beads were then transferred to a fresh prechilled tube, washed five times with RIPA wash buffer (50 mM HEPES-KOH pH 7.5, 500 mM LiCl, 1 mM EDTA, 1% NP-40 substitute, 0.7% Na-deoxycholate) and once with TE NaCl (10 mM Tris-HCl pH 8, 1 mM EDTA, 50 mM NaCl). Bound chromatin was eluted by resuspending the beads in 200 μl ChIP elution buffer (50 mM Tris-HCl pH 8, 10 mM EDTA, 1% SDS) and shaking at 65 °C for 30 min before transferring the supernatant to a fresh tube. Cross-linking was reversed by incubating for 16 h at 65 °C with shaking. The samples were diluted with 200 μl TE (10 mM Tris-HCl pH 8, 1 mM EDTA) and then incubated with 0.2 mg ml⁻¹ RNase A (Sigma-Aldrich, R4642) for 2 h at 37 °C. Proteins were then digested by incubating with 0.2 mg ml⁻¹ proteinase K (Ambion, AM2546) for 2 h at 55 °C. The DNA was then purified by phenol–chloroform extraction followed by ethanol precipitation. Precipitated DNA was eluted in 20 μl of 10 mM Tris-HCl pH 8.5 for library generation or qPCR analysis. ChIP reactions were quantified by Qubit 2.0 using the HS dsDNA quantification kit (Thermo Fisher Scientific, Q32854).

ChIP–seq DNA libraries were prepared using the NEBNext Ultra II Library Preparation Kit (NEB, E7645S) with dual-index primers (NEB, E7600S). For each TF, libraries were prepared using 5–20 ng ChIP DNA corresponding to a pool of at least three ChIP replicates. Input libraries were generated using 20 ng of sonicated DNA. Size selection (200 bp) was performed according to the manufacturer’s instructions. PCR amplification during library preparation was limited such that samples with 5–10 ng of ChIP DNA underwent 11 cycles of PCR amplification and samples with 10–20 ng of ChIP DNA underwent 10 cycles. Input libraries were generated using 20 ng of DNA starting material. PCR clean-up was performed with 45 µl Seramag Speeadbeads in 10% PEG-8000 solution. Libraries were quantified using a Qubit 2.0 device with a high-sensitivity dsDNA kit (Thermo Fisher Scientific, Q32854) and fragment size was determined using an Agilent 2200 Tapestation with D1000 HS reagents (Agilent, 5067-5584, 5067-5585). The samples were sequenced by Edinburgh Genomics on either an Illumina HiSeq 4000 using 75 bp paired-end settings or on an Illumina NovaSeq using 50 bp paired end settings.

RNA-seq

Total RNA was isolated using the Qiagen RNeasy kit. All mRNA libraries were prepared using the SENSE mRNA-Seq library prep kit V2 (Lexogen), and pooled libraries were sequenced on the Illumina NextSeq 500 platform to generate 75 bp single-end reads.

ATAC–seq

ATAC–seq library preparation was performed as previously described^5,7,50. In brief, 100,000 cells per replicate (two biological replicates per line) were incubated with 0.1% NP-40 to isolate nuclei. Nuclei were then transposed for 30 min at 37 °C with adaptor-loaded Nextera Tn5 (Illumina, Fc-121-1030). Transposed fragments were directly PCR amplified and sequenced on the Illumina NextSeq 500 platform to generate 2 × 36 bp paired-end reads.

For H1 OE and H1 KD, 400,000 cells per sample were incubated with 0.1% NP-40, 0.1% Tween-20 and 0.01% digitonin (Calbiochem, 300410) to isolate nuclei. Nuclei were then split into four replicates of 100,000 cells each for transposition for 30 min at 37 °C using the Illumina Tagment DNA Enzyme and Buffer small kit (20034210). Transposed fragments were directly PCR amplified and sequenced on the NovaSeq 600 system to generate 50 bp paired-end reads.

MNase–seq

MNase samples were prepared from approximately 1.5 × 10⁷ cells per digestion condition. For cross-linking, 1.1 ml of cross-linking buffer (Dulbecco’s PBS with 11% formaldehyde) was added to 10 ml medium and incubated at room temperature for 10 min with swirling on a 10 cm cell culture plate (Corning, 430167). Cross-linking was blocked by adding 0.55 ml 2.5 M glycine and incubating for 5 min with swirling at room temperature. The medium was aspirated from the cross-linked cells and the cells were washed twice with 10 ml ice cold SST (150 mM NaCl, 0.5 M trisodium citrate, 10 mM Tris-HCl pH 7.5). Cells were scraped into 5 ml ice cold RSB (10 mM Tris-HCl pH 7.5, 10 mM NaCl, 3 mM MgCl₂ and 10 mM sodium butyrate with cOmplete Ultra EDTA-Free Protease Inhibitor (Roche, 5892953001) supplemented with 0.5% NP40 substitute (Roche, 11332473001). Cells were then pelleted at 1,000 rpm for 3 min at 4 °C in a Rotina 380R centrifuge (Hettich) with a swinging-bucket rotor (Hettich, 1754). The supernatants were discarded, and cells were resuspended in 1 ml RSB with NP40 substitute and incubated for 1 min on ice. The cells were disrupted by passing through a tight 2 ml Dounce homogeniser with 20 strokes on ice. 4 ml RSB with NP40 substitute was added to the sample and the sample was centrifuged for 7 min at 4 °C at 1,400 rpm. The supernatants were discarded, and the nuclei were resuspended in 10 ml cold RSB with NP40 substitute. Nuclei were pelleted by centrifugation at 900 rpm for 10 min. The supernatants were discarded, and nuclei were resuspended in 600 µl cold RSB. A 2 µl aliquot was taken and mixed with 98 µl 1 N NaOH, and the optical density at 260 nm (OD₂₆₀) was measured using the Eppendorf BioPhotometer Kinetic system. The dilution-corrected OD₂₆₀ value of the nuclei was adjusted to 1 using RSB.

For a reaction of MNase, 5 ml of OD₂₆₀ = 1 nuclei was transferred to a 15 ml tube. One tube was processed at a time. Then, 150 µl 100 mM CaCl₂ was added to a final concentration of 3 mM, and the sample was incubated for 90 s in a 37 °C water bath. Micrococcal nuclease (Worthington Biochemicals, LS004797) was added to a final concentration of 0, 1, 4, 16 or 64 U ml⁻¹ and chromatin was digested for 2 min in a 37 °C water bath. To inactivate the MNase, 5.2 ml 2× room temperature TNESK (20 mM Tris-HCl pH 7.5, 200 mM NaCl, 2 mM EDTA, 2% SDS and 0.2 mg ml⁻¹ proteinase K) was then added and the sample was mixed vigorously. The sample was placed at 37 °C for at least 2 h and then placed at 65 °C overnight to reverse cross-linking. The samples were purified by phenol–chloroform extraction followed by ethanol precipitation. RNase A was then added to a concentration of 0.2 mg ml⁻¹ and the sample was incubated for 2 h at 37 °C. The DNA sample was subsequently purified by phenol–chloroform extraction and ethanol precipitation. Then, 7.5 µg of this sample was run on a 1.3% agarose gel to check the digestion pattern.

To purify digested samples, digested DNA was run on a 6% polyacrylamide TBE gel for 3 h and 30 min at 90 V on a 20 × 20 cm vertical electrophoresis system. The gel was post-stained with ethidium bromide and a band was excised corresponding to around 90 to 210 bp. This excised gel was broken up by centrifuging through a 0.5 ml tube that had been pierced with a needle into a 1.5 ml tube. Two gel volumes of diffusion buffer (500 mM ammonium acetate, 10 mM magnesium acetate, 1 mM EDTA, 0.1% SDS) were added and the sample was shaken overnight at 37 °C. The sample was then rotated for 2 h on a wheel at room temperature. The sample was then centrifuged for 10 min at 20,000g and the supernatant was transferred to new tube. This was centrifuged for a further 10 min at 20,000g to remove gel fragments and the supernatant was transferred to a new tube. The DNA was then purified by ethanol precipitation followed by further purification using the Monarch PCR and DNA cleanup kit (NEB, T1030). DNA was quantified using the Qubit 2.0 Flourometer (Thermo Fisher Scientific) using the HS dsDNA quantification kit (Thermo Fisher Scientific, Q32854).

MNase–seq libraries were prepared using 30 ng, 300 ng, 500 ng or 1 µg DNA for 1 U ml⁻¹, 4 U ml⁻¹, 16 U ml⁻¹ or 64 U ml⁻¹, respectively, using the NEBNext Ultra II Library Preparation Kit (NEB, E7645S) with dual-index primers (NEB, E7600S). The manufacturer’s instructions were followed for library preparation, apart from deviations in bead-based size-selection and PCR clean-up. A modified size-selection protocol was carried out before PCR cycling, the volumes of size-selection beads for 200 bp libraries were changed to 42 µl and 37.5 µl for the first and second size-selection bead additions. The size-selection beads used here were Seramag SpeedBeads carboxyl-coated particles (GE healthcare, GE65152105050250), prepared with a 1 in 50 dilution in a solution of 18% (w/v) PEG-8000 solution (10 mM Tris-HCl pH 8, 1 mM EDTA, 1 M NaCl, 0.05% Tween-20). This deviation from the manufacturer’s protocol was to avoid small-fragment loss. 1 U ml⁻¹ MNase samples underwent ten cycles of PCR amplification, and all of the other samples underwent seven cycles of PCR amplification. PCR clean-up was done with 45 µl of Seramag Speedbeads prepared in 18% PEG-8000. Library fragment size was determined using an Agilent 2200 Tapestation with D1000 HS reagents (Agilent, 5067-5584, 5067-5585). MNase–seq libraries were sequenced by Edinburgh Genomics on the Illumina NovaSeq platform using an SII flow cell with 50 bp paired-end settings to a depth of approximately 160 million reads pairs per library.

Micro-C

To prepare cross-linking samples for Micro-C, cells were grown to a confluency of approximately 80% on 15 cm cell culture dishes. Before starting, a 15 cm or 10 cm plate, which was prepared in parallel, was trypsinized and used to obtain approximate cell numbers by counting on a haemocytometer. Micro-C samples were then allowed to come to room temperature before the culture medium was aspirated and the samples were washed twice with 30 ml DPBS. For cross-linking, 3.3 ml of cross-linking buffer (DPBS with 11% formaldehyde) was added to the plate, containing 30 ml DPBS and incubated at room temperature for 10 min with swirling. Cross-linking was blocked by adding 1.65 ml 2.5 M glycine and incubating for 5 min with swirling at room temperature. The samples were then transferred onto ice and incubated for 15 min. Cells were then scraped on ice and transferred to 50 ml conical tubes. Cells were then pelleted at 1,000 rpm for 5 min at 4 °C in a Rotina 380R centrifuge (Hettich) with a swinging-bucket rotor (Hettich, 1754). Subsequently, pellets of the same type were combined in a single 15 ml conical tube and resuspended in 10 ml cold DPBS before being pelleted at 1,000 rpm for 5 min at 4 °C. Cell pellets were then resuspended at 4 million cells per ml in DPBS with 3 mM DSG and rotated for 40 min at room temperature (DSG stock was initially prepared by making up a 300 mM stock in DMSO and diluting into DPBS). DSG was quenched by adding glycine to a final concentration of 400 mM and incubating at room temperature for 5 min before transferring to ice for 15 min. Cells were then washed twice with DPBS 0.5% BSA and snap-frozen in pellets containing 5 million cells using liquid nitrogen before being stored at −80 °C. One cell pellet was used per Micro-C library.

To prepare MNase digestions, two cell pellets were resuspended in 600 µl PBS with 0.1 mg ml⁻¹ BSA (NEB, B9000S) and incubated on ice for 20 min. Cells were then collected by centrifugation by spinning at 5,000g for 5 min at 4 °C. The cell pellet was then washed with MB1 (10 mM Tris-HCl, pH 7.5, 50 mM NaCl, 5 mM MgCl₂, 1 mM CaCl₂, 0.2% NP-40, 1× Roche cOmplete EDTA-free (Roche Diagnostics, 04693132001)), collected by centrifugation (5,000g for 5 min at 4 °C) and then resuspended into 225 µl MB1 per 1 million cells (1,125 µl per sample). The pellet was then split into five 200 µl digestion aliquots (with 100 µl taken as a no-digestion control). To one set of five 200 µl aliquots, 15 U of MNase was added and, to the other five 200 μl aliquots, 20 U of MNase was added by adding 15 or 20 µl of 1 U µl⁻¹ MNase, respectively, and the samples were incubated for 10 min in 37 °C water bath. A 20 min digestion was used for the H1-OE MEF samples. To stop the digestion, 2 µl of 0.5 M EGTA was then added and the samples were incubated at 65 °C for 10 min to inactivate the MNase.

MNase digestion samples of the same MNase concentration were then recombined into a single tube, and 100 µl was taken as a no-ligation control. The remaining recombined samples were then split across two tubes and the cells were collected by centrifugation (5,000g for 5 min at 4 °C). Cells were then washed with 500 µl 1× NEB buffer 2.1 (NEB, B7202), pelleted by centrifuging at 5,000g for 5 min at 4 °C and were then resuspended in 45 µl 1× NEB buffer 2.1. Then, 5 μl rShrimp alkaline phosphatase (NEB, M0203) was added and the samples were incubated at 37 °C for 45 min to dephosphorylate DNA ends. The reaction was then stopped by incubating at 65 °C for 5 min. Next, 40 μl Klenow pre-mix buffer (5 μl 10× NEB buffer 2.1, 2 μl 100 mM ATP (Thermo Fisher Scientific, R0441), 3 μl 100 mM DTT, 30 μl water), 8 μl large Klenow fragment (NEB, M0210L) and 2 μl T4 PNK (NEB, M0201L) were added, in that order. 5′ DNA overhangs were then generated by incubating at 37 °C for 15 min. 5′ overhangs were then filled in with biotinylated nucleotides by adding 100 µl biotin pre-mix (10 μl 1 mM biotin-14-dATP (Jena Biosciences, NU-835-BIO14-L), 10 μl 1 mM biotin-14-dCTP (Jena Biosciences, NU-956-BIO14-L), 1 μl of 10 mM dGTP and 10 mM dTTP (NEB, N0446), 10 μl 10× T4 ligase buffer (NEB, B0202S), 0.5 μl 200× BSA (NEB, B9000S), 67.5 μl water) and incubating for 45 min at 25 °C. Then, 12 µl of 0.5 M EDTA was added and the sample was incubated for 20 min at 65 °C to stop the reaction.

The samples were pelleted at 10,000g for 5 min at 4 °C, the supernatant was removed and the pellet was resuspended in 500 µl 1× T4 ligase buffer with 50 mM NaCl. The samples were centrifuged at 10,000g for 5 min at 4 °C and the was supernatant removed. The samples were then resuspended in 500 µl ligation pre-mix (5 µl 200,000 U ml⁻¹ T4 ligase (NEB, M0202M), 1.25 µl 200× BSA, 50 µl 10× T4 ligase buffer, 443.75 µl water) and incubated for 2.5 h at room temperature. Next, 5 µl of 5 M NaCl was then added, the sample was centrifuged at 16,000g and 4 °C and the supernatant was discarded.

To remove biotin nucleotides from unligated DNA ends, pellets were resuspended in exonuclease mix (20 µl 10× NEB buffer 1 (NEB, B7001S), 170 µl water and 10 µl 100,000 U ml⁻¹ exonuclease III (NEB, M0206L)) and incubated at 37 °C for 15 min with agitation. Subsequently, 1.25 µl 20 mg ml⁻¹ RNase A, 10 µl 20 mg ml⁻¹ proteinase K and 25 µl 10% SDS were added. At this point, no-ligation control samples were also processed by diluting to 200 µl with 100 µl water and RNase A, proteinase K and 10% SDS were added as above. The samples were incubated at 65 °C overnight to lyse cells. The DNA was then purified by phenol–chloroform extraction followed by ethanol precipitation and eluted in 100 µl 10 mM Tris-HCl pH 8.5. A further round of DNA purification was carried out using the Zymo Research DNA Clean & Concentrator-5 kit (Zymo Research, D4013) and eluted in 6.5 µl of 10 mM Tris-HCl pH 8.5. The ligation efficiency was tested by comparing the no-ligation control and unligated samples on the Agilent 2200 Tapestation using HS D1000 reagents. At this point, individual replicates of ligation samples were pooled (that is, 2 replicates of 2.5 million cells generated by splitting the MNase digest across two tubes).

To purify dinucleosome-sized ligated fragments, a 1.5% gel prepared with either NuSieve GTG low-melting-point agarose (Lonza, 50081) or TopVision low-melting-point agarose (Thermo Fisher Scientific, R0801). TAE running buffer was prechilled to 4 °C and ligation samples were run at 60 V for 2.5 h on ice. A band was excised corresponding to around 250–400 bp. DNA was purified from this using the Zymoclean Gel DNA Recovery Kit (Zymo Research, D4001T) using 31 µl 10 mM Tris-HCl pH 8.5 as the elution buffer. The DNA concentration was determined using the Qubit 2.0 system and high-sensitivity dsDNA reagents.

To prepare Micro-C libraries, 2.5–10 µl Dynabeads MyOne Streptavidin C1 beads (Invitrogen, 65001) were prepared depending on the amount of DNA present in the Micro-C sample relative to the binding capacity of beads as specified by the manufacturer. These beads were washed once with 300 μl 1× TBW (5 mM Tris-HCl, pH 7.5, 0.5 mM EDTA, 1 M NaCl, 0.05% Tween-20) and suspended in 150 μl 2× BB (10 mM Tris-HCl, pH 7.5, 1 mM EDTA, 2 M NaCl). Micro-C samples were diluted to 150 µl final volume by adding 120 µl nuclease-free water and then added to the bead suspension. The samples were incubated for 20 min at room temperature with agitation. The beads were washed twice with 300 µl 1× TBW by incubating at 55 °C for 5 min with agitation. The beads were then suspended in 35 µl 10 mM Tris-HCl pH 8.5, 3.5 µl end prep reaction buffer and 1.5 µl end prep enzyme mix (from the NEBNext Ultra II DNA library prep kit) was added. The samples were then incubated for 30 min at 20 °C with agitation and then for 30 min at 65 °C with agitation. Then, 0.5 µl of NEBNext Adapter for Illumina, 15 µl NEBNext ligation master mix and 0.5 µl NEBNext ligation enhancer were added and the samples were incubated for 30 min at 20 °C with agitation. Next, 1.5 µl NEBNext USER enzyme was then added, and the samples were incubated for 30 min at 37 °C with agitation. The beads were then washed once with 100 µl 1× TBW by incubating at 55 °C for 5 min with agitation. The beads were then washed once with 100 µl 10 mM Tris pH 7.5 and then suspended in 20 µl 10 mM Tris pH 7.5. Then, 2 µl of bead suspension was then taken as a test quantitative PCR (qPCR) reaction to find a suitable number of PCR cycles for library generation. The beads were then split into nine PCR tubes (to reduce the number of beads settling in individual PCR tubes during PCR cycling), and 10 µl NEBNext Ultra II Q5 Master Mix, 2 µl 10 µM NEBNext i5 primer, 2 µl 10 µM NEBNext i7 primer (NEB, E7600S) and 4 µl water were added. PCR was then performed according to the NEBNext Ultra II Library kit cycling conditions with 9 or 10 PCR cycles typically being used. The supernatants from each separate PCR reaction when then combined into a single tube for each library and DNA was purified using a 0.9× ratio of NEB sample-purification beads (NEB, E7103S). The library fragment size was determined using the Agilent 2200 Tapestation with D1000 HS reagents. Micro-C libraries were sequenced by Edinburgh Genomics on the Illumina NovaSeq platform using an SI or SP flow cell to a depth of approximately 1 billion read pairs per cell type using 50 bp paired-end settings.

Immunostaining

Cells were fixed in PBS containing 4% paraformaldehyde for 10 min at room temperature. Fixed cells were permeabilized with 0.1% Triton X-100 for 10 min at room temperature and blocked with 4% donkey or goat serum (Sigma-Aldrich) in PBS for at 60 min at room temperature, or overnight at 4 °C. Blocked cells were incubated overnight in blocking buffer (4% serum in PBS) containing an appropriate concentration of antibodies (Supplementary Table 2). Antibody-stained cells were washed three times with TBST (20 mM Tris-HCl, pH 7.4, 0.15 NaCl, 0.05% Tween-20) before being incubated with the appropriate secondary antibodies in blocking solution for 2 h at room temperature. Nuclei were stained with 3 mg ml⁻¹ 4,6-diamidino-2-phenylindole (DAPI) (Invitrogen, Thermo Fisher Scientific) for 10 min at room temperature. Fluorescence images were taken using the IRIS Digital Cell Imaging System (Logos Biosystems) and visualized using ImageJ⁵¹. Infection efficiency quantification was performed by counting TF-positive nuclei as the percentage of DAPI-positive nuclei across multiple images.

For CDX2-positive iTS cell colonies, cells were fixed in 4% paraformaldehyde in PBS for 20 min, rinsed three times with PBS, blocked for 1 h with PBS containing 0.1% Triton X-100 and 5% FBS, and incubated overnight in PBS containing 0.1% Triton X-100 and 1% FBS with anti-CDX2 (Biogenex, CDX2-88, 1:500). The cells were then washed three times with PBS, incubated in PBS containing 0.1% Triton X-100 and 1% FBS with the relevant (Alexa) secondary antibody (1:500 dilution) for 1 h. DAPI (1:1,000) was added for the last 10 min of incubation. The cells were washed three times with PBS and visualized under a fluorescence microscope (Nikon eclipse Ti).

Co-IP

Reprogrammed cells at 48 h were lysed with lysis buffer (100 mM Tris-HCl, 300 mM NaCl, 2% Triton X-100, 0.2% sodium deoxycholate, 10 mM CaCl₂) supplemented with EDTA-free protease inhibitor (Roche, 11873580001) for 20 min on ice. The lysate were then centrifuged for 20 min at 14,000 rpm to get rid of the cell debris, then the supernatant containing the proteins was precleared by adding Dynabeads (A and G mix) (Invitrogen, 10004D/10002D) and incubating at 4 °C for 1 h on a shaker. The precleared supernatant was then incubated overnight with pre-bound Dynabeads (A and G mix) using anti-TFAP2C (Abcam, ab110635), anti-ESRRB (Perseus Proteomics, PP-H6705-00), anti-EOMES (Abcam, ab3345) or anti-IgG (Santa Cruz, sc-2025, sc-2027). The samples were then washed twice with ice-cold lysis buffer, the Dynabeads with the protein complexes were resuspended with sample buffer and boiled for or 10 min at 100 °C and subjected to western blot analysis. Blots were probed with the following primary antibodies: anti-TFAP2C (Abcam, ab110635) and anti-MYC (Abcam, ab32072) and the appropriate IgG-HRP secondary antibody (1:10,000) and visualized using the ECL detection kit.

Western blotting

Whole-cell extracts were prepared from doxycycline-induced and uninduced MEFs using RIPA extraction buffer (25 mM Tris HCl pH 7.6, 150 mM NaCl, 1% Na-deoxycholate, 1% NP-40, 0.1% SDS) supplemented with cOmplete ultra protease inhibitor and Pierce phosphatase inhibitor cocktail (Thermo Fisher Scientific, A32957).

The protein concentrations of the lysates were quantified using the Pierce BCA Protein Assay Kit according to the manufacturer’s instructions (Thermo Fisher Scientific). Proteins resolved by SDS-polyacrylamide gel electrophoresis were electroblotted onto a PVDF membrane. Membranes were blocked overnight in PBST with milk (0.1% Tween-20, 10% non-fat dry milk overnight) at 4 °C with rocking. Membranes were washed three times for 5 min with PBST on a rocker at room temperature. The primary antibody incubations were performed for 4 h at room temperature diluted into PBST 5% BSA (Supplementary Table 2). Membranes were washed three times for 5 min with PBST on a rocker at room temperature. Secondary antibody incubations were carried out PBST 10% non-fat dry milk for 2 h at room temperature with rocking followed by three washes with PBST. Blots were visualized by using SuperSignal West Pico Chemiluminescent Substrate (Thermo Fisher Scientific) using Amersham Hyperfilm ECL (GE Healthcare) developed in Mi5 Processor (Jet X-Ray).

Histone proteins were isolated from MEF129 cells and TNG-MKOS-MEFs, after 72 h of doxycycline induction, and uninfected cell line control, or 144 h of H1.4 shRNA infection, and empty vector infected cell line control, by extraction with 0.2 N sulfuric acid, as previously described^52,53. In brief, cells were resuspended in a 0.3 M sucrose buffer and nuclei were obtained using a Dounce homogeniser. Nuclei were lysed using a high-salt buffer containing 0.35 M KCl, and then histones were dissolved using 0.2 N sulfuric acid, subsequently precipitated with ethanol and finally resuspended in nuclease-free water.

The protein concentrations of the acid extracted histones were quantified using the Pierce BCA Protein Assay Kit according to the manufacturer’s instructions (Thermo Fisher Scientific). Proteins resolved by SDS–polyacrylamide gel electrophoresis were electroblotted onto a PVDF membrane at 200 mA for 2.5 h. Membranes were blocked for 4 h in PBST with milk (0.1% Tween-20, 10% non-fat dry milk) at room temperature with rocking. Membranes were washed once with PBST on a rocker at room temperature. The primary antibodies against H1, and the H3 loading control were diluted into PBST 5% BSA (Supplementary Table 2) and incubated overnight at 4 °C. Membranes were washed six times for 5 min, and once for 10 min with PBST on a rocker at room temperature. Secondary antibody incubations were carried out in PBST 5% BSA for 1 h at room temperature with rocking followed by six washes for 5 min and one wash for 10 min with PBST. Blots were visualized by using SuperSignal West Pico Chemiluminescent Substrate (Thermo Fisher Scientific) and a BioRad ChemiDoc imager on the white tray using the chemiluminescent setting. A list of antibodies used in this study is provided in Supplementary Table 2.

EMSA

To prepare the cell lysates, MEFs (WT 129) were infected with doxycycline-inducible lentiviruses encoding the TF of interest and overexpressed in for 48 h by doxycycline treatment (see the lentivirus protocol above). In total, 10 million cells were collected for each preparation. Cells were then lysed in buffer A (10 mM HEPES pH 7.5, 1.5 mM MgCl₂, 10 mM KCl, 0.5 mM DTT) on ice for 10 min and dounced 40× (tight dounce). The cells were then pelleted and resuspended in 100 μl of buffer B (20 mM HEPES pH 7.5, 30% glycerol, 420 mM NaCl, 1.5 mM MgCl₂, 0.2 mM EDTA, 0.5 mM DTT) per each 10 million cells, and incubated for 30 min at 4 °C. After spinning, the supernatant was dialysed for 2 h at 4 °C in dialysis buffer (20 mM HEPES pH 7.5, 30% glycerol, 100 mM KCl, 0.83 mM EDTA pH 8, 1.66 mM DTT, 0.2 mM PMSF). The cell lysates were aliquoted and stored in −80 °C after flash freezing in liquid nitrogen until use for EMSA.

For EMSA, Cy5-end-labelled oligonucleotide duplexes (50 nM) were prepared as described previously¹². The Cy5-end-labelled oligonucleotide duplexes were mixed with the increasing cell lysates (0.5 μl to 4 μl) and non-specific competing poly(G-C) oligonucleotides (1 μg) in binding buffer (50 mM Tris HCl pH 7.5, 5 mM MgCl₂, 50 μM ZnCl₂, 50 mM KCl, 5 mM DTT, 25% (v/v) glycerol, 2.5 mg ml⁻¹ BSA) to a final volume of 10 μl and incubated in the dark at 21 °C for 1 h. For the EMSA-supershifts, 5 μg of antibody or 20× of non-labelled oligonucleotide competitor was mixed with TF–DNA mixture in binding buffer and incubated for 20 min at room temperature. The full volume was run on a 5% polyacrylamide gel at 90 V and 100 mA for 4 h in 0.5× TBE (45 mM Tris-borate, 1 mM EDTA) and imaged detecting Cy5 fluorescence using the Bio-Rad ChemiDoc MP (Bio-Rad).

Flow cytometry

For flow cytometry analysis, cells were first trypsinized and then neutralized with medium containing 10% fetal bovine serum (FBS). The cells were next centrifuged and washed twice with PBS to ensure the removal of any residual trypsin and medium. The washed cells were then resuspended in PBS for subsequent analysis.

The fluorescence markers eGFP and tdTomato were used to identify and quantify specific cell populations. Flow cytometry analysis was performed using the Beckman Coulter (Gallios) flow cytometer. Data acquisition and analysis were conducted using the Kaluza Software (v.1.0.14029.14028).

To remove dead cells, all of the samples were initially gated using the FSC-A/SSC-A gating to identify the live-cell population (below 200 FS Area). To remove cell doublets, single cells were selected by gating forward scatter height versus area. The positively fluorescent cells were gated based on the fluorescence intensity of positive control cells. Examples of the gating strategy for eGFP and tdTomato are shown in Supplementary Fig. 2.

DNA constructs

The plasmids constructed in this study are as follows:

The pFUW-TetO-hEsrrb plasmid was generated by PCR amplifying human ESRRB from pPB-PGK-hEsrrb (Addgene, 60434)⁵⁴ and inserting the amplified fragment into an EcoRI digested FUW-tet-O-hOct4 plasmid (Addgene, 20726) backbone using an IN-Fusion HD Cloning Plus kit according to the manufacturer’s instructions (Takara Clontech) and the following primers: 5′-GCCTCCGCGGCCCCGAATTCGCCACCATGTCCTCGGACGACA-3′; and 5′-ATAAGCTTGATATCGAATTCTTATTACATGGTGAGCCAGAGATGCTT-3′.

The H1.4 cDNA was generated synthetically by Twist Bioscience and inserted into the pET-28a(+) bacterial plasmid. H1.4 cDNA was then amplified by PCR using the pET28-H1.4 construct as a template and primers containing EcoRI site and Kozak fragment (forward) and XbaI restriction site (reverse) (5′-CCCCGAATTCGCCACCATGTCCGAGACTGCGCCT-3′ and 5′-TATCTCTAGACCTACTTTTTCTTGGCTGCCGCC-3′). The PCR product was digested with EcoR1 and XbaI and ligated into a linear FUW-TetO plasmid digested with the same enzymes.

The dual PiggyBac reporter (PB-TAP-InsX3-Nanog_enh-eGFP-Nanog_flip-tdTomato) plasmid was constructed according to the following steps:

(1)

The PB-TAP-InsX3-eGFP- ccdB plasmid was constructed by first removing the Tet-ON-CMV promoter and AttR1 from a PB-TAP-InsX2-Tet-ON-ccdB plasmid (provided by K. Kaji) using XhoI and NotI digestion. Then, eGFP-poly(A), chicken β-globin insulator and AttR1 PCR products were inserted in that order into the linear PB-TAP-InsX2-ccdB plasmid by Gibson assembly using the IN-Fusion cloning kit (Takara). The resulting construct was used to transform One Shot ccdB Survival 2 T1R chemically competent cells (Invitrogen). PB-TAP-InsX3-eGFP-ccdB was purified and validated by restriction enzyme digestion and Sanger sequencing. The eGFP-poly(A) gene was amplified from pConditional-pac-eGFP plasmid (K. Kaji laboratory), introducing XhoI into the forward primer and AatII into the reverse primer. Chicken β-globin insulator, and AttR1 were amplified from the PB-TAP-InsX2-Tet-ON-ccdB plasmid.
(2)

The PB-TAP-InsX3-Nanog_enh-eGFP-ccdB plasmid was then constructed by inserting the Nanog enhancer–signpost–promoter fragment (~5 kb) upstream of the eGFP gene. The Nanog enhancer–signpost–promoter was isolated from the pNanog_enh-Luc plasmid (I. Chambers laboratory)⁵⁵, by restriction enzyme digestion using SpeI and XhoI. The ligated construct was used to transform One Shot ccdB Survival 2 T1R chemically competent cells (Invitrogen). PB-TAP-InsX3-Nanog_enh-eGFP-ccdB was purified from selected clones using colony PCR. The correct construct was validated by restriction enzyme digestion and Sanger sequencing.
(3)

PENTR-Nanog_flip-part1 was constructed by inserting the Nanog enhancer, fliped-signpost-1 and fliped-signpost-2 PCR products in that order by Gibson assembly using the In-Fusion kit (Takara) into the Gateway pENTR 2B2 (Thermo Fisher Scientific). The pENTR vector was first linearized with KpnI and NotI to remove the ccdB gene insert. The IN-Fusion reaction was used to transform Stellar Competent Cells (Takara), and positive clones were selected by restriction digestion of mini-preps (Qiagen). pENTR-Nanog_flip-part1 with the correct insert was validated by Sanger sequencing.
(4)

PENTR-Nanog_flip-part2 was constructed by inserting the fliped-signpost-3, fliped-signpost-4 and Nanog promoter PCR products in that order by Gibson assembly using In-Fusion kit (Takara) into PENTR-Nanog_flip-part1 linearized with XhoI (downstream of the Nanog enhancer). The In-Fusion reaction was used to transform Stellar Competent Cells (Takara), and positive clones were selected by restriction digestion of mini-preps (Qiagen). pENTR-Nanog_flip-part2 with the correct insert was validated by Sanger sequencing.
(5)

PENTR-Nanog_flip-tdTomato was constructed by inserting the tdTomato PCR products into PENTR-Nanog_flip-part2 linearized with EcoRV (downstream of the Nanog enhancer–fliped_signpost-promoter element). The IN-Fusion reaction was used to transform Stellar Competent Cells (Takara), and positive clones were selected by colony PCR. pENTR-Nanog_flip-tdTomato with the correct insert was validated by EcoRI/EcoRV digestion and Sanger sequencing. The tdTomato gene was amplified from the pPyCAG-tdTomato-i-puro plasmid (K. Kaji laboratory).
(6)

Finally, the PB-TAP-InsX3-Nanog_enh-eGFP-Nanog_flip-tdTomato plasmid was constructed by Gateway technology (Invitrogen). Essentially, PENTR-Nanog_flip-tdTomato was used as the entry vector and PB-TAP-InsX3-Nanog_enh-eGFP-ccdB as the destination vector for the LR recombination using the LR Clonase II enzyme mix (Invitrogen, 11791-020). Successful insertion resulted in replacing the ccdB gene in the destination vector with Nanog_flip-tdTomato from the entry vector. The LR recombination reaction was used to transform One Shot Stbl3 chemically competent Escherichia coli (Thermo Fisher Scientific) and positive clones were selected by restriction digest of mini-preps (Qiagen). The final construct was validated by whole-plasmid sequencing using Oxford Nanopore technology (Source BioScience). As expected, the Nanog_enh-eGFP and Nanog_flip-tdTomato cassettes were separated by an insulator and flanked by two other insulators.

The rest of the plasmids were obtained from the following sources: the lentivirus plasmids FUW-TetO-hOct4 (Addgene, 20726), FUW-tet-O-hSox2 (Addgene, 20724), FUW-TetO-hKlf4 (Addgene, 20725), FUW-TetO-hMyc (Addgene, 20723) FUW-TetO-mSox9 and FUW-TetO-mGata4 (Addgene, 41084) were generated in the R. Jaenisch laboratory^56,57. FUW-TetO-Gata3, FUW-TetO-Tfap2c and FUW-TetO-Eomes were generated as described previously⁵. The pWPT-rtTA2M2 vector was generated in the K. Zaret laboratory⁵⁸. The pFUW-TetO-mBrn2 (Addgene, 27151) vector was generated by the Wernig laboratory²². A set of four hairpin shRNAs against the Hist1h1e gene (H1.4) in the pLKO.1 lentiviral vector were designed by The RNAi Consortium (TRC)⁵⁹, and obtained from Horizon/ Dharmacon. The empty pLKO.1 plasmid was obtained from Addgene (8453)⁶⁰. The pCMV-hyPBase was obtained from the Kaji laboratory⁶¹. A list of all of the DNA constructs used in this study and their sources is provided in Supplementary Table 3.

Integration of mES cells with piggyBac transposon vectors

Two days before nucleofection, a near-confluent (70–80%) mES cell culture was split at a 1:10 ratio. For each nucleofection, 2 × 10⁶ mES cells were prepared. For each nucleofection, one 15 ml falcon tube with 9.5 ml warm medium was prepared. After washing with PBS, mES cells were treated with 0.25% trypsin EDTA and incubated for 2–3 min at 37 °C. Trypsin was inhibited by adding serum–medium and mES cells were collected by centrifugation for 3 min at 300 rcf. The cell pellet was washed with PBS and centrifuged for 3 min at 300 rcf. In a 1.5 ml tube, 1 µg of pBase and 1 µg of the PB vector were mixed (high-quality plasmids with concentrations between 0.5–2 µg µl⁻¹ to keep volumes below 10 µl are required). The nucleofection mixture was prepared by adding 90 µl nucleofector solution and 20 µl supplement 1 to the plasmid mix (pBase and PB vector) using the Mouse Embryonic Stem Cell Nucleofector Kit (Lonza, VAPH-1001). The mES cell pellet (2 × 10⁶ cells) was resuspended quickly in nucleofection mix. The cell suspension was then transferred into a cuvette without introducing bubbles (bubbles will short the electric current and negatively affects cell viability). The cuvette was placed into the Nucleofector machine (Amaxa Biosystems) and pulsed with the program A-023. The cuvette was quickly brought to the tissue culture hood and 500 µl prewarmed media was added. The cell suspension was removed from cuvette using the Lonza Pasteur pipettes and transferred to the prepared 15 ml falcon tube with 9.5 ml warm medium. The cell suspension was plated onto a gelatine-coated 10 cm dish. The cells were incubated at 37 °C and culture medium was changed every 2 days. Green (eGFP) and red (tdTomato) fluorescence was checked under the microscope and cells were sorted by fluorescence-activate cell sorting as explained below.

Transgene silencing analysis

For transgene expression analysis, total RNA from the indicated samples was extracted using the Macherey-Nagel kit (Ornat). Between 500 and 2,000 ng of total RNA was reverse transcribed using the iScript cDNA Synthesis kit (Bio-Rad). qPCR analysis was performed on three biological replicates (n = 3), using 1/100 of the reverse transcription reaction in a StepOnePlus Real-Time PCR System (Applied Biosystems) with the SYBR Green Fast qPCR Mix (Applied Biosystems).

Specific primers were used to exclusively detect transgene expression. For the genes Gata3, Eomes, Tfap2c, Myc, Esrrb and Sox2, primers targeting the last exon (forward primer) and the WPRE element of the FUW-TetO vector (reverse primer) were used. For the genes Oct4 and Klf4, primers targeting the first exon (reverse primer) and the beginning of the viral vector (TetO, forward primer) were utilized. The amount of cDNA in each sample was normalized to the level of the housekeeping control gene Gapdh. A list of the primers used in this study is provided in Supplementary Table 4.

H1 overexpression (OE)

For infection, HEK293T cells were seeded at a density of 2.4 × 10⁶ cells per 15 cm plate and grown in 30 ml HEK medium for 24 h, before being transfected with rtTA2 or H1.4 lentivirus plasmids. Each virus was prepared in a separate dish. For transfection, 2.4 µg pMD.G, 5.1 µg psPAX2 and 7.5 µg of the corresponding plasmid were dissolved in 1,710 µl Opti-MEM medium (Thermo Fisher Scientific, 31985062) and 90 µl Fugene 6 reagent (Promega, E2692), thoroughly mixed by vortexing and incubated for 15 min at room temperature, then added to the 15 cm plate containing HEK293T cells, which were incubated for 16 h. The transfection medium was replaced with fresh HEK medium, and the transfected cells were cultured for a further 60 h. The lentiviruses were collected by collecting the 30 ml supernatant, which was passed through a 0.45 µm polyethersulfone filter-fitted syringe. The virus was then pelleted by ultracentrifugation in Ultraclear 38.5 ml centrifuge tubes at 25,000 rpm (77,000g), using the Beckman Coulter OptiMAXPN-80 ultracentrifuge and the SW32-Ti swinging-bucket rotor (Beckman Coulter) for 2.5 h at 4 °C. The supernatants were removed, and the viral pellets were dissolved in 300 µl GMEM by swirling and then aliquoted the same day and stored at −80 °C. On average, the titre of each virus was determined to be around 5 × 10⁷ infection units per ml.

MEF129 (passage 2) were seeded at 25,000 cells per cm² 24 h before infection, two 10 cm dishes were used (1.4 million cells per plate) for H1 infection for Omni-ATAC and western blotting, to confirm overexpression compared with uninfected cell line controls. Seven 15 cm dishes (3.6 million cells per plate) were used for Micro-C. The next morning, the medium was changed to MEF medium supplemented with 8 μg ml⁻¹ polybrene and pFUW-TetO-H1.4 and pWPT-rtTA2M2 viruses at a MOI of 2. Then, 24 h after infection, the medium on all plates was changed to fresh MEF medium. Next, 48 h after infection, the expression of H1.4 was induced by the addition of MEF medium containing doxycycline to a final concentration of 2 μg ml⁻¹ and the cells were incubated for 72 h. Next, 72 h after doxycycline induction, the infected and uninfected 10 cm plates were collected by trypsinization and counted using a haemocytometer. In total, 400,000 cells from the infected and uninfected samples were immediately subjected to the Omni ATAC protocol⁵⁰ (see the ‘ATAC–seq’ section (libraries and sequencing for H1 OE and KD experiments) below) while the remaining cells were acid extracted for HPLC quantification and western blot (see the ‘Western blotting’ section). The 15 cm plates were subjected to double cross-linking for Micro-C (see the ‘Micro-C’ section (MNase digestion and ligation)).

H1 knockdown (KD)

For infection, HEK293T cells were seeded at a density of 2 × 10⁶ cells per 15 cm plate and grown in 30 ml HEK medium for 24 h. Double the number of plates was seeded for each 15 cm plate of MEFs, due to two rounds of infection using viral supernatant (VSN), in total. Twenty two 15 cm plates of HEK cells were used for H1.4-targetting shRNA and two 15 cm plates of HEK cells were used for the empty vector control. For transfection, 2.4 µg pMD.G, 5.1 µg psPAX2 plus either a mixture of 1.875 μg each of the four H1.4- targeting shRNA plasmids (Horizon Discovery TRC-ID TRCN0000096935: TTTGGCCGCTTTAGGCTTTAC, TRCN0000096936: TTGACGGGTGTCTTCTCGGCG, TRCN0000096937: TCTTAGCCTTAGTTGCCTTTG, TRCN0000096938: TAGCTGCCTTAGGCTTGGAGG) together or 7.5 µg of the empty pLKO plasmid were dissolved in 1,710 µl Opti-MEM medium (Thermo Fisher Scientific, 31985062) and 90 µl Fugene 6 reagent (Promega, E2692). The shRNA and empty transfection mixes were thoroughly mixed by vortexing, and incubated for 15 min at room temperature, before adding to the 15 cm plates containing HEK293T cells. After 16 h of incubation with the transfection mixes, the medium was replaced with fresh HEK medium, and the transfected cells were cultured for a further 60 h. The lentiviruses were collected by collecting the 30 ml supernatant, which was passed through a 0.45 µm PVDF filter unit (Stericup Millipore) and supplemented with 8 μg ml⁻¹ polybrene. Half of the VSN was flash-frozen in liquid nitrogen and stored at −80 °C.

At 24 h before the virus was collected, 2.8 million of MEF129 cells (passage 2) were seeded into two 10 cm dishes (density, 25,000 per cm²) for the empty vector control to be used for Omni-ATAC and western blotting. Ten 15 cm dishes (3.6 million cells per plate) were infected with H1.4-targetting shRNAs for Micro-C, Omni-ATAC and western blotting. For infection, 25 ml of H1.4-targetting shRNA VSN was added per 15 cm plate, with an additional 15 ml of MEF medium, supplemented with 8 μg ml⁻¹ polybrene. A total of 8.5 ml of empty pLKO vector VSN was added per 10 cm plate, with an additional 5 ml of MEF medium, containing 8 μg ml⁻¹ polybrene. The remaining VSN, with 8 μg ml⁻¹ polybrene, was flash-frozen in liquid nitrogen and stored at −80 °C until a second round of infection after 72 h. At 24 h after infection, the medium was changed for MEF medium with 1 μg ml⁻¹ puromycin, to select for pLKO-vector-containing MEFs. Then, 72 h after the initial infection, puromycin selection was paused and a second round of infection was carried out as previously, using ice-thawed VSN and prewarmed at 37 °C. Then, 24 h later, the medium was changed for 1 μg ml⁻¹ puromycin-containing MEF medium. Next, 144 h after the initial infection, two of the 15 cm plates that were infected with H1.4-targetting shRNA VSN and both 10 cm plates infected with empty vector control VSN were collected by trypsinization and cells counted using a haemocytometer. About 400,000 cells from the infected and uninfected samples were immediately subjected to the Omni ATAC protocol (see the ‘ATAC–seq’ section (libraries and sequencing for H1 OE and KD experiments)), and the remaining cells were acid-extracted for HPLC quantification and western blotting (see the ‘Western blotting’ section). The remaining 15 cm plates, infected with H1.4 shRNA were subjected to double cross-linking for Micro-C (see the ‘Micro-C’ section (MNase digestion and ligation)).

OSKM reprogramming with H1.4 KD and H1.4 OE

H1.4 OE and KD was performed in TNG-MKOS-MEFs⁶² for ATAC–seq similarly to that for MEF129 WT cells (see the ‘H1 overexpression’ and ‘H1 KD’ sections). In brief, TNG-MKOS-MEFs (passage 3) were seeded at 27,000 cells per cm² 24 h before viral infection for H1.4 OE, (seven 10 cm plates for H1 overexpression and three 10 cm plates for uninfected control). Cells were infected with pFUW-TetO-H1.4 and pWPT-rtTA2M2 viruses at a MOI of 2. The medium was changed the next day and viral gene expression was achieved by administering doxycycline (2 μg ml⁻¹) 48 h after infection to infected and uninfected cells. ATAC was performed on samples at 0 h of induction and 72 h after induction, for the infected and uninfected samples. Western blotting was performed on histone extractions 72 h after doxycycline, to confirm successful overexpression, compared with the uninfected cell line control.

To achieve H1.4 KD, HEK293T cells were seeded, 72 h before MEFs, then, 24 h later, were transfected to make VSN of the H1.4 shRNA pool (see the ‘H1 KD’ section above). TNG-MKOS-MEFs (passage 3) were seeded at 21,500 cells per cm² on five 10 cm plates 24 h before infection. Three 10 cm plates were infected with 10 ml of H1.4 shRNA VSN, and two were infected with 10 ml of empty vector VSN; the remaining VSN was flash-frozen in liquid nitrogen and stored at −80 °C. All VSN was supplemented with 8 μg ml⁻¹ polybrene, and an additional 5 ml of fresh MEF medium with polybrene was added for all plates. The medium was changed for fresh MEF medium with 1 μg ml⁻¹ puromycin. Then, 48 h later, one plate infected with H1.4-shRNA-VSN and one plate infected with empty-vector-VSN were collected for the 0 h doxycycline timepoint of the ATAC experiment. The remaining plates were infected for a second time with VSN, as before, with the addition of doxycycline to a final concentration of 2 μg ml⁻¹ to induce the expression of MKOS. The medium was changed the next day to 2 μg ml⁻¹ doxycycline-containing MEF medium. Then, 72 h after beginning doxycycline induction, the remaining plates were collected for ATAC (see the ‘ATAC–seq’ section (libraries and sequencing for H1 OE and KD experiments)).

GETM reprogramming with H1.4 KD and H1.4 OE

To downregulate H1.4 expression in fibroblasts, four different shRNA sequences targeting H1.4 (PLKO.1 vector) were incorporated into replication-incompetent lentiviruses. Lentiviruses were packaged using a mix of lentiviral packaging vectors (psPAX2 and pGDM.2, ratio: 1:1) and the four shRNAs (ratio, 1:1:1:1) at a ratio of 1:1. The packaging was performed in HEK293T cells, and the VSNs were collected at 48 h after transfection. The supernatants were filtered through a 0.45 μm filter, supplemented with 8 μg ml⁻¹ polybrene (Sigma-Aldrich), and used to infect MEFs. Then, 24 h after infection, the medium was replaced with fresh DMEM containing 10% FBS.

Next, 4 days after infection, replication-incompetent lentiviruses containing GETM factors (ratio, 1:1:1:0.3) were similarly packaged and used to infect the H1.4-downregulated cells. Twenty-four hours after this second infection, the medium was replaced with fresh DMEM containing 10% FBS and 2 μg ml⁻¹ doxycycline. Two weeks later, the medium was switched to TS cell reprogramming medium (RPMI supplemented with 20% FBS, 0.1 mM β-mercaptoethanol, 2 mM l-glutamine, 25 ng ml⁻¹ human recombinant FGF4 (PeproTech), 1 μg ml⁻¹ heparin (Sigma-Aldrich) and 2 μg ml⁻¹ doxycycline). After 1 week, the medium was replaced with TX medium without doxycycline. Then, 1 week later, the plates were fixed and stained for CDX2 to identify positive colonies.

Similarly, to overexpress H1.4, MEFs were infected with lentiviruses encoding H1.4 using the pFUW-TetO-H1.4 plasmid. The lentiviruses were packaged in HEK293T cells as described above. To initiate iTS cell reprogramming, H1.4 overexpression was induced along with GETM using doxycycline (2 μg ml⁻¹) as described above.

H1 quantification by HPLC

Histone proteins were isolated by extraction with 0.2 N sulfuric acid, as previously described⁵². In brief, cells were resuspended in a 0.3 M sucrose buffer and nuclei were obtained using a Dounce homogeniser. Nuclei were lysed using a high-salt buffer containing 0.35 M KCl, and then histones were dissolved using 0.2 N sulfuric acid, subsequently precipitated with ethanol and finally resuspended in nuclease-free water. Acid-extracted histones were quantified using the Pierce BCA Protein Assay Kit according to the manufacturer’s instructions (Thermo Fisher Scientific). Acid-extracted histones were analysed by reversed-phase high-pressure LC using the Waters 2695 system equipped with the Vydac 218TP C18 HPLC column. The effluent was monitored, and peaks were recorded using the Waters 996 Photodiode Array Detector at 214 nm. H1 peak integrations were performed using the Waters Empower Pro software (v.2) and normalized to H2B peaks.

Mass spectroscopy (MS)

Acid extracts were reduced in 10 mM DTT, 0.02% NP-40 and 100 mM NH₄HCO₃ at 37 °C for 1 h. The samples were then alkylated with 30 mM IAA for 45 min at room temperature in the dark. The reactions were then desalted into 50 mM NH₄HCO₃ using ZebaSpin 7k columns (Thermo Fisher Scientific) and the eluates were supplemented with trypsin (0.1 mg ml⁻¹) and digested for 2 h at 37 °C. At the end of the 2 h, the samples were supplemented with additional trypsin and the digestions were allowed to proceed overnight. The digestions were quenched with 1% formic acid, dried in SpeedVac and then resuspended in 130 µl MS sample buffer (0.1% formic acid, 1% acetonitrile in water).

MS instrument settings

LC–MS analyses were performed on the TripleTOF 5600+ mass spectrometer (AB SCIEX) coupled with the M5 MicroLC system (AB SCIEX/Eksigent) and PAL3 autosampler. LC separation was performed in a trap-elute configuration, which consists of a trap column (LUNA C18(2), 100 Å, 5 μm, 20 × 0.3 mm cartridge, Phenomenex) and an analytical column (Kinetex 2.6 μm XB-C18, 100 Å, 50 × 0.3 mm microflow column, Phenomenex). The mobile phase (phase A) consisted of 0.1% formic acid in water, and phase B consisted of 0.1% formic acid in acetonitrile.

Peptides in MS sample buffer were injected into a 50 μl sample loop, trapped and cleaned on the trap column with 3% mobile phase B at a flow rate of 25 μl min⁻¹ for 4 min before being separated on the analytical column with a gradient elution at a flow rate of 5 μl min⁻¹. The gradient was set as follows: 0–24 min, 3% to 35% phase B; 24–27 min, 35% to 80% phase B; 27–32 min, 80% phase B; 32–33 min, 80% to 3% phase B; and 33–38 min at 3% phase B. An equal volume of each sample (30 μl) was injected four times, once for information-dependent acquisition (IDA), immediately followed by DIA/SWATH in triplicate. Acquisitions of distinct samples were separated by a blank injection (80 µl MS sample buffer) to prevent sample carryover. The mass spectrometer was operated in positive-ion mode with an EIS voltage at 5,200 V, source gas 1 at 30 psi, source gas 2 at 20 psi, curtain gas at 25 psi and the source temperature at 200 °C.

IDA and data analyses

IDA was performed to generate reference spectral libraries for SWATH data quantification. The IDA method was set up with a 250 ms TOF-MS scan from 300 to 1,250 Da, followed by MS/MS scans in a high-sensitivity mode from 100 to 1,500 Da of the top 25 precursor ions above the 100 cps threshold (100 ms accumulation time, 100 ppm mass tolerance, rolling collision energy and dynamic accumulation) for charge states (z) from +2 to +5. IDA files were searched using ProteinPilot (v.5.0.2, ABSciex) with the default setting for tryptic digest and IAA alkylation against a protein sequence database.

The Mus musculus proteome FASTA file (54,910 protein entries, UniProt: UP000000589) augmented with sequences for common contaminants was used as a reference for the search. Up to two missed cleavage sites were allowed. Mass tolerance for precursor and fragment ions was set to 100 ppm. A false-discovery rate (FDR) of 5% was used as the cut-off for peptide identification.

SWATH acquisitions and data analyses

For sequential window acquisition of all theoretical mass spectra (SWATH-MS) acquisitions⁶³, one 50 ms TOF-MS scan from 300 to 1,250 Da was performed, followed by MS/MS scans in a high-sensitivity mode from 100 to 1,500 Da (15 ms accumulation time, 100 ppm mass tolerance, +2 to +5 z, rolling collision energy) with a variable-width SWATH window⁶⁴. DIA data were quantified using PeakView (v.2.2.0.11391, ABSciex) with SWATH Acquisition MicroApp (v.2.0.1.2133, ABSciex) against selected spectral libraries generated in Protein-Pilot. Retention times for individual SWATH acquisitions were calibrated using 23 peptides for core histone H4c1 (UniProt: P62806), which was highly representative in the IDA ion library and all SWATH acquisitions. The following software settings were used: up to 25 peptides per protein, 6 transitions per peptide, 95% peptide confidence threshold, 5% FDR for peptides, XIC extraction window 10 min and XIC width 100 ppm. In all SWATH files, the quantification data for core and linker histone proteins were manually curated to exclude from consideration the peptides that exhibited an aberrant retention time in at least one SWATH acquisition (>20% difference from that in the IDA/ion library or other SWATH acquisitions). Protein peak areas were exported as Excel files and processed as described below.

Quantification of proteomics data

Quantification of individual H1 subtypes in MEFs was modelled using the combination of relative LC–MS determinations and absolute HPLC quantifications of known mouse embryonic stem cell standards as described previously⁶⁵.

Bioinformatics

Sequencing data processing and alignment

Initial quality-control analysis was performed using the FastQC toolkit (https://github.com/s-andrews/FastQC). ES cell H1 ChIP–seq reads and their associated inputs were trimmed to remove adapters and bases with a phred score of <30 using Cutadapt⁶⁶ (cutadapt -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA -q 30). ChIP–seq, ATAC–seq and MNase–seq samples were aligned to mouse reference genome MGSCv37 (mm9) using Bowtie2⁶⁷ v.2.3.4.1, using a –very-sensitive call and paired-end settings (or single-end settings where appropriate). Aligned reads were sorted and subsequently converted to BAM format using the samtools suite⁶⁸. RNA-seq samples were aligned using STAR (v.2.7)⁶⁹ with –outFilterMultimapNmax 1. Duplicated reads were eliminated using the Picard (https://github.com/broadinstitute/picard) function MarkDuplicates, except for MNase–seq and RNA-seq, for which duplicates were retained. Sequencing replicates were merged using samtools merge. The sequencing coverage and the insert size distribution were measured from the resulting BAM files using Qualimap (v.2.2.1)⁷⁰.

Micro-C libraries were aligned to the mm9 reference genome and processed using the Nextflow (https://www.nextflow.io/) pipeline distiller-nf (https://github.com/open2c/distiller-nf) using the following configurations; make_pairsam = False, drop_readid = False, parsing_options: ‘–add-columns mapq –walks-policy mask’, max_mismatch_bp = 1. Balanced multi-resolution cool (mcool) files were outputted with the following bin sizes: 10,000,000, 5,000,000, 2,500,000, 1,000,000, 500,000, 250,000, 100,000, 50,000, 25,000, 10,000, 5,000, 2,000, 1,000, 500, 100. 15U and 20U Micro-C libraries for each cell type were merged using pairtools merge (https://github.com/open2c/pairtools).

ChIP–seq peak calling

ChIP–seq narrowPeaks and summits showing significant enrichment over input DNA were called using MACS2 (v.2.1.1.20160309)⁷¹, and were controlled to a q-value (minimum FDR) cut-off of 0.01. To identify broadPeaks of TF binding, peaks were called using MACS2 with the following flags: -B –broad-cutoff 0.1 –broad –nomodel –extsize 200. Regions that overlapped with the ENCODE blacklist⁷² were removed using the bedtools⁷³ intersect function (flag –v).

MNase peak calling

To obtain a consensus list of nucleosome positions, the alignments for each MNase concentration were merged into a single BAM file using samtools merge. Nucleosome and nucleosome dyad positions were called using the DANPOS2⁷⁴ function dpos with a 1% FDR, paired-end settings and bin size of 1 bp to ensure dyad position accuracy. A file of nucleosome dyad positions was then generated by taking the summit position and adding 1 to create a bed file of 1 bp chromosome coordinates. The smoothened.wig file of MNase signal from DANPOS2 was converted to bigwig using wigToBigWig (http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/) and used for heat maps and profiles of MNase signal.

Read density analysis

The aligned reads (BAM files) were normalized for sequencing coverage to 1× genome depth (RPGC) using the bamCoverage tool from DeepTools2⁷⁵ with a bin size of 10 bp and extendReads parameter, chromosome X was ignored for normalization. The resulting bigwig files were converted to wig format using the UCSC bigWigToWig tool⁷⁶, and subsequently converted into a bed file using the wig2bed⁷⁷. To sort peaks of individual TFs based on either ChIP–seq or ATAC–seq enrichment, 1 bp summits produced by MACS2 were extended by 150 bp on each side to produce a 301 bp peak using the bedtools slop function⁷³. The tag density under these peaks was then quantified using the bedmap function of BEDOPS⁷⁷, against the RPGC-normalized bed file of either the ChIP or ATAC samples. Peaks were then sorted from highest-to-lowest enrichment using the UNIX command line sort function.

For ATAC–seq-sorted peaks, peaks were split based on RPGC to open (>20 RPGC) or closed chromatin (<20 RPGC), representing the value whereby no ATAC enrichment is observed within the central 301 bp peak over the flanking 350 bp either side (total region of 1 kb). As there is no input DNA for ATAC–seq, we compared the enrichment of ATAC–seq within the peak to a 1 kb local region. By plotting ATAC–seq enrichment of TF sites as function of number of reads (sequence coverage normalised in RPGC or reads per genome coverage), we identified the baseline of 20 RPGC.

To generate the read density heat maps and line profiles, we first computed a density matrix using the DeepTools2 tool computeMatrix reference-point and the following parameters: –referencePoint center, –binsize 10, -b 1000 -a 1000, –sortregions keep, –missingDataAsZero and –averageTypeBins sum using the peak bed files as reference files (-R) and the normalized ChIP–seq and ATAC–seq bigwig files as score files (-S)⁷⁵. The ENCODE blacklist was excluded. The resulting matrix was subsequently used to generate heat maps and profiles using Deeptools2 functions plotHeatmap and plotProfile, respectively⁷⁵.

Histone H1 ChIP–seq data were processed as described above except that the Deeptools2 function bigwigCompare⁷⁵ was used to subtract the RPGC-normalized input signal from the RPGC-normalized H1 ChIP–seq signal. ChIP–seq data for H1c and H1d⁷⁸ were merged for analysing H1 in ES cells to obtain maximum coverage of H1-bound regions in ES cells.

Profiles of Micro-C contact junctions around TF sites were produced by generating a bed file containing 1 bp coordinates for each junction in a ‘.pairs’ contact file generated by the distiller-nf pipeline. This was then used to generate a genome coverage bedgraph using the bedtools⁷³ function genomecov before being subsequently converted to a bigwig file using bedGraphToBigWig (http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/). This bigwig file was then used as a sample file with Deeptools2⁷⁵.

Genomic intervals

To assess peak overlaps between conditions (but not co-bound sites), all peaks were considered as 301 bp centred round the summit. This is because the average peak size was identified by MACS to be ~300 bp, and one nucleotide was added to place the summit in the middle. Overlapping peaks between conditions were identified using the Intervene venn function with the flag –save-overlaps⁷⁹, such that regions would be called as overlapping based on a 1 bp or greater overlap. Bar plots were generated by counting the number of peaks in each list. For comparison of MYC peaks within closed and open chromatin across all reprogramming systems, intersection over union or the Jaccard index was measured using the bedtools jaccard function and ggplot2 was used to generate the resulting heat map⁷³. Peaks were assigned to transcription start sites using the GREAT tool available online with mm9 association settings for ‘Single Nearest Gene’ with a maximum distance of 1,000 kb⁸⁰.

To quantify ATAC–seq on co-bound TF sites, 301 bp peaks for each TF were labelled with a single-letter identifier for each TF and combined into a single file using bedops –everything⁷⁷. Bedtools merge was used to collapse each overlapping peak with a –distinct settings used for the single letter label column to label each peak with the letter code for each TF present (that is, ‘OS’ for OCT4 and SOX2), awk was then used to count the number of TFs present by counting the number of letters. RPGC-normalized ATAC–seq data on these merged peaks were then quantified using bedmap —echo and the value was scaled by dividing by the peak width, to account for variability in peak size. These values were then used to generate violin plots using the ggplot2 functions geom_violin() and geom_boxplot().

To generate lists of TF sites distal or proximal to MYC, closed chromatin peaks of all TFs within a combination were combined using bedops –everything without merging overlapping peaks. The master peak list was used as an input in the bedtools window with MYC peaks from that condition as the –b file, and a –w flag of 350 to ensure detection of nearby MYC peaks. Proximal or distal sites were then obtained with the –u or –v flags, respectively.

Motif discovery

De novo motif analysis was performed using the MEME suite installed on a local Linux server⁸¹. First, the DNA sequences (FASTA) were generated from the central 200 bp of the ChIP–seq peak regions using bedtools getfasta⁷³. To use as the background, DNA sequences (200 bp) were extracted from genomic regions located 1 kb upstream from the summit of each peak using bedtools shift⁷³. All regions were filtered through the ENCODE blacklist. Finally, meme-chip was run using the Fasta sequence files and the corresponding Markov model and the following parameters: -nmeme 600, -meme-mod zoops, -meme-minw 6, -meme-maxw 18, -meme-maxsize 50000000, -dreme-e 0.00001, -dreme-m 20 using the JASPAR core motif database⁸². The most enriched de novo motifs discovered by MEME⁸³ and DREME⁸⁴ were analysed using CentriMO to confirm their central enrichment over the background sequences and compared to the canonical motifs using Tomtom.

Gene expression analysis

Gene expression quantification was performed using the featureCounts function of the R subRead package⁸⁵, using a gtf file containing the UCSC genes for mm9 with paired or single-end settings depending on the samples. Tables generated for the paired and single-end data were combined using cbind(). Differential gene expression analysis was performed using the package DESeq2 (v.1.22.2)⁸⁶ with DESeqDataSetFromMatrix() followed by DESeq2(). Genes with 0 counts in all of the conditions were excluded and the samples were normalized according to library size using sizeFactors(). Values then underwent regularized log-transformation with rlog() and counts were obtained using assay(). Pearson correlation analysis was performed using the top 500 most variable genes with cor() with method=c(“pearson”) followed by package pheatmap(). A PCA plot was generated using plotPCA() on the regularised log transformed matrix. Differentially expressed genes at 72 h in each of the early reprogramming systems were identified using the results() function in DESeq2 using a contrast versus MEFs, lfcThreshold=1, altHypothesis=“greaterAbs” and alpha = 0.05.

To perform upset analysis of DEGs, unique DEG gene IDs were combined into a dataframe in R. This was then used as the input for the function upset() from the package upsetR. DEGs targeted by MYC were identified by taking the gene IDs from the output of the GREAT analysis of ChIP peaks and finding matching gene IDs in the DEG lists with join(). These were combined into a data.frame and plotted with upset().

To analyse TF enrichment at differentially expressed genes, the gene IDs of differentially expressed genes were combined with a list of coordinates of transcription start sites for the mm9 genome using UCSC refGene TSS mm9 coordinates of seqMINER⁸⁷ with join(), and a bed file was generated. Approximately 4–5% of genes per set did not have matching gene IDs due to release differences in the annotation and were excluded. The Deeptools function plotProfile was used to plot TF enrichment as described for the ChIP/ATAC–seq analysis.

To define whether differentially expressed genes were targets of a specific TF, the nearest gene from each TF summit was obtained using GREAT. This gene list was compared with the list of differentially expressed genes using join() such that each gene appeared once in a final list of genes that are both TF targets and differentially expressed. Overlaps were identified using the package UpsetR⁸⁸.

MNase fragment-size maps

Fragment-size enrichment heat maps were drawn using plot2DO⁸⁹ with ChIP–seq peak summits or TSS as a reference and the aligned raw BAM files as the sample. Only fragment sizes between 50 and 250 bp were considered. Heat-map scales were scaled to the same value between open and closed chromatin to allow for direct comparison of fragment enrichment.

Identifying TF-bound nucleosomes

Bound nucleosomes were identified by selecting the closest 1 bp nucleosome dyads to ChIP–seq summits using the closest features function from bedops⁷⁷ (closest-features –delim ‘\t’ –dist –closest). Nucleosomes where the ChIP–seq summit was greater than 80 bp from the dyad were filtered out using awk and the remaining nucleosomes were labelled with a column containing a single letter identifier for that TF (that is, ‘O’ for OCT4). Co-bound mononucleosomes were identified by combining the lists of bound nucleosome dyads for each individual TF and merging using bedtools merge⁷³, such that the single-letter TF label column would contain multiple identifiers if the same dyad was present in each list of bound nucleosomes. The number of TFs present on each nucleosome was counted by using awk to count the number of characters in this column.

Motif position analysis on TF-bound nucleosomes

Bound nucleosome dyad positions were used to generate a 1 bp GRanges object⁹⁰. IRanges⁹⁰ was used to extend this object to 160 bp, representing our average nucleosome fragment size. Sequences were obtained for the positive strand using the BSgenomes function getSeq(). The position weight matrix (pwm) was obtained from the MEME-ChIP⁹¹ and used to scan each strand separately for nucleosome sequence using the seqPattern⁹² function motifScanHits() with 100% match score. To count motifs in the reverse orientation, the pwm was passed through Biostrings function reverseComplement() before scanning. The motif count for each strand was then assigned to the corresponding nucleosome dyad by counting the number of times that each sequence identifier appeared in the motifScanHits() output. Total motif counts were obtained by summing the values for the positive and negative strands.

To generate heat maps of motif density, nucleosome dyads were extended symmetrically by 500 bp in each direction using IRanges⁹⁰. An image matrix for each strand was generated using the function PatternHeatmap() of the R package heatmaps⁹³ with the pwm and minimum score between 80 and 95% depending on the motif lengths (shorter motifs used a higher match score)⁹². To generate a matrix for the reverse orientation of a the motif, the sequences on the positive DNA strand was queried using the pwm reverse complemented using the Biostrings function reverseComplement(). Kernel smoothing was applied to the matrix using smoothHeatmap(). To plot both strands together, the matrix produced for motif reverse complement was multiplied by −1. Positive and negative matrices were converted to data frames and then combined using rbindlist() from the package data.table (https://github.com/Rdatatable/data.table) by alternating lines according to the row number in the data frame such that, for every line of positive-strand scores on a sequence, the next line is corresponding scores for the motif reverse complement on that same sequence. The combined data frame was then converted back to a matrix using data.matrix(). Heat maps were then plotted using the R package heatmaps functions Heatmap() and plotHeatmapList().

To generate density plots of motif position around the dyad, the positive and negative strands were considered independently. Dyad positions were extended symmetrically by 200 bp using IRanges⁹⁰, and sequences were obtained using getSeq(). The seqPattern function plotMotifOccurrenceAverage() was used with the MEME pwm and its reverse complement. A smoothing window of 3 bp was used for plotting. To increase the resolution of motif identification around the nucleosome dyad, only perfect motif matches were considered.

Identifying TF-bound nucleosome arrays

To identify TF-bound nucleosome arrays, a column containing a single letter label was added to the broadpeak file for each TF. These broadpeaks were then combined into a single file using bedops –everything. Bedtools merge was used to collapse each overlapping broadpeak with a –distinct settings used to for the single-letter label column to label each peak with the letter code for each TF present (that is, ‘O’ for OCT4), with awk being used to count the number of TFs present by counting the number of letters. The RPGC-normalized ATAC–seq signal was then quantified on these broadpeaks using bedmap –echo –sum –delim ‘\t’, this value was then scaled by the broadpeak length in kb, and open and closed sites were separated using a read counts per kb cut-off value of 40. The positions of flanking nucleosome dyads were identified using bedops closest-features (with flags –delim ‘\t’ –dist –no-overlaps) and the array width was obtained by between by subtracting the first coordinate position of the upstream dyad from the downstream dyad using awk. This value was used for sorting on the basis of distance using the UNIX command line sort function. TF combinations were identified by selecting for different letter combinations using an awk equality. Oligonucleosomes were centred on their array midpoint by taking the coordinates of the upstream nucleosome dyad, and shifting them by half the array width using awk. Arrays were centred on the left or right edge by shifting to either the upstream or downstream dyad coordinate. Array width histograms were generated by passing the array width values to the geom_hist() function of ggplot2.

Motif analysis on TF-bound nucleosome arrays

Motif analysis for TF-bound nucleosome arrays was performed similarly to mononucleosomes. The seqPattern function plotMotifOccurrenceAverage() with the MEME pwm and its reverse complement were used to generate density plots. A smoothing window of 10 bp was used for plotting and percentage match cut-offs were set between 80 and 95% depending on the motif length. Motif heat maps were generated using the heat-map library as on mononucleosomes.

To identify motif occurrences within arrays, a GRanges object was object was created using the 1 bp array midpoint coordinates and adding metadata columns for the array width, half the array width, a left boundary of (5,000 − half array width) and a right boundary of (5,000 + half the array width). This object was then extended ±5 kb using promoters() and the sequences were obtained using getSeq(). This gives arrays a maximum array size for motif identification of 10 kb but prevents most sequences from extending off the chromosome boundaries. Arrays that were extended off the chromosome boundary were filtered using GenomicRanges:::get_out_of_bound_index() (which occurred for approximately 1 in every 15,000 arrays). This filtering was applied to the 10 kb extended sequences, the 1 bp midpoints and the left boundaries as applicable. An ID column was then generated for each array using seqalong() and added as metadata. To identify motifs occurring within an array (such as for SOX2) motifScanHits() was used to identify motif occurrences on each 10 kb sequence using a MEME pwm (with a 95% match score used for SOX2 motif). This produces a table of the motif positions in which each line contains two columns, sequence ID and the start position of a single motif on that sequence. A left_join() was used to match the motif table with the GRanges object of array midpoints by sequence ID. subset() was used to filter sequences for which the motif start position was outside the array edges (motif_position >= left_boundary and motif_position <= right_boundary). A frequency table was then made to count the occurrence of each sequence ID in the filtered motif table and these counts were appended to the Granges object of array midpoints to produce a column containing motif counts within each array. This process was repeated to add another column motif counts on the bottom strand by passing the motif pwm through reverseComplement(). Motif counts per kb were obtained by dividing the motif count within the array by the array width in kb (after setting the width values for any array >10 kb to 10 kb). This value was used to filter arrays based on motif counts per kb. Count histograms were generated by passing these motif counts per kb to the geom_hist() function of ggplot2. Motif heat maps and profiles were generated as described above for mononucleosomes, and the percentage matches for the motif pwm were typically set between 80 and 95% depending on the motif length and degeneracy.

All of the scripts with nucleosome array positions and motifs have been deposited at GitHub (https://git.ecdf.ed.ac.uk/soufi_lab/motif_nucleosome_arrays).

Micro-C pileup analysis

Micro-C pileup analysis was performed using the Coolpup.py package⁹⁴. To generate pile-up heat maps, a bed file containing TF-bound sites and a Micro-C mcool file were used to generate a matrix Micro-C contacts using the –local settings and –ignore_diags set to 0. For 20 kb padding windows around the TF site, 100 bp mcool bins were used and, for 400 kb padding windows, 2 kb mcool bins were used.

Micro-C loop calling and cis-interactions between ChIP–seq peaks

Statistically significant loops connecting TF-binding sites were called using the FitHiChIP pipeline⁹⁵ (https://ay-lab.github.io/FitHiChIP/html/index.html). The following settings were used in the configuration file: COOL= path to.mcool files from the Nextflow pipeline (see above), PeakFile=path to.broadpeak files output from MACS2, BINSIZE=1000, IntType=5, LowDistThr=1000, UppDistThr=2000000, QVALUE=0.01, UseP2PBackgrnd = 0, BiasType= 1 (coverage bias regression was used), MergeInt=1.

Micro-C arc plots and contact heat maps

Plots of Micro-C contacts at individual loci were generated using the cLoops2 package⁹⁶ (https://github.com/YaqiangCao/cLoops2). First, Micro-C.pairs files were preprocessed using cLoops2 pre –format pairs. Reasonable contact matrix resolution was estimated using the cLoop2 estRes function. cLoops2 plot was used to plot contact density on specified genomic coordinates with –m obs, -arch was specified to plot arch plots and –triu was specified to plot binned triangular contact matrices. A bin size of 20 kb was used for regions larger than 1 Mb, otherwise a bin size of 500 bp was used.

Circular genome tracks

Circular tracks of Micro-C contacts, MNase, ChIP–seq and ATAC–seq were prepared using the HOMER software package⁹⁷ in combination with Circos⁹⁸. First, duplicate-filtered ChIP–seq and ATAC–seq BAM files, and merged MNase BAM files were converted into HOMER tag directories using makeTagDirectory with the flag –keepAll. To prepare Micro-C samples, .pairs files were converted to the .hicsummary format by rearranging the columns as follows: egrep -v “(^#.*|^$)” filename.pairs | awk ‘BEGIN {OFS = “\t”} {print($1, $2, $3, $6, $4, $5, $7)}’ - > file.hicsummary. This was then converted a HOMER tag directory with makeTagDirectory with the flag –format HiCsummary. Tracks were produced with the analyzeHiC command and the following parameters: -res 5000 -superRes 10000 –circos cirOutput -nomatrix -minDist 20000 -pvalue 0.000000000000001. Track scales, line thickness and colours were edited in the cirOutput.config file and replotted using circos –conf.

Micro-C nucleosome orientation density profiles

To define nucleosome orientation from Micro-C ligation, a .pairs file was split into three files according to the orientation of read pairs using awk. First, intrachromosomal ligation events were isolated, then the reads were filtered to obtain junctions between 200 bp and 2 kb of one another. Inward (IN–IN) pairs were defined by matching read pairs with the read orientations +/− as follows: egrep -v “(^#.*|^$)” filename.pairs | awk ‘BEGIN {OFS = “\t”} {if ($2 == $4 & $5-$3 > = 200 & $5-$3 < = 2000 & $6 == “+” & $7 == “−”) print}’ -. Outward (OUT–OUT) pairs were defined by matching +/− read orientations as follows: egrep -v “(^#.*|^$)” filename.pairs | ‘BEGIN {OFS = “\t”} {if ($2 == $4 & $5-$3 > = 200 & $5-$3 < = 2000 & $6 == “−” & $7 == “+”) print}’ -. Tandem (IN–OUT or OUT–IN) pairs were identified by match pairs with +/+ or −/− orientations as follows: egrep -v “(^#.*|^$)” filename.pairs | awk ‘BEGIN {OFS = “\t”} {if ($2 == $4 & $5-$3 > = 200 & $5-$3 < = 2000 & (($6 ==“+” & $7 == “+”) || ($6 == “−” & $7 == “−”))) print}’ -. Tandem orientations are considered to be theoretically interchangeable and are not separated³⁰. The distances between ligation junctions in each pair were then determined and were plotted using the ggplot2 function geom_density().

Genome tracks visualizations

Genome track screen shots were generated with genome-coverage-normalized (RPGC) data using Integrative Genomics Viewer⁹⁹.

Protein structure visualization

Mononucleosome structures were built using Protein Data Bank (PDB) 5NL0 (ref. ¹⁰⁰), and visualized using the PyMOL Molecular Graphics System, v.3.0 Schrödinger.

The nucleosome arrays were modelled using PDBs 6IPU and 6HKT^36,101 and EMD-2601 (ref. ³⁷) in the open-source 3D computer graphics software, Blender¹⁰².

Structure prediction of MYC/MAX–TFAP2C complex was performed using AlphaFold-Multimer run in the COSMIC² portal using the amino acid sequences of the TF-DBDs only¹⁰³. The resulting complex was then aligned with the crystal structure of human TFAP2A in complex with DNA (PDB: 8J0K)¹⁰⁴, using the PyMOL align function. The electrostatic surface charge was calculated using the APBS plugin in PyMOL¹⁰⁵.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Source link