Symbolic recording of signalling and cis-regulatory element activity to DNA

Molecular cloning

Sequences of the 300 native CREs, 98 synthetic CREs (motifs), three signal-responsive elements and primers/oligos used in this paper are listed in Supplementary Tables 2, 3, 4 and 5, respectively. The 300 native CREs were picked from ref. ³⁰ to span a wide range of activities. The set of 98 synthetic CREs was generated in two steps. First, 841 vertebrate motifs in the JASPAR database were clustered by similarity³²,³³ and then manually curated to a set of 98 mammalian motifs (6–20 bp), each representing a single transcription factor or transcription factor family. Second, to design the 98 synthetic CREs, six homotypic motif repeats separated by 4 bp spacers were embedded in an inactive DNA sequence³⁰. The TRE, consisting of seven modified tet operator sequences (tetO, 19 bp), was obtained from the Tet-On system (Takara). The sequence of the NF-κB response element was obtained by embedding six homotypic NF-κB motifs separated by 2 bp spacers into an inactive DNA sequence³⁰. The TCF-LEF response element was obtained from the TCF-LEF reporter (Promega)³⁵.

All PCR and digestion purifications were performed with AMPure XP beads (0.6× for plasmids and 1.2× for fragments of base pair size 200–300) using the manufacturer’s protocol unless otherwise specified. All ligation reactions used Quick ligase (NEB) with a vector:insert molar ratio of 1:6 unless otherwise specified. All Gibson reactions used NEBuilder (NEB) with a vector:insert molar ratio of 1:6 unless otherwise specified. All plasmid DNA for mammalian cell experiments was prepared using the ZymoPURE II Plasmid Kit.

The pegRNA-5N recorders were cloned in two steps. First, a gene fragment containing CTT pegRNA (Addgene, 132778) was PCR amplified using primer sets, adding a 5 bp degenerate barcode and flanking BsmBI site for use in downstream cloning steps. A carrier plasmid containing two BsmBI sites and two csy4 hairpins was ordered from Twist Bioscience. The carrier plasmid and PCR product from the previous step were digested with BsmBI (NEB, buffer 3.1) at 55 °C for 1 h and then purified for ligation. The complete pegRNA with 5N degenerate barcode and csy4 hairpins was PCR amplified from the ligation product. The ENGRAM plasmid and PCR product from the previous step were digested with BsmBI (NEB) at 55 °C for 1 h. Ligation products were purified and resuspended with 5 µl of water for electroporation, which was performed using NEB 10-beta Electrocompetent E. coli (C3020) with the manufacturer’s protocol. Transformed cells were cultured at 30 °C overnight.

The libraries of 300 CREs, 98 synthetic CREs and plasmids bearing signal-responsive elements were cloned in two steps. First, a library of DNA oligonucleotides containing CREs, two BsmBI restriction sites, a DNA insertion barcode, the 3′ end of the pegRNA and the csy4 hairpin was ordered as oPools from IDT. The 5′ ENGRAM recorder was digested with XbaI and NcoI (NEB) at 37 °C for 1 h and purified. DNA oligonucleotides were first amplified with primers to add Gibson overhangs and then cloned into the 5′ ENGRAM recorder using Gibson assembly. Second, a gene fragment containing minP, csy4 hairpin, HEK3 spacer sequence and pegRNA backbone flanked with two BsmBI sites was ordered as a gBlock from IDT. The gBlock and the plasmid constructed in the first step, which contains BsmBI restriction sites, were digested with BsmBI (NEB, buffer 3.1) at 55 °C for 1 h to generate compatible sticky ends, and were purified for ligation. Ligation products were transformed into Stable Competent E. coli (NEB, C3040). Transformed cells were cultured at 30 °C overnight.

The synHEK3-TAPE construct was cloned in two steps. First, piggyBac-CMV-MCS-EF1α-Puro plasmid was digested with BsiWI and SphI to remove core insulators and selection markers from piggyBac transposon long terminal repeats. A gBlock (IDT), consisting of a flanking sequence (part of green fluorescent protein (GFP)) and two divergent BsmBI restriction sites, was cloned into the BsiWI- and SphI-digested piggyBac plasmid using Gibson assembly (NEBuilder, NEB) to create a shuttle vector. Second, an 87 bp region around the HEK3 locus was synthesized (IDT) and amplified with a pair of primers to introduce the T7 promoter and a 16 bp barcode to the 5′ and 3′ end, respectively. The resulting PCR product was purified and cloned into the construct from step 1 (digested with BsmBI) using Gibson assembly. Assembled products were purified and resuspended in 5 µl of water for electroporation, which was performed using NEB 10-beta Electrocompetent E. coli (C3020) following the manufacturer’s protocol. Transformed cells were cultured at 30 °C overnight.

The DTT, consisting of five recording units, was previously cloned¹. Signal-responsive ENGRAM recorders targeting DTT were generated by replacing HEK3-targeting pegRNAs with DTT-targeting pegRNAs. In brief, signal-responsive ENGRAM recorders were digested with NcoI and AgeI to remove HEK3-targeting pegRNAs. DTT-targeting pegRNAs with Gibson overhangs were ordered as gBlocks (IDT). Assembled products were transformed into Stable Competent E. coli (NEB, C3040). Transformed cells were cultured at 30 °C overnight.

Cell culture, transient transfections, nucleofection and piggyBac integrations

HEK293T (CRL-11268) and K562 cells (CCL-243) were purchased from ATCC. CF-1 MEF (ASF-1216) feeder cells were purchased from Applied StemCell. Mouse ES cells (E14TG2a) were a gift from C. Schröter. HEK293T cells and MEFs were cultured in DMEM, high glucose (Gibco). K562 cells were cultured in RPMI 1640 medium (Gibco). All media were supplemented with 10% fetal bovine serum (Hyclone) and 1% penicillin/streptomycin (Gibco). MEF medium was additionally supplemented with 1× GlutaMAX (Gibco). Normal mES cells were cultured in Ndiff 227 medium (Takara) supplemented with 3 µM CHIR99021 (Selleck, S2924), 1 µM PD0325901 (Selleck, S1036), 1,000 units of ESGRO recombinant mouse LIF protein (Sigma-Aldrich, ESG1107) and 1% penicillin/streptomycin (2i + LIF medium). For culture of both MEFs and mES cells, wells in the culture plates were coated with 0.1% gelatin (Sigma, G1393) in an incubator at 37 °C for 60 min. Cells were grown with 5% CO₂ at 37 °C.

Transfection of HEK293T, K562 and mES cells was performed using Lipofectamine 3000 (ThermoFisher, L3000015), a Lonza 4D-Nucleofector and Lipofectamine 2000 (ThermoFisher, 11668019), respectively, following the manufacturer’s protocol.

For transfection of HEK293T cells, 1 × 10⁵ cells were seeded on a 24-well plate 1 day before transfection; 500 ng of plasmid (prime editor plasmid, pegRNA plasmid or a mixture of both, with a mass ratio of 1:4) was used for transient transfections; 500 ng of cargo plasmid (prime editor plasmid, ENGRAM pegRNA plasmid or DTT) and 100 ng of Super piggyBac transposase expression vector (SBI) were used for piggyBac integrations. PE2(+) HEK293T cells were picked by sorting single cells to a 96-well plate, followed by selection with 1 μg ml⁻¹ puromycin dihydrochloride (Gibco) and prime editing efficiency verification. Single-cell-derived PEmax(+) HEK293T cells were obtained using the same approach and were then used in recording with DNA Typewriter, whereas PE2(+) cells were used in all other recording experiments relying on HEK293T cells. For nucleofection of K562 cells, 4 × 10⁵ cells were transfected with either 2 μg of plasmid (prime editor plasmid, pegRNA plasmid or a mixture of both, with a mass ratio of 1:4) for transient transfection or 2 μg of cargo plasmid (prime editor plasmid, synthetic DNA Tape, 300 CRE library or 98 synthetic CRE library) + 400 ng of transposase expression vector for piggyBac integration. All transfections were performed in 16-well strips (20 μl) with programme code FF-120. Single-cell-derived PE2(+) K562 cells were picked by the methods described above.

For transfection in mES cells and construction of the ENGRAM mES cell line, three recording components—Dox-inducible PEmax (TRE-PEmax-mCherry-BlastR), a library of ENGRAM recorders (including all 98 synthetic CREs, driving expression of uniquely barcoded pegRNA) and DNA TAPE bearing the synthetic HEK3 target sequence (synHEK3-TAPE)—were integrated in two steps to minimize background recording activity. First, 600 ng of TRE-PEmax-mCherry-BlastR plasmid, 3 μg of ENGRAM recorder plasmid and 400 ng transposase expression vector were mixed and transfected into 1 × 10⁶ mES cells using Lipofectamine 2000. At 24 h post transfection, 8 μg ml⁻¹ Blasticidin S HCl (Gibco) was added to the medium for selection of cells with the TRE-PEmax-mCherry-Blast plasmid. Of note, massive cell death was observed about 6 days post transfection, possibly due to the low integration efficiency of the large (over 10 kb) TRE-PEmax-mCherry-Blast plasmid. Polyclonal mES cells bearing Dox-inducible PEmax and ENGRAM recorders were cultured in 2i + LIF Ndiff 227 medium. Second, 600 ng of plasmid encoding the puromycin resistance gene (PuroR), 3 μg of plasmid bearing synHEK3-TAPE and 400 ng of transposase expression vector were mixed and transfected into 1 × 10⁶ mES cells using Lipofectamine 2000. At 24 h post transfection, 800 ng ml⁻¹ puromycin dihydrochloride was added to 2i + LIF Ndiff 227 medium for selection of cells with PuroR plasmid.

Signal recording with ligands

Doxycycline hyclate (Dox; Sigma, D9891) was reconstituted in PBS to a final concentration of 10 mg ml⁻¹. TNF (R&D Systems, 210-TA-020/CF) was reconstituted in 1 ml of PBS to make 20 μg ml⁻¹ stock. CHIR99021 (Selleck, S2924) was purchased as 10 mM stock (1 ml in DMSO). All ligands were stored at −20 °C, thawed immediately before use and diluted with the appropriate culture medium. Concentrations tested here fall within the range in which these agonists are typically used^36,37,38.

For ligand-recording experiments, 1 × 10⁵ PE2(+) HEK293T cells were seeded on a 48-well plate 6 h before treatment then 1 ml of medium with ligand or negative control was added to each well. For the time-series experiment, cells were washed with warm medium and harvested 24 h following ligand removal. The same volume of DMSO or PBS was added to the medium as a negative control. Cells were split in a 1:5 ratio every 2 days and medium was changed every day.

For sequential editing with DNA Typewriter, 1 × 10⁵ PEmax(+) HEK293T cells were seeded on a 48-well plate 6 h before treatment then 1 ml of medium with 100 ng ml⁻¹ doxycycline or 3 μM CHIR99021 was added to each well. Cells were split in a 1:5 ratio every 2 days and medium was changed every day. Cells were harvested on day 6 of the experiment.

Gastruloid induction and recording

Mouse gastruloids were induced using a published protocol³⁶,³⁷. In brief, 100,000 ENGRAM mES cells were seeded on a gelatin-coated, six-well plate and cultured in 2i + LIF Ndiff medium for 2 days, which produced a more homogeneous starting population for gastruloid induction. To start induction, cells were dissociated with TrypLE Express Enzyme (Gibco) at 37 °C for 4 min to create a single-cell suspension. Cells were counted and diluted in Ndiff medium to a concentration of 6,000–7,000 cells ml⁻¹, and 300–350 cells were then seeded into each well of a 96-well, U-shaped-bottom microplate (Nunclon Sphera, treated, Thermo, 174929) in 50 μl of Ndiff medium. The medium was changed every day, and 3 μM CHIR99021 was added from 48 to 72 h following aggregation. Windowed recording was activated by the addition of 50 ng ml⁻¹ doxycycline for 24 h. Gastruloids were harvested for sequencing 24 h post activation.

Recovery of recorded information from DNA Tape and DTT

Genomic DNA was extracted using a previously described protocol²². In brief, cells were washed once with PBS and lysed with freshly prepared lysis buffer (10 mM Tris-HCl pH 7.5, 0.05% SDS and 25 μg ml⁻¹ proteinase K (ThermoFisher, EO0492)) to a final concentration of 5,000 cells μl⁻¹. The lysate was incubated at 50 °C for 1 h, followed by an 80 °C enzyme inactivation step for 30 min.

For retrieval of information recorded to various kinds of DNA Tape (including the endogenous HEK3 locus, the synthetic HEK3 locus integrated into the genome and the DTT integrated into the genome), the target region in gDNA was amplified with two-step PCR (KAPA2G Robust HotStart ReadyMix) and sequenced on an Illumina sequencing platform. The first PCR reaction included 2 μl of cell lysate and 0.5 μM forward and reverse primers, with a final reaction volume of 50 μl. The number of PCR reactions required for each sample depends on the complexity of the recorded signal, because more complex recording patterns require more reactions to capture the full diversity of edits. We typically aimed to PCR amplify at least 2,000 DNA Tape-containing amplicon molecules per signal, which is equivalent to 1,000 cells per signal for the endogenous HEK3 locus or 100 cells for synthetic DNA Tapes such as synHEK3-TAPE or DTT, assuming 20 integrations per cell. PCR reactions were performed as follows: 95 °C for 3 min and 22 cycles of 98 °C for 20 s, 65 °C for 15 s and 72 °C for 40 s. The resulting PCR product was then size selected using a dual-size-selection clean-up of 0.5× and 1.0× AMPure XP beads (Beckman Coulter) to remove gDNA and small fragments (below 200 bp), respectively. The second PCR reaction included 1 ng of the size-selected product and 0.2 μM forward and reverse primers containing a flow-cell adaptor and sample index, with a final reaction volume of 25 μl. PCR reactions were performed as follows: 95 °C for 3 min and five cycles of 98 °C for 20 s, 65 °C for 15 s and 72 °C for 40 s. The final PCR product was pooled and cleaned with 0.9× AMPure XP beads (Beckman Coulter). The library was sequenced as a single-end read with either a 150 cycle kit on MiSeq or NextSeq 500/550, or a 100 cycle P1/P2 kit on NextSeq 2000. FASTQ files were demultiplexed with bcl2fastq (v.2.20, Illumina). Primers used for PCR are provided in Supplementary Table 5.
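The cell-input figures quoted above follow from dividing the target number of amplicon molecules by the number of DNA Tape copies per cell. The sketch below reproduces that arithmetic. The function name is hypothetical, and the value of two HEK3 copies per cell is an assumption inferred from the stated equivalence of 2,000 molecules to 1,000 cells; only the 20 integrations per cell is stated explicitly in the text.

```python
import math


def cells_needed(target_molecules: int, tape_copies_per_cell: int) -> int:
    """Minimum number of cells that together carry `target_molecules` tape copies."""
    if tape_copies_per_cell <= 0:
        raise ValueError("tape_copies_per_cell must be positive")
    return math.ceil(target_molecules / tape_copies_per_cell)


# Endogenous HEK3 locus: two copies per cell (assumption implied by the text).
print(cells_needed(2000, 2))   # 1000
# Synthetic DNA Tape (synHEK3-TAPE or DTT): 20 integrations per cell (stated in the text).
print(cells_needed(2000, 20))  # 100
```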

Analysis of recording data

The barcodes used in this paper include CTT insertion, pentamer (5 bp degenerate or specific barcodes) and hexamer (300 specific barcodes for 300 unique CREs) on the HEK3 DNA Tape, and the hexamer (NNNGGA, two unique barcodes for two signals) on DTT. To ensure distinctiveness for CRE and signal recordings, hexamer and pentamer barcodes were selected with a Hamming distance greater than two from other members within the same set. For some but not all experiments, barcodes were picked to have a balanced editing score to minimize recording efficiency bias across different insertion sequences. The criteria by which insertional barcodes were chosen for ENGRAM recorders used in experiments throughout this paper are summarized in Supplementary Table 6.
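One way to obtain a barcode set in which every pair differs by a Hamming distance greater than two is a greedy filter over candidate sequences. The sketch below is illustrative only: the greedy strategy, the function names and the choice of all 1,024 pentamers as candidates are assumptions, not the selection procedure used in this paper.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def pick_barcodes(candidates, min_distance=3):
    """Greedily keep each candidate that is at least `min_distance` from all kept ones."""
    chosen = []
    for barcode in candidates:
        if all(hamming(barcode, kept) >= min_distance for kept in chosen):
            chosen.append(barcode)
    return chosen


# Candidate pool: every possible 5 bp sequence (4**5 = 1,024 pentamers).
pentamers = ["".join(p) for p in product("ACGT", repeat=5)]
selected = pick_barcodes(pentamers)  # pairwise Hamming distance > 2
print(len(selected), selected[:3])
```

The result depends on candidate order, so in practice the pool could first be sorted by a preference such as a balanced editing score.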

For extraction of barcode information from sequencing reads, custom commands and Python code were used. For barcodes recorded in the HEK3 locus, a custom pattern-matching function was used followed by analysis with custom Python code. For CRE- and signal-specific barcodes, unexpected barcodes within a Hamming distance of one from the expected sequences were corrected for insertion counts, whereas raw counts were used in 5N degenerate barcode recording. Barcodes with fewer than five reads were excluded from downstream analysis. The editing score was calculated as (genomic reads with specific insertion/total edited HEK3 reads)/(plasmid reads with specific insertion/total plasmid reads). Two-tailed Student’s t-tests were performed for comparison of differences between two recording conditions. Differential activity analysis for 98 synthetic ENGRAM recorders between different cells was performed using DESeq2 (ref. ⁴⁷), with raw barcode counts as input. Barcodes accounting for less than 0.01% of total barcode reads were removed from the analysis. Differentially active recorders were called with thresholds of adjusted P < 0.001 (mES versus K562 cells, and mES versus HEK293T cells) or P < 0.1 (mES cells versus gastruloid) for a fold difference greater than two.
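The barcode correction and editing-score calculation described above can be sketched as follows. This is a minimal illustration, not the custom code used in the paper: the function names are hypothetical, the example counts are invented, and discarding reads that match no expected barcode (or more than one) is an assumption.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def correct_counts(observed: dict, expected: set) -> Counter:
    """Collapse observed barcodes onto expected ones at Hamming distance 0 or 1."""
    corrected = Counter()
    for barcode, n in observed.items():
        if barcode in expected:
            corrected[barcode] += n
            continue
        near = [e for e in expected if len(e) == len(barcode) and hamming(e, barcode) == 1]
        if len(near) == 1:  # assumption: unmatched or ambiguous barcodes are dropped
            corrected[near[0]] += n
    return corrected


def editing_score(genomic_reads, total_edited_reads, plasmid_reads, total_plasmid_reads):
    """(genomic fraction of a barcode) / (plasmid fraction of the same barcode)."""
    return (genomic_reads / total_edited_reads) / (plasmid_reads / total_plasmid_reads)


# Invented example counts for illustration.
observed = {"ACGTA": 250, "ACGTT": 6, "GGGGG": 5}
expected = {"ACGTA", "TTTTT"}
counts = correct_counts(observed, expected)
print(counts["ACGTA"])                                   # 256
print(editing_score(counts["ACGTA"], 1024, 128, 1024))   # 2.0
```

A score above one indicates that a barcode is over-represented among genomic edits relative to its abundance in the plasmid library.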

For hexamers recorded to DTT (NNNGGA), sequencing reads were first aligned to the five-unit DTT reference using bwa (v.0.7.1)⁴⁸ with default settings. The aligned reads were then processed with custom python code to extract the positional insertion and bigram proportions at adjacent positions on the five-unit DTT. The order of signals was inferred by calculation of the bigram ratio (log₂-transformed ratio of the (Tet-On → WNT) versus (WNT → Tet-On) bigrams at adjacent positions).
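The bigram ratio can be illustrated with a short sketch. It assumes each aligned read has already been reduced to an ordered list of the barcodes found at the five DTT units, with `None` for unedited units. The labels, function names, example reads and the pseudocount are hypothetical and are not taken from the custom code used here.

```python
import math
from collections import Counter


def count_bigrams(reads):
    """Count ordered pairs of barcodes found at adjacent, edited tape positions."""
    counts = Counter()
    for units in reads:
        for left, right in zip(units, units[1:]):
            if left is not None and right is not None:
                counts[(left, right)] += 1
    return counts


def bigram_log2_ratio(counts, first, second, pseudocount=1.0):
    """log2 of (first -> second) over (second -> first); positive favours first-then-second.

    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    forward = counts[(first, second)] + pseudocount
    reverse = counts[(second, first)] + pseudocount
    return math.log2(forward / reverse)


# Invented reads: "TET" marks a Tet-On barcode, "WNT" a WNT barcode.
reads = [
    ["TET", "WNT", None, None, None],
    ["TET", "TET", "WNT", None, None],
    ["WNT", "TET", None, None, None],
]
bigrams = count_bigrams(reads)
ratio = bigram_log2_ratio(bigrams, "TET", "WNT")
print(bigrams[("TET", "WNT")], bigrams[("WNT", "TET")], ratio)
```

Because DNA Typewriter edits accumulate sequentially along the tape, an excess of (Tet-On → WNT) bigrams suggests that the Tet-On signal preceded the WNT signal.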

Bulk RNA-seq and data analysis

For bulk RNA-seq experiments, HEK293T cells, single-cell-derived PE2(+) HEK293T cells and PE2(+) ENGRAM-NF-κB-recorder(+) HEK293T cells (treated with 10 ng ml⁻¹ TNF or PBS for 48 h) were collected in triplicate. RNA from cells collected was purified using the RNeasy Mini Kit (Qiagen, 74104) with on-column DNase treatment using the RNase-Free DNase Set (Qiagen, 79254). A complementary DNA sequencing library was generated using TruSeq RNA Library Prep Kit v.2 (Illumina), following the manufacturer’s protocol, and sequenced with a paired-end, 100 cycle P2 kit on NextSeq 2000. Fastq files were demultiplexed with bcl2fastq (v.2.20, Illumina). Sequencing reads were trimmed using Cutadapt⁴⁹ and aligned to the human reference genome (hg38) using STAR (v.2.7.3)⁵⁰, both with default settings. Differential expression analysis was performed using DESeq2 (ref. ⁴⁷). Differentially expressed genes were called with thresholds of adjusted P < 0.05 for a change of over 50% (log₂-transformed fold change above 0.58).

Prediction of RNA structure and editing score

Both RNA structure and minimal free energy prediction were performed using the NUPACK python package⁵¹ with default settings. A linear lasso regression model to predict editing score of 5 bp barcodes was trained using the python package scikit-learn. We defined 85 features to characterize the 5 bp sequence for which insertional efficiency is predicted: (1) sequence features or 84 binary features corresponding to one-hot encoded sequence, including 20 for single-nucleotide content (four nucleotides × five positions) and 64 for dinucleotide content (16 dinucleotides × four positions); and (2) structure feature or rescaled minimum free energy within the range (0,1). Samples were split with 724 barcodes in a training set and 300 in a test set. The model was trained with tenfold cross-validation on the training set and then used to predict the test set.
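The 85-dimensional feature vector can be sketched as below: 20 one-hot single-nucleotide features, 64 one-hot dinucleotide features and one rescaled minimum free energy value. The function name and the ordering of features are assumptions, and the rescaled free energy is passed in as a number rather than computed with NUPACK.

```python
from itertools import product

NUCLEOTIDES = "ACGT"
DINUCLEOTIDES = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]  # 16 pairs


def encode_barcode(seq: str, scaled_mfe: float) -> list:
    """Encode a 5 bp barcode as 20 + 64 + 1 = 85 features."""
    if len(seq) != 5 or any(base not in NUCLEOTIDES for base in seq):
        raise ValueError("expected a 5 bp sequence over A, C, G, T")
    features = []
    # Single-nucleotide content: four nucleotides x five positions.
    for base in seq:
        features.extend(1.0 if base == n else 0.0 for n in NUCLEOTIDES)
    # Dinucleotide content: 16 dinucleotides x four adjacent-position pairs.
    for i in range(4):
        pair = seq[i:i + 2]
        features.extend(1.0 if pair == d else 0.0 for d in DINUCLEOTIDES)
    # Structure feature: minimum free energy rescaled to the unit interval.
    features.append(scaled_mfe)
    return features


vector = encode_barcode("ACGTA", scaled_mfe=0.42)
print(len(vector))  # 85
```

Vectors of this form could then be passed to a lasso model with tenfold cross-validation, for example scikit-learn's `LassoCV(cv=10)`; whether that particular estimator was used here is not stated. The 724 training and 300 test barcodes together total 1,024, the number of possible 5 bp sequences.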

MOI estimation using PCR and qPCR

The MOI of various constructs was determined with one of two methods: quantitative PCR (qPCR) or PCR followed by DNA quantification with TapeStation High Sensitivity reagents (Agilent).

To assess the overall MOI of piggyBac integration, K562 cells were transfected with GFP cargo plasmid with or without piggyBac transposase plasmid. Genomic DNA was purified every 2–3 days for 15 days using the DNeasy Blood & Tissue Kit (Qiagen, 69504). qPCR on gDNA was then performed using TaqPath qPCR Master Mix (ThermoFisher, A15297) with primers designed for GFP and for RPPH1 as an internal control. MOI was estimated by normalization of GFP C_t values to RPPH1 C_t values, assuming two copies in the genome.
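A common way to turn such C_t values into a copy-number estimate is the ΔC_t calculation sketched below. It is shown as an assumption rather than the exact formula used here: it interprets "two copies in the genome" as two RPPH1 copies per cell and assumes perfect doubling per cycle for both amplicons. The function name and example C_t values are hypothetical.

```python
def moi_from_ct(ct_target: float, ct_reference: float,
                reference_copies: float = 2.0, efficiency: float = 2.0) -> float:
    """Estimate target copies per cell from qPCR threshold cycles.

    copies = reference_copies * efficiency ** (ct_reference - ct_target)
    Assumes equal amplification efficiency for target and reference amplicons.
    """
    return reference_copies * efficiency ** (ct_reference - ct_target)


# Invented example: GFP crosses threshold three cycles earlier than RPPH1.
print(moi_from_ct(ct_target=22.0, ct_reference=25.0))  # 16.0
```

A target that crosses threshold at the same cycle as RPPH1 would be estimated at two copies per cell under these assumptions.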

For assessment of the MOI of specific recording components (PEmax, ENGRAM recorder and synHEK3-TAPE), gDNA from the cell lysate was amplified with specific primers and quantified relative to PCR using RPPH1 primers. In brief, 1 million cells were counted and lysed in 200 μl of lysis buffer. PCR included 2 μl of cell lysate (equivalent to an input of 10,000 cells) and 0.5 μM forward and reverse primers targeting a specific region, with a final reaction volume of 25 μl. PCR reactions were performed as follows: 95 °C for 3 min and 22 cycles of 98 °C for 20 s, 65 °C for 15 s and 72 °C for 40 s. PCR products were quantified using TapeStation and MOI was estimated by normalization of the target DNA concentration to RPPH1 DNA concentration. The sequences of primers used for qPCR and PCR are provided in Supplementary Table 5.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
