Exaptation of ancestral cell-identity networks enables C4 photosynthesis

Plant growth

For the de-etiolation time course, seeds of Oryza sativa spp. japonica cultivar Kitaake and Sorghum bicolor BTx623 were incubated in sterile water for two days and one day, respectively, at 29 °C in the dark. Germinated seedlings were transferred in a dark room equipped with green light to a 1:1 mixture of topsoil and sand supplemented with fertilizer granules and grown for five days in the dark by wrapping the tray and lid several times with aluminium foil. Plants were placed in a controlled environment room with 60% humidity and temperatures of 28 °C and 20 °C during the day and night, respectively. Plants were exposed to light at the beginning of a photoperiod of 12 h light and 12 h dark and shoots were harvested at different time points during de-etiolation by flash-freezing tissue in liquid nitrogen. For the 0-h time point, seedlings were harvested in a dark room equipped with green light and flash-frozen immediately.

For microscopy analysis and enrichment of bundle-sheath nuclei using fluorescence-activated nuclei sorting, O. sativa spp. japonica cultivar Kitaake single-copy homozygous T2 seeds were de-husked and sterilized in 10% (v/v) bleach for 30 min. After washing several times with sterile water, seeds were incubated for two days in sterile water at 29 °C in the dark. Germinated seedlings were transferred to half-strength Murashige and Skoog medium with 0.8% agar in Magentas and grown for five days in the light in a growth chamber at temperatures of 28 °C and 20 °C during the day and night, respectively, and a photoperiod of 12 h light and 12 h dark.

Construct design and cloning

To generate constructs for the rice bundle-sheath marker line, the coding sequence for mTurquoise2 was obtained from a previous report⁴³, and the promoter sequence from Zoysia japonica PHOSPHOENOLPYRUVATE CARBOXYKINASE in combination with the dTALE STAP4 system was obtained from a previous report⁴⁴. The coding sequence of Arabidopsis thaliana H2B (At5g22880) was used as an N-terminal signal for targeting mTurquoise2 to the nucleus. All sequences were domesticated for Golden Gate cloning⁴⁵,⁴⁶. Level 1 and Level 2 constructs were assembled using the Golden Gate cloning strategy to create a binary vector for the expression of STAP4-mTurquoise2-H2B driven by PCK-dTALE.

For the transactivation assay in rice protoplasts, transcription factor coding sequences were amplified using rice leaf cDNA or synthesized using GeneArt after domesticating the sequences for Golden Gate cloning⁴¹,⁴² (OsDOF2, LOC_Os01g15900; OsDOF8, LOC_Os02g45200; OsDOF23, LOC_Os07g32510; OsDOF27, LOC_Os10g26620; SbDOF2, Sobic.003G121400; SbDOF8, Sobic.004G284400; SbDOF11, Sobic.001G489900; and SbDOF17, Sobic.006G182300). The coding sequences were assembled into a Level 1 module with a Zea mays UBI promoter and Tnos terminator module as described previously³⁷. For the minimal SIR promoter, nucleotides –980 to –829, as well as the endogenous core promoter (nucleotides –250 to +42), were fused with the LUCIFERASE reporter to measure transcription activity³⁷.

To generate GUS reporter rice lines, the minimal SIR promoter was assembled into a Level 1 module with the coding sequence for kzGUS (an intronless version of the GUS reporter gene) and the Tnos terminator as described previously³⁷. The DOF motifs in the minimal SIR promoter were mutated using PCR amplification.

Rice transformation

Oryza sativa spp. japonica cultivar Kitaake was transformed using Agrobacterium tumefaciens as described previously⁴⁷, with several modifications. Seeds were de-husked and sterilized with 10% (v/v) bleach for 15 min before placing them on nutrient broth (NB) callus induction medium containing 2 mg l⁻¹ 2,4-dichlorophenoxyacetic acid for four weeks at 28 °C in the dark. Growing calli were co-incubated with A. tumefaciens strain LBA4404 carrying the expression plasmid of interest in NB inoculation medium containing 40 μg ml⁻¹ acetosyringone for three days at 22 °C in the dark. Calli were transferred to NB recovery medium containing 300 mg l⁻¹ timentin for one week at 28 °C in the dark. They were then transferred to NB selection medium containing 35 mg l⁻¹ hygromycin B for four weeks at 28 °C in the dark. Proliferating calli were subsequently transferred to NB regeneration medium containing 100 mg l⁻¹ myo-inositol, 2 mg l⁻¹ kinetin, 0.2 mg l⁻¹ 1-naphthaleneacetic acid and 0.8 mg l⁻¹ 6-benzylaminopurine for four weeks at 28 °C in the light. Plantlets were transferred to NB rooting medium containing 0.1 mg l⁻¹ 1-naphthaleneacetic acid and incubated in Magenta pots for two weeks at 28 °C in the light. Finally, plants were transferred to a 1:1 mixture of topsoil and sand and grown in a controlled environment room with 60% humidity, temperatures of 28 °C and 20 °C during the day and night, respectively, and a photoperiod of 12 h light and 12 h dark.

Transactivation assay

Rice leaf protoplast isolation was performed as described previously³⁷,⁴⁸. Protoplasts were transformed using Golden Gate Level 1 modules designed for constitutive expression of transcription factors, alongside the LUC reporter and the ZmUBIpro::GUS-Tnos transformation control, which were prepared with the ZymoPURE II Plasmid Midiprep Kit. The transformation mixture contained 2 µg of control plasmids, 5 µg of reporter plasmids and 5 µg of transcription factor plasmids, which were transformed into 180 µl of protoplasts. After incubating protoplasts for 20 h in the light, proteins were extracted using passive lysis buffer (Promega), and GUS activity was measured with 20 µl of the protein extract. A fluorometric MUG (4-methylumbelliferyl-β-d-glucuronide) assay was used for quantifying GUS activity⁴⁹ in a reaction mixture of 200 µl containing 50 mM phosphate buffer (pH 7.0), 10 mM EDTA-Na₂, 0.1% (v/v) Triton X-100, 0.1% (w/v) N-lauroylsarcosine sodium, 10 mM DTT and 2 mM MUG. The assay was performed at 37 °C, and 4-methylumbelliferone (4-MU) fluorescence was recorded every 2 min for 20 cycles at 360 nm excitation and 450 nm emission using a CLARIOstar plate reader. In addition, LUC activity was determined using 20 µl of protein sample and 100 µl of LUC assay reagent from Promega. Transcription activity was quantified as LUC luminescence relative to the rate of MU accumulation per second.
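As a minimal sketch of this normalization, the Python snippet below estimates the rate of 4-MU accumulation as the least-squares slope of fluorescence against time (one read every 2 min for 20 cycles) and divides the LUC luminescence by that rate. The function names and the example readings are hypothetical, and fitting an ordinary least-squares slope is an assumption; the text states only that LUC luminescence is expressed relative to the rate of MU accumulation per second.

```python
def slope_per_second(times_s, fluorescence):
    """Ordinary least-squares slope of fluorescence against time (units per second)."""
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_f = sum(fluorescence) / n
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in zip(times_s, fluorescence))
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    return sxy / sxx


def relative_transcription_activity(luc_luminescence, times_s, mu_fluorescence):
    """LUC luminescence divided by the 4-MU accumulation rate of the GUS control."""
    rate = slope_per_second(times_s, mu_fluorescence)
    if rate <= 0:
        raise ValueError("GUS control shows no 4-MU accumulation")
    return luc_luminescence / rate


# Hypothetical example: 20 reads, one every 2 min (120 s)
times = [120 * i for i in range(20)]
mu = [500 + 2.5 * t for t in times]  # made-up signal rising by 2.5 units per second
activity = relative_transcription_activity(10_000, times, mu)
```

Dividing by the GUS-derived rate corrects each well for differences in transformation efficiency, so values can be compared across transcription factor constructs.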

GUS staining

GUS staining was performed as described previously⁴⁹, with minor modifications. Leaf tissue was fixed in 90% (v/v) acetone for 12 h at 4 °C. After washing with 100 mM phosphate buffer (pH 7.0), samples were transferred into 1 mg ml⁻¹ 5-bromo-4-chloro-3-indolyl glucuronide (X-Gluc) GUS staining solution and vacuum was applied five times for 2 min each. The samples were incubated at 37 °C for 48 h. To clear chlorophyll, samples were incubated in 90% (v/v) ethanol at room temperature. Cross-sections were prepared with a razor blade and images were taken with an Olympus BX41 light microscope.

To quantify GUS activity, a fluorometric MUG assay was used⁴⁹ as described above, using 200 mg of mature leaf tissue. A standard curve of ten 4-MU standards was used to determine the 4-MU concentration in each sample.
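The standard-curve step can be sketched as follows, assuming a linear relationship between fluorescence and 4-MU concentration over the range of the standards. The ten standard concentrations, the readings and the function names are hypothetical and chosen only for illustration.

```python
def fit_standard_curve(concentrations, fluorescence):
    """Fit fluorescence = slope * concentration + intercept by least squares."""
    n = len(concentrations)
    mean_c = sum(concentrations) / n
    mean_f = sum(fluorescence) / n
    sxy = sum((c - mean_c) * (f - mean_f) for c, f in zip(concentrations, fluorescence))
    sxx = sum((c - mean_c) ** 2 for c in concentrations)
    slope = sxy / sxx
    return slope, mean_f - slope * mean_c


def concentration_from_fluorescence(reading, slope, intercept):
    """Invert the standard curve to obtain the 4-MU concentration of a sample."""
    return (reading - intercept) / slope


# Ten hypothetical 4-MU standards (arbitrary concentration units) and made-up readings
standards = [0, 1, 2, 5, 10, 20, 30, 40, 50, 60]
readings = [40 + 150 * c for c in standards]
slope, intercept = fit_standard_curve(standards, readings)
sample_concentration = concentration_from_fluorescence(1540, slope, intercept)
```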

Confocal microscopy

To test the bundle-sheath-specific expression of mTurquoise2-H2B, recently expanded leaf 3 of seven-day-old seedlings was prepared for confocal microscopy by scraping the adaxial side of the leaf blade two to three times with a sharp razor blade, transferring to water to avoid drying out and then mounting on a microscope slide with the scraped surface facing upwards. Confocal imaging was performed on a Leica TCS SP8 X using a 10× air objective (HC PL APO CS2 10×0.4 Dry) with optical zoom, and hybrid detectors for fluorescent protein and chlorophyll autofluorescence detection. The following excitation (Ex) and emission (Em) wavelengths were used for imaging: mTurquoise2 (Ex = 442, Em = 471–481), chlorophyll autofluorescence (Ex = 488, Em = 672–692).

SEM

For the de-etiolation experiment of rice and sorghum, samples from four to six individual seedlings for each time point (0 h, 6 h, 12 h and 48 h) were collected for electron microscopy. Leaf segments (around 2 mm²) were excised with a razor blade and immediately fixed in 2% (v/v) glutaraldehyde and 2% (w/v) formaldehyde in 0.05–0.1 M sodium cacodylate (NaCac) buffer (pH 7.4) containing 2 mM calcium chloride. Samples were vacuum infiltrated overnight, washed five times in 0.05–0.1 M NaCac buffer and post-fixed in 1% (v/v) aqueous osmium tetroxide, 1.5% (w/v) potassium ferricyanide in 0.05 M NaCac buffer for three days at 4 °C. After osmication, samples were washed five times in deionized water and post-fixed in 0.1% (w/v) thiocarbohydrazide for 20 min at room temperature in the dark. Samples were then washed five times in deionized water and osmicated for a second time for 1 h in 2% (v/v) aqueous osmium tetroxide at room temperature. Samples were washed five times in deionized water and subsequently stained in 2% (w/v) uranyl acetate in 0.05 M maleate buffer (pH 5.5) for three days at 4 °C and washed five times afterwards in deionized water. Samples were then dehydrated in an ethanol series, and transferred to acetone and then to acetonitrile. Leaf samples were embedded in Quetol 651 resin mix (TAAB Laboratories Equipment) and cured at 60 °C for two days. Ultra-thin sections of embedded leaf samples were prepared and placed on Melinex (TAAB Laboratories Equipment) plastic coverslips mounted on aluminium SEM stubs using conductive carbon tabs (TAAB Laboratories Equipment), sputter-coated with a thin layer of carbon (around 30 nm) to avoid charging, and imaged in a Verios 460 scanning electron microscope at a 4 keV accelerating voltage and 0.2 nA probe current using the concentric backscatter detector in field-free (low-magnification) or immersion (high-magnification) mode (working distance 3.5–4 mm, dwell time 3 µs, 1,536 × 1,024 pixel resolution). 
For observing plastid ultrastructure, SEM stitched maps were acquired at 10,000× magnification using the FEI MAPS automated acquisition software. Greyscale contrast of the images was inverted to allow easier visualization.

Enrichment of bundle-sheath nuclei using fluorescence-activated cell sorting

To purify the nuclei population from whole leaves, recently expanded leaves 3 from five seven-day-old wild-type rice seedlings were chopped on ice in nuclei buffer (10 mM Tris-HCl, pH 7.4, 10 mM NaCl, 3 mM MgCl₂, 0.5 mM spermidine, 0.2 mM spermine, 0.01% Triton X, 1× Roche complete protease inhibitors, 1% BSA and Protector RNase inhibitor) with a sharp razor blade. The suspension was filtered through a 70-μm filter and subsequently through a 35-μm filter. Nuclei were stained with Hoechst and purified by fluorescence-activated cell sorting (FACS) on an AriaIII instrument, using a 70-μm nozzle. Nuclei were collected in an Eppendorf tube containing BSA and Protector RNase inhibitor. Using the same approach, nuclei from the bundle-sheath marker line expressing mTurquoise2-H2B were isolated. Nuclei were sorted on the basis of the mTurquoise2 fluorescent signal. Nuclei were collected in minimal nuclei buffer (10 mM Tris-HCl, pH 7.4, 10 mM NaCl, 3 mM MgCl₂, RNase inhibitor and 0.05% BSA). After collection, nuclei were spun down in a swinging bucket centrifuge at 405g for 5 min, with reduced acceleration and deceleration. Nuclei were resuspended in minimal nuclei buffer and mixed with the unspun whole leaf nuclei population to achieve a proportion of approximately 25% mTurquoise2-positive nuclei. The bundle-sheath enriched nuclei population was sequenced using the 10X Genomics Gene Expression platform with v.3.1 chemistry, and sequenced on the Illumina NovaSeq 6000 with 150-bp paired-end chemistry.
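The mixing step amounts to a simple mass balance. The sketch below computes how many whole-leaf nuclei would need to be added to a sorted population to reach a target fraction of mTurquoise2-positive nuclei. The function name, the nuclei counts and the assumed positive fractions of the two populations are hypothetical; the text specifies only the target of approximately 25%.

```python
def whole_leaf_nuclei_needed(n_sorted, frac_sorted, frac_whole, target):
    """Number of whole-leaf nuclei to add so the mix reaches `target` positive fraction.

    Solves (n_sorted*frac_sorted + n_whole*frac_whole) / (n_sorted + n_whole) = target.
    """
    if not frac_whole < target < frac_sorted:
        raise ValueError("target must lie between the two population fractions")
    return n_sorted * (frac_sorted - target) / (target - frac_whole)


# Hypothetical values: 10,000 sorted nuclei assumed 100% positive,
# whole-leaf population assumed 5% positive, target 25%
n_whole = whole_leaf_nuclei_needed(10_000, 1.0, 0.05, 0.25)
mixed_fraction = (10_000 * 1.0 + n_whole * 0.05) / (10_000 + n_whole)
```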

Chlorophyll quantification

Seedlings were harvested at specified time points during de-etiolation and immediately flash-frozen in liquid nitrogen. Frozen tissue was ground into fine powder and the weight was measured before suspending the tissue in 1 ml of 80% (v/v) acetone. After vortexing, the tissue was incubated on ice for 15 min with occasional mixing of the suspension. The tissue was spun down at 15,700g at 4 °C and the supernatant was removed. The extraction was repeated, and supernatants were pooled before measuring the absorbance at 663.6 nm and 646.6 nm in a spectrophotometer. The total chlorophyll content was determined as described previously⁵⁰.
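A minimal sketch of the calculation is given below. It assumes the simultaneous equations widely used for 80% acetone extracts read at 663.6 nm and 646.6 nm (commonly attributed to Porra et al., 1989); whether ref. 50 uses exactly these coefficients is an assumption, as are the extract volume, the tissue weight and the function names.

```python
def chlorophyll_ug_per_ml(a663_6, a646_6):
    """Chlorophyll a, b and total (µg per ml extract) from two absorbance readings.

    Coefficients are ASSUMED (80% acetone, 663.6 and 646.6 nm); check against ref. 50.
    """
    chl_a = 12.25 * a663_6 - 2.55 * a646_6
    chl_b = 20.31 * a646_6 - 4.91 * a663_6
    total = 17.76 * a646_6 + 7.34 * a663_6
    return chl_a, chl_b, total


def chlorophyll_per_fresh_weight(total_ug_per_ml, extract_volume_ml, tissue_mg):
    """Total chlorophyll in µg per mg of ground tissue."""
    return total_ug_per_ml * extract_volume_ml / tissue_mg


# Hypothetical readings; 2 ml assumes two pooled 1-ml extractions, 50 mg is made up
chl_a, chl_b, total = chlorophyll_ug_per_ml(0.5, 0.2)
per_mg = chlorophyll_per_fresh_weight(total, extract_volume_ml=2.0, tissue_mg=50.0)
```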

Nuclei extraction and single-nucleus RNA-seq (10X RNA-seq)

Frozen tissue from each time point (one biological replicate per time point, eight time points) was crushed using a bead bashing approach, and nuclei were released from homogenate by resuspending in nuclei buffer (10 mM Tris-HCl, pH 7.4, 10 mM NaCl and 3 mM MgCl₂). The resulting suspension was passed through a 30-μm filter. To enrich the filtered solution for nuclei, an Optiprep (Sigma) gradient was used. Enriched nuclei were then stained with Hoechst, before being FACS purified (BD Influx Software v.1.2.0.142). Purified nuclei were run on the 10X Gene Expression platform with v.3.0 chemistry. Single-cell libraries were made following the manufacturer’s protocol and sequenced on the Illumina NovaSeq 6000 with 150-bp paired-end chemistry. Libraries were sequenced to an average saturation of 63% (14% s.d.) and aligned either to the rice (O. sativa, subspecies Nipponbare; MSU annotation)⁵¹ or sorghum (S. bicolor v.3.0.1; JGI annotation)⁵² genome. Chloroplast and mitochondrial reads were removed. For each time point, an average of 12,524 nuclei were sequenced (6,405 s.d.), with an average median unique molecular identifier (UMI) of 1,152 (420 s.d.) across both species. Doublets were removed using DoubletFinder⁵³.

Nuclei extraction and single-nucleus RNA-seq (sci-RNA-seq3)

Each individual frozen seedling (10–12 individual seedlings per time point) was crushed using a bead bashing approach in a 96-well plate, after which homogenate was resuspended in nuclei buffer. Resulting suspensions were passed through a 30-μm filter. Washed nuclei were then reverse-transcribed with a well-specific primer. After this step, remaining pool and split steps for sci-RNA-seq3 were followed as outlined previously²⁶. We note the same approach was used to sequence the 48-h time point; however, a population of six plants was used instead of individual seedlings. Libraries were sequenced to an average saturation of 80% (5% s.d.), and sequenced on the Illumina NovaSeq 6000 with 150-bp paired-end chemistry. Reads were aligned to either the rice or the sorghum genome, as described above. Chloroplast and mitochondrial reads were removed. For 0–12-h time points, an average of 6,527 nuclei were sequenced (5,039 s.d.), with an average median UMI of 423 (41 s.d.) across both species. For the 48-h time point, 77,208 and 82,748 nuclei were sequenced with a median UMI of 757 and 740 for rice and sorghum, respectively.

Nuclei extraction and single-nucleus RNA-seq (10X Multiome)

Fresh seedling tissue was collected after 0 or 12 h light treatment (two biological replicates per species, each with two to four technical replicates per time point; n = 11). Fresh tissue was chopped finely on ice in green room conditions in nuclei buffer. The resulting homogenate was filtered using a 30-μm filter. Nuclei were enriched using Optiprep gradient. No FACS was performed. Nuclei were run on the 10X Multiome platform with v.1.0 chemistry. Single-cell libraries were made following the manufacturer’s protocol, and sequenced on the Illumina NovaSeq 6000 with 150-bp paired-end chemistry. Reads were aligned to either the rice or the sorghum genome, as described above. Chloroplast and mitochondrial reads were removed. For each sample, an average of 1,923 nuclei were sequenced (1,334 s.d.), with an average median UMI of 1,644 (646 s.d.) and median ATAC fragments 10,251 (7,001 s.d.) across both species.

Nuclei clustering

Transcriptional atlases were generated separately for each species using Seurat⁵⁴. Nuclei were first aggregated across various time points (ranging from 0 to 48 h) and methods (10X and sci-RNA-seq3). The integrated dataset was subjected to clustering, using the top 2,000 variable features that were shared across all datasets. Each cluster contained nuclei sampled from all time points, indicating that clustering was driven predominantly by cell type rather than by time after exposure to light (Extended Data Fig. 2). Subsequent UMAP projections were constructed using the first 30 principal components. UMAP projections of mesophyll and bundle-sheath sub-clusters in rice and sorghum, respectively, were achieved using genes found to be significantly differentially expressed in response to light as variable features. To analyse the rice bundle-sheath-specific mTurquoise line, we integrated two treatment replicates into a unified dataset. For this dataset, we clustered using the first 30 principal components. Cluster-specific markers were identified using the FindMarkers() command (adjusted P value < 0.01). To determine the correspondence between the mTurquoise-positive cluster and clusters within the rice-RNA atlas, we compared the lists of cluster-specific markers (adjusted P value < 0.01, specificity > 2) to those obtained from the rice atlas. For the 10X-multiome (RNA + ATAC) clustering we used Signac⁵⁵. Biological and technical replicates for each species were integrated, and clustering was conducted using the first 50 principal components derived from expression data. After the initial peak calling using Cell Ranger (10X Genomics), peaks were subsequently re-called using MACS2 (ref. ⁵⁶). Differentially accessible peaks between cell types were identified using the FindMarkers() command (adjusted P value < 0.05, per cent threshold > 0.3), before being associated with the nearest gene (±2,000 bp from transcription start site).
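The final step, associating a peak with the nearest gene within ±2,000 bp of a transcription start site, can be illustrated with the sketch below. It is not the Signac implementation: it assumes a single chromosome, measures distance from the peak midpoint, and uses hypothetical gene names and coordinates.

```python
def assign_peak_to_gene(peak_start, peak_end, tss_positions, window=2000):
    """Return the gene whose TSS is closest to the peak midpoint, or None if
    no TSS lies within `window` bp (single-chromosome simplification)."""
    midpoint = (peak_start + peak_end) // 2
    best_gene, best_distance = None, None
    for gene, tss in tss_positions.items():
        distance = abs(midpoint - tss)
        if distance <= window and (best_distance is None or distance < best_distance):
            best_gene, best_distance = gene, distance
    return best_gene


# Hypothetical TSS coordinates on one chromosome
tss = {"gene_A": 10_000, "gene_B": 13_000}
near = assign_peak_to_gene(11_000, 11_400, tss)  # midpoint 11,200: closer to gene_A
far = assign_peak_to_gene(20_000, 20_400, tss)   # no TSS within 2,000 bp
```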

Orthology analyses

We determined gene orthologues between rice and sorghum using OrthoFinder⁵⁷. We constructed pan-transcriptome atlases by selecting expressed rice and sorghum genes that had cross-species orthologues. To construct the pan-transcriptome atlas, orthologue conversions were performed in a one-to-one manner, meaning that if multiple orthologues for a gene were found across species, only one was retained. We integrated these datasets with Seurat using the clustering approaches described above. To assign cell identities, we drew on cell-type labels that were previously assigned to each species separately and mapped them onto the pan-transcriptome clusters. To assess specific transcriptional differences in gene expression between the bundle-sheath clusters of sorghum and rice within this dataset, we used the FindMarkers() command (adjusted P value < 0.05). Sorghum DOF transcription factor orthologue names kept the same numerical identifier as their rice orthologues.
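The one-to-one conversion can be sketched as a filter over orthologue pairs. The text does not say which orthologue is retained when several exist, so keeping the first pair encountered is an assumption here, and the gene identifiers are placeholders.

```python
def one_to_one_orthologues(pairs):
    """Keep at most one partner per gene in either species.

    `pairs` is an iterable of (rice_gene, sorghum_gene); the first pair seen
    for a gene wins (ASSUMED tie-breaking rule).
    """
    used_rice, used_sorghum = set(), set()
    kept = {}
    for rice_gene, sorghum_gene in pairs:
        if rice_gene in used_rice or sorghum_gene in used_sorghum:
            continue
        kept[rice_gene] = sorghum_gene
        used_rice.add(rice_gene)
        used_sorghum.add(sorghum_gene)
    return kept


# Placeholder identifiers, not real gene models
pairs = [
    ("rice_1", "sorghum_1"),
    ("rice_1", "sorghum_2"),  # second orthologue of rice_1: dropped
    ("rice_2", "sorghum_1"),  # sorghum_1 already used: dropped
    ("rice_3", "sorghum_3"),
]
mapping = one_to_one_orthologues(pairs)
```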

To examine the overlap of cell-type-specific gene-expression markers between the two species, we identified cell-type markers from our main transcriptional dataset using FindMarkers() (adjusted P value < 0.05, min.pct > 0.1). We note that some genes were found to be significant across multiple cell types. To assess the significance of the overlap between cell types across species, we converted genes to orthogroups and conducted a Fisher’s exact test, with the total number of orthogroups in the dataset as the background. The proportion of conserved marker genes for each cell type across species ranged from 43% for mesophyll (184 out of 426 rice marker genes conserved in sorghum) to 13% for bundle sheath (31 out of 229 rice marker genes conserved in sorghum). We note that by relying on orthogroups, we included higher-order orthology relationships beyond a one-to-one manner.
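For illustration, the overlap test can be written as a hypergeometric tail probability, which is what a one-sided Fisher's exact test for enrichment computes. Whether the original analysis used a one-sided or two-sided test is not stated, so the one-sided form is an assumption, and the counts in the example are made up.

```python
from math import comb


def overlap_enrichment_pvalue(n_set_a, n_set_b, n_overlap, n_background):
    """P(overlap >= n_overlap) when n_set_b orthogroups are drawn at random from
    n_background, of which n_set_a are markers in the other species."""
    max_overlap = min(n_set_a, n_set_b)
    total = comb(n_background, n_set_b)
    tail = sum(
        comb(n_set_a, k) * comb(n_background - n_set_a, n_set_b - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Toy example: 3 marker orthogroups per species, all 3 shared, 6 orthogroups in total
p_all_shared = overlap_enrichment_pvalue(3, 3, 3, 6)
p_any = overlap_enrichment_pvalue(3, 3, 0, 6)
```

Using the total number of orthogroups in the dataset as `n_background` matches the background described above.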

Next, we assessed consistent and differential partitioning of gene-expression patterns among each cell-type pair (15 pairs total). To do this, we first calculated differentially expressed genes for each cell-type pair by pseudo-bulking transcriptomes of individual cell types across 0–12-h time points. Next, we identified partitioned expression patterns between cell types using an ANCOVA model implemented in DESeq2 (adjusted P < 0.05). To perform cross-species comparisons of cell-type pairs, we first converted differentially expressed genes to their orthogroup. We then overlapped each cell-type pair across species, using orthogroup membership, and evaluated the significance of these overlaps using the Fisher’s exact test, with the total number of orthogroups as background. Finally, to distinguish whether a gene displayed consistent or differential partitioning in a particular cell type, we examined whether its fold change expression was higher or lower compared with its counterpart in the corresponding cell type of the other species.
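One way to read the last step is as a comparison of the direction of the fold change in each species. The sketch below labels an orthogroup as consistently partitioned when the log2 fold change between a cell-type pair has the same sign in rice and sorghum, and as differentially partitioned when the signs differ. This sign-based rule and the function name are assumptions made for illustration.

```python
def classify_partitioning(log2fc_rice, log2fc_sorghum):
    """Compare the direction of cell-type enrichment between species (ASSUMED rule)."""
    if log2fc_rice == 0 or log2fc_sorghum == 0:
        return "unpartitioned"
    same_direction = (log2fc_rice > 0) == (log2fc_sorghum > 0)
    return "consistent" if same_direction else "differential"


# Made-up log2 fold changes for one cell-type pair
labels = [
    classify_partitioning(1.8, 2.4),    # enriched in the same cell type in both species
    classify_partitioning(1.2, -0.9),   # enrichment swaps between species
]
```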

Differential expression and accessibility responses to light

We discovered cell-type-specific differentially expressed genes during the first 12 h of light by pseudo-bulking transcriptional profiles. To create pseudo-bulk profiles for each cell type, we first refined our nuclei clusters through re-clustering mesophyll, epidermal and vasculature cell classes separately, before selecting sub-clusters that most strongly expressed known cell-type marker genes. For each cell type, we calculated the first and second principal component of these bulked profiles and found differentially expressed genes through fitting linear models to each of these principal components, as well as those that responded linearly with time using DESeq2 (adjusted P < 0.05). We treated the assay with which the nuclei were sequenced (10X or sci-RNA-seq3) as a covariate. In this list of differentially expressed genes, we also included genes that were differentially expressed between time points 0 h and 12 h in a pairwise test (adjusted P < 0.05). Next, to uncover the different trends of gene expression among differentially expressed genes, we clustered genes using hierarchical clustering, choosing clustering cut-offs that resulted in 10 rice and 18 sorghum clusters that contained at least 10 genes. To visualize the expression of these clusters, we scaled the expression and fitted a non-linear model to capture the dominant expression trend. Accessible chromatin within canonical photosynthesis genes was found through pseudo-bulking accessible chromatin by cell type. Accessible peaks needed to be within 2,000 bp of the gene body. Only one peak per gene was retained for subsequent analyses, and extreme outliers were removed (around 5% of called peaks). To compare peak accessibility across species, reads per peak were re-normalized between 0 and 1. Significant differences in accessibility between cell types of this group of genes were assessed using a Student’s t-test (one-sided).
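The re-normalization of reads per peak to the range 0 to 1 can be sketched as a min-max rescaling within each species. The text does not name the exact transformation, so min-max scaling is an assumption, and the read counts are invented.

```python
def rescale_zero_one(values):
    """Min-max rescale a list of per-peak read counts to the interval [0, 1]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]  # no spread: avoid division by zero
    return [(v - lowest) / (highest - lowest) for v in values]


# Invented pseudo-bulk read counts for three peaks in one species
scaled = rescale_zero_one([12, 40, 96])
```

Rescaling each species separately puts accessibility on a common scale despite differences in sequencing depth, which is what allows the cross-species comparison.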

GO analyses

To identify GO terms associated with cell-type-specific genes and genes that swap expression patterns in rice and sorghum leaves, we performed singular enrichment analysis using the web-based tool AgriGO v.2.0 (ref. ⁵⁸). Oryza sativa or S. bicolor gene identifiers were used for the input sample list, and the whole genome of the respective plant species was used as background.

Cis-element analyses

We detected cell-type-specific accessible motifs within each cell type using the chromVAR function⁵⁹ implemented in Signac. In brief, this approach detected over-represented cis-regulatory elements within the JASPAR2020 plant taxon group⁶⁰ among peaks that are differentially accessible across cell-type clusters. GC enrichment and genomic backgrounds used for statistical tests were derived from BSGenome assembled genomes⁶¹. The same approach was also used to detect light-responsive cis-elements, using light- and dark-treated nuclei within each cell type. We overlapped enriched cis-regulatory elements identified across species by selecting the top 25 most significantly over-represented motifs (adjusted P < 0.05), before computing a Fisher’s exact test using all computed motifs as background, and then clustered the resulting motifs using TOBIAS⁶².

To find consistently and differentially partitioned orthologous genes within our multiome gene-expression dataset, we found mesophyll and bundle-sheath-specific genes in rice and sorghum, respectively, using the FindMarkers() command, with a P value threshold cut-off of 0.01 and an expression specificity above 1.25. To find over-represented motifs within differentially partitioned genes, we correlated peak accessibility with gene expression using the LinkPeaks() command and kept only those peaks that were significantly associated with gene expression. We identified enriched cis-elements within these peaks using the FindMotifs() command, ranking by significance (adjusted P < 0.05). Because the resulting significance depends on the subset of the genome chosen as background, we iterated the FindMotifs() command over 100 permutations to rank motifs that were consistently reported as enriched. We then averaged each motif’s respective rank across the 100 permutations to create a final ranked value (Supplementary Table 13).
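The rank-averaging step can be sketched as follows, assuming each permutation yields an ordered list of motifs from most to least significant. The motif labels and the three example permutations are placeholders; the analysis described above used 100 permutations.

```python
def mean_rank_across_permutations(rankings):
    """Average each motif's 1-based rank over all permutations and sort by it."""
    ranks = {}
    for ranking in rankings:
        for position, motif in enumerate(ranking, start=1):
            ranks.setdefault(motif, []).append(position)
    averaged = [(motif, sum(r) / len(r)) for motif, r in ranks.items()]
    return sorted(averaged, key=lambda item: item[1])


# Three placeholder permutations of three placeholder motifs
rankings = [
    ["motif_A", "motif_B", "motif_C"],
    ["motif_B", "motif_A", "motif_C"],
    ["motif_A", "motif_C", "motif_B"],
]
final_ranking = mean_rank_across_permutations(rankings)
```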

To quantify the occurrence of DOF-binding sites, we extracted the genomic sequence of peaks that were proximal to the transcription start site (±1,500 bp). If a peak was proximal to two transcription start sites, it was assigned to the closer one. We then implemented Find Individual Motif Occurrences (FIMO) to quantify the number of DOF consensus sites within these chromatin regions (P value threshold = 0.005). We chose the DOF2 (MA0020.1) motif as representative of the core DOF consensus sequence AAAG.
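As a simplified illustration of site counting, the sketch below counts exact matches to the core DOF consensus AAAG on both strands of a sequence. This is a plain string search, not FIMO, which scores a position weight matrix and applies a P value threshold; the example sequence and function name are made up.

```python
def count_core_dof_sites(sequence, core="AAAG"):
    """Count exact matches to `core` on the forward and reverse strands."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    reverse_core = "".join(complement[base] for base in reversed(core))
    sequence = sequence.upper()

    def count(pattern):
        # Slide one base at a time so adjacent sites are all found
        hits, start = 0, sequence.find(pattern)
        while start != -1:
            hits += 1
            start = sequence.find(pattern, start + 1)
        return hits

    return count(core) + count(reverse_core)


# Made-up sequence: AAAG at positions 0 and 8, CTTT (reverse strand) at position 4
n_sites = count_core_dof_sites("AAAGCTTTAAAG")
```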

We implemented analysis of motif enrichment (AME) to detect DOF transcription factor motifs enriched within C. laxum (http://phytozome-next.jgi.doe.gov/info/Claxum_v1_1), H. vulgare (Hvulgare_r1)⁶³ or B. distachyon (Bdistachyon_314_v3.0)⁶⁴ homologues of genes consistently partitioned to the rice and sorghum bundle sheath. To identify homologues, the NCBI BLASTN tool v.2.15.0 was used by comparing coding sequences, and the top identified homologue for each gene was selected for cis-element enrichment analyses. We used 1,000 bp upstream of the transcription start site for each homologous gene and tested against reported plant motifs present within the JASPAR database.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
