Categories: NATURE

The conformational space of RNase P RNA in solution

RNase P RNA sample preparation

RNase P RNA was transcribed in vitro for 3 h in transcription buffer (20 mM potassium–HEPES pH 7.5, 25 mM MgCl₂, 1 mM DTT) using recombinant T7 bacteriophage RNA polymerase and a double-stranded DNA template amplified by PCR from a synthesized pUC18 plasmid. This plasmid encodes the full-length RNase P RNA sequence from G. stearothermophilus (GenBank accession number M19021.1) with an upstream T7 RNA polymerase promoter sequence, GGATCCAGCTCGAAATTAATACGACTCACTATA. After in vitro transcription (IVT), the magnesium pyrophosphate precipitate was removed by centrifugation at 13,000 rpm for 10 min in a high-speed benchtop centrifuge. RNase-free DNase I (New England BioLabs) and CaCl₂ (final concentration 5 mM) were added to the IVT supernatant, and the solution was incubated at 37 °C for an additional 30 min to fully digest the double-stranded DNA template. NaCl was then added to a final concentration of 200 mM, and the RNA was refolded overnight at 4 °C before further purification. The refolded RNA was purified by fast protein liquid chromatography (GE HealthCare ÄKTA pure) using a previously reported non-denaturing method¹⁰ on a size-exclusion chromatography (SEC) column (HiLoad 16/600 Superdex 200 pg). The column was pre-equilibrated with SEC elution buffer (25 mM Tris pH 7.5, 100 mM NaCl, 1 mM MgCl₂), and monomeric RNase P RNA was eluted at a flow rate of 1.0 ml min⁻¹, separated from aggregated species. Eluting RNase P RNA was detected by absorbance at 280 and 260 nm; peak fractions were collected on the basis of the SEC chromatogram (Supplementary Fig. 4a) and stored at 4 °C for a short period (typically less than 30 min) before AFM imaging. A small aliquot of the elution fractions was used to check purity and folding by 8% native PAGE and electrospray ionization (ESI) mass spectrometry (Supplementary Fig. 4b,c).

AFM experiment and image processing

All AFM experiments were performed in a physiologically relevant buffer solution using a Cypher VRS AFM (Asylum Research, Oxford Instruments) at 4 °C in amplitude-modulated dynamic a.c. mode (tapping mode). To immobilize RNA on the mica surface, the mica supports were freshly treated with 1-(3-aminopropyl) silatrane (APS) (synthesized in-house). A 50 mM APS stock solution was diluted 300-fold in ultrapure water just before use and applied to freshly cleaved muscovite mica (Highest Grade V1 mica discs, Ted Pella). After 30 min, the mica surface was rinsed several times with ultrapure water (Pico Pure Water system, Avidity) and dried gently with filtered nitrogen gas. Then, 10 μl of 8 nM RNase P RNA in the purification buffer (25 mM Tris pH 7.5, 100 mM NaCl, 1 mM Mg²⁺) was deposited on the APS-functionalized mica surface for 20 min and washed at least three times with 500 μl AFM buffer (10 mM MES pH 6.8, 10 mM KCl and 1 mM MgCl₂). FASTSCAN-D-SS AFM probes (Bruker; resonance frequency of 110 kHz in fluid, spring constant of 0.25 N m⁻¹, tip apex radius of 1 nm) were used for high-resolution imaging. A pulsed blue laser (BlueDrive) integrated into the AFM was positioned at the rear of the cantilever for photothermal excitation in tapping mode, while a superluminescent diode was positioned near the free end of the cantilever to detect cantilever deflection. AFM images for particle cropping were collected with a scan size of 500 × 500 nm² (1,024 × 1,024 pixels) at a scan rate of 1.0 Hz. The AFM tip was carefully engaged on the sample surface, with the piezoelectric scanner driven at an initial setpoint of 450 mV and a free amplitude of 500 mV. The setpoint was reduced stepwise (10 mV per step) as the tip approached the surface and was adjusted during imaging on the basis of image quality.
For images used for 3D topological structure determination, the raw images were processed with Gwyddion⁶² in the following steps: (1) plane levelling by fitting a second-order polynomial to the particle-free region; (2) correction of horizontal scars to remove streak artefacts; and (3) fast Fourier transform (FFT) filtering to remove high-frequency noise. Single-particle images were cropped from the processed images and converted into text files containing X, Y, Z topography information for structure recapitulation.
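Steps (1) and (3) can be approximated in NumPy. This is a minimal sketch assuming a height image stored as a 2D array and a boolean mask marking particle-free pixels; scar correction is Gwyddion-specific and is not reproduced, the low-pass cutoff is a hypothetical parameter rather than the value used here, and all function names are illustrative. The pixel size follows from the acquisition settings (500 nm over 1,024 pixels).

```python
import numpy as np

PIXEL_NM = 500.0 / 1024  # 500 nm scan over 1,024 pixels, about 0.49 nm per pixel


def level_plane(height, background_mask, order=2):
    """Subtract a 2D polynomial fitted only to particle-free (background) pixels."""
    ny, nx = height.shape
    y, x = np.mgrid[0:ny, 0:nx]
    x = x / max(nx - 1, 1)  # normalise coordinates for a well-conditioned fit
    y = y / max(ny - 1, 1)
    terms = [(i, j) for i in range(order + 1) for j in range(order + 1 - i)]
    design = np.stack([(x**i * y**j).ravel() for i, j in terms], axis=1)
    mask = background_mask.ravel()
    coeffs, *_ = np.linalg.lstsq(design[mask], height.ravel()[mask], rcond=None)
    return height - (design @ coeffs).reshape(ny, nx)


def fft_lowpass(height, pixel_nm, cutoff_nm):
    """Remove spatial frequencies finer than cutoff_nm with a hard circular mask."""
    fy = np.fft.fftfreq(height.shape[0], d=pixel_nm)
    fx = np.fft.fftfreq(height.shape[1], d=pixel_nm)
    radial = np.hypot(*np.meshgrid(fy, fx, indexing="ij"))
    spectrum = np.fft.fft2(height)
    spectrum[radial > 1.0 / cutoff_nm] = 0.0  # DC term (radial = 0) is always kept
    return np.fft.ifft2(spectrum).real


def write_xyz(height, pixel_nm, path):
    """Write one 'X Y Z' line per pixel (X, Y in nm; Z in the image's height units)."""
    y, x = np.mgrid[0:height.shape[0], 0:height.shape[1]]
    xyz = np.column_stack([x.ravel() * pixel_nm, y.ravel() * pixel_nm, height.ravel()])
    np.savetxt(path, xyz, fmt="%.4f")
```

Fitting the polynomial only on the masked background keeps the particles themselves from biasing the levelled plane.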

Enzymatic assay of 5′ processing of precursor tRNA

Human pre-tRNA^Gln was produced by in vitro transcription using T7 bacteriophage RNA polymerase. To avoid non-templated N + 1 addition by T7 RNA polymerase, PCR products bearing two consecutive 2′-O-methyl modifications at the 5′ end of the template strand were used as DNA templates. The pre-tRNA^Gln was purified by 8 M urea denaturing PAGE and eluted in 300 mM sodium acetate buffer (pH 5.3). The eluted pre-tRNA^Gln was then refolded at 10 μM using a stepwise, temperature-ramped protocol (3 min at 90 °C to fully denature the pre-tRNA, followed by rapid cooling to 4 °C on a PCR machine at a maximum rate of 5 °C s⁻¹). After refolding, 2 μM of each in vitro-transcribed pre-tRNA was incubated with 2 μM purified RNase P complex at 37 °C in 25 mM Tris–HCl pH 7.5, 100 mM NaCl and MgCl₂ at concentrations ranging from 1 mM to 50 mM (Fig. 4f and Extended Data Fig. 8). Reactions were terminated by adding denaturing gel loading buffer and kept at 4 °C. Samples were analysed on an 8 M urea Tris–borate–EDTA denaturing preparative 10% polyacrylamide (29:1 acrylamide:bisacrylamide) gel and stained with SYBR Gold (Invitrogen by Thermo Fisher Scientific). 5′ processing of pre-tRNA^Gln by RNase P RNA produces a 76-nucleotide product, which can be separated from the remaining pre-tRNA^Gln by denaturing PAGE (Extended Data Fig. 8b,c and Supplementary Fig. 5). Bands were quantified using ImageJ, and the data were analysed using GraphPad Prism 10. All experiments were performed at least in duplicate.
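The fraction of substrate processed can be computed from background-subtracted band intensities as product/(product + precursor). In this sketch, the optional length correction assumes that SYBR Gold signal scales with nucleotide count, and the precursor length is a hypothetical parameter because it is not stated here; the function name is illustrative and the Prism analysis is not reproduced.

```python
import numpy as np


def fraction_processed(product, precursor, product_nt=76, precursor_nt=None):
    """Fraction of pre-tRNA converted to the 76-nt 5'-processed product.

    product, precursor: background-subtracted band intensities (e.g. from ImageJ),
    one value per lane. If precursor_nt is given, each intensity is divided by its
    length, assuming stain signal is proportional to nucleotide count (assumption).
    """
    p = np.asarray(product, dtype=float)
    s = np.asarray(precursor, dtype=float)
    if precursor_nt is not None:
        p = p / product_nt
        s = s / precursor_nt
    return p / (p + s)
```

Applied lane by lane across the MgCl₂ series, this gives one processed fraction per Mg²⁺ concentration for plotting.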

Dynamic light scattering

Before dynamic light scattering (DLS) measurements, the RNA samples were centrifuged (5 min, 10,000 rpm, 11,000g) in a refrigerated Sigma benchtop centrifuge. DLS experiments were performed at the Biophysics Resource facility, National Cancer Institute, using a DynaPro Plate Reader III (Wyatt Technology), which comprises a laser light source (830 nm laser diode), a plate reader cell, a detector at a fixed angle of 90°, a photomultiplier that amplifies the signal and a correlator. A 20 μl aliquot of sample solution was loaded into a well of a 384-well microwell plate. The sample plate, placed pairwise on cushion plate holders of the four-place swinging-bucket rotor of a benchtop centrifuge, was centrifuged for 5 min at 1,500g to eliminate air bubbles in the sample wells. Before the plate was placed into the instrument, its bottom surface was wiped gently with soft lens-cleaning tissue (Olympus Optical). At least 20 successive DLS measurements were performed per sample after a 1-min equilibration period. The translational diffusion coefficient (D_T), molecular weight (M_w) and hydrodynamic radius (R_h) were calculated by averaging a series of autocorrelation profiles, at a fixed RNA concentration of 0.20 mg ml⁻¹.
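The hydrodynamic radius follows from D_T through the Stokes–Einstein relation, R_h = k_BT/(6πηD_T). The sketch below assumes 25 °C and the viscosity of water (0.890 mPa s), because the measurement temperature and buffer viscosity are not stated here; the instrument software may apply its own corrections, and the function name is illustrative.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def hydrodynamic_radius_nm(d_t_cm2_per_s, temp_c=25.0, viscosity_mpa_s=0.890):
    """Stokes-Einstein radius (nm) from a translational diffusion coefficient (cm^2/s)."""
    d_t = d_t_cm2_per_s * 1e-4        # cm^2/s -> m^2/s
    temp_k = temp_c + 273.15
    eta = viscosity_mpa_s * 1e-3      # mPa s -> Pa s
    return K_B * temp_k / (6.0 * math.pi * eta * d_t) * 1e9
```

For example, D_T = 5 × 10⁻⁷ cm² s⁻¹ under these assumptions corresponds to R_h of about 4.9 nm.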

LC–ESI mass spectrometry experiments

Liquid chromatography–mass spectrometry (LC–MS) experiments were performed on a 6520 Accurate-Mass Q-TOF LC/MS system equipped with a dual electrospray source and operated in positive-ion mode. Samples consisted of 2 μM RNase P RNA in RNase-free double-distilled water. Acetonitrile was added to all samples to a final concentration of 10%. Data acquisition and analysis were performed using MassHunter Workstation (v.B.06.01). Mass spectra were analysed and deconvoluted using MassHunter Qualitative Analysis software (v.B.07.00) with the BioConfirm workflow. The supernatant was transferred to polypropylene injection vials for LC–MS analysis. LC–MS was performed with a TSQ Quantiva triple quadrupole mass spectrometer (Thermo Fisher Scientific) operating in selected reaction monitoring mode with positive ESI and with a Shimadzu 20AC-XR system using a 2.1 × 50 mm, 2.7 µm Waters Cortecs C18 column.
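In positive-ion mode each ESI peak satisfies m/z = M/z + m(H⁺), so two adjacent charge-state peaks determine both the charge z and the neutral mass M. The sketch illustrates only this arithmetic; it is not the deconvolution algorithm used by MassHunter BioConfirm, and the function names are hypothetical.

```python
PROTON = 1.007276  # proton mass, Da


def charge_from_adjacent_peaks(mz_low, mz_high):
    """Charge z of the higher-m/z peak, given its neighbour at charge z + 1.

    From mz_high = M/z + H and mz_low = M/(z + 1) + H it follows that
    z = (mz_low - H) / (mz_high - mz_low).
    """
    return round((mz_low - PROTON) / (mz_high - mz_low))


def neutral_mass(mz, z):
    """Neutral mass M (Da) of a [M + zH]^z+ ion observed at m/z."""
    return z * (mz - PROTON)
```

Averaging neutral_mass over all assigned charge states in a series gives a simple mass estimate.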

Recapitulation and accuracy of topological structure

AFM images of full-length RNase P RNA show a wide variety of distinct, heterogeneous particle shapes in terms of molecular surface. After image processing, as described above, we selected and uniformly cropped isolated particles (that is, particles that do not overlap with any neighbouring molecule), resulting in a total of 161 individual topological conformers. The individual 3D structures for the 161 AFM images were calculated, and their structure quality and accuracy were estimated using the HORNET software package: https://github.com/PNAI-CSB-NCI-NIH/HORNET (ref. ⁴⁴). Three of these 161 particles did not converge within a reasonable computing time. These particles appear elongated, possibly as a result of partial unfolding, and were therefore excluded from further analysis. Briefly, the native information and initial 3D fold were built using a combination of the crystal structure (PDB 2A64), SimRNA⁶³ and Coot⁶⁴: SimRNA was used to model residues not resolved in the crystal structure, and Coot was used for subsequent structure refinement. Then, the initial model was aligned against the AFM image by finding the rotation and translation of the model that maximize the agreement score between the experimental AFM image and the image calculated from the model in that orientation. Next, the optimized orientation of the initial model was saved, and a configuration file was created for the dynamic fitting step using CafeMol^65,66, for a trajectory with a total of 20 million frames (approximately 0.1 μs). Of note, the native-structure-based local-contact, stacking and base-pairing energies were scaled by weighting factors of 5, 9 and 9, respectively. Finally, the trajectory and structure accuracy were analysed and evaluated with HORNET, and the top models were selected. The same procedure was performed for all 158 particles (Supplementary Figs. 1–3).
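The orientation search can be illustrated with a minimal sketch under a simple collision model. Each atom is treated as a sphere of radius atom_radius probed by a spherical tip of radius tip_radius, the substrate lies beneath the lowest atom, and agreement is scored by Pearson correlation over random orientations at a fixed lateral position. The radii, the scoring choice, the random search and all function names are illustrative assumptions; this is not HORNET's scoring function or search strategy.

```python
import numpy as np
from scipy.spatial.transform import Rotation


def simulate_afm(coords, grid_x, grid_y, tip_radius=10.0, atom_radius=1.0):
    """Height map (Å) of atom spheres probed by a spherical tip (collision model).

    tip_radius defaults to 10 Å (the 1 nm apex radius); atom_radius is an assumption.
    """
    gx, gy = np.meshgrid(grid_x, grid_y)            # (ny, nx) lateral grid, Å
    substrate = coords[:, 2].min() - atom_radius     # surface under the lowest atom
    reach = tip_radius + atom_radius
    height = np.zeros(gx.shape)                      # bare substrate has height 0
    for x, y, z in coords:
        d2 = (gx - x) ** 2 + (gy - y) ** 2
        # tip apex height when the tip sphere just touches this atom sphere
        apex = z - substrate + np.sqrt(np.clip(reach**2 - d2, 0.0, None)) - tip_radius
        height = np.maximum(height, np.where(d2 < reach**2, apex, 0.0))
    return height


def pearson(a, b):
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def best_orientation(coords, afm_image, grid_x, grid_y, center_xy=(0.0, 0.0),
                     n_trials=300, seed=0, **tip):
    """Random search over rotations; the model centroid is placed at center_xy."""
    rng = np.random.default_rng(seed)
    quats = rng.normal(size=(n_trials, 4))   # normalised Gaussian 4-vectors: uniform rotations
    quats[0] = [0.0, 0.0, 0.0, 1.0]          # identity (scalar-last), i.e. the input pose
    centred = np.asarray(coords, float) - np.mean(coords, axis=0)
    shift = np.array([center_xy[0], center_xy[1], 0.0])
    best_score, best_rot = -np.inf, None
    for q in quats:
        rot = Rotation.from_quat(q)
        model = rot.apply(centred) + shift
        score = pearson(afm_image, simulate_afm(model, grid_x, grid_y, **tip))
        if score > best_score:
            best_score, best_rot = score, rot
    return best_score, best_rot
```

A real search would also refine the lateral translation and use a finer rotational sampling; the random search here only shows the score-maximization idea.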

SAXS experimental data acquisition and ensemble analysis

The SAXS experiments were carried out using the in-house instrument at the NCI SAXS core facility (BioSAXS-2000, Rigaku) located at the National Cancer Institute. The photon energy was 8.04 keV (λ = 1.54 Å). The combination of OptiSAXS optics, two-dimensional Kratky collimation and a sample-to-detector distance of 0.484 m gave a q range of 0.0051 < q < 0.6767 Å⁻¹, where q is the magnitude of the momentum transfer, q = (4π/λ)sinθ, 2θ is the scattering angle and λ is the wavelength of the radiation. To minimize radiation damage and obtain a good signal-to-noise ratio, 8 image frames were captured for each sample at 0.55 mg ml⁻¹ (4 μM), in the same buffer conditions used for AFM imaging at various Mg²⁺ concentrations, using a flow cell with an exposure time of 900 s per frame. The two-dimensional scattering patterns were collected using a Dectris PILATUS 100K detector and converted to one-dimensional SAXS curves by radial averaging. The one-dimensional data from the 8 frames were then averaged after per-curve evaluation of outliers using SAXSLab v.4.0.2 (Rigaku). To examine the effect of sample concentration on the scattering profiles, we repeated the SAXS experiments with samples at lower and higher concentrations (0.38 and 0.76 mg ml⁻¹). The similarity between the SAXS intensity profiles obtained at different concentrations was assessed using CorMap⁶⁷ and the reduced χ², neither of which indicated statistically significant concentration-dependent effects (Supplementary Fig. 6).
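The q definition and the two similarity checks can be sketched in Python. The CorMap-style test counts the longest run of same-sign differences between two curves and converts it to a p-value using the exact longest-run distribution for n fair coin tosses; it follows the idea behind CorMap but is not the ATSAS implementation. The plain reduced χ² without any scaling between curves is an assumption, and all function names are illustrative.

```python
import numpy as np


def q_from_two_theta(two_theta_rad, wavelength=1.54):
    """Momentum transfer q = (4*pi/lambda) * sin(theta), in inverse Å."""
    return 4.0 * np.pi * np.sin(np.asarray(two_theta_rad) / 2.0) / wavelength


def reduced_chi2(i1, s1, i2, s2):
    """Reduced chi^2 between two curves on the same q grid, errors added in quadrature."""
    i1, s1, i2, s2 = map(np.asarray, (i1, s1, i2, s2))
    return float(np.sum((i1 - i2) ** 2 / (s1**2 + s2**2)) / (len(i1) - 1))


def longest_run(signs):
    best = run = 0
    prev = None
    for s in signs:
        run = run + 1 if s == prev else 1
        prev = s
        best = max(best, run)
    return best


def p_longest_run_at_least(n, c):
    """P(longest run >= c) in n fair coin tosses, by dynamic programming."""
    if c <= 1:
        return 1.0 if n >= 1 else 0.0
    probs = np.zeros(c)   # probs[k]: current run length k and no run >= c so far
    probs[1] = 1.0        # state after the first toss
    for _ in range(n - 1):
        new = np.zeros(c)
        new[1] = 0.5 * probs.sum()    # next toss differs: run restarts at 1
        new[2:] = 0.5 * probs[1:-1]   # next toss matches: run grows (if still < c)
        probs = new
    return float(1.0 - probs.sum())


def cormap_p_value(i1, i2):
    diff = np.sign(np.asarray(i1, float) - np.asarray(i2, float))
    diff = diff[diff != 0]
    return p_longest_run_at_least(len(diff), longest_run(diff))
```

A small p-value means that one curve lies systematically above or below the other over a long stretch of q, which is the signature of a genuine difference rather than noise.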

To classify the conformers determined from HORNET and AFM (Fig. 2a) by an orthogonal method, we used SAXS, which is independent of the AFM-derived results.

The SAXS data and standard analyses for all scattering experiments recorded at different Mg²⁺ concentrations are available in the SASBDB public repository⁶⁸; the accession codes are listed in the Data availability section of this article. Structural models derived from HORNET and AFM data collected at 1 mM Mg²⁺ (Fig. 2a) were used to fit the experimental SAXS profile recorded at each Mg²⁺ concentration. SAXS and AFM are distinct yet complementary approaches for structural investigation. SAXS provides information averaged over the entire ensemble of molecules, at concentrations in the μM to mM range, with the scattering signal accumulated over a timescale of minutes. AFM, by contrast, directly visualizes individual particles at low nM concentrations, where each particle represents a snapshot of a single conformer of the ensemble at the time of immobilization. However, calculating structures for the thousands of particles observed by AFM is impractical in terms of both labour and computational resources. A complementary ensemble method such as SAXS is therefore better suited to characterizing the solution ensemble in terms of the AFM-derived models.

As the population fraction of each conformer is unknown, and species may interconvert on various timescales, we assume that the experimental SAXS profile can be described by an ensemble of models in which the total SAXS intensity, I_Total(q), is a linear combination of the profiles of n conformers. The contribution of each conformer is determined by minimizing the discrepancy between the calculated profile and the experimental data, that is:

$${I}_{{\rm{Total}}}(q)=\mathop{\sum }\limits_{i=1}^{n}{\nu }_{i}{I}_{i}(q)\qquad(1)$$

where ν_i is the volume fraction of conformer i, whose scattering intensity profile is I_i(q). The 158 AFM-derived conformers were used as a pool of reference structures for SAXS profile fitting, and the theoretical I(q) profile of each particle was computed using CRYSOL⁶⁹. The optimized volume fraction ν_i of each component of the ensemble was obtained by minimizing the discrepancy (χ²) between the back-calculated I_Total and the experimental curve, using an in-house Python script that implements an iterative least-squares procedure⁷⁰ with a trust region reflective algorithm⁷¹, subject to the bounds 0 ≤ ν_i ≤ 1.
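Equation (1) with bounded volume fractions can be fitted with SciPy's trust region reflective solver (`scipy.optimize.least_squares` with `method="trf"`). This is a sketch, not the in-house script. The free overall scale factor, the soft sum-to-one penalty on ν_i and the χ² normalization by N − 1 are assumptions, and the theoretical profiles (for example from CRYSOL) are assumed to have been interpolated onto the experimental q grid beforehand.

```python
import numpy as np
from scipy.optimize import least_squares


def fit_volume_fractions(i_exp, sigma, profiles, sum_weight=100.0):
    """Fit I_exp(q) ~ scale * sum_i nu_i I_i(q) with 0 <= nu_i <= 1.

    profiles: (n_conformers, n_q) theoretical intensities on the experimental q grid.
    Returns (nu, scale, reduced chi^2).
    """
    i_exp, sigma, profiles = (np.asarray(a, float) for a in (i_exp, sigma, profiles))
    n = profiles.shape[0]

    def residuals(params):
        scale, nu = params[0], params[1:]
        data_term = (i_exp - scale * (nu @ profiles)) / sigma
        # soft constraint keeping the fractions summing to one (assumption)
        return np.append(data_term, sum_weight * (nu.sum() - 1.0))

    x0 = np.concatenate([[i_exp[0] / profiles[:, 0].mean()], np.full(n, 1.0 / n)])
    lower = np.zeros(n + 1)
    upper = np.concatenate([[np.inf], np.ones(n)])   # scale >= 0; 0 <= nu_i <= 1
    fit = least_squares(residuals, x0, bounds=(lower, upper), method="trf")
    scale, nu = fit.x[0], fit.x[1:]
    chi2 = np.sum(((i_exp - scale * (nu @ profiles)) / sigma) ** 2) / (len(i_exp) - 1)
    return nu, scale, float(chi2)
```

Without the sum-to-one penalty, the overall scale and the fractions would be degenerate (doubling one and halving the other gives the same curve), which is why some normalization is needed when a free scale is fitted.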

The best fit to the experimental data (χ² = 1.7) was obtained with an ensemble of 3 of the 158 conformers, with volume fractions of 18% (S31), 76% (S69) and 6% (S53) (Fig. 2b,c). However, other combinations of models also fitted the SAXS profile, with χ² values ranging from 1.8 to 15 (Fig. 2f). This is expected, because the conformers most populated during data collection contribute most to the total scattered intensity, whereas the particles calculated from AFM images represent only a small sample of the billions of particles immobilized on the mica surface.

The three conformers that best fit the SAXS data exhibited different levels of compactness, from very compact (S31) to partially open (S69) to fully extended (S53). On the basis of this observation, we classified the remaining 155 models into 3 clusters (classes C1, C2 and C3) using the 3 representative structures as references. For this task, we partitioned the 155 models according to the cosine distance⁷² between their theoretical SAXS profiles, I(q), and those of the reference structures, with a minimum threshold of 0.1 per similarity cluster. The largest group was C2, with 117 models (S69 as reference), which also corresponds to the largest volume fraction observed by SAXS, followed by C1 with 31 models (S31 as reference) and C3 with 7 models (S53 as reference).
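Nearest-reference assignment by cosine distance can be sketched with `scipy.spatial.distance.cdist`. The sketch simply assigns each model to the reference profile with the smallest cosine distance; how the 0.1 threshold enters the clustering is not specified here, so it is left out, and the function name is hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist


def assign_classes(model_profiles, reference_profiles):
    """Label each model with the index of the reference I(q) closest in cosine distance.

    model_profiles: (n_models, n_q); reference_profiles: (n_refs, n_q), same q grid.
    Returns (labels, distances to the assigned reference).
    """
    d = cdist(np.asarray(model_profiles, float), np.asarray(reference_profiles, float),
              metric="cosine")
    return d.argmin(axis=1), d.min(axis=1)
```

With the references ordered as (S31, S69, S53), labels 0, 1 and 2 would correspond to C1, C2 and C3. Because cosine distance ignores overall magnitude, only the shape of each I(q) curve drives the assignment.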

Indeed, particles classified by SAXS-profile similarity to the representative structures share topological features with their class reference, as shown in Fig. 2d, where each class is overlaid with its reference structure. This analysis captures global conformational similarity and ignores local conformational fluctuations (Fig. 2d). The three topological classes are defined primarily by the relative orientation of the two modular domains of RNase P RNA, the substrate specificity (S-) and catalytic (C-) domains, which are connected by a flexible linker (Fig. 2g), with R_g values ranging from 46 to 58 Å (Fig. 2e).
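For reference, the R_g of a coordinate model is the (optionally mass-weighted) root-mean-square distance of its atoms from their centre. R_g values derived from SAXS or CRYSOL also reflect the hydration shell and scattering contrast, so this coordinate-only sketch is illustrative rather than a reproduction of the values in Fig. 2e; the function name is illustrative.

```python
import numpy as np


def radius_of_gyration(coords, masses=None):
    """R_g of a set of coordinates; uniform weights unless masses are given."""
    coords = np.asarray(coords, dtype=float)
    w = np.ones(len(coords)) if masses is None else np.asarray(masses, dtype=float)
    center = np.average(coords, axis=0, weights=w)
    sq_dist = np.sum((coords - center) ** 2, axis=1)
    return float(np.sqrt(np.average(sq_dist, weights=w)))
```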

To further investigate the variation of χ² during ensemble fitting, we optimized the volume fractions for different combinations of the 158 particles. In this procedure, we first fitted each of the 158 conformers independently, mapped the χ² values and related them to the class (C1, C2 or C3) to which each particle belongs. We then drew random combinations of a given number (1, 2, 3, 5, 10, 20 or 50) of conformers from each class and mapped the χ² over 10,000 rounds, where 1 denotes one structure from each class, 2 denotes two structures from each class, and so on.
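The random-combination test can be sketched as repeated sampling of k conformers per class followed by an ensemble fit and summary statistics of χ². For speed, this sketch swaps in SciPy's non-negative least squares (`scipy.optimize.nnls`) for the trust region reflective fit, lets the non-negative weights absorb the overall scale, and caps k at the class size (sampling without replacement). These choices and the function names are assumptions.

```python
import numpy as np
from scipy.optimize import nnls


def chi2_of_subset(i_exp, sigma, profiles):
    """Reduced chi^2 of I_exp ~ sum_i w_i I_i(q) with w_i >= 0 (NNLS)."""
    A = profiles.T / sigma[:, None]   # error-weighted design matrix, (n_q, k)
    b = i_exp / sigma
    w, _ = nnls(A, b)
    resid = b - A @ w
    return float(resid @ resid / (len(b) - 1))


def sample_class_combinations(i_exp, sigma, profiles, labels, k, rounds=10_000, seed=0):
    """Draw k conformers per class (capped at class size) and summarise chi^2."""
    i_exp, sigma = np.asarray(i_exp, float), np.asarray(sigma, float)
    profiles, labels = np.asarray(profiles, float), np.asarray(labels)
    rng = np.random.default_rng(seed)
    members = [np.flatnonzero(labels == c) for c in np.unique(labels)]
    chi2 = np.empty(rounds)
    for r in range(rounds):
        picks = np.concatenate([rng.choice(m, size=min(k, len(m)), replace=False)
                                for m in members])
        chi2[r] = chi2_of_subset(i_exp, sigma, profiles[picks])
    return {"mean": float(chi2.mean()), "std": float(chi2.std()), "min": float(chi2.min())}
```

Running this for each k in (1, 2, 3, 5, 10, 20, 50) yields the mean, standard deviation and minimum χ² per setting, the quantities compared in the next paragraph.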

In terms of χ² fluctuation, C2-like particles showed better agreement with the experimental SAXS profile, consistent with these particles representing an intermediate topology between closed and open conformations with respect to the S- and C-domains. Combinations of C2 and C1 conformers reached χ² values as low as 2.1, but combinations of C1, C2 and C3 gave the best χ² and the lowest standard deviation (Fig. 2f). Combinations including more conformers yielded better fits in terms of the χ² mean (µ), standard deviation (σ) and minimum, but the fit did not improve beyond 20 conformers from each class (µ = 2.0, σ = 0.5). C3 conformers showed the largest deviation from the experimental SAXS data (µ = 31.4), the largest χ² spread (σ = 7.7) and the largest minimum χ² (24.2).

Given that every method has limitations, we applied two additional independent methods to classify the 158 AFM-derived conformers: clustering in PC space⁷³ and the ensemble optimization method (EOM)^45,74. Clustering in PC space uses orthogonal eigenvectors that describe the maximal variance in the spatial distribution of the 158 models. Seven components were sufficient to cover more than 70% of the variance. Using these components, we clustered the 158 models into three main clusters (Extended Data Fig. 4c). PC analysis and clustering were performed using the Bio3d package. SAXS data were analysed using the EOM package with its genetic algorithm module (GAJOE), which searches for a subensemble of models from a pool that is sufficient to describe the SAXS data. For this analysis, we set the maximum number of conformers per ensemble to 50, the number of ensembles per generation to 50 and the minimum number of models per ensemble to 1, with no curve repetition. EOM was run for 100 cycles of repeated searching. The best EOM fits had χ² values of 2.3 and 1.1 for SAXS data recorded at 1 mM and 5 mM Mg²⁺, respectively. As shown in Extended Data Fig. 4d, the R_g distribution of RNase P RNA shows three main high-frequency populations: the largest centred around 51 Å, the next largest between 47.5 and 50 Å and the smallest with R_g > 55 Å. At 5 mM Mg²⁺, the largest population shifts to smaller R_g values (R_g < 47.5 Å), and no significant counts are observed for extended conformers with R_g > 52 Å.
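PCA followed by clustering in PC space can be sketched with NumPy and SciPy as an analogue of the Bio3d workflow (Bio3d itself is an R package). The sketch assumes models that are already superposed, flattened to one coordinate vector per model; Ward linkage and the function name are assumptions rather than the settings used here.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def pca_cluster(coords, n_clusters=3, variance_target=0.70):
    """PCA of superposed models (n_models, n_atoms, 3), then Ward clustering of PC scores.

    Keeps the smallest number of PCs whose cumulative variance reaches variance_target.
    Returns (cluster labels starting at 1, explained-variance ratios, PCs used).
    """
    X = np.asarray(coords, float).reshape(len(coords), -1)
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)   # equivalent to covariance PCA
    explained = S**2 / np.sum(S**2)
    n_pc = min(int(np.searchsorted(np.cumsum(explained), variance_target)) + 1, len(S))
    scores = U[:, :n_pc] * S[:n_pc]                     # projections on the first PCs
    labels = fcluster(linkage(scores, method="ward"), t=n_clusters, criterion="maxclust")
    return labels, explained, n_pc
```

Clustering on a few leading PCs, rather than on all coordinates, discards low-variance noise directions before the linkage step.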

Isothermal titration calorimetry

RNase P RNA (4 µM) was dialysed overnight at 4 °C against isothermal titration calorimetry (ITC) buffer (20 mM HEPES pH 7.5, 100 mM NaCl, 0.1 mM MgCl₂) before ITC measurements. MgCl₂ hexahydrate (Sigma-Aldrich) was dissolved in the dialysis buffer to a final concentration of 20 mM and used as the titrant. The differential heat of Mg²⁺-induced compaction of RNase P RNA was monitored using a MicroCal PEAQ-ITC instrument (Malvern). After pre-equilibration at 37 °C and an initial delay of 180 s, 0.4 µl of titrant was injected, followed by 18 serial injections (2.0 µl each) spaced 720 s apart. The stirring speed was 750 rpm and the reference power was set at 8 µcal s⁻¹. Thermogram data were recorded as power (µcal s⁻¹) over time. The heat associated with each titration step was then integrated and plotted against the molar ratio of Mg²⁺ to RNA. Each binding isotherm was corrected for dilution effects using the RNA concentration measured with a NanoDrop after the ITC experiment.
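The molar ratio on the isotherm axis changes with each injection because titrant enters the cell while solution is displaced from an overfilled cell. A minimal sketch uses a commonly applied correction for this displacement: titrant X_t = X_0(v/V)(1 − v/2V) and macromolecule M_t = M_0(1 − v/2V)/(1 + v/2V), with v the cumulative injected volume and V the cell volume. It is applied here to the RNase P RNA titration settings above; the 200 µl cell volume and the function name are assumptions.

```python
import numpy as np


def itc_molar_ratios(injection_volumes_ul, cell_volume_ul, syringe_conc_M, cell_conc_M):
    """Cell concentrations after each injection, corrected for displaced volume.

    Returns (titrant in cell, macromolecule in cell, titrant:macromolecule ratio).
    """
    v = np.cumsum(np.asarray(injection_volumes_ul, dtype=float))
    f = v / cell_volume_ul
    titrant = syringe_conc_M * f * (1.0 - f / 2.0)
    macromolecule = cell_conc_M * (1.0 - f / 2.0) / (1.0 + f / 2.0)
    return titrant, macromolecule, titrant / macromolecule


injections = [0.4] + [2.0] * 18   # 0.4 µl first, then 18 x 2.0 µl
mg, rna, ratio = itc_molar_ratios(injections, cell_volume_ul=200.0,
                                  syringe_conc_M=20e-3, cell_conc_M=4e-6)
```

Under these assumptions, the first injection corresponds to a Mg²⁺:RNA ratio of about 10 and the last to about 993.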

The beet western yellow virus (BWYV) pseudoknot RNA was transcribed by IVT and purified by SEC (Supplementary Fig. 7) on a HiLoad Superdex 75 pg 16/60 prepacked column (Cytiva) with elution buffer (20 mM HEPES pH 7.5, 100 mM NaCl). The purified BWYV pseudoknot RNA was denatured by adding EDTA to the buffer to 200 mM and heating at 80 °C for 10 min. The denatured BWYV pseudoknot RNA (2 µM) was then dialysed overnight at 4 °C against 20 mM HEPES pH 7.5. The titrant was prepared by dissolving MgCl₂ hexahydrate (Sigma-Aldrich) in dialysis buffer to a final concentration of 60 mM. The differential heat of Mg²⁺-induced refolding of BWYV pseudoknot RNA was monitored using the MicroCal PEAQ-ITC (Malvern). After pre-equilibration at 25 °C and an initial delay of 90 s, 0.4 µl of titrant was injected, followed by 22 serial injections (1.0 µl each) spaced 100 s apart. The stirring speed was 750 rpm and the reference power was set at 4 µcal s⁻¹.

To process the ITC data, the raw thermogram (Supplementary Fig. 8) of each heat-compensation profile was used to derive the isotherms (Fig. 2b). Data integration and background heat subtraction were performed using the PEAQ-ITC analysis software suite (Malvern).
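Peak integration can be sketched as a trapezoid-rule integral of each injection window after subtracting a flat baseline estimated from the settled tail of that window, followed by normalization to heat per mole of injected titrant. This is a simplified stand-in, not the PEAQ-ITC software's integration or baseline algorithm; the 10% tail fraction and the function names are hypothetical.

```python
import numpy as np


def integrate_injections(time_s, power_ucal_s, injection_times_s, baseline_fraction=0.1):
    """Heat (µcal) per injection above a flat per-window baseline."""
    time_s = np.asarray(time_s, float)
    power = np.asarray(power_ucal_s, float)
    edges = list(injection_times_s) + [time_s[-1] + 1e-9]   # last window includes final sample
    heats = []
    for start, end in zip(edges[:-1], edges[1:]):
        sel = (time_s >= start) & (time_s < end)
        t, p = time_s[sel], power[sel]
        n_base = max(1, int(len(p) * baseline_fraction))
        y = p - np.median(p[-n_base:])                        # baseline from the settled tail
        heats.append(float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t))))  # trapezoid rule
    return np.array(heats)


def heat_per_mole(heats_ucal, injection_volumes_ul, syringe_conc_M):
    """kcal per mol of titrant: µl x mol/L = µmol, and µcal/µmol = cal/mol."""
    umol = np.asarray(injection_volumes_ul, float) * syringe_conc_M
    return np.asarray(heats_ucal, float) / umol / 1000.0
```

A flat tail baseline is adequate only when each peak has returned to baseline before the next injection, as with the 720 s spacing used for RNase P RNA.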

Invariant core and directional motions

To assess the structural relationships among all 158 resolved conformers of RNase P RNA, we used the Bio3d package⁴⁶, which implements functions for identifying invariant core residues and for PCA. Invariant core analysis identifies the rigid, structurally invariant region of a pool of models. In this procedure, atomic displacements are quantified by iterative alignment of all structural coordinates: each round of superposition determines, for each residue, an ellipsoid volume that covers the variance of its X, Y and Z coordinates among the structures, and the residues with the largest fluctuation are removed before the next superposition iteration^46,75. After successive rounds, the remaining residues, which occupy a small ellipsoid volume, define the core region. To define an ellipsoid-volume cutoff that represents an invariant core, we calculated the derivative of the ellipsoid volume as a function of the number of residues remaining in the refined structure; its minimum is reached at an ellipsoid volume of 150 Å³ (Extended Data Fig. 9). Using the defined core residues, we then characterized the dominant directional motions across all 158 resolved conformers using PCA, which is well suited to summarizing similarities and differences in conformational space among conformers of the same molecule. Briefly, the orthogonal eigenvectors, termed principal components (PCs), capture most of the variance in the structural data while reducing its dimensionality and retaining most of the information⁴⁶; the mathematical description is given in HORNET⁴⁴. For PCA, the structure models were first superposed using the core as a reference, and the principal components were then computed for the pool of 158 structures. The number of components was determined from a plot of variance against number of components (Extended Data Fig. 5a).
The first five and first seven components cover more than 70% and 80% of the total fluctuations, respectively.
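A minimal sketch of iterative core identification, assuming coordinate arrays of shape (n_models, n_residues, 3). Models are superposed (Kabsch) on the current core, a 1σ ellipsoid volume is computed per position from the coordinate covariance, and the most variable position is removed until the summed volume falls below a threshold. The summed-volume stopping rule (rather than the derivative-based cutoff described above) and the function names are assumptions; Bio3d's `core.find` may differ in detail.

```python
import numpy as np


def kabsch(mobile, target):
    """Rotation R and translation t minimising |mobile @ R.T + t - target|^2."""
    mc, tc = mobile.mean(axis=0), target.mean(axis=0)
    U, _, Vt = np.linalg.svd((mobile - mc).T @ (target - tc))
    d = np.sign(np.linalg.det(Vt.T @ U.T))            # avoid improper rotations
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, tc - mc @ R.T


def superpose_all(coords, core, n_iter=3):
    """Superpose every model onto a running mean, fitting on core positions only."""
    out = np.array(coords, dtype=float)
    ref = out[0].copy()
    for _ in range(n_iter):
        for k in range(len(out)):
            R, t = kabsch(out[k][core], ref[core])
            out[k] = out[k] @ R.T + t
        ref = out.mean(axis=0)
    return out


def ellipsoid_volumes(coords):
    """Per-position volume of the 1-sigma ellipsoid of the scatter across models."""
    vols = []
    for pos in np.transpose(coords, (1, 0, 2)):       # pos: (n_models, 3)
        eig = np.clip(np.linalg.eigvalsh(np.cov(pos.T)), 0.0, None)
        vols.append(4.0 / 3.0 * np.pi * np.sqrt(np.prod(eig)))
    return np.array(vols)


def find_core(coords, stop_volume=150.0, min_core=4):
    """Drop the most variable position each round until summed volume <= stop_volume."""
    core = np.arange(np.shape(coords)[1])
    while len(core) > min_core:
        vols = ellipsoid_volumes(superpose_all(coords, core)[:, core])
        if vols.sum() <= stop_volume:
            break
        core = np.delete(core, vols.argmax())
    return core
```

Re-superposing after each removal matters: once a flexible position is dropped, it no longer distorts the fit of the remaining positions.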

Correlation of sequence conservation and structural fluctuations

The SCS were calculated taking into account sequence homology, secondary structure conservation and compensatory changes via long-distance tertiary interactions. The multiple sequence alignment, secondary structure consensus and sequence occupancy of 114 bacterial type B RNase P sequences were taken from the Rfam database⁶¹ (accession number RF00011; see Data availability), and sequence alignment and visualization were performed using the cross-platform Jalview⁷⁶. Primary sequence conservation derived from the multiple sequence alignment was further transformed and standardized in a probabilistic framework using the ConSurf server⁷⁷. The 3D structural conservation scores were calculated as 1 minus the fractional per-residue r.m.s.f. across all 158 conformers, normalized between 0 and 1 (Fig. 3a).
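The structural conservation score can be computed from superposed coordinates as 1 minus the min–max normalized per-residue r.m.s.f. Reading "fractional" as min–max scaling is an assumption, as are the function names.

```python
import numpy as np


def rmsf(coords):
    """Per-residue r.m.s. fluctuation about the mean; coords (n_models, n_res, 3), superposed."""
    coords = np.asarray(coords, dtype=float)
    mean = coords.mean(axis=0)
    return np.sqrt(np.mean(np.sum((coords - mean) ** 2, axis=2), axis=0))


def structural_conservation(coords):
    """1 - min-max scaled r.m.s.f.: 1 for the most rigid residue, 0 for the most mobile."""
    f = rmsf(coords)
    span = f.max() - f.min()
    if span == 0:
        return np.ones_like(f)   # all residues equally mobile
    return 1.0 - (f - f.min()) / span
```

The resulting per-residue scores lie on the same 0 to 1 scale as the normalized sequence conservation, which makes the two directly comparable along the sequence.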