Categories: NATURE

Intermittent rate coding and cue-specific ensembles support working memory

Subjects

Three adult male rhesus monkeys (Macaca mulatta), aged 11, 12 and 8 years, participated in the experiment. Monkeys A, H and J weighed 11, 14 and 12 kg, respectively. All surgical and experimental procedures were approved by the Stanford University Institutional Animal Care and Use Committee and were in accordance with the policies and procedures of the National Institutes of Health.

Behavioural task

Stimuli were presented on a VIEWPixx3D monitor positioned at a viewing distance of 60 cm using Psychtoolbox and MATLAB (v.R2022a, MathWorks). Eye position was monitored at 1 kHz using an Eyelink 1000 eye-tracking system (SR Research). On each trial, the animals were presented with a cue at one of eight possible locations and reported this location after a brief memory delay to receive a fluid reward. Cues were square frames (green for monkey A, black for monkey H, white for monkey J) measuring 1° of visual angle on a side, and presented at 5–7° of eccentricity (depending on the session).

Monkeys initiated behavioural trials by fixating a central fixation spot presented on a uniform grey background. After the monkeys had maintained fixation for 600–800 ms (randomly selected on each trial), a cue appeared for 50 ms at one of eight possible locations separated by 45° around fixation. Cue presentation was followed by a delay period that varied randomly from 1,400 to 1,600 ms. Following the delay period, the fixation spot disappeared and the animal was presented with one of two possible response screens. On MTS trials, two targets appeared (filled blue circles, radius 1° of visual angle (DVA)): one at the previously cued location and the other at one of the seven remaining non-cued locations. On MGS trials, no targets appeared. In either case, the animals received a reward of juice for making an eye movement to within 5 DVA of the previously cued location and then maintaining fixation for 200 ms. MTS and MGS trials were randomly interleaved such that the animals could not predict the trial type. Monkey J was trained on, and performed only, the MGS task. The animals had to maintain their gaze within either 3 DVA (monkey A) or 2 DVA (monkeys H and J) from fixation throughout the trial until the response stage. The intertrial interval was 300–1,000 ms following each correct response. Failure to acquire fixation, fixation breaks and incorrect responses were not rewarded and were followed by a 2,000 ms intertrial interval.

Surgical procedures and recordings

Monkeys were implanted with a titanium headpost to immobilize the head, and with a titanium chamber to provide access to the brain (see ref. ¹² for full details). In a previous study⁶⁶, we identified the frontal eye field based on its neurophysiological characteristics and ability to evoke saccades with electrical stimulation at low currents⁶⁷. Here we recorded from both area 8, within and anterior to the frontal eye field, and the principal sulcus (9/46) (monkey A, area 8; monkey H, areas 8 and 9/46; monkey J, area 9/46), using primate Neuropixels probes³⁹. During each session, we pierced the dura using a screw-driven, 21-gauge pointed cannula and lowered a single probe through this cannula using a combination of custom three-dimensional printed grids and motorized drives (NAN instruments). Probe trajectories spanned several cortical columns, as inferred from the broad distribution of preferred cue locations across neurons. Recordings were allowed to settle for around 30 min before the start of the experiment, to mitigate drift. We configured probes to record from 384 active channels in a contiguous block, allowing dense sampling of neuronal activity along a 3.84 mm span.

Neuropixels filter and digitize activity at the headstage separately for the action potential bands (300 Hz high-pass filter, 30 kHz sampling frequency) and local field potential (1 kHz low-pass filter, 2.5 kHz sampling frequency). Activity was monitored during experimental sessions and saved to disk using SpikeGLX (https://billkarsh.github.io/SpikeGLX/).

Data preprocessing

Spiking in the action potential band was identified and sorted offline using Kilosort3 (ref. ⁶⁸). Because we were interested in population-level coding of memory, we analysed both putative single- and multi-unit clusters identified by Kilosort. Spike times were aligned to a digital trigger on each trial, indicating cue onset, and corrected for a lag in stimulus presentation estimated offline using photodiode measurements from the stimulus display and the timing of the cue-evoked response. Neurons that fired fewer than 1,000 spikes in the experimental sessions (each roughly 3 h) were excluded from further analyses. Spike times were converted to smoothed firing rates (sampling interval, 10 ms) by representing each spiking event as a delta function and convolving this time series with a 100 ms boxcar. For CCG analyses, unsmoothed spikes times were binned with a width and timestep of 1 ms. Incorrect trials were rare (Fig. 1b) and were excluded from subsequent analysis. Waveform templates were localized in space using NeuropixelUtilities (https://djoshea.github.io/neuropixel-utils/). Local field potentials (LFPs) were sampled at 2,500 Hz. Offline, LFPs were filtered using a 2–200-Hz-bandpass, zero-phase Butterworth, notch filtered at 60 Hz and downsampled to 1,000 Hz. LFPs were transformed into the time-frequency domain using Morlet wavelets and downsampled to 100 Hz.

Statistics and reproducibility

All statistical tests are two-sided unless otherwise specified. Key findings and brain–behaviour relationships were evident in each of the three animals (Extended Data Figs. 2 and 7), and in each of the 25 individual experimental sessions (Extended Data Fig. 3). Sample size (three animals, 8,255 neurons) was chosen based on standards in the field. Each animal was exposed to every task manipulation. Within a session, task manipulations were randomized across trials. Neurons were recorded without bias, and electrodes placed to maximize signal-to-noise of the electrophysiological signal.

Functional subtyping

To determine the functional subtype of units⁶⁹ (Extended Data Fig. 1), we analysed firing rates during three time epochs: visual (0–300 ms after cue onset), memory (500–1,400 ms after cue onset) and motor (100–300 ms post-fixation offset). A unit was labelled as being selective during a given epoch if firing rates during that epoch were significantly modulated by cue location (one-way ANOVA, P < 0.05 criterion). Units were then sorted into functional subclasses based on the set epochs during which each unit was selective.

Classification of cue location

Firing-rate estimates for each unit and time point relative to cue onset were z-scored across trials before classification. We used linear classifiers to quantify the amount of information on the location of the cue in populations of simultaneously recorded units. We held out each trial for test one by one, training a logistic regression classifier (as implemented by fitclinear.m in MATLAB) to predict cue location using the population vector of firing rates. Specifically, classifiers were trained to discriminate the same cue location as the test trial from the opposite cue location, using the applicable subset of trials from the training set. Data were subsampled during training to equalize trial counts for the two conditions. A unique classifier was trained and tested for each time point relative to cue onset. ‘Classification accuracy’ reflects the proportion of correctly classified test trials (Fig. 2c); ‘classifier confidence’ is the non-thresholded value of the logistic function corresponding to the probability assigned by the classifier to the correctat test (Fig. 3).

Cross-state classification (Extended Data Fig. 6a) was similar, except that only values from trial time points labelled On or Off (as appropriate) entered the training set or were held out as a test trial. Test confidence values were averaged across the memory delay (500–1,400 ms after cue) to yield the final results.

Cross-temporal classification (Fig. 6b) was also similar, except that we used a split-half approach in which the classifiers for each time point were trained on half of the available population of trials and tested (cross-temporally) using the other half.

Mixture modelling of confidence

We used a mixture-modelling approach to test whether confidence during the memory delay (500–1,400 ms after cue) was best described as drawn from a one- or a two-state distribution (Extended Data Fig. 5). To do this, for each session and cue location we modelled the probability density function of confidence values during the memory delay as either a single beta distribution,

$$p(c)={\rm{Beta}}(c\,;\alpha ,\beta ),$$

or a mixture of two beta distributions,

$$p(c)=w\times {\rm{B}}{\rm{e}}{\rm{t}}{\rm{a}}(c\,;\alpha ,\beta )+(1-w)\times {\rm{B}}{\rm{e}}{\rm{t}}{\rm{a}}(c\,;{\alpha }^{{\prime} },{\beta }^{{\prime} }),$$

where c is confidence, $\alpha ,\beta ,{\alpha }^{{\prime} }$ and ${\beta }^{{\prime} }$ parameterize beta distribution(s) and w is the mixing coefficient. The best-fitting parameters of each model were identified by maximum-likelihood estimation using gradient descent in MATLAB. We used fourfold cross-validation on the population of trials to assess the likelihood of each model on held-out test data, and then normalized by the number of trials and changed the log likelihood to base 2 to yield the cross-validated score of each model in terms of bits per trial. Finally, we subtracted these two model scores and averaged across conditions to yield the difference in model performance for each session.

Note that our choice of beta distribution here is principled: it is extremely flexible, able to demonstrate a broad range of skewness and kurtosis and naturally accommodates bounded continuous variables such as confidence⁷⁰. This flexibility makes this analysis conservative, ensuring that the one-state model is capable of describing a broad range of empirical distributions. Nevertheless, results were not dependent on the exact modelling approach: similar results for our one-versus-two-state model comparison were obtained when using either Gaussian mixture or hidden Markov models.

Analysis of microsaccades

The horizontal and vertical eye position records were convolved with a Gaussian kernel (σ = 4.75 ms) to suppress noise before taking first derivatives, yielding the eye velocity along each dimension. We then took the root sum of squares of the horizontal and vertical velocities to obtain eye speed. We flagged peaks in this time series with a minimum peak height of 10° s⁻¹ and a minimum interpeak distance of 50 ms as microsaccades (ref. ⁷¹ and Extended Data Fig. 4), which were confirmed by visual inspection of the data.

Labelling of On and Off states

To identify On and Off states (Fig. 4a), we repeated the cue classification analysis described above 50 times, randomly shuffling the labels of the training set for each test trial. This yielded, for each trial, a null distribution of 50 confidence time series (Fig. 4a). We then z-scored each time point of the true confidence time series by the mean and standard deviation of this null distribution. Individually significant (above 1.96) z-values were cluster corrected for multiple comparisons over time⁷². In brief, we compared the sum of contiguous individually significant z-values with that expected by chance (randomization test). Clusters with a mass greater than the 95% percentile of the null were labelled On states; contiguous z-values falling below a conservative (P > 0.20) threshold for at least five consecutive time points were labelled Off states.

Tuning curves

To test whether On and Off states reflected coordinated changes in tuning across the neural population, we used a split-half approach. First, firing-rate estimates for each unit and time point relative to cue onset were z-scored across trials. Then, for each session, we randomly divided the population of units in half. We used one half of the units to identify On and Off states, as described above. Next, for each unit in the held-out population, we computed mean firing rate during the memory delay for each cue location separately for On and Off states, averaging across relevant time points and across trials. This yielded, for each unit, two eight-element vectors—the On and Off tuning functions. To align tuning functions across units, the preferred cue location for each was identified as the condition in which the sum of the On and Off functions was greatest, and assigned an arbitrary value of zero degrees. Alignment of tuning curves to the maximum-valued preferred cue in this way will necessarily produce a peak at zero degrees in the average tuning function, even in the absence of true tuning. To correct for this, for each unit we also computed null On and Off tuning functions by first shuffling cue labels across trials, aligned these to the preferred cue and subtracted them from the true On and Off tuning functions (Fig. 3d).

For demonstration purposes, we fit the average On and Off tuning functions with a difference of Gaussians using gradient descent in MATLAB. Difference of Gaussians is useful for describing tuning curves that show surround suppression⁷³.

Population firing rates

To describe how population firing rates evolved over the course of the trial, we averaged these across all units recorded in the same session and across all trials for the preferred cue location (greatest mean classification confidence during the memory delay), yielding a single time series for each session. We then normalized this time series by the mean and standard deviation of a 400 ms baseline period (−400 to 0 ms relative to cue onset), yielding a metric of population spiking in units of standard deviations above baseline (Fig. 4e, grey traces). We repeated this analysis for the memory delay, this time including only data points labelled On or Off (Fig. 4e, orange and blue traces).

Phase–state relationships

To determine whether the phases of different frequency components of the LFP were predictive of On and Off states, we first extracted the phase from the time-frequency representation of the LFP. Next, we identified the onset time of On and Off states during the memory delay across all trials within a session. Then, for each session, probe channel and frequency (4–60 Hz), we computed the (circular) mean phase at On state and Off state onset, and the magnitude of the angular difference between these means. If phase is predictive of state, this difference should be larger than that expected by chance. Accordingly, we obtained a null distribution of phase difference magnitudes by repeating this procedure 1,000 times, randomly permuting On and Off labels for each phase measurement on each iteration and used this null distribution to generate a z-score metric. Z-scores were averaged across channels, yielding a phase–state metric for each session and frequency of interest. Finally, we tested whether these scores were greater than zero for each frequency of interest (cluster-corrected randomization test).

Standard deviation of background noise

To ensure that changes in recording quality could not account for the presence of On and Off states in our recordings, we measured the standard deviation of background noise⁴² in the action potential band (0.3–10 kHz). Specifically, for each On and Off state identified using the non-parametric procedure described above, noise standard deviation was estimated as

$${{\sigma }}_{n}={\rm{m}}{\rm{e}}{\rm{d}}{\rm{i}}{\rm{a}}{\rm{n}}\left(\frac{|x|}{0.6745}\right),$$

where x is the time series of raw action potential band values recorded during the state. Noise estimates were averaged across all On and Off states within each session.

Demixed principal components analysis

We used demixed principal components analysis analysis to decompose population activity into different components reflecting cue location, time and their interaction. As with our classification-based analyses, we applied demixed principal components to the smoothed firing rates from each session, focusing on activity during the delay period (500–1,400 ms after cue). The proportion of variance explained and components were extracted as described in ref. ⁵⁰.

Single-neuron ANOVA

We downsampled the smoothed firing rates for each neuron to 100 ms steps and modelled firing rate during the delay (500–1,400 ms after cue) as a linear combination of cue location, firing rate and their interaction, to estimate the proportion of variance explained by each of these terms.

CCG analysis

To characterize functional connectivity among units, we computed cross-correlations between spike trains of all pairs of simultaneously recorded neurons with mean firing rates greater than 1 Hz. CCGs were computed separately for each cue location. Following previous studies⁴⁶, to mitigate firing-rate effects, we normalized cross-correlation for each pair of neurons by the geometric mean of their firing rates for the cue location condition under consideration. The CCG for a pair of neurons (j, k) in condition c was therefore

$${{\rm{CCG}}(\tau )}_{j,k,c}=\frac{{\sum }_{i=1}^{M}{\sum }_{t=\tau +1}^{N}{x}_{j}^{i}(t-\tau )\times {x}_{k}^{i}(t)}{\sqrt{{\sum }_{i=1}^{M}{\sum }_{t=\tau +1}^{N}{x}_{j}^{i}(t-\tau )\times {\sum }_{i=1}^{M}{\sum }_{t=\tau +1}^{N}{x}_{k}^{i}(t)}},$$

where M is the number of trials collected for cue location c, N is the number of time bins within a trial, $\tau $ is the time lag between the two spike trains and ${x}_{k}^{i}(t)$ is 1 if neuron j is fired in time bin t of trial i, but zero otherwise.

To correct for correlation due to stimulus locking or slow fluctuations in population response, we subtracted a jittered CCG from the original. This jittered CCG reflects the expected value of the CCG computed from all possible jitters of each spike train within a given jitter window^74,75. The jittered spike train preserves both the poststimulus time histogram (PSTH) of the original spike train across trials and the spike count in the jitter window within each trial. As a result, jitter correction removes the correlation between PSTHs (stimulus locking) and those on time scales longer than the jitter window (slow population correlations). We chose a 25 ms jitter window, following previous work^45,46,76,77.

We classified a CCG as significant if the peak of the jitter-corrected CCG occurred within 10 ms of zero and was more than seven standard deviations above the mean of a high-lag baseline period (100 > |$\tau $| > 50)⁴⁵. Zero-lag CCGs were excluded from the analyses reported here, although their inclusion yielded statistically indistinguishable results.

All CCGs were estimated using spike trains during the memory delay (500–1,400 ms after cue) to avoid the influence of visually evoked responses. CCG analyses specific to On and Off states (Fig. 5f) were computed by first setting x(t) to zero for all time points not identified as On or Off (respectively), and then repeating the analysis described above.

Manhattan distance

To determine whether patterns of functional connectivity differed according to the contents of memory, we compared the graphs of significant CCGs across cue locations in a pairwise manner (Fig. 5c–f). For each session and cue location, we represented the results of our CCG analyses as a graph in which nodes were units. The edge (connection) between each pair of units was assigned a weight of 1 if the pair had a significant CCG, and zero otherwise. Then, for each possible pair of cue locations, we computed the Manhattan distance, the number of edges with a weight that differed across the two graphs. Finally, we averaged this metric across all 28 possible pairs of conditions, yielding one summary statistic per session.

To normalize this mean Manhattan distance for comparison across sessions, we shuffled the cue location labels within each pair of neurons for each pair of conditions under consideration across trials, and repeated the entire analysis pipeline 50 times (25 for analyses specific to On and Off states), from CCG estimation through Manhattan distance calculation. We then z-scored the mean Manhattan distance for each session by this null distribution and compared these z-scores to zero (Fig. 5f).

Note that CCGs among both single and multi-units have been widely used as a measure of functional connectivity^{78,79,80,81,82,83,84}. Indeed, CCGs based on multi-unit activity may be more sensitive in detection of correlations in spiking than similar analyses of single-neuron pairs^78,85,86. Nonetheless, the presence of multi-units in our dataset does limit the conclusions that might be drawn about the specific neuronal subtypes involved in the cue-dependent ensembles that we observe—for example, putative pyramidal versus non-pyramidal neurons.

Firing-rate-matched control

The geometric mean firing rate of pairs of units varied significantly across the eight cue locations (one-way ANOVA, P = 0.002; Fig. 6a). Geometric mean firing rates were statistically indistinguishable, however, across cue locations 1–4 (P = 0.332) and 5–8 (P = 0.884). Therefore, we repeated the analysis of Manhattan distance described above, this time computing it among only cue locations 1–4 and 5–8 (Extended Data Fig. 6b), to yield a firing-rate-matched variant of the analysis presented in Fig. 5f.

Joint selectivity

To determine the selectivity of units during the evoked response, we averaged each unit’s cue-locked firing rate over time (0–400 ms after cue onset), yielding an nTrials × 1 vector of firing rates. We then performed one-way ANOVA to evaluate the relationship between cue location and firing rate. If the effect of cue location was significant (P < 0.05), the unit was deemed selective to cue location, and the location at which it had the greatest mean firing rate was labelled the preferred location. Pairs of units were deemed jointly selective if they were selective for the same cue location.