Device fabrication
MZI array
The fabrication started from a silicon-on-insulator wafer (SOITEC) with a 220-nm silicon (Si) device layer and a 2-µm buried oxide layer. A 200-nm-thick positive e-beam resist (CSAR 62) was spin-coated on a diced 1 cm × 1 cm silicon-on-insulator chip, followed by a 3-min pre-bake at 150 °C. The e-beam resist was patterned by e-beam lithography (EBL; JEOL JBX-5500 50 kV) and developed in AR 600-546 for 30 s, MIBK for 15 s and IPA for 15 s in sequence. The waveguide patterns were transferred to the Si device layer (etch depth = 110 nm) by reactive ion etching (Oxford Instruments PlasmaPro) with SF6 and CHF3 gases, followed by O2 plasma cleaning to remove the CSAR. A 1-µm-thick silicon dioxide (SiO2) layer was deposited by plasma-enhanced chemical vapour deposition (Oxford Instruments PlasmaPro) as the upper cladding layer to isolate the waveguides from the thermo-optic phase shifters. Next, a 2-µm-thick double-layer PMMA (PMMA 495 A8 and PMMA 950 A4) was spin-coated on the chip, followed by EBL patterning and development in MIBK:IPA = 1:3 for 1 min to define the heater patterns. A 200-nm-thick NiCr layer was sputtered using a magnetron sputtering system (physical vapour deposition, AJA International), followed by PMMA lift-off to form the NiCr heaters. Gold pads of 100 nm thickness were fabricated using a process similar to that used for the NiCr heaters, but with e-beam evaporation (Plassys MEB550S). A 3–5-nm Cr layer was deposited before gold deposition to serve as an adhesion layer. An optical image of the fabricated MZI array is shown in Supplementary Fig. 1.
Photonic memory crossbar array
The Si photonic circuit was fabricated using the foundry multi-project wafer service provided by CORNERSTONE. The detailed specifications of CORNERSTONE standard waveguide components can be found at https://cornerstone.sotonfab.co.uk/. The fabricated Si photonic circuit has a 1-µm-thick SiO2 upper cladding. SiO2 windows were patterned by EBL and opened by hydrogen fluoride for the subsequent deposition of the Ge2Sb2Te5 (GST)/indium tin oxide (ITO) stack. Next, GST/ITO stack windows were opened by the above-mentioned PMMA process. A 10-nm-thick/10-nm-thick GST/ITO stack was deposited on the waveguide using a magnetron sputtering system (physical vapour deposition, AJA International). The GST and ITO targets were sputtered at 30 W RF power with 3 sccm Ar flow and at 40 W RF power with 3 sccm Ar flow, respectively, at a base pressure of 10⁻⁷ torr. The stack was then lifted off in acetone for 180 min at 50 °C. Next, the thermo-optic phase shifters were fabricated using the method described for the MZI array. Finally, the chip was annealed on a hotplate for 5 min at 250 °C to fully crystallize the GST. The fabricated photonic memory crossbar array is shown in Fig. 3a.
Photonic EAM tensor core
The photonic EAM tensor core was fabricated using the foundry multi-project wafer service provided by IMEC: iSiPP50G, with details at https://www.imeciclink.com/en/asic-fabrication/si. This platform provides the monolithic integration of passive waveguide circuits, integrated EAMs and integrated photodetectors used in the photonic EAM tensor core.
Measurement setup
Coherence property measurement
The coherent light was generated by a tunable coherent laser (Santec, TSL-550) operating at 1,550 nm. The 0.8-nm-bandwidth C34 partially coherent light was generated by filtering the ASE from an EDFA (Pritel FA-33) with a passive DEMUX module (Gezhi, DWDM-100G-DEMUX) operating at channel C34 of the ITU grid. The 2.0, 4.0, 8.0 and 16.0-nm-bandwidth partially coherent light sources were generated by filtering the same ASE with an optical tunable band-pass filter (Santec, OTF-350) operating at a centre wavelength of 1,550 nm. The spectra were measured by an optical spectrum analyser (Anritsu, MS9710C). For eye diagrams, light was modulated by a pulse generator (Agilent, 8133A) through an electro-optic modulator (Lucent 2623N) and received by a photodetector (Newport New Focus 1611) connected to an oscilloscope (Tektronix, TDS7404B).
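To give a feel for how the filter bandwidth relates to the degree of coherence, the sketch below estimates a coherence length for each partially coherent source. It assumes the common order-of-magnitude relation L_c ≈ λ0²/Δλ with a unit prefactor (the exact prefactor depends on the spectral line shape) and a hypothetical helper name; neither is taken from the measurements reported here.

```python
def coherence_length_m(center_wavelength_nm: float, bandwidth_nm: float) -> float:
    """Order-of-magnitude vacuum coherence length, L_c ~ lambda0**2 / delta_lambda.

    Assumption: unit prefactor; the true value depends on the spectral shape.
    """
    lam_m = center_wavelength_nm * 1e-9
    dlam_m = bandwidth_nm * 1e-9
    return lam_m ** 2 / dlam_m


# Bandwidths of the partially coherent sources listed above (centre near 1,550 nm).
for bandwidth_nm in (0.8, 2.0, 4.0, 8.0, 16.0):
    lc_mm = coherence_length_m(1550.0, bandwidth_nm) * 1e3
    print(f"{bandwidth_nm:5.1f} nm bandwidth -> L_c ~ {lc_mm:.2f} mm")
```

Under this estimate, every filtered ASE source has a coherence length of roughly 3 mm or less, and the value shrinks in inverse proportion to the bandwidth.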
System setup for parallel convolutional processing
The experimental setup for parallel convolutional processing on two gait signals is shown in Fig. 4a. The photonic memory crossbar array has three input channels and three output channels, representing a 3 × 3 matrix consisting of three 1 × 3 kernels. The input light was switchable between an EDFA (Pritel FA-33) and a tunable pump laser (Santec, TSL-550) using an optical switch (Gezhi GZ-12C-1×2-SM). The phase-change-material photonic memory in each cell of the photonic memory crossbar array was first set to the desired weight to correctly define the kernels. The tunable pump laser was used for phase-change-material weight setting. The amplified pump light passed through a DEMUX module (Gezhi, DWDM-100G-DEMUX) so that different wavelengths were routed to different input channels (λ1 = 1,550.12 nm to Ch 1, λ2 = 1,550.92 nm to Ch 2 and λ3 = 1,551.72 nm to Ch 3). After setting all phase-change-material weights, parallel convolution was performed using the ASE from the EDFA. The DEMUX module was used to separate two wavelengths with a spacing of 0.8 nm into two different channels (λ1 = 1,550.12 nm and λ2 = 1,550.92 nm). Each wavelength was split into three channels by an optical splitter (FS PLC splitter). The three channels serve as the input light to the three respective input waveguide channels of the photonic memory tensor core. Adjacent channels have a 1-m path difference, introduced by a further 1-m-long fibre, to eliminate the coherence among all three input light sources. The gait-signal data were loaded into each channel using a variable optical attenuator (VOA; Thorlabs V1550A). The VOAs were driven by a digital signal processor (DSP; NI USB-6259). The polarization of the output light from each VOA was controlled by a polarization controller (Thorlabs FPC032).
Different wavelengths carrying the gait signal at the same time index from different patients were then grouped by a MUX array (Gezhi, DWDM-100G-MUX) to form three inputs to the respective input channels of the photonic memory tensor core. Convolutions were performed naturally as light propagated through the photonic memory crossbar array. Each output channel of the photonic memory tensor core contained both wavelengths λ1 and λ2. The two wavelengths were demultiplexed to obtain the outputs and detected by a photodetector array (Newport New Focus 2011) and finally read out from the DSP.
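The wavelengths quoted above can be cross-checked against the 100-GHz DWDM grid. The sketch below assumes the widely used channel-labelling convention f = 190.0 THz + 0.1 THz × (channel number); this convention and the helper name are assumptions for illustration and are not stated in the setup description.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum (exact by definition)


def itu_channel_wavelength_nm(channel: int) -> float:
    """Vacuum wavelength of a 100-GHz-grid channel.

    Assumption: channel n sits at f = 190.0 THz + 0.1 THz * n.
    """
    freq_hz = (190.0 + 0.1 * channel) * 1e12
    return C_M_PER_S / freq_hz * 1e9


for channel in (34, 33, 32):
    print(f"C{channel}: {itu_channel_wavelength_nm(channel):.2f} nm")
```

With this convention, channels C34, C33 and C32 fall at about 1,550.12 nm, 1,550.92 nm and 1,551.72 nm, which agrees with λ1, λ2 and λ3 and with the 0.8-nm channel spacing.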
System setup for high-speed convolutional processing
The experimental setup for high-speed convolutional processing on the MNIST datasets is shown in Supplementary Fig. 13. The whole system, operating at 2 GSa s⁻¹, was controlled by an FPGA evaluation board (Xilinx, Zynq UltraScale+ RFSoC ZCU216) with a processing system unit, a programmable logic unit, 16 digital-to-analogue converters (DACs) and 16 analogue-to-digital converters (ADCs). The optical input was the 8.0-nm-bandwidth partially coherent light, split equally into nine input grating couplers. The MNIST data were read by the processing system unit, stored in its DDR4 memory and accessed by the programmable logic unit, which output them through nine DACs that modulated the optical signals through the input EAM array. The weights on the photonic EAM crossbar array were set by a low-speed DSP. The three convolutional processing outputs were received by the integrated photodetector array connected to three transimpedance amplifiers and ADCs, routed back to the processing system unit and stored in DDR4 memory.
Mapping non-negative transmission to negative convolution results
The input gait signals and image data presented in this work are non-negative, that is, x ∈ [0, 1]. The kernels involve negative values, that is, w ∈ [−1, 1]. The measurable outputs from the photonic system are non-negative because they are physical quantities. We therefore need to map these non-negative outputs to convolution results in the range [−1, 1]. This is done by the following steps:
(a) We normalize every gait signal or image data to [0, 1] using software and load these normalized data to the photonic tensor core using modulators.
(b) We represent the input data x using the output power of the modulator by setting P = x(Pmax − Pmin) + Pmin, in which Pmax and Pmin are the maximum and minimum outputs from the modulator, respectively.
(c) We represent the weight w using the transmission level of the phase-change material or the EAM by setting \(T=w\left(\frac{{T}_{\max }-{T}_{\min }}{2}\right)+\frac{{T}_{\max }+{T}_{\min }}{2}\), in which Tmax and Tmin are the maximum and minimum transmission levels of the weight-setting device, respectively.
(d) We set the input vector x to the target input data and set the kernel w to the target weights. The measured output is:
$${\sum }_{i}{P}_{i}\times {T}_{i}={\sum }_{i}\left[({P}_{\max }-{P}_{\min })\left(\frac{{T}_{\max }-{T}_{\min }}{2}\right){x}_{i}{w}_{i}+({P}_{\max }-{P}_{\min })\frac{{T}_{\max }+{T}_{\min }}{2}{x}_{i}+{P}_{\min }\left(\frac{{T}_{\max }-{T}_{\min }}{2}\right){w}_{i}+{P}_{\min }\frac{{T}_{\max }+{T}_{\min }}{2}\right]$$
(1)
Step (d) should be performed for every input vector x.
(e) We set all x = 0 and all w = 0. Thus all P = Pmin and all \(T=\frac{{T}_{\max }+{T}_{\min }}{2}\). The measured output is:
$${\sum }_{i}{P}_{\min }\frac{{T}_{\max }+{T}_{\min }}{2}$$
(2)
Step (e) only needs to be performed once for the whole system.
(f) We set all x = 0 and set w to the target weights. Thus all P = Pmin and \({T}_{i}={w}_{i}\left(\frac{{T}_{\max }-{T}_{\min }}{2}\right)+\frac{{T}_{\max }+{T}_{\min }}{2}\). The measured output is:
$${\sum }_{i}\left[{P}_{\min }\left(\frac{{T}_{\max }-{T}_{\min }}{2}\right){w}_{i}+{P}_{\min }\frac{{T}_{\max }+{T}_{\min }}{2}\right]$$
(3)
Step (f) needs to be performed once for each kernel.
(g) We set x to the target input data and set all w = 0. Thus Pi = xi(Pmax − Pmin) + Pmin and all \(T=\frac{{T}_{\max }+{T}_{\min }}{2}\). The measured output is:
$${\sum }_{i}\left[\left({P}_{\max }-{P}_{\min }\right)\frac{{T}_{\max }+{T}_{\min }}{2}{x}_{i}+{P}_{\min }\frac{{T}_{\max }+{T}_{\min }}{2}\right]$$
(4)
Step (g) should be performed for every input vector x.
(h) We perform post-processing on a computer using the measured output from steps (d)–(g) as:
$${\rm{Result}}=\left(1\right)-\left(3\right)-\left(4\right)+\left(2\right)=\left({P}_{\max }-{P}_{\min }\right)\left(\frac{{T}_{\max }-{T}_{\min }}{2}\right){\sum }_{i}{x}_{i}{w}_{i}$$
(5)
(i) We normalize the results to [−1, 1] using software because all results share the same factor of \(({P}_{\max }-{P}_{\min })\left(\frac{{T}_{\max }-{T}_{\min }}{2}\right)\) and x ∈ [0, 1] and w ∈ [−1, 1].
This mapping approach doubles the hardware computation, because steps (d) and (g) must both be performed for every input vector. The doubling can be avoided by a hardware implementation that uses a balanced photodetection scheme (Supplementary Text 2).
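The mapping steps can be checked numerically with a minimal simulation. The sketch below models the photodetector reading as the sum of P_i × T_i and recovers the signed dot product from the four non-negative measurements of equations (1)–(4). The function names and the numerical values of Pmin, Pmax, Tmin and Tmax are hypothetical and chosen only for illustration; the final division by the shared factor stands in for the software normalization and is an assumption about how that step could be carried out.

```python
import numpy as np


def measure(x, w, p_min, p_max, t_min, t_max):
    """Simulated non-negative reading, sum_i P_i * T_i (idealized, noise-free)."""
    power = x * (p_max - p_min) + p_min                            # step (b)
    transmission = w * (t_max - t_min) / 2 + (t_max + t_min) / 2   # step (c)
    return float(np.sum(power * transmission))


def signed_dot(x, w, p_min, p_max, t_min, t_max):
    """Recover sum_i x_i * w_i from the four measurements of steps (d)-(g)."""
    zero_x, zero_w = np.zeros_like(x), np.zeros_like(w)
    args = (p_min, p_max, t_min, t_max)
    m1 = measure(x, w, *args)            # equation (1): target x, target w
    m2 = measure(zero_x, zero_w, *args)  # equation (2): x = 0, w = 0
    m3 = measure(zero_x, w, *args)       # equation (3): x = 0, target w
    m4 = measure(x, zero_w, *args)       # equation (4): target x, w = 0
    result = m1 - m3 - m4 + m2           # equation (5)
    shared_factor = (p_max - p_min) * (t_max - t_min) / 2
    return result / shared_factor


# Hypothetical device limits (arbitrary units) and example data.
x = np.array([0.2, 0.5, 1.0])     # inputs in [0, 1]
w = np.array([-1.0, 0.5, 0.25])   # weights in [-1, 1]
recovered = signed_dot(x, w, p_min=0.1, p_max=1.0, t_min=0.2, t_max=0.9)
print(recovered, float(x @ w))
```

Every simulated measurement is non-negative, yet the combination in equation (5) returns the signed value of the dot product, here with a negative weight included.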
Generation, convolution and output of gait signals
The properties of the original gait-signal data collected by force sensors (Ultraflex Computer Dyno Graphy, Infotronic) are described in the next section ‘CNN model; Gait-signal dataset’.
For parallel convolution of the middle three time-domain data of two gait signals, the input matrix is a 3 × 2 matrix: \(X=\left[\begin{array}{cc}{x}_{11} & {x}_{12}\\ {x}_{21} & {x}_{22}\\ {x}_{31} & {x}_{32}\end{array}\right]\). The jth column of X contains the middle three time-domain data of the jth gait signal (Fig. 4). The ith row of X contains the ith time-domain data of the two gait signals. A DSP drove the VOAs to load the gait signals into the optical domain. The photonic memory tensor core then effectively performed:
$$\begin{array}{c}Y=W\times X={\left[\begin{array}{ccc}{w}_{11} & {w}_{12} & {w}_{13}\\ {w}_{21} & {w}_{22} & {w}_{23}\\ {w}_{31} & {w}_{32} & {w}_{33}\end{array}\right]}^{{\rm{T}}}\left[\begin{array}{cc}{x}_{11} & {x}_{12}\\ {x}_{21} & {x}_{22}\\ {x}_{31} & {x}_{32}\end{array}\right]\\ =\left[\begin{array}{cc}\mathop{\sum }\limits_{n=1}^{3}{w}_{n1}{x}_{n1} & \mathop{\sum }\limits_{n=1}^{3}{w}_{n1}{x}_{n2}\\ \mathop{\sum }\limits_{n=1}^{3}{w}_{n2}{x}_{n1} & \mathop{\sum }\limits_{n=1}^{3}{w}_{n2}{x}_{n2}\\ \mathop{\sum }\limits_{n=1}^{3}{w}_{n3}{x}_{n1} & \mathop{\sum }\limits_{n=1}^{3}{w}_{n3}{x}_{n2}\end{array}\right]=\left[\begin{array}{cc}{y}_{11} & {y}_{12}\\ {y}_{21} & {y}_{22}\\ {y}_{31} & {y}_{32}\end{array}\right]\end{array}$$
in which \({y}_{ij}={\sum }_{n=1}^{3}{w}_{ni}{x}_{nj}\) represents the convolution result of the middle three time-domain data of the jth gait signal using the ith kernel. Each row of Y was output from the respective photonic memory tensor core output channel.
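The index convention of this matrix product can be illustrated with a short NumPy sketch. The weight and input values below are hypothetical; the array `w` holds the untransposed weight matrix, so that its ith column is the ith kernel, and the explicit loop is included only to cross-check the element-wise definition of y_ij.

```python
import numpy as np

# Hypothetical weights: w[n, i] is element n of kernel i (column i = kernel i).
w = np.array([[ 1.0, 0.0, -1.0],
              [ 0.5, 1.0,  0.0],
              [-1.0, 0.0,  1.0]])

# Hypothetical inputs: x[n, j] is the nth of three samples of gait signal j.
x = np.array([[0.2, 0.9],
              [0.4, 0.6],
              [0.8, 0.1]])

# Matrix form: transposed weight matrix times input matrix, giving a 3 x 2 output.
y = w.T @ x

# Element-wise form: y_ij = sum over n of w_ni * x_nj.
y_loop = np.array([[sum(w[n, i] * x[n, j] for n in range(3))
                    for j in range(2)]
                   for i in range(3)])

print(y)
print(np.allclose(y, y_loop))
```

Row i of `y` collects the outputs of kernel i for both gait signals, which corresponds to one output channel of the crossbar carrying both wavelengths.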
CNN model
Gait-signal dataset
Gait signals from ten patients with Parkinson’s disease were taken from the ‘Gait in Parkinson’s Disease’ database in PhysioNet51,52. This database includes the vertical ground reaction force records of individuals as they walked at their usual, self-selected pace for approximately 2 min on level ground. The corresponding clinical information of the ten patients is provided in Supplementary Table 1. Fifty gait pulses were extracted from each patient, leading to a total of 500 gait pulses. Each pulse has a 1.2-s duration. The original gait signals have a 0.01-s time resolution. Gait pulses were extracted with a time interval of 0.04 s (that is, one out of every four original data points), leading to 31 data points in each extracted gait pulse. The 0.04-s time interval was chosen to minimize the size of the extracted dataset while maintaining the key features of the original gait pulses. Eighty per cent of the pulses were used for training and 20% were used for testing, that is, a total of 400 pulses for training and 100 pulses for testing.
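The sample counts quoted above follow from simple arithmetic, sketched below. The sketch assumes that both end points of the 1.2-s window are included (which is what yields 31 points) and uses an all-zero placeholder array in place of a recorded pulse; the variable names are hypothetical.

```python
import numpy as np

dt_original_s = 0.01   # time resolution of the original records
duration_s = 1.2       # duration of one gait pulse
step = 4               # keep one of every four samples (0.04-s interval)

# Assumption: both end points of the window are counted.
n_original = round(duration_s / dt_original_s) + 1

pulse = np.zeros(n_original)   # placeholder for one recorded gait pulse
pulse_extracted = pulse[::step]

n_pulses = 10 * 50             # ten patients, fifty pulses each
n_train = n_pulses * 80 // 100
n_test = n_pulses - n_train

print(n_original, pulse_extracted.size, n_train, n_test)
```

Under these assumptions, 121 original samples reduce to 31 extracted samples per pulse, and the 500 pulses split into 400 for training and 100 for testing.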
MNIST dataset
The test datasets of MNIST handwritten digits and MNIST fashion products were taken from https://git-disl.github.io/GTDLBench/datasets/mnist_datasets/ and https://developer.ibm.com/exchanges/data/all/fashion-mnist/, respectively. In both cases, the 10,000 test images were split into a training set with 8,000 images and a testing set with 2,000 images.
CNN architecture
The CNN architecture for the classification of the gaits dataset is shown in Fig. 4d. The input layer takes the gait signal, which is in the form of a 31 × 1 1D array. The 1D array is passed to a convolution layer consisting of three 1 × 3 kernels. Convolution operations were implemented with a stride of 1 and ‘valid padding’, resulting in a 3 × (31 − 3 + 1) = 3 × 29 output. The output was activated by a rectified linear unit layer and flattened to an 87 × 1 vector. The flattened activated output was then fed to a fully connected layer with ten neurons. The output from the fully connected layer was converted to probabilities by a softmax layer. Finally, the classification result was obtained. The gait signals were classified into ten categories, representing the ten patients with Parkinson’s disease. The convolution operations were implemented using the photonic memory tensor core. The convolution results were processed by the subsequent CNN layers using the MATLAB R2021b Deep Learning Toolbox. The weights of the fully connected layer were trained by the Adam optimizer. A hundred epochs were used to reach the final CNN outcomes. The CNN architecture for the MNIST datasets is similar to that for the gaits dataset, as shown in Fig. 5d, so only the key differences are mentioned here. For the MNIST datasets, besides the trivial difference in layer dimensions, the images were convolved with ‘same padding’ implemented by the photonic EAM tensor core. We used 50 epochs to reach the final CNN outcomes.
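The layer dimensions of the gait CNN can be traced with a minimal NumPy forward pass. This is a shape-checking sketch only: the input, kernels and fully connected weights are random placeholders, the flattening order is an assumption, and the code does not reproduce the MATLAB implementation or the trained weights used in this work.

```python
import numpy as np

rng = np.random.default_rng(0)

signal = rng.random(31)                          # placeholder gait pulse, 31 points
kernels = rng.uniform(-1.0, 1.0, size=(3, 3))    # three 1x3 kernels (placeholders)
fc_weights = rng.normal(size=(10, 87))           # untrained fully connected layer
fc_bias = np.zeros(10)

# 'Valid' convolution with stride 1: every length-3 window of the signal.
windows = np.lib.stride_tricks.sliding_window_view(signal, 3)   # shape (29, 3)
conv_out = kernels @ windows.T                                   # shape (3, 29)

activated = np.maximum(conv_out, 0.0)            # rectified linear unit
flattened = activated.reshape(-1)                # shape (87,), row-major order assumed

logits = fc_weights @ flattened + fc_bias        # ten neurons
exp_logits = np.exp(logits - logits.max())       # numerically stable softmax
probabilities = exp_logits / exp_logits.sum()

predicted_class = int(np.argmax(probabilities))  # index of one of the ten patients
print(conv_out.shape, flattened.shape, probabilities.shape, predicted_class)
```

Each row of `conv_out` corresponds to one kernel, so the three rows play the role of the three output channels of the photonic memory tensor core before the electronic layers take over.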