Applications of Silicon Photonics
Silicon photonics for neuromorphic computing and machine learning
Background
Artificial intelligence (AI) has always captured our imagination, as it has the potential to change almost every aspect of our lives through new medical treatments, new assistive robots, intelligent modes of transportation, and more. Inspired by the human brain and spurred by advances in deep learning, the past several years have seen a renaissance in AI. Machine learning (ML) based on deep neural networks has demonstrated, in some cases, superhuman performance in several complex tasks [Rajendra 2018]. The rise of ML over the last decade can be attributed to: 1) algorithmic innovations; 2) the Internet, an inexhaustible source of millions of training examples; and 3) new hardware, specifically graphics processing units (GPUs).
IBM [Esser 2016], HP [Pickett 2013], Intel [Davies 2018], Google [Graves 2016], to name a few, have all shifted their core technological strategies from “mobile first” to “AI first”. Deep learning with artificial neural networks (ANNs) [LeCun 2015] has expanded from image recognition [Simonyan 2014] to translating languages [Wu 2016], and beating humans at highly complex strategy games like Go [Silver 2016].
At present, neural network algorithms are executed on traditional central processing units (CPUs), GPUs, and neuromorphic (brain-inspired) electronics such as IBM’s TrueNorth [Esser 2016] and neural network accelerators (matrix multipliers) such as Google’s tensor processing unit (TPU) [Graves 2016]. While digital electronic architectures have improved in both energy efficiency and computational speed, they face limits as Moore’s law slows down [Waldrop 2016]. Furthermore, moving data electronically on metal wires has fundamental bandwidth and energy-efficiency limitations [Miller 2009], and this remains a critical challenge facing neural network hardware accelerators [Chen 2017].
Photonic processors
Photonic processors can outperform electronic systems whose performance fundamentally depends on interconnects. Silicon photonic waveguides bus data at the speed of light. The associated energy costs for on-chip optical transmission are currently on the order of femtojoules per bit [Timurdogan 2014] and are expected to decrease further in the near future [Sorger 2018]. Aggregate bandwidths continue to increase by combining multiple wavelengths of light (i.e., wavelength-division multiplexing (WDM)), theoretically topping out at 10 Tb/s per single-mode waveguide using 100 Gb/s per channel and up to 100 channels. On-chip scaling of multi-channel dense WDM (DWDM) systems may be possible with comb generators in the near future [Stern 2018].
Recently, there has been much work on photonic processors to accelerate information processing and reduce power consumption using: artificial neural networks [Tait 2017, Shen 2017, Shainline 2017, Hughes 2018, Tait 2014], spiking neural networks (SNNs) [Prucnal 2016, Peng 2018, Deng 2017, Romeira 2016, Shastri 2016, Aragoneses 2014, Nahmias 2013], and reservoir computing [Van der Sande 2017, Brunner 2013, Vandoorne 2014, Larger 2013]. By combining the high bandwidth and efficiency of photonic devices with the adaptability, parallelism, and complexity attained by methods similar to those seen in the brain, analog photonic processors have the potential to be >100 times faster than digital electronics while consuming >1000 times less energy per computation, with >100 times higher so-called compute density, i.e., speed (operations per second) normalized by the area per operation (mm^2) [Prucnal 2017, Ferreira-de-Lima 2018].
Network-compatible photonic neurons are optical-in, optical-out devices that must be able to 1) convert multiple independently weighted inputs into a single output (i.e., fan-in), 2) apply a nonlinear transfer function to the weighted sum of the inputs, and 3) produce an output capable of driving multiple other neurons, including itself (i.e., cascadability). Fundamentally, it is challenging to achieve fan-in, nonlinearity, and cascadability simultaneously in photonics [Keyes 1985]. All of the conditions of network compatibility have yet to be conclusively demonstrated in a single device (with the exception of the fiber laser in [Shastri 2016]), and much of this research overlooks fan-in and/or cascadability entirely.
When does it make sense to use photonic neural networks?
Electronic neural networks solve some tasks extremely efficiently (IBM’s TrueNorth, for example), and in some cases going fast is not warranted (e.g., face recognition). We also have to be cognizant that while photonics is fast, so is electronics. The size of photonic components is limited by the diffraction limit of light (restricting components to the micrometer scale), whereas electronic components can be much smaller (tens of nanometers) and achieve higher integration density (electronic chips have billions of components, whereas photonic ones have tens to thousands).
However, photonics can implement a multiply-accumulate (MAC) operation efficiently with a small number of components; e.g., a single MAC or weighted-addition operation can be performed in photonics with a single microring resonator (a tunable filter that weights a signal) coupled to a photodetector (summing) driving a modulator (nonlinearity), whereas electronics requires several hundred components for the same operation. Any bandwidth and energy comparison must therefore also take area (density) into account. Also, while electronic neural networks can have a million neurons, photonic ones will have only a fraction of that number (perhaps thousands).
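To make this concrete, the following is a minimal behavioral sketch (not a device simulation) of such a photonic MAC/neuron: microring transmissions act as weights on WDM channels, a photodetector sums the weighted optical powers into a photocurrent, and a sigmoid stands in for the modulator nonlinearity. The parameter values and the sigmoid transfer function are illustrative assumptions.

```python
import numpy as np

def photonic_neuron(inputs, weights, bias=0.0):
    """Behavioral model of one WDM-weighted photonic neuron.

    inputs  : optical power on each WDM channel (arbitrary units)
    weights : effective microring transmissions in [-1, 1]
              (negative values model a balanced photodetector pair)
    Returns the modulator output, idealized here as a sigmoid.
    """
    # Weighted addition: each microring scales one wavelength channel,
    # and the photodetector sums all channels into one photocurrent.
    photocurrent = np.dot(weights, inputs) + bias
    # Nonlinearity: the photocurrent drives a modulator; a sigmoid is a
    # common stand-in for its electro-optic transfer function.
    return 1.0 / (1.0 + np.exp(-photocurrent))

# Example: a single neuron fanning in four WDM channels.
x = np.array([0.2, 0.8, 0.5, 0.1])   # input optical powers
w = np.array([0.9, -0.4, 0.3, 0.7])  # microring weight settings
print(photonic_neuron(x, w))
```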
So where is photonics useful? Photonic neural networks will be useful in applications that require a relatively small number of neurons and in tasks where the same computation must be performed quickly, over and over again. Ideally, the number of neurons required scales linearly with the size of the problem (the number of variables), even when the cost of solving it digitally does not. For example, solving nonlinear optimization problems does not require many neurons, yet the complexity of such problems grows exponentially with the number of parameters and constraints. Likewise, in convolutional neural networks (CNNs) with many kernels per channel, the number of convolution operations is costly. Thanks to wavelength multiplexing, photonics has a potential advantage over other technologies because 1) the fan-in versus bandwidth tradeoff that exists in electronic interconnects is fundamentally absent; and 2) the nonlinearity in photonics can be exploited at tens-of-GHz speeds.
Applications
The specific applications of photonic-based processors include:
1) Intelligent signal processing, such as cognitive processing of the radio spectrum, in 5G networks
High-speed photonic processors are well suited to specialized applications requiring either a) real-time response or b) processing of fast signals. An important application is 5G cellular signals. Next-generation radio frequency receivers will use large, adaptive phased-array antennas that receive many radio signals simultaneously. This is particularly important since the wireless spectrum is becoming increasingly overcrowded and data rates continue to increase within a fixed total bandwidth. Photonic neural networks could perform complex statistical operations to extract the important data, including the separation of mixed signals or the classification of recognizable radio frequency signatures.
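As an illustration of the kind of statistical operation such a processor could run, the sketch below separates a synthetic mixture of two radio signals received on two antenna elements using independent component analysis (scikit-learn’s FastICA). The signal model, mixing matrix, and noise level are invented for illustration and do not describe an actual 5G front end.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two source signals (e.g., two emitters in the same band), sampled in time.
t = np.linspace(0, 1, 4000)
s1 = np.sin(2 * np.pi * 13 * t)             # narrowband tone
s2 = np.sign(np.sin(2 * np.pi * 7 * t))     # square-wave-like signature
S = np.c_[s1, s2]

# Each antenna element observes a different linear mixture of the sources.
A = np.array([[0.8, 0.3],
              [0.4, 0.9]])
X = S @ A.T + 0.02 * np.random.randn(*S.shape)   # received mixtures + noise

# Blind source separation: recover the independent components.
ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)
print(S_est.shape)  # (4000, 2) recovered signals (up to scale/permutation)
```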
2) Deep learning acceleration (inference)
Much of machine learning with deep neural networks is based on convolutional neural networks (CNNs), which are powerful and ubiquitous tools for extracting features from large datasets in applications such as computer vision and natural language processing. The success of CNNs for large-scale image recognition has stimulated research into faster and more accurate algorithms for their use. However, CNNs are computationally intensive and therefore incur long processing latency. One of the primary bottlenecks is computing the matrix multiplication required for forward propagation; in fact, over 80% of the total processing time is spent on the convolution operations. Therefore, techniques that improve the efficiency of even forward-only propagation are in high demand and extensively researched. CPUs are inefficient at evaluating neural network models because they are centralized and instruction-based, whereas networks are distributed and capable of adapting without a programmer. GPUs are more parallel, but even they are now being pushed to their limits. By interleaving digital electronics and analog photonic processing, co-integrated photonic and electronic architectures can implement efficient CNNs, accelerating processing and reducing power consumption compared with purely digital approaches.
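To make the bottleneck explicit, the sketch below lowers a small single-channel convolution to one matrix multiplication via the standard im2col trick; this matrix product is precisely the operation a photonic MAC array would accelerate. The image size, kernel count, and omission of padding and stride handling are simplifications for illustration.

```python
import numpy as np

def im2col(image, k):
    """Unroll k x k patches of a 2D image into the rows of a matrix."""
    H, W = image.shape
    rows = []
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            rows.append(image[i:i + k, j:j + k].ravel())
    return np.array(rows)                  # (num_patches, k*k)

image = np.random.randn(8, 8)
kernels = np.random.randn(4, 3, 3)         # 4 kernels of size 3x3

# Convolution lowered to a single matrix multiplication:
patches = im2col(image, 3)                 # (36, 9)
weights = kernels.reshape(4, -1).T         # (9, 4)
feature_maps = (patches @ weights).reshape(6, 6, 4)
print(feature_maps.shape)                  # (6, 6, 4)
```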
3) Ultrafast training and online learning: Neuromorphic processors to accelerate training of neural networks and enable online learning or learning during inference.
Deep learning consists of two phases: 1) training, in which the network learns to identify the “features” that are salient for making a classification based on known examples; and 2) inference (discussed earlier), in which the network classifies new data. While inference has to be done in real time (on a fast time scale), learning or training is typically done offline because it can take a long time (days, weeks, or even months) to train an artificial neural network. Training a neural network consists of learning the weights and biases of its neurons so that the network can perform the task for which it is being trained, e.g., recognizing faces, classifying objects, etc. Artificial neural networks can be trained with so-called backpropagation algorithms. Backpropagation is an iterative and recursive method for calculating the weight updates. It consists of taking the partial derivative of a cost function (the error between the expected and obtained results) with respect to any weight (or bias) in the network. Backpropagation thus gives detailed insight into how quickly the cost function changes when we change the weights and biases. The goal is to update the weights (at every time step or iteration) such that the cost function is minimized; this is essentially an optimization problem. Once the cost function is minimized (until it cannot be further reduced), we can consider the neural network trained.
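A minimal numerical sketch of this procedure is shown below: a tiny two-layer network is trained by gradient descent, with backpropagation (the chain rule) supplying the partial derivatives of the cost with respect to every weight and bias. The network size, learning rate, and toy XOR task are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset (XOR) and a tiny 2-8-1 network.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
eta = 1.0                                   # learning rate

for step in range(5000):
    # Forward pass (inference).
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Cost: mean squared error between expected and obtained outputs.
    cost = np.mean((out - y) ** 2)
    # Backward pass: the chain rule gives dCost/dW for every weight.
    d_out = 2 * (out - y) / len(X) * out * (1 - out)
    dW2, db2 = h.T @ d_out, d_out.sum(axis=0)
    d_h = d_out @ W2.T * h * (1 - h)
    dW1, db1 = X.T @ d_h, d_h.sum(axis=0)
    # Gradient-descent update: w <- w - eta * dCost/dw.
    W1 -= eta * dW1; b1 -= eta * db1
    W2 -= eta * dW2; b2 -= eta * db2

print(f"final cost: {cost:.4f}")  # typically approaches 0 as XOR is learned
```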
Backpropagation is an efficient method to train neural networks. However, because of the computational complexity of the backpropagation algorithm (which involves repeatedly taking derivatives via the chain rule), many state-of-the-art neural networks are exceedingly expensive to train, requiring either weeks to months of training on single chips or enormous numbers of chips running in parallel. For example, AlphaGo Zero took 170,000 TPU hours (or 110 zettaFLOPs) to train. As a consequence, access to adequate computational power to train the latest neural network architectures is limited to large institutions.
Since a backpropagation algorithm can be translated into a sequence of element-wise vector-vector multiplications and weighted-addition operations based on the inference network’s outputs, weights, and target vectors, it can, in principle, be implemented in a neuromorphic processor. Neuromorphic photonic processors could therefore implement backpropagation to speed up training while reducing power consumption, potentially enabling ultrafast online learning in contrast to current offline training methods.
4) Nonlinear programming: Neuromorphic computing for solving nonlinear optimization problems for low-latency, ultrafast control systems.
Solving mathematical optimization problems lies at the heart of applications that are ubiquitous in modern life, such as machine learning, resource optimization in wireless networks, and drug discovery [McMahon 2016]. Optimization problems can be linear or nonlinear. They are solved iteratively, often requiring many time steps to reach a desired solution, i.e., to minimize some objective function of real variables subject to a series of constraints represented by equalities or inequalities. Nonlinear optimization problems are often difficult to solve and sometimes involve exotic techniques such as genetic algorithms [Goldberg 2006] or particle swarm optimization [Kennedy 2010]. Nonlinear problems are nonetheless quadratic to second order in the local vicinity of the optimum. Therefore, quadratic programming (QP), which finds the minima/maxima of quadratic functions of variables subject to constraints [Lendaris 1999], is an effective first pass at such problems and can be applied to a wide array of applications. For example, machine learning problems such as support vector machine (SVM) training [Scholkopf 2001] and least-squares regression [Geladi 1986] can be reformulated as QP problems. Together, these applications represent some of the most effective, yet generalized, tools for acquiring and processing information and using the results to control systems. However, QP is an NP-hard problem in the number of variables, which means that conventional digital computers are either limited to solving quadratic programs with very few variables or to applications where computation time is non-critical. In machine learning, many algorithms (such as SVM) require offline training because of the computational complexity of QP, but would be much more effective if they could be trained online.

Hopfield networks were shown to solve quadratic optimization problems over 30 years ago [Tank 1986]. However, Hopfield quadratic optimizers are uncommon today because the all-to-all connectivity required in these networks creates an undesirable tradeoff between neuron speed and network size: in an electronic circuit, as the number of connections increases, the bandwidth at which the system can operate without suffering crosstalk between connections and other issues decreases [Tait 2014].

Photonic neural networks have several advantages over their electrical counterparts. Most importantly, the connectivity concerns prevalent in electronic neurons are significantly ameliorated by using light as a communication medium [Tait 2014]. WDM allows hundreds of high-bandwidth signals (25 GHz) to flow through a single optical waveguide. Moreover, the analog computational bandwidth of a photonic neuron (as designed in Ref. [Tait 2017]) can be on the picosecond to femtosecond time scale. For a Hopfield quadratic optimizer, this means that a photonic implementation can simultaneously have large dimensionality and a fast convergence time to the solution.
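The sketch below illustrates the Hopfield-style quadratic optimizer in a few lines: the neuron states evolve against the gradient of a quadratic energy function, and a saturating activation keeps each variable bounded. The problem instance, step size, and sigmoid saturation are illustrative assumptions; a photonic implementation along the lines of [Tait 2017] would realize the all-to-all weighted connections with WDM weight banks rather than in software.

```python
import numpy as np

rng = np.random.default_rng(1)

# Quadratic program: minimize E(x) = 0.5 x^T Q x + c^T x with x in [0, 1]^n.
n = 6
M = rng.normal(size=(n, n))
Q = M @ M.T + n * np.eye(n)         # symmetric positive definite
c = rng.normal(size=n)

# Continuous Hopfield-style dynamics (Euler-integrated): the neuron states u
# move against the energy gradient, and the saturating activation bounds the
# outputs x, playing the role of the box constraints.
u = np.zeros(n)
dt = 0.01
for _ in range(5000):
    x = 1.0 / (1.0 + np.exp(-u))    # neuron outputs in (0, 1)
    u -= dt * (Q @ x + c)           # gradient of E with respect to x

energy = 0.5 * x @ Q @ x + c @ x
print("solution:", np.round(x, 3), " energy:", round(energy, 3))
```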
5) Quantum computing: Cryogenic neural networks could complement quantum computers implemented using superconducting circuits for applications in qubit readout classification at fast time scales.
One of the challenges in quantum information systems is to classify microwave qubit states. These states have fast (sub-nanosecond) initial transients, which limit how early the readout integration can be performed. If the integration time is short, there is significant overlap between the probability distribution functions of the one and zero qubit states because of their overlapping transients; this low signal-to-noise ratio (SNR) makes it hard to distinguish between the states and limits the classification accuracy. If the integration time is increased, the SNR improves because the qubit states have had time to settle (there is less overlap between their distribution functions), but the one state can also decay, which again leads to incorrect classification. Hence, there is a fundamental trade-off between low SNR at short integration times and decay probability at long integration times. Since the turn-on transients are indicative of the state, machine learning techniques with photonic deep neural networks could potentially learn the patterns of these transients, enabling qubit readout classification at the timescale of the transients.
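A toy simulation of this trade-off is sketched below: noisy readout traces for the two states are integrated over windows of different lengths, and the classification error is high for very short windows (low SNR), reaches a minimum, and then rises again as decay dominates. All waveform and noise parameters are invented for illustration and do not correspond to a specific qubit platform.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 200)          # readout window in units of T1

def trace(state):
    """One noisy readout trace; the excited state can decay mid-readout."""
    if state == 1:
        t_decay = rng.exponential(1.0)              # decay time ~ T1
        signal = np.where(t < t_decay, 1.0, 0.0)
    else:
        signal = np.zeros_like(t)
    return signal + 2.0 * rng.normal(size=t.size)   # additive noise

def error_rate(window, n_trials=2000):
    """Classify by thresholding the integrated signal over [0, window]."""
    mask = t <= window
    errors = 0
    for _ in range(n_trials):
        state = rng.integers(2)
        estimate = int(trace(state)[mask].mean() > 0.5)
        errors += (estimate != state)
    return errors / n_trials

# Error typically falls as the window grows (SNR improves), then rises
# again once decay of the one state dominates.
for w in (0.05, 0.2, 0.5, 1.0):
    print(f"integration window {w:>4}:  error rate {error_rate(w):.3f}")
```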
Companies pursuing optical computing:
- Lightelligence: https://www.lightelligence.ai
- Lightmatter: https://lightmatter.co
- Luminous Computing: TechCrunch: Bill Gates, Neo, Gigafund backing Luminous in photonics supercomputer moonshot. https://techcrunch.com/2019/06/04/bill-gates-neo-gigafund-backing-luminous-in-photonics-supercomputer-moonshot/
- Xanadu: https://www.xanadu.ai
- PsiQuantum
References:
[Esser 2016] S. K. Esser et al. Proc. Natl. Acad. Sci. U.S.A. 113, p. 11441 (2016).
[Pickett 2013] M. D. Pickett et al. Nat. Mat. 12, p. 114 (2013).
[Davies 2018] M. Davies et al. IEEE Micro 38, p. 82 (2018).
[Graves 2016] A. Graves et al. Nature 538, p. 471 (2016).
[LeCun 2015] Y. LeCun, Y. Bengio, and G. Hinton. Nature 521, p. 436 (2015).
[Simonyan 2014] K. Simonyan and A. Zisserman. CoRR abs/1409.1556 (2014).
[Wu 2016] Y. Wu et al. CoRR abs/1609.08144 (2016).
[Silver 2016] D. Silver et al. Nature 529, p. 484 (2016).
[Waldrop 2016] M. M. Waldrop. Nature News 530, p. 144 (2016).
[Miller 2009] D. A. B. Miller. Proc. IEEE 97, p. 1166 (2009).
[Chen 2017] Y. Chen et al. IEEE J. Sol. State Circ. 52, p. 127 (2017).
[Timurdogan 2014] E. Timurdogan et al. Nat. Comm. 5, p. 4008 (2014).
[Sorger 2018] V. J. Sorger et al. J. Opt. 20, p. 014012 (2018).
[Stern 2018] B. Stern et al. Nature 562, p. 401 (2018).
[Tait 2017] A. N. Tait et al. Sci. Rep. 7, p. 7430 (2017).
[Shen 2017] Y. Shen et al. Nat. Photon. 11, p. 441 (2017).
[Shainline 2017] J. M. Shainline et al. Phys. Rev. Appl. 7, p. 034013 (2017).
[Hughes 2018] T. W. Hughes et al. Optica 5, p. 864 (2018).
[Tait 2014] A. N. Tait et al. J. Light. Tech. 32, p. 4029 (2014).
[Prucnal 2016] P. R. Prucnal et al. Adv. Opt. Photon. 8, p. 228 (2016).
[Peng 2018] H. T. Peng et al. IEEE J. Sel. Top. Quant. Elect. 24, p. 6101715 (2018).
[Deng 2017] T. Deng et al. IEEE J. Sel. Top. Quant. Elect. 23 (2017).
[Romeira 2016] B. Romeira et al. Sci. Rep. 6, p. 19510 (2016).
[Shastri 2016] B. J. Shastri et al. Sci. Rep. 5, p. 19126 (2016).
[Aragoneses 2014] A. Aragoneses et al. Sci. Rep. 4, p. 4696 (2014).
[Nahmias 2013] M. A. Nahmias et al. IEEE J. Sel. Top. Quant. Elect. 19, p. 1800212 (2013).
[Van der Sande 2017] G. Van der Sande et al. Nanophotonics 6, p. 561 (2017).
[Rajendra 2018] B. Rajendra et al. arXiv preprint arXiv:1901.03690 (2018).
[Pallipuram 2012] V. K. Pallipuram, M. Bhuiyan, and M. C. Smith. J. Supercomput. 61, p. 673 (2012).
[Li 2016] X. Li et al. 2016 45th International Conference on Parallel Processing (ICPP) (2016).
[Goodfellow 2016] I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning (The MIT Press, 2016).
[Diamond 2016] Diamond, T. Nowotny, and M. Schmuker. Front. Neurosci. 9, p. 491 (2016).
Authors
Bhavin Shastri, Lukas Chrostowski, Sudip Shekhar