Low-voltage Systolic Array for Energy Efficient Neural Acceleration
Method to operate a systolic array and its associated memory at reduced voltages, efficiently and reliably
Following the surge of Deep Neural Networks, hardware AI accelerators have gained popularity, stimulating significant growth in the VLSI market. Millions of newer mobile phones incorporate a “neural accelerator” within their SoC to process neural network applications in a timely manner within the limited power budget of the device. As networks grow in size and popularity, higher computational demands are expected from the SoC. Operating the processors at Near-Threshold Voltages can improve the energy efficiency of computing by up to 10x. However, the high costs of Near-Threshold Computing (NTC) designs discourage their adoption. A solution is needed that enables NTC without penalizing performance, sacrificing the reliability of the computations, or introducing excessive overheads.
The heart of Tensor Processing Unit (TPU) architectures is a relatively large systolic array dedicated to accelerating convolutional and/or matrix operations. The main advantages of a systolic array are minimal data movement and a high degree of parallelism. In designs such as the Google TPU, the systolic processor and its associated memory and peripheral components occupy a large share of the die area and are the dominant power sinks.
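To illustrate why a systolic array moves so little data, the following is a minimal Python sketch of a cycle-by-cycle, output-stationary array computing a matrix product. The function name `systolic_matmul`, the output-stationary dataflow, and the skewed input schedule are assumptions chosen for illustration; they are not taken from the TPU or from the design described here.

```python
def systolic_matmul(a, b):
    """Model an output-stationary systolic array computing a @ b.

    Processing element (PE) (i, j) accumulates c[i][j]. Row i of `a` enters
    from the left edge delayed by i cycles; column j of `b` enters from the
    top edge delayed by j cycles. Operands only ever move to a neighbour PE.
    Returns (result matrix, number of cycles simulated).
    """
    n, k, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]    # partial sums, one per PE
    a_reg = [[0] * m for _ in range(n)]  # operands travelling left to right
    b_reg = [[0] * m for _ in range(n)]  # operands travelling top to bottom
    cycles = n + m + k - 2               # last operand pair meets at PE (n-1, m-1)

    for t in range(cycles):
        # Update from the far corner backwards so every PE reads its
        # neighbour's value from the previous cycle.
        for i in reversed(range(n)):
            for j in reversed(range(m)):
                if j > 0:
                    a_reg[i][j] = a_reg[i][j - 1]
                else:
                    s = t - i  # skewed feed at the left edge
                    a_reg[i][0] = a[i][s] if 0 <= s < k else 0
                if i > 0:
                    b_reg[i][j] = b_reg[i - 1][j]
                else:
                    s = t - j  # skewed feed at the top edge
                    b_reg[0][j] = b[s][j] if 0 <= s < k else 0
        # Every PE performs one multiply-accumulate per cycle, in parallel.
        for i in range(n):
            for j in range(m):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc, cycles


result, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(result, cycles)
```

In this model each input value is fetched from memory once and then reused as it passes through a whole row or column of PEs, which is the data-movement saving the paragraph above refers to.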
This solution makes it possible to operate a systolic array and its associated memory at reduced voltages, efficiently and reliably. It is essentially a modification to the systolic array that enables detection of computational errors within the array while the data is being pushed out. The overhead of the method is negligible; for Google’s TPU, for example, it is only a 3% increase at gate level. Despite this small cost, it guarantees 100% reliability. The architecture was used to enable Near-Threshold Voltage operation, which provides up to 10x higher energy efficiency. The introduced error checking neither affects the original data flow of the array nor adds any extra memory round trips.
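The text above does not spell out the detection mechanism. One well-known algorithm-level technique for catching errors in matrix products is checksum-based fault tolerance, in which a checksum row and a checksum column are carried through the multiplication and compared against the results on the way out. The sketch below illustrates that general idea only; it is an assumption for illustration, not the researchers' actual design, and all function names are hypothetical.

```python
def matmul(a, b):
    """Plain reference matrix product on lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def with_column_checksum(a):
    """Append one extra row holding the sum of each column of `a`."""
    return a + [[sum(col) for col in zip(*a)]]


def with_row_checksum(b):
    """Append one extra column holding the sum of each row of `b`."""
    return [row + [sum(row)] for row in b]


def failed_checks(c_full):
    """Compare the data part of `c_full` against its checksum row and column.

    Returns (bad_rows, bad_cols): indices whose recomputed sum disagrees
    with the checksum that travelled through the multiplication.
    """
    n, m = len(c_full) - 1, len(c_full[0]) - 1
    bad_rows = [i for i in range(n) if sum(c_full[i][:m]) != c_full[i][m]]
    bad_cols = [j for j in range(m)
                if sum(c_full[i][j] for i in range(n)) != c_full[n][j]]
    return bad_rows, bad_cols


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c_full = matmul(with_column_checksum(a), with_row_checksum(b))
print(failed_checks(c_full))   # no mismatch expected for a fault-free product

c_full[0][1] += 1              # emulate a low-voltage timing error in one result
print(failed_checks(c_full))   # the mismatching row and column locate the fault
```

The check needs only additions on data that is already leaving the array, which is consistent with the claim that error checking adds no extra memory round trips; a detected mismatch could then trigger recomputation of the affected tile.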
Researchers have demonstrated the method using an FPGA implementation and a SPICE model. Both demonstrations showed that no extra hardware or software and no gate-level or device-level modifications are needed. The modification to the systolic array is compatible with legacy systolic array designs. A use case for low-power neural network acceleration was also developed and demonstrated.
Researchers at the Center for Machine Vision and Signal Analysis group of the University of Oulu have focused on finding solutions that enable very low-voltage operation of high-throughput computational devices, mostly employed in vision and telecom applications. The outcome of their research is a new architecture that enables “reliable” and “high-performance” computing at the Near-Threshold Voltage of the transistor at the lowest design cost. This technology enables NTC design without any circuit-level or gate-level modifications and with minimal intervention at the architectural level. The researchers have already demonstrated that the method achieves extremely low-voltage and 100% reliable computing for neural network and matrix operation acceleration.
- Improve energy efficiency by up to 10x
- Low cost in design and verification
- No need for special circuit or gate-level intervention
- Low overhead in design time and design power
- Compatible with common accelerators in the market
M. Safarpour, R. Inanlou and O. Silvén, “Algorithm Level Error Detection in Low Voltage Systolic Array,” in IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 69, no. 2, pp. 569-573, Feb. 2022, doi: 10.1109/TCSII.2021.3094923.
M. Safarpour, L. Xun, G. V. Merrett and O. Silvén, “A High-Level Approach for Energy Efficiency Improvement of FPGAs by Voltage Trimming,” in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 41, no. 10, pp. 3548-3552, Oct. 2022, doi: 10.1109/TCAD.2021.3127153.
Stage of Development
- Technology Readiness Level (TRL): 3
- Real-world demonstration using a Xilinx FPGA
- Detailed circuit level analysis using SPICE model
- Patented and scientifically proven method
- Up to ~10x higher energy efficiency
- Low cost of integration in terms of design and validation
- Minimal overheads in terms of power consumption and performance penalty
- Proven reliability (close to 100%)
- Lowers thermal profile of the chip
- Longer battery life