
January 28, 2019 | By: Tony Peng

In his 1988 IEEE paper "Cellular Neural Networks: Theory," UC Berkeley PhD student Lin Yang proposed Cellular Neural Network theory, a predecessor of the Convolutional Neural Networks (CNN) that would later revolutionize machine learning. Based on this theory, Yang blueprinted a 20×20 parallel simulated circuit chip in the university lab.

Since that time the world has witnessed an unprecedented growth of deep learning techniques represented by CNN. The laboratory-tested computing chip Yang designed 31 years ago has also evolved into an industry-grade artificial intelligence accelerator.

In a 2016 report, the International Data Corporation predicted that “by 2019, 45 percent of data will be stored, analyzed, and acted on at the edge.” The promising AI chip market has attracted small but innovative startups hoping to revolutionize how chips are designed. The market’s leading startups such as UK AI chip maker Graphcore and China’s AI chip startup Cambricon each raised more than US$100 million last year.

In 2017 Yang — then a tenured professor at the elite Tsinghua University — believed it was time to put his theory into practice. He partnered with Qi Dong to found Gyrfalcon Technology Inc (GTI) in Silicon Valley. The startup’s goal was simple: ramp up the performance of AI applications with a set of dedicated processors. GTI has rolled out three flagship products — Lightspeeur 2801S, Lightspeeur 2802M, Lightspeeur 2803S — designed for deployment scenarios ranging from edge devices to cloud data centres.

Lightspeeur 2801S is a 28nm edge-based Application-Specific Integrated Circuit (ASIC) that contains about 28,000 parallel computing cores and needs no external memory for AI inference. High efficiency is its selling point: 2801S delivers 2.8 tera operations per second (TOPS) at 300mW, or 9.3 TOPS/W. The chip helped GTI secure big-name customers, including Fujitsu, LG and Samsung.

Lightspeeur 2801S

GTI’s latest partnership was unveiled at the recent CES 2019. Japanese IoT company Mtes Neural Network announced that it will embed GTI chips into its smart street lights, which can detect abnormal events such as an elderly person wandering alone or the presence of suspicious persons.

GTI has also packed Lightspeeur 2801S into the Laceli AI Compute Stick, a USB stick that boosts compute power when connected to laptops. Laceli is billed as 90 times more efficient than its rival, the Movidius Neural Compute Stick that Intel introduced in 2017 (0.1 TOPS/W).
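The efficiency figures quoted above follow from simple division. The Python sketch below recomputes them from the numbers given in the article; the function name `tops_per_watt` is illustrative and is not part of any GTI or Intel tooling.

```python
def tops_per_watt(tops: float, watts: float) -> float:
    """Return power efficiency in tera operations per second per watt."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return tops / watts


# Figures quoted in the article for the Lightspeeur 2801S: 2.8 TOPS at 300 mW.
lightspeeur_2801s = tops_per_watt(2.8, 0.300)

# Efficiency quoted in the article for the Movidius Neural Compute Stick.
movidius_ncs = 0.1  # TOPS/W

# How many times more efficient the 2801S is on these quoted figures.
ratio = lightspeeur_2801s / movidius_ncs

print(f"2801S efficiency: {lightspeeur_2801s:.1f} TOPS/W")
print(f"Ratio to Movidius stick: {ratio:.0f}x")
```

On these numbers the ratio comes out slightly above 90, consistent with the "90 times" billing.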

Behind 2801S’s superior efficiency is GTI’s in-memory processing architecture. A technique known as APiM (AI processing in memory) significantly reduces the costs associated with data exchange between storage and memory. The 2801S embeds 9MB of static random access memory (SRAM) on its computing cores.

“We can just preload the network model and data as well as the activation unit onto the chip at one time. Frequent data exchange is unnecessary. We can save a lot of power consumption and that is why the performance is so great,” says Yang, who is also GTI’s Chief Scientist.

Yang at GTI’s CES press conference

Another factor contributing to 2801S’s strong performance is that the chip is built on CNN architecture and can only process convolution operations. This was a bold design choice, as 2801S sacrifices flexibility in exchange for higher performance in CNN-based applications. Says Yang, “CNN is the foundation of all AI applications today. We discovered that ResNet and MobileNet are still the dominant network models on the market, and what we need to do is to support the most mainstream models.”
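For readers unfamiliar with the operation, the following is a minimal Python sketch of a 2D convolution as used in CNN layers (strictly, a cross-correlation with no padding and a stride of 1). It is a generic illustration only and says nothing about how GTI implements the operation in silicon; the function name `conv2d` and the sample values are made up for the example.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` and return the multiply-accumulate results.

    Both arguments are lists of lists of numbers. No padding, stride 1.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Each output value is an independent multiply-accumulate
            # over one kernel-sized window of the input.
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        output.append(row)
    return output


# Hypothetical 4x4 input and 2x2 kernel, chosen only for illustration.
image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
    [0, 1, 2, 3],
]
kernel = [
    [1, 0],
    [0, -1],
]

result = conv2d(image, kernel)
print(result)  # [[-4, -4, 2], [-4, -4, 4], [6, 6, 6]]
```

Because every output value depends only on its own input window, the inner multiply-accumulate steps can run in parallel, which is what makes the operation a natural fit for a chip with many small computing cores.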

Following Lightspeeur 2801S, GTI pushed ahead with in-memory processing by integrating Magnetic RAM (MRAM) in its next-generation 22nm chip Lightspeeur 2802M. With 40MB embedded memory, 2802M can deal with multiple neural network models or more complicated applications that are beyond 2801S’s ability.

MRAM is a type of memory technology that uses electron spin to store data. While SRAM is still today’s dominant solution for on-chip memory, MRAM promises several key advantages, including non-volatility (not losing data without power), lower power consumption, and better density. MRAM development proceeded for decades but did not live up to expectations. Now, with the increasing need for on-chip memory on AI chips, MRAM is emerging as an attractive alternative.

GTI’s latest innovation is the 28nm Lightspeeur 2803S. A single 2803S chip can deliver 16.8 TOPS at 0.7W and support a PCIe interface. GTI has also provided data center operators with G.A.I.N. Series 2803, a multiple-chip board server integrated with 2803S chips, which can be added to existing racks. A 2803S-based 16-chip server can deliver 271 TOPS at 28W, which is 10 times more power efficient than NVIDIA’s Tesla (65 TOPS at 70 W).
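The comparison above can be checked directly from the quoted figures. The Python sketch below does so; `tops_per_watt` is an illustrative helper name, and the inputs are only the numbers stated in the article.

```python
def tops_per_watt(tops: float, watts: float) -> float:
    """Return power efficiency in tera operations per second per watt."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return tops / watts


# Figures as quoted in the article.
chip_2803s = tops_per_watt(16.8, 0.7)      # single Lightspeeur 2803S
server_16_chip = tops_per_watt(271, 28)    # 16-chip 2803S-based server
nvidia_tesla = tops_per_watt(65, 70)       # NVIDIA Tesla figure cited

# Efficiency advantage of the 16-chip server over the cited NVIDIA figure.
advantage = server_16_chip / nvidia_tesla

print(f"Single 2803S:   {chip_2803s:.1f} TOPS/W")
print(f"16-chip server: {server_16_chip:.2f} TOPS/W")
print(f"NVIDIA Tesla:   {nvidia_tesla:.2f} TOPS/W")
print(f"Server advantage: {advantage:.1f}x")
```

On the quoted figures the server works out to roughly 9.7 TOPS/W against roughly 0.93 TOPS/W for the NVIDIA part, a ratio of about 10.4.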

While GTI has not developed its own programming software stack, it provides a set of development resources known as “DevKit” to help companies get started with new GTI chips on existing devices such as smartphones, computers or industrial equipment. The kit includes self-guided online resources, a USB 3.0 dongle, and a small WiFi accessory.

Chips will play a fundamental role in the AI-empowered future society that GTI envisions, where a garage opener or a baby monitor does not need a supercomputer to enable its AI abilities because a centimeter-sized chip can do the job. The world may not have been ready for such a revolutionary transformation back in 1988, but it is now.