We tend to think of AI as a monolithic entity, but it’s actually developed along multiple branches. One of the main branches involves performing traditional calculations and feeding the results into another layer, which takes input from multiple calculations, weighs them, performs its own calculations, and forwards the results on. Another branch involves mimicking the behavior of biological neurons: many small units communicating in bursts of activity called spikes, and keeping track of the history of past activity.
Each of these, in turn, has different branches based on the structure of its layers and communications networks, types of calculations performed, and so on. Rather than being able to act in a manner we’d recognize as intelligent, many of these are very good at specialized problems, like pattern recognition or playing poker. And processors that are meant to accelerate the performance of the software can typically only improve a subset of them.
That last division may have come to an end with the development of Tianjic by a large team of researchers primarily based in China. Tianjic is engineered so that its individual processing units can switch from spiking communications back to binary and perform a large range of calculations, in almost all cases faster and more efficiently than a GPU can. To demonstrate the chip’s abilities, the researchers threw together a self-driving bicycle that ran three different AI algorithms on a single chip simultaneously.
Divided in two
While there are many types of AI software, the key division identified by the researchers is between what can be termed layered calculations and spiking communications. The former (which includes things like convolutional neural networks and deep-learning algorithms) uses layers of calculating units, each of which feeds the results of its calculations into the next layer using standard binary data. Each of these units has to keep track of which other units it communicates with and how much weight to give each of its inputs.
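To make the layered approach concrete, here is a minimal Python sketch of two layers of calculating units. It assumes each unit computes a plain weighted sum followed by a ReLU nonlinearity, a common choice but not one the researchers specify; the function names and weights are hypothetical.

```python
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights per unit, bias per unit)

def dense_layer(inputs: Sequence[float], weights: List[List[float]],
                biases: List[float]) -> List[float]:
    """One layer of calculating units: each unit weighs every input and sums."""
    outputs = []
    for unit_weights, bias in zip(weights, biases):
        # Each unit stores one weight per upstream unit it listens to.
        total = sum(w * x for w, x in zip(unit_weights, inputs)) + bias
        outputs.append(max(0.0, total))  # ReLU nonlinearity
    return outputs

def forward(inputs: Sequence[float], layers: List[Layer]) -> List[float]:
    """Feed the results of each layer into the next one."""
    activations = list(inputs)
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations

# Hypothetical network: 2 inputs -> 2 hidden units -> 1 output unit
layers = [
    ([[0.5, -1.0], [1.0, 1.0]], [0.0, 0.0]),
    ([[1.0, 2.0]], [0.1]),
]
print(forward([1.0, 2.0], layers))
```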
On the other side of the divide are approaches inspired more directly by biology. These communicate in analog “spikes” of activity, rather than data. Individual units have to keep track of not only their present state, but their past history. That’s because their probability of sending a spike depends on how often they’ve received spikes in the past. They’re also arranged in large networks, but they don’t necessarily have a clean layered structure or perform the same sort of detailed computations within any unit.
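The spiking side can be illustrated with a leaky integrate-and-fire neuron, a standard textbook model swapped in here purely for illustration; the article doesn't say which neuron model Tianjic supports. This sketch is deterministic, so instead of a firing probability it uses a decaying membrane potential to capture how past input determines whether the unit spikes now. The threshold, decay rate, and weights are arbitrary assumptions.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class LIFNeuron:
    """Leaky integrate-and-fire unit (illustrative model, not Tianjic's)."""
    threshold: float = 1.0
    decay: float = 0.9       # fraction of potential kept each time step
    potential: float = 0.0   # state that summarizes past input spikes
    history: List[int] = field(default_factory=list)  # the unit's own output spikes

    def step(self, input_spikes: List[int], weights: List[float]) -> int:
        # Old input fades and new spikes add on top: the state carries the past.
        self.potential = self.potential * self.decay + sum(
            w for s, w in zip(input_spikes, weights) if s)
        spike = 1 if self.potential >= self.threshold else 0
        if spike:
            self.potential = 0.0  # reset after firing
        self.history.append(spike)
        return spike

neuron = LIFNeuron()
inputs = [[1, 0], [0, 1], [0, 0], [1, 1]]
print([neuron.step(s, [0.4, 0.4]) for s in inputs])
```

No single input spike is enough to cross the threshold; the unit fires only on the last step, when fresh input lands on top of the residue of earlier activity.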
Both of these approaches have benefited from dedicated hardware, which tends to run the software at least as well as GPUs do while being far more energy efficient. (One example of this is IBM’s TrueNorth processor.) But the vast difference in communications and calculations between the classes has meant that a processor is only good for one or the other type.
That’s what the Tianjic team has changed with what it’s calling the FCore architecture. FCore is designed so that the two different classes of AI can either be represented by a common underlying compute architecture or easily reconfigured on the fly to handle one or the other.
To enable communications among its compute units, FCore uses the native language of traditional neural networks: binary. But FCore is also able to output spikes in a binary format, allowing it to communicate in terms that a neuron-based algorithm can understand. Local memory at each processing unit can be used either for tracking the history of spikes or as a buffer for input and output data. Some of the calculation hardware needed for neural networks is shut down and bypassed when in artificial neuron mode.
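One way to picture spikes traveling in a binary format is to pack a whole time step's worth of spikes into a single word, one bit per unit, and to let a small local memory hold either those words (as spike history) or ordinary data. This is a hypothetical sketch: the bit layout, the `LocalMemory` class, and its depth are assumptions, not details of the FCore design.

```python
from collections import deque
from typing import Deque, List

def pack_spikes(spikes: List[int]) -> int:
    """Encode a spike vector as one binary word: bit i is set if unit i fired."""
    word = 0
    for i, s in enumerate(spikes):
        if s:
            word |= 1 << i
    return word

def unpack_spikes(word: int, n: int) -> List[int]:
    """Recover an n-unit spike vector from a binary word."""
    return [(word >> i) & 1 for i in range(n)]

class LocalMemory:
    """Hypothetical per-unit memory: spike history in one mode, data buffer in the other."""
    def __init__(self, depth: int) -> None:
        self.slots: Deque[int] = deque(maxlen=depth)  # oldest entries drop off

    def record(self, value: int) -> None:
        self.slots.append(value)

    def spike_count(self, unit: int) -> int:
        """Spiking mode: how many stored time steps had `unit` firing."""
        return sum((w >> unit) & 1 for w in self.slots)

mem = LocalMemory(depth=3)
for step in ([1, 0, 1, 1], [1, 0, 0, 0], [0, 1, 0, 0], [1, 0, 1, 1]):
    mem.record(pack_spikes(step))
print(list(mem.slots), mem.spike_count(0))
```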
In the chip
With these and a few additional features implemented, each individual compute unit in an FCore can be switched between the two modes, performing either type of calculation and communication as needed. More critically, a single unit can be set into a sort of hybrid mode. That means taking input from one type of AI algorithm but formatting its output so that it’s understood by another: reading spikes and outputting data, or the opposite. That also means any unit on the chip can act as a translator between two types of algorithms, allowing them to communicate with each other when they’re run on the same chip.
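A hybrid unit needs some rule for turning spikes into numbers and back. The sketch below assumes rate coding, where a value is represented by how often a unit spikes over a window of time steps. That's a common convention, but the paper's actual conversion scheme isn't described here, and the `Mode` names are hypothetical.

```python
from enum import Enum
from typing import List

class Mode(Enum):
    """Hypothetical operating modes for a single compute unit."""
    DATA = "data"            # numbers in, numbers out
    SPIKE = "spike"          # spikes in, spikes out
    SPIKE_TO_DATA = "s2d"    # hybrid: read spikes, emit a number
    DATA_TO_SPIKE = "d2s"    # hybrid: read a number, emit spikes

def spikes_to_value(spike_train: List[int]) -> float:
    """Rate decoding: the fraction of time steps that contain a spike."""
    return sum(spike_train) / len(spike_train)

def value_to_spikes(value: float, steps: int) -> List[int]:
    """Rate encoding: a value in [0, 1] becomes about value * steps evenly spaced spikes."""
    clamped = max(0.0, min(1.0, value))
    n_spikes = round(clamped * steps)
    # Fire at step t whenever the running quota (t + 1) * n / steps crosses an integer.
    return [1 if (t + 1) * n_spikes // steps > t * n_spikes // steps else 0
            for t in range(steps)]

def translate(mode: Mode, payload, steps: int = 8):
    """A unit in a hybrid mode converts between the two formats."""
    if mode is Mode.SPIKE_TO_DATA:
        return spikes_to_value(payload)
    if mode is Mode.DATA_TO_SPIKE:
        return value_to_spikes(payload, steps)
    return payload  # pure modes: computation not modeled, format unchanged

print(translate(Mode.SPIKE_TO_DATA, [1, 0, 1, 1]))
print(translate(Mode.DATA_TO_SPIKE, 0.5, steps=4))
```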
The FCore architecture was also designed to scale. The map of connections among its compute units is held in a bit of memory that’s separate from the compute units themselves, and it’s made large enough to allow connections to be made external to an individual chip. Thus, a single neural network could potentially be spread across multiple cores in a processor or even multiple processors.
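Keeping the connection map outside the compute units means a destination can be any address, including one on another chip. A hypothetical routing table along those lines might look like this; keying addresses by (chip, core, unit) is an assumption about how such a map could be organized, not the paper's format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class Address:
    chip: int
    core: int
    unit: int

class RoutingTable:
    """Hypothetical connection map stored apart from the compute units."""
    def __init__(self) -> None:
        self.targets: Dict[Address, List[Address]] = {}

    def connect(self, src: Address, dst: Address) -> None:
        self.targets.setdefault(src, []).append(dst)

    def deliver(self, src: Address) -> Tuple[List[Address], List[Address]]:
        """Split a unit's outgoing connections into on-chip and off-chip destinations."""
        dests = self.targets.get(src, [])
        local = [d for d in dests if d.chip == src.chip]
        remote = [d for d in dests if d.chip != src.chip]
        return local, remote

table = RoutingTable()
src = Address(chip=0, core=0, unit=5)
table.connect(src, Address(chip=0, core=3, unit=1))
table.connect(src, Address(chip=1, core=0, unit=0))
print(table.deliver(src))
```

Because the units only emit results and the table decides where they go, moving part of a network to a second chip would, in this sketch, mean editing table entries rather than reprogramming the units.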
In fact, the Tianjic chip is made up of multiple FCores (156 of them) arranged in a 2D mesh. In total, there are about 40,000 individual compute units on the chip, which implies an individual FCore has 256 of them. It’s fabricated on a 28 nanometer process, more than double the feature size of the cutting-edge processes used by desktop and mobile chipmakers. Despite that, it can shift over 600GB/second internally and perform nearly 1.3 Tera-ops per second when run at 300MHz.
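The per-core figure follows from simple division, and multiplying back shows why "about 40,000" is the right rounding. Both inputs below come straight from the article.

```python
CORES = 156            # FCores on the chip, per the article
TOTAL_UNITS = 40_000   # "about 40,000" compute units, per the article

units_per_core = TOTAL_UNITS / CORES
print(round(units_per_core, 1))   # about 256 per FCore
print(CORES * 256)                # exact total if each FCore has 256 units
```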
Despite the low clock speed, Tianjic put up some impressive numbers when run against the same algorithms implemented on an NVIDIA Titan-Xp. Its speed advantage ranged from 1.6 times to 100 times, depending on the algorithm. And, when energy use was considered, the performance per Watt was almost comical, ranging from 12x all the way up to over 10,000x. Other dedicated AI processors have had strong performance-per-Watt, but they haven’t been able to run all the different types of algorithms demonstrated here.
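Performance per Watt folds power draw into the comparison. If the chip is S times faster than the GPU, and the GPU draws P_gpu watts while the chip draws P_chip, the efficiency advantage works out to S × (P_gpu / P_chip). A modest speedup can therefore become an enormous efficiency gain when the chip sips power. The figures in the example call are made up for illustration and are not measurements from the paper.

```python
def perf_per_watt_gain(speedup: float, gpu_watts: float, chip_watts: float) -> float:
    """How many times more work per joule the chip does than the GPU.

    (chip_perf / chip_watts) / (gpu_perf / gpu_watts)
        = (chip_perf / gpu_perf) * (gpu_watts / chip_watts)
        = speedup * (gpu_watts / chip_watts)
    """
    return speedup * gpu_watts / chip_watts

# Hypothetical numbers, not results from the paper
print(perf_per_watt_gain(speedup=2.0, gpu_watts=250.0, chip_watts=10.0))
```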
Like riding a… well, you know
On its own, this would have been an interesting paper. But the research team went further, showing that Tianjic’s abilities could be put to use even in its experimental form. “To demonstrate the utility of building a brain-like cross-paradigm system,” the researchers write, “we designed an unmanned bicycle experiment by deploying multiple specialized networks in parallel within one Tianjic chip.”
The bike performed object detection via a convolutional neural network, while a continuous attractor neural network provided target tracking to allow the bike to follow a researcher around. Meanwhile, a spiking neural network allowed the bike to follow voice commands. Something called a multilayer perceptron tracked the bike’s balance. And all of these inputs were coordinated by a neural state machine based on a spiking neural network.
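The coordinating role of the neural state machine can be sketched as an ordinary state machine that reads the outputs of the other networks and picks a behavior. This is only an illustration: the bike's version is built from a spiking neural network, and the states, inputs, and priority rules below are hypothetical.

```python
from enum import Enum, auto
from typing import Optional

class State(Enum):
    IDLE = auto()
    FOLLOW = auto()
    AVOID = auto()
    STOP = auto()

def next_state(state: State, voice: Optional[str],
               obstacle: bool, target_seen: bool) -> State:
    """Hypothetical coordinator over the outputs of the specialized networks.

    voice:       command from the speech network, e.g. "stop" or "go", else None
    obstacle:    the object-detection network sees something in the path
    target_seen: the tracking network has the person in view
    Balance control is assumed to run continuously, outside this state machine.
    """
    if voice == "stop":
        return State.STOP
    if state is State.STOP and voice != "go":
        return State.STOP        # stay stopped until told to go
    if obstacle:
        return State.AVOID       # obstacles take priority over following
    if target_seen:
        return State.FOLLOW
    return State.IDLE

state = State.IDLE
for voice, obstacle, seen in [(None, False, True), (None, True, True),
                              ("stop", False, True), (None, False, True),
                              ("go", False, True)]:
    state = next_state(state, voice, obstacle, seen)
    print(state.name)
```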
And it worked. While the bike wasn’t self-driving in the sense that it was ready to take someone through the bike lanes of a major city, it was certainly good enough to be a researcher’s faithful companion during a walk around a test track that included obstacles.
Overall, this is an impressive bit of work. Either the processor alone or the automated bicycle would have made a solid paper on its own. And the idea of getting a single chip to natively host two radically different software architectures was a bold one.
But one caution is worth pointing out: the researchers posit this as a route to artificial general intelligence. In a lot of ways, Tianjic does resemble a brain: the brain uses a single architecture (the neuron) to host a variety of different processes that, collectively, make sense of the world and plan actions that respond to it. To an extent, the researchers are right that being able to run and integrate multiple algorithms at once is a path towards something like that.
But this is still not necessarily a route to general intelligence. In our brain, specialized regions—the algorithm equivalent—can perform a collection of poorly defined and only vaguely related activities. And a single task (like deciding where to focus our attention) takes myriad inputs. They range from our recent history to our emotional state to what we’re holding in temporary memory to biases built up through millions of years of evolution. So just being able to run multiple algorithms is still a long way off from anything we’d recognize as intelligence.