Google has developed its second-generation Tensor Processing Unit (TPU), a 45-teraflops chip for machine learning and artificial intelligence, and the company is bringing it to the cloud.
Typically, the training of machine learning models is done using commercially available GPUs, often from Nvidia; Facebook, for example, uses Nvidia GPUs in its Big Basin AI training servers. Google has yet to share inference performance figures for the new Cloud TPU. While its first-generation chip was suitable only for inference, and therefore didn't pose much of a threat to Nvidia's dominance in machine learning, the new version is equally at home with both training and running AI systems.
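To make the distinction between training and inference concrete, here is a minimal sketch in plain Python. It is a toy illustration only, not a depiction of how a TPU or any Google system works: the model (a straight line), the data, and the function names `train` and `infer` are all assumptions chosen for brevity.

```python
# Toy illustration: fit y = w * x + b by gradient descent (training),
# then apply the fitted parameters to a new input (inference).

def train(examples, steps=5000, lr=0.01):
    """Training: repeatedly adjust w and b to reduce the error on the examples."""
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in examples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in examples) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def infer(params, x):
    """Inference: one cheap evaluation using parameters that are already trained."""
    w, b = params
    return w * x + b

# Hypothetical examples drawn from the line y = 3x + 1.
examples = [(float(x), 3.0 * x + 1.0) for x in range(10)]

params = train(examples)          # many passes over the data
prediction = infer(params, 20.0)  # a single evaluation
print(params, prediction)
```

Training loops over the examples thousands of times, while inference is a single evaluation, which is why hardware suited to one is not automatically suited to the other.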
A year ago, Google revealed the custom chip that it built in secret to accelerate machine learning tasks in its data centers.
According to Google, a Cloud TPU device built from four of the new chips can achieve 180 teraflops of floating-point performance, six times that of Nvidia's latest Tesla V100 accelerator in FP16 half-precision computation.
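The 45-teraflops and 180-teraflops figures are consistent if a Cloud TPU device combines several identical chips. The short calculation below works through that arithmetic and the V100 figure implied by the "six times" claim. It assumes the per-chip numbers simply add up and reads "six times" as a plain 6x ratio; the variable names are illustrative.

```python
# Figures quoted in the article.
CHIP_TFLOPS = 45       # per second-generation TPU chip
DEVICE_TFLOPS = 180    # figure Google cites for the Cloud TPU
SPEEDUP_VS_V100 = 6    # claimed FP16 advantage over the Tesla V100

# Assumption: a device's throughput is the sum of its identical chips.
chips_per_device = DEVICE_TFLOPS // CHIP_TFLOPS

# Assumption: "six times" means a straight 6x ratio of FP16 throughput.
implied_v100_fp16_tflops = DEVICE_TFLOPS / SPEEDUP_VS_V100

print(f"Chips per device: {chips_per_device}")
print(f"Implied V100 FP16 throughput: {implied_v100_fp16_tflops} teraflops")
```

These are peak figures from vendor claims, so the ratio says little about performance on any particular workload.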
People outside Google will be able to rent virtual machines (VMs) accelerated by the second-generation TPUs. "[...] while one eighth of a TPU pod can do the job in an afternoon."
Machine learning, the bedrock of modern AI research, effectively means feeding an algorithm hundreds of thousands of examples so that it learns to perform a task it was never expressly programmed to do. The effect is easily visible across Google's wide portfolio of products, including Google Translate, Photos and its champion Go program, AlphaGo.
The Cloud TPUs will initially be made available on Google Compute Engine and will be fully integrated with Google Cloud's storage, networking and data analytics technologies. As part of this initiative, Google is creating what it calls the TensorFlow Research Cloud, a cluster of 1,000 Cloud TPUs that it will make available to top researchers at no cost.
"I think the TPU is more specialized, so there are certain kinds of workloads, not necessarily machine learning ones but other kinds of workloads, that run well on GPUs that don't necessarily run on a TPU."
"If we can get the time for each experiment down from weeks to days or hours, this improves the ability for everybody doing machine learning to iterate more quickly and do more experiments", Dean says. "You can run inference on a single chip, but for training, you need to think more holistically".
"Researchers given access to these free computational resources must be willing to openly publish the results of their research and perhaps even open source the code associated with that research", Dean said.