- Soumith Chintala is a creator and the lead of Facebook’s core machine learning tool, PyTorch.
- He says the extreme popularity of today’s dominant AI technique may have unintended consequences.
- Increasingly specialized hardware for Transformers may make it harder for new techniques to catch on.
Some of today’s most popular emerging AI tools, such as OpenAI’s text generation tool GPT-3, were made possible by an AI technique called Transformers.
Notably, Transformers first hit the scene in 2017, soon finding a home in popular AI programming frameworks TensorFlow (backed by Google) and PyTorch (started at Facebook). And in an industry where machine learning tools and techniques are evolving at a blistering pace, a half-decade might as well be half a century.
The enduring popularity of the Transformers model may prove to be a double-edged sword, warns Soumith Chintala, a creator of PyTorch and a distinguished engineer at Meta, Facebook’s parent company.
“I hope something else shows up,” Chintala said of Transformers in an interview with Insider. “We’re in this weird hardware lottery. Transformers emerged five years ago, and another big thing has yet to come up. So it may be that companies think ‘we should just optimize hardware for transformers.’ That results then in going any other direction being much harder.”
AI-specific hardware is big business
Chintala spoke as part of a broader announcement that Facebook would be moving PyTorch to the independent PyTorch Foundation, under the umbrella of the open source consortium The Linux Foundation. Chintala said the technical model of PyTorch is not changing as part of the move.
Transformers-based approaches to natural language processing first emerged in 2017 from the seminal research paper “Attention Is All You Need.” Since then, the technique has gone on to become the foundation for powerful new AI systems, including models that generate images from text prompts.
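The core operation that paper introduced, scaled dot-product attention, is compact enough to sketch. Below is a minimal, illustrative NumPy version; the dimensions and variable names are assumptions for the example, and this is a single attention step, not the paper's full multi-head architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax: rows become probability distributions
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # scores[i, j]: how strongly query token i attends to key token j,
    # scaled by sqrt(d_k) as in "Attention Is All You Need"
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V  # output: attention-weighted mix of the values

# toy inputs: 4 tokens, 8-dimensional representations (illustrative sizes)
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out = attention(Q, K, V)
print(out.shape)  # (4, 8)
```

Every output token is a weighted average of all the value vectors, which is what lets Transformers model long-range dependencies in one step rather than sequentially.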
In parallel, custom hardware for artificial intelligence has exploded in popularity. Nvidia has traditionally held a dominant position thanks to the widespread adoption of its GPUs in machine learning, but more specialized chips, such as Google’s tensor processing unit and the Wafer Scale Engine from Cerebras, have gained adoption for more complex machine learning workloads.
Nvidia also now has an architecture called Hopper that specializes in Transformers, which Nvidia CEO Jensen Huang said on the company’s most recent earnings call would be a big part of its strategy. (Though, to be sure, Nvidia is a massive company with a wide portfolio of products that go well beyond Hopper.)
“I fully expect Hopper to be the next springboard for future growth,” he said on the earnings call. “And the importance of this new model, Transformers, can’t possibly be understated and can’t be overstated.”
Transformers has made a lot of modern AI possible
However, high-profile new products built on Transformers techniques are still emerging. OpenAI’s GPT-3 is roughly two years old, while the company only began opening broader access to DALL-E 2 in July.
And companies like Nvidia, while launching products that may be specialized for Transformers, offer a wide array of products to fit multiple different models—and may indeed have one ready to go for whatever new techniques emerge.
Still, increasingly specialized hardware, in AI or otherwise, risks locking in current use cases rather than enabling emerging ones.
“It’s gonna be much harder for us to even try other ideas if hardware vendors end up making the accelerators more specialized to the current paradigm,” Chintala said.
Chintala also said he “rejects the notion,” now prevailing wisdom among influential figures in the AI industry, that PyTorch is overtaking the Google-backed TensorFlow in popularity.
“We don’t think we’re eating TensorFlow’s lunch,” he said. “We target some areas well, and TensorFlow targets other areas well. I honestly, genuinely believe that we are doing different things and we are good at different parts of the market. If you look at the research community, we have a good market share, but that’s not true in other parts.”
Insider previously reported that JAX was increasingly becoming Alphabet subsidiary Google’s core deep learning technology, and is expected to become the backbone of its products in lieu of TensorFlow. JAX excels at splitting complex machine learning tasks across multiple pieces of hardware, drastically simplifying the unwieldy existing tools and making it easier to manage increasingly large machine learning problems.
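That multi-device strength rests on JAX primitives such as `jax.pmap`, which maps a function over a leading array axis, running one slice per accelerator. A minimal sketch follows; the device count, and hence the size of that leading axis, depends on the machine (on a CPU-only box it is typically 1), and the function itself is just an illustrative placeholder:

```python
import jax
import jax.numpy as jnp

# one slice of the batch per available accelerator
n = jax.local_device_count()

@jax.pmap
def scaled_sum(x):
    # toy per-device computation, stands in for a real model shard
    return jnp.sum(x * 2.0)

# leading axis = device axis: shape (n_devices, per_device_batch)
batch = jnp.arange(n * 4.0).reshape(n, 4)
result = scaled_sum(batch)  # one scalar per device
print(result)
```

The same decorator scales from one chip to many without changing the model code, which is the simplification the reporting describes.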
“We’re learning from JAX, we’re adding coverage of those things into PyTorch as well,” he said. “Clearly, JAX does certain things better. I don’t have a problem with saying that. PyTorch is really good at a bunch of things, that’s why it’s mainstream, people use it for everything under the sun. But being such a mainstream framework doesn’t mean it covers everything.”