The AI Chip Arms Race: Weighing the Complex Trade-offs Ahead
SAN JOSE, Calif. — Is your company GPU-rich or GPU-poor? Most likely, your company is GPU-poor. But what does that mean, anyway? Translated: Nvidia controls most of the GPU chips, and those chips go to the world’s biggest customers. The rest of us are GPU-poor.
So, make more of these speedy chips, right? It’s more complex. Advanced AI chips are scarce, and there is no way to make enough of them soon. It’s all about the supply chain. It takes years to build a foundry and get it into production.
But there’s another problem, too. There’s a paradox in the chip design and data center worlds. We can make more chips, but at some point, the water and electricity demands will become the focal issues. What matters? The ecosystems that sustain us or the foundries and data centers that feed the GPU rich and their customers?
Some communities are taking action. For example, Chile partially reversed a Google data center permit.
“So development of data centers happens, and then local communities are facing either water or power restrictions, which is another big issue for data centers,” said Lauren Bridges, a fellow at the Berkman Klein Center for Internet and Society at Harvard University, on The World radio program.
“And I think it’s going to leave policymakers with several problems on their hands but at this point, policymakers are tending to take a reactive rather than a proactive approach,” Bridges said.
“There have been some cases where local markets have put what we call a moratorium on data center development. So they’ve put a temporary pause on data center development and this is because either they’re consuming too much power or too much water. We’ve seen that in several places around the world, such as Hong Kong and in Frankfurt, and in the Netherlands, for example.”
The Intel View
Intel CEO Pat Gelsinger is about as gleeful as they get when he talks about making processes for chips. The GPU rich have the resources to spend billions — Gelsinger can bank on that.
Intel had a big event in late February here in Silicon Valley to announce that it has committed billions to building a foundry business. It will still make chips.
But with a foundry business, Intel has a lot of customers it can call on. Gelsinger is glad that his company will be a packager, using its manufacturing capabilities to make processes that it sells to companies like Microsoft.
Billed as one of the featured speakers at the Intel Direct Connect event, Microsoft CEO Satya Nadella appeared in a video recording to say Microsoft would build a chip design on the Intel 18A process, the new 1.8 nm process that rivals the process technology made by TSMC.
“To achieve this vision we will need a reliable supply of the most advanced high-performance and high-quality semiconductors,” Nadella said. “And all of us at Microsoft are committed to supporting Intel’s efforts to build a strong supply chain right here in the United States.
“That’s why we’re so excited to work with Intel foundries services and why we have chosen a chip design that we plan to produce on Intel’s 18A process.”
That mention of the United States — we’ll get back to that.
What’s Ahead: High Demand for Resources
Nadella is arguably one of the kings of the GPU rich. Microsoft, Amazon Web Services, and Google are now making their own chips. Microsoft has already deployed its Maia 100 AI Accelerator and the Cobalt 100 CPU. According to reports, Microsoft will use the 18A in future updates.
The Maia chip hints at what we can expect from OpenAI CEO Sam Altman, who spoke (in person) at Direct Connect. Gelsinger had a laugh with Altman about the $7 trillion that The Wall Street Journal reported Altman wants to raise to reshape the chip business.
Altman pays attention to the supply chain and hopes that reshaping it will help meet demand. According to Agam Shah, writing for The New Stack, hardware is hot these days as it can’t keep up with software advances.
Altman’s appearance at Direct Connect shows how OpenAI could fit into the mix with Intel. Microsoft’s first custom chip, Maia, is meant in part for OpenAI and the generative AI technology it has become known for in the past 18 months. Microsoft built the Cobalt chip with help from ARM, whose CEO, Rene Haas, also appeared on stage at Direct Connect.
As for Altman — the tight fit with Microsoft and now Intel tells a bit about their intersecting interests: generative AI built by OpenAI, Azure Cloud for generative AI and cloud workloads, partnerships with ARM, and foundries operated by Intel.
The water and electricity needed will be way too much to fathom if Altman, Gelsinger and Nadella realize their visions of foundry buildouts and data centers running AI chips. Gelsinger says Intel will build the biggest AI systems in the world — using a lot of energy.
When The New Stack asked about the unimaginable resources needed to make the processes for these advanced chips, Gelsinger pointed to Intel’s track record:
“We’re just way ahead of everybody else in the industry,” Gelsinger said. “I encourage you go look at it (Intel’s Climate Transition Action Plan) and if you find deficiencies there I want to know, right, because this is an area of great strength, you know, for us now for many decades, right, chemicals used, etc, etc.
“Moore’s Law was always faster, less power and lower cost. And driving down the power per unit computing, right? It’s a big piece that we see [on] a roadmap that we’re driving.”
Balancing Progress and Responsibility
So there you go. But what gets Gelsinger’s energy levels going?
Winning.
He wants Intel to climb to No. 2 in the foundry market, behind TSMC. Again, more power, more water.
“We want to help build Nvidia chips and AMD chips and TPU chips for Google and Inferentia chips for Amazon, period,” Gelsinger said at the conference in response to our question about the challenge of building AI chips given their cost to the environment.
“We want to help them and give them the most power performance efficient technologies for them to develop their systems.”
What is required to run high-performing AI chips like the ones that will get built on the 18A? Well, think about the accelerators that now run in data centers.
A data center rack may consume 7 kW to run enterprise applications, according to a Network World story based on data from AFCOM, a data center industry organization. And data center power consumption has grown only modestly in the last few years.
However, AI applications are different beasts. A single eight-GPU Nvidia H100 node is rated at 10.2 kilowatts. Scale that up to the 350,000 GPUs Meta claims it’s going to deploy this year and you’re looking at a considerable amount of power.
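To put a rough number on "considerable," here is a back-of-the-envelope sketch using only the figures quoted above. It rests on our own simplifying assumptions, not on anything Meta or Nvidia has said: every GPU sits in an eight-GPU node, every node draws its full rated 10.2 kW around the clock, and cooling and other facility overhead are ignored. The function names are illustrative.

```python
# Back-of-the-envelope estimate built from the figures quoted in this article.
# Assumptions (ours): all GPUs are in eight-GPU nodes running at full rated
# power all year, with no cooling or facility overhead included.
GPUS_PER_NODE = 8
NODE_POWER_KW = 10.2       # rated power of one eight-GPU H100 node
ENTERPRISE_RACK_KW = 7.0   # typical enterprise rack, per the AFCOM figure
HOURS_PER_YEAR = 8760      # 365 days


def fleet_power_mw(gpu_count: int) -> float:
    """Total rated power of the GPU fleet, in megawatts."""
    nodes = gpu_count / GPUS_PER_NODE
    return nodes * NODE_POWER_KW / 1000


def annual_energy_twh(power_mw: float) -> float:
    """Energy used in a year at constant power, in terawatt-hours."""
    return power_mw * HOURS_PER_YEAR / 1_000_000


def equivalent_enterprise_racks(power_mw: float) -> float:
    """How many 7 kW enterprise racks would draw the same power."""
    return power_mw * 1000 / ENTERPRISE_RACK_KW


power = fleet_power_mw(350_000)
print(f"Fleet power: {power:.2f} MW")
print(f"Annual energy: {annual_energy_twh(power):.2f} TWh")
print(f"Equivalent enterprise racks: {equivalent_enterprise_racks(power):,.0f}")
```

Under those assumptions, the fleet would draw about 446 MW, roughly what 63,750 enterprise racks at 7 kW each would consume, and would use about 3.9 TWh over a year. Real-world draw would differ with utilization and facility overhead.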
And here’s what happens to data centers when they are filled with chips to run AI and machine learning services: Their energy usage can climb to as much as five times that of data centers running today’s conventional workloads.
Then, consider the ramifications of widely used services such as Google Search, as noted in a story published by Data Center Dynamics. Apply AI to Google Search, and energy use will increase to unprecedented amounts.
Scholars such as Alex de Vries maintain that by 2027, AI workloads in data centers could consume as much electricity as a small country such as the Netherlands or Sweden. And the picture gets more unwieldy if the energy required for inference outpaces what is consumed to train AI systems.
De Vries observes that applying AI to Google’s search could, in a worst-case scenario, boost its energy use to around that of Ireland, or 29 TWh per year. But he notes that the sheer cost of doing this could discourage Google from implementing it.
Energy usage will escalate in the coming years, according to a post from Device42, which provides asset management services to data center operators and providers. Data centers use 240 to 340 terawatt-hours of electricity a year. However, with “generative AI and other processing-intensive workloads, data center energy consumption is slated to rise to 2967 terawatt[-hours] by 2030.”
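For a sense of scale, the short sketch below works out what those figures imply. It reads all three numbers as terawatt-hours per year, which is our assumption (a terawatt is a unit of power, not energy), and it spreads the annual total evenly across the year to get an average draw. The names are illustrative.

```python
# Rough scale check on the Device42 figures quoted above.
# Assumption (ours): all figures are terawatt-hours per year.
HOURS_PER_YEAR = 8760  # 365 days

TODAY_TWH_LOW = 240
TODAY_TWH_HIGH = 340
PROJECTED_2030_TWH = 2967


def growth_multiple(current_twh: float, projected_twh: float) -> float:
    """How many times larger the projection is than the current figure."""
    return projected_twh / current_twh


def average_power_gw(annual_twh: float) -> float:
    """Average continuous draw, in gigawatts, implied by an annual total."""
    return annual_twh * 1000 / HOURS_PER_YEAR  # 1 TWh = 1,000 GWh


low = growth_multiple(TODAY_TWH_HIGH, PROJECTED_2030_TWH)
high = growth_multiple(TODAY_TWH_LOW, PROJECTED_2030_TWH)
print(f"Growth by 2030: {low:.1f}x to {high:.1f}x")
print(f"Average draw in 2030: {average_power_gw(PROJECTED_2030_TWH):.0f} GW")
```

Read that way, the projection amounts to roughly a 9- to 12-fold increase, or an average continuous draw of about 339 GW in 2030.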
The game is on. Data center operators know they have to reduce emissions.
More energy-conscious approaches to foundries and data center buildouts will continue. They have to — Gelsinger made it personal in his opening remarks.
“You know, one time my daughter-in-law was giving me a bad time: ‘Hey, you’re sort of proud of all the great advances that you drove, huh?’” Gelsinger said. “And then she said, ‘What if you have to bankrupt my planet to do so?’
“I think all of us are left with that same question of, are we going to be responsibly turning over our planet to the next generation and in the face of things like this explosion of AI capabilities? It isn’t just doing good things. It’s doing the right things for tomorrow as well.”
‘Bringing Back the Silicon to Silicon Valley’
This leads to many questions, not just for technology executives. World leaders and lawmakers must get a grip on matters.
But that may be the most challenging dilemma of all. Gelsinger and industry colleagues are making alliances with the political establishment like never before. Chip foundries are geopolitical hotspots. However, governments still want to build chip foundries, as illustrated by Gelsinger and U.S. Secretary of Commerce Gina M. Raimondo, who spoke in a video feed at Direct Connect.
She made her point with an element of political theater (think China and calls for jobs). Note: Raimondo did not say anything about water scarcity or the electricity demands of chip development.
“The fact that we are so overly dependent on a couple of countries in Asia, to access semiconductor chips that we need for life-saving medical equipment, cars, every piece of technology — showed us we gotta get to work,” Raimondo said. “We need to get back to work making more chips in America.
“Our generation, this is our moment, and it’s making more chips in America, bringing back the silicon to Silicon Valley, if you will. But it isn’t just that. It’s then getting a massive flywheel going in the United States of America from research and development, all the way through to advanced packaging, and everything in between.”
So, then, what? Who will get the water? Will there be enough electricity? Then there are all the questions about demand. Who will get these chips made in the USA?
Right now, everyone is just waiting for policy to emerge.
There you go. Welcome to the new world of the rich and poor.