Microsoft’s experiment with liquid cooling in a data center is a big deal

It runs production workloads. But a server rack immersed in engineered fluid inside Microsoft’s Quincy data center in Washington state is still something of a science project, similar in purpose to Project Natick, the hermetically sealed, computer-filled capsule that the company’s researchers operated on the ocean floor off the coast of Orkney in Scotland.

As with Natick, running real production software on dozens of servers inside a bath of low-boiling-point liquid in Quincy is a way to answer an initial set of basic questions before scaling up to test the design’s effect on reliability.

This phase is meant to test basic functionality and operability, Christian Belady, vice president of Microsoft’s Advanced Data Center Development group, told DCK. Does immersion cooling affect server performance in any way? How easily can a data center technician adapt to working with submerged servers? These are the kinds of questions his team wants to answer at this stage.

Gene Twedt for Microsoft

Ioannis Manousakis, chief software engineer at Azure, removes a server blade from a two-phase immersion cooling tank in a Microsoft data center.

It’s just one rack, a much smaller deployment than the latest Natick experiment, but what’s at stake here is nothing less than the future trajectory of computing at scale. Chip manufacturers can no longer double a processor’s speed every few years, without increasing its power consumption, by cramming more, smaller transistors onto a silicon die of the same size. Belady and his colleagues are trying to see whether they can hold on to the benefits of Moore’s Law by packing more processors into a single data center.

“Moore’s Law for infrastructure,” he said. “How can we continue to follow the scaling of Moore’s Law across the entire [data center] footprint?”

If you follow this space, you may be tempted to see this development and Google’s deployment of liquid-cooled AI hardware a few years ago as part of the same trend. That is true only to a small extent. The difference in purpose outweighs any similarity. Microsoft is not looking at liquid cooling only for a subset of its most powerful computers running the most demanding workloads. It sees liquid cooling as a way to keep increasing its data centers’ capacity to process any workload at the same rate as when Moore’s Law still held.

“We no longer have the luxury of counting on chip performance [improvements] from year to year,” Belady said.

The technology used in Microsoft’s deployment is one of several types of liquid cooling available to computer designers. In a two-phase immersion cooling system, a synthetic liquid engineered to boil at a low temperature (in this case 122°F, or 90°F lower than the boiling point of water) turns to vapor on contact with a hot processor. The vapor carries heat away as gas bubbles that travel to the surface, where contact with a cooled condenser in the tank’s lid turns the gas back into liquid, which rains back down to repeat the cycle.
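For readers more used to Celsius, the boiling-point figures above can be checked with a few lines of Python. This is only an illustrative sketch: it assumes water boils at 212°F (standard sea-level pressure), and the function and constant names are made up for this example rather than taken from Microsoft.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


# Assumption: water's boiling point at standard sea-level pressure.
WATER_BOILING_F = 212.0
# Boiling point of the engineered fluid, as quoted in the article.
FLUID_BOILING_F = 122.0

# Gap between the two boiling points, in Fahrenheit degrees.
gap_f = WATER_BOILING_F - FLUID_BOILING_F

# The same boiling points expressed in Celsius.
fluid_boiling_c = fahrenheit_to_celsius(FLUID_BOILING_F)
water_boiling_c = fahrenheit_to_celsius(WATER_BOILING_F)

print(f"Fluid boils {gap_f:.0f} F below water")
print(f"Fluid boiling point: {fluid_boiling_c:.0f} C")
print(f"Water boiling point: {water_boiling_c:.0f} C")
```

Under that assumption, the fluid boils at roughly half the Celsius temperature of water, which is why an ordinary hot processor is enough to vaporize it.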

Belady was careful to point out that Microsoft remains “agnostic” about which type of liquid cooling technology it will choose for wide-scale deployment. He and his colleagues, including Husam Alissa, chief engineer, and Brandon Rubenstein, senior director of server and infrastructure development management and engineering, began working with liquid cooling many years ago. Noticing the trends in processor design, they wanted to be thoroughly familiar with the available server cooling alternatives by the time the power consumption of an individual chip reached the limit of what air cooling can handle. “We’re not reaching the limits yet,” Belady said, “but we see it coming soon.”

Gene Twedt for Microsoft

Ioannis Manousakis, Chief Software Engineer with Azure (left), and Husam Alissa, Chief Hardware Engineer on Microsoft’s Advanced Data Center Development Team (right), inspect the inside of a two-phase immersion cooling tank in a Microsoft data center.

Within 10 years, if not five, fully liquid-cooled data centers will become mainstream rather than a niche phenomenon seen only in the worlds of supercomputing and bitcoin mining, he estimates. Even if all servers ship with liquid cooling five years from now, it will still take a few more years for the older, air-cooled machines to age out.

Alissa and Rubenstein presented the results of their experiments with multiple liquid cooling technologies at the 2019 OCP Summit, the Open Compute Project’s annual hardware and infrastructure design conference in San Jose. Their presentation covered two-phase immersion, single-phase immersion (where hydrocarbon fluid circulates between the hardware and a heat exchanger), and cold plates (where the traditional heat sink on the motherboard is replaced by a flat rectangular piece of heat-conducting metal containing tiny pipes that carry coolant in and out, to and from a coolant distribution unit shared by all servers in the rack).

They found a lot to like about both immersion and cold-plate designs, Belady said. Both let servers run much hotter than air cooling does, and both make it possible to get rid of server fans. One area where immersion clearly wins is the degree of compute densification it makes possible. “It allows us to really densify,” he said. “More circuits per volume.”

But, “we’re kind of agnostic still on the direction, and we see a future where the two will exist together.” The data center-level infrastructure that supports either approach would be the same. The important thing here is that instead of struggling to squeeze every last drop of efficiency out of air-based cooling (a struggle that has now crossed the threshold of diminishing returns, Belady admitted), computer designers are just beginning to exploit the cooling capacity of liquids.

What is the PUE of the tank, we asked? “Oh, it’s close to 1,” Belady replied.
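For context, PUE (power usage effectiveness) is the ratio of the total power a facility draws to the power that reaches its IT equipment, so a value of 1 would mean no overhead for cooling or power delivery. The sketch below shows that ratio in Python. The kilowatt figures are hypothetical, chosen only to illustrate the calculation, and are not measurements from Microsoft’s tank or any real facility.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power usage effectiveness: total facility power divided by IT power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


# Hypothetical example: 100 kW of IT load plus 50 kW of cooling and
# power-delivery overhead, as an air-cooled room might have.
air_cooled_pue = pue(total_facility_kw=150.0, it_equipment_kw=100.0)

# Hypothetical example: the same 100 kW of IT load with only 3 kW of
# overhead, illustrating what "close to 1" means.
low_overhead_pue = pue(total_facility_kw=103.0, it_equipment_kw=100.0)

print(f"Air-cooled example PUE:   {air_cooled_pue:.2f}")
print(f"Low-overhead example PUE: {low_overhead_pue:.2f}")
```

The closer the result is to 1, the smaller the share of power spent on anything other than computing.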