The Automated, Self-Contained, Liquid Immersed Data Center: TMGcore’s OTTO
by Dr. Ian Cutress on November 19, 2019 8:00 AM EST
Immersion cooling of servers is always fun, and it has evolved in the 20 years or so since I first saw it with $300/gallon special 3M liquids. In 2019, at every enterprise trade show, we see a few servers that use this cooling in data centers, despite the different infrastructure needs they have. In order to simplify adoption, TMGcore have developed fully self-contained and physically dense server containers. Not only that, but ‘OTTO’ is supposed to be better for the environment too.
The traditional picture we all have of a data center is racks upon racks of servers, perhaps built into hot channels and cold channels for airflow, with a big HVAC system, a ton of noise, and a ton of networking and power cables everywhere. Over the last few years, we have seen more and more efforts to bring the power efficiency of these data centers down to more reasonable numbers, and one of those methods has been two-phase immersion cooling: rather than using air, you put the whole server/rack into a liquid with a low boiling point and use the phase transition along with convection as your heat removal system.
Managing the infrastructure needs for two-phase immersion cooling is different to a traditional data center. There are the liquids, the heat exchangers, the power, the maintenance, and the fact that not a lot of people are used to having big expensive hardware dipped in what looks like water. This is why an immersion demonstration at a trade show usually draws a crowd: despite it being shown year after year, there are plenty of people who haven’t seen it. TMGcore’s answer to most of these issues is to take the infrastructure and maintenance requirements off the customer’s hands completely.
The OTTO is a self-contained, automated, two-phase liquid immersed data center unit. All a data center needs to add is a connection to its power, network, and water lines. The family of products from TMGcore, built with partners, is designed so that once the hardware is installed, it doesn’t need adjusting by the person buying it. Units come in different sizes, and customers can scale their needs simply by ordering more units. Hardware hotswapping is done by the internal system, controlled either locally or remotely, and energy is reused by the heat exchangers. The typical ‘PUE’ metric that describes the power efficiency of a data center (total facility power divided by the power delivered to the compute hardware) is claimed to be only 1.028, compared to 1.05/1.06 for some of the most efficient air-cooled data centers. This means that for every megawatt of power drawn by the HPC compute hardware, TMGcore claims that their OTTO systems only need 1.028 megawatts in total.
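To put the PUE numbers in perspective, here is a minimal sketch of the arithmetic, assuming the standard definition of PUE as total facility power divided by IT power. The function names and the 1 MW example load are illustrative choices, not anything published by TMGcore.

```python
def total_facility_kw(it_load_kw: float, pue: float) -> float:
    """Total power a facility draws for a given IT load at a given PUE."""
    return it_load_kw * pue


def overhead_kw(it_load_kw: float, pue: float) -> float:
    """Power spent on everything that is not compute (cooling, conversion losses, etc.)."""
    return it_load_kw * (pue - 1.0)


if __name__ == "__main__":
    it_load_kw = 1000.0  # assumed example: 1 MW of compute hardware
    scenarios = [
        ("OTTO (TMGcore claim)", 1.028),
        ("efficient air-cooled, low end", 1.05),
        ("efficient air-cooled, high end", 1.06),
    ]
    for label, pue in scenarios:
        print(
            f"{label}: total {total_facility_kw(it_load_kw, pue):.0f} kW, "
            f"overhead {overhead_kw(it_load_kw, pue):.0f} kW"
        )
```

On these figures the claimed overhead is roughly half that of the air-cooled examples, which is where the efficiency argument comes from, though the 1.028 number is the vendor's claim rather than a measured result.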
Another claim from TMGcore is compute density, up to 3.75 kW per square foot. This means that the three main sizes of OTTO, 60 kW, 120 kW, and 600 kW, come in self-contained footprints of 16 sq ft, 32 sq ft, and 160 sq ft respectively. The goal here is to provide compute capacity where space is limited. The units can also be stacked where required, or retained in portable containers where facilities exist. Customers with specific requirements can request custom builds.
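As a quick sanity check, the quoted footprints do line up with the 3.75 kW per square foot figure. The sketch below just divides the capacities by the areas given above; the dictionary and function names are illustrative.

```python
# Unit capacity in kW mapped to footprint in sq ft, as quoted in the article.
OTTO_UNITS = {60: 16, 120: 32, 600: 160}


def density_kw_per_sqft(capacity_kw: float, footprint_sqft: float) -> float:
    """Compute density as capacity divided by floor area."""
    return capacity_kw / footprint_sqft


if __name__ == "__main__":
    for capacity_kw, footprint_sqft in OTTO_UNITS.items():
        density = density_kw_per_sqft(capacity_kw, footprint_sqft)
        print(f"{capacity_kw} kW in {footprint_sqft} sq ft: {density} kW/sq ft")
```

All three sizes work out to the same density, which suggests the larger units are essentially multiples of the smallest one.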
Each unit is fitted with TMGcore’s own blade infrastructure, aptly named an ‘OTTOblade’. An example of one blade that the company provides is a dual socket Intel Xeon with dual 100G Ethernet, 512 GB of DRAM, eight SSDs, and 16 V100 GPUs, for 6 kW. 10 of these can go into one of the 60 kW units, affording 160 V100 GPUs in 16 sq ft.
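The blade math can be sketched the same way. This assumes a unit is filled entirely with the 6 kW example blade described above and that the unit's rated power is the only limit; real configurations may differ, and the function name is hypothetical.

```python
def gpus_per_unit(unit_kw: int, blade_kw: int, gpus_per_blade: int) -> tuple[int, int]:
    """Return (blade count, GPU count) for a unit filled with identical blades.

    Assumes the unit's power budget is the only constraint on blade count.
    """
    blades = unit_kw // blade_kw
    return blades, blades * gpus_per_blade


if __name__ == "__main__":
    # Figures from the article: 60 kW unit, 6 kW blade, 16 V100 GPUs per blade.
    blades, gpus = gpus_per_unit(unit_kw=60, blade_kw=6, gpus_per_blade=16)
    print(f"{blades} blades, {gpus} GPUs in a 60 kW unit")
    print(f"{gpus / 16} GPUs per sq ft")  # 16 sq ft footprint for the 60 kW unit
```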
Obviously one of the key criticisms of self-contained, sealed, automated hardware is that it’s a pain when hardware fails and needs changing. One of the ideas behind two-phase cooling is that the temperature of the hardware can be closely monitored to extend its lifespan. For other out-of-the-box failures, some can be managed by the automated systems, while others will require engineers on site. The idea is that because these units are a lot easier to manage, operational expenses will be significantly reduced regardless.
TMGcore is working with partners for initial deployments, and we’re hoping to see one in action this week at Supercomputing. I have an open offer to visit the R&D facility next time I’m in Dallas.