TACC Frontera: Targeting 210W Next-Gen Xeons and Extreme Performance
by Ian Cutress on November 12, 2018 8:00 AM EST
The Frontera supercomputer is the next-generation high performance machine set to debut at the Texas Advanced Computing Center (TACC) in 2019. As part of Intel’s HPC Forum, held just before the annual Supercomputing conference, a number of disclosures were made about the design of Frontera (Spanish for ‘Frontier’). One of them is certainly worth highlighting: this is not a supercomputer that is going to worry about performance per watt. It is all about peak performance.
Supercomputer Procurement Goals
When building and investing in a supercomputer, there are several different limits to be mindful of: total cost of ownership, physical space, cooling capacity, workload demands (memory intensive vs compute intensive), administration challenges, user base, storage, and expected lifetime of the installation. Boiled down, most high performance computer builds scale up performance by optimizing for performance per watt. This is why we often see single socket and dual socket systems with processors from the middle of the manufacturer's product stack. There are plenty of supercomputers deployed globally that are built upon Xeon E5-2640 type processors, or more recently, mid-placed Xeon Gold processors.
Dan Stanzione, the Executive Director at TACC, went into some detail about the focus of Frontera, the next-generation supercomputer cluster that TACC intends to deploy in 2019. TACC already has a number of deployments, such as Stampede (6400 nodes, Xeon CPU + Xeon Phi + Omni-Path), Lonestar (12-core Xeon + some K40), and Maverick (132 nodes, 10-core Xeon + K40). Frontera is going to be the new TACC flagship compute system, involving high performance Xeons, Mellanox interconnect, liquid and oil cooling, and optimized storage systems.
High Power Next-Gen Xeons
The heart of Frontera will be solely Xeon based. We were told that the goal for the system is to provide 35-40 PetaFLOPS, with the exact number dependent on what frequencies Intel will use in its next generation (read Cascade Lake) Xeons. There will be a few NVIDIA nodes for single precision computing, but we were told that the users of Frontera are not that interested in learning new computing paradigms to take advantage of the computing resources: what they want is more of the same, just faster. The best way to do this, we were told, was to keep the system architecture the same (AVX-512 based compute processors), but with more cores and higher frequencies, and with similar node-to-cluster connectivity. As a result, Frontera is built with the high TDP Xeon processors in mind, the 205-210W parts, rather than the 145W parts.
In this situation, users can develop programs with less node-to-node communication, and legacy code should see a larger speedup without refactoring. The easiest way to improve performance is more frequency, regardless of efficiency. Adding more CPUs and more nodes is a complex way to improve performance, and the goal of Frontera is to make this 'easy'.
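To illustrate why the 35-40 PetaFLOPS range hinges on clock speed, here is a rough back-of-envelope sketch that solves for the sustained AVX-512 frequency each target would imply. It is not TACC's or Intel's calculation. The node count uses the "above 8000" figure reported later in this article; dual-socket nodes and 28 cores per CPU are assumptions, and 32 double precision FLOPs per cycle per core assumes two AVX-512 FMA units per core.

```python
# Back-of-envelope: what sustained AVX-512 clock would a peak FLOPS target imply?
NODES = 8000              # "above 8000 compute nodes", per the article
SOCKETS_PER_NODE = 2      # ASSUMPTION: dual-socket nodes (not stated in the article)
CORES_PER_SOCKET = 28     # ASSUMPTION: every CPU is a 28-core part
FP64_FLOPS_PER_CYCLE = 32 # ASSUMPTION: 2 FMA units x 8 FP64 lanes x 2 ops (mul + add)


def implied_clock_ghz(target_pflops: float) -> float:
    """Return the per-core clock (GHz) needed to reach target_pflops at peak."""
    total_cores = NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET
    flops_per_hz = total_cores * FP64_FLOPS_PER_CYCLE
    return target_pflops * 1e15 / flops_per_hz / 1e9


if __name__ == "__main__":
    for target in (35, 40):
        print(f"{target} PFLOPS -> {implied_clock_ghz(target):.2f} GHz")
```

Under these assumptions, the two ends of the range differ by a few hundred MHz of sustained clock, which is consistent with the final figure depending on Intel's frequencies.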
Because of the high thermal requirements, water cooling will be used for the majority of the nodes. The nodes are coming from DellEMC, which works with CoolIT, so the liquid cooling will be from CoolIT. Some of the nodes will use oil immersion techniques, from GRC, although it wasn’t stated what sort of nodes these will be. No mention was made of whether the liquid cooling will be warm liquid cooling, but we were told that this scheme is put in place not because it is the most power efficient, but because it gives the userbase the best speedup for the least amount of work.
Other information about the deployment includes the interconnect, which will use Mellanox HDR and HDR-100 in a Fat Tree topology, with 200 Gb/s links between switches. Storage will be split depending on the user and the workload: there will be four different storage environments, three based on general storage, and one on very fast connectivity using peak IO as the main metric. Users interested in using the peak IO storage will need to be pre-approved. The storage implementation will be developed by DataDirect Networks, which has a previous relationship with TACC, and the global storage should be north of 50 PB, with around 3 PB of NAND Flash and around 1.5 TB/sec of storage connectivity.
The full Frontera will have more than 8000 compute nodes and a peak power of around 6 MW, which is well within the capability of TACC’s current infrastructure. Each rack will draw around 65 kW, which comes out at around 100 racks in total. Cores per rack were not disclosed, but 65 kW/rack and 210W/CPU would mean a maximum of roughly 310 CPUs per rack, and that doesn’t leave any overhead or include the storage. If you wanted to multiply out all the data, and assume that every CPU is the 28-core version, we’re looking at an upper bound of around 850,000 cores, with the actual number being much lower due to the infrastructure. The supercomputer will be partly solar powered, with around 1/3 of the power coming from wind power credits and wind power production.
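The multiplication above can be sketched in a few lines. This is only an illustration of the arithmetic using the figures quoted in this article (6 MW, 65 kW per rack, 210W per CPU, 28 cores per CPU); like the estimate in the text, it ignores memory, networking, storage, and cooling overhead, so it is strictly an upper bound.

```python
# Upper-bound core count from the power figures quoted in the article.
PEAK_POWER_W = 6_000_000  # ~6 MW peak for the full system
RACK_POWER_W = 65_000     # ~65 kW per rack
CPU_TDP_W = 210           # high-TDP Xeon parts
CORES_PER_CPU = 28        # ASSUMPTION: every CPU is the 28-core version

racks_from_power = PEAK_POWER_W / RACK_POWER_W  # racks implied by 6 MW alone
cpus_per_rack = RACK_POWER_W / CPU_TDP_W        # CPUs if a rack powered nothing else


def core_upper_bound(racks: float) -> int:
    """Cores if every watt in every rack went to 28-core, 210 W CPUs."""
    return round(racks * cpus_per_rack * CORES_PER_CPU)


if __name__ == "__main__":
    print(f"racks implied by peak power: {racks_from_power:.1f}")
    print(f"max CPUs per rack:           {cpus_per_rack:.1f}")
    print(f"cores at 100 racks:          {core_upper_bound(100):,}")
    print(f"cores at power-implied racks: {core_upper_bound(racks_from_power):,}")
```

Depending on whether you take the rounded 100 racks or the roughly 92 racks implied by 6 MW, the bound lands between about 800,000 and 867,000 cores, which brackets the 850,000 figure above.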
Frontera is also being planned with a second phase deployment in mind for 2024-2025. TACC will deploy a number of development systems, including FPGA systems, quantum simulators, Tensor core systems, and even next-gen Xeon systems equipped with Optane. The idea is that these systems will be available for development work and hopefully direct the future growth of Frontera. Phase 2 is expected to be 10x faster for compute.
Frontera will enter production in the summer of 2019, and is funded by a $60m grant from the National Science Foundation. For comparison, the Xeon Phi based Stampede2 system, which currently runs at 18 PetaFLOPS, was funded by a $30m grant.
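For a rough sense of how the two grants compare, the sketch below divides each grant by the quoted peak performance. This is only an illustration: it assumes the whole grant maps onto peak compute, which is unlikely to be the case (storage, cooling, and operations may be covered by the same money), and the function name is made up for the example.

```python
# Grant dollars per peak PetaFLOPS, using only the figures quoted in the article.
def cost_per_pflops(grant_musd: float, peak_pflops: float) -> float:
    """Millions of USD per peak PetaFLOPS (ASSUMES the whole grant buys compute)."""
    return grant_musd / peak_pflops


frontera_at_40 = cost_per_pflops(60, 40)  # Frontera, top of the 35-40 PFLOPS goal
frontera_at_35 = cost_per_pflops(60, 35)  # Frontera, bottom of the goal
stampede2 = cost_per_pflops(30, 18)       # Stampede2 at 18 PFLOPS

if __name__ == "__main__":
    print(f"Frontera:  ${frontera_at_40:.2f}m to ${frontera_at_35:.2f}m per PFLOPS")
    print(f"Stampede2: ${stampede2:.2f}m per PFLOPS")
```

On these numbers, both systems land in a similar range of roughly $1.5m to $1.7m per peak PetaFLOPS.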