Russell Tipton, Adam Wise, Srdan Mutabdzija, Craig Shores, Devdatta Kulkarni
Identifier
DOI:10.1115/ipack2024-141206
Abstract
Pumped two-phase (P2P) liquid cooling applies cold plates to the primary heat sources of high-performance computing (HPC) or artificial intelligence (AI) servers and pumps refrigerant through them. Refrigerant-to-air (RA) and refrigerant-to-liquid (RL) versions of P2P direct-to-chip liquid cooling for artificial intelligence / machine learning (AI/ML) IT equipment have been developed, and their performance has been evaluated at commercial scale. P2P captures heat from the silicon and transports it to outdoor air. Matching the silicon power map to the cold plate is critical, and its design is in the IT manufacturer's domain; collaboration with the transport system designer is therefore necessary to optimize P2P performance. The transport system conveys the working fluid from the coolant distribution unit (CDU) pump to an array of two-phase cold plates, returns it to the CDU condenser, and then sends it back to the CDU pump. The condenser transfers heat from the secondary fluid, the refrigerant, to the primary fluid, which is either the whitespace air of an air-cooled data center or the facility cooling system of an economized chiller. Important performance parameters demonstrated in this paper include distribution and regulation of the necessary refrigerant flow to each cold plate array; stability of CDU operation over the full range of zero to 100% IT load and during transient and asymmetric IT loading; and stable CDU operation at startup as the server population varies from zero to full and while servers are hot-swapped from racks. Safe operation during abnormal conditions, including pump switchover and loss of heat rejection, is demonstrated and discussed. Lessons learned from commissioning the P2P CDU and flow conveyance, including charging, flushing, and startup, are included. The advantages of P2P in achieving sustainability goals are discussed, along with insights from a total cost of ownership (TCO) analysis of RA and RL systems comparing single-phase (1P) with two-phase (2P) liquid cooling (LC).