Project Ceiba: AWS and NVIDIA Build the World's Largest Cloud AI Supercomputer
In an era where artificial intelligence (AI) is increasingly pivotal across various sectors, Amazon Web Services (AWS) and NVIDIA have collaborated to initiate Project Ceiba, an ambitious endeavor to establish the world’s largest AI supercomputer in the cloud. This innovative partnership aims to advance AI technology significantly, facilitating substantial progress across multiple domains while ensuring the highest levels of security for sensitive data.
Transforming AI Research and Development
At the core of Project Ceiba is NVIDIA’s dedication to innovation. By utilizing the powerful capabilities of this supercomputer, NVIDIA’s research and development teams aim to make significant advancements in various fields of AI, including:
- Large Language Models (LLMs)
- AI-Driven Graphics and 3D Generation
- AI-Powered Digital Biology and Climate Modeling
- Robotics, Self-Driving Vehicles, and Smart Machines
Project Ceiba, with its capability to deliver 414 exaflops of AI compute, represents a significant advance in computational power: it is 375 times more powerful than the current leading supercomputer, Frontier. This groundbreaking initiative is expected to transform the field of generative AI and its numerous related applications.
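A quick back-of-the-envelope check makes these figures concrete. The sketch below assumes 414 exaflops means 4.14 × 10²⁰ operations per second and a world population of roughly 8 billion, each performing one calculation per second:

```python
# Back-of-the-envelope check of the figures quoted above.
# Assumptions: 414 exaflops = 4.14e20 operations/second, and roughly
# 8 billion people each performing 1 calculation per second.

EXAFLOP = 1e18
ceiba_ops_per_sec = 414 * EXAFLOP           # 4.14e20 ops/s
world_population = 8e9                      # ~8 billion people
seconds_per_year = 365.25 * 24 * 3600       # ~3.156e7 s/year

# How long would all of humanity need to match one second of Project Ceiba?
years_needed = ceiba_ops_per_sec / world_population / seconds_per_year
print(f"{years_needed:,.0f} years")         # on the order of 1,600 years
```

The result lands in the low 1,600s of years, consistent with the claim (the exact figure depends on the population assumed).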
NVIDIA DGX Cloud: Scalable and Flexible AI Infrastructure
Project Ceiba’s infrastructure is built on NVIDIA DGX Cloud, a state-of-the-art AI platform tailored for developers seeking scalable, high-performance computing.
Key Features of DGX Cloud:
- 20,736 NVIDIA Blackwell GPUs: These GPUs use GB200 Grace Blackwell Superchips, combining the strengths of the Grace CPU and the Blackwell GPU.
- NVL72: Project Ceiba is built on NVIDIA's GB200 NVL72 technology, a liquid-cooled, rack-scale system with fifth-generation NVLink. It connects the 20,736 Blackwell GPUs to 10,368 NVIDIA Grace CPUs, achieving an incredible 414 exaflops of AI processing power, about 375 times more than the current fastest supercomputer, Frontier. To put it in perspective, the combined supercomputing capacity of the world is less than 1% of this power, which is equivalent to over 6 billion advanced laptops working together. Moreover, if every person on Earth performed one calculation per second, it would take over 1,660 years to match what Project Ceiba can do in a single second.
- Massive GPU Scaling: DGX Cloud scales from 8 GPUs to 72 GPUs per mesh, enhancing flexibility and processing speed to accommodate the demanding requirements of modern AI workflows.
- High-Bandwidth Networking with AWS Elastic Fabric Adapter (EFA): Leveraging fourth-generation EFA, Project Ceiba achieves an impressive 1,600 Gbps per superchip, ensuring low-latency, high-throughput data transfers.
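To put the quoted per-superchip bandwidth in perspective, the sketch below estimates how long a hypothetical 1 TB model checkpoint would take to move over a single 1,600 Gbps link (the checkpoint size is an illustrative assumption, and protocol overhead is ignored):

```python
# Illustrative only: time to move a hypothetical 1 TB model checkpoint
# at the quoted 1,600 Gbps per superchip, ignoring protocol overhead.

link_gbps = 1600                            # quoted EFA bandwidth per superchip
link_bytes_per_sec = link_gbps * 1e9 / 8    # 2e11 bytes/s
checkpoint_bytes = 1e12                     # hypothetical 1 TB checkpoint

seconds = checkpoint_bytes / link_bytes_per_sec
print(f"{seconds:.0f} s")                   # prints: 5 s
```

At that rate, even terabyte-scale artifacts move between superchips in seconds rather than minutes.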
AWS EC2 UltraClusters
The backbone of Project Ceiba's scalability is Amazon Elastic Compute Cloud (Amazon EC2) UltraClusters, which provide a robust and flexible high-performance computing (HPC) environment. UltraClusters scale to thousands of GPUs or purpose-built ML accelerators, such as AWS Trainium, giving developers on-demand access to supercomputer-class performance for machine learning (ML), generative AI, and HPC through a simple pay-as-you-go model without any setup or maintenance costs. UltraClusters deliver:
- Petabit-scale Networking: Connecting thousands of GPUs at high speeds for distributed training.
- Dynamic Scalability: Allowing developers to scale workloads elastically based on real-time demands.
UltraClusters integrate seamlessly with DGX Cloud, ensuring maximum performance and reliability for large-scale AI workloads.
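For a sense of the underlying mechanics, the pattern behind UltraCluster-style deployments is launching EFA-enabled GPU instances into a cluster placement group. Below is a minimal sketch of the request parameters, assuming they would be passed to `boto3.client("ec2")`; the AMI, subnet, security group, and names are placeholders, not real values:

```python
# Hedged sketch: parameters for launching EFA-enabled GPU instances into a
# cluster placement group, the pattern behind EC2 UltraClusters.
# All IDs and names below are placeholders.

placement_group = {
    "GroupName": "ceiba-demo-pg",        # hypothetical name
    "Strategy": "cluster",               # pack instances close together
}

run_instances_params = {
    "ImageId": "ami-XXXXXXXX",           # placeholder Deep Learning AMI
    "InstanceType": "p5.48xlarge",       # example GPU instance type
    "MinCount": 2,
    "MaxCount": 2,
    "Placement": {"GroupName": placement_group["GroupName"]},
    "NetworkInterfaces": [{
        "DeviceIndex": 0,
        "InterfaceType": "efa",          # attach an Elastic Fabric Adapter
        "SubnetId": "subnet-XXXXXXXX",   # placeholder subnet
        "Groups": ["sg-XXXXXXXX"],       # placeholder security group
    }],
}

# These would be submitted as, e.g.:
#   ec2 = boto3.client("ec2")
#   ec2.create_placement_group(**placement_group)
#   ec2.run_instances(**run_instances_params)
print(run_instances_params["NetworkInterfaces"][0]["InterfaceType"])  # efa
```

The cluster placement strategy keeps instances physically close for low-latency networking, while the `efa` interface type attaches the high-bandwidth fabric described above.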
Nitro System with Elastic Fabric Adapter (EFA)
The Nitro System is a collection of hardware and software components built by AWS that enables high performance, availability, and security. Integrated with Elastic Fabric Adapter (EFA) networking, it provides an unprecedented 1,600 Gbps of low-latency, high-bandwidth throughput per superchip, enabling lightning-fast data transfer and processing. EFA further optimizes performance by allowing kernel bypass, which reduces overhead and latency for distributed workloads. This synergy enhances GPU-to-GPU communication, ensuring faster and more efficient data transfers across nodes.
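Why kernel bypass matters can be seen with a toy cost model: each message pays a fixed software latency plus a bandwidth term, and the OS-bypass path shrinks the fixed cost. The latencies below are illustrative assumptions, not measured EFA numbers:

```python
# Toy latency model (illustrative numbers, not measured values) showing why
# kernel bypass helps: each message pays a fixed per-message latency plus a
# bandwidth term, and an OS-bypass path shrinks the fixed cost.

def transfer_time_us(n_messages, msg_bytes, latency_us, gbps):
    """Total time in microseconds for n_messages of msg_bytes each."""
    bytes_per_us = gbps * 1e9 / 8 / 1e6
    return n_messages * (latency_us + msg_bytes / bytes_per_us)

# Hypothetical per-message latencies: kernel TCP path vs. OS bypass.
kernel_path = transfer_time_us(10_000, 64 * 1024, latency_us=30.0, gbps=100)
bypass_path = transfer_time_us(10_000, 64 * 1024, latency_us=5.0, gbps=100)

print(f"kernel path: {kernel_path/1e6:.2f} s, bypass: {bypass_path/1e6:.2f} s")
```

With many small messages, the fixed per-message overhead dominates, which is exactly the regime of distributed GPU-to-GPU communication.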
Eco-Friendly Efficiency with Liquid Cooling
Liquid cooling has been used in computing for many years, but before Project Ceiba, AWS favored air cooling for its cost-effectiveness. To tackle the power-density challenges of Project Ceiba and deliver its exceptional computing power, AWS has implemented liquid cooling at data-center scale, a more efficient and sustainable approach to high-performance computing.
Built with Security at the Core
In an era where data security is of utmost importance, Project Ceiba incorporates cutting-edge security features meticulously designed to safeguard sensitive artificial intelligence data. The NVIDIA Blackwell GPU architecture includes capabilities that facilitate secure communication among GPUs, in conjunction with AWS’s Nitro System and Elastic Fabric Adapter (EFA) technologies. Notable security features include:
- End-to-End Encryption: Data integrity is ensured through secure, encrypted transfers for generative AI workloads, maintaining the confidentiality of the data.
- Cryptographic Validation: Using the Nitro System, customers can validate their applications with the AWS Key Management Service (KMS), ensuring that only verified applications can access and process sensitive AI data.
With these advanced security measures in place, Project Ceiba ensures that organizations can use their data without compromising privacy or security.
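The core idea behind cryptographic validation can be sketched with a greatly simplified measurement check: compute a digest of an application artifact and compare it against an expected "measurement" before granting access. Real Nitro attestation with AWS KMS involves signed attestation documents and KMS key policies, not this toy comparison; the artifact contents here are placeholders:

```python
# Greatly simplified illustration of measurement-based validation: release
# access only if an artifact's digest matches a recorded measurement.
# Real Nitro Enclaves attestation uses signed attestation documents checked
# against KMS key policies; this toy check only shows the core idea.

import hashlib

def measure(artifact: bytes) -> str:
    # Nitro Enclaves platform measurements (PCRs) use SHA-384.
    return hashlib.sha384(artifact).hexdigest()

# Recorded ahead of time for the trusted build (placeholder contents).
expected_measurement = measure(b"trusted-application-image")

def is_authorized(artifact: bytes) -> bool:
    """Grant access only if the artifact matches the expected measurement."""
    return measure(artifact) == expected_measurement

print(is_authorized(b"trusted-application-image"))   # True
print(is_authorized(b"tampered-application-image"))  # False
```

Any change to the artifact changes its digest, so a tampered application fails the check and never receives access to the protected data.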