Milind has over two decades of experience in the telecom domain and has worked with many tier 1 and tier 2 telecom operators, helping them define their solution stacks and delivering projects that require a mix of telecom and technology expertise. At Aarna, he is currently a principal consultant focused on the technical side of go-to-market.
Managing external storage for GPU-accelerated AI workloads can be complex—especially when ensuring that storage volumes are provisioned correctly, isolated per tenant, and automatically mounted to the right compute nodes. With aarna.ml GPU Cloud Management Software (GPU CMS), this entire process is streamlined through seamless integration with VAST external storage systems.
End-to-End Automation with No Manual Steps
With aarna.ml GPU CMS, end users don’t need to manually log into multiple systems, configure storage mounts, or worry about compatibility between compute and storage. The VAST integration is fully automated—allowing users to simply specify:
The desired storage size.
The bare metal node where the storage should be mounted.
Everything else—from tenant-aware provisioning to storage policy enforcement and automatic mount point creation—is handled seamlessly by aarna.ml GPU CMS in the background.
Simple and Efficient Flow
The process starts with the NCP admin (cloud provider admin) importing the compute node into the system and setting up a new tenant. Once the tenant is onboarded, the tenant user can allocate a GPU bare-metal instance and request external storage from VAST.
The tenant simply provides:
The desired storage size.
The specific compute node where the storage should be mounted.
Once these inputs are provided, aarna.ml GPU CMS handles all interactions with VAST, including:
Configuring storage volumes.
Assigning tenant-specific quotas.
Creating the mount point.
Ensuring the mount point is immediately available on the compute node.
This zero-touch integration eliminates any need for the tenant to interact with the VAST portal directly.
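The zero-touch flow can be sketched as a simple sequence of steps. All names and data structures below are hypothetical stand-ins for illustration, not the actual GPU CMS or VAST APIs:

```python
# Illustrative sketch of the zero-touch provisioning flow described above.
# Every class and method name here is hypothetical, not a real GPU CMS/VAST API.
from dataclasses import dataclass

@dataclass
class StorageRequest:
    tenant: str
    size_gb: int   # the desired storage size
    node: str      # the bare-metal node where the storage should be mounted

def provision(req: StorageRequest) -> dict:
    """Simulate the steps GPU CMS performs against VAST on the tenant's behalf."""
    volume = f"{req.tenant}-vol"                          # 1. configure the storage volume
    quota = {"volume": volume, "limit_gb": req.size_gb}   # 2. assign a tenant-specific quota
    mount_point = f"/mnt/vast/{req.tenant}"               # 3. create the mount point
    return {"volume": volume, "quota": quota,
            "mount": {"node": req.node, "path": mount_point}}  # 4. made available on the node

result = provision(StorageRequest(tenant="acme", size_gb=500, node="gpu-node-01"))
```

The tenant supplies only the two inputs on the request; everything else is derived by the management layer.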
Real-Time Validation Across Systems
To ensure transparency and operational assurance, the NCP admin or tenant admin can view all configured storage volumes directly within aarna.ml GPU CMS. For additional verification, they can also cross-check the automatically created tenants, networks, policies, and mount points directly in the VAST admin portal.
This two-way visibility ensures that:
The tenant’s allocated storage matches the requested size.
The network isolation policies (north-south overlays) are correctly applied.
All configurations are performed via APIs with no manual intervention.
Full Tenant Experience
Once the storage is provisioned, the tenant user can log directly into their allocated GPU compute node and immediately access the mounted VAST storage volume. Whether for large-scale AI training data or model checkpoints, this automated mount ensures data is available where and when the user needs it.
To further validate, the tenant can create and save files to the external storage—confirming that the VAST integration is complete and the storage is fully accessible from their compute instance.
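That final validation step can be sketched as a small probe run on the compute node. The function below is a hypothetical example, demonstrated here against a temporary directory standing in for the real mount path:

```python
# Hypothetical post-provisioning check: confirm the mounted volume is writable
# by creating a probe file and reading it back. Names are illustrative only.
import os
import tempfile

def validate_mount(mount_path: str) -> bool:
    """Write a probe file under the mount path and read it back."""
    probe = os.path.join(mount_path, ".probe")
    data = b"vast-mount-check"
    with open(probe, "wb") as f:
        f.write(data)
    with open(probe, "rb") as f:
        ok = f.read() == data
    os.remove(probe)  # clean up the probe file
    return ok

# A temporary directory stands in for the real mount point in this sketch.
ok = validate_mount(tempfile.mkdtemp())
```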
Key Benefits
End-to-End Automation: No manual steps—just specify size and compute node, and aarna.ml GPU CMS handles everything else.
Single Pane of Glass: Both compute and storage provisioning are managed from a single interface.
Full Tenant Isolation: Each tenant’s storage is isolated with tenant-specific quotas and network policies.
Real-Time Observability: Both admins and tenants can view and validate storage allocations directly within the aarna.ml GPU CMS portal.
API-Driven Consistency: All configurations—from mount points to network overlays—are performed through automated APIs, ensuring accuracy and compliance with tenant policies.
Managing billing in a multi-tenant AI cloud environment can be complex — especially when handling diverse customers, varying resource usage patterns, and multiple service plans. With aarna.ml GPU Cloud Management Software (GPU CMS), this process is simplified through seamless integration with Monetize360, offering a single pane of glass experience for both cloud providers and tenants.
Integrated Billing
With aarna.ml GPU CMS, AI cloud providers and their customers do not need to switch between multiple portals to manage infrastructure and view billing information. Instead, all billing-related functions from Monetize360 are directly accessible within the aarna.ml GPU CMS interface, ensuring a smooth, uninterrupted user experience.
From the initial catalog pricing definition to tenant-level resource consumption tracking, invoice generation, and invoice visibility for tenant users, the entire billing lifecycle is integrated into the same UI that manages the cloud infrastructure. This eliminates the confusion caused by fragmented workflows and makes billing fully transparent.
Multi-Level User Experience
The billing integration supports different user personas, ensuring each user type gets the right level of visibility and control.
NCP Admin (Cloud Provider Admin) defines the pricing catalog, creates tenants, and configures billing preferences.
Tenant Admin manages tenant-specific resources and can view invoices for specific billing periods. They can download invoices directly from the aarna.ml GPU CMS portal, ensuring full visibility into usage and costs without needing to access Monetize360 separately.
Tenant Users view their allocated resources and usage metrics.
Automated Usage Tracking and Invoice Generation
The process starts when the NCP Admin sets up the service catalog, defining available AI compute instances and their hourly rates. When tenants allocate resources, all usage metrics are automatically collected and passed to Monetize360 through the integrated pipeline.
At any time, the NCP Admin can trigger invoice generation for the desired billing period. The system queries all resource usage data, generates the invoices in Monetize360, and makes them visible within aarna.ml GPU CMS for tenant users. The downloadable invoices follow a standard format with full breakdowns of allocated resources, rates, and total charges.
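As a simplified illustration, invoice generation along these lines reduces to hours used times the catalog's hourly rate per instance type. The instance names, rates, and record schema below are made up for illustration and are not Monetize360's actual data model:

```python
# Toy invoice calculation: charge = hours used x hourly catalog rate.
# Instance types and rates are illustrative examples, not real catalog entries.

CATALOG = {"gpu.a100.8x": 32.00, "gpu.a100.1x": 4.50}  # USD per hour (example rates)

def generate_invoice(usage: list[dict]) -> dict:
    """Build an invoice with a per-line breakdown and a grand total."""
    lines = []
    for record in usage:
        rate = CATALOG[record["instance"]]
        lines.append({"instance": record["instance"],
                      "hours": record["hours"],
                      "rate": rate,
                      "charge": round(record["hours"] * rate, 2)})
    return {"lines": lines, "total": round(sum(l["charge"] for l in lines), 2)}

invoice = generate_invoice([
    {"instance": "gpu.a100.8x", "hours": 120},
    {"instance": "gpu.a100.1x", "hours": 300},
])
```

The downloadable invoice described above carries exactly this kind of breakdown: allocated resources, rates, and total charges.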
Real-Time Transparency for Tenants
Tenant users have direct access to their billing information without needing to rely on the NCP Admin or manually request invoices. Through the same portal where they manage their AI workloads, they can:
View current and historical invoices.
Check detailed usage and charges.
Download invoices for offline review or accounting purposes.
This transparent, self-service billing experience not only simplifies financial operations but also enhances trust between cloud providers and their customers.
India is on the cusp of a transformative AI revolution, driven by the ambitious IndiaAI initiative. This nationwide program aims to democratize access to cutting-edge AI services by building a scalable, high-performance AI Cloud to support academia, startups, government agencies, and research bodies. This AI Cloud will need to deliver on-demand AI compute, multi-tier networking, scalable storage, and end-to-end AI platform capabilities to a diverse user base with varying needs and technical sophistication.
At the heart of this transformation lies the management layer – the orchestration engine that ensures smooth provisioning, operational excellence, SLA enforcement, and seamless platform access. This is where aarna.ml GPU Cloud Management Software (GPU CMS) plays a crucial role. By enabling dynamic GPUaaS (GPU-as-a-Service), aarna.ml GPU CMS allows providers to manage multi-tenant GPU clouds with full automation, operational efficiency, and built-in compliance with IndiaAI requirements.
Key IndiaAI Requirements and aarna.ml GPU CMS Coverage
The IndiaAI tender defines a comprehensive set of requirements for delivering AI services on cloud. While the physical infrastructure—hardware, storage, and basic network layers—will come from hardware partners, aarna.ml GPU CMS focuses on the management, automation, and operational control layers. These are the areas where our platform directly aligns with IndiaAI’s expectations.
The diagram below illustrates the IndiaAI requirements and the components supported by aarna.ml GPU CMS.
Service Provisioning
aarna.ml GPU CMS automates the provisioning of GPU resources across bare-metal servers, virtual machines, and Kubernetes clusters. It supports self-service onboarding for tenants, allowing them to request and deploy compute instances through an intuitive portal or via APIs. This dynamic provisioning capability ensures optimal utilization of resources, avoiding underused static allocations.
Operational Management
The platform delivers end-to-end operational management, starting from infrastructure discovery and topology validation to real-time performance monitoring and automated issue resolution. Every step of the lifecycle—from tenant onboarding to resource allocation to decommissioning—is automated, ensuring that GPU resources are always used efficiently.
SLA Management
SLA enforcement is a critical part of the IndiaAI framework. aarna.ml GPU CMS continuously tracks service uptime, performance metrics, and event logs to ensure compliance with pre-defined SLAs. If an issue arises—such as a failed node, misconfiguration, or performance degradation—the self-healing mechanisms automatically trigger corrective actions, ensuring high availability with minimal manual intervention.
AI Platform Integration
IndiaAI expects the AI Cloud to offer end-to-end AI platforms with tools for model training, job submission, and model serving. aarna.ml GPU CMS integrates seamlessly with MLOps and LLMOps tools, enabling users to run AI workloads directly on provisioned infrastructure with full support for NVIDIA GPU Operator, CUDA environments, and NVIDIA AI Enterprise (NVAIE) software stack. Support for Kubernetes clusters, job schedulers like SLURM and Run:AI, and integration with tools like Jupyter and PyTorch make it easy to transition from development to production.
Tenant Isolation and Multi-Tenancy
A core requirement of IndiaAI is ensuring strict tenant isolation across compute, network, and storage layers. aarna.ml GPU CMS fully supports multi-tenancy, providing each tenant with isolated infrastructure resources, ensuring data privacy, performance consistency, and security. Network isolation (including InfiniBand partitioning), per-tenant storage mounts, and independent GPU allocation guarantee that each tenant’s environment operates independently.
Admin Portal
The Admin Portal consolidates all these capabilities into a single pane of glass, ensuring that infrastructure operators have centralized control while providing tenants with transparent self-service capabilities.
Compliant. Additionally, aarna.ml GPU CMS supports native PaaS for scheduling a diverse set of jobs and models.
6.7 Admin Portal
• User registration/account creation
• Service catalogue and prices
• Capacity dashboard
• Utilization monitoring
• Incident management
• Service Health Dashboard
• Ability to customize dashboard for the subsidy workflow
Compliant. aarna.ml GPU CMS supports various user personas, granular RBAC, APIs, and workflows for all the Admin portal requirements.
6.8 Service Provisioning
• Online, on-demand instances that can be scaled up/down
• Management portal
• Public internet access with VPN
• Support for BMaaS and VMs
• MTTR SLAs and recovery
• User notifications
• Data destruction (so it cannot be forensically recovered)
Compliant. aarna.ml GPU CMS provides end users a GUI and API-based interface to dynamically create, manage, monitor, and terminate their instances.
6.9 Operational Management
• Patch management
• OS images with latest security patches
• Root cause analysis and timely repairs
• System usage
Compliant.
6.12 SLA Management
• SLA measurement and MTTR improvement to meet incident management SLA (99.95% or higher)
• Service availability measurement
Compliant. aarna.ml GPU CMS provides observability, monitoring, and auto-resolution features to improve availability and meet SLAs.
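For context, a 99.95% availability SLA translates into a small monthly downtime budget; a quick back-of-the-envelope calculation:

```python
# Downtime budget implied by an availability SLA over a billing period.
def downtime_budget_minutes(availability: float, period_days: int = 30) -> float:
    """Minutes of downtime allowed at the given availability over the period."""
    total_minutes = period_days * 24 * 60
    return round((1 - availability) * total_minutes, 1)

# 99.95% over a 30-day month allows roughly 21.6 minutes of downtime,
# which is why automated detection and auto-resolution matter.
budget = downtime_budget_minutes(0.9995)
```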
Conclusion
The IndiaAI initiative requires a sophisticated orchestration platform to manage the complexities of multi-tenant GPU cloud environments. aarna.ml GPU CMS delivers exactly that—a robust, future-proof solution that combines dynamic provisioning, automated operations, self-healing infrastructure, and comprehensive SLA enforcement.
By seamlessly integrating with underlying hardware, networks, and AI platforms, aarna.ml GPU CMS empowers GPUaaS providers to meet the ambitious goals of IndiaAI, ensuring that AI compute resources are efficiently delivered to the researchers, startups, and government bodies driving India’s AI innovation.
Milind Jalwadi
As AI workloads become more demanding, NVIDIA Cloud Providers (NCPs) and AI Cloud Providers face significant challenges in scalability, resource efficiency, and multi-tenancy management. Traditional static GPU allocations lead to underutilized resources, increased operational overhead, and a lack of flexibility in handling diverse workloads like LLM training, batch inference, and real-time AI inferencing.
To address these challenges, aarna.ml GPU Cloud Management Software (CMS) introduces a comprehensive, automated, and scalable reference architecture (RA) that enables multi-tenant GPUaaS (GPU-as-a-Service) along with additional capabilities such as PaaS, Job Submission and Model Serving. This RA blueprint provides GPU cloud providers with seamless integration with their GPU environment leading to efficient GPU orchestration, workload isolation, and intelligent scheduling. This ensures maximum efficiency, flexibility, and cost optimization in AI infrastructure.
The Key Challenges in Multi-Tenant AI Cloud Infrastructure
Unified Multi-Tenancy Management
Multi-tenancy in AI infrastructure is often fragmented across compute, networking, storage, and PaaS layers, making tenant onboarding complex and inefficient. Without a unified framework, providers struggle with:
Inconsistent isolation policies between different infrastructure stacks.
Manual intervention required for onboarding and resource allocation.
Lack of automation, leading to operational inefficiencies.
aarna.ml GPU CMS provides a cohesive multi-tenancy model that ensures:
Multi-tenancy across compute infrastructure, networking fabric (Ethernet and InfiniBand), external storage, and PaaS.
Seamless tenant onboarding with automation.
Granular workload isolation at compute, storage, and networking layers.
Role-based access control (RBAC) for security and governance.
Supporting Diverse AI Workloads on a Single Platform
Modern AI cloud infrastructure needs to support a wide range of workloads, including:
Static Bare-Metal GPU allocations for high-performance computing.
Kubernetes-based PaaS solutions for scalable AI workloads.
Job Submission & Model Serving for AI inference and automation.
The aarna.ml CMS RA enables:
Flexible GPUaaS offerings, including IaaS and PaaS models.
Kubernetes-native orchestration to dynamically allocate workloads.
Integration with AI model deployment frameworks (NVIDIA NIM, Hugging Face, vLLM).
Maximizing GPU Utilization with Intelligent Orchestration
AI workloads such as Large Language Model (LLM) training, Small Language Model (SLM) fine-tuning, DL training, batch inference, real-time inferencing, and Retrieval-Augmented Generation (RAG) have highly varied GPU utilization patterns. Without dynamic orchestration, GPU resources remain underutilized, leading to higher costs and lower efficiency.
aarna.ml GPU CMS provides native job scheduling capabilities along with options to integrate with third-party job schedulers such as Run:ai and SLURM to maximize GPU utilization. In addition to supporting various job schedulers, aarna.ml GPU CMS utilizes MIG (Multi-Instance GPU) to further enhance GPU utilization. NCPs and AI cloud providers can also register their spare GPU capacity with NVIDIA Cloud Functions (NVCF) through aarna.ml GPU CMS to put GPUs to work during periods of low inference traffic.
The aarna.ml CMS RA enables:
Dynamic scaling of GPU resources based on workload demand.
Optimizing infrastructure efficiency with intelligent workload scheduling.
Enabling per-tenant GPU isolation using MIG or full GPU partitioning.
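To make the MIG-versus-full-GPU trade-off concrete, here is a toy allocator that hands small requests a MIG slice of a shared GPU and large requests whole GPUs. The slice and GPU memory sizes are illustrative assumptions, not how aarna.ml GPU CMS actually allocates:

```python
# Toy per-tenant GPU allocator: small requests share a GPU via MIG slices,
# large requests get whole GPUs. Sizes below are illustrative assumptions.

def allocate(request_gb: int, mig_slice_gb: int = 10, full_gpu_gb: int = 80) -> dict:
    """Return a hypothetical allocation plan for one tenant request."""
    if request_gb <= mig_slice_gb:
        return {"mode": "MIG", "slices": 1}
    if request_gb < full_gpu_gb:
        # round up to the number of MIG slices needed
        slices = -(-request_gb // mig_slice_gb)
        return {"mode": "MIG", "slices": slices}
    # round up to the number of whole GPUs needed
    gpus = -(-request_gb // full_gpu_gb)
    return {"mode": "full", "gpus": gpus}

plan = allocate(35)  # a mid-sized request fits in several MIG slices of one GPU
```

The point of the sketch is the isolation boundary: tenants below the full-GPU threshold still get dedicated slices rather than contending for a shared device.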
aarna.ml GPU CMS Reference Architecture (RA)
The aarna.ml GPU CMS RA is designed as a modular, scalable, and API-driven framework that enables providers to build and manage a multi-tenant AI cloud platform. In just a few weeks, the aarna.ml GPU CMS can be integrated with the GPUaaS provider’s GPU environment. The RA consists of several core components that work together to ensure high efficiency, seamless orchestration, and security in AI cloud operations. aarna.ml GPU CMS adheres to NVIDIA Reference Architecture for NCP, NCP Telco, Spectrum-X Compute Network Fabric, Quantum-2 InfiniBand Platform, High Performance Storage Design, and Common Networking.
Tenant Management & Isolation
One of the most critical aspects of a multi-tenant AI cloud is ensuring secure tenant isolation while enabling seamless access to shared GPU resources. The aarna.ml GPU CMS RA introduces a hierarchical multi-tenancy model that enables cloud providers to:
Create isolated virtual environments per tenant using NVIDIA MIG (Multi-Instance GPU) or full GPU allocation, ensuring that different customers or workloads do not interfere with each other.
Implement automated tenant provisioning with built-in role-based access control (RBAC) and dynamic policy enforcement for secure AI workload execution.
Enable per-tenant Ethernet and InfiniBand segmentation for ultra-low-latency communication in AI/ML workloads, ensuring high performance while maintaining isolation.
Create secure, isolated per-tenant storage volumes in external high-performance storage arrays, along with network isolation.
With these capabilities, AI cloud providers can offer secure, scalable, and fully automated multi-tenant GPU environments that meet enterprise requirements.
GPU Orchestration & Scheduling
To maximize GPU utilization and efficiency, aarna.ml GPU CMS integrates dynamic GPU scheduling and workload orchestration mechanisms. Instead of relying on static GPU allocations, this RA allows for:
Native job scheduling with Kubernetes support, where workloads are dynamically assigned GPU resources based on real-time demand.
Seamless integration with SLURM, Run:ai, and NVIDIA NVCF, enabling intelligent scheduling and GPU auto-scaling for training, inference, and fine-tuning workloads.
Dynamic GPU provisioning, allowing workloads to request GPU resources on demand while ensuring optimal allocation efficiency.
These capabilities eliminate GPU underutilization by ensuring that resources are only allocated when needed and freed when workloads complete, reducing costs and improving efficiency.
Infrastructure as a Service (IaaS) & Platform as a Service (PaaS)
To support a diverse range of AI workloads, the aarna.ml GPU CMS RA provides:
Bare-metal and VM-based infrastructure for traditional GPU compute workloads.
Kubernetes-based GPUaaS, allowing enterprises to deploy and scale AI models dynamically.
Self-service AI job submission, enabling data scientists and AI developers to access GPU resources through APIs and dashboards.
Seamless AI model deployment and serving, integrating with NVIDIA NIM, Hugging Face, and vLLM for real-time inference workloads.
By offering both IaaS and PaaS models, AI cloud providers can cater to enterprise users, researchers, and AI startups alike.
Monitoring, Security, and Billing
With multi-tenant AI clouds, monitoring, security, and billing become critical aspects of operations. The aarna.ml GPU CMS RA includes:
Real-time GPU utilization monitoring and analytics, providing insights into workload performance, GPU allocation efficiency, and cost tracking.
Built-in security policies and RBAC, ensuring that AI workloads remain isolated and protected from unauthorized access.
Integration with a third-party billing product for GPU and token usage.
With these capabilities, AI cloud providers can maintain full control, visibility, fault management, and security over their GPUaaS offerings.
By adopting aarna.ml GPU CMS, AI cloud providers can:
Maximize infrastructure ROI through dynamic GPU orchestration.
Scale AI workloads seamlessly across IaaS and PaaS models.
Enable per-tenant workload isolation for secure multi-tenancy.
Enhance operational efficiency with automation-driven management.
HPC (High-Performance Computing) and AI / ML (Artificial Intelligence / Machine Learning) workloads are integral to addressing some of the most challenging problems in science, engineering, and business. HPC workloads enable precise simulations of natural and physical phenomena, while AI/ML workloads empower systems to learn, adapt, and make predictions from data.
Efficient execution of these tasks relies on robust job scheduling, optimized resource utilization, and effective coordination between CPUs and GPUs. While HPC workloads benefit from traditional, deterministic scheduling, the diversity of AI/ML workloads calls for more dynamic and workload-specific approaches. This blog explores the intricacies of HPC and AI/ML workloads, the critical role of job scheduling, and a detailed mapping of the best suitable schedulers for specific workload types.
2. Understanding the nature of various HPC and AI/ML Workloads
HPC workloads revolve around solving deterministic problems through simulations or computational models. These tasks often run for extended durations and require vast computational power distributed across CPUs and GPUs. HPC jobs typically need uninterrupted execution, and their use cases do not mandate real-time responses.
AI/ML workloads are fundamentally data-driven and probabilistic, encompassing a diverse set of tasks. These workloads range from training and fine-tuning machine learning models to deploying them for inference and decision-making.
Let’s delve into various categories of AI/ML workloads:
Large Language Models (LLMs), such as GPT or BERT, are a foundational AI workload. Training LLMs involves optimizing billions of parameters over massive datasets to enable models to understand and generate human-like responses. This workload requires extensive GPU resources for long durations, often spanning several months.
Small Language Model (SLM) training focuses on smaller, domain-specific models that are optimized for particular applications. While GPUs still dominate the training process, SLM training needs GPU infrastructure for shorter durations, which means frequent setup and teardown of infrastructure.
Fine-tuning is a process where pre-trained models are adapted to specific datasets or domains. This allows businesses to customize AI capabilities for their unique needs. Fine-tuning relies on GPUs for computationally intensive retraining tasks but needs fewer GPU cycles than SLM training. For this reason, these jobs can be even more dynamic than SLM jobs.
Batch inference processes large datasets using pre-trained models in a non-real-time manner. Based on the size of data to be processed, the GPU usage time for such workloads could be predicted. These tend to be of even shorter duration than Fine-tuning.
Retrieval-Augmented Generation (RAG) is a technique that combines a generative model (e.g., GPT, BERT) with an external knowledge retrieval system to generate responses. RAG is characterized by bursty usage of GPUs mainly for inferencing and minimal training. Unlike the other workloads above, RAG is a transactional workload that is “always on” with dynamic scaling to adapt to the load.
Real-time inference is another vital AI/ML workload, used in latency-sensitive applications. This workload is also transactional and relies on GPUs for performing the necessary computations with minimal delay. Of course, not all RAG and real-time inference jobs need GPUs, some are perfectly fine with CPUs.
Among all the above AI/ML workload categories, RAG and real-time inferencing need quick response times and the traffic for these is bursty with peaks and valleys.
To summarize, the table below compares all the AI/ML workload categories.
3. Why Is Job Scheduling Critical?
Efficient job scheduling is vital for handling the diverse demands of HPC and AI/ML workloads. It ensures that compute resources, whether CPUs or GPUs, are utilized optimally, preventing bottlenecks and idle time. Scheduling also allows for dynamic adaptation to fluctuating workloads, ensuring that resources are allocated in real-time based on priority and demand.
After having looked at various HPC and AI workloads characteristics, let’s take a look at some of the prominent job scheduling options.
SLURM (Simple Linux Utility for Resource Management) is a robust open-source job scheduler designed for High-Performance Computing (HPC) environments. It efficiently manages and allocates resources, such as CPUs, GPUs, and memory, across large clusters of nodes. SLURM supports features like job prioritization, pre-emption, dependency handling, and partitioning, making it ideal for running computationally intensive, batch-oriented workloads.
Ray is a distributed open-source computing framework for executing parallel and distributed processing of AI/ML workloads. Ray's job scheduling capabilities enable distributed model training, fault tolerance, task pre-emption and scaling, which results in efficient resource utilization. Ray enables developers to harness multi-node GPU processing capability by defining tasks that can be executed in parallel across the distributed GPU clusters spanning on-premises, cloud, or hybrid environments.
Kubernetes (With NVIDIA GPU operator) enables scheduling of GPU-accelerated workloads, managing pod-level resource allocation, and ensuring seamless scaling for AI/ML applications. It supports a wide range of use cases, including containerized training pipelines, inference deployments, and batch processing, particularly in multi-tenant or cloud-native environments. By combining Kubernetes’ powerful orchestration features with NVIDIA's GPU optimizations, this solution simplifies the execution of AI/ML workloads in scalable, containerized environments while providing flexibility for both batch and real-time processing tasks. Kubernetes can also be used for transactional workloads with soft isolation through different namespaces. We call it soft since it is not 100% secure, but good enough for an organization for sharing across departments.
Run:ai is a proprietary Kubernetes-native platform focused on optimizing GPU resource utilization and scheduling by extending the native Kubernetes scheduling capabilities specifically for AI/ML workloads. It enables dynamic GPU allocation, pooling, and partitioning, allowing multiple jobs to share GPUs efficiently. Run:ai's job scheduling capabilities include pre-emption, priority-based scheduling, and resource quotas, ensuring fair allocation in multi-tenant environments.
The table below gives a brief comparison of the features offered by these job schedulers.
4. Finding a best-fit job scheduler
The best-fit job scheduler depends on the short-term and long-term nature of the AI workloads planned for the available GPU infrastructure.
For long-duration LLM training runs that span months, bare-metal nodes serve the purpose without any need for a job scheduler. The only requirement is a sophisticated infrastructure manager that automates bare-metal provisioning so that nodes can be quickly reprovisioned for any new LLM training requirement.
For the other categories of AI/ML workloads, it is prudent to use a job scheduler.
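Based on the workload characteristics discussed in section 2, a first-cut mapping might look like the sketch below. The pairings are illustrative distillations of the discussion above, not definitive recommendations:

```python
# Hypothetical workload-to-scheduler mapping distilled from the discussion;
# the pairings are illustrative, not a definitive recommendation.

SCHEDULER_FOR = {
    "llm_training": "bare metal (no scheduler)",  # months-long, dedicated nodes
    "slm_training": "SLURM",                      # batch jobs, frequent setup/teardown
    "fine_tuning": "Run:ai",                      # dynamic, shorter GPU cycles
    "batch_inference": "SLURM",                   # predictable, non-real-time batches
    "rag": "Kubernetes",                          # always-on, bursty, autoscaled
    "realtime_inference": "Kubernetes",           # latency-sensitive serving
}

def pick_scheduler(workload: str) -> str:
    """Default to Kubernetes for workload types not in the mapping."""
    return SCHEDULER_FOR.get(workload, "Kubernetes")
```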
The table below depicts how the various job schedulers fare in supporting the different categories of AI/ML workloads.
As evident from the chart above, no single job scheduler perfectly addresses the diverse requirements of all use cases. To bridge this gap, at aarna.ml, we have developed a comprehensive GPU Cloud Management Software (CMS) that enhances existing job, instance, and model scheduling solutions by introducing robust multi-tenancy, hard isolation, and seamless support for a wide range of workloads, including bare metal, VMs, containers, and Kubernetes pods.
5. Conclusion
Efficient job scheduling is essential for maximizing the performance of HPC and AI workloads. By understanding the unique strengths and limitations of these schedulers—and aligning them with specific workload demands—organizations can achieve optimal resource utilization, reduce costs, and unlock the full potential of their computational infrastructure.
Milind Jalwadi
Deploying large-scale AI infrastructures comes with significant complexity for NVIDIA Cloud Providers (NCPs) who need to validate intricate, multi-tier network architectures built with NVIDIA’s state-of-the-art GPU and networking technologies.
In large-scale deployments, NCPs manage thousands of GPU nodes connected through multi-layered networking that includes leaf, spine, and super-spine switches. NVIDIA also recommends a Reference Architecture (RA) for NCPs to ensure that these configurations achieve optimal throughput and low latency. However, implementing this design and validating the configurations is a challenge because of hardware limitations in testing environments, putting service reliability and deployment timelines at risk.
The need for robust validation is crucial. The complexity of these topologies, coupled with multi-tenancy requirements, makes a reliable and scalable validation solution a necessity. The alternatives are performance degradation or network downtime, both of which are catastrophic in terms of revenue loss or SLA violation penalties.
To tackle these challenges, aarna.ml presents an innovative Digital Twin solution, which works seamlessly with NVIDIA Air to simplify network validation and streamline operations. This blog will provide an in-depth look at the importance of a Digital Twin, the common challenges faced by NCPs, and how the aarna solution can transform network deployment and management.
Typical Large Scale AI Cloud deployment
The diagram below depicts a high-level topology of a large-scale deployment.
Typical deployments comprise multiple nodes grouped into scalable units (SUs). GPUs within these SUs are connected through multi-tier switching, comprising leaf, spine, and core switches, so that GPU-to-GPU communication across the entire data center is optimized with minimal hops, ensuring high performance and low latency. Please note that the topology above is for reference only and does not indicate any recommended deployment configuration.
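To make the "minimal hops" point concrete, here is a small, purely illustrative model of hop counts in a three-tier leaf/spine/core fabric. The grouping parameters and sequential GPU numbering are assumptions for this sketch, not part of any NVIDIA reference architecture:

```python
def hops_between(gpu_a: int, gpu_b: int,
                 gpus_per_leaf: int, leaves_per_spine_group: int) -> int:
    """Count switch traversals between two GPUs in a simple 3-tier fabric.

    Hypothetical model: GPUs are numbered sequentially, every
    `gpus_per_leaf` GPUs share a leaf switch, and every
    `leaves_per_spine_group` leaves share a spine group; traffic between
    spine groups must cross the core tier.
    """
    leaf_a, leaf_b = gpu_a // gpus_per_leaf, gpu_b // gpus_per_leaf
    if leaf_a == leaf_b:
        return 1  # GPU -> leaf -> GPU
    if leaf_a // leaves_per_spine_group == leaf_b // leaves_per_spine_group:
        return 3  # leaf -> spine -> leaf
    return 5      # leaf -> spine -> core -> spine -> leaf
```

The practical consequence is that SU sizing and GPU placement directly bound worst-case hop counts, which is exactly what an RA-compliant topology aims to minimize.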
Challenges in Current AI Cloud Deployments and Operations
As we have seen above, the sheer scale of deployment, the lack of adequate hardware resources for testing, and the absence of automation tools make it difficult for NCPs to ensure that the actual deployment matches the intended design. The challenges NCPs need to address fall into day 0/day 1 (design and validation) and day 2 (operations) activities, detailed below.
Day 0 and Day 1 Design and Validation:
Manual Setup: Traditional methods involve time-consuming manual configuration of underlay networks and manual testing of network cabling and server deployments. The entire lifecycle to bring a deployment live can span several months.
RA Compliance: Ensuring that the deployment aligns with NVIDIA’s RA specifications is challenging without a standardized validation tool and test and configuration scripts, increasing the potential for errors.
Hardware Limitations: Testing expansive, multi-tier topologies in lab settings is constrained by limited resources, leading to incomplete validation.
Day 2 Operations and Management:
Configuration Management: Ensuring synchronized and versioned configurations across hundreds of switches is a complex task. Configuration drift can lead to inconsistent network behavior and potential service issues.
Tenant Life Cycle Management (LCM): Allocating and deallocating nodes and virtual resources for tenants requires overlay configurations on several switches and nodes. Performing and validating these manually on a production setup demands careful design and implementation, and errors can be costly.
Topology Changes: Routine maintenance, switch replacements, and topology updates necessitate quick, error-free configuration updates to maintain network stability.
Reducing MTTR: Identifying and correcting GPU-related errors is time-consuming because of manual steps and configurations, leading to long mean-time-to-repair (MTTR) durations.
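Configuration drift of the kind described above is, at its core, a set-difference problem: compare what each switch should be running against what it is actually running. A minimal sketch, assuming configs have already been fetched and normalized into sets of lines (the data shapes here are hypothetical, not aarna.ml's internal representation):

```python
def find_drift(intended: dict[str, set[str]],
               actual: dict[str, set[str]]) -> dict[str, dict[str, list[str]]]:
    """Report per-switch drift between intended and running configurations.

    `intended` and `actual` map switch names to sets of normalized config
    lines (a hypothetical representation for this sketch).
    """
    drift = {}
    for switch, wanted in intended.items():
        running = actual.get(switch, set())
        missing = wanted - running   # intended but never applied
        extra = running - wanted     # applied but never intended
        if missing or extra:
            drift[switch] = {"missing": sorted(missing),
                             "extra": sorted(extra)}
    return drift
```

At the scale of hundreds of switches, running a check like this continuously, rather than during incident triage, is what keeps drift from surfacing as a service issue.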
Introducing the Digital Twin Solution
Aarna.ml’s Digital Twin solution is a transformative tool that works with NVIDIA Air to create a comprehensive digital replica of physical network infrastructure. This enables NCPs to simulate their network topologies, test various deployment scenarios, and validate configurations before moving to live production, greatly reducing the risk of errors.
Key Features and Capabilities
Complete Network Simulation: The Digital Twin allows NCPs to specify and simulate their desired network topologies, including multi-tier switching, to ensure compliance with NVIDIA RA standards. This simulation environment supports detailed testing of both underlay and overlay configurations.
Automated Day 0 Configurations: The solution automates the initial setup of underlay networks, minimizing manual errors and significantly reducing the time required for validation.
Dynamic Tenant Overlays: Support for dynamic overlay configurations ensures that tenant-specific requirements are met, enabling seamless management of both east-west (GPU-to-GPU) and north-south (GPU-to-storage) traffic flows.
Simulation of Day 2 Operations: The Digital Twin simulates day 2 operations such as configuration changes, switch replacements, topology updates, and GPU errors. This ensures all such scenarios are tested and that deployment scripts and configurations are generated for NCPs to use in their production setups.
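To illustrate what a tenant-specific overlay boils down to, the sketch below derives a VXLAN VNI from a tenant ID and emits the stanza each leaf switch would need. The command strings and the sequential VNI scheme are invented for illustration; they do not reflect actual NVIDIA/Cumulus syntax or the aarna.ml implementation:

```python
def tenant_overlay(tenant_id: int, leaf_switches: list[str],
                   base_vni: int = 10000) -> dict[str, list[str]]:
    """Generate per-leaf overlay stanzas for one tenant.

    Hypothetical scheme: each tenant gets VNI = base_vni + tenant_id and a
    dedicated VRF, configured identically on every leaf hosting its nodes.
    """
    vni = base_vni + tenant_id
    vrf = f"tenant-{tenant_id}"
    stanza = [
        f"vrf {vrf}",
        f"vxlan vni {vni}",
        f"map vni {vni} vrf {vrf}",
    ]
    return {leaf: list(stanza) for leaf in leaf_switches}
```

Applying, and later tearing down, such stanzas consistently across every affected switch is exactly the tenant LCM step the Digital Twin lets NCPs rehearse before touching production.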
Benefits of aarna.ml Solution
Accelerated Deployment: Reduces the certification process from months to weeks by automating validation and configuration tasks.
Enhanced Reliability: Ensures comprehensive validation of network configurations to prevent errors before they reach production.
Efficient Day 2 Operations: Simplifies tenant-specific changes, configuration drift corrections, and ongoing topology management, and reduces MTTR by automating GPU-related fault corrections.
Improved ROI: Maximizes operational efficiency, saving time and reducing the potential for costly network issues.
Summary
The aarna.ml solution, built on NVIDIA Air, is an essential capability for NCPs looking to streamline their deployment processes, reduce risk, and maintain optimal performance in NVIDIA-based networking infrastructures. By automating testing and operational tasks, the solution empowers NCPs not only to validate complex deployments but also to carry the resulting configurations into their production setups.
How to Engage
Engage aarna.ml for a Digital Twin Professional Service. Complete your day 0 and day 1 provisioning of GPU infrastructure within 2 weeks.
Explore Your Options: Learn more about how Aarna Networks’ Digital Twin technology can transform your network validation and management strategies. Contact us at info@aarna.ml today to integrate this innovative solution into your operations and unlock the full potential of your NVIDIA-based infrastructure.