" "
Aarna.ml

Resources

resources

Blog

Amar Kapadia

The Emerging GPU-as-a-Service Provider Industry

Our introductory blog presented the concept of GPU-as-a-Service (GPUaaS). This post classifies GPUaaS providers and describes the factors driving GPU demand.

Key GPUaaS Provider Classification: 

GPUs are offered either as bare metal (one or more physical machines) or as a virtual machine (VM) to be consumed via APIs. The bare metal or VM instance may optionally have a Container-as-a-Service (CaaS) layer on top managed by Kubernetes. 

We have classified key GPUaaS players to help end users choose the right provider: 

  1. Hyperscalers like Google, AWS, Azure, and Oracle, and newer entrants like Lambda Labs, CoreWeave, and DigitalOcean, who provide bare metal, VM, or CaaS solutions, often packaged with their own PaaS or SaaS layers such as LLM models, PyTorch, and LLMOps/MLOps/RunOps tooling.
  2. Traditional telcos and data centers with large commitments to buying GPUs to build “Sovereign AI Clouds” in their host countries (discussed later in this blog). If they are based on NVIDIA and their GPU commitment is sufficiently large, these telcos and data centers are called “NVIDIA Cloud Partners” (NCPs). There are numerous NCPs consuming GPUs worth billions or tens of billions of dollars. Their primary use case is LLM training, and for this reason they tend to favor bare metal instances.
  3. Small, regional, or edge data centers with smaller commitments and a focus on use cases beyond LLM training. They tend to offer bare metal and VM instances, optionally with a CaaS layer.
  4. Startups, particularly those that started with a cryptocurrency use case or have significant capital deployed in GPU acquisition; these players may also choose to build “industry clouds”. Their offerings include bare metal and VM instances, but often go beyond, adding PaaS or SaaS layers. If there is a vertical industry orientation, these PaaS or SaaS layers tend to be industry specific, e.g., fintech, life sciences, and more.

What is driving GPU demand?

The primary use case behind the massive growth in GPUs is LLM training. Massive, internet-scale data sets across languages are driving an insatiable demand to lock up the latest GPUs to train state-of-the-art models. A 2023 Goldman Sachs report predicted that training would drive most of NVIDIA's revenues in 2024 and 2025.

[Figure: AI Compute Revenue Opportunity]

However, contrary to Goldman Sachs' view, we expect inference use cases to show up much earlier and drive the next wave of GPU growth, because there is no choice: models must be deployed in end-user applications to generate ROI on the initial LLM investment. We also expect inferencing to run at a scale roughly an order of magnitude (5x to 40x) greater than training. The inference use case is expected to fragment across NVIDIA, AMD, Qualcomm, and emerging startups like Groq and Tenstorrent (additional reading here, and a topic for another blog). Over time, we also expect model fine-tuning and Small Language Models (SLMs) to drive additional GPU growth.

Another source that will continue to drive GPU demand is “Sovereign AI”. Sovereign AI, as defined by Michael Dell in his recent blog, is a nation's capability to produce artificial intelligence using its own infrastructure and data. We expect national governments and public sectors to embrace this idea; most will be reluctant to use hyperscaler AI clouds for their AI initiatives.

In summary, we expect a step change in the way the market works – newer entrants, including telcos and data center companies, will shape the industry as they spot a unique window of opportunity to win local AI cloud business away from hyperscalers. GPU demand and the number of GPUaaS providers will grow significantly in the next few years, and new use cases will further contribute to this trend.

The next blog will cover additional GPUaaS and NCP topics – both business and technical.

About us: Aarna.ml is an NVIDIA- and venture-backed startup building software that helps GPU-as-a-Service companies build hyperscaler-grade cloud services with multi-tenancy and isolation.

Amar Kapadia

GPU-as-a-Service Blog series : An Introduction

The Gen AI industry has spurred massive growth in the GPU market. NVIDIA, the most valuable company in the world, is at the forefront of this explosion. When it comes to GPU consumption models, a number of factors shape this industry: GPU supply (shortage vs. glut), cost vs. performance amid the explosion of LLM models, complexity in the underlying technology choices, and the need for enterprises to experiment (do PoCs) rather than make long-term commitments.

The lack of certainty means enterprises and startups prefer to “rent” GPUs rather than buy dedicated hardware. This has created a new industry of “GPU-as-a-Service” or “AI Cloud” providers who rent GPUs to customers – often bare metal GPUs, but sometimes integrated with sophisticated software and services packaged for customers. This nascent GPU-as-a-Service market is forecast to grow 16x to $80B over the next decade.

To date, a small set of hyperscalers, crypto mining companies, and startups offer GPU-as-a-Service (GPUaaS). Moving forward, that number is expected to grow massively. Sequoia Capital, the world's leading venture capital firm, recently published a blog that likened the GPU capex buildout to the erstwhile railroad industry – “build the railroad and hope they will come”.

If you have decided to be in the GPUaaS business, you are looking at a great business opportunity. However, as with any attractive business, there is no free lunch. As of July 2024, there are 600 new competitors, intense margin pressure, and a complex tech stack to deal with. Bare metal GPU instances are already priced at $2.30 per hour, and with supply pressure easing, prices are likely to drop further.

Offering only bare metal-as-a-service is not a prudent ROI option for most AI Cloud providers. While it works for very large and long term workloads like training LLMs, the majority of the market does not need such large capacities locked up for extended durations. Inferencing, fine tuning and training of smaller deep learning models account for a much larger market with “bursty” and dynamic requirements. 

Given the rapid shifts in the GPU market, what should an AI Cloud provider do? If your customers are startups or enterprises building products that require training smaller models, inferencing, or fine-tuning existing off-the-shelf models, how should they go about it? These are some of the questions participants in the AI value chain are seeking answers to.

The lack of clarity in this emerging industry has given us at Aarna.ml an opportunity to provide an independent point of view to the GPUaaS providers. Over the next few weeks we will be publishing a series of posts on the GPU-as-a-service industry. We hope enterprises, startups, data center companies and AI Cloud providers can find our observations and opinions useful. 

About us: Aarna.ml is an NVIDIA- and venture-backed startup building software that helps GPU-as-a-Service companies build hyperscaler-grade cloud services with multi-tenancy and isolation.

Milind Jalwadi

Webinar Recap: Dynamically Orchestrating RAN and AI Workloads on a Common GPU Cloud

In the recent webinar, "Dynamically Orchestrating RAN and AI Workloads on a Common GPU Cloud," the presenters highlighted innovative strategies for optimizing GPU infrastructure. The focus was on leveraging dynamic orchestration to manage both RAN and AI workloads efficiently, maximizing resource utilization and ROI for Mobile Network Operators (MNOs).

Key Takeaways:

  1. RAN and AI Workloads on the Same GPU Cloud:
    • 5G RAN L1 acceleration is achieved by using GPUs for processing, and the same GPU infrastructure can also run AI workloads. By configuring the same GPU infrastructure for both types of workloads, MNOs can significantly improve utilization rates.
  2. Dynamic Scaling:
    • The webinar demonstrated how dynamic scaling techniques allow RAN and AI workloads to scale in and out based on real-time traffic demands (see the sketch after this list). This automation ensures optimal use of resources, reducing operational costs and enhancing performance.
  3. Monetizing Unused Capacity:
    • Traditional RAN infrastructure is provisioned for peak hours and hence is often underutilized during off-peak periods. This causes revenue loss, especially given costly GPU compute. MNOs can capitalize on their infrastructure by selling unused GPU cycles as spot instances for AI workloads. This additional revenue stream can significantly shorten the ROI period for existing investments.
  4. Automation and Efficiency:
    • Automating the orchestration of RAN and AI workloads minimizes manual intervention, leading to greater efficiency and consistency. This approach also simplifies management and operational tasks, allowing MNOs to focus on strategic initiatives.
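
To make the dynamic scaling idea in point 2 concrete, here is a minimal Python sketch of the kind of control loop involved. The orchestrator client and every method on it are hypothetical placeholders for illustration, not a real AMCOP or NVIDIA API:

    import time

    RAN_PEAK_THRESHOLD = 0.7  # fraction of RAN capacity in use

    def rebalance(orchestrator):
        """Shift GPUs between RAN and AI workloads based on live traffic."""
        while True:
            ran_load = orchestrator.get_ran_utilization()  # 0.0 to 1.0
            if ran_load > RAN_PEAK_THRESHOLD:
                # Peak traffic: reclaim GPUs from preemptible AI spot jobs.
                orchestrator.preempt_ai_spot_instances()
                orchestrator.scale_ran_workers(up=True)
            else:
                # Off-peak: shrink RAN and resell idle GPUs as spot capacity.
                orchestrator.scale_ran_workers(up=False)
                orchestrator.offer_spot_capacity(orchestrator.idle_gpus())
            time.sleep(30)  # re-evaluate every 30 seconds

A production loop would add hysteresis and graceful workload draining, but the decision structure is the same.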

Demo Highlights:

The webinar included a live demonstration of dynamic orchestration in action, showcasing real-world applications and benefits. Attendees were able to see firsthand how automated scaling and resource management can transform infrastructure utilization.

Conclusion:

The integration of RAN and AI workloads on a common GPU cloud represents a significant advancement for MNOs, offering enhanced efficiency, reduced costs, and new revenue opportunities. As the telecom industry continues to evolve, adopting such innovative solutions will be crucial for staying competitive and maximizing infrastructure investments.

For those who missed the live session, you can watch the recorded webinar here.

Amar Kapadia

IaaS vs. PaaS vs. SaaS: NCP Productization Options for a GPU-as-a-Service AI Cloud Offering

Are you a data center provider, telco, NVIDIA Cloud Partner (NCP) or startup that has decided to offer a GPU-as-a-Service (GPUaaS) AI cloud? You need to rapidly decide what your offering is going to look like. With multiple technical options, the ultimate decision depends on your customer requirements, the competition you face, and your desired differentiation in an increasingly commoditized service.

The first-level decision is whether your offering will be Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), or Software-as-a-Service (SaaS). Of course, these are not mutually exclusive; you may choose to offer a combination. Let's dig into some details.

IaaS

IaaS largely means offering compute instances with GPUs to end users. This is probably the most common offering today. The sizing of these instances will vary based on the GPU capability, vCPU count, memory and storage sizing, and network throughput. Even with IaaS, there are some sub-options:

  • BMaaS or Bare-Metal-as-a-Service: A server like NVIDIA HGX or MGX could be offered as a service with a simple operating system. The benefit for a user is being able to get the instance on demand through a self-service mechanism. The user has full control of that bare metal server and can release the instance when done, without incurring any CAPEX.
  • VM: If your customers need instances smaller than a single bare metal server (e.g. for inferencing), you will need to turn to virtualization. With virtual machines, you can offer fractional servers. With the VMware cost increases and OpenStack increasingly becoming a legacy technology, your choice is realistically limited to Kubernetes (see Kata Containers, and the sketch after this list).
  • Clustered instances: If your customers are interested in model training, then they will need multiple GPUs clustered into a single instance. For example, multiple HGX servers will have to be clustered together and offered as a single instance to your customers.
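
As an illustration of the VM option above, here is a minimal sketch using the official Kubernetes Python client to request a fractional, VM-isolated GPU instance. It assumes a cluster that already has the NVIDIA device plugin and a "kata" RuntimeClass installed; the image, sizes, and namespace are illustrative:

    from kubernetes import client, config

    config.load_kube_config()

    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="gpu-instance-1"),
        spec=client.V1PodSpec(
            runtime_class_name="kata",  # run the pod inside a lightweight VM
            containers=[client.V1Container(
                name="workload",
                image="nvidia/cuda:12.2.0-base-ubuntu22.04",  # example image
                command=["sleep", "infinity"],
                resources=client.V1ResourceRequirements(
                    # A fraction of one server: 1 GPU, 8 vCPUs, 64 GiB RAM.
                    limits={"nvidia.com/gpu": "1", "cpu": "8", "memory": "64Gi"},
                ),
            )],
        ),
    )
    client.CoreV1Api().create_namespaced_pod(namespace="tenant-a", body=pod)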

Of course, with IaaS you will encounter challenges like multi-tenancy and isolation, self-service APIs, and on-demand billing, all of which need to be solved to offer a complete solution to customers.
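
To make those challenges concrete, here is a minimal Flask sketch of a self-service API with on-demand, per-tenant metering. The endpoints, the header, and the $2.30/hour rate (from our earlier post) are illustrative assumptions; a real service would enforce multi-tenancy in the provisioning layer, not an in-memory dictionary:

    import time
    import uuid

    from flask import Flask, jsonify, request

    app = Flask(__name__)
    instances = {}        # instance_id -> {"tenant": ..., "started": ...}
    RATE_PER_HOUR = 2.30  # indicative bare metal GPU price per hour

    @app.post("/v1/instances")
    def create_instance():
        tenant = request.headers["X-Tenant-Id"]  # caller's tenant identity
        instance_id = str(uuid.uuid4())
        instances[instance_id] = {"tenant": tenant, "started": time.time()}
        # A real service would now provision an isolated server or VM.
        return jsonify({"id": instance_id}), 201

    @app.delete("/v1/instances/<instance_id>")
    def release_instance(instance_id):
        record = instances.pop(instance_id)
        hours = (time.time() - record["started"]) / 3600
        # On-demand billing: charge only for the time actually held.
        return jsonify({"charge_usd": round(hours * RATE_PER_HOUR, 2)})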

PaaS

With PaaS, the complexities of the underlying infrastructure are hidden and the offering is a higher-level abstraction. The options range from a GPU-based Kubernetes cluster optimized to run NVIDIA NIM, to LLMOps/MLOps, fine-tuning-as-a-Service, vector-database-as-a-Service, and GPU spot instance creation (to sell excess unused capacity), among other services. A move from IaaS to PaaS instantly creates more value around your offering but requires additional technical sophistication and instrumentation.
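
For a sense of what a NIM-style PaaS abstraction looks like to an end user, here is a minimal sketch. NIM microservices expose an OpenAI-compatible API, so a tenant can point a standard client at your platform; the endpoint URL, API key, and model name below are placeholders:

    from openai import OpenAI

    client = OpenAI(
        base_url="https://nim.example-gpuaas.com/v1",  # hypothetical endpoint
        api_key="YOUR_TENANT_API_KEY",
    )

    response = client.chat.completions.create(
        model="meta/llama3-8b-instruct",  # a model NIM commonly packages
        messages=[{"role": "user", "content": "Summarize GPUaaS in one sentence."}],
    )
    print(response.choices[0].message.content)

The tenant never sees the GPUs, the Kubernetes cluster, or the NIM deployment underneath; that abstraction is the value the PaaS layer captures.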

SaaS

The next level of sophistication is to offer managed software directly to users in the form of SaaS. This could include LLM-as-a-Service (similar to what OpenAI and the hyperscalers provide), RAG-as-a-Service, and more. This layer adds even more value than IaaS or PaaS.
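
As a sketch of what sits behind a hypothetical RAG-as-a-Service endpoint, the request path is: embed the query, retrieve context from a vector store, then generate a grounded answer. Every object below is a placeholder for whichever embedder, vector database, and LLM the service actually runs:

    def answer(query, embedder, vector_db, llm, k=4):
        """Answer a question using retrieval-augmented generation."""
        query_vec = embedder.embed(query)                # 1. embed the query
        passages = vector_db.search(query_vec, top_k=k)  # 2. retrieve context
        prompt = (
            "Answer the question using only this context:\n"
            + "\n".join(passages)
            + f"\n\nQuestion: {query}"
        )
        return llm.generate(prompt)                      # 3. grounded generation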

To compete, you will need to move up the value chain, leaving the low-level “boring” infrastructure orchestration and management to Aarna.ml so that you can focus on building your differentiation. The Aarna Multi Cluster Orchestration Platform (AMCOP) orchestrates and manages low-level infrastructure to achieve network isolation, InfiniBand isolation, GPU/CPU configuration, OS and Kubernetes orchestration, storage configuration, and more. Once the initial orchestration is complete, AMCOP monitors and manages the infrastructure as well. If you would like to slash your time-to-market and build a differentiated, sustainable GPUaaS, please get in touch with us for an initial 3-day architecture and strategy assessment.

Pavan Samudrala

Introducing AMCOP v4.0.1: Revolutionizing Private 5G Edge Orchestration

We're excited to announce the release of AMCOP 4.0.1, the latest version of our Private 5G Edge Orchestrator and the first release based on Nephio/Kubernetes. The Nephio Project, initiated by Google and backed by the Linux Foundation, is dedicated to providing carrier-grade, user-friendly, open, Kubernetes-based cloud-native intent automation.

The project aims to offer common automation templates for seamless deployment and management. Designed to simplify the complexities of enterprise edge and private 5G networks, AMCOP 4.0.1 introduces powerful enhancements, including Bare Metal Provisioning and OAI (OpenAirInterface) End-to-End Orchestration, that redefine the landscape of edge computing and network automation.


AMCOP Private 5G Edge Orchestrator serves as a comprehensive platform for orchestration, lifecycle management, real-time policy enforcement, and closed-loop automation of 5G network services and edge computing applications. By enabling zero-touch orchestration of edge infrastructure, applications, and network services at scale, AMCOP provides organizations with a unified management experience through a single pane of glass.

In AMCOP 4.0.1, the Orchestrator functionality is further enhanced with the introduction of Bare Metal Provisioning capabilities. This feature streamlines the process of setting up and managing infrastructure resources, and optionally includes VM fleet management on platforms like VMware as well as Kubernetes cluster creation. With Bare Metal Provisioning, organizations can effortlessly deploy and manage their private 5G infrastructure, ensuring optimal performance and reliability.

Additionally, AMCOP 4.0.1 introduces OAI End-to-End Orchestration, enabling seamless integration with OpenAirInterface (OAI) technologies for end-to-end management of 5G network services. This integration facilitates the deployment and operation of various types of workloads, including Kubernetes workloads, VMs integrated as Kubernetes objects using KubeVirt (see the sketch below), and VMs on hypervisors like ESXi. With OAI End-to-End Orchestration, organizations can create a seamless connection across data resources and processes, driving efficiency and innovation in their edge computing environments.
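
As a sketch of the KubeVirt integration mentioned above, here is how a VM can be declared as a Kubernetes object using the Kubernetes Python dynamic client. The manifest shape follows the public kubevirt.io/v1 VirtualMachine CRD; the names, sizes, and disk image are illustrative, and this is not AMCOP's internal API:

    from kubernetes import config, dynamic
    from kubernetes.client import api_client

    config.load_kube_config()
    dyn = dynamic.DynamicClient(api_client.ApiClient())
    vm_api = dyn.resources.get(api_version="kubevirt.io/v1", kind="VirtualMachine")

    vm = {
        "apiVersion": "kubevirt.io/v1",
        "kind": "VirtualMachine",
        "metadata": {"name": "demo-vm", "namespace": "default"},
        "spec": {
            "running": True,  # start the VM as soon as it is created
            "template": {"spec": {
                "domain": {
                    "resources": {"requests": {"memory": "4Gi", "cpu": "2"}},
                    "devices": {"disks": [{"name": "root", "disk": {"bus": "virtio"}}]},
                },
                "volumes": [{
                    "name": "root",
                    "containerDisk": {  # boot disk shipped as a container image
                        "image": "quay.io/kubevirt/fedora-cloud-container-disk-demo",
                    },
                }],
            }},
        },
    }
    vm_api.create(body=vm, namespace="default")

Once created, the VM is managed like any other Kubernetes object, which is what lets an orchestrator treat VM and container workloads uniformly.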

Furthermore, AMCOP 4.0.1 continues to excel in lifecycle management, configuration management, KPI monitoring, and service assurance capabilities. The platform ensures seamless orchestration of containerized workloads across diverse clusters, simplifying deployment, updating, scaling, and monitoring of applications. With a centralized interface providing a "single pane of glass" view, administrators can efficiently manage the entire system, simplifying monitoring, troubleshooting, and management tasks.

In conclusion, AMCOP 4.0.1 represents a significant leap forward in private 5G edge orchestration, empowering organizations to unlock the full potential of their edge computing environments. With enhanced capabilities for Bare Metal Provisioning and OAI End-to-End Orchestration, AMCOP continues to lead the way in revolutionizing network automation and edge computing.

Join us in embracing the future of enterprise edge and private 5G networks with AMCOP 4.0.1. Experience it through our User Experience Kit.

Sandeep Sharma

Unlocking the Potential of Nephio R2 at Nephio India Meetup

At the recent Nephio India Meetup held in Bangalore, Sandeep Sharma, Principal Architect at Aarna.ml, shared exciting advancements in Nephio R2, an update that adds crucial support for multi-vendor orchestration across diverse 5G network components.

Challenges and Solutions

Managing networks across multiple vendors can be complex, demanding integrated strategies for various network functions. Nephio R2 introduces a topology controller (experimental) and enhanced automation capabilities to simplify the deployment, configuration, and monitoring of complex network setups. This tool enables network architects to define high-level intents for network configurations, making operations across heterogeneous environments more straightforward.

Watch Sandeep’s full discussion on the transformative capabilities of Nephio R2.

Benefits and Practical Applications

Nephio R2 is particularly beneficial in environments requiring specific configurations for network functions such as User Plane Functions (UPF), allowing for more efficient network management and reduced operational costs. The development of Nephio R2 signifies a significant advancement towards more adaptive, resilient, and efficient network management frameworks, supporting the rapidly evolving demands of modern telecommunications.

Engage with Us

Explore how Nephio R2 can optimize your network operations. For a deeper understanding or to discuss how Aarna.ml can assist in your digital transformation journey, contact us.