"
Aarna.ml

Resources

resources

Blog

Amar Kapadia

RAN-in-the-Cloud: Why must a successful O-RAN implementation run on a cloud architecture?

A new radio access network standard called Open RAN (O-RAN) promises to accelerate the disaggregation of 5G networks. Until recently, the RAN was completely closed, creating vendor lock-in and allowing a handful of vendors to command high prices. Not only did this drive up the cost of building out a RAN site, but it also created inflexible, closed networks that did not allow new monetizable services. Mobile Network Operators (MNOs) and governments decided that this was not an ideal situation, which led to the formation of the O-RAN Alliance, a standards development organization (SDO) that creates detailed specifications for the various internal interfaces of a RAN, thus allowing for its disaggregation. Disaggregation is expected to foster innovation, result in new monetizable services, and reduce costs.

What does the future hold for O-RAN? I think there are three possibilities:

  1. O-RAN is a failure
  2. O-RAN gets a hollow victory
  3. O-RAN is a true success

Let us evaluate each scenario.

Scenario #1 – O-RAN is a failure: This could happen if O-RAN is unable to meet or exceed existing proprietary RAN solutions on key performance, power, and cost metrics. I think the probability of this outcome is relatively low. Technologies such as NVIDIA Aerial and alternatives from other semiconductor vendors will ensure that O-RAN matches proprietary RAN on performance, at similar or lower price points. Nevertheless, we cannot eliminate this possibility yet as we need to see more proof points.

Scenario #2 – O-RAN gets a hollow victory: If O-RAN merely matches proprietary RAN on key metrics and offers disaggregation as its only differentiator, there is a significant danger that incumbents will “O-RAN-wash” their products and the status quo will persist for 5G. The incumbents will call their vertically integrated products O-RAN compliant while in reality they will only support a few open interfaces. Interoperability with third parties will be suboptimal, forcing MNOs to purchase a vertically integrated stack. In this case, there simply won’t be enough leverage to force the incumbent vendors to truly open up, nor will there be enough incentive for MNOs to try out a new vendor.

Scenario #3 – O-RAN is a true success: For this outcome, O-RAN based implementations must provide greater value than proprietary RAN. Let’s explore what that requires.

Embracing Cloud Architecture will be a Game Changer

For O-RAN based implementations to provide more value than proprietary RAN, they must use an end-to-end cloud architecture and be deployed in a true datacenter cloud or edge environment; hence the term “RAN-in-the-Cloud”. The simple reason is that a cloud can run multiple workloads, meaning cloud hosted O-RAN can support multi-tenancy and multiple services on the same infrastructure. Since RAN deployments are built for peak traffic, they are underutilized, typically running at less than 50% utilization. In a traditional architecture that uses specialized acceleration or an appliance-like implementation, nothing can be done to improve this utilization number. In a RAN-in-the-Cloud implementation, however, the cloud can run other workloads during periods of underutilization. An O-RAN implementation built by fully embracing cloud principles will therefore be far superior to proprietary RAN because utilization can be optimized. With increased utilization, the effective CAPEX and power consumption will be significantly reduced. The RAN also becomes flexible and configurable, e.g., 4T4R, 32T32R, or 64T64R, and TDD or FDD, on the same infrastructure. As an added benefit, when the RAN is underutilized, MNOs can pivot their GPU accelerated infrastructure to other services such as edge AI, video applications, CDN, and more, improving the monetization of new edge applications and services. Overall, these capabilities will give MNOs the leverage they need to force the incumbents to fully comply with O-RAN and/or try out new and innovative O-RAN vendors.
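To make the economics concrete, here is a back-of-the-envelope sketch in Python. All numbers are hypothetical and for illustration only; the point is simply that amortizing the same infrastructure over more useful work lowers the effective cost of RAN capacity.

```python
# Illustrative only: effective cost of RAN capacity vs. infrastructure
# utilization. The CAPEX figure and utilization levels are hypothetical.

def effective_cost(capex: float, utilization: float) -> float:
    """CAPEX amortized over the fraction of capacity doing useful work."""
    return capex / utilization

SITE_CAPEX = 100_000.0  # hypothetical cost of one RAN site

dedicated = effective_cost(SITE_CAPEX, 0.50)  # RAN-only appliance, ~50% busy
shared = effective_cost(SITE_CAPEX, 0.85)     # cloud infra backfilled with edge AI/CDN

print(f"dedicated: ${dedicated:,.0f} per unit of useful capacity")
print(f"shared:    ${shared:,.0f} per unit of useful capacity")
print(f"reduction: {1 - shared / dedicated:.0%}")
```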

To be considered RAN-in-the-Cloud, the O-RAN implementation must use:

●      General-purpose compute with a cloud layer such as Kubernetes

●      General-purpose acceleration, for example NVIDIA GPUs, that can be used by non-O-RAN workloads such as AI/ML, video services, CDNs, edge IoT, and more

●      Software defined xHaul and networking

●      Vendor neutral SMO (Service Management and Orchestration) that can perform dynamic switching of workloads from RAN→non-RAN→RAN; the SMO[1] also needs the intelligence to understand how the utilization of the wireless network varies over time. The Aarna.ml Multi Cluster Orchestration Platform SMO is a perfect example of such a component. A minimal sketch of such a switching loop appears after this list.
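The following Python sketch illustrates the kind of control loop such an SMO might run. The thresholds, workload names, and helper functions are all hypothetical stand-ins, not the actual AMCOP API; a real implementation would read utilization from RAN telemetry and delegate scaling to the cloud layer.

```python
import random
import time

RAN_LOW, RAN_HIGH = 0.30, 0.70  # hypothetical utilization thresholds


def read_ran_utilization() -> float:
    """Placeholder: a real SMO would derive this from RAN telemetry (e.g., O1 PM data)."""
    return random.random()


def scale_workload(name: str, gpus: int) -> None:
    """Placeholder: a real SMO would delegate this to the cloud layer (e.g., Kubernetes)."""
    print(f"scaling {name} to {gpus} GPU(s)")


def control_loop(total_gpus: int = 8) -> None:
    """Shift GPU capacity between RAN and non-RAN tenants as utilization varies."""
    while True:
        util = read_ran_utilization()
        if util < RAN_LOW:
            # RAN is quiet: release half the pool to revenue-generating edge workloads.
            scale_workload("ran-du", total_gpus // 2)
            scale_workload("edge-ai", total_gpus - total_gpus // 2)
        elif util > RAN_HIGH:
            # Approaching peak: hand the full pool back to the RAN.
            scale_workload("ran-du", total_gpus)
            scale_workload("edge-ai", 0)
        time.sleep(60)


if __name__ == "__main__":
    control_loop()
```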

You can see an example of this architecture presented during the upcoming session at GTC this week: “Big Leap in VRAN: Full Stack Acceleration, Cloud First, AI and 6G Ready [S51797]”. In my view, this reference architecture will drive O-RAN to its full potential and is the type of architecture MNOs should be evaluating in their labs.

References:

●      NVIDIA Blog: https://developer.nvidia.com/blog/ran-in-the-cloud-delivering-cloud-economics-to-5g-ran/

●      Video: https://www.youtube.com/watch?v=FrWF1L8jI8c

●      Solution Brief: https://www.youtube.com/watch?v=FrWF1L8jI8c

[1] Strictly speaking, the SMO as defined by the O-RAN Alliance is only applicable for the RAN domain. However, we are using the term SMO more broadly to include orchestration of other domains such as edge computing applications, transport, and more.

Ankit Goel

Deploying Juniper cRPD on Ubuntu nodes with BGP peering

Are you looking to deploy Juniper cRPD docker containers on Ubuntu nodes with BGP peering? Based on our recent experience, we've set up this comprehensive wiki page to share our learnings with step-by-step instructions and helpful tips to ensure a successful deployment.

This detailed guide walks you through the entire process, from installing Docker on two Ubuntu 20.04 nodes to configuring the cRPD containers and setting up and verifying BGP peering.
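As a taste of what the wiki covers, here is a hypothetical sketch of launching a cRPD container using the Docker SDK for Python (the wiki itself uses the docker CLI). The image tag, container name, and host paths below are placeholders; substitute your licensed cRPD image and adjust the mounts to your environment.

```python
import docker  # pip install docker

client = docker.from_env()

# Launch one cRPD instance; repeat on the second node with a different name.
container = client.containers.run(
    "crpd:latest",        # placeholder tag for your licensed cRPD image
    name="crpd01",
    detach=True,
    privileged=True,      # cRPD needs elevated privileges to program routes
    network_mode="host",  # share the node's interfaces for BGP peering
    volumes={
        "/var/crpd01/config": {"bind": "/config", "mode": "rw"},
        "/var/crpd01/varlog": {"bind": "/var/log", "mode": "rw"},
    },
)
print(container.status)
```

Once both containers are up and BGP is configured, a command along the lines of docker exec crpd01 cli show bgp summary should report the peering state; see the wiki for the full configuration and verification steps.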

Give this a shot and let us know what you find out.

Read the Wiki

Brandon Wick

New Edge Native Application Principles Whitepaper Available

The growth of the cloud native ecosystem has been explosive, unprecedented, and far reaching. The definition of “cloud native” along with the development and consistent application of cloud native principles provided the foundation upon which developers have built a software cathedral beyond anything imaginable even just 10 years ago. One look at the CNCF landscape shows us how far we have come in a short time and provides a model for what’s possible in the future.

“Cloud Native” of course arose to meet the demands and possibilities of cloud computing – a paradigm shift, transformative force, and catalyst for parallel and supporting innovations, models, and ecosystems. Edge computing is intricately woven into the cloud, while expanding the frontiers of compute, storage, and networking to any edge device that can run Kubernetes. Not surprisingly, the cloud edge is perhaps the fastest growing area of edge computing, driven by multi-cloud networking, machine learning, storage repatriation, and more.

But edge computing occurs in different environments than cloud computing – constrained by smaller form factors deep in public and private networks. In edge environments, compute, connectivity, storage, and power are all limited, necessitating new approaches and a new set of edge native principles. 

It is in this spirit that discussions began in the IoT Edge Working Group in the middle of 2022. Our goal was to explore what it means to be edge native and the differences and similarities between “cloud native” and “edge native”, and to proffer an initial set of principles for the industry to interpret, apply, and iterate upon. We landed on the following five broad principles:

  • Resource and Design Aware (includes 3 sub-principles)
  • At Scale Management (includes 3 sub-principles)
  • Spanning
  • Resource Usage Optimization
  • Applications are Portable and Reusable (within limits)

The IoT Edge Working Group debuted the Edge Native Application Principles Whitepaper draft on GitHub and on the stage at KubeCon + CloudNativeCon North America 2022, and incorporated community feedback into the current draft. We hope to evolve the whitepaper over time, and to develop companion papers as needed, to give developers the guidance they need to accelerate development on the edge frontier. We hope that you will read the whitepaper, try out these principles in practice, and share your feedback with the IoT Edge Working Group.

DOWNLOAD THE WHITEPAPER HERE

Brandon Wick

Aarna Showcases 3 O-RAN Demos at MWC and Much More

Aarna.ml presented a number of industry-leading initiatives at Mobile World Congress in Barcelona, Feb 27 – Mar 2, in partner booths and select meetings. If you didn't have the chance to meet us in Barcelona, here's a recap.

Booth Demos:

O-RAN orchestration on O-Cloud optimized for hybrid environments (Aarna.ml, Rebaca Technologies, Red Hat, VoerEir AB). See it in the O-RAN Virtual Showcase.

Orchestration and management of RAN elements using SMO over the O1 interface (Aarna.ml, Capgemini Engineering): Zero-touch provisioning (ZTP) of small cells. See it in the O-RAN Virtual Showcase.

Core Network (CN) for Non-public Networks (NPN) 5G using O-RAN SMO and Automation (Aarna.ml, HFCL)

Other Highlights:

Learn more in the press release.

Didn't have a chance to meet with us in Barcelona? Contact us to book a meeting: https://lnkd.in/erBGgPMk

Amar Kapadia and Subramanian Sankaranarayanan in the Red Hat Booth at MWC 2023

Sandeep Sharma

Edge relocation triggered by UE mobility

In the world of 5G, edge computing plays a critical role in ensuring low-latency, high-performance applications. However, with the increased mobility of users, there is a need for seamless migration of the Application Function (AF) and User Plane Function (UPF) to another edge cluster. In this use case, we will simulate UE mobility and explore the AF and UPF migration to another edge cluster using Aarna’s AMCOP platform. In place of real UE (User Equipment) devices, we use a software simulator (UE Sim).

First, it is important to note that the 5G Core and MEC applications must be deployed in the target cluster, which can communicate with the 5GC via the NEF. Additionally, AMCOP should have at least two target clusters onboard, and the UE simulator must have a PDU session established with UPF in one of the edge clusters. All 5GC functions in this use case belong to the same PLMN.

The following diagram describes the use-case at a high level.

Assuming that the AF (and UPF) are relocated to the other edge cluster, the AF now has to initiate the following procedure in the 5GC.

When the UE moves to another location, it triggers the relocation of the AF and UPF to another edge cluster. After the relocation, the AF in the source cluster calls the traffic influence APIs exposed by the NEF in the 5GC. At a minimum, the NEF should expose the following APIs for this to occur:

  • Nnef_TrafficInfluence_Create/Update/Delete
  • Nsmf_EventExposure_Notify
  • Nnef_TrafficInfluence_Notify

The AF then uses the traffic influence API to notify the SMF in the 5GC about the change in traffic patterns. The SMF, in turn, acts upon the traffic influence rules initiated by the AF and updates the PDU session.
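As an illustration, an AF-side Nnef_TrafficInfluence_Create request might look like the following Python sketch. The NEF base URL, AF identifier, UE address, and DNAI names are placeholders; the resource path and field names follow 3GPP TS 29.522 conventions, but check the profile your NEF actually supports before relying on them.

```python
import requests

NEF_BASE = "https://nef.example.com"  # placeholder NEF endpoint
AF_ID = "af-edge-001"                 # placeholder AF identifier

# Hypothetical traffic influence subscription steering one UE's traffic
# to the target edge cluster's data network access identifier (DNAI).
subscription = {
    "afServiceId": "edge-video",                    # hypothetical service id
    "dnn": "internet",
    "ipv4Addr": "10.45.0.7",                        # the individual UE's address
    "trafficRoutes": [{"dnai": "edge-cluster-2"}],  # placeholder target DNAI
    "notificationDestination": "https://af.example.com/notify",
}

resp = requests.post(
    f"{NEF_BASE}/3gpp-traffic-influence/v1/{AF_ID}/subscriptions",
    json=subscription,
    timeout=10,
)
resp.raise_for_status()
print("traffic influence subscription created:", resp.headers.get("Location"))
```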

It is important to note that AF requests targeting an individual UE by a UE address are routed to an individual PCF using the BSF. This routing ensures that the AF can accurately identify the UE, which is necessary for proper traffic influence.

In summary, with the help of the NEF and SMF, the AF can seamlessly migrate to another edge cluster, ensuring uninterrupted service to the end-users. This use case highlights the importance of proper traffic routing and communication between 5GC functions to ensure a smooth and efficient network experience.

Rajendra Mishra

HA deployment of AMCOP and dealing with hardware failures

High availability (HA) is an important aspect of any production deployment. In the context of Kubernetes, HA is achieved by deploying multiple nodes for workers as well as masters. This ensures that in case of node failures, the workload can be distributed to other nodes, ensuring high availability.

In the case of AMCOP deployment on Kubernetes, HA is essential to ensure that all services are still reachable in the event of node failures. To validate this, we deployed AMCOP on a multi-node cluster and simulated a graceful shutdown of nodes. During this process, we ran continuous tests that accessed various services to ensure they were still available (a minimal sketch of such an availability probe appears after the list), including:

  • Cluster automation: This script continuously adds and removes clusters to/from AMCOP, verifying the availability of all cluster management services in the EMCO module of AMCOP.
  • CDS workflow: This continuously tests CBAs (Controller Blueprint Archives) in a loop, ensuring that the CDS, Camunda, and MariaDB pods are live and responding.
  • SMO: Similar tests run basic configuration operations on CU/DU simulators.
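The sketch below shows the general shape of such a probe in Python; the service names, URLs, and polling interval are placeholders, not the actual AMCOP test harness.

```python
import time
import requests

ENDPOINTS = {  # placeholder service URLs; substitute your AMCOP endpoints
    "emco": "http://amcop.example.com:30480/v2/projects",
    "cds":  "http://amcop.example.com:30499/api/v1/blueprint-model",
    "smo":  "http://amcop.example.com:30205/health",
}


def probe_once() -> None:
    """Hit every service once and report whether it responded."""
    for name, url in ENDPOINTS.items():
        try:
            r = requests.get(url, timeout=5)
            status = f"HTTP {r.status_code}"
        except requests.RequestException as exc:
            status = f"UNREACHABLE ({exc.__class__.__name__})"
        print(f"{time.strftime('%H:%M:%S')} {name:4s} {status}")


if __name__ == "__main__":
    while True:  # run for the duration of the node-failure drill
        probe_once()
        time.sleep(10)
```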

To achieve HA, we recommend the following configuration (a quick way to verify the resulting node layout is sketched after the list):

  • Deploy multiple worker nodes: This ensures that workload can be distributed across multiple nodes and avoids a single point of failure.
  • Deploy multiple master nodes: This ensures that if a master node fails, there are other nodes available to take over the workload.
  • Use a load balancer: This ensures that requests are distributed evenly across all nodes, preventing any one node from becoming overwhelmed.
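As a quick sanity check of the layout above, the official Kubernetes Python client can count Ready control-plane and worker nodes. The control-plane label used below matches recent kubeadm defaults; adjust it for your distribution.

```python
from kubernetes import client, config  # pip install kubernetes

config.load_kube_config()
nodes = client.CoreV1Api().list_node().items

CP_LABEL = "node-role.kubernetes.io/control-plane"  # kubeadm default; may differ


def is_ready(node) -> bool:
    """True if the node reports a Ready=True condition."""
    return any(c.type == "Ready" and c.status == "True" for c in node.status.conditions)


masters = [n for n in nodes if CP_LABEL in (n.metadata.labels or {})]
workers = [n for n in nodes if CP_LABEL not in (n.metadata.labels or {})]

print(f"control-plane nodes ready: {sum(map(is_ready, masters))}/{len(masters)}")
print(f"worker nodes ready:        {sum(map(is_ready, workers))}/{len(workers)}")
```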

It's important to note that while Kubernetes has built-in resilience to handle node failures, there are certain cases where administrator intervention is needed, particularly for stateful applications and persistent volumes. In these cases, a disaster recovery plan should be in place to minimize downtime and ensure data integrity.

In conclusion, HA deployment on Kubernetes is crucial to ensure high availability of services and to minimize downtime in the event of node failures. Continuous testing and monitoring can help ensure that all services are still reachable, and a disaster recovery plan can help minimize the impact of any hardware failures. By following these best practices, AMCOP deployments can ensure a high level of reliability and availability.