Kubernetes Operators: When not to create one

Over the past few years, the Kubernetes Operator pattern has been a trending topic in the DevOps world, and in the Kubernetes community in particular. In my personal experience across different projects and applications, some software development teams and organizations were too quick to jump on the Kubernetes Operator bandwagon without analyzing the real-world problem they were trying to solve by implementing an operator.

More often than not, a Kubernetes operator was implemented only because it was seen as the most fashionable implementation pattern in the Kubernetes domain at the time. I have seen software development teams go as far as implementing wrapper CRDs or controllers around an open-source Kubernetes operator, so that organizational practices and standards were encapsulated in the wrappers while the organization could still deploy and use the open-source operator in its application stack. Simply put, all that cost, time and developer effort was invested just to be able to use a specific, well-known Kubernetes operator, while the teams had completely forgotten why anyone should have an operator in the first place, and what problems an operator is fundamentally supposed to solve for an application.

The purpose of this article is not to criticize the Kubernetes Operator pattern or to portray it as a bad practice. Quite the opposite: the operator pattern is a highly useful concept in a Kubernetes stack, and it helps DevOps engineers and software developers tackle certain complexities of the application lifecycle on a Kubernetes platform. What we intend to illustrate instead are the real-world use cases that the operator pattern is supposed to solve, by first discussing the use cases it may not be suited for.

Let us first look at what the operator pattern means in Kubernetes. There are three key points an operator implementation should aim to satisfy.

  1. It should be a piece of software that automates a repeatable task of a stateful application and replaces the necessity of a human operator.
  2. It should be a software extension to the Kubernetes API that makes use of a Custom Resource.
  3. It should follow the Kubernetes principles, mainly the Control Loop.

While all three points matter, we think the first one is the most important.

Now let us jump into our main topic: when should we not create an operator? Or, to phrase it more diplomatically, “When should we rethink our decision to write an operator?”

Is my application a stateful application?

The official Kubernetes documentation on the operator pattern does not strictly require the application you are building the operator for to be stateful. What we should understand, however, is that an application without state does not really need a controller to handle its deployment lifecycle. Notice how we have used the term “controller” here instead of “operator”. We will discuss this in a moment.

The reason is simple: if an application does not have state, you can pretty much use the native features of Kubernetes, such as the Deployment controller, to handle its full lifecycle. When an application is stateful, however, you may not be able to freely replace a given replica with a new one (as is supposed to happen in the Kubernetes world all the time).

Additional work such as leader election, handling checkpoints, managing quorum, or restoring a backup may have to be done when bringing up a new replica. Some of these tasks may not be part of the application code itself; they may be manual steps that a human operator must execute. This is where an operator on Kubernetes can really pay off. You can write a piece of software that encapsulates all that domain- and application-specific logic and plug it into the Kubernetes API as an extension.

Where do the CRDs and the Control Loop come into play in this context? It is quite simple: CRDs pave the way for the end user to declare the desired state of the application that the operator has to maintain. The end user creates or updates a CR declaring the spec of the desired state, and the operator then tries, through the control loop, to bring the application to that desired state. It also reports the status of the application back to the same CR.
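The spec, control loop and status flow described above can be pictured with a small, framework-free sketch. Everything in it is an assumption made for illustration: the DatabaseCluster resource, its fields and the action names are hypothetical, and no real Kubernetes client library is used. A real operator would watch the API server and call it instead of mutating an in-memory object.

```java
import java.util.ArrayList;
import java.util.List;

public class ReconcileSketch {

    /** Hypothetical custom resource: spec is declared by the user, status is reported by the operator. */
    static class DatabaseCluster {
        final int specReplicas;      // desired state (the "spec" of the CR)
        int statusReadyReplicas;     // observed state (the "status" of the CR)

        DatabaseCluster(int specReplicas, int statusReadyReplicas) {
            this.specReplicas = specReplicas;
            this.statusReadyReplicas = statusReadyReplicas;
        }
    }

    /**
     * One pass of the control loop: compare desired and observed state,
     * take one corrective step, and write the new status back.
     * Returns the actions taken; an empty list means nothing was left to do.
     */
    static List<String> reconcile(DatabaseCluster cr) {
        List<String> actions = new ArrayList<>();
        if (cr.statusReadyReplicas < cr.specReplicas) {
            // Application-specific knowledge lives here (hypothetical step name).
            actions.add("restore-backup-and-join-replica-" + cr.statusReadyReplicas);
            cr.statusReadyReplicas++;
        } else if (cr.statusReadyReplicas > cr.specReplicas) {
            cr.statusReadyReplicas--;
            actions.add("drain-and-remove-replica-" + cr.statusReadyReplicas);
        }
        return actions;
    }

    public static void main(String[] args) {
        DatabaseCluster cr = new DatabaseCluster(3, 1);
        List<String> actions;
        // Keep reconciling until the observed state matches the desired state.
        while (!(actions = reconcile(cr)).isEmpty()) {
            System.out.println(actions + " -> ready replicas: " + cr.statusReadyReplicas);
        }
    }
}
```

Each pass takes at most one step and is safe to repeat, which is what lets the loop converge on the declared state no matter how it was interrupted.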

Does this mean we cannot write a similar operator for a stateless application? You absolutely can. In that case, however, it should be considered a controller rather than an operator. Also, if you are thinking of writing such a controller to manage the lifecycle of a stateless application, it is likely to be overkill, because there should be other ways to achieve the same thing with the native Kubernetes API, without extending it with a whole new set of CRDs. It could even mean that your actual problem lies somewhere else, and you are misusing the operator pattern because you either want to use a Kubernetes object to handle a non-Kubernetes resource in your application or are trying to fix a configuration management problem through it.

Any code is a liability

There are many frameworks available now to make the implementation of an operator an easy task. However, it still requires a certain effort from the developers to understand the complexity of the application and the actual business requirement you need to address through the operator’s control loop.

It is also common nowadays for teams and organizations to develop operators for third-party open-source applications that they do not own. The logic written into an operator is an imperative workflow that couples the application’s business logic to the Kubernetes control loop. This may not look concerning at first, but the team that develops the operator will eventually be responsible for maintaining its codebase to keep up with changes in the application that affect its deployment lifecycle. Even though the available frameworks make writing an operator much easier, it will still be more complex than writing a piece of configuration that supports the deployment lifecycle with native Kubernetes resources.

Also, unless you have a clear understanding of the potential changes coming into the particular application you are writing the operator for, you are putting yourself in a position where you have to invest a dedicated time and effort just to maintain the operator codebase to adhere to those changes from time to time, as they come.

This is the reason why it is recommended to leave the decision of writing an operator to the application owners or at least get enough understanding of the future changes the application may face, before starting writing an operator yourself for that application.

In either case, it is better to make yourself liable for a piece of configuration that you can easily modify than to make yourself liable for an entire codebase of an operator. So, it is a wise decision to seek alternative ways that you can handle complexities in the lifecycle of an application rather than writing a piece of code that works as an operator and making yourself liable to it, especially if you do not own that application.

Resource Concerns: Operators are not exactly part of your actual workload

In certain cases, you may have to run your application in a resource critical cluster. When you have an operator to manage the lifecycle of this application, the operator itself will require a certain level of resources (cpu, memory, network) allocated towards it from the same cluster where you run the workload.

What we expect from an operator is to maintain the lifecycle of a given number of application instances, by reconciling their state to a desired state, as specified by the user through the custom resource. Therefore, operators are a part of your control plane rather than the workload itself. Now imagine a situation where you must provision a large number of application instances.

This means a few things,

  1. You will have to create an equal number of custom resource objects.
  2. The operator or operator instances will be running reconciliation loops to handle the state of all the application instances represented by each individual custom object. This could be a resource-intensive operation.
  3. You are using the Kubernetes etcd store to keep track of all of the custom objects and your operator will be communicating with Kubernetes API quite frequently.
  4. On all the other occasions where your custom objects are in the desired state, the operator will sit idle, but it still requires some resources from the cluster.

As you can see, having an operator to manage such a large number of custom objects could impact your cluster resource wise, in multiple ways.

Therefore, when you want to run your stateful application in a Kubernetes stack, writing an operator may sound appealing, but you must remember the impact it may have on your cluster resources, which are primarily meant for running your workload.

Security Concerns: Is it worth running an operator with elevated privileges for the duration of your application?

This is one of the reasons why writing an operator should be your last resort. An operator is a highly privileged entity compared to your actual application.

If we take a step back and consider what your operator does, all it does is reconcile your application instances to the desired state specified by the CR. To do so, it requires a certain level of privileges on your Kubernetes API. Depending on the type of resource objects the operator is supposed to manage, you can grant permissions on specific Kubernetes resources at either “namespace” scope or “cluster” scope. What we have seen in certain existing open-source operators is that they sometimes get deployed with more RBAC permissions on your cluster resources than they require.

Nevertheless, what is important here is that the operator runs with all those permissions on your Kubernetes cluster for the entire lifetime of your application, even though actual reconciliation of the custom resources happens only occasionally. The time during which the operator actually needs these permissions is therefore only a fraction of the time it is running.

Considering these aspects of an operator, if you are thinking of writing one that will mostly perform one-off tasks or tasks that happen infrequently (e.g., backup, restore), it could be worthwhile to analyze whether the Jobs or CronJobs available in the standard Kubernetes API would do, rather than investing all your effort in building something as complex as a Kubernetes operator.

Configuration Management: Operators should not be a solution to your configuration management problem

Something we have seen in most operator implementations is that they help end users manage application configuration through a well-structured object, the custom resource. A custom resource has a predefined schema, managed through a custom resource definition (CRD). The CRD is dedicated to the application, so validating the inputs a user can provide to configure the application state is more controlled and streamlined. The standard ways for a user to pass application configuration, a ConfigMap or a Kubernetes Secret, are more generic approaches.

A frequent implementation pattern we have seen is operators that are implemented just to make use of the structured configuration a CRD offers. These operators mainly aim to expose the application configuration via a CRD, so that there is control over the user inputs. They should be called controllers rather than operators, because they do not do anything specific to handle application state during the application lifecycle. While anyone is free to write a piece of software that handles the configuration of an application through a CRD, it may also be overkill, because much more generic tooling is available in the Kubernetes ecosystem to achieve the same thing. For example, someone using Helm to manage the deployment and lifecycle of an application can validate user-supplied values against a JSON Schema (a values.schema.json file in the chart) before they are mapped to a generic resource such as a ConfigMap, while the “--verify” flag additionally checks the provenance of the chart package itself.

The question we should really ask here is, “Is it worth writing an application-specific piece of software to manage the configuration, when there is much more generic tooling available to specifically address configuration-related issues of applications deployed on Kubernetes?”

These are the key areas that we think a DevOps engineer or a software developer should consider before starting to write a Kubernetes operator. We would like to end this article with the following note: “The fact that it is possible to write a Kubernetes operator as a solution to a given problem does not always mean you should write one.”

References:

https://kubernetes.io/docs/concepts/extend-kubernetes/operator/

https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/

https://thenewstack.io/kubernetes-when-to-use-and-when-to-avoid-the-operator-pattern/

https://sdk.operatorframework.io/docs/best-practices/best-practices/

Written by Nayanajith Chandradasa.


Introduction to Private 5G Networks

A Private 5G Network is an enterprise-dedicated network tailored to deliver the latest advancements of 4G LTE and 5G technology and drive the digital transformation of businesses and organisations across industries.

Private 5G provides secured communications with high bandwidth, low latency, and reliable coverage to connect people, machines, and devices. Private 5G solutions are ideal for IoT-intensive applications like intelligent manufacturing and sensitive environments like ports and banks.

Private 5G works more securely and efficiently than public 5G and LTE networks: it provides connections to authorised users only within the enterprise and processes the generated data locally, in isolation from the public network. It is also easy to deploy, operate, and scale to meet operational needs.

Spectrum, Coverage and Speed

The range of a private 5G network can vary from a few thousand square feet to thousands of square kilometres, depending on the power of the radio transmitter, the band used, and the user's requirements. A typical 5G radio that operates on low, middle, and high bands provides the following frequency ranges:

Deployment Scenarios

The 5G system is disaggregated into independent components known as "network functions" (NF) that communicate through a standard API. These NFs are accountable for the operation of mobile networks, including authentication, IP address allocation, policy control, and user data management. Service-Based Architecture makes the 5G system very flexible and able to add new services and applications to meet the needs of any industry.

5G Architecture

Figure 1 - 5G Architecture

Control and User Plane Separation, adopted in the 5G architecture, allows operators to separate the 5G control function from the data forwarding function. For example, the control plane can be deployed centrally, whereas the user plane function (UPF) can be deployed flexibly at any location within the network to accommodate the various data processing requirements.

The private 5G network architecture can be deployed in different scenarios to meet each customer's needs.  We can categorise the deployment based on the level of isolation and integration with the public network into three scenarios as follows:

Figure 2 - Isolated Private 5G
Figure 3 - Shared Private 5G Network

1 - Isolated Private 5G Network

Enterprise hosts and operates the 5G network (complete set: gNB, UPF, 5GC CP, UDM, MEC), where the network is physically isolated from the public network. Despite the high cost associated with deploying this scenario, it guarantees complete data security, reduces the likelihood of a data breach, and provides ultra-low latency connections.

2 - Shared Private 5G Network

A shared private 5G network scenario uses an operator's public network to reduce installation costs. Based on its business needs, the customer can choose which components to host and manage locally and which elements to share with the mobile carrier.

MEC and UPF may be deployed on-site at premises such as smart factories, stadiums, and cinemas, enabling a private 5G architecture with minimal latency and future changes. In addition, the business owner can control the radio access network (RAN) locally to allow quick and reliable connections.

3 - Private 5G Over Slicing

Depending on the application’s requirements, a Radio Access Network (RAN) may be installed on-site and connected to the public network via a dedicated data slice that provides private 5G service.

Private 5G Use Cases

Smart ports and manufacturing facilities:
Connect autonomous cranes, robots, drones, legacy machines, or edge gateways within facilities to achieve industrial network service level metrics.

Enable security applications:
Connect video cameras, fingerprint scanners, face detection, and automatic licence plate readers to private networks with guaranteed and dedicated bandwidth.

Enable smart operations:
Intelligent and secured operations are enabled using various privately tailored applications such as geofencing, digital twins, mobility prediction, and task scheduling.

Zinkworks and Private 5G

Zinkworks provides a Networked OT Orchestration platform purpose-built for Industry 4.0 and private 5G. Customers can use various automation solutions and ML-based prediction models developed by Zinkworks to monitor the network's performance and manage network resources more efficiently and securely. In addition, customers can create policy and service profiles with customised bandwidth, latency, and quality of service (QoS) to meet every application's needs.

Written by Mohamed Ibrahim.


RxJava Reactive Streams

Introduction to Reactive Streams

Before we talk about what RxJava is and fully understand it, we must first comprehend some of the concepts and principles behind the creation of the API. RxJava is in fact just one part of a broader project called ReactiveX, which applies the concepts explained here not only to Java but also to other platforms such as Python, Go, Groovy, C#, and many others. It is worth mentioning that ReactiveX is not the only project to implement these ideas. The Spring ecosystem also has its own implementation, called Spring WebFlux (built on Project Reactor).

Reactive Streams is an initiative to provide a standard for asynchronous stream processing with non-blocking backpressure.

— reactive-streams.org

RxJava and WebFlux are both implementations of Reactive Streams. But what exactly does the statement above mean? A traditional method call normally blocks until the method has finished whatever it needs to do. If the method only performs mathematical calculations or checks some logic on its arguments, the non-blocking nature of asynchronous stream processing will not make much difference. But once we start talking about accessing the file system, saving a file to some device, reading information from a service, or communicating with a remote microservice, things start to get interesting.

A scenario particularly interesting for Reactive Streams is a microservices environment, such as a cloud environment. In such architectures many services talk to each other, and every time this communication takes place the service that initiated it needs to wait for some time before it can take further action. On top of that, the agent providing the service usually responds not to a single client but to many. It is in this sequence of events that Reactive Streams excel!

Reactive Streams implementations typically solve this problem with something called an Event Loop. When a new request arrives, the thread handling it does not get blocked: as soon as it has dispatched the request, it moves on to something else instead of waiting. Only when the request completes does the Event Loop add a new event to the queue, which is then processed by the next available thread. This means no wasted resources: every thread's time is used to the fullest.

Fig. 1 – Reactive Event Loop

The non-reactive approach needs a new thread for every new request, which means that with too many simultaneous requests you can end up with many threads just sitting there waiting, doing nothing and consuming resources.
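The event-loop idea can be illustrated with a deliberately tiny, single-threaded sketch. It is not how RxJava or any particular framework is implemented: the EventLoopSketch class and its method names are hypothetical, and the slow call is only simulated by queueing its completion as a later event. The point it shows is that one thread can start request B before request A has finished, because nothing ever blocks.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class EventLoopSketch {
    private final Queue<Runnable> events = new ArrayDeque<>();
    private final List<String> log = new ArrayList<>();

    /** Adds an event to the back of the queue. */
    public void submit(Runnable event) {
        events.add(event);
    }

    /**
     * Starts handling a request without waiting for its result.
     * The "response" of the simulated slow call arrives as a later event.
     */
    public void handleRequest(String name) {
        log.add("start " + name);
        submit(() -> log.add("finish " + name));
    }

    /** Drains the queue on the calling thread, one event at a time. */
    public List<String> run() {
        Runnable next;
        while ((next = events.poll()) != null) {
            next.run();
        }
        return log;
    }

    public static void main(String[] args) {
        EventLoopSketch loop = new EventLoopSketch();
        loop.submit(() -> loop.handleRequest("A"));
        loop.submit(() -> loop.handleRequest("B"));
        // prints [start A, start B, finish A, finish B]
        System.out.println(loop.run());
    }
}
```

In a thread-per-request model the same two requests would occupy two threads, each idle while its call is in flight.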

Last but not least, reactive streams must support backpressure. This means that the receiver (Subscriber) of a reactive stream can control the number of events it is asked to process. This is useful in cases where the sender (Publisher) produces more events than the receiver can handle: backpressure is the mechanism that lets the sender slow down event generation so that the receiver can process the events properly.
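To make backpressure concrete, here is a minimal sketch that uses the JDK's own java.util.concurrent.Flow interfaces (available since Java 9, and corresponding to the Reactive Streams interfaces) rather than RxJava, so that it runs without extra dependencies. RangePublisher and BatchSubscriber are hypothetical names, everything runs synchronously on one thread, and the publisher skips the reentrancy and error-signalling rules a production implementation has to follow. It only shows the core contract: the publisher never emits more items than the subscriber has requested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Flow;

public class BackpressureSketch {

    /** Emits 1..count, but only as many items as the subscriber has requested. */
    static class RangePublisher implements Flow.Publisher<Integer> {
        private final int count;

        RangePublisher(int count) { this.count = count; }

        @Override
        public void subscribe(Flow.Subscriber<? super Integer> subscriber) {
            subscriber.onSubscribe(new Flow.Subscription() {
                private int next = 1;
                private boolean done;

                @Override
                public void request(long n) {
                    for (long i = 0; i < n && next <= count && !done; i++) {
                        subscriber.onNext(next++);
                    }
                    if (next > count && !done) {
                        done = true;
                        subscriber.onComplete();
                    }
                }

                @Override
                public void cancel() { done = true; }
            });
        }
    }

    /** Asks for a fixed-size batch, and for the next batch only once the current one is processed. */
    static class BatchSubscriber implements Flow.Subscriber<Integer> {
        final List<String> log = new ArrayList<>();
        private final int batchSize;
        private Flow.Subscription subscription;
        private int receivedInBatch;

        BatchSubscriber(int batchSize) { this.batchSize = batchSize; }

        @Override public void onSubscribe(Flow.Subscription s) {
            subscription = s;
            requestBatch();
        }

        @Override public void onNext(Integer item) {
            log.add("onNext " + item);
            if (++receivedInBatch == batchSize) {
                receivedInBatch = 0;
                requestBatch();
            }
        }

        @Override public void onError(Throwable t) { log.add("onError " + t); }
        @Override public void onComplete() { log.add("onComplete"); }

        private void requestBatch() {
            log.add("request " + batchSize);
            subscription.request(batchSize);
        }
    }

    static List<String> run(int count, int batchSize) {
        BatchSubscriber subscriber = new BatchSubscriber(batchSize);
        new RangePublisher(count).subscribe(subscriber);
        return subscriber.log;
    }

    public static void main(String[] args) {
        run(5, 2).forEach(System.out::println);
    }
}
```

With five items and a batch size of two, the log alternates between a "request 2" line and at most two "onNext" lines, and ends with "onComplete".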

Reactive Streams can be considered an evolution of the well-known observer pattern combined with the functional paradigm, which together make for a very powerful API. This API allows the creation of a chain of methods, bringing a very declarative style of programming as well as abstracting away low-level threading, synchronization, thread-safety, concurrent data structures, etc.

Reactive Reference Implementation

As mentioned, it is not only the ReactiveX project, more specifically RxJava, that implements the Reactive Streams standards which means that you are going to find similar structures and elements in different projects although using distinct names depending on each project.

At a very high level, every Reactive Streams implementation has a Publisher, the entity that produces the data to be consumed by a Subscriber. Another important architectural element is the Subscription. The Subscription represents the control link between a Publisher and a Subscriber: it gives the Subscriber the capability to tell the Publisher how much data it can handle. In other words, it is the entity that makes backpressure possible. In addition, between Publisher and Subscriber we normally have a chain of functions, known as the function chain. It is through this chain that all sorts of operations are applied to the streams, such as Map, Filter, FlatMap, and many more.
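The roles described above (Publisher, function chain, Subscriber) can be sketched with hand-written Filter and Map steps. As an assumption made to keep the example dependency-free, it is built on the JDK's java.util.concurrent.Flow interfaces instead of RxJava, and fromList, filter, map and collect are hypothetical helpers, not library methods. Real libraries provide these operators ready-made and handle threading and error cases that this synchronous sketch ignores.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Flow;
import java.util.function.Function;
import java.util.function.Predicate;

public class FunctionChainSketch {

    /** Publisher: emits the list's items on demand, then completes. */
    static <T> Flow.Publisher<T> fromList(List<T> items) {
        return subscriber -> subscriber.onSubscribe(new Flow.Subscription() {
            private final Iterator<T> it = items.iterator();
            private boolean done;

            @Override public void request(long n) {
                for (long i = 0; i < n && !done && it.hasNext(); i++) {
                    subscriber.onNext(it.next());
                }
                if (!done && !it.hasNext()) {
                    done = true;
                    subscriber.onComplete();
                }
            }

            @Override public void cancel() { done = true; }
        });
    }

    /** Map step: transforms each item on its way downstream. */
    static <T, R> Flow.Publisher<R> map(Flow.Publisher<T> source, Function<T, R> fn) {
        return downstream -> source.subscribe(new Flow.Subscriber<T>() {
            @Override public void onSubscribe(Flow.Subscription s) { downstream.onSubscribe(s); }
            @Override public void onNext(T item) { downstream.onNext(fn.apply(item)); }
            @Override public void onError(Throwable t) { downstream.onError(t); }
            @Override public void onComplete() { downstream.onComplete(); }
        });
    }

    /** Filter step: passes matching items on and requests a replacement for each dropped one. */
    static <T> Flow.Publisher<T> filter(Flow.Publisher<T> source, Predicate<T> keep) {
        return downstream -> source.subscribe(new Flow.Subscriber<T>() {
            private Flow.Subscription upstream;

            @Override public void onSubscribe(Flow.Subscription s) {
                upstream = s;
                downstream.onSubscribe(s);
            }
            @Override public void onNext(T item) {
                if (keep.test(item)) {
                    downstream.onNext(item);
                } else {
                    upstream.request(1);
                }
            }
            @Override public void onError(Throwable t) { downstream.onError(t); }
            @Override public void onComplete() { downstream.onComplete(); }
        });
    }

    /** Subscriber: requests everything and collects it (works because this sketch is synchronous). */
    static <T> List<T> collect(Flow.Publisher<T> source) {
        List<T> out = new ArrayList<>();
        source.subscribe(new Flow.Subscriber<T>() {
            @Override public void onSubscribe(Flow.Subscription s) { s.request(Long.MAX_VALUE); }
            @Override public void onNext(T item) { out.add(item); }
            @Override public void onError(Throwable t) { throw new IllegalStateException(t); }
            @Override public void onComplete() { }
        });
        return out;
    }

    public static void main(String[] args) {
        Flow.Publisher<Integer> numbers = fromList(List.of(1, 2, 3, 4, 5, 6));
        Flow.Publisher<Integer> evens = filter(numbers, n -> n % 2 == 0);
        Flow.Publisher<String> labels = map(evens, n -> "item-" + n);
        // prints [item-2, item-4, item-6]
        System.out.println(collect(labels));
    }
}
```

Each step wraps the Publisher before it and returns a new Publisher, which is why operators can be chained in any order and nothing happens until a Subscriber subscribes at the end.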

Fig. 2 – Reactive Streams Base Classes

Keep in mind that Reactive Streams is based on the Functional Paradigm, and therefore knowledge of concepts such as immutability, pure functions, higher-order functions, etc., is essential to fully understand RxJava and use its API properly.

Some RxJava Code at Last

I know there is a lot of information to absorb before the first lines of code, but trust me, what I presented before will save you from a lot of trouble when developing with a Reactive Functional Programming API such as RxJava.

Hello World

package rxjava.examples;

import io.reactivex.rxjava3.core.*;

public class HelloWorld {
    public static void main(String[] args) {
        Flowable.just("Hello world").subscribe(System.out::println);
    }
}

This simple hello-world code might seem odd to someone used to working only with traditional object-oriented programming (OOP), but now that we have set the scene for Reactive Functional Programming in the previous sections, it will be much easier to understand what is going on here.

The first thing to note is the use of the Flowable class. It is important to remember that here everything is a flow of data, or stream, and the Flowable class represents exactly that. Even to print a single String you need to provide it through a stream. For such cases, Flowable offers the just method, which returns a Flowable that emits just one item. You can think of Flowable as the Publisher mentioned before. This means that a Subscriber needs to subscribe to it in order to read what is coming from the stream. Here the subscriber simply prints out whatever comes from the stream.

This API has much more power and flexibility than shown in this simple hello-world example, and it is when dealing with millions of data items that the Reactive Streams approach really shines.

Of course, there is a lot more to say about RxJava; I have barely scratched the surface here. Apart from the Flowable and Observable base classes there are also the Single, Completable and Maybe base classes, which deal with more specific situations that I have not touched on in this article.

Talking about everything RxJava is able to do would take many more pages, not a simple article like this one. The goal here is to just give a high-level overview of RxJava, the main concepts behind any Reactive Streams application as well as about Reactive Functional Programming paradigm.

Final Thoughts

I hope to explore the RxJava API further, but this article explains the basics of any Reactive Streams implementation, which should enable the reader to quickly understand any of them.

Also, this article does not present examples of how powerful the Reactive Streams standard is compared with the traditional blocking approach. To give readers an insight into its power, I conducted a small experiment in which I implemented a very simple REST service using Reactive Streams and compared it with a traditional blocking one, and the result was pretty impressive.

To close, I will leave readers to draw their own conclusions from the result graphs of this experiment:

Fig. 3 – Traditional Blocking API Results

Fig. 4 – Reactive Streams API Results

Written by Berchris Requiao.