KeyCloak High Availability

Previously, I discussed how I used KeyCloak in my application and how I integrated it into the DevOps process. As this setup moved towards UAT, I realized that running more than one KeyCloak instance does not work without extra configuration: logins behave erratically until the instances are clustered correctly. Let's look at the details.

For KeyCloak deployed on Kubernetes, you cannot simply set the replicaCount of the deployment to a value greater than 1 and expect it to work out of the box. As I delved deeper into the KeyCloak documentation, it became clear how mature KeyCloak is and how much thought has gone into its design.

When you have multiple containers running KeyCloak, they obviously have to connect to the same database. If there were no cache or in-memory state, that alone would be enough. But as I came to know, KeyCloak uses Infinispan, an open source in-memory distributed data store, as its cache. KeyCloak stores things like client and user sessions in this cache. Unless the cache is configured for a multi-container setup, each container keeps its own private copy, and you will see strange behaviour such as redirects not working. Successive requests from the same login flow hit different containers, none of which knows about the earlier requests or the data cached for them, so you end up seeing the login screen again and again.

Infinispan uses a library called JGroups to discover the other nodes in the cluster, and discovery is based on one of several "ping" protocols. There are many to choose from, such as DNS_PING, JDBC_PING and KUBE_PING. After some trial and error, I found that KUBE_PING worked well for me.

Here's how the deployment file changed, along with the other k8s objects that needed to be created.

Add a service account for the deployment, with permissions to get and list pods in the namespace. This is required so that the cluster can be set up by discovering the other pods.
Set up the ping protocol and other parameters such as the cache owner count, which is the number of nodes that hold a copy of each cache entry.

spec:
      serviceAccountName: jgroups-kubeping-service-account
      ...
      containers:
      - name: {{ .Chart.Name }}
        image: "{{ .Values.global.containerRegistry.url }}/{{ .Values.image.repository }}:{{ template "image.version" . }}"
        imagePullPolicy: {{ .Values.image.pullPolicy }}
        securityContext:
          allowPrivilegeEscalation: false
          privileged: false
        env:
          - name: PROXY_ADDRESS_FORWARDING
            value: "true"
          - name: DB_VENDOR
            value: "mssql"
          - name: DB_ADDR
            value: server_name
          - name: DB_DATABASE
            value: database_name
          - name: DB_USER
            value: user
          - name: DB_PASSWORD
            value: password
          ...
          - name: JGROUPS_DISCOVERY_PROTOCOL
            value: kubernetes.KUBE_PING
          - name: CACHE_OWNERS_COUNT
            value: "2"
          - name: KUBERNETES_NAMESPACE
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.namespace
        ports:
        - name: http
          containerPort: {{ .Values.containerPort }}
          protocol: TCP

As you can see, both the service account and the ping protocol have been added in the deployment.yaml file.
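
KUBE_PING finds its peers by listing the pods in the namespace given by KUBERNETES_NAMESPACE. If that namespace also runs other workloads, the list can be narrowed with a label selector. The fragment below is a minimal sketch of that idea, assuming your KeyCloak image honours the KUBERNETES_LABELS environment variable read by the jgroups-kubernetes KUBE_PING protocol, and that the KeyCloak pods carry the label app=keycloak. The label is a hypothetical example and is not taken from the chart shown here.

```yaml
        env:
          # Existing discovery settings from the deployment above
          - name: JGROUPS_DISCOVERY_PROTOCOL
            value: kubernetes.KUBE_PING
          - name: KUBERNETES_NAMESPACE
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.namespace
          # Assumption: only pods matching this selector are treated as cluster members.
          # "app=keycloak" is a placeholder; use the labels from your own pod template.
          - name: KUBERNETES_LABELS
            value: "app=keycloak"
```

If the selector does not match the labels on the KeyCloak pods, no peers are found and each pod forms a cluster of one, so check it against the pod template before rolling it out.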

Here are the service account, the cluster role and the cluster role binding:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: jgroups-kubeping-service-account
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: jgroups-kubeping-pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: jgroups-kubeping-api-access
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: jgroups-kubeping-pod-reader
subjects:
- kind: ServiceAccount
  name: jgroups-kubeping-service-account
  namespace: {{ .Release.Namespace }}
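
The ClusterRole above lets the service account read pods in every namespace. Since discovery only looks at the namespace set in KUBERNETES_NAMESPACE, a namespaced Role and RoleBinding should be sufficient. The following is a sketch of that alternative, assuming all KeyCloak pods run in the release namespace; it reuses the names from the manifests above and is not what I deployed.

```yaml
# Namespaced alternative to the ClusterRole / ClusterRoleBinding pair:
# grants get and list on pods only inside the release namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: jgroups-kubeping-pod-reader
  namespace: {{ .Release.Namespace }}
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: jgroups-kubeping-api-access
  namespace: {{ .Release.Namespace }}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: jgroups-kubeping-pod-reader
subjects:
- kind: ServiceAccount
  name: jgroups-kubeping-service-account
  namespace: {{ .Release.Namespace }}
```

This variant also avoids creating cluster-scoped objects, which helps when the person installing the chart only has rights within a single namespace.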

As you start your deployment, monitor the logs of the pods. You should see that each pod finds and connects to the other instances of the deployment.

Reference: https://www.keycloak.org/2019/05/keycloak-cluster-setup.html