Previously, I discussed how I used KeyCloak in my application and how I integrated it into the DevOps process. As this setup moved towards UAT, I realized that high availability doesn't work without some configuration, and that it causes strange login issues unless set up correctly. Let's look at the details.
So, for KeyCloak deployed on Kubernetes, you cannot simply set the deployment's replicaCount to a value greater than 1 and expect it to work out of the box. As I delved deeper into the KeyCloak documentation, it became clear how mature KeyCloak is, and how much thought has gone into its design.
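For context, the naive attempt looks harmless enough. A minimal sketch, assuming a standard Helm chart where the deployment reads its replica count from values.yaml (the key names here are illustrative):

# values.yaml -- the "obvious" change, which is not enough on its own
replicaCount: 2

# deployment.yaml -- where the value is consumed
spec:
  replicas: {{ .Values.replicaCount }}

With only this change, two independent KeyCloak pods come up, each with its own local cache.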
When you have multiple containers running KeyCloak, they obviously have to connect to the same database. If there were no cache or in-memory state, this would have worked OOTB. But as I came to learn, KeyCloak uses Infinispan, an open-source in-memory distributed data store, to cache things like client and user sessions. Unless this cache is configured properly for a multi-container setup, you will see strange behaviour such as broken redirects: successive calls hit different containers, each container is unaware of the previous calls and their cached data, and you end up seeing the login screen again and again.
Infinispan uses something called JGroups to discover nodes, and discovery is based on various "ping" protocols. There are several, such as DNS_PING, JDBC_PING and KUBE_PING, and through trial and error I found that KUBE_PING worked well for me.
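As an aside, if your cluster's policies make API-based pod discovery difficult, DNS_PING is a workable alternative. A rough sketch, assuming the jboss/keycloak image's discovery variables and a headless service named keycloak-headless (the names and port here are illustrative, not from my setup):

# Headless service: DNS returns one record per pod
apiVersion: v1
kind: Service
metadata:
  name: keycloak-headless
spec:
  clusterIP: None
  selector:
    app: keycloak
  ports:
    - name: jgroups
      port: 7600

# In the container env, instead of KUBE_PING:
- name: JGROUPS_DISCOVERY_PROTOCOL
  value: dns.DNS_PING
- name: JGROUPS_DISCOVERY_PROPERTIES
  value: "dns_query=keycloak-headless.my-namespace.svc.cluster.local"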
Here's how the deployment file changed, along with the other k8s objects that needed to be created.
- Add a service account for the deployment, with specific permissions to get and list pods in the namespace. This is required so that the cluster can be set up by discovering the other pods.
- Set up the ping protocol and other parameters, such as the cache owner count.
spec:
  serviceAccountName: jgroups-kubeping-service-account
  ...
  containers:
    - name: {{ .Chart.Name }}
      image: "{{ .Values.global.containerRegistry.url }}/{{ .Values.image.repository }}:{{ template "image.version" . }}"
      imagePullPolicy: {{ .Values.image.pullPolicy }}
      securityContext:
        allowPrivilegeEscalation: false
        privileged: false
      env:
        - name: PROXY_ADDRESS_FORWARDING
          value: "true"
        - name: DB_VENDOR
          value: "mssql"
        - name: DB_ADDR
          value: server_name
        - name: DB_DATABASE
          value: database_name
        - name: DB_USER
          value: user
        - name: DB_PASSWORD
          value: password
        ...
        # Discover cluster peers by querying the Kubernetes API for pods
        - name: JGROUPS_DISCOVERY_PROTOCOL
          value: kubernetes.KUBE_PING
        # Number of nodes that keep a copy of each cache entry
        - name: CACHE_OWNERS_COUNT
          value: "2"
        # Namespace to scan for peers, injected via the downward API
        - name: KUBERNETES_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
      ports:
        - name: http
          containerPort: {{ .Values.containerPort }}
          protocol: TCP
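One caveat about the snippet above: the database credentials appear inline purely for illustration. In practice you would source them from a Secret instead; a minimal sketch, assuming a Secret named keycloak-db with a password key (hypothetical names):

# Replaces the literal DB_PASSWORD value above
- name: DB_PASSWORD
  valueFrom:
    secretKeyRef:
      name: keycloak-db
      key: password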
As you can see, both the service account and the ping protocol have been added to the deployment.yaml file. The CACHE_OWNERS_COUNT setting tells Infinispan how many nodes keep a copy of each cache entry; with 2 owners, a session survives the loss of any single pod.
Here are the service account, the cluster role and the cluster role binding:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: jgroups-kubeping-service-account
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: jgroups-kubeping-pod-reader
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: jgroups-kubeping-api-access
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: jgroups-kubeping-pod-reader
subjects:
  - kind: ServiceAccount
    name: jgroups-kubeping-service-account
    namespace: {{ .Release.Namespace }}
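Since KUBE_PING only needs to list pods in its own namespace, a namespaced Role and RoleBinding should also suffice if you'd rather not grant cluster-wide access; a sketch under that assumption:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: jgroups-kubeping-pod-reader
  namespace: {{ .Release.Namespace }}
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: jgroups-kubeping-api-access
  namespace: {{ .Release.Namespace }}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: jgroups-kubeping-pod-reader
subjects:
  - kind: ServiceAccount
    name: jgroups-kubeping-service-account
    namespace: {{ .Release.Namespace }}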
As you start your deployment, monitor the logs of the pods. You should see the instances discover and connect to each other, with JGroups logging an updated cluster view as each member joins.