Rook (Ceph)
Summary
There are several different approaches to providing resilient ephemeral storage for a Kubernetes cluster. Some are fairly simple, like `hostPath`, but place limitations on Pod scheduling. Others are more involved but offer abstraction and RAID-like functions.
Rook is an overlay to Red Hat's Ceph and provides distributed storage across physical nodes and disks. These are presented to the k8s cluster as PVs that Pods can claim and use.
The main benefit of Rook and Ceph is that the storage is self-healing and new disks are automatically included into the Ceph Cluster.
The `build-cluster-foundation` Ansible playbook will install the Ceph Operator (`rook-ceph`) and the Ceph Cluster (`rook-ceph-cluster`) used to hold ephemeral storage PVs.
It is also possible to reset the Ceph deployment (and destroy all associated data) by running the `build-cluster` Ansible playbook with the variable `reset_cluster = True`. Note, this will destroy the entire cluster as well as the Ceph deployments and data-stores.
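An illustrative invocation of that reset; the playbook file name and inventory path are assumptions, so check this repository for the actual names:

```shell
# Hypothetical file/inventory names - adjust to match this repository.
# DESTRUCTIVE: tears down the cluster plus all Ceph deployments and data-stores.
ansible-playbook -i inventory/hosts build-cluster.yml -e reset_cluster=True
```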
Deployment
There are two parts to the deployment in the cluster:
- Ceph Operator (`rook-ceph`)
- Ceph Cluster (`rook-ceph-cluster`)
The Ceph Cluster will automatically look for nodes that have an unformatted `/dev/sdb` device and use it. There must be at least three nodes with such a device present for the Ceph Cluster to come into service.
If you only want specific nodes to be used by Rook, you must include a `placement` values section that uses node affinity and tolerations to select which nodes are candidates for block storage.
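A sketch of what such a `placement` section might look like in the `rook-ceph-cluster` Helm values, assuming candidate nodes carry a hypothetical `storage-node=true` label (the label key and any matching taint are illustrative, not part of this project):

```yaml
cephClusterSpec:
  placement:
    all:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: storage-node      # hypothetical label applied to candidate nodes
                  operator: In
                  values: ["true"]
      tolerations:
        - key: storage-node              # tolerate a matching taint, if one is used
          operator: Exists
```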
Note, if you change anything about a `CephCluster` CRD, such as its name or block storage devices, things can get confusing very quickly and only some of the Ceph components will come up. See the uninstall notes for more details on cleaning up an existing Ceph Cluster before attempting to re-deploy one.
Block Storage (sdb)
In this project, Ceph is configured for block-level storage using a secondary disk on each k8s node, which must be unformatted (no partitions) and presented as `/dev/sdb` by the host node operating system.
The `build-cluster` Ansible playbook will prepare those nodes that have a `/dev/sdb` device ahead of deploying Ceph. It assumes any device presented as `/dev/sdb` can be formatted and used for this purpose.
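The manual equivalent of that preparation might look like the following; this is a destructive sketch and the exact steps taken by the playbook may differ:

```shell
# Confirm the device exists and has no filesystem or partition signatures
lsblk -f /dev/sdb

# Wipe any existing signatures so Ceph will claim the disk (DESTROYS DATA)
sudo wipefs --all /dev/sdb
sudo sgdisk --zap-all /dev/sdb   # requires the gdisk package
```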
StorageClass - ceph-block
Deployment of `rook-ceph-cluster` will create three k8s `storageclass` types: `ceph-block`, `ceph-bucket` and `ceph-filesystem`, with `ceph-block` being the k8s default `storageclass`.
These will be used by any other application deployment that needs ephemeral storage.
For example, Grafana Loki log storage PVs…
```
kubectl -n rook-ceph-cluster get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                          STORAGECLASS   AGE
pvc-00ae6c79-101b-441c-a182-1b49fefcc370   10Gi       RWO            Delete           Bound    monitoring/data-loki-read-0    ceph-block     136m
pvc-2317fd97-f19b-44cd-be28-f4f2c57b0d23   10Gi       RWO            Delete           Bound    monitoring/data-loki-write-2   ceph-block     136m
pvc-31ae8277-de95-447d-a7b6-195b54e678e7   10Gi       RWO            Delete           Bound    monitoring/data-loki-read-1    ceph-block     136m
pvc-3ece7f87-9a74-4b71-a4ac-d4b06b8287cc   10Gi       RWO            Delete           Bound    monitoring/data-loki-read-2    ceph-block     136m
pvc-513621af-1f7d-403d-9afe-5db5deabefce   10Gi       RWO            Delete           Bound    monitoring/data-loki-write-1   ceph-block     136m
pvc-94ee5ea9-fb0d-420f-b84a-ca99e942f38d   10Gi       RWO            Delete           Bound    monitoring/data-loki-write-0   ceph-block     136m
```
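Each PV above was provisioned to satisfy a PVC created by Loki's `volumeClaimTemplate`. Since `ceph-block` is the default `storageclass`, any claim that omits `storageClassName` would also land on Ceph. A minimal standalone claim for illustration (the name and size here are made up):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-claim           # illustrative name
  namespace: monitoring
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ceph-block   # may be omitted, as ceph-block is the default
  resources:
    requests:
      storage: 10Gi
```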
There are several Loki Pods that would be stuck in "Pending" state without the `ceph-block` storageclass (or some other default storageclass) being available.
```
pod/loki-read-0    0/1   Running   0   150m
pod/loki-read-1    1/1   Running   0   150m
pod/loki-read-2    1/1   Running   0   150m
pod/loki-write-0   1/1   Running   0   150m
pod/loki-write-1   1/1   Running   0   150m
pod/loki-write-2   1/1   Running   0   150m
```
Example…
```
kubectl -n monitoring describe pod loki-write-2
Name:             loki-write-2
Namespace:        monitoring
Priority:         0
Service Account:  loki
...
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-loki-write-2
    ReadOnly:   false
...
```
The `data-loki-write-2` PVC is part of the `rook-ceph-cluster`.
```
kubectl -n rook-ceph-cluster describe pv pvc-2317fd97-f19b-44cd-be28-f4f2c57b0d23
Name:            pvc-2317fd97-f19b-44cd-be28-f4f2c57b0d23
Labels:          <none>
Annotations:     pv.kubernetes.io/provisioned-by: rook-ceph.rbd.csi.ceph.com
                 volume.kubernetes.io/provisioner-deletion-secret-name: rook-csi-rbd-provisioner
                 volume.kubernetes.io/provisioner-deletion-secret-namespace: rook-ceph-cluster
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    ceph-block
Status:          Bound
Claim:           monitoring/data-loki-write-2
Reclaim Policy:  Delete
Access Modes:    RWO
VolumeMode:      Filesystem
Capacity:        10Gi
Node Affinity:   <none>
Message:
Source:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            rook-ceph.rbd.csi.ceph.com
    FSType:            ext4
    VolumeHandle:      0001-0011-rook-ceph-cluster-0000000000000001-7ad8698b-75ac-4273-99ab-ea8e48dd8de6
    ReadOnly:          false
    VolumeAttributes:  clusterID=rook-ceph-cluster
                       imageFeatures=layering
                       imageFormat=2
                       imageName=csi-vol-7ad8698b-75ac-4273-99ab-ea8e48dd8de6
                       journalPool=ceph-blockpool
                       pool=ceph-blockpool
                       storage.kubernetes.io/csiProvisionerIdentity=1681419256637-8081-rook-ceph.rbd.csi.ceph.com
Events:          <none>
```
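The `pool` and `imageName` volume attributes map directly to an RBD image inside Ceph, which can be inspected from the Rook toolbox. This assumes the `rook-ceph-tools` toolbox deployment is enabled in this namespace, which may not be the case in this project:

```shell
# Inspect the backing RBD image for the PV shown above
# (assumes the rook-ceph-tools deployment exists in rook-ceph-cluster)
kubectl -n rook-ceph-cluster exec deploy/rook-ceph-tools -- \
  rbd -p ceph-blockpool info csi-vol-7ad8698b-75ac-4273-99ab-ea8e48dd8de6
```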
Ceph Mgmt Console
From the Home Network, you can reach the Ceph Cluster dashboard at http://ceph.cluster.home/dashboard.
TBD - Mgr Console expects HTTPS only… despite flags in chart. NGINX ingress seems to not recognise upstream HTTPS even with annotations and ignores the ingress rule and gives back 404.
Login credentials
```
kubectl -n rook-ceph-cluster get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode && echo
```
The Ceph Cluster (`rook-ceph-cluster`) mgmt dashboard is created once (by the first Rook object-store!) and can be seen in the example below running in HTTP mode.
```
kubectl -n rook-ceph-cluster get service rook-ceph-mgr-dashboard
NAME                      TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
rook-ceph-mgr-dashboard   ClusterIP   10.98.93.6   <none>        8448/TCP   60m
```
This service is usually backed by two mgr Pods (e.g. `rook-ceph-mgr-a-fc5cc7cc6-bktfv` and `rook-ceph-mgr-b-776f8b7dc5-t2l9v`).
The Ceph Cluster (`rook-ceph-cluster`) mgmt dashboard is available outside the cluster via k8s ingress (nginx-ingress):
```
kubectl -n rook-ceph-cluster describe ingress
Name:             rook-ceph-cluster-dashboard
Labels:           app.kubernetes.io/managed-by=Helm
Namespace:        rook-ceph-cluster
Address:          192.168.57.200
Ingress Class:    nginx
Default backend:  <default>
Rules:
  Host               Path  Backends
  ----               ----  --------
  ceph.cluster.home
                     /dashboard(/|$)(.*)   rook-ceph-mgr-dashboard:http-dashboard (10.244.122.67:8448)
Annotations:         meta.helm.sh/release-name: rook-ceph-cluster
                     meta.helm.sh/release-namespace: rook-ceph-cluster
Events:
  Type    Reason          Age               From                      Message
  ----    ------          ----              ----                      -------
  Normal  AddedOrUpdated  3s (x6 over 53m)  nginx-ingress-controller  Configuration for rook-ceph-cluster/rook-ceph-cluster-dashboard was added or updated
```
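For reference, a sketch of the `ingress.dashboard` section of the `rook-ceph-cluster` Helm values that could render a rule like the one above. The rewrite annotation is an assumption inferred from the regex path: it is the community ingress-nginx convention, and a different annotation set would be needed for the NGINX Inc controller:

```yaml
ingress:
  dashboard:
    ingressClassName: nginx
    host:
      name: ceph.cluster.home
      path: /dashboard(/|$)(.*)
    annotations:
      # assumption: community ingress-nginx rewrite annotation
      nginx.ingress.kubernetes.io/rewrite-target: /$2
```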
Troubleshooting
No Dashboard / Mgr Pod Present
Problem: Ceph Cluster seems to have some but not all components, specifically the Management Dashboard is missing.
Solution: The Ceph documentation notes: "IMPORTANT: Please note the dashboard will only be enabled for the first Ceph object store created by Rook." If you change the Ceph Cluster deployment (such as its name or namespace) then some, but not all, components may be missing. In particular, the Management Dashboard is only launched once, by the first object-store. To completely start over, see the uninstall notes.
Ceph Dashboard SSL
It seems that whatever you set in the Ceph Cluster chart (`ssl: false`), the underlying Ceph Manager that provides the dashboard still expects to come up with SSL support.
Example log entry in Ceph Manager pod…
```
debug 2023-04-12T20:37:02.120+0000 7f6b66efa700  0 [dashboard INFO root] server: ssl=yes host=:: port=8443
debug 2023-04-12T20:37:02.124+0000 7f6b66efa700  0 [dashboard INFO root] Config not ready to serve, waiting: no certificate configured
```
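As an untested workaround sketch, it may be possible to force the dashboard module onto plain HTTP at the Ceph level rather than through the chart. This assumes the `rook-ceph-tools` toolbox deployment is available in the cluster namespace:

```shell
# Untested: turn SSL off for the dashboard module directly in Ceph
kubectl -n rook-ceph-cluster exec deploy/rook-ceph-tools -- \
  ceph config set mgr mgr/dashboard/ssl false

# Restart the dashboard module so the setting takes effect
kubectl -n rook-ceph-cluster exec deploy/rook-ceph-tools -- \
  ceph mgr module disable dashboard
kubectl -n rook-ceph-cluster exec deploy/rook-ceph-tools -- \
  ceph mgr module enable dashboard
```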