Rook (Ceph)
Summary
There are several different approaches to providing resilient ephemeral storage for a Kubernetes cluster. Some are fairly simple, like `hostPath`, but place limitations on Pod scheduling. Others are more involved but offer abstraction and RAID-like functions.
Rook is an overlay to Red Hat's Ceph and provides distributed storage across physical nodes and disks. These are presented to the k8s cluster as PVs that Pods can claim and use.
The main benefit of Rook and Ceph is that the storage is self-healing and new disks are automatically included into the Ceph Cluster.
The `build-cluster-foundation` Ansible playbook will install the Ceph Operator (`rook-ceph`) and the Ceph Cluster (`rook-ceph-cluster`) used to hold ephemeral storage PVs.
It is also possible to reset the Ceph deployment (and destroy all associated data) by running the `build-cluster` Ansible playbook with the variable `reset_cluster = True`. Note, this will destroy the entire cluster as well as the Ceph deployments and data-stores.
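An illustrative invocation of that reset; the playbook file name and inventory path are assumptions, so check this repository for the actual names:

```shell
# Hypothetical file/inventory names - adjust to match this repository.
# DESTRUCTIVE: tears down the cluster plus all Ceph deployments and data-stores.
ansible-playbook -i inventory/hosts build-cluster.yml -e reset_cluster=True
```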
Deployment
There are two parts to the deployment in the cluster:
- Ceph Operator (`rook-ceph`)
- Ceph Cluster (`rook-ceph-cluster`)
The Ceph Cluster will automatically look for nodes that have an unformatted `/dev/sdb` device and use it. There must be at least three nodes with such a device present for the Ceph Cluster to come into service.
If you only want specific nodes to be used by Rook, you must include a `placement` values section that uses node affinity and tolerations to select which nodes are candidates for block storage.
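A sketch of what such a `placement` section might look like in the `rook-ceph-cluster` Helm values, assuming candidate nodes carry a hypothetical `storage-node=true` label (the label key and any matching taint are illustrative, not part of this project):

```yaml
cephClusterSpec:
  placement:
    all:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: storage-node      # hypothetical label applied to candidate nodes
                  operator: In
                  values: ["true"]
      tolerations:
        - key: storage-node              # tolerate a matching taint, if one is used
          operator: Exists
```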
Note, if you change anything about a `CephCluster` CRD, such as its name or block storage devices, things can get confusing very quickly and only some of the Ceph components will come up. See the uninstall notes for more details on cleaning up an existing Ceph Cluster before attempting to re-deploy one.
Block Storage (sdb)
In this project, Ceph is configured for block-level storage using a secondary disk on each k8s node, which must be unformatted (no partitions) and presented as `/dev/sdb` by the host node operating system.
The `build-cluster` Ansible playbook will prepare those nodes that have a `/dev/sdb` device ahead of deploying Ceph. It assumes any device presented as `/dev/sdb` can be formatted and used for this purpose.
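The manual equivalent of that preparation might look like the following; this is a destructive sketch and the exact steps taken by the playbook may differ:

```shell
# Confirm the device exists and has no filesystem or partition signatures
lsblk -f /dev/sdb

# Wipe any existing signatures so Ceph will claim the disk (DESTROYS DATA)
sudo wipefs --all /dev/sdb
sudo sgdisk --zap-all /dev/sdb   # requires the gdisk package
```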
StorageClass - ceph-block
Deployment of `rook-ceph-cluster` will create three k8s `storageclass` types: `ceph-block`, `ceph-bucket` and `ceph-filesystem`, with `ceph-block` being the k8s default `storageclass`.
These will be used by any other application deployment that needs ephemeral storage.
For example, Grafana Loki log storage PVs…
```
kubectl -n rook-ceph-cluster get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                          STORAGECLASS   AGE
pvc-00ae6c79-101b-441c-a182-1b49fefcc370   10Gi       RWO            Delete           Bound    monitoring/data-loki-read-0    ceph-block     136m
pvc-2317fd97-f19b-44cd-be28-f4f2c57b0d23   10Gi       RWO            Delete           Bound    monitoring/data-loki-write-2   ceph-block     136m
pvc-31ae8277-de95-447d-a7b6-195b54e678e7   10Gi       RWO            Delete           Bound    monitoring/data-loki-read-1    ceph-block     136m
pvc-3ece7f87-9a74-4b71-a4ac-d4b06b8287cc   10Gi       RWO            Delete           Bound    monitoring/data-loki-read-2    ceph-block     136m
pvc-513621af-1f7d-403d-9afe-5db5deabefce   10Gi       RWO            Delete           Bound    monitoring/data-loki-write-1   ceph-block     136m
pvc-94ee5ea9-fb0d-420f-b84a-ca99e942f38d   10Gi       RWO            Delete           Bound    monitoring/data-loki-write-0   ceph-block     136m
```
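Each PV above was provisioned to satisfy a PVC created by Loki's `volumeClaimTemplate`. Since `ceph-block` is the default `storageclass`, any claim that omits `storageClassName` would also land on Ceph. A minimal standalone claim for illustration (the name and size here are made up):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-claim           # illustrative name
  namespace: monitoring
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ceph-block   # may be omitted, as ceph-block is the default
  resources:
    requests:
      storage: 10Gi
```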
There are several Loki Pods that would be stuck in "Pending" state without the `ceph-block` storageclass (or some other default storageclass) being available.
```
pod/loki-read-0    0/1   Running   0   150m
pod/loki-read-1    1/1   Running   0   150m
pod/loki-read-2    1/1   Running   0   150m
pod/loki-write-0   1/1   Running   0   150m
pod/loki-write-1   1/1   Running   0   150m
pod/loki-write-2   1/1   Running   0   150m
```
Example…
```
kubectl -n monitoring describe pod loki-write-2
Name:             loki-write-2
Namespace:        monitoring
Priority:         0
Service Account:  loki
...
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-loki-write-2
    ReadOnly:   false
...
```
The `data-loki-write-2` PVC is part of the `rook-ceph-cluster`.
```
kubectl -n rook-ceph-cluster describe pv pvc-2317fd97-f19b-44cd-be28-f4f2c57b0d23
Name:            pvc-2317fd97-f19b-44cd-be28-f4f2c57b0d23
Labels:          <none>
Annotations:     pv.kubernetes.io/provisioned-by: rook-ceph.rbd.csi.ceph.com
                 volume.kubernetes.io/provisioner-deletion-secret-name: rook-csi-rbd-provisioner
                 volume.kubernetes.io/provisioner-deletion-secret-namespace: rook-ceph-cluster
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    ceph-block
Status:          Bound
Claim:           monitoring/data-loki-write-2
Reclaim Policy:  Delete
Access Modes:    RWO
VolumeMode:      Filesystem
Capacity:        10Gi
Node Affinity:   <none>
Message:
Source:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            rook-ceph.rbd.csi.ceph.com
    FSType:            ext4
    VolumeHandle:      0001-0011-rook-ceph-cluster-0000000000000001-7ad8698b-75ac-4273-99ab-ea8e48dd8de6
    ReadOnly:          false
    VolumeAttributes:  clusterID=rook-ceph-cluster
                       imageFeatures=layering
                       imageFormat=2
                       imageName=csi-vol-7ad8698b-75ac-4273-99ab-ea8e48dd8de6
                       journalPool=ceph-blockpool
                       pool=ceph-blockpool
                       storage.kubernetes.io/csiProvisionerIdentity=1681419256637-8081-rook-ceph.rbd.csi.ceph.com
Events:          <none>
```
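The `pool` and `imageName` volume attributes map directly to an RBD image inside Ceph, which can be inspected from the Rook toolbox. This assumes the `rook-ceph-tools` toolbox deployment is enabled in this namespace, which may not be the case in this project:

```shell
# Inspect the backing RBD image for the PV shown above
# (assumes the rook-ceph-tools deployment exists in rook-ceph-cluster)
kubectl -n rook-ceph-cluster exec deploy/rook-ceph-tools -- \
  rbd -p ceph-blockpool info csi-vol-7ad8698b-75ac-4273-99ab-ea8e48dd8de6
```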
Ceph Mgmt Console
From the Home Network, you can reach the Ceph Cluster dashboard at http://ceph.cluster.home/dashboard.
TBD - Mgr Console expects HTTPS only… despite flags in chart. NGINX ingress seems to not recognise upstream HTTPS even with annotations and ignores the ingress rule and gives back 404.
Login credentials
```
kubectl -n rook-ceph-cluster get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode && echo
```
The Ceph Cluster (`rook-ceph-cluster`) mgmt dashboard is created once (by the first Rook object-store!) and can be seen in the example below running in HTTP mode.
```
kubectl -n rook-ceph-cluster get service rook-ceph-mgr-dashboard
NAME                      TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
rook-ceph-mgr-dashboard   ClusterIP   10.98.93.6   <none>        8448/TCP   60m
```
This service is usually backed by two mgr Pods (e.g. `rook-ceph-mgr-a-fc5cc7cc6-bktfv` and `rook-ceph-mgr-b-776f8b7dc5-t2l9v`).
The Ceph Cluster (`rook-ceph-cluster`) mgmt dashboard is available outside the cluster via k8s ingress (nginx-ingress):
```
kubectl -n rook-ceph-cluster describe ingress
Name:             rook-ceph-cluster-dashboard
Labels:           app.kubernetes.io/managed-by=Helm
Namespace:        rook-ceph-cluster
Address:          192.168.57.200
Ingress Class:    nginx
Default backend:  <default>
Rules:
  Host               Path  Backends
  ----               ----  --------
  ceph.cluster.home
                     /dashboard(/|$)(.*)   rook-ceph-mgr-dashboard:http-dashboard (10.244.122.67:8448)
Annotations:         meta.helm.sh/release-name: rook-ceph-cluster
                     meta.helm.sh/release-namespace: rook-ceph-cluster
Events:
  Type    Reason          Age               From                      Message
  ----    ------          ----              ----                      -------
  Normal  AddedOrUpdated  3s (x6 over 53m)  nginx-ingress-controller  Configuration for rook-ceph-cluster/rook-ceph-cluster-dashboard was added or updated
```
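For reference, a sketch of the `ingress.dashboard` section of the `rook-ceph-cluster` Helm values that could render a rule like the one above. The rewrite annotation is an assumption inferred from the regex path: it is the community ingress-nginx convention, and a different annotation set would be needed for the NGINX Inc controller:

```yaml
ingress:
  dashboard:
    ingressClassName: nginx
    host:
      name: ceph.cluster.home
      path: /dashboard(/|$)(.*)
    annotations:
      # assumption: community ingress-nginx rewrite annotation
      nginx.ingress.kubernetes.io/rewrite-target: /$2
```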
Troubleshooting
No Dashboard / Mgr Pod Present
Problem: Ceph Cluster seems to have some but not all components, specifically the Management Dashboard is missing.
Solution: The Ceph documentation notes: "IMPORTANT: Please note the dashboard will only be enabled for the first Ceph object store created by Rook." If you change the Ceph Cluster deployment (such as its name or namespace) then some, but not all, components may be missing. In particular, the Management Dashboard is only launched once, by the first object-store. To completely start over, see the uninstall notes.
Ceph Dashboard SSL
It seems that whatever you set in the Ceph Cluster chart (`ssl: false`), the underlying Ceph Manager that provides the dashboard still expects to come up with SSL support.
Example log entry in Ceph Manager pod…
```
debug 2023-04-12T20:37:02.120+0000 7f6b66efa700  0 [dashboard INFO root] server: ssl=yes host=:: port=8443
debug 2023-04-12T20:37:02.124+0000 7f6b66efa700  0 [dashboard INFO root] Config not ready to serve, waiting: no certificate configured
```
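As an untested workaround sketch, it may be possible to force the dashboard module onto plain HTTP at the Ceph level rather than through the chart. This assumes the `rook-ceph-tools` toolbox deployment is available in the cluster namespace:

```shell
# Untested: turn SSL off for the dashboard module directly in Ceph
kubectl -n rook-ceph-cluster exec deploy/rook-ceph-tools -- \
  ceph config set mgr mgr/dashboard/ssl false

# Restart the dashboard module so the setting takes effect
kubectl -n rook-ceph-cluster exec deploy/rook-ceph-tools -- \
  ceph mgr module disable dashboard
kubectl -n rook-ceph-cluster exec deploy/rook-ceph-tools -- \
  ceph mgr module enable dashboard
```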