Cluster Temperature

This section walks through setting up monitoring of the cluster enclosure temperature using Prometheus and USB temperature sensors.

Summary

Various off-the-shelf USB temperature sensors are available, and with one or more of them you can periodically sample the temperature of the cluster enclosure. This may be of interest if you don't want the life span of your hardware shortened by excessive heat stress.

The install-cluster-foundation Playbook will install and configure all nodes to have the necessary drivers installed.

To actually sample the sensor, a k8s DaemonSet is used. It runs on those nodes that carry the usb-temperature label, which indicates that the node has a USB temperature sensor plugged in.

digitemp

In this example, a digitemp-compatible DS18B20 1-Wire sensor using the pl2303 driver is placed within the cluster enclosure. Note that the pl2303 driver is no longer present in Raspberry Pi Raspbian distributions. It is available in Ubuntu.

Device: digitemp_DS9097 https://usbtemp.com/ (Install Guide PDF)

Initialise the sensor…

digitemp_DS9097 -i -s /dev/ttyUSB0

Fetch a reading…

digitemp_DS9097 -t 0
DigiTemp v3.7.2 Copyright 1996-2018 by Brian C. Lane
GNU General Public License v2.0 - http://www.digitemp.com
Apr 26 00:55:25 Sensor 0 C: 17.88 F: 64.18

The digitemp software has many options and modes, but in its simplest form we may want a timestamp and a temperature reading (eg: in °C) that we can save to a file.

digitemp_DS9097 -t 0 -o "%N,%C" -q

Example sample with epoch seconds and temperature in Celsius

1682470657,17.875000
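
A line in this format is straightforward to consume from Python. The following is a minimal sketch that assumes the "%N,%C" output format shown above; the helper name parse_sample is hypothetical and not part of digitemp.

```python
from datetime import datetime, timezone


def parse_sample(line):
    """Split an 'epoch_seconds,celsius' line into (int, float).

    Assumes the digitemp output format "%N,%C" used above.
    Raises ValueError if the line does not have exactly two fields
    or if either field is not numeric.
    """
    epoch_text, celsius_text = line.strip().split(",")
    return int(epoch_text), float(celsius_text)


if __name__ == "__main__":
    epoch, celsius = parse_sample("1682470657,17.875000")
    # Convert the epoch seconds to a timezone-aware UTC timestamp.
    when = datetime.fromtimestamp(epoch, tz=timezone.utc)
    print(when.isoformat(), celsius)
```

Raising ValueError on malformed lines keeps bad samples out of any file or metric built on top of this.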

The sampling application in this example goes a step further: it requests a temperature reading and presents the result on a Prometheus metrics endpoint.

Sensor Application

The sensor application is a small script that retrieves the data from the sensor using the digitemp Python library pydigitemp and then presents it on a Prometheus-compatible metrics endpoint.

It is deployed by the install-cluster-foundation Ansible Playbook as a k8s DaemonSet, assuming one sensor per node. The sampling application Pod is not scheduled on a node unless the usb-temperature node label is present on that node.

Example Custom Prometheus Metrics Exporter script using port 9000. There is one custom Prometheus Gauge metric.

import os
import time
from digitemp.master import UART_Adapter
from digitemp.device import TemperatureSensor
from digitemp.exceptions import DeviceError
from prometheus_client import start_http_server, Gauge

# Configuration is read first because the gauge labels below depend on it.
POLL_INTERVAL_SECS = os.getenv("POLL_INTERVAL_SECS", "3600")
USB_DEVICE = os.getenv("USB_DEVICE", "/dev/ttyUSB0")
USB_LOCATION = os.getenv("USB_LOCATION", "")
METRICS_PORT = 9000

temperature_gauge = Gauge(
    "cluster_enclosure_temperature",
    "Cluster Enclosure Temperature",
    ["scale", "location"],
)

# Create both label combinations up front so the series exist before the first reading.
temperature_gauge.labels("celsius", USB_LOCATION)
temperature_gauge.labels("fahrenheit", USB_LOCATION)


def celsius_to_fahrenheit(reading):
    return (reading * 9 / 5) + 32

def process_request():
    """Poll USB Temperature Sensor if present"""
    try:
        if os.path.exists(USB_DEVICE):
            sensor = TemperatureSensor(UART_Adapter(USB_DEVICE))
            temp_value = round(sensor.get_temperature(),1)
            temperature_gauge.labels("celsius", USB_LOCATION).set(temp_value)
            temperature_gauge.labels("fahrenheit", USB_LOCATION).set(celsius_to_fahrenheit(temp_value))

    except DeviceError as device_error_exception:
        raise device_error_exception

    time.sleep(int(POLL_INTERVAL_SECS))


if __name__ == "__main__":
    start_http_server(METRICS_PORT)
    while True:
        process_request()

Note

The above script is a stripped down example and not very robust if the temperature sensor is not present or returns garbage. There should be some configuration, device and range checking.
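
As one possible starting point for that checking, the sketch below validates a reading and the poll interval before they are used. The limits come from the DS18B20 datasheet (operating range -55 °C to +125 °C, with +85 °C as the power-on reset value). The function names are hypothetical, and treating an exact 85.0 as invalid is an assumption that only holds if the enclosure never legitimately reaches that temperature.

```python
import math

# DS18B20 datasheet operating range, in degrees Celsius.
MIN_CELSIUS = -55.0
MAX_CELSIUS = 125.0
# The sensor reports +85 C after power-on, before a conversion has completed.
# Assumption: the enclosure never really reaches this value.
POWER_ON_RESET_CELSIUS = 85.0


def validate_reading(value):
    """Return the reading as a float, or None if it looks invalid."""
    try:
        celsius = float(value)
    except (TypeError, ValueError):
        return None
    if math.isnan(celsius) or not MIN_CELSIUS <= celsius <= MAX_CELSIUS:
        return None
    if celsius == POWER_ON_RESET_CELSIUS:
        return None
    return celsius


def parse_poll_interval(text, default=3600, minimum=1):
    """Parse POLL_INTERVAL_SECS, falling back to the default on bad input."""
    try:
        seconds = int(text)
    except (TypeError, ValueError):
        return default
    return seconds if seconds >= minimum else default
```

In the script above, process_request could call validate_reading on the sensor value and skip the gauge update when it returns None, so that the last good value stays exposed.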

Sampling Application Pod

The Python script is packaged into a Docker container. The container is built when the install-docker-registry Playbook is run, and is stored in the local Docker Registry…

ansible-playbook -i hosts build-docker-registry.yaml -K -t usb_temperature_container

Deploy the usb-temperature DaemonSet in the monitoring namespace…

ansible-playbook -i hosts build-cluster-foundation.yaml -K -t install-temp

Pod Security Context

On nodes that have the sensor attached, the /dev/ttyUSB0 device will look something like the following. Note that the dialout group is important.

root@usb-temperature-4z9sx:/# ls -l /dev/ttyUSB0
crw-rw---- 1 root dialout 188, 0 Apr 26 06:10 /dev/ttyUSB0

To allow the Pod access to the /dev/ttyUSB0 device it is necessary to define the appropriate Pod Security Context. In a production setting you should not have privileged Pods running but this is the easiest way to allow the Pod access to the sensor.

securityContext:
  privileged: true

Node Label

The Sampling Application Pod is scheduled only onto nodes that carry the label usb-temperature=digitemp_DS9097.

Those nodes that have this sensor attached can be labelled as follows.

kubectl label node k8s-node7 usb-temperature=digitemp_DS9097
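
The node label, the privileged security context and the metrics port all come together in the Pod template of the DaemonSet. The sketch below builds such a fragment as a plain Python dictionary and prints it as JSON, which kubectl accepts as well as YAML. It is an illustration only and not the project's actual manifest: the helper name and the image name are placeholders, and selecting nodes with nodeSelector is an assumption about how the label is matched.

```python
import json


def pod_template(image, sensor="digitemp_DS9097"):
    """Build a DaemonSet Pod template fragment (hypothetical helper)."""
    return {
        "metadata": {"labels": {"app.kubernetes.io/name": "usb-temperature"}},
        "spec": {
            # Assumption: nodes are matched on the usb-temperature label.
            "nodeSelector": {"usb-temperature": sensor},
            "containers": [
                {
                    "name": "usb-temperature-client",
                    "image": image,  # placeholder, not the real image name
                    "ports": [{"name": "metrics", "containerPort": 9000}],
                    # Privileged mode is the simple way to reach /dev/ttyUSB0.
                    "securityContext": {"privileged": True},
                }
            ],
        },
    }


if __name__ == "__main__":
    template = pod_template("registry.example/usb-temperature:latest")
    print(json.dumps(template, indent=2))
```

With a nodeSelector like this, adding or removing the label on a node is enough to start or stop sampling there.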

Startup Script

By default, the entrypoint.sh script shown below, which is packaged in the usb-temperature container, initialises the temperature sensor and then starts polling it.

if [[ -z "${USB_DEVICE}" ]]; then
    USB_DEVICE="/dev/ttyUSB0"
fi

if [ -e "$USB_DEVICE" ]; then
    digitemp_DS9097 -i -s "$USB_DEVICE"
    python /code/usb_temp_client/usb-temperature-client/main.py
else
    echo "$USB_DEVICE not found - exit"
fi

To inspect the sampling application logs the following kubectl command can be used…

kubectl -n monitoring logs usb-temperature-bsl5t

Example output of the usb-temperature Pod…

DigiTemp v3.7.2 Copyright 1996-2018 by Brian C. Lane
GNU General Public License v2.0 - http://www.digitemp.com
Turning off all DS2409 Couplers
Wrote .digitemprc
.
Searching the 1-Wire LAN
28AF0000FECF00B3 : DS18B20 Temperature Sensor
ROM #0 : 28AF0000FECF00B3
2023-04-28 00:56:48,200 [__main__    ] INFO     USB Device: /dev/ttyUSB0
2023-04-28 00:56:48,201 [__main__    ] INFO     Prometheus Metrics Port: 9000
2023-04-28 00:56:48,202 [__main__    ] INFO     Reading sensor /dev/ttyUSB0
2023-04-28 00:56:50,124 [__main__    ] INFO     Reading 17.875000
2023-04-28 00:57:05,227 [__main__    ] INFO     Reading sensor /dev/ttyUSB0
2023-04-28 00:57:07,166 [__main__    ] INFO     Reading 17.875000
...

Prometheus Integration

Metrics Endpoint

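Prometheus scrapes a metrics endpoint on each Pod, and these endpoints can be discovered using Prometheus Service Monitors. To inspect an endpoint by hand, first forward the Pod's metrics port to your workstation…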

kubectl port-forward -n monitoring pods/usb-temperature-s7g6x 9000:9000

Then fetch the current Prometheus metrics and print them to the terminal…

wget -qO- http://localhost:9000/metrics

Example Custom Prometheus Metrics Exporter with temperature value…

# HELP cluster_enclosure_temperature Cluster Enclosure Temperature
# TYPE cluster_enclosure_temperature gauge
cluster_enclosure_temperature 17.875
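
If you want to check the endpoint from a script rather than by eye, the text format is simple enough to read with the standard library. The sketch below is deliberately simplified: the function name is hypothetical, and it assumes each sample line ends with the value and carries no trailing timestamp. The prometheus_client package also ships a fuller parser in prometheus_client.parser.

```python
def parse_samples(exposition_text):
    """Return (series, value) pairs from Prometheus text-format output.

    Simplified: skips blank lines and '# HELP' / '# TYPE' comments, and
    assumes each sample line ends with the value (no trailing timestamp).
    """
    samples = []
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the last run of whitespace so label values may contain spaces.
        series, value = line.rsplit(None, 1)
        samples.append((series, float(value)))
    return samples


if __name__ == "__main__":
    text = (
        "# HELP cluster_enclosure_temperature Cluster Enclosure Temperature\n"
        "# TYPE cluster_enclosure_temperature gauge\n"
        "cluster_enclosure_temperature 17.875\n"
    )
    print(parse_samples(text))
```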

Tip

If you don’t need three decimal places you can round the value and save some disk space, as Prometheus stores runs of unchanged values very compactly.

In deployment, a Prometheus Service Monitor is also used to discover and help Prometheus scrape usb-temperature pods.

USB Temperature Service

This is an example of the usb-temperature Service providing access to each usb-temperature Pod.

      apiVersion: v1
      kind: Service
      metadata:
        name: usb-temperature
        namespace: monitoring
        labels:
          app: usb-temperature
      spec:
        ports:
        - name: metrics
          port: 9000
          targetPort: metrics
          protocol: TCP
        selector:
          app.kubernetes.io/name: usb-temperature

USB Temperature Service Monitor

In this project, the kube-prometheus-stack charts are used to deploy Prometheus into the monitoring namespace. The name of the prometheus CRD is kube-prometheus-stack.

When defining the Service Monitor it is important to include the label release: kube-prometheus-stack so that Prometheus will add it to the list of Service Monitors.

      apiVersion: monitoring.coreos.com/v1
      kind: ServiceMonitor
      metadata:
        name: usb-temperature
        namespace: monitoring
        labels:
          app: usb-temperature
          release: kube-prometheus-stack
      spec:
        namespaceSelector:
          matchNames:
          - monitoring
        selector:
          matchLabels:
            app: usb-temperature
        endpoints:
        - port: metrics
          interval: 1800s

Prometheus by default will include any Service Monitor in the cluster that is labelled with the Prometheus release. This allows you to have multiple Prometheus stacks, each listening for its own Service Monitors, although in this example there is only one Prometheus stack. You can write more specific Service Monitor filters too.

These Service Monitors will appear under the “Status / Service Discovery” section of the Prometheus Console at http://prometheus.cluster.home/prometheus/service-discovery.

Prometheus Queries

As with all Prometheus queries, you get some context as to which Pod and Service recorded the metric. You can add custom labels to the sampling application, such as a sensor location (eg: inside enclosure) or type (eg: digitemp_DS9097).

cluster_enclosure_temperature{container="usb-temperature-client", endpoint="metrics", instance="10.244.189.22:9000", job="usb-temperature", location="inside-top ", namespace="monitoring", node_name="k8s-node7 ", pod="usb-temperature-hqfkh", scale="celsius", sensor="digitemp_DS9097 ", service="usb-temperature"}
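
The same series can be requested through the Prometheus HTTP API instant-query endpoint, /api/v1/query. The sketch below only builds the selector and the request URL. The helper names are hypothetical, and the base URL assumes the API is served under the same /prometheus prefix as the console.

```python
from urllib.parse import urlencode


def label_selector(metric, **labels):
    """Build a PromQL selector such as metric{scale="celsius"}.

    Simplified: label values are not escaped, so they must not
    contain double quotes or backslashes.
    """
    if not labels:
        return metric
    matchers = ",".join(
        f'{key}="{value}"' for key, value in sorted(labels.items())
    )
    return f"{metric}{{{matchers}}}"


def instant_query_url(base_url, promql):
    """Build a URL for the Prometheus instant-query endpoint."""
    params = urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


if __name__ == "__main__":
    selector = label_selector("cluster_enclosure_temperature", scale="celsius")
    # Assumed base URL, taken from the console address used in this section.
    print(instant_query_url("http://prometheus.cluster.home/prometheus", selector))
```

Filtering on scale="celsius" avoids mixing the Celsius and Fahrenheit series in one result.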

Grafana Temperature Graph

You can round the values in the sampling application before they are exported, or you can round them at query time with PromQL. The following example shows a few minutes of samples with no rounding.
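
For query-time rounding, PromQL provides round(v, to_nearest). As a small sketch, the hypothetical helper below wraps an expression so that it can be pasted into a Grafana panel query; the selector used is an assumption based on the labels shown earlier.

```python
def rounded(expr, to_nearest=0.1):
    """Wrap a PromQL expression in round(), e.g. to the nearest 0.1."""
    return f"round({expr}, {to_nearest})"


# Example panel query for the Celsius series (assumed selector).
QUERY = rounded('cluster_enclosure_temperature{scale="celsius"}')

if __name__ == "__main__":
    print(QUERY)
```

Rounding in the query leaves the stored samples untouched, so the precision can be changed later without redeploying the sampling application.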

Alertmanager

TBD