Cluster Network

For DNS within the cluster.home domain, dnsmasq is used along with the /etc/hosts file on the Gateway Device. The service is provisioned through Ansible, but its final running state is “locked down” for simplicity.

Cluster IP Address Ranges

The dnsmasq service will only hand out DHCP addresses to known MAC addresses.

Known MAC addresses can be adjusted in the gateway-install Ansible Playbook. (Installation / Gateway)

k8s Node Range

These fall within the small range 192.168.57.31-41, which restricts the cluster to a maximum of 10 nodes (11 if both end addresses of the range are usable).

(more info: Installation / Gateway)

k8s Load Balancer Range

The Cluster also has a load balancer with its own range of subnet addresses, which does not overlap the node range and is used to host k8s ingresses. This range is 192.168.57.200-254.

(more info: Installation / Gateway)

Static Services

Other addresses within the cluster subnet are statically assigned, such as the Gateway Device IP Address.

(more info: Installation / Gateway)
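
The ranges above can be sanity-checked with a short script. This is a minimal sketch rather than part of the project: the /24 prefix and the function name `classify` are assumptions, and only the address ranges come from this page.

```python
import ipaddress

# Ranges as stated in this section. The /24 prefix is an assumption;
# the page only gives the individual address ranges.
CLUSTER_SUBNET = ipaddress.ip_network("192.168.57.0/24")
NODE_FIRST, NODE_LAST = 31, 41      # k8s node range (last octet)
LB_FIRST, LB_LAST = 200, 254        # load balancer range (last octet)


def classify(address: str) -> str:
    """Return which documented range an address belongs to."""
    ip = ipaddress.ip_address(address)
    if ip not in CLUSTER_SUBNET:
        return "outside cluster subnet"
    last_octet = int(ip) & 0xFF
    if NODE_FIRST <= last_octet <= NODE_LAST:
        return "k8s node"
    if LB_FIRST <= last_octet <= LB_LAST:
        return "load balancer"
    return "static or unassigned"


for candidate in ("192.168.57.30", "192.168.57.33", "192.168.57.200", "192.168.1.200"):
    print(candidate, "->", classify(candidate))
```

The Gateway Device address (192.168.57.30) falls into neither dynamic range, which matches its description as a static assignment.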

Provisioning

These ranges are documented in the Ansible Playbook that provisions the dnsmasq service on the Gateway Server host (i.e. the Gateway Device). To add a new k8s node, you have to pre-provision it with an address inside the k8s node IP range. Be careful not to re-use an IP address unless you have successfully detached and removed the old node from the k8s cluster.
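
When choosing an address for a new node, a small helper can avoid accidental re-use. The sketch below is illustrative only: `next_free_node_ip` is a hypothetical name, and the list of addresses in use would have to come from your own playbook variables or from the cluster itself.

```python
import ipaddress

# k8s node range as documented on this page.
NODE_RANGE_START = ipaddress.ip_address("192.168.57.31")
NODE_RANGE_END = ipaddress.ip_address("192.168.57.41")


def next_free_node_ip(in_use):
    """Return the lowest node-range address not in `in_use`, or None if full."""
    taken = {ipaddress.ip_address(a) for a in in_use}
    current = NODE_RANGE_START
    while current <= NODE_RANGE_END:
        if current not in taken:
            return str(current)
        current += 1
    return None


# Example values: addresses of nodes that are still attached to the cluster.
print(next_free_node_ip(["192.168.57.31", "192.168.57.32", "192.168.57.33"]))
```

Addresses of nodes that have not yet been removed from the k8s cluster should stay in the `in_use` list so that they are never handed to a new node.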

Cluster Node Hostname from DHCP

The dnsmasq service can hand out hostnames along with DHCP leases, and this project relies on that.

DHCP Client can set the hostname

Reference: “Note that when using Ubuntu 18.04 the tie-in scripts are no longer necessary. If the hostname of the install is set to localhost in /etc/hostname the DHCP client will set the hostname automatically at startup using the name issued by DHCP, if present. When running hostnamectl it will list localhost as the permanent hostname, and whatever DHCP issues as a transient hostname.” (link)

You can confirm whether a Cluster Node is using the hostname it is given by using the following hostnamectl command.

For example, on a Cluster Node that has yet to be commissioned as a k8s node:

ubuntu@localhost:~$ sudo rm /etc/hostname

ubuntu@localhost:~$ hostnamectl
   Static hostname: n/a                             
Transient hostname: localhost
         Icon name: computer-desktop
           Chassis: desktop
        Machine ID: cfcffdc19e104b30847fe3a29a07e588
           Boot ID: 5b95d543e66945aeb28d8a73fcfa0a87
  Operating System: Ubuntu 22.04.1 LTS              
            Kernel: Linux 5.15.0-58-generic
      Architecture: x86-64

The line “Static hostname: n/a” means /etc/hostname does not exist.
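
If you want to run this check across several nodes, the hostnamectl output can be parsed. The following is a rough sketch that assumes the output format shown above (other systemd versions may print the fields differently); the function names are hypothetical.

```python
def parse_hostnamectl(output: str) -> dict:
    """Turn 'Key: value' lines from hostnamectl into a dictionary."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def uses_dhcp_hostname(output: str) -> bool:
    """Heuristic: no real static hostname, and a transient one that is not localhost."""
    fields = parse_hostnamectl(output)
    static = fields.get("Static hostname", "n/a")
    transient = fields.get("Transient hostname")
    return static in ("n/a", "localhost") and transient not in (None, "localhost")


# Trimmed copy of the output shown above (node not yet commissioned).
sample = """\
   Static hostname: n/a
Transient hostname: localhost
         Icon name: computer-desktop
"""
print(uses_dhcp_hostname(sample))
print(uses_dhcp_hostname(sample.replace("Transient hostname: localhost",
                                        "Transient hostname: k8s-node3")))
```

For the uncommissioned node above, the check reports False because the transient hostname is still localhost.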

Troubleshooting

If you cannot troubleshoot from the Gateway Device, you can attach to the Cluster subnet directly via the Layer 2 switch. In this case, you will want to either add your MAC address to /etc/dnsmasq.d/allowed_hosts.config or assign yourself a static address below 192.168.57.30.

You can use dig on any node to check DNS is being resolved properly. Here are some examples.

dig grafana.cluster.home

This shows that the Grafana dashboard UI is available at 192.168.57.200, which is one of the possible External Load Balancer IP addresses for the Cluster.

kube@k8s-control-plane-node:~$ dig grafana.cluster.home

; <<>> DiG 9.18.1-1ubuntu1.3-Ubuntu <<>> grafana.cluster.home
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 54884
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;grafana.cluster.home.      IN  A

;; ANSWER SECTION:
grafana.cluster.home.   0   IN  A   192.168.57.200

;; Query time: 0 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Thu Feb 16 15:54:02 UTC 2023
;; MSG SIZE  rcvd: 65
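
To check several names at once, the answer section of the dig output can be extracted with a few lines of code. This is a sketch only: `parse_dig_answers` is a hypothetical helper, and it only handles A records in the layout shown above.

```python
def parse_dig_answers(output: str):
    """Return (name, address) pairs for the A records in dig output."""
    records = []
    for line in output.splitlines():
        line = line.strip()
        # Comment and header lines in dig output start with ';'.
        if not line or line.startswith(";"):
            continue
        parts = line.split()
        # Expected layout: name, TTL, class, type, address.
        if len(parts) >= 5 and parts[2] == "IN" and parts[3] == "A":
            records.append((parts[0].rstrip("."), parts[4]))
    return records


# Trimmed copy of the dig output shown above.
sample = """\
;; QUESTION SECTION:
;grafana.cluster.home.      IN  A

;; ANSWER SECTION:
grafana.cluster.home.   0   IN  A   192.168.57.200

;; Query time: 0 msec
"""
print(parse_dig_answers(sample))
```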

dnsmasq logs

The dnsmasq logs are quite helpful if your configuration is malformed or if dnsmasq is intentionally ignoring incoming DHCP requests. You can read the MAC addresses from the log and then copy them into the respective Ansible Playbook variables.

#> sudo service dnsmasq status
● dnsmasq.service - dnsmasq - A lightweight DHCP and caching DNS server
     Loaded: loaded (/lib/systemd/system/dnsmasq.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2023-02-16 16:43:02 GMT; 24h ago
    Process: 49672 ExecStartPre=/etc/init.d/dnsmasq checkconfig (code=exited, status=0/SUCCESS)
    Process: 49680 ExecStart=/etc/init.d/dnsmasq systemd-exec (code=exited, status=0/SUCCESS)
    Process: 49689 ExecStartPost=/etc/init.d/dnsmasq systemd-start-resolvconf (code=exited, status=0/SUCCESS)
   Main PID: 49688 (dnsmasq)
      Tasks: 1 (limit: 779)
        CPU: 15.394s
     CGroup: /system.slice/dnsmasq.service
             └─49688 /usr/sbin/dnsmasq -x /run/dnsmasq/dnsmasq.pid -u dnsmasq -r /run/dnsmasq/resolv.conf -7 /etc/dnsma>

Feb 17 03:26:04 raspberrypi dnsmasq-dhcp[49688]: DHCPREQUEST(eth0) 192.168.57.33 f4:4d:30:61:65:54
Feb 17 03:26:04 raspberrypi dnsmasq-dhcp[49688]: DHCPACK(eth0) 192.168.57.33 f4:4d:30:61:65:54 k8s-node3
Feb 17 03:28:24 raspberrypi dnsmasq-dhcp[49688]: DHCPREQUEST(eth0) 192.168.57.34 f4:4d:30:61:61:d8
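
Collecting the MAC addresses by hand gets tedious with more than a few nodes. The sketch below pulls them out of log lines in the format shown above; the regular expression and the name `seen_macs` are assumptions based on these sample lines, not on the full range of dnsmasq log messages.

```python
import re

# Matches lines such as:
#   ... dnsmasq-dhcp[49688]: DHCPREQUEST(eth0) 192.168.57.33 f4:4d:30:61:65:54
# The IP address is optional because some DHCP events are logged without one.
DHCP_EVENT = re.compile(
    r"dnsmasq-dhcp\[\d+\]: (?P<event>DHCP[A-Z]+)\((?P<iface>[^)]+)\)"
    r"(?: (?P<ip>\d{1,3}(?:\.\d{1,3}){3}))?"
    r" (?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})",
    re.IGNORECASE,
)


def seen_macs(log_text: str) -> dict:
    """Map each MAC address found in the log to the last IP logged with it."""
    result = {}
    for line in log_text.splitlines():
        match = DHCP_EVENT.search(line)
        if match:
            result[match.group("mac").lower()] = match.group("ip")
    return result


log = """\
Feb 17 03:26:04 raspberrypi dnsmasq-dhcp[49688]: DHCPREQUEST(eth0) 192.168.57.33 f4:4d:30:61:65:54
Feb 17 03:26:04 raspberrypi dnsmasq-dhcp[49688]: DHCPACK(eth0) 192.168.57.33 f4:4d:30:61:65:54 k8s-node3
Feb 17 03:28:24 raspberrypi dnsmasq-dhcp[49688]: DHCPREQUEST(eth0) 192.168.57.34 f4:4d:30:61:61:d8
"""
for mac, ip in seen_macs(log).items():
    print(mac, ip)
```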

Cluster Node IP Address Assignment

The following diagram illustrates static IP Address allocation to nodes. In this project, the dnsmasq service statically provisions IP Addresses in the cluster based on each node's MAC address. It will not provision an unknown node. This avoids accidentally performing a PXE Ubuntu Installation on a machine that was not meant to join the cluster.

sequenceDiagram
  autonumber
  Ansible Playbook->>Gateway Device: Provision new node
  Cluster Node-->>Gateway Device: DHCP Request
  Gateway Device->>Gateway Device: Check dnsmasq.d/allowed_hosts.conf
  Gateway Device->>Cluster Node: Assign Static IP Address and hostname
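
The allow-list lookup in the diagram can be pictured as follows. This is a sketch under assumptions: the contents of the allowed hosts file are not shown on this page, so the use of dnsmasq `dhcp-host=` entries, the function names, and the pairing of MAC addresses to hostnames are illustrative (the values are taken from the log and hosts file samples on this page).

```python
# Example allow-list. In the project this data lives in Ansible Playbook variables.
ALLOWED_HOSTS = [
    {"mac": "f4:4d:30:61:65:54", "ip": "192.168.57.33", "hostname": "k8s-node3"},
    {"mac": "f4:4d:30:61:61:d8", "ip": "192.168.57.34", "hostname": "k8s-node4"},
]


def render_dhcp_hosts(hosts) -> str:
    """Render one dnsmasq dhcp-host line (MAC, IP, hostname) per known node."""
    return "\n".join(
        f"dhcp-host={h['mac']},{h['ip']},{h['hostname']}" for h in hosts
    )


def lookup(mac: str, hosts=ALLOWED_HOSTS):
    """Return (ip, hostname) for a known MAC address, or None for an unknown one."""
    for h in hosts:
        if h["mac"].lower() == mac.lower():
            return h["ip"], h["hostname"]
    return None


print(render_dhcp_hosts(ALLOWED_HOSTS))
print(lookup("F4:4D:30:61:65:54"))
print(lookup("00:11:22:33:44:55"))  # unknown MAC address: no lease is offered
```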

PXE Installer Support

When a cluster node initiates Network Boot using Legacy PXE, it first makes a DHCP request. The Gateway Device is configured to read /etc/dnsmasq.d/pxe.conf, which adds PXE boot information to the DHCP Response. (This is ignored if the cluster node is not attempting a PXE network boot.) The DHCP Response tells the node how to reach the Gateway Server and initiate the network boot.

Example PXE dnsmasq configuration for x86_64 EFI nodes (extract from /etc/dnsmasq.d/pxe.conf)

#--location of the pxeboot file
dhcp-boot=/bios/pxelinux.0,pxeserver,192.168.57.30

#--Detect architecture and send the correct bootloader file
dhcp-match=set:efi-x86_64,option:client-arch,7 
dhcp-boot=tag:efi-x86_64,grub/bootx64.efi

Details: PXE Network Installer
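
The effect of the tag matching in the extract above can be modelled in a few lines. This is only a sketch of the selection logic as configured there, not of dnsmasq itself; `select_boot_file` is a hypothetical name.

```python
# Untagged dhcp-boot entry from the extract: used when no tag matches.
DEFAULT_BOOT_FILE = "/bios/pxelinux.0"

# client-arch value 7 sets the tag efi-x86_64, which selects the GRUB EFI loader.
ARCH_BOOT_FILES = {7: "grub/bootx64.efi"}


def select_boot_file(client_arch: int) -> str:
    """Return the boot file a client would be pointed at for its client-arch value."""
    return ARCH_BOOT_FILES.get(client_arch, DEFAULT_BOOT_FILE)


print(select_boot_file(7))  # grub/bootx64.efi
print(select_boot_file(0))  # /bios/pxelinux.0
```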

Cluster DNS Resolution

The cluster operates under the cluster.home domain. The dnsmasq service provides the DNS service. It is provisioned to use the Gateway Device /etc/hosts file, where additional well-known hosts are defined as well as all the known cluster nodes. This allows other nodes and services within the cluster to reach nodes by hostname and services by domain name.

The cluster does not use the upstream DNS service. If dnsmasq cannot resolve a domain name itself, it forwards the query to Google DNS at 8.8.8.8. This limits the dependencies of the cluster subnet and ensures that it does not depend on the upstream DNS service.

Therefore, it is assumed that all resources are either within the Cluster Network or online. If the NAS is on the Home Network, it will not be reachable by name; you have to use its Home Network IP Address.

sequenceDiagram
  autonumber
  Ansible Playbook->>Gateway Device: Provision well-known hosts in /etc/hosts
  Cluster Node-->>Gateway Device: DNS Request
  Gateway Device->>Gateway Device: Check /etc/hosts
  Gateway Device->>Gateway Device: Check 8.8.8.8
  Gateway Device->>Cluster Node: DNS Response

Example hosts file contents on Gateway Device (extract from /etc/hosts)

127.0.0.1   localhost
::1     localhost ip6-localhost ip6-loopback
ff02::1     ip6-allnodes
ff02::2     ip6-allrouters

127.0.1.1       raspberrypi
# BEGIN ANSIBLE MANAGED BLOCK
192.168.57.30 gateway web-proxy nfs
192.168.57.200 services grafana prometheus flamenco
192.168.57.31 k8s-control-plane-node registry
192.168.57.32 k8s-node2      
192.168.57.33 k8s-node3      
192.168.57.34 k8s-node4      
192.168.57.35 k8s-node5      
# END ANSIBLE MANAGED BLOCK
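
The resolution order described in this section (local hosts file first, then 8.8.8.8) can be sketched as a lookup over the hosts file. This is a simplified model for illustration, not how dnsmasq is implemented; `parse_hosts` and `resolve` are hypothetical names.

```python
def parse_hosts(text: str) -> dict:
    """Map every hostname in hosts-file text to its address (first entry wins)."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table.setdefault(name, address)
    return table


def resolve(name: str, table: dict, upstream: str = "8.8.8.8"):
    """Answer from the hosts table if possible, otherwise forward upstream."""
    if name in table:
        return ("hosts", table[name])
    return ("forward", upstream)


# Part of the extract shown above.
hosts_text = """\
127.0.0.1   localhost
# BEGIN ANSIBLE MANAGED BLOCK
192.168.57.30 gateway web-proxy nfs
192.168.57.200 services grafana prometheus flamenco
192.168.57.31 k8s-control-plane-node registry
# END ANSIBLE MANAGED BLOCK
"""
table = parse_hosts(hosts_text)
print(resolve("grafana", table))
print(resolve("example.com", table))
```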

Name Expansion

The dnsmasq service is configured to expand hostnames within the cluster. This means that k8s-control-plane-node becomes k8s-control-plane-node.cluster.home automatically. Similarly, other services, such as prometheus, become prometheus.cluster.home and are resolved within the cluster subdomain.
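
As a rough model of this expansion, a simple name (one without a dot) gets the cluster domain appended, while a name that already contains a dot is left alone. The function name `expand` is hypothetical and the sketch only mirrors the behaviour described here.

```python
DOMAIN = "cluster.home"


def expand(name: str, domain: str = DOMAIN) -> str:
    """Append the cluster domain to simple names; leave dotted names unchanged."""
    name = name.rstrip(".")
    if "." in name:
        return name
    return f"{name}.{domain}"


print(expand("k8s-control-plane-node"))
print(expand("prometheus"))
print(expand("grafana.cluster.home"))
```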