Roadmap

This is a loose collection of open items (TBDs) that should be addressed to make the project more flexible and to support more configuration options.

Must Haves

Items that are required but not yet fully working.

Flamenco Manager
- Figure out how to support HTTPS and WSS (WebSockets over TLS) with the NGINX Ingress Controller.
- Deregister workers (tear down the workers deployment) whenever the cluster is powered down.

Active Development

Query the Flamenco Manager API for the number of jobs and use KEDA to scale up Workers when jobs are enqueued.
Benchmarking with the Blender Open Data benchmark scripts, run locally/offline
Argo Workflow for building and testing Flamenco from source
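
The job-driven scaling item above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the shape of the job dictionaries, the status names counted as pending, and the `jobs_per_worker` and `max_workers` parameters are all assumptions made for the example. In practice the job list would come from the Flamenco Manager API and the result would be exposed to KEDA as a metric; neither integration is shown here.

```python
import math

# Assumption: these status names are illustrative; check the Flamenco
# Manager API for the statuses it actually reports.
PENDING_STATUSES = {"queued", "active"}


def desired_workers(jobs, jobs_per_worker=1, max_workers=4):
    """Return how many Workers to run for the given list of job dicts.

    jobs_per_worker and max_workers are hypothetical tuning knobs.
    """
    pending = sum(1 for job in jobs if job.get("status") in PENDING_STATUSES)
    if pending == 0:
        return 0  # scale to zero when nothing is enqueued
    wanted = math.ceil(pending / jobs_per_worker)
    return min(wanted, max_workers)  # never exceed the cluster's capacity


if __name__ == "__main__":
    sample = [{"status": "queued"}, {"status": "completed"}, {"status": "active"}]
    print(desired_workers(sample))
```

Keeping the calculation in a pure function makes it easy to test without a running Manager or cluster.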

Future Enhancements

The playbooks currently focus on Ubuntu 22.04 and Intel GPUs. This list widens the scope to NVIDIA GPUs and potentially to non-Ubuntu OSes/platforms for building a cluster. It also covers scaling and performance considerations that are not strictly required but would make the project more flexible.

Ansible Playbook Additions

UPnP support to detect Flamenco Workers on the home network
UPnP support within the cluster to allow for dynamic Worker registration
Cleanly remove a node from the cluster (deprovision)
Let's Encrypt for the Docker SSL cert
Let's Encrypt for the gateway device NGINX
Multiple Blender version support (Flamenco currently assumes one version)
Grafana status page to show node status
k8s NVIDIA GPU support via DCGM
Document strategies for k8s upgrades
Break up the gateway playbook into smaller plays
Investigate why building a Docker container image via the playbook takes much longer and uses a huge amount of disk space
Create some Blender benchmark test renders
GPU metrics from Intel i915 into Prometheus
Support different types of Flamenco/Blender deployments (1 per node, 1 per CPU, 1 per GPU)
If the Manager restarts, re-register all workers (or restart them all)
Co-locate Alembic (baked) files nearer to or on the render node (rsync jobs to a local PV?)
Time-of-day rendering, for example to use off-peak electricity
Blue/green and dev vs. prod workspaces
Alerting (k8s and Flamenco) using Alertmanager and a third-party service (e.g. Google Chat, Signal, PagerDuty)
Investigate why TLS verification failed on metrics-server, which required overriding defaultArgs to skip it
MinIO “Prometheus URL is unreachable” error on tenant metrics
Rook mgr console HTTP->HTTPS, and the problem with the mgr only working with HTTPS/SSL
Pod security context for the USB sensor should be more restrictive
MinIO HTTPS
MinIO tenant creds and bucket creation via Ansible (currently done manually)
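
For the time-of-day rendering item in the list above, one possible starting point is a small window check that decides whether rendering should be allowed right now. This is only a sketch under assumptions: the 22:00 to 06:00 window is a made-up example of an off-peak tariff, and `in_off_peak` is a hypothetical helper, not part of Flamenco or the playbooks.

```python
from datetime import datetime, time


def in_off_peak(now, start=time(22, 0), end=time(6, 0)):
    """Return True when `now` (a datetime.time) falls inside [start, end).

    The default window is an assumed example, not a real tariff.
    """
    if start <= end:
        # Window within a single day, e.g. 09:00 to 17:00.
        return start <= now < end
    # Window wraps past midnight, e.g. 22:00 to 06:00.
    return now >= start or now < end


if __name__ == "__main__":
    # Check the current local time against the assumed window.
    print(in_off_peak(datetime.now().time()))
```

A check like this could gate whether Workers are scaled up, so that queued jobs simply wait until the window opens.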