Vijfpas bootstrap implementation plan

This document defines the practical bootstrap order for the current platform state.

1. Scope and constraints

  • Outbound from 10.x networks is deny-by-default.
  • Local infra services (DNS, DHCP) stay local; NTP is provided by host Chrony with explicit upstream allow-list until a dedicated internal NTP tier exists.
  • NFR and guest-admin access is restricted to two approved admin IPs.
  • Persistent service data must live on a second disk/filesystem and survive image rebuilds.
  • Debian templates are rebuilt from code (Terraform + Ansible), not patched in place.

Terminology note:

  • nfr-mgmt is substrate-only.
  • environment segments use the <environment>-<tier> naming model.
  • the live shared-platform service tiers now use the canonical pfm-svc, pfm-core, pfm-egress, and pfm-bck names.
2. Implementation order

  1. Network controls first
     • Keep default-deny outbound and the infra/admin allow-list rules enabled (nfr-mgmt, prd-admin, dev-admin).
     • Keep a change-backed temporary exception path for bootstrap package access.
  2. Base image pipeline
     • Build the Debian golden template on each Proxmox node.
     • Validate cloud-init, SSH hardening, local DNS, Chrony sync, and config-management bootstrap.
  3. Terraform state backend bootstrap
     • Start with a short-lived local backend only for the initial foundation apply.
     • Move state to the HA backend after PostgreSQL HA is up.
  4. PostgreSQL HA foundation on Proxmox C/D/E
     • Deploy PostgreSQL HA first; multiple services depend on it.
  5. Nexus deployment
     • Deploy Nexus against PostgreSQL HA and internal blob storage.
     • Configure proxy repos so platform nodes consume packages through Nexus.
  6. Cut consumers to the internal package path
     • Point Debian/containers/tools to Nexus mirrors/proxies.
     • Issue one named Nexus machine credential per VM or service identity; do not share one credential across teams or hosts.
     • Start with Debian APT on platform VMs using root-only auth.conf.d credentials and separate APT repository entries for base, updates, and security.
     • Remove the temporary outbound bootstrap exceptions.
  7. Bootstrap internal GitLab on dev-svc + dev-core
     • Deploy gitlab-dev-svc on dev-svc plus gitaly-dev-core on dev-core.
     • Use external PostgreSQL and a single external Gitaly node with local repository storage on dbpool.
     • Pull Debian packages and the GitLab CE package from Nexus only; no direct internet egress from the GitLab/Gitaly pair.
  8. Run the shared-platform K3s bootstrap cluster on the live pfm-svc tier
     • The old placeholder k8s-platform-* cluster was intentionally cleared before first use and replaced with the plain-name cluster.
     • The live cluster uses k8s-server01/02/03-pfm-svc plus k8s-worker01/02-pfm-svc on the shared-platform service tier.
     • The canonical API VIP is k8s.pfm-svc.vijfpas.be on 10.0.43.155.
     • The live base-cluster stack is kube-vip, flannel, MetalLB, standalone Traefik, cert-manager, then the first ceph-csi RBD wave.
     • bootstrap-k3s-artifacts.yml seeded the current live shared-platform Nexus estate with the k3s-artifacts raw repository, the vijfpas-k3s-bootstrap-read role, and one machine user per K3s node before bootstrap.
     • Debian packages, K3s airgap artifacts, and kube-vip image tarballs are served from Nexus only; there is no direct internet egress from the cluster.
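The APT portion of step 6 can be sketched as follows. This is a hedged example, not the live automation: the repository names (apt-debian, apt-debian-updates, apt-debian-security) and the credential are placeholders, and STAGE stands in for /etc/apt so the fragment can be reviewed before copying into place.

```shell
#!/usr/bin/env sh
# Sketch: stage root-only Nexus APT credentials plus separate base/updates/
# security repository entries. STAGE stands in for /etc/apt; review and copy
# the files into place afterwards. Repo names and the secret are placeholders.
STAGE="${STAGE:-./apt-stage}"
NEXUS_HOST="nexus.pfm-egress.vijfpas.be"
mkdir -p "$STAGE/auth.conf.d" "$STAGE/sources.list.d"

# One named machine credential per VM identity; keep the file root-only (0600).
cat > "$STAGE/auth.conf.d/nexus.conf" <<EOF
machine $NEXUS_HOST
login $(hostname -s)-apt
password REPLACE_WITH_SECRET
EOF
chmod 0600 "$STAGE/auth.conf.d/nexus.conf"

# Separate repository entries for base, updates, and security.
cat > "$STAGE/sources.list.d/nexus-debian.list" <<EOF
deb https://$NEXUS_HOST/repository/apt-debian/ trixie main
deb https://$NEXUS_HOST/repository/apt-debian-updates/ trixie-updates main
deb https://$NEXUS_HOST/repository/apt-debian-security/ trixie-security main
EOF
```

After review, the two files map onto /etc/apt/auth.conf.d/ and /etc/apt/sources.list.d/ on the target VM.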

2.1 Bootstrap sequence map (Mermaid)

flowchart LR
  A[1. Network controls first] --> B[2. Build Debian base image pipeline]
  B --> C[3. Bootstrap temporary Terraform state]
  C --> D[4. Deploy PostgreSQL HA on C/D/E]
  D --> E[5. Deploy Nexus with PostgreSQL backend]
  E --> F[6. Cut package consumers to Nexus]
  F --> G[7. Bootstrap internal GitLab]
  G --> H[8. Redeploy shared-platform K3s cluster on live pfm-svc]
  H --> I[Keep control-plane and cluster bootstrap air-gapped except for approved internal services]

2.2 Implemented base image pipeline (repository automation)

Pipeline path:

  • vijfpas/infra-live/base-image-pipeline

Pipeline phases:

  1. build: Ansible rebuilds Debian template on each Proxmox node.
  2. validate: Terraform creates one ephemeral validation VM per node; Ansible validates baseline controls.
  3. destroy-validation: Terraform removes ephemeral validation VMs.

Orchestrator:

  • scripts/pipeline.sh all

Validation baseline implemented by code:

  • cloud-init completion (/var/lib/cloud/instance/boot-finished)
  • SSH hardening (PasswordAuthentication no, PermitRootLogin no)
  • ops sudo user with key-based access
  • local DNS resolver assignment on management NIC
  • Chrony synchronization checks (chronyc tracking, chronyc sources -v)
  • qemu-guest-agent enabled/running
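The validation baseline above can be sketched as one check script; the real pipeline drives these checks through Ansible, so treat this as an illustration only. The ROOT variable is an assumption added here so the file checks can run against a staged tree; on a real validation VM it stays empty so the live chrony and guest-agent checks also run.

```shell
#!/usr/bin/env sh
# Sketch of the baseline checks the validate phase performs on an ephemeral VM.
# ROOT lets the file checks target a staged tree for dry runs; leave it empty
# on a real VM so the live chrony/guest-agent checks run too.
ROOT="${ROOT:-}"
fail=0
check() {
  desc="$1"; shift
  if "$@" >/dev/null 2>&1; then echo "ok   $desc"; else echo "FAIL $desc"; fail=1; fi
}

check "cloud-init finished"    test -f "$ROOT/var/lib/cloud/instance/boot-finished"
check "password auth disabled" grep -q '^PasswordAuthentication no' "$ROOT/etc/ssh/sshd_config"
check "root login disabled"    grep -q '^PermitRootLogin no' "$ROOT/etc/ssh/sshd_config"
if [ -z "$ROOT" ]; then
  check "chrony synchronized"  sh -c 'chronyc tracking | grep -q "System time"'
  check "guest agent running"  systemctl is-active --quiet qemu-guest-agent
fi
echo "failures: $fail"
```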

3. Nexus design baseline

3.1 Placement

  • Current internal Nexus instance: nexus-pfm-egress, with service placement on pfm-egress.
  • Current internal service DNS and shared artifact path: nexus.pfm-egress.vijfpas.be.
  • Routine SSH/admin for this VM is on the pfm-egress IP only, sourced from approved nfr-admin hosts.
  • The direct internal A record should target the pfm-egress service IP until a dedicated internal VIP exists.
  • Anonymous access is disabled; all UI and repository clients require credentials.
  • Do not dual-home the current Nexus VM across multiple service segments.

3.2 Dependencies

  • PostgreSQL database (external backend on postgresql.pfm-core for the shared artifact estate).
  • Blob storage path for repositories (persistent volume).
  • Local reverse proxy/TLS termination on the Nexus VM (nginx on 443 to Nexus on 127.0.0.1:8081).
  • DNS record: nexus.pfm-egress.vijfpas.be (the former nexus.core-egress.vijfpas.be overlap alias is retired; a VIP/reverse-proxy target may replace the direct record if one is introduced).
  • Credentials for all clients; anonymous access is disabled.
  • Optional SMTP/OIDC integrations.
  • PostgreSQL client allow-lists (pg_hba.conf) must be updated whenever the Nexus service IP/subnet changes.
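The local reverse-proxy dependency (nginx on 443 forwarding to Nexus on 127.0.0.1:8081) can be sketched as a minimal server block. The certificate paths, the 1G upload limit, and the OUT staging path are assumptions for illustration, not the live config.

```shell
#!/usr/bin/env sh
# Sketch of the TLS-terminating nginx vhost in front of Nexus (443 -> 8081).
# Certificate paths and the upload size limit are placeholders; OUT stands in
# for the real nginx site file on the Nexus VM.
cat > "${OUT:-./nexus-proxy.conf}" <<'EOF'
server {
    listen 443 ssl;
    server_name nexus.pfm-egress.vijfpas.be;

    ssl_certificate     /etc/ssl/nexus/fullchain.pem;
    ssl_certificate_key /etc/ssl/nexus/privkey.pem;

    client_max_body_size 1G;   # large artifact uploads

    location / {
        proxy_pass http://127.0.0.1:8081;
        proxy_set_header Host              $host;
        proxy_set_header X-Forwarded-Proto https;
        proxy_set_header X-Forwarded-For   $remote_addr;
    }
}
EOF
```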

3.3 HA model

  • Nexus Repository Community Edition does not provide true active-active multi-node HA.
  • Current live state:
    • single-node Nexus on nexus-pfm-egress
    • no Nexus service VIP today
    • external PostgreSQL backend
  • Treat any HA expansion as a separate design and migration change, not part of the current implementation baseline.

3.4 Security/scanning note

  • Nexus is a repository/proxy manager, not a complete malware/CVE control plane by itself.
  • Keep separate scanner controls (for example image/dependency scanning and malware checks) in CI/admission policy.

3.5 Consumer credential baseline

  • Anonymous access is disabled.
  • Human UI access uses named human accounts.
  • VM/service package consumers use named machine credentials, one per VM or service identity.
  • Initial consumer rollout starts with Debian APT on platform VMs.
  • APT consumers should use root-only credentials in /etc/apt/auth.conf.d/ and separate Nexus APT repository definitions for base, updates, and security.
  • Do not reuse the built-in admin account for package managers or automation consumers.

3A. Internal GitLab bootstrap baseline

3A.1 Placement

  • Current bootstrap target for internal delivery tooling:
    • gitlab-dev-svc (VMID 146) on proxmox-e, dev-svc, 10.0.37.146
    • gitaly-dev-core (VMID 147) on proxmox-c, dev-core, 10.0.32.147
  • gitlab-dev-svc keeps its root disk on shared ceph-vmdata.
  • gitaly-dev-core keeps both its root disk and repository data disk on shared ceph-vmdata.

3A.2 Dependencies

  • PostgreSQL database and role on the current canonical dev PostgreSQL writer alias postgresql.dev-core.vijfpas.be.
  • Internal TLS certificate for the current live service FQDN gitlab.dev-svc.vijfpas.be.
  • Nexus-served Debian packages and GitLab CE package artifacts.
  • Internal DNS records:
    • gitlab.dev-svc.vijfpas.be -> 10.0.37.146
    • gitaly.dev-core.vijfpas.be -> 10.0.32.147
  • Older platform-gitlab.* and platform-gitaly.* overlap aliases are retired from the current implementation baseline.

3A.3 Initial service model

  • GitLab application tier runs on gitlab-dev-svc.
  • Gitaly runs as a single external node on gitaly-dev-core.
  • Internal GitLab -> Gitaly RPC uses token-authenticated tcp/8075 on dev-core.
  • The GitLab/Gitaly pair should not use general internet egress; Debian packages and the GitLab CE package should come from nexus.pfm-egress.
  • This is a bootstrap baseline, not a final HA model. Repository HA later means either Gitaly Cluster/Praefect or a deliberate DR design, not ad hoc storage sync.
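The split above maps onto a small /etc/gitlab/gitlab.rb fragment, sketched below. The database and Gitaly hostnames come from 3A.2; the role name, password, and token values are placeholders, and exact omnibus key names can shift between GitLab CE releases, so verify against the installed version before use.

```shell
#!/usr/bin/env sh
# Sketch: omnibus config fragment for external PostgreSQL plus one external
# Gitaly node on tcp/8075. Secrets are placeholders; OUT stands in for
# /etc/gitlab/gitlab.rb. Verify key names against the installed CE release.
cat > "${OUT:-./gitlab.rb.sketch}" <<'EOF'
external_url 'https://gitlab.dev-svc.vijfpas.be'

# External PostgreSQL on the dev writer alias; do not run the bundled one.
postgresql['enable'] = false
gitlab_rails['db_adapter']  = 'postgresql'
gitlab_rails['db_host']     = 'postgresql.dev-core.vijfpas.be'
gitlab_rails['db_username'] = 'gitlab'
gitlab_rails['db_password'] = 'REPLACE_WITH_SECRET'

# External Gitaly; the local Gitaly service stays off on the application node.
gitaly['enable'] = false
gitlab_rails['gitaly_token'] = 'REPLACE_WITH_TOKEN'
git_data_dirs({
  'default' => { 'gitaly_address' => 'tcp://gitaly.dev-core.vijfpas.be:8075' },
})
EOF
```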

3.6 Current Nexus recovery baseline

Current Nexus implementation is single-node only:

  • one live Nexus VM on nexus-pfm-egress
  • no Nexus service VIP today
  • external PostgreSQL backend
  • persistent blob/data path on the same VM

If higher availability is introduced later, document that as a separate design and migration track instead of treating it as part of the current implementation baseline.

4. Current PostgreSQL service baseline

4.1 VM placement and storage

Current live PostgreSQL pair:

  • postgresql-prim-dev-core on proxmox-c
  • postgresql-sec-dev-core on proxmox-e
  • both use a replaceable root disk plus a persistent node-local dbpool data disk

4.2 Current replication model

  • primary + secondary
  • PostgreSQL service traffic stays on dev-core
  • routine SSH/admin lands on the same dev-core IPs from approved dev-admin sources

4.3 Current data protection baseline

  • PostgreSQL base backup plus WAL archive is required before treating the service as stable
  • backup retention and WAL expiry must be driven by the backup tool, not by indefinite local archive growth on the database VM

4.5 Delivery ownership and handoff to Nexus

Execution model for this bootstrap phase:

  • Owner: platform data team owns PostgreSQL OS/service bootstrap, replication, backup policy, and restore drill.
  • Control path: routine SSH/admin comes from approved dev-admin source hosts to the workload IPs on dev-core (10.0.32.130/133).
  • Workload path: PostgreSQL client traffic stays on dev-core (10.0.32.130/133, tcp/5432).
  • Legacy route baseline: the explicit return route (10.0.20.0/24 via 10.0.23.1 dev eth1) existed only for the older platform-dev-mgmt bootstrap path and should remain retired.

Nexus step (section 3 and bootstrap step 5) can start only after PostgreSQL exit criteria are met:

  1. current primary is reachable on 10.0.32.130:5432
  2. replication between primary and secondary is healthy
  3. backup + restore drill completed and evidenced
  4. management access policy validated on the workload IPs

4.6 Data team PostgreSQL bring-up checklist (command-driven)

Use this sequence for the current platform-dev rollout before Nexus onboarding.

Current implementation profile (platform-dev):

  • data nodes: postgresql-prim-dev-core (10.0.32.130) and postgresql-sec-dev-core (10.0.32.133)
  • topology today: primary + secondary
  • routine administration uses the dev-core IPs from approved dev-admin sources only

Set working variables on workbench-data-dev-admin for the current data-admin path:

export PG_CORE_IPS="10.0.32.130 10.0.32.133"
export ADMIN_SOURCE_IP="10.0.23.172"
export PG_PRIMARY_IP="10.0.32.130"

4.6.1 Gate A: workload admin path policy

Run from workbench-data-dev-admin:

# Current state: SSH must work on the workload IPs.
for ip in $PG_CORE_IPS; do
  timeout 6 ssh -o BatchMode=yes -o StrictHostKeyChecking=accept-new "debian@$ip" "hostname -f"
done

Run on each PostgreSQL VM:

ip -br a
ip route
ip route get "$ADMIN_SOURCE_IP"

Pass criteria:

  1. SSH succeeds on 10.0.32.130/133 from approved dev-admin source hosts.
  2. Any obsolete route for 10.0.20.0/24 remains absent.
  3. No guest-admin NIC is present on VLAN 23.

4.6.2 Gate B: host and storage baseline

Run on each PostgreSQL VM:

hostnamectl
timedatectl
lsblk -f
findmnt /var/lib/postgresql || true

If data disk is fresh/unformatted, initialize once (example sdb):

sudo parted -s /dev/sdb mklabel gpt
sudo parted -s /dev/sdb mkpart primary ext4 0% 100%
sudo mkfs.ext4 -L pgdata /dev/sdb1
sudo mkdir -p /var/lib/postgresql
echo "LABEL=pgdata /var/lib/postgresql ext4 defaults,noatime 0 2" | sudo tee -a /etc/fstab
sudo mount -a
findmnt /var/lib/postgresql

Pass criteria:

  1. time sync healthy
  2. PostgreSQL data path mounted on persistent disk (scsi1/disk1), not root disk

4.6.3 Gate C: package install and local database readiness

Run on each PostgreSQL VM:

sudo apt-get update
sudo apt-get install -y postgresql postgresql-contrib pgbackrest
sudo systemctl enable --now postgresql
sudo -u postgres pg_isready -h 127.0.0.1 -p 5432
sudo -u postgres psql -d postgres -c "select version();"

Create Nexus database principal (run once on current primary/writer):

sudo -u postgres psql -v ON_ERROR_STOP=1 <<'SQL'
DO $$
BEGIN
  IF NOT EXISTS (SELECT 1 FROM pg_roles WHERE rolname = 'nexus') THEN
    CREATE ROLE nexus LOGIN PASSWORD 'REPLACE_WITH_SECRET' NOSUPERUSER NOCREATEDB NOCREATEROLE;
  END IF;
END $$;
SELECT 'CREATE DATABASE nexus OWNER nexus'
WHERE NOT EXISTS (SELECT 1 FROM pg_database WHERE datname = 'nexus')\gexec
SQL

Pass criteria:

  1. PostgreSQL healthy on each VM (pg_isready local).
  2. nexus role and nexus database exist on writer.

4.6.4 Gate D: replication and failover validation

Current platform-dev profile (2-node primary/secondary):

# On primary
sudo -u postgres psql -d postgres -c "select application_name, client_addr, state, sync_state from pg_stat_replication;"

# On secondary
sudo -u postgres psql -d postgres -c "select pg_is_in_recovery();"

Pass criteria:

  1. at least one healthy writer and one healthy replica
  2. replica state is streaming/caught-up per defined RPO policy
  3. current manual failover procedure is documented and recorded

4.6.5 Gate E: backup and restore drill

Run backup checks on writer:

# one-time stanza initialization (required before the first check/backup)
sudo pgbackrest --stanza=dev stanza-create
sudo pgbackrest --stanza=dev check
sudo pgbackrest --stanza=dev backup
sudo pgbackrest info --stanza=dev

Run restore drill on isolated restore target (not on active writer):

# Example on restore target after backup artifacts are accessible.
sudo systemctl stop postgresql || true
sudo pgbackrest --stanza=dev --delta restore
sudo systemctl start postgresql
sudo -u postgres psql -d postgres -c "select now();"

Pass criteria:

  1. latest backup completes successfully
  2. restore target starts from backup
  3. validation query succeeds on restored instance

4.6.6 Gate F: Nexus handoff package

Provide the following to the CI/CD team before Nexus setup:

  1. PostgreSQL endpoint details (current primary IP/host, port 5432, database nexus, username nexus, TLS mode)
  2. evidence for gates A-E (command outputs, timestamps, operator)
  3. rollback notes (how to revert failover, backup restore reference)

Only after Gate F is complete should bootstrap step 5 ("Nexus deployment") proceed.

5. Debian template baseline

5.1 Template sizing

For a minimal but comfortable rebuild-oriented template:

  • disk0/root: 12 GiB for generic services.
  • disk0/root: 16 GiB for heavier base roles (for example Nexus/PostgreSQL nodes).
  • swap: none by default (or minimal small swap file only if required by workload policy).

Rationale: keep OS disposable; keep all mutable service data on disk1.

5.2 Filesystem/use model

  • Root filesystem is immutable-by-process (rebuilt from image).
  • Persistent state (DB data, blob stores, queues, indexes) is always on disk1 or dedicated mounted volume.
  • Terraform should mark persistent disks with lifecycle protection (prevent_destroy) where appropriate.
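The lifecycle-protection bullet can be sketched as an HCL fragment (written out via shell here so it stays reviewable). The resource type and name are hypothetical and depend on the Proxmox provider in use; only the lifecycle meta-argument itself is standard Terraform.

```shell
#!/usr/bin/env sh
# Sketch of the prevent_destroy pattern for VMs carrying persistent data
# disks. The resource type/name are hypothetical (provider-dependent);
# OUT stands in for a file in the Terraform module.
cat > "${OUT:-./data-disk.tf.sketch}" <<'EOF'
resource "proxmox_virtual_environment_vm" "service_vm" {
  # ... root disk (scsi0) and cloud-init config are rebuilt freely ...

  # disk1 / scsi1 carries the persistent service data.
  lifecycle {
    prevent_destroy = true  # refuse plans that would delete the data-bearing VM
  }
}
EOF
```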

5.3 Proxmox Debian template spec (default)

Use one reusable Debian template with these defaults:

Setting              Baseline
vCPU                 2 (1 socket x 2 cores)
RAM                  4096 MiB
OS disk (scsi0)      16 GiB, virtio-scsi-single, discard on
Cloud-init disk      present (ide2 or equivalent cloud-init device)
NIC model            virtio
Guest agent          enabled (qemu-guest-agent)
Firmware/machine     OVMF (UEFI) + q35 + EFI disk on shared storage

5.4 Network model for template-derived VMs

Do not build separate templates per environment tier.

Use one template and assign NIC/bridge/VLAN when creating each VM:

  1. net0 on vmbr0 for admin-tier VMs:
     • 22 for prd-admin (guest search domain admin-prd.vijfpas.be)
     • 23 for dev-admin (current guest search domains include admin-nonprd.vijfpas.be and dev-admin.vijfpas.be)
  2. net1 on vmbr1 with workload VLAN tag:
     • 25 for planned acc-dmz
     • 26 for planned acc-svc
     • 27 for planned acc-core
     • 28 for planned acc-egress
     • 30 for prd-dmz
     • 31 for prd-svc
     • 32 for dev-core
     • 33 for dev-egress
     • 34 for prd-egress

Guardrail:

  • vmbr0 is the control-plane trunk bridge on bond0 and should be VLAN-aware with the live and planned admin/control VLAN IDs in use.
  • nfr-mgmt / VLAN 20 is the native VLAN on vmbr0, so guest NICs attached there should be untagged.
  • Guest NICs on prd-admin / VLAN 22 and dev-admin / VLAN 23 stay tagged on vmbr0.
  • Do not create host subinterfaces for admin VLANs unless the Proxmox host itself needs IPs on those segments.
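The guardrails above translate into a host bridge definition along these lines; a minimal sketch of the vmbr0 stanza, where the bond members, the placeholder management address, and the exact VLAN ID list are illustrative assumptions rather than the live host config.

```shell
#!/usr/bin/env sh
# Sketch of the vmbr0 stanza in /etc/network/interfaces: VLAN-aware trunk
# bridge on bond0 carrying the admin/control VLANs, with VLAN 20 (nfr-mgmt)
# native/untagged. Address and VLAN list are illustrative; OUT is a staging path.
cat > "${OUT:-./vmbr0.interfaces.sketch}" <<'EOF'
auto vmbr0
iface vmbr0 inet static
    # host management IP on the native nfr-mgmt VLAN (placeholder address)
    address 192.0.2.11/24
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 20 22 23
EOF
```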

SSH policy baseline:

  • Allow 22/tcp only from approved environment admin-source networks.
  • Expose SSH on workload VLAN interfaces for managed service VMs unless a documented exception requires a dedicated admin NIC.

5.5 SSH and sudo baseline

  • Create a non-root admin user (for example debian) with sudo rights.
  • Use SSH keys only; disable password auth.
  • Disable root SSH login.
  • Keep the private key on the admin workstation at %USERPROFILE%\.ssh (~/.ssh), protected with a passphrase.
  • Keep an encrypted backup of private keys outside Git and outside Terraform state.
  • Distribute only public keys through cloud-init/Ansible.
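The key-handling bullets can be sketched as follows: generate the keypair locally, then emit a cloud-init fragment that carries only the public key. The key path and file names are examples, and the empty -N passphrase is for non-interactive illustration only; per the baseline above, real keys get a passphrase.

```shell
#!/usr/bin/env sh
# Sketch: local keypair generation plus a cloud-init user-data fragment that
# distributes only the public key. Key path/user are examples; use a real
# passphrase in practice (the empty -N here is for non-interactive demo only).
KEY="${KEY:-./vijfpas_ops_ed25519}"
[ -f "$KEY" ] || ssh-keygen -t ed25519 -a 64 -N '' -C 'ops@vijfpas' -f "$KEY"

cat > "${OUT:-./user-data.sketch}" <<EOF
#cloud-config
users:
  - name: debian
    sudo: ALL=(ALL) NOPASSWD:ALL
    shell: /bin/bash
    ssh_authorized_keys:
      - $(cat "$KEY.pub")
EOF
```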

5.6 Split admin network security baseline

Infra/admin split baseline:

  1. nfr-mgmt (VLAN 20) is substrate-only (Proxmox, PBS, UniFi, control-plane endpoints).
  2. prd-admin (VLAN 22) is the current production admin-source network; guest search domains on VMs use admin-prd.
  3. dev-admin (VLAN 23) is the current development admin-source network; guest search domains on VMs are still mixed and should converge to dev-admin.
  4. Default deny between environment admin tiers; allow only explicit admin or automation paths.
  5. Default deny from workload VLANs to all management and admin VLANs unless explicitly required.
  6. SSH/API access to managed service VMs should default to their workload IPs from approved admin-source networks only; dedicated guest-admin interfaces are exception cases.

5.6.1 Team access baseline for managed VMs

Use this as the default Linux access model on service VMs:

  1. Create one local role account per team that needs access; do not use a single shared generic admin user.
  2. Give the owning team routine service-admin access to the VMs they operate.
  3. Give infra platform/break-glass access to managed VMs.
  4. Give security read-only or tightly scoped review access by default; use elevated access only when needed for change, review, or incident handling.
  5. Manage public keys, authorized principals, and sudo policy centrally in Ansible/Git; do not rely on manual team-to-team pubkey handoff.
  6. Treat SSH CA or short-lived signed SSH credentials as the preferred future-state improvement.

5.7 VM NIC and VLAN assignment model (Mermaid)

flowchart LR
  subgraph ADMIN[Admin-source VMs]
    APRD[future prd workbench or approved prd automation]
    ANON[workbench-delivery-dev-admin or workbench-runtime-dev-admin or workbench-data-dev-admin or approved development automation]
  end

  subgraph SVC[Managed service VM]
    VM[workload NIC only by default]
  end

  APRD --> M22[vmbr0 VLAN 22 prd-admin]
  ANON --> M23[vmbr0 VLAN 23 dev-admin]
  M20[vmbr0 VLAN 20 nfr-mgmt only]
  M22 -. approved SSH/API only .-> VM
  M23 -. approved SSH/API only .-> VM

  VM --> W30[vmbr1 VLAN 30 prd-dmz]
  VM --> W31[vmbr1 VLAN 31 prd-svc]
  VM --> W32[vmbr1 VLAN 32 dev-core]
  VM --> W33[vmbr1 VLAN 33 dev-egress]
  VM --> W34[vmbr1 VLAN 34 prd-egress]

6. Terraform state placement

6.1 Bootstrap phase

  • Use local encrypted backend only for earliest infra bootstrap.
  • Keep state out of Git.

6.2 Target backend (after PostgreSQL HA)

  • Use Terraform pg backend on a dedicated tfstate database/schema in PostgreSQL HA.
  • Separate DB role for Terraform state with least privilege.
  • Include backup/restore coverage for the tfstate database.

Alternative: an S3-compatible remote backend with locking can be adopted later, once object storage and locking guarantees are validated.
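The move to the pg backend can be sketched as below. The database name, role, and writer hostname are placeholders consistent with the naming used elsewhere in this plan; the connection string is supplied at init time and never committed.

```shell
#!/usr/bin/env sh
# Sketch: switch the foundation stack from the local backend to Terraform's
# pg backend once PostgreSQL HA is live. Database/role/host are placeholders;
# pass conn_str via -backend-config, never commit credentials.
cat > ./backend.tf <<'EOF'
terraform {
  backend "pg" {
    # conn_str supplied via -backend-config at init time
  }
}
EOF

# Then migrate the existing local state (shown, not executed here):
#   terraform init -migrate-state \
#     -backend-config="conn_str=postgres://tfstate@postgresql.pfm-core.vijfpas.be/tfstate?sslmode=verify-full"
```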

7. Temporary Debian egress during bootstrap

When package mirrors are not yet available via Nexus:

  1. Add temporary egress allow rule only for the source infra/admin subnet that needs bootstrap (nfr-mgmt for Proxmox host/template actions, dev-admin for first VM bootstrap, prd-admin only when explicitly required), limited to destinations/ports 80/443.
  2. Build/update Debian template.
  3. Seed Nexus proxy repositories.
  4. Remove temporary egress allow rule immediately.
  5. Validate nodes can update/install only through Nexus path.
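Step 5 (validating the Nexus-only path) can be sketched as a check that no APT 'deb' line points anywhere but the internal Nexus host; the APT_DIR variable is an assumption added so the check can also run against a staged tree.

```shell
#!/usr/bin/env sh
# Sketch for step 5: list any APT 'deb' lines that do not point at the
# internal Nexus host. Empty output means the node is on the internal path.
NEXUS_HOST="nexus.pfm-egress.vijfpas.be"

non_nexus_sources() {
  # $1 = directory holding sources.list fragments (e.g. /etc/apt)
  grep -rhE '^deb ' "$1" 2>/dev/null | grep -v "$NEXUS_HOST" || true
}

bad="$(non_nexus_sources "${APT_DIR:-/etc/apt}")"
if [ -n "$bad" ]; then
  printf 'non-Nexus APT sources found:\n%s\n' "$bad"
else
  echo "APT sources OK"
fi
```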

8. Things commonly forgotten in this bootstrap

  1. Time sync before TLS/cluster setup (chronyc tracking and chronyc sources -v consistent on all nodes).
  2. Internal CA/certificate automation before exposing internal endpoints.
  3. Explicit backup + restore test before declaring PostgreSQL/Nexus production-ready.
  4. Capacity headroom policy for local mirrored datastore and PostgreSQL WAL growth.
  5. Runbook completion for failover (Nexus, PostgreSQL, Terraform backend restore).
  6. Secrets bootstrap path (OpenBao policies for DB/Nexus credentials).
  7. Reduce controller-local secret files after the first OpenBao cluster is live; keep only break-glass material and recovery custody outside the vault path. The openbao-n01/02/03-nfr-mgmt cluster is now live, initialized, unsealed, and already has the first kv + AppRole + human OIDC baseline, so the remaining blocker is GitLab/Kubernetes auth plus consumer migration, not base VM or playbook structure.

9. External references

10. Platform-dev bootstrap status (current implementation)

Terraform source for this VM set:

  • infra-live/platform-dev-vms/terraform

Current active Debian template baseline:

  • VMID 9000 (vijfpas-debian13-template) on proxmox-a.
  • 2 vCPU (1 socket x 2 cores), 4096 MiB RAM, 16 GiB scsi0 on ceph-vmdata, qemu-guest-agent enabled.
  • Firmware is OVMF (UEFI) with machine type q35, with EFI disk on ceph-vmdata.
  • Current template image was installed from ISO; cloned VMs normally require an attached cloud-init drive (ide2) for ipconfig0, ciuser, and sshkeys settings to apply.
  • Exception: intentionally isolated appliances such as rootca-offline may omit NICs and skip the cloud-init drive plus user/network injection.

Current VM set (recreated):

  • workbench-substrate-nfr-admin (VMID 120) on vmbr0 tag 42 (infra-admin) with static 10.0.42.167/24, guest search domain nfr-admin.vijfpas.be, and persistent Ceph disk (scsi1, 32G); this is the in-place replacement of legacy platform-dev-mgmt.
  • workbench-trust-nfr-admin (VMID 123) on vmbr0 tag 42 (infra-admin) with static 10.0.42.168/24, guest search domain nfr-admin.vijfpas.be, and persistent Ceph disk (scsi1, 32G).
  • workbench-delivery-dev-admin (VMID 124) on vmbr0 tag 23 (dev-admin) with static 10.0.23.170/24, guest search domain dev-admin.vijfpas.be, and persistent Ceph disk (scsi1, 32G).
  • workbench-runtime-dev-admin (VMID 125) on vmbr0 tag 23 (dev-admin) with static 10.0.23.171/24, guest search domain dev-admin.vijfpas.be, and persistent Ceph disk (scsi1, 32G).
  • workbench-data-dev-admin (VMID 126) on vmbr0 tag 23 (dev-admin) with static 10.0.23.172/24, guest search domain dev-admin.vijfpas.be, and persistent Ceph disk (scsi1, 32G).
  • rootca-offline (VMID 133) on shared ceph-vmdata with 16G root disk, no data disk, and no virtual NICs (offline root CA workspace for core.vijfpas.be).
  • intca-nfr-admin (VMID 134) on vmbr0 tag 42 (infra-admin) with static 10.0.42.134/24, guest search domain nfr-admin.vijfpas.be, 16G root disk on shared ceph-vmdata, and no permanently attached secondary data disk; temporary CA artifact transfer uses a detachable auxiliary volume and/or guest-agent path when needed (online intermediate CA/admin workspace for core.vijfpas.be).
  • gitlab-dev-svc (VMID 146) on dev-svc with static 10.0.37.146/24, 16G root disk on shared ceph-vmdata, and a dedicated 32G Ceph-backed data disk, with external PostgreSQL/Gitaly dependencies (internal GitLab application tier).
  • gitaly-dev-core (VMID 147) on dev-core with static 10.0.32.147/24, 16G root disk on shared ceph-vmdata, and a dedicated 128G Ceph-backed repository disk (single external Gitaly node for internal GitLab).
  • the live plain-name K3s cluster now uses k8s-server01/02/03-pfm-svc (VMIDs 140-142) and k8s-worker01/02-pfm-svc (VMIDs 143-144); the old placeholder k8s-platform-* cluster was destroyed on March 30, 2026 before first use and is not reused.
  • historical note: the old unreachable platform-postgresql-sec (VMID 131) on proxmox-d was replaced on proxmox-e by the current secondary postgresql-sec-dev-core (VMID 138), which still retains the postgresql-platform-sec.dev-core.vijfpas.be overlap alias during the migration window; keep 131 outside Terraform until it is manually removed after proxmox-d returns.

Pinned VM network identity baseline (keep stable across recreate):

  • workbench-substrate-nfr-admin: MAC bc:24:11:b8:3e:b9 (static IP)
  • workbench-trust-nfr-admin: MAC bc:24:11:20:68:23 (static IP)
  • workbench-delivery-dev-admin: MAC bc:24:11:23:70:24 (static IP)
  • workbench-runtime-dev-admin: MAC bc:24:11:23:71:25 (static IP)
  • workbench-data-dev-admin: MAC bc:24:11:23:72:26 (static IP)

Security model guidance:

  1. Fastest operational baseline: one shared infra tooling VM in infra-admin (VMID 120).
  2. Stronger separation baseline: one infra tooling VM in infra-admin plus plane workbenches on the environment admin tiers with separate credentials/repos/state and no cross-environment admin reuse.
  3. For strict environments, create dedicated plane workbenches directly on the target admin tier instead of reintroducing shared environment workbenches.

Current reachability reality:

  • workbench-substrate-nfr-admin, workbench-trust-nfr-admin, workbench-delivery-dev-admin, workbench-runtime-dev-admin, and workbench-data-dev-admin answer over SSH from the current substrate workbench.
  • The new short SSH aliases (workbench-*.{nfr-admin,dev-admin}.vijfpas.be) are live and are now the only canonical workbench DNS names.
  • SSH/API administration still depends on UniFi policy order and admin source allow-lists (MyPCs rule scope).
  • If a future switch profile change removes VLAN 22 or 23 from the control-plane trunk, any future prd-admin or dev-admin workbench on vmbr0 would become unreachable even when VM config is correct.

Bootstrap package egress model (tight allow-list):

  • Use rule allow-mgmt-bootstrap-package-web (Internal -> External, source nfr-mgmt, tcp/80,443) restricted to these destinations only:
    • deb.debian.org
    • security.debian.org
    • checkpoint-api.hashicorp.com
    • releases.hashicorp.com
  • Do not assume block-10net-egress-default-v2 is enabled; confirm the current controller snapshot before using it as the active baseline.
  • Prefer short-lived enablement window during bootstrap, then disable the bootstrap allow rule.

Connection verification method:

  • Use real protocol checks with timeout:
    • SSH: ssh -o BatchMode=yes -o ConnectTimeout=8 <user>@<ip> true
    • API: curl --max-time 8 https://<target>:8006/...
  • Do not rely on synthetic port probes for policy decisions.
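The real-protocol checks above can be wrapped into small reusable helpers, sketched here; the user/host and URL arguments are supplied by the operator, and the helper names are illustrative.

```shell
#!/usr/bin/env sh
# Sketch: helpers wrapping the real-protocol checks (bounded by timeouts).
# A FAIL result means the endpoint did not answer its real protocol, which is
# what policy decisions should key on, not a synthetic port probe.
verify_ssh() {
  # $1 = user@ip
  if ssh -o BatchMode=yes -o ConnectTimeout=8 "$1" true 2>/dev/null; then
    echo "ssh ok $1"
  else
    echo "ssh FAIL $1"
  fi
}

verify_api() {
  # $1 = https URL, e.g. the Proxmox API on :8006
  if curl --silent --output /dev/null --max-time 8 "$1"; then
    echo "api ok $1"
  else
    echo "api FAIL $1"
  fi
}
```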