# Vijfpas bootstrap implementation plan
This document defines the practical bootstrap order for the current platform state.
## 1. Scope and constraints

- Outbound from `10.x` networks is deny-by-default.
- Local infra services (DNS, DHCP) stay local; NTP is provided by host Chrony with an explicit upstream allow-list until a dedicated internal NTP tier exists.
- NFR and guest-admin access is restricted to two approved admin IPs.
- Persistent service data must live on a second disk/filesystem and survive image rebuilds.
- Debian templates are rebuilt from code (Terraform + Ansible), not patched in place.
Terminology note:

- `nfr-mgmt` is substrate-only.
- Environment segments use the `<environment>-<tier>` naming model.
- The live shared-platform service tiers now use the canonical `pfm-svc`, `pfm-core`, `pfm-egress`, and `pfm-bck` names.
## 2. Bootstrap order (recommended)

1. Network controls first
    - Keep default-deny outbound and infra/admin allow-list rules enabled (`nfr-mgmt`, `prd-admin`, `dev-admin`).
    - Keep a change-backed temporary exception path for bootstrap package access.
2. Base image pipeline
    - Build the Debian golden template on each Proxmox node.
    - Validate cloud-init, SSH hardening, local DNS, Chrony sync, and config-management bootstrap.
3. Terraform state backend bootstrap
    - Start with a short-lived local backend only for the initial foundation apply.
    - Move state to the HA backend after PostgreSQL HA is up.
4. PostgreSQL HA foundation on Proxmox C/D/E
    - Deploy PostgreSQL HA first; multiple services depend on it.
5. Nexus deployment
    - Deploy Nexus against PostgreSQL HA and internal blob storage.
    - Configure proxy repos so platform nodes consume packages through Nexus.
6. Cut consumers to the internal package path
    - Point Debian/containers/tools to Nexus mirrors/proxies.
    - Issue one named Nexus machine credential per VM or service identity; do not share one credential across teams or hosts.
    - Start with Debian APT on platform VMs using root-only `auth.conf.d` credentials and separate APT repository entries for base, updates, and security.
    - Remove temporary outbound bootstrap exceptions.
7. Bootstrap internal GitLab on `dev-svc` + `dev-core`
    - Deploy `gitlab-dev-svc` on `dev-svc` plus `gitaly-dev-core` on `dev-core`.
    - Use external PostgreSQL and a single external Gitaly node with local repository storage on `dbpool`.
    - Pull Debian packages and the GitLab CE package from Nexus only; no direct internet egress from the GitLab/Gitaly pair.
8. Run the shared-platform K3s bootstrap cluster on the live `pfm-svc` tier
    - The old placeholder `k8s-platform-*` cluster was intentionally cleared before first use and replaced with the plain-name cluster.
    - The live cluster now uses `k8s-server01/02/03-pfm-svc` plus `k8s-worker01/02-pfm-svc` on the shared-platform service tier.
    - The canonical API VIP is `k8s.pfm-svc.vijfpas.be` on `10.0.43.155`.
    - The live base-cluster stack is `kube-vip`, `flannel`, `MetalLB`, standalone `Traefik`, `cert-manager`, then the first `ceph-csi` RBD wave.
    - `bootstrap-k3s-artifacts.yml` seeded the current live shared-platform Nexus estate with the `k3s-artifacts` raw repository, the `vijfpas-k3s-bootstrap-read` role, and one machine user per K3s node before bootstrap.
    - Debian packages, K3s airgap artifacts, and `kube-vip` image tarballs are served from Nexus only; there is no direct internet egress from the cluster.
### 2.1 Bootstrap sequence map (Mermaid)

```mermaid
flowchart LR
    A[1. Network controls first] --> B[2. Build Debian base image pipeline]
    B --> C[3. Bootstrap temporary Terraform state]
    C --> D[4. Deploy PostgreSQL HA on C/D/E]
    D --> E[5. Deploy Nexus with PostgreSQL backend]
    E --> F[6. Cut package consumers to Nexus]
    F --> G[7. Bootstrap internal GitLab]
    G --> H[8. Redeploy shared-platform K3s cluster on live pfm-svc]
    H --> I[Keep control-plane and cluster bootstrap air-gapped except for approved internal services]
```
### 2.2 Implemented base image pipeline (repository automation)

Pipeline path: `vijfpas/infra-live/base-image-pipeline`

Pipeline phases:

- `build`: Ansible rebuilds the Debian template on each Proxmox node.
- `validate`: Terraform creates one ephemeral validation VM per node; Ansible validates baseline controls.
- `destroy-validation`: Terraform removes the ephemeral validation VMs.

Orchestrator: `scripts/pipeline.sh all`

Validation baseline implemented by code:

- cloud-init completion (`/var/lib/cloud/instance/boot-finished`)
- SSH hardening (`PasswordAuthentication no`, `PermitRootLogin no`)
- `ops` sudo user with key-based access
- local DNS resolver assignment on the management NIC
- Chrony synchronization checks (`chronyc tracking`, `chronyc sources -v`)
- `qemu-guest-agent` enabled/running
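These checks can also be exercised ad hoc with a small POSIX-shell probe helper like the one below. This is a sketch only: the repository's real validation runs through Ansible tasks, and the final stand-in probe exists purely so the helper can be demonstrated on any host.

```shell
#!/bin/sh
# Probe helper: run a check command, report ok/FAIL with a label.
check() {
  desc="$1"; shift
  if "$@" >/dev/null 2>&1; then echo "ok: $desc"; else echo "FAIL: $desc"; fi
}

# Baseline probes for a template-derived VM:
#   check "cloud-init finished"  test -f /var/lib/cloud/instance/boot-finished
#   check "guest agent running"  systemctl is-active qemu-guest-agent
#   check "chrony tracking"      chronyc tracking
#   check "no password auth"     grep -q '^PasswordAuthentication no' /etc/ssh/sshd_config

# Illustrative stand-in so the helper runs anywhere:
check "demo: /etc present" test -d /etc
```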
## 3. Nexus design baseline

### 3.1 Placement

- Current internal Nexus instance: `nexus-pfm-egress` with service placement on `pfm-egress`.
- The shared artifact path is `nexus.pfm-egress.vijfpas.be`.
- Routine SSH/admin for this VM is on the `pfm-egress` IP only, sourced from approved `nfr-admin` hosts.
- Current internal service DNS is `nexus.pfm-egress.vijfpas.be`.
- The direct internal `A` record should target the `pfm-egress` service IP until a dedicated internal VIP exists.
- Anonymous access is disabled; all UI and repository clients require credentials.
- Do not dual-home the current Nexus VM across multiple service segments.
### 3.2 Dependencies

- PostgreSQL database (external backend on `postgresql.pfm-core` for the shared artifact estate).
- Blob storage path for repositories (persistent volume).
- Local reverse proxy/TLS termination on the Nexus VM (`nginx` on `443` to Nexus on `127.0.0.1:8081`).
- DNS record (`nexus.pfm-egress.vijfpas.be`; the former `nexus.core-egress.vijfpas.be` overlap alias is retired; future VIP/reverse-proxy target if introduced).
- Credentials for all clients; anonymous access is disabled.
- Optional SMTP/OIDC integrations.
- PostgreSQL client allow-lists (`pg_hba.conf`) must be updated whenever the Nexus service IP/subnet changes.
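The reverse-proxy dependency can be illustrated with a minimal `nginx` server block. This is a sketch only, not the deployed config; the certificate paths and the body-size limit are assumptions.

```nginx
# /etc/nginx/sites-available/nexus -- TLS on 443, proxy to Nexus on loopback 8081.
server {
    listen 443 ssl;
    server_name nexus.pfm-egress.vijfpas.be;

    ssl_certificate     /etc/ssl/nexus/fullchain.pem;   # assumed path
    ssl_certificate_key /etc/ssl/nexus/privkey.pem;     # assumed path

    location / {
        proxy_pass http://127.0.0.1:8081;
        proxy_set_header Host              $host;
        proxy_set_header X-Forwarded-Proto https;
        client_max_body_size 1g;   # allow large artifact uploads (assumed limit)
    }
}
```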
### 3.3 HA model

- Nexus Repository Community Edition does not provide true active-active multi-node HA.
- Current live state:
  - single-node Nexus on `nexus-pfm-egress`
  - no Nexus service VIP today
  - external PostgreSQL backend
- Treat any HA expansion as a separate design and migration change, not part of the current implementation baseline.
### 3.4 Security/scanning note
- Nexus is a repository/proxy manager, not a complete malware/CVE control plane by itself.
- Keep separate scanner controls (for example image/dependency scanning and malware checks) in CI/admission policy.
### 3.5 Consumer credential baseline
- Anonymous access is disabled.
- Human UI access uses named human accounts.
- VM/service package consumers use named machine credentials, one per VM or service identity.
- Initial consumer rollout starts with Debian APT on platform VMs.
- APT consumers should use root-only credentials in `/etc/apt/auth.conf.d/` and separate Nexus APT repository definitions for base, updates, and security.
- Do not reuse the built-in `admin` account for package managers or automation consumers.
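For an APT consumer this baseline could look like the following source definitions. This is a sketch: the repository names and suite names are assumptions and must match the actual Nexus repo layout.

```
# /etc/apt/sources.list.d/nexus.list  (repo/suite names are assumptions)
deb https://nexus.pfm-egress.vijfpas.be/repository/apt-debian-base trixie main
deb https://nexus.pfm-egress.vijfpas.be/repository/apt-debian-updates trixie-updates main
deb https://nexus.pfm-egress.vijfpas.be/repository/apt-debian-security trixie-security main
```

The matching per-VM machine credential then lives in a root-only file such as `/etc/apt/auth.conf.d/nexus.conf` (mode `0600`) as netrc-style lines: `machine nexus.pfm-egress.vijfpas.be login <machine-user> password <secret>`.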
## 3A. Internal GitLab bootstrap baseline

### 3A.1 Placement

- Current bootstrap target for internal delivery tooling:
  - `gitlab-dev-svc` (VMID 146) on `proxmox-e`, `dev-svc`, `10.0.37.146`
  - `gitaly-dev-core` (VMID 147) on `proxmox-c`, `dev-core`, `10.0.32.147`
- `gitlab-dev-svc` keeps its root disk on shared `ceph-vmdata`.
- `gitaly-dev-core` keeps both its root disk and repository data disk on shared `ceph-vmdata`.
### 3A.2 Dependencies

- PostgreSQL database and role on the current canonical dev PostgreSQL writer alias `postgresql.dev-core.vijfpas.be`.
- Internal TLS certificate for the current live service FQDN `gitlab.dev-svc.vijfpas.be`.
- Nexus-served Debian packages and GitLab CE package artifacts.
- Internal DNS records:
  - `gitlab.dev-svc.vijfpas.be` -> `10.0.37.146`
  - `gitaly.dev-core.vijfpas.be` -> `10.0.32.147`
- Older `platform-gitlab.*` and `platform-gitaly.*` overlap aliases are retired from the current implementation baseline.
### 3A.3 Initial service model

- The GitLab application tier runs on `gitlab-dev-svc`.
- Gitaly runs as a single external node on `gitaly-dev-core`.
- Internal GitLab -> Gitaly RPC uses token-authenticated `tcp/8075` on `dev-core`.
- The GitLab/Gitaly pair should not use general internet egress; Debian packages and the GitLab CE package should come from `nexus.pfm-egress`.
- This is a bootstrap baseline, not a final HA model. Repository HA later means either Gitaly Cluster/Praefect or a deliberate DR design, not ad hoc storage sync.
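As an orientation sketch, the Gitaly side of the token-authenticated `tcp/8075` link maps to Omnibus settings roughly like the fragment below. Key names follow recent GitLab CE Omnibus releases and must be verified against the version actually deployed; the token is a placeholder.

```ruby
# /etc/gitlab/gitlab.rb on gitaly-dev-core (sketch; verify against the installed CE version)
gitaly['configuration'] = {
  listen_addr: '0.0.0.0:8075',
  auth: { token: 'REPLACE_WITH_SHARED_TOKEN' },
}
# Gitaly-only node: disable the bundled application/database services.
postgresql['enable'] = false
puma['enable'] = false
sidekiq['enable'] = false
nginx['enable'] = false
```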
### 3.6 Current Nexus recovery baseline

Current Nexus implementation is single-node only:

- one live Nexus VM on `nexus-pfm-egress`
- no Nexus service VIP today
- external PostgreSQL backend
- persistent blob/data path on the same VM
If higher availability is introduced later, document that as a separate design and migration track instead of treating it as part of the current implementation baseline.
## 4. Current PostgreSQL service baseline

### 4.1 VM placement and storage

Current live PostgreSQL pair:

- `postgresql-prim-dev-core` on `proxmox-c`
- `postgresql-sec-dev-core` on `proxmox-e`
- both use a replaceable root disk plus a persistent node-local `dbpool` data disk
### 4.2 Current replication model

- primary + secondary
- PostgreSQL service traffic stays on `dev-core`
- routine SSH/admin lands on the same `dev-core` IPs from approved `dev-admin` sources
### 4.3 Current data protection baseline
- PostgreSQL base backup plus WAL archive is required before treating the service as stable
- backup retention and WAL expiry must be driven by the backup tool, not by indefinite local archive growth on the database VM
### 4.5 Delivery ownership and handoff to Nexus
Execution model for this bootstrap phase:
- Owner: platform data team owns PostgreSQL OS/service bootstrap, replication, backup policy, and restore drill.
- Control path: routine SSH/admin comes from approved `dev-admin` source hosts to the workload IPs on `dev-core` (`10.0.32.130/133`).
- Workload path: PostgreSQL client traffic stays on `dev-core` (`10.0.32.130/133`, `tcp/5432`).
- Legacy route baseline: the explicit return route (`10.0.20.0/24 via 10.0.23.1 dev eth1`) existed only for the older `platform-dev-mgmt` bootstrap path and should remain retired.
Nexus step (section 3 and bootstrap step 5) can start only after PostgreSQL exit criteria are met:
- the current primary is reachable on `10.0.32.130:5432`
- replication between primary and secondary is healthy
- backup + restore drill completed and evidenced
- management access policy validated on the workload IPs
### 4.6 Data team PostgreSQL bring-up checklist (command-driven)
Use this sequence for the current platform-dev rollout before Nexus onboarding.
Current implementation profile (platform-dev):
- data nodes: `postgresql-prim-dev-core` (`10.0.32.130`) and `postgresql-sec-dev-core` (`10.0.32.133`)
- topology today: primary + secondary
- routine administration uses the `dev-core` IPs from approved `dev-admin` sources only
Set working variables on workbench-data-dev-admin for the current data-admin path:
```shell
export PG_CORE_IPS="10.0.32.130 10.0.32.133"
export ADMIN_SOURCE_IP="10.0.23.172"
export PG_PRIMARY_IP="10.0.32.130"
```
#### 4.6.1 Gate A: workload admin path policy
Run from workbench-data-dev-admin:
```shell
# Current state: SSH must work on the workload IPs.
for ip in $PG_CORE_IPS; do
  timeout 6 ssh -o BatchMode=yes -o StrictHostKeyChecking=accept-new "debian@$ip" "hostname -f"
done
```
Then verify the host-side pass criteria below on each PostgreSQL VM.
Pass criteria:

- SSH succeeds on `10.0.32.130/133` from approved `dev-admin` source hosts.
- Any obsolete route for `10.0.20.0/24` remains absent.
- No guest-admin NIC is present on VLAN `23`.
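A hedged sketch of host-side commands for the last two criteria follows; the `eth1`/`ens19` names are assumptions about how an extra guest-admin NIC would surface on these VMs, so adjust them to the actual interface naming.

```shell
#!/bin/sh
# Gate A host-side spot checks (run on each PostgreSQL VM).
route_leftover="$(ip route show 10.0.20.0/24 2>/dev/null)"
if [ -z "$route_leftover" ]; then
  echo "ok: no obsolete 10.0.20.0/24 route"
else
  echo "FAIL: obsolete route present: $route_leftover"
fi

extra_nic="$(ip -o link show 2>/dev/null | grep -E 'eth1|ens19')"
if [ -z "$extra_nic" ]; then
  echo "ok: no second (guest-admin) NIC detected"
else
  echo "FAIL: unexpected second NIC: $extra_nic"
fi
```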
#### 4.6.2 Gate B: host and storage baseline
Run on each PostgreSQL VM:
If the data disk is fresh/unformatted, initialize it once (example `sdb`):

```shell
sudo parted -s /dev/sdb mklabel gpt
sudo parted -s /dev/sdb mkpart primary ext4 0% 100%
sudo mkfs.ext4 -L pgdata /dev/sdb1
sudo mkdir -p /var/lib/postgresql
echo "LABEL=pgdata /var/lib/postgresql ext4 defaults,noatime 0 2" | sudo tee -a /etc/fstab
sudo mount -a
findmnt /var/lib/postgresql
```
Pass criteria:
- time sync healthy
- PostgreSQL data path mounted on the persistent disk (`scsi1`/disk1), not the root disk
#### 4.6.3 Gate C: package install and local database readiness
Run on each PostgreSQL VM:
```shell
sudo apt-get update
sudo apt-get install -y postgresql postgresql-contrib pgbackrest
sudo systemctl enable --now postgresql
sudo -u postgres pg_isready -h 127.0.0.1 -p 5432
sudo -u postgres psql -d postgres -c "select version();"
```
Create Nexus database principal (run once on current primary/writer):
```shell
sudo -u postgres psql -v ON_ERROR_STOP=1 <<'SQL'
DO $$
BEGIN
  IF NOT EXISTS (SELECT 1 FROM pg_roles WHERE rolname = 'nexus') THEN
    CREATE ROLE nexus LOGIN PASSWORD 'REPLACE_WITH_SECRET' NOSUPERUSER NOCREATEDB NOCREATEROLE;
  END IF;
END $$;
SELECT 'CREATE DATABASE nexus OWNER nexus'
WHERE NOT EXISTS (SELECT 1 FROM pg_database WHERE datname = 'nexus')\gexec
SQL
```
Pass criteria:

- PostgreSQL healthy on each VM (`pg_isready` local).
- `nexus` role and `nexus` database exist on the writer.
#### 4.6.4 Gate D: replication and failover validation
Current platform-dev profile (2-node primary/secondary):
```shell
# On primary
sudo -u postgres psql -d postgres -c "select application_name, client_addr, state, sync_state from pg_stat_replication;"
# On secondary
sudo -u postgres psql -d postgres -c "select pg_is_in_recovery();"
```
Pass criteria:
- at least one healthy writer and one healthy replica
- replica state is streaming/caught-up per defined RPO policy
- current manual failover procedure is documented and recorded
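For reference, a streaming-replication secondary in this model is configured roughly as below. This is a sketch: the replication user and slot names are assumptions, and the secondary additionally needs an empty `standby.signal` file in its data directory.

```ini
# postgresql.auto.conf on postgresql-sec-dev-core (illustrative values)
primary_conninfo = 'host=10.0.32.130 port=5432 user=replicator application_name=postgresql-sec-dev-core'
primary_slot_name = 'sec_dev_core'
```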
#### 4.6.5 Gate E: backup and restore drill
Run backup checks on writer:
```shell
sudo pgbackrest --stanza=dev check
sudo pgbackrest --stanza=dev backup
sudo pgbackrest info --stanza=dev
```
Run restore drill on isolated restore target (not on active writer):
```shell
# Example on restore target after backup artifacts are accessible.
sudo systemctl stop postgresql || true
sudo pgbackrest --stanza=dev --delta restore
sudo systemctl start postgresql
sudo -u postgres psql -d postgres -c "select now();"
```
Pass criteria:
- latest backup completes successfully
- restore target starts from backup
- validation query succeeds on restored instance
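The `dev` stanza referenced above implies a pgBackRest configuration along these lines (a sketch; the repository location, retention values, and the PostgreSQL major-version path are assumptions to adapt to the live installation):

```ini
# /etc/pgbackrest/pgbackrest.conf (illustrative values only)
[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=2
start-fast=y

[dev]
pg1-path=/var/lib/postgresql/17/main
```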
#### 4.6.6 Gate F: Nexus handoff package
Provide the following to the CI/CD team before Nexus setup:
- PostgreSQL endpoint details (current primary IP/host, port `5432`, database `nexus`, username `nexus`, TLS mode)
- evidence for gates A-E (command outputs, timestamps, operator)
- rollback notes (how to revert failover, backup restore reference)
Only after Gate F is complete should bootstrap step 5 ("Nexus deployment") proceed.
## 5. Debian template baseline

### 5.1 Template sizing
For a minimal but comfortable rebuild-oriented template:
- disk0/root: 12 GiB for generic services.
- disk0/root: 16 GiB for heavier base roles (for example Nexus/PostgreSQL nodes).
- swap: none by default (or minimal small swap file only if required by workload policy).
Rationale: keep OS disposable; keep all mutable service data on disk1.
### 5.2 Filesystem/use model
- Root filesystem is immutable-by-process (rebuilt from image).
- Persistent state (DB data, blob stores, queues, indexes) is always on disk1 or dedicated mounted volume.
- Terraform should mark persistent disks with lifecycle protection (`prevent_destroy`) where appropriate.
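A minimal sketch of that protection follows. Note that `prevent_destroy` applies at the resource level, so a persistent data volume is safest modeled as its own resource; the resource type shown is an assumption and should be replaced with the Proxmox provider resource actually used in the repo.

```hcl
# Illustrative only: protect the resource holding persistent state from
# accidental "terraform destroy" / replacement plans.
resource "proxmox_virtual_environment_vm" "postgresql_prim" {
  # ... VM definition; root disk is rebuilt from the template ...

  lifecycle {
    prevent_destroy = true
  }
}
```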
### 5.3 Proxmox Debian template spec (default)
Use one reusable Debian template with these defaults:
| Setting | Baseline |
|---|---|
| vCPU | 2 (1 socket x 2 cores) |
| RAM | 4096 MiB |
| OS disk (scsi0) | 16 GiB, virtio-scsi-single, discard on |
| Cloud-init disk | present (ide2 or equivalent cloud-init device) |
| NIC model | virtio |
| Guest agent | enabled (qemu-guest-agent) |
| Firmware/machine | OVMF (UEFI) + q35 + EFI disk on shared storage |
### 5.4 Network model for template-derived VMs
Do not build separate templates per environment tier.
Use one template and assign NIC/bridge/VLAN when creating each VM:
- `net0` on `vmbr0` for admin-tier VMs:
  - `22` for `prd-admin` (guest search domain `admin-prd.vijfpas.be`)
  - `23` for `dev-admin` (current guest search domains include `admin-nonprd.vijfpas.be` and `dev-admin.vijfpas.be`)
- `net1` on `vmbr1` with workload VLAN tag:
  - `25` for planned `acc-dmz`
  - `26` for planned `acc-svc`
  - `27` for planned `acc-core`
  - `28` for planned `acc-egress`
  - `30` for `prd-dmz`
  - `31` for `prd-svc`
  - `32` for `dev-core`
  - `33` for `dev-egress`
  - `34` for `prd-egress`
Guardrail:

- `vmbr0` is the control-plane trunk bridge on `bond0` and should be VLAN-aware with the live and planned admin/control VLAN IDs in use.
- `nfr-mgmt` / VLAN `20` is the native VLAN on `vmbr0`, so guest NICs attached there should be untagged.
- Guest NICs on `prd-admin` / VLAN `22` and `dev-admin` / VLAN `23` stay tagged on `vmbr0`.
- Do not create host subinterfaces for admin VLANs unless the Proxmox host itself needs IPs on those segments.
SSH policy baseline:
- Allow `22/tcp` only from approved environment admin-source networks.
- Expose SSH on workload VLAN interfaces for managed service VMs unless a documented exception requires a dedicated admin NIC.
### 5.5 SSH and sudo baseline

- Create a non-root admin user (for example `debian`) with sudo rights.
- Use SSH keys only; disable password auth.
- Disable root SSH login.
- Keep the private key on the admin workstation at `%USERPROFILE%\.ssh` (`~/.ssh`) with a passphrase.
- Keep an encrypted backup of private keys outside Git and outside Terraform state.
- Distribute only public keys through cloud-init/Ansible.
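This baseline maps to a cloud-init user-data fragment along these lines (a sketch; the key is a placeholder and must be replaced with the real public key before use):

```yaml
#cloud-config
users:
  - name: debian
    groups: [sudo]
    sudo: "ALL=(ALL) NOPASSWD:ALL"
    shell: /bin/bash
    lock_passwd: true
    ssh_authorized_keys:
      - ssh-ed25519 AAAA_PLACEHOLDER_PUBLIC_KEY admin@workstation
ssh_pwauth: false
disable_root: true
```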
### 5.6 Split admin network security baseline

Infra/admin split baseline:

- `nfr-mgmt` (VLAN 20) is substrate-only (Proxmox, PBS, UniFi, control-plane endpoints).
- `prd-admin` (VLAN 22) is the current production admin-source network; guest search domains on VMs use `admin-prd`.
- `dev-admin` (VLAN 23) is the current development admin-source network; guest search domains on VMs are still mixed and should converge to `dev-admin`.
- Default deny between environment admin tiers; allow only explicit admin or automation paths.
- Default deny from workload VLANs to all management and admin VLANs unless explicitly required.
- SSH/API access to managed service VMs should default to their workload IPs from approved admin-source networks only; dedicated guest-admin interfaces are exception cases.
#### 5.6.1 Team access baseline for managed VMs
Use this as the default Linux access model on service VMs:
- Create one local role account per team that needs access; do not use a single shared generic admin user.
- Give the owning team routine service-admin access to the VMs they operate.
- Give `infra` platform/break-glass access to managed VMs.
- Give `security` read-only or tightly scoped review access by default; use elevated access only when needed for change, review, or incident handling.
- Manage public keys, authorized principals, and sudo policy centrally in Ansible/Git; do not rely on manual team-to-team pubkey handoff.
- Treat SSH CA or short-lived signed SSH credentials as the preferred future-state improvement.
### 5.7 VM NIC and VLAN assignment model (Mermaid)

```mermaid
flowchart LR
    subgraph ADMIN[Admin-source VMs]
        APRD[future prd workbench or approved prd automation]
        ANON[workbench-delivery-dev-admin or workbench-runtime-dev-admin or workbench-data-dev-admin or approved development automation]
    end
    subgraph SVC[Managed service VM]
        VM[workload NIC only by default]
    end
    APRD --> M22[vmbr0 VLAN 22 prd-admin]
    ANON --> M23[vmbr0 VLAN 23 dev-admin]
    M20[vmbr0 VLAN 20 nfr-mgmt only]
    M22 -. approved SSH/API only .-> VM
    M23 -. approved SSH/API only .-> VM
    VM --> W30[vmbr1 VLAN 30 prd-dmz]
    VM --> W31[vmbr1 VLAN 31 prd-svc]
    VM --> W32[vmbr1 VLAN 32 dev-core]
    VM --> W33[vmbr1 VLAN 33 dev-egress]
    VM --> W34[vmbr1 VLAN 34 prd-egress]
```
## 6. Terraform state placement

### 6.1 Bootstrap phase
- Use local encrypted backend only for earliest infra bootstrap.
- Keep state out of Git.
### 6.2 Target phase (recommended)

- Use the Terraform `pg` backend on a dedicated `tfstate` database/schema in PostgreSQL HA.
- Separate DB role for Terraform state with least privilege.
- Include backup/restore coverage for the `tfstate` database.
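A minimal backend stanza for this target state might look like the following (a sketch; the connection string, role name, and schema name are assumptions to replace with the real endpoints):

```hcl
terraform {
  backend "pg" {
    # Hypothetical DSN: dedicated tfstate database, least-privilege role, TLS enforced.
    conn_str    = "postgres://tfstate_role@postgresql.dev-core.vijfpas.be/tfstate?sslmode=verify-full"
    schema_name = "terraform_remote_state_foundation"
  }
}
```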
Alternative: S3-compatible remote backend with locking can be adopted later when object storage + locking guarantees are validated.
## 7. Temporary Debian egress during bootstrap
When package mirrors are not yet available via Nexus:
- Add a temporary egress allow rule only for the source infra/admin subnet that needs bootstrap (`nfr-mgmt` for Proxmox host/template actions, `dev-admin` for first VM bootstrap, `prd-admin` only when explicitly required), limited to destination ports `80/443`.
- Build/update the Debian template.
- Seed Nexus proxy repositories.
- Remove temporary egress allow rule immediately.
- Validate nodes can update/install only through Nexus path.
## 8. Things commonly forgotten in this bootstrap

- Time sync before TLS/cluster setup (`chronyc tracking` and `chronyc sources -v` consistent on all nodes).
- Internal CA/certificate automation before exposing internal endpoints.
- Explicit backup + restore test before declaring PostgreSQL/Nexus production-ready.
- Capacity headroom policy for local mirrored datastore and PostgreSQL WAL growth.
- Runbook completion for failover (Nexus, PostgreSQL, Terraform backend restore).
- Secrets bootstrap path (OpenBao policies for DB/Nexus credentials).
- Reduce controller-local secret files after the first OpenBao cluster is live; keep only break-glass material and recovery custody outside the vault path. The `openbao-n01/02/03-nfr-mgmt` cluster is now live, initialized, unsealed, and already has the first `kv` + AppRole + human OIDC baseline, so the remaining blocker is GitLab/Kubernetes auth plus consumer migration, not base VM or playbook structure.
## 9. External references

- Sonatype Nexus Repository PostgreSQL guidance: https://help.sonatype.com/en/postgresql-for-sonatype-nexus-repository.html
- Sonatype Nexus Repository HA guidance: https://help.sonatype.com/en/high-availability-for-sonatype-nexus-repository.html
- Terraform `pg` backend: https://developer.hashicorp.com/terraform/language/backend/pg
- Terraform `s3` backend: https://developer.hashicorp.com/terraform/language/backend/s3
## 10. Platform-dev bootstrap status (current implementation)

Terraform source for this VM set: `infra-live/platform-dev-vms/terraform`
Current active Debian template baseline:
- `VMID 9000` (`vijfpas-debian13-template`) on `proxmox-a`.
- `2 vCPU` (1 socket x 2 cores), `4096 MiB` RAM, `16 GiB` `scsi0` on `ceph-vmdata`, `qemu-guest-agent` enabled.
- Firmware is `OVMF (UEFI)` with machine type `q35`, with the EFI disk on `ceph-vmdata`.
- The current template image was installed from ISO; cloned VMs normally require an attached cloud-init drive (`ide2`) for `ipconfig0`, `ciuser`, and `sshkeys` settings to apply.
- Exception: intentionally isolated appliances such as `rootca-offline` may omit NICs and skip the cloud-init drive plus user/network injection.
Current VM set (recreated):
- `workbench-substrate-nfr-admin` (VMID 120) on `vmbr0` tag `42` (infra-admin) with static `10.0.42.167/24`, guest search domain `nfr-admin.vijfpas.be`, and persistent Ceph disk (`scsi1`, `32G`); this is the in-place replacement of legacy `platform-dev-mgmt`.
- `workbench-trust-nfr-admin` (VMID 123) on `vmbr0` tag `42` (infra-admin) with static `10.0.42.168/24`, guest search domain `nfr-admin.vijfpas.be`, and persistent Ceph disk (`scsi1`, `32G`).
- `workbench-delivery-dev-admin` (VMID 124) on `vmbr0` tag `23` (dev-admin) with static `10.0.23.170/24`, guest search domain `dev-admin.vijfpas.be`, and persistent Ceph disk (`scsi1`, `32G`).
- `workbench-runtime-dev-admin` (VMID 125) on `vmbr0` tag `23` (dev-admin) with static `10.0.23.171/24`, guest search domain `dev-admin.vijfpas.be`, and persistent Ceph disk (`scsi1`, `32G`).
- `workbench-data-dev-admin` (VMID 126) on `vmbr0` tag `23` (dev-admin) with static `10.0.23.172/24`, guest search domain `dev-admin.vijfpas.be`, and persistent Ceph disk (`scsi1`, `32G`).
- `rootca-offline` (VMID 133) on shared `ceph-vmdata` with `16G` root disk, no data disk, and no virtual NICs (offline root CA workspace for `core.vijfpas.be`).
- `intca-nfr-admin` (VMID 134) on `vmbr0` tag `42` (infra-admin) with static `10.0.42.134/24`, guest search domain `nfr-admin.vijfpas.be`, `16G` root disk on shared `ceph-vmdata`, and no permanently attached secondary data disk; temporary CA artifact transfer uses a detachable auxiliary volume and/or the guest-agent path when needed (online intermediate CA/admin workspace for `core.vijfpas.be`).
- `gitlab-dev-svc` (VMID 146) on `dev-svc` with static `10.0.37.146/24`, `16G` root disk on shared `ceph-vmdata`, and a dedicated `32G` Ceph-backed data disk, with external PostgreSQL/Gitaly dependencies (internal GitLab application tier).
- `gitaly-dev-core` (VMID 147) on `dev-core` with static `10.0.32.147/24`, `16G` root disk on shared `ceph-vmdata`, and a dedicated `128G` Ceph-backed repository disk (single external Gitaly node for internal GitLab).
- the live plain-name K3s cluster now uses `k8s-server01/02/03-pfm-svc` (VMIDs 140-142) and `k8s-worker01/02-pfm-svc` (VMIDs 143-144); the old placeholder `k8s-platform-*` cluster was destroyed on March 30, 2026 before first use and is not reused.
- historical note: the old unreachable `platform-postgresql-sec` (VMID 131) on `proxmox-d` was replaced on `proxmox-e` by the current secondary `postgresql-sec-dev-core` (VMID 138), which still retains the `postgresql-platform-sec.dev-core.vijfpas.be` overlap alias during the migration window; keep `131` outside Terraform until it is manually removed after `proxmox-d` returns.
Pinned VM network identity baseline (keep stable across recreate):
- `workbench-substrate-nfr-admin`: MAC `bc:24:11:b8:3e:b9` (static IP)
- `workbench-trust-nfr-admin`: MAC `bc:24:11:20:68:23` (static IP)
- `workbench-delivery-dev-admin`: MAC `bc:24:11:23:70:24` (static IP)
- `workbench-runtime-dev-admin`: MAC `bc:24:11:23:71:25` (static IP)
- `workbench-data-dev-admin`: MAC `bc:24:11:23:72:26` (static IP)
Security model guidance:
- Fastest operational baseline: one shared infra tooling VM in `infra-admin` (VMID 120).
- Stronger separation baseline: one infra tooling VM in `infra-admin` plus plane workbenches on the environment admin tiers with separate credentials/repos/state and no cross-environment admin reuse.
- For strict environments, create dedicated plane workbenches directly on the target admin tier instead of reintroducing shared environment workbenches.
Current reachability reality:
- `workbench-substrate-nfr-admin`, `workbench-trust-nfr-admin`, `workbench-delivery-dev-admin`, `workbench-runtime-dev-admin`, and `workbench-data-dev-admin` answer over SSH from the current substrate workbench.
- the new short SSH aliases (`workbench-*.{nfr-admin,dev-admin}.vijfpas.be`) are live and are now the only canonical workbench DNS names
- SSH/API administration still depends on UniFi policy order and admin source allow-lists (`MyPCs` rule scope).
- If a future switch profile change removes VLAN `22` or `23` from the control-plane trunk, any future `prd-admin` or `dev-admin` workbench on `vmbr0` would become unreachable even when the VM config is correct.
Bootstrap package egress model (tight allow-list):
- Use rule `allow-mgmt-bootstrap-package-web` (Internal -> External, source `nfr-mgmt`, `tcp/80,443`) to allow only:
  - `deb.debian.org`
  - `security.debian.org`
  - `checkpoint-api.hashicorp.com`
  - `releases.hashicorp.com`
- Do not assume `block-10net-egress-default-v2` is enabled; confirm the current controller snapshot before using it as the active baseline.
- Prefer a short-lived enablement window during bootstrap, then disable the bootstrap allow rule.
Connection verification method:
- Use real protocol checks with a timeout:
  - SSH: `ssh -o BatchMode=yes -o ConnectTimeout=8 <user>@<ip> true`
  - API: `curl --max-time 8 https://<target>:8006/...`
- Do not rely on synthetic port probes for policy decisions.