
Home Lab

XCP-ng, k3s, Flux CD, Longhorn, Traefik, TrueNAS, Ubiquiti, GitOps, Kubernetes, Sealed Secrets

A production-grade home lab infrastructure running on XCP-ng with a k3s Kubernetes cluster, managed entirely through GitOps using Flux CD. It features automated deployments, persistent storage with Longhorn, ingress routing with Traefik, and sealed secrets management.


Table of Contents

  1. Overview
  2. Role
  3. The Problem
  4. The Goal
  5. The Solution
  6. Technical Implementation
  7. Key Features
  8. Lessons Learned

Overview

This production-grade home lab evolved from simple experimentation into a comprehensive GitOps-managed Kubernetes platform. It runs real services while providing hands-on experience with enterprise cloud-native technologies, including k3s, Flux CD, Longhorn storage, Traefik ingress, and sealed secrets management.


Role

As sole architect and operator, I handle infrastructure design, cluster operations, GitOps implementation, application deployment, security management, and disaster recovery procedures across dev and production environments.


The Problem

Manual deployments caused configuration drift between environments. There was no secure way to version-control secrets, no disaster recovery procedure, and no safe environment for testing changes. Documentation was scattered, and manual processes were difficult to replicate.


The Goal

Build a GitOps-managed Kubernetes platform with automated deployments, secure secrets management, persistent storage with disaster recovery, and multi-environment support (dev/prod) for safe testing, all following enterprise-grade practices.


The Solution


Infrastructure Architecture

Architecture Layers:

GitOps with Flux CD

Flux continuously watches the GitHub repository and automatically reconciles the cluster to match it. All infrastructure is declared in YAML manifests stored in Git, so the complete cluster can be reconstructed from the repository for disaster recovery.
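The reconcile loop can be modeled as a diff between desired state (Git) and actual state (the cluster). This is a toy model, not Flux's implementation: objects are plain dicts keyed by a hypothetical (kind, namespace, name) tuple, and the pruning step mirrors what Flux does when a Kustomization sets `prune: true`.

```python
from dataclasses import dataclass, field

# Hypothetical object key: (kind, namespace, name).
Key = tuple[str, str, str]


@dataclass
class Plan:
    create: list[Key] = field(default_factory=list)
    update: list[Key] = field(default_factory=list)
    prune: list[Key] = field(default_factory=list)


def plan_reconcile(desired: dict[Key, dict], actual: dict[Key, dict],
                   prune: bool = True) -> Plan:
    """Diff desired state (from Git) against actual state (in the cluster)."""
    plan = Plan()
    for key in sorted(desired):
        if key not in actual:
            plan.create.append(key)
        elif actual[key] != desired[key]:
            plan.update.append(key)  # drift: the cluster no longer matches Git
    if prune:
        # Objects removed from Git are deleted from the cluster.
        plan.prune = sorted(k for k in actual if k not in desired)
    return plan


def apply_plan(plan: Plan, desired: dict[Key, dict], actual: dict[Key, dict]) -> None:
    """Mutate `actual` in place so that it converges on `desired`."""
    for key in plan.create + plan.update:
        actual[key] = dict(desired[key])
    for key in plan.prune:
        del actual[key]


if __name__ == "__main__":
    git = {("Deployment", "apps", "whoami"): {"replicas": 2}}
    cluster = {
        ("Deployment", "apps", "whoami"): {"replicas": 1},  # edited by hand
        ("ConfigMap", "apps", "stale"): {"data": "old"},    # deleted from Git
    }
    plan = plan_reconcile(git, cluster)
    print(plan)
    apply_plan(plan, git, cluster)
    assert cluster == git
```

Running the loop again after `apply_plan` yields an empty plan, which is the steady state a healthy GitOps cluster sits in.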

Storage & Networking

Longhorn provides distributed persistent volumes with cross-node replication, snapshots, and disaster recovery capabilities. Traefik handles all HTTP/HTTPS traffic, including routing and certificate management, while Kustomize overlays patch hostnames for each environment.
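As an illustration of how cross-node replication is requested, a Longhorn StorageClass could be generated as below. This is a sketch: the class name `longhorn-replicated`, the replica and node counts, and the `Retain` reclaim policy are assumptions rather than this lab's settings, while `numberOfReplicas`, `staleReplicaTimeout`, and `fsType` are standard Longhorn StorageClass parameters. The output is JSON, which kubectl accepts as a manifest format.

```python
import json


def longhorn_storage_class(name: str, replicas: int, node_count: int) -> dict:
    # With Longhorn's default (hard) node anti-affinity each replica needs its
    # own node, so more replicas than nodes would leave volumes degraded.
    if not 1 <= replicas <= node_count:
        raise ValueError(f"replicas must be between 1 and {node_count}")
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "driver.longhorn.io",
        "allowVolumeExpansion": True,
        "reclaimPolicy": "Retain",  # keep the volume if its PVC is deleted
        "parameters": {
            # StorageClass parameters are always strings.
            "numberOfReplicas": str(replicas),
            "staleReplicaTimeout": "2880",  # minutes before a failed replica is cleaned up
            "fsType": "ext4",
        },
    }


if __name__ == "__main__":
    # Hypothetical three-node cluster with one replica per node.
    print(json.dumps(longhorn_storage_class("longhorn-replicated", 3, 3), indent=2))
```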

Security

Sealed Secrets encrypts credentials locally with a cluster-specific public key before they are stored in Git. The controller decrypts them in-cluster automatically; each environment has its own encryption keys, and keys can be rotated.
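For reference, this sketch assembles the shape of the resource that ends up in Git. It performs no encryption: the ciphertext value is a placeholder for `kubeseal` output, and the secret name and namespace are hypothetical. Because the default strict scope binds the ciphertext to the Secret's name and namespace, the metadata here must match what was used when sealing.

```python
import json


def sealed_secret(name: str, namespace: str, encrypted: dict[str, str]) -> dict:
    """Wrap kubeseal-produced ciphertext in a SealedSecret manifest."""
    if not encrypted:
        raise ValueError("a SealedSecret needs at least one encrypted key")
    return {
        "apiVersion": "bitnami.com/v1alpha1",
        "kind": "SealedSecret",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Only ciphertext is committed; the in-cluster controller decrypts
            # it with its private key and creates a regular Secret.
            "encryptedData": dict(encrypted),
            "template": {"metadata": {"name": name, "namespace": namespace}},
        },
    }


if __name__ == "__main__":
    manifest = sealed_secret("app-credentials", "apps",
                             {"password": "<ciphertext-from-kubeseal>"})
    print(json.dumps(manifest, indent=2))
```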


Technical Implementation

Repository Structure

The Flux repository follows a structured hierarchy optimized for multi-environment GitOps:

Mlx.Home.k3s.Flux/
├── apps/                           # Application definitions
│   └── <app-name>/
│       ├── base/                   # Production-ready base manifests
│       │   ├── kustomization.yaml
│       │   ├── deployment.yaml
│       │   ├── service.yaml
│       │   ├── ingressroute.yaml
│       │   └── sealedsecret.yaml
│       └── overlays/
│           ├── dev/                # Dev-specific patches
│           │   ├── kustomization.yaml
│           │   ├── deployment_dev.yaml
│           │   └── ingressroute_dev.yaml
│           └── prod/               # Prod-only modifications (rare)
│
├── clusters/                       # Cluster entry points
│   ├── mlx-home-dev/
│   │   ├── flux-system/           # Auto-generated by Flux bootstrap
│   │   └── kustomization.yaml     # Points to environment stages
│   └── mlx-home-prod/
│       ├── flux-system/
│       └── kustomization.yaml
│
└── environments/                   # Environment-specific configurations
    ├── dev/
    │   ├── 00_initialize/         # CRDs, namespaces, foundational resources
    │   ├── 01_recovery/           # Longhorn, storage recovery manifests
    │   ├── 02_live/               # Live application deployments
    │   └── namespaces/
    └── prod/
        ├── 00_initialize/
        ├── 01_recovery/
        ├── 02_live/
        └── namespaces/

Design Principles: Base manifests represent production truth; dev overlays patch only necessary differences using Kustomize. Three-stage deployments (initialize, recovery, live) ensure controlled bootstrapping.
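The "base plus minimal patch" principle can be illustrated with a simplified merge. This is not Kustomize itself: real strategic merge patches follow per-field rules from the Kubernetes schema, whereas this sketch only merges maps recursively and merges lists of named items (such as containers) by `name`. The `app` container, image tag, and memory values in the example are hypothetical.

```python
import copy


def merge_patch(base, patch):
    """Return base with patch applied; neither input is modified."""
    if isinstance(base, dict) and isinstance(patch, dict):
        out = copy.deepcopy(base)
        for key, value in patch.items():
            out[key] = merge_patch(base[key], value) if key in base else copy.deepcopy(value)
        return out
    if (isinstance(base, list) and isinstance(patch, list)
            and all(isinstance(i, dict) and "name" in i for i in base + patch)):
        # Lists of named items (containers, ports) merge element-wise by name.
        merged = {item["name"]: copy.deepcopy(item) for item in base}
        for item in patch:
            name = item["name"]
            merged[name] = merge_patch(merged[name], item) if name in merged else copy.deepcopy(item)
        return list(merged.values())
    return copy.deepcopy(patch)  # scalars and unkeyed lists: the patch wins


if __name__ == "__main__":
    base = {"spec": {"replicas": 2, "template": {"spec": {"containers": [
        {"name": "app", "image": "app:1.4",
         "resources": {"limits": {"memory": "512Mi"}}}]}}}}
    dev_patch = {"spec": {"replicas": 1, "template": {"spec": {"containers": [
        {"name": "app", "resources": {"limits": {"memory": "256Mi"}}}]}}}}
    print(merge_patch(base, dev_patch))
```

The dev patch names only the fields it changes; the image and everything else are inherited from the base, so the base stays the single source of production truth.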

Environment Management

Dev: Tests changes on feature branches before they reach production, using dev-specific hostnames and possibly reduced resources. After testing, the changes are applied to the production manifests and merged to main.

Prod: Runs stable 24/7 services, always tracking the main branch with production hostnames and resource allocations. Updates arrive only through merged PRs.
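The branch-tracking split corresponds to the branch set on each cluster's Flux GitRepository source. In this sketch the repository URL, the `feature/new-app` branch, and the one-minute poll interval are hypothetical; in a bootstrapped cluster the `flux-system` GitRepository is generated by `flux bootstrap`, so the code only illustrates which fields differ between environments.

```python
from typing import Optional

# Hypothetical URL; only the repository name comes from the project tree.
REPO_URL = "https://github.com/example/Mlx.Home.k3s.Flux"


def git_source(branch: str, interval: str = "1m") -> dict:
    return {
        "apiVersion": "source.toolkit.fluxcd.io/v1",
        "kind": "GitRepository",
        "metadata": {"name": "flux-system", "namespace": "flux-system"},
        "spec": {
            "interval": interval,  # how often Flux polls the repository
            "url": REPO_URL,
            "ref": {"branch": branch},
        },
    }


def source_for(environment: str, feature_branch: Optional[str] = None) -> dict:
    """Prod always tracks main; dev tracks the feature branch under test."""
    if environment == "prod":
        return git_source("main")
    if environment == "dev":
        return git_source(feature_branch or "main")
    raise ValueError(f"unknown environment: {environment}")


if __name__ == "__main__":
    print(source_for("dev", "feature/new-app")["spec"]["ref"])
    print(source_for("prod")["spec"]["ref"])
```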

Deployment Stages

  1. Initialize: CRDs, namespaces, RBAC, foundational controllers
  2. Recovery: Longhorn storage system, backup volumes, storage classes
  3. Live: Application deployments, services, ingress routes, workloads

Bootstrap sequence: Install Flux → Configure sealed secrets keys → Deploy stages 0-2 sequentially → Restore Longhorn volumes → Verify dashboards → Applications online.
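The staged rollout maps naturally onto Flux Kustomizations chained with `dependsOn`: a Kustomization is not applied until the ones it depends on report Ready. This sketch is an assumption about how the cluster entry points could wire the stages, not a copy of this repository. The object names, the 10-minute interval, and `wait: true` are illustrative, while the `./environments/<env>/<stage>` paths follow the tree above. Python's `graphlib` then derives the resulting apply order.

```python
from graphlib import TopologicalSorter

STAGES = ["00_initialize", "01_recovery", "02_live"]


def resource_name(env: str, stage: str) -> str:
    # Kubernetes object names cannot contain underscores.
    return f"{env}-{stage.replace('_', '-')}"


def flux_kustomization(env: str, stage: str, depends_on: list[str]) -> dict:
    return {
        "apiVersion": "kustomize.toolkit.fluxcd.io/v1",
        "kind": "Kustomization",
        "metadata": {"name": resource_name(env, stage), "namespace": "flux-system"},
        "spec": {
            "interval": "10m",
            "path": f"./environments/{env}/{stage}",
            "prune": True,
            # Report Ready only once every applied resource is healthy, so the
            # next stage does not start against half-installed CRDs or storage.
            "wait": True,
            "sourceRef": {"kind": "GitRepository", "name": "flux-system"},
            "dependsOn": [{"name": d} for d in depends_on],
        },
    }


def stage_chain(env: str) -> list[dict]:
    """Each stage depends on the one before it."""
    chain = []
    for i, stage in enumerate(STAGES):
        deps = [resource_name(env, STAGES[i - 1])] if i > 0 else []
        chain.append(flux_kustomization(env, stage, deps))
    return chain


def apply_order(kustomizations: list[dict]) -> list[str]:
    """Topologically sort Kustomizations by their dependsOn edges."""
    graph = {k["metadata"]["name"]: {d["name"] for d in k["spec"]["dependsOn"]}
             for k in kustomizations}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(apply_order(stage_chain("prod")))
```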


Key Features


Lessons Learned

What Went Well

GitOps Revolutionized Operations: Keeping the complete cluster state in Git enables fast, repeatable rebuilds, PR-based change reviews, commit-based rollbacks, and a full audit history.

Multi-Environment Testing: The dev/prod split with feature-branch testing caught production-breaking issues early. Rapid iteration in dev provides confidence for production deployments.

Sealed Secrets: Solved credential management by making it safe to store encrypted secrets in a public Git repository, keeping configuration version-controlled, and maintaining separate dev/prod secrets without manual coordination.

Longhorn Disaster Recovery: Full cluster rebuilds have been tested successfully multiple times: bootstrap Flux, deploy Longhorn, restore volumes, and applications return with all data intact.

Kustomize Alignment: Strategic merge patches keep environments aligned. Base = production truth, dev patches only minimal differences (hostnames, resources), making drift obvious.
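Making drift obvious can also be automated by listing every field where a rendered dev manifest differs from its base. The allowlist here (resource settings and IngressRoute `match` rules) is a hypothetical policy modeled on the differences named above; any other differing field is reported. The example manifests are invented for illustration.

```python
def diff_paths(a, b, path=""):
    """Yield dotted paths whose values differ between two nested structures."""
    if isinstance(a, dict) and isinstance(b, dict):
        for key in sorted(set(a) | set(b)):
            sub = f"{path}.{key}" if path else str(key)
            if key not in a or key not in b:
                yield sub
            else:
                yield from diff_paths(a[key], b[key], sub)
    elif isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        for i, (x, y) in enumerate(zip(a, b)):
            yield from diff_paths(x, y, f"{path}[{i}]")
    elif a != b:
        yield path


def is_allowed(path: str) -> bool:
    """Hypothetical policy: dev may change resources and IngressRoute host rules."""
    return ".resources." in f".{path}." or path.endswith(".match")


def unexpected_drift(base: dict, dev: dict) -> list[str]:
    return [p for p in diff_paths(base, dev) if not is_allowed(p)]


if __name__ == "__main__":
    base = {"spec": {"replicas": 2,
                     "routes": [{"match": "Host(`app.example.com`)"}],
                     "resources": {"limits": {"memory": "512Mi"}}}}
    dev = {"spec": {"replicas": 1,
                    "routes": [{"match": "Host(`app.dev.example.com`)"}],
                    "resources": {"limits": {"memory": "256Mi"}}}}
    print(unexpected_drift(base, dev))  # only the replica change is flagged
```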

Self-Documenting Structure: Repository patterns serve as documentation, making new service additions straightforward.

What Could Be Improved

Monitoring: Add Prometheus, Grafana, and Loki for cluster health visibility, proactive alerting, and centralized logging.

Resource Tuning: Better resource requests/limits to prevent contention and improve capacity planning.
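Capacity planning can start by comparing summed requests against what a node can allocate, since the scheduler places pods by their requests rather than their limits. This sketch parses only the common quantity suffixes (`m` for CPU; `Ki`/`Mi`/`Gi` and `k`/`M`/`G` for memory), and the pod requests and node size in the example are hypothetical.

```python
from decimal import Decimal

_SUFFIX = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "k": 10**3, "M": 10**6, "G": 10**9}


def cpu_millicores(q: str) -> int:
    """'500m' -> 500, '1' -> 1000, '0.25' -> 250."""
    return int(q[:-1]) if q.endswith("m") else int(Decimal(q) * 1000)


def memory_bytes(q: str) -> int:
    """'512Mi' -> 536870912, '1G' -> 1000000000, plain integers are bytes."""
    for suffix, factor in _SUFFIX.items():
        if q.endswith(suffix):
            return int(Decimal(q[: -len(suffix)]) * factor)
    return int(q)


def headroom(requests: list[dict[str, str]], allocatable: dict[str, str]) -> dict[str, int]:
    """Remaining CPU (millicores) and memory (bytes); negative means the pods do not fit."""
    cpu = sum(cpu_millicores(r["cpu"]) for r in requests)
    mem = sum(memory_bytes(r["memory"]) for r in requests)
    return {
        "cpu_m": cpu_millicores(allocatable["cpu"]) - cpu,
        "memory_bytes": memory_bytes(allocatable["memory"]) - mem,
    }


if __name__ == "__main__":
    pods = [{"cpu": "500m", "memory": "512Mi"}, {"cpu": "1", "memory": "1Gi"}]
    print(headroom(pods, {"cpu": "2", "memory": "4Gi"}))
```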

Automated Testing: Validate manifests, test ingress routes, and verify sealed secrets before cluster deployment.
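A pre-deployment gate could begin as simple rules over rendered manifests. This is a hypothetical sketch, not an existing tool: it assumes the output of `kustomize build` has already been parsed into dicts (parsing YAML needs a third-party library such as PyYAML), and the three rules are examples of what such a gate might enforce.

```python
def check_manifest(m: dict) -> list[str]:
    """Return human-readable problems for one manifest (empty list if clean)."""
    kind = m.get("kind", "")
    name = m.get("metadata", {}).get("name", "<unnamed>")
    errors = []
    if kind == "Secret":
        # Plain Secrets must never be committed; only SealedSecrets belong in Git.
        errors.append(f"Secret/{name}: plaintext Secret in repository")
    elif kind == "SealedSecret" and not m.get("spec", {}).get("encryptedData"):
        errors.append(f"SealedSecret/{name}: no encryptedData")
    elif kind == "IngressRoute":
        routes = m.get("spec", {}).get("routes", [])
        if not routes or not all("match" in r and r.get("services") for r in routes):
            errors.append(f"IngressRoute/{name}: every route needs a match and services")
    return errors


def check_all(manifests: list[dict]) -> list[str]:
    return [e for m in manifests for e in check_manifest(m)]


if __name__ == "__main__":
    docs = [
        {"kind": "Secret", "metadata": {"name": "db"}, "data": {"pw": "cGFzcw=="}},
        {"kind": "IngressRoute", "metadata": {"name": "web"}, "spec": {"routes": []}},
    ]
    for problem in check_all(docs):
        print(problem)
```

Failing the CI job whenever `check_all` returns anything keeps a plaintext Secret from ever reaching the repository.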

Backup Automation: Scheduled snapshots, off-cluster storage, and automated verification.
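Scheduled backups could be declared in Git as a Longhorn RecurringJob. The job name, schedule, retention count, and concurrency below are illustrative assumptions. `backup` jobs also require a backup target (NFS or S3-compatible storage; a share on the TrueNAS host would be one hypothetical option) to be configured in Longhorn, which this sketch does not cover.

```python
import json


def recurring_backup(name: str, cron: str = "0 3 * * *", retain: int = 7) -> dict:
    if len(cron.split()) != 5:
        raise ValueError("cron must have five fields")
    return {
        "apiVersion": "longhorn.io/v1beta2",
        "kind": "RecurringJob",
        "metadata": {"name": name, "namespace": "longhorn-system"},
        "spec": {
            "name": name,
            "task": "backup",       # "snapshot" would keep data on-cluster only
            "cron": cron,
            "retain": retain,       # number of backups kept per volume
            "concurrency": 2,       # volumes processed in parallel
            "groups": ["default"],  # volumes without explicit job labels use this group
            "labels": {},
        },
    }


if __name__ == "__main__":
    print(json.dumps(recurring_backup("nightly-backup"), indent=2))
```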

Network Policies: Restrict pod-to-pod communication and enforce least privilege for better security.
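A common least-privilege starting point is a default-deny ingress policy per namespace, followed by narrow allow rules. In this sketch the `apps` namespace and the choice to admit traffic from `kube-system` (where k3s's bundled Traefik runs by default) are assumptions; a podSelector on Traefik's own labels would be tighter, and policies only take effect if the cluster's network plugin enforces them.

```python
def default_deny(namespace: str) -> dict:
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        # An empty podSelector selects every pod; no ingress rules means deny all.
        "spec": {"podSelector": {}, "policyTypes": ["Ingress"]},
    }


def allow_from_namespace(namespace: str, source_ns: str) -> dict:
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"allow-from-{source_ns}", "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Ingress"],
            "ingress": [{"from": [{"namespaceSelector": {
                # Recent Kubernetes versions set this label on every namespace.
                "matchLabels": {"kubernetes.io/metadata.name": source_ns}}}]}],
        },
    }


if __name__ == "__main__":
    for policy in (default_deny("apps"), allow_from_namespace("apps", "kube-system")):
        print(policy["metadata"]["name"])
```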

Certificate Management: Implement cert-manager with Let’s Encrypt for automated provisioning and renewal.
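With cert-manager, Let's Encrypt issuance is configured through a ClusterIssuer. The function below builds either a staging or a production issuer; the email address and secret names are placeholders, and the HTTP-01 solver assumes Let's Encrypt can reach the lab's hostnames on port 80 through Traefik (a lab behind NAT may need a DNS-01 solver instead). Testing against the staging issuer first avoids production rate limits.

```python
LE_SERVERS = {
    "staging": "https://acme-staging-v02.api.letsencrypt.org/directory",
    "production": "https://acme-v02.api.letsencrypt.org/directory",
}


def cluster_issuer(env: str, email: str) -> dict:
    if env not in LE_SERVERS:
        raise ValueError(f"env must be one of {sorted(LE_SERVERS)}")
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "ClusterIssuer",
        "metadata": {"name": f"letsencrypt-{env}"},
        "spec": {
            "acme": {
                "server": LE_SERVERS[env],
                "email": email,  # placeholder contact for expiry notices
                # Secret in which cert-manager stores the ACME account key.
                "privateKeySecretRef": {"name": f"letsencrypt-{env}-account-key"},
                # HTTP-01 challenges are answered through a temporary Ingress
                # served by Traefik.
                "solvers": [{"http01": {"ingress": {"class": "traefik"}}}],
            }
        },
    }


if __name__ == "__main__":
    print(cluster_issuer("staging", "admin@example.com")["spec"]["acme"]["server"])
```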

Key Takeaways