Multi-Distribution Linux VM Deployment Documentation

Overview

This document describes the automated deployment process for multiple Linux distributions on KVM/libvirt hypervisors. The deployment supports major server distributions including Debian, Ubuntu, RHEL, CentOS Stream, Rocky Linux, AlmaLinux, SLES, and openSUSE Leap.

Table of Contents

  1. Supported Distributions
  2. Architecture
  3. Prerequisites
  4. Cloud Image Sources
  5. Deployment Process
  6. Distribution-Specific Configuration
  7. Security Features
  8. Post-Deployment
  9. Troubleshooting
  10. Best Practices

Supported Distributions

Debian Family

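| Distribution | Version | Package Manager | Firewall | Cloud Image |
|--------------|---------|-----------------|----------|-------------|
| Debian | 11 (Bullseye) | apt | ufw | Auto-download |
| Debian | 12 (Bookworm) | apt | ufw | Auto-download |
| Ubuntu | 20.04 LTS (Focal) | apt | ufw | Auto-download |
| Ubuntu | 22.04 LTS (Jammy) | apt | ufw | Auto-download |
| Ubuntu | 24.04 LTS (Noble) | apt | ufw | Auto-download |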

RHEL Family

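| Distribution | Version | Package Manager | Firewall | SELinux | Cloud Image |
|--------------|---------|-----------------|----------|---------|-------------|
| RHEL | 8, 9 | dnf | firewalld | Enforcing | ⚠️ Manual download |
| CentOS Stream | 8, 9 | dnf | firewalld | Enforcing | Auto-download |
| Rocky Linux | 8, 9 | dnf | firewalld | Enforcing | Auto-download |
| AlmaLinux | 8, 9 | dnf | firewalld | Enforcing | Auto-download |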

SUSE Family

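| Distribution | Version | Package Manager | Firewall | Cloud Image |
|--------------|---------|-----------------|----------|-------------|
| SLES | 15 | zypper | firewalld | ⚠️ Manual download |
| openSUSE Leap | 15.5, 15.6 | zypper | firewalld | Auto-download |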

Legend:

  • Auto-download = Automatically downloaded from official repositories
  • ⚠️ Manual download = Requires subscription and manual download
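
The family, package manager, and firewall columns above amount to a lookup keyed on the distribution identifier. A minimal sketch of that lookup follows, assuming identifiers of the form used elsewhere in this guide (debian-12, rocky-9, and so on). The function name `distro_facts` and the family label `suse` are illustrative assumptions, not names taken from the playbook.

```shell
# Hypothetical helper: map an os_distribution value to
# "<family> <package manager> <firewall>", following the tables above.
distro_facts() {
  case "$1" in
    debian-*|ubuntu-*)
      echo "debian apt ufw" ;;
    rhel-*|centos-stream-*|rocky-*|almalinux-*)
      echo "rhel dnf firewalld" ;;
    sles-*|opensuse-leap-*)
      echo "suse zypper firewalld" ;;
    *)
      # Anything else is not in the supported matrix.
      echo "unsupported distribution: $1" >&2
      return 1 ;;
  esac
}

# Example: split the result into separate variables.
# set -- $(distro_facts rocky-9); family=$1; pkg_mgr=$2; firewall=$3
```

Keeping the mapping in one place means a new distribution only needs one extra pattern rather than changes scattered across templates.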

Architecture

Multi-Distribution Support Design

┌─────────────────────────────────────────────────────────┐
│ Ansible Control Node                                    │
│                                                          │
│  ┌────────────────────────────────────────────────┐    │
│  │ deploy-linux-vm.yml                             │    │
│  │                                                 │    │
│  │  • Distribution Selection Logic                │    │
│  │  • Cloud Image Repository Map                  │    │
│  │  • OS Family Detection                         │    │
│  │  • Package Manager Adaptation                  │    │
│  └────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────┘
                         │
                         │ SSH
                         ▼
┌─────────────────────────────────────────────────────────┐
│ KVM Hypervisor (grokbox)                                │
│                                                          │
│  ┌──────────────────────────────────────────────┐      │
│  │ Cloud Image Cache                             │      │
│  │ /var/lib/libvirt/images/                     │      │
│  │  ├─ debian-12-*.qcow2                        │      │
│  │  ├─ ubuntu-22.04-*.img                       │      │
│  │  ├─ centos-stream-9-*.qcow2                  │      │
│  │  ├─ rocky-9-*.qcow2                          │      │
│  │  └─ almalinux-9-*.qcow2                      │      │
│  └──────────────────────────────────────────────┘      │
│                                                          │
│  ┌──────────────────────────────────────────────┐      │
│  │ libvirt/QEMU                                  │      │
│  │                                               │      │
│  │  ┌──────────────┐  ┌──────────────┐         │      │
│  │  │ Debian VM    │  │ Ubuntu VM    │         │      │
│  │  │ ufw enabled  │  │ ufw enabled  │         │      │
│  │  └──────────────┘  └──────────────┘         │      │
│  │                                               │      │
│  │  ┌──────────────┐  ┌──────────────┐         │      │
│  │  │ Rocky VM     │  │ Alma VM      │         │      │
│  │  │ SELinux=Enf  │  │ SELinux=Enf  │         │      │
│  │  │ firewalld    │  │ firewalld    │         │      │
│  │  └──────────────┘  └──────────────┘         │      │
│  └──────────────────────────────────────────────┘      │
└─────────────────────────────────────────────────────────┘

Deployment Workflow

[Start]
   │
   ▼
[Validate Distribution Selection]
   │
   ├─ Check distribution in supported list
   ├─ Set distribution facts (family, package manager, etc.)
   └─ Display deployment configuration
   │
   ▼
[Pre-flight Checks]
   │
   ├─ Verify VM doesn't already exist
   ├─ Validate virtualization support
   └─ Install required packages on hypervisor
   │
   ▼
[Download Cloud Image]
   │
   ├─ Check if image cached
   ├─ Download from official repository
   └─ Verify checksum (SHA256/SHA512)
   │
   ▼
[Create VM Storage]
   │
   ├─ Create qcow2 disk (CoW from base image)
   └─ Set proper permissions (libvirt-qemu/qemu)
   │
   ▼
[Generate Cloud-Init Configuration]
   │
   ├─ Select template based on OS family:
   │   ├─ Debian/Ubuntu → apt, ufw, unattended-upgrades
   │   ├─ RHEL family → dnf, firewalld, SELinux, dnf-automatic
   │   └─ SUSE family → zypper, firewalld
   │
   ├─ Create meta-data (hostname, instance-id)
   ├─ Create user-data (users, packages, security)
   └─ Generate cloud-init ISO
   │
   ▼
[Deploy VM]
   │
   ├─ Run virt-install with appropriate os-variant
   ├─ Attach disk and cloud-init ISO
   └─ Start VM
   │
   ▼
[Wait for Boot]
   │
   ├─ VM boots from qcow2 disk
   ├─ Cloud-init runs configuration
   ├─ Network configured via DHCP
   └─ Get IP address from libvirt
   │
   ▼
[Validation]
   │
   ├─ Test SSH connectivity
   ├─ Verify cloud-init completion
   ├─ Display VM information
   └─ System health checks
   │
   ▼
[Cleanup]
   │
   └─ Remove temporary files
   │
   ▼
[Complete]
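
The "Get IP address from libvirt" step in the workflow above can be reproduced by hand with `virsh domifaddr`. The sketch below shows one possible way to pull the first IPv4 address out of that command's tabular output. The helper name `first_ipv4` is hypothetical, and the playbook may obtain the address differently.

```shell
# Hypothetical helper: read `virsh domifaddr` output on standard input
# and print the first IPv4 address without its prefix length.
first_ipv4() {
  awk '$3 == "ipv4" { sub(/\/.*/, "", $4); print $4; exit }'
}

# Usage on the hypervisor (the VM name is a placeholder):
#   virsh domifaddr linux-guest | first_ipv4
```

The filter relies on the address row having "ipv4" in its third column, which is why the header row (whose third field is "address") is skipped.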

Prerequisites

Hypervisor Requirements

Hardware:

  • CPU with virtualization extensions (Intel VT-x or AMD-V)
  • Sufficient RAM for host + guest VMs
  • Adequate storage for cloud images and VM disks

Software:

For Debian/Ubuntu hypervisors:

apt install -y \
  libvirt-daemon-system \
  libvirt-clients \
  virtinst \
  qemu-kvm \
  qemu-utils \
  cloud-image-utils \
  genisoimage \
  python3-libvirt

For RHEL/CentOS/Rocky/Alma hypervisors:

dnf install -y \
  libvirt \
  libvirt-client \
  virt-install \
  qemu-kvm \
  qemu-img \
  cloud-utils \
  genisoimage \
  python3-libvirt

Services:

systemctl enable --now libvirtd

Network Requirements

  • Internet connectivity for cloud image downloads
  • DNS resolution working
  • libvirt default network active:
    virsh net-list
    virsh net-start default  # if not started
    virsh net-autostart default
    

Ansible Control Node

  • Ansible 2.9 or newer
  • SSH access to hypervisor
  • Python 3.x installed

Cloud Image Sources

Official Repositories

Debian:

Ubuntu:

CentOS Stream:

Rocky Linux:

AlmaLinux:

openSUSE Leap:

Manual Download Required

Red Hat Enterprise Linux (RHEL):

  • Requires: Red Hat subscription
  • Portal: https://access.redhat.com/downloads/
  • Steps:
    1. Log in to Red Hat Customer Portal
    2. Navigate to Downloads
    3. Select "Red Hat Enterprise Linux"
    4. Download KVM Guest Image
    5. Place at: /var/lib/libvirt/images/rhel-X-x86_64-kvm.qcow2

SUSE Linux Enterprise Server (SLES):

  • Requires: SUSE subscription
  • Portal: https://scc.suse.com/
  • Steps:
    1. Log in to SUSE Customer Center
    2. Download cloud image for SLES 15
    3. Place at: /var/lib/libvirt/images/sles-15-genericcloud-amd64.qcow2

Deployment Process

Basic Deployment

ansible-playbook plays/deploy-linux-vm.yml \
  -e "os_distribution=<distro-version>" \
  -e "vm_name=<name>"

Distribution Selection

The os_distribution variable selects the Linux distribution to deploy. Its value has the form <distro>-<version>, for example debian-12.

Examples:

# Debian
-e "os_distribution=debian-12"

# Ubuntu
-e "os_distribution=ubuntu-22.04"

# CentOS Stream
-e "os_distribution=centos-stream-9"

# Rocky Linux
-e "os_distribution=rocky-9"

# AlmaLinux
-e "os_distribution=almalinux-9"

# openSUSE
-e "os_distribution=opensuse-leap-15.6"
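
Because some distribution names contain hyphens themselves (centos-stream-9, opensuse-leap-15.6), the version is the part after the last hyphen. A small sketch of how a wrapper script might split and sanity-check an identifier before calling the playbook is shown here. All three function names are hypothetical, and the check only tests the shape of the value, not membership in the supported list.

```shell
# Hypothetical helpers: split "<distro>-<version>" at the last hyphen.
distro_name()    { printf '%s\n' "${1%-*}"; }
distro_version() { printf '%s\n' "${1##*-}"; }

# Accept only values that contain a hyphen followed by a digit,
# e.g. debian-12 or opensuse-leap-15.6; a bare "debian" is rejected.
is_valid_distribution_id() {
  case "$1" in
    *-[0-9]*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Example:
#   is_valid_distribution_id "centos-stream-9" && distro_name "centos-stream-9"
```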

Resource Customization

ansible-playbook plays/deploy-linux-vm.yml \
  -e "os_distribution=rocky-9" \
  -e "vm_name=database-server" \
  -e "vm_hostname=db01" \
  -e "vm_domain=production.local" \
  -e "vm_vcpus=8" \
  -e "vm_memory_mb=16384" \
  -e "vm_disk_size_gb=200"

Configuration Variables

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| os_distribution | Required | debian-12 | Distribution identifier |
| vm_name | String | linux-guest | VM name in libvirt |
| vm_hostname | String | linux-vm | Guest hostname |
| vm_domain | String | localdomain | DNS domain |
| vm_vcpus | Integer | 2 | Number of virtual CPUs |
| vm_memory_mb | Integer | 2048 | RAM in megabytes |
| vm_disk_size_gb | Integer | 20 | Disk size in gigabytes |
| vm_network | String | default | Libvirt network name |
| vm_bridge | String | virbr0 | Bridge interface |
| ansible_user_ssh_key | String | (preset) | SSH public key for ansible user |
Distribution-Specific Configuration

Debian/Ubuntu Systems

Package Manager: apt

Cloud-Init Packages:

  • sudo, vim, htop, tmux, curl, wget, rsync, git
  • python3, python3-pip, jq, bc
  • aide, auditd, chrony, ufw, lvm2
  • cloud-guest-utils, parted
  • unattended-upgrades, apt-listchanges

Security Configuration:

  • Firewall: ufw enabled, SSH allowed
  • SSH: Root login disabled, key-only auth
  • Updates: unattended-upgrades for security updates
  • Audit: auditd enabled
  • Time Sync: chrony configured

User Management:

  • ansible user → member of sudo group
  • Passwordless sudo access

Automatic Updates Configuration:

/etc/apt/apt.conf.d/50unattended-upgrades
/etc/apt/apt.conf.d/20auto-upgrades

Post-Boot Commands:

systemctl enable ssh && systemctl restart ssh
systemctl enable chrony && systemctl start chrony
ufw allow ssh && ufw --force enable
systemctl enable auditd && systemctl start auditd
growpart /dev/vda 1 && resize2fs /dev/vda1

RHEL Family Systems

Package Manager: dnf

Cloud-Init Packages:

  • sudo, vim, htop, tmux, curl, wget, rsync, git
  • python3, python3-pip, jq, bc
  • aide, audit, chrony, firewalld, lvm2
  • cloud-utils-growpart, gdisk
  • dnf-automatic
  • policycoreutils-python-utils

Security Configuration:

  • Firewall: firewalld enabled, SSH service allowed
  • SELinux: Enforcing mode
  • SSH: Root login disabled, key-only auth
  • Updates: dnf-automatic for security updates
  • Audit: auditd enabled
  • Time Sync: chronyd configured

User Management:

  • ansible user → member of wheel group
  • Passwordless sudo access

SELinux Configuration:

setenforce 1
sed -i 's/^SELINUX=.*/SELINUX=enforcing/' /etc/selinux/config

Automatic Updates Configuration:

/etc/dnf/automatic.conf
  [commands]
  upgrade_type = security
  apply_updates = yes

Post-Boot Commands:

systemctl enable sshd && systemctl restart sshd
systemctl enable chronyd && systemctl start chronyd
systemctl enable firewalld && systemctl start firewalld
firewall-cmd --permanent --add-service=ssh && firewall-cmd --reload
systemctl enable auditd && systemctl start auditd
systemctl enable dnf-automatic.timer && systemctl start dnf-automatic.timer
setenforce 1
growpart /dev/vda 1 && xfs_growfs /

SUSE Family Systems

Package Manager: zypper

Cloud-Init Packages:

  • sudo, vim, htop, tmux, curl, wget, rsync, git
  • python3, python3-pip, jq, bc
  • aide, audit, chrony, firewalld, lvm2
  • cloud-utils-growpart, gdisk

Security Configuration:

  • Firewall: firewalld enabled, SSH service allowed
  • SSH: Root login disabled, key-only auth
  • Audit: auditd enabled
  • Time Sync: chronyd configured

User Management:

  • ansible user → member of wheel group
  • Passwordless sudo access

Post-Boot Commands:

systemctl enable sshd && systemctl restart sshd
systemctl enable chronyd && systemctl start chronyd
systemctl enable firewalld && systemctl start firewalld
firewall-cmd --permanent --add-service=ssh && firewall-cmd --reload
systemctl enable auditd && systemctl start auditd
growpart /dev/vda 1 && { xfs_growfs / || resize2fs /dev/vda1 || btrfs filesystem resize max /; }
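
The last command tries each filesystem tool in turn because SUSE images may use XFS, ext4, or Btrfs for the root filesystem. An alternative, sketched here under the assumption that `findmnt` is available in the guest, is to detect the filesystem type first and run only the matching tool. The function name `grow_fs_cmd` is hypothetical, and the device path is the one used in this guide.

```shell
# Hypothetical helper: print the grow command that matches a root
# filesystem type, or fail for types this guide does not cover.
grow_fs_cmd() {
  case "$1" in
    xfs)            echo "xfs_growfs /" ;;
    ext2|ext3|ext4) echo "resize2fs /dev/vda1" ;;
    btrfs)          echo "btrfs filesystem resize max /" ;;
    *)              echo "no grow command for filesystem: $1" >&2
                    return 1 ;;
  esac
}

# Usage inside the guest, after growpart has extended the partition:
#   cmd=$(grow_fs_cmd "$(findmnt -n -o FSTYPE /)") && $cmd
```

Selecting the tool explicitly avoids the error output produced when the wrong tools are tried first.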

Security Features

Universal Security Measures

All deployed VMs, regardless of distribution, include:

  1. User Security:

    • Dedicated ansible service account
    • SSH key-based authentication only
    • Passwordless sudo (with logging)
    • Root SSH login disabled
    • Emergency console access available (default password: ChangeMe123!; change it immediately after deployment)
  2. Network Security:

    • Host-based firewall enabled and configured
    • SSH service allowed
    • Default deny policy for incoming traffic
    • Outgoing traffic allowed
  3. System Security:

    • Audit daemon (auditd) enabled
    • Automatic security updates configured
    • Time synchronization enabled (chrony)
    • File integrity monitoring installed (AIDE)
    • Secure SSH configuration applied
  4. SSH Hardening:

    PermitRootLogin no
    PasswordAuthentication no
    PubkeyAuthentication yes
    MaxAuthTries 3
    MaxSessions 10
    ClientAliveInterval 300
    ClientAliveCountMax 2
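
To confirm that a guest actually carries the SSH settings listed above, a configuration file can be compared against them. The following is a rough sketch, not part of the playbook: `check_sshd_hardening` is a hypothetical name, and the check only looks for literal "Keyword value" lines in the file it is given, so it ignores Include files and Match blocks.

```shell
# Hypothetical check: verify that each hardening directive listed in this
# guide appears in the given sshd configuration file. Prints every
# directive that is missing or set differently and returns 1 in that case.
check_sshd_hardening() {
  conf=$1
  rc=0
  while read -r key value; do
    # sshd keywords are case-insensitive, hence -i.
    if ! grep -Eiq "^[[:space:]]*${key}[[:space:]]+${value}[[:space:]]*\$" "$conf"; then
      echo "missing or different: $key $value"
      rc=1
    fi
  done <<EOF
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
MaxAuthTries 3
MaxSessions 10
ClientAliveInterval 300
ClientAliveCountMax 2
EOF
  return $rc
}

# Usage inside the guest:
#   sudo sh -c '. ./check.sh; check_sshd_hardening /etc/ssh/sshd_config'
```

On a running system, `sshd -T` prints the effective configuration and can be saved to a file and checked the same way, which also covers settings pulled in from drop-in files.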
    

Distribution-Specific Security

RHEL Family:

  • SELinux in enforcing mode
  • firewalld with rich rules support
  • dnf-automatic for security updates
  • Subscription management for certified packages (RHEL)

Debian/Ubuntu:

  • AppArmor profiles (Ubuntu)
  • UFW for simplified firewall management
  • unattended-upgrades for security updates
  • Automatic security patch installation

SUSE Family:

  • AppArmor support
  • firewalld with zones
  • YaST integration for security management

Compliance Alignment

The deployment follows CLAUDE.md security principles:

  • Principle of least privilege
  • Encryption in transit (SSH)
  • Key-based authentication
  • Automated security updates
  • System auditing enabled
  • Time synchronization
  • Firewall enabled by default
  • Regular security patching

Post-Deployment

Adding to Ansible Inventory

Debian/Ubuntu VM:

debian_servers:
  hosts:
    debian12-vm:
      ansible_host: 192.168.122.X
      ansible_user: ansible
      ansible_ssh_common_args: '-o ProxyJump=grokbox -o StrictHostKeyChecking=accept-new'
      os_distribution: debian-12
      os_family: debian
      package_manager: apt

RHEL Family VM:

rhel_servers:
  hosts:
    rocky9-vm:
      ansible_host: 192.168.122.X
      ansible_user: ansible
      ansible_ssh_common_args: '-o ProxyJump=grokbox -o StrictHostKeyChecking=accept-new'
      os_distribution: rocky-9
      os_family: rhel
      package_manager: dnf
      selinux_mode: enforcing

Initial Configuration

After deployment, run configuration management:

# Update system packages
ansible <vm_name> -m package -a "name=* state=latest" -b

# Install additional packages
ansible <vm_name> -m package -a "name=nginx state=present" -b

# Run configuration playbooks
ansible-playbook -i inventories/development/hosts.yml \
  playbooks/configure-webserver.yml \
  -l <vm_name>

Verification Steps

  1. SSH Access:

    ssh -J grokbox ansible@<VM_IP>
    
  2. Cloud-Init Status:

    cloud-init status --wait
    cloud-init status --long
    
  3. System Information:

    cat /etc/os-release
    uname -r
    
  4. Security Checks:

    # Firewall
    sudo ufw status verbose  # Debian/Ubuntu
    sudo firewall-cmd --list-all  # RHEL/SUSE
    
    # SELinux (RHEL family)
    getenforce
    
    # Audit daemon
    sudo systemctl status auditd
    
    # Automatic updates
    sudo systemctl status unattended-upgrades  # Debian/Ubuntu
    sudo systemctl status dnf-automatic.timer  # RHEL
    
  5. Disk and Memory:

    df -h
    free -h
    lsblk
    

Troubleshooting

Distribution Selection Issues

Problem: Invalid distribution error

Solution:

# List supported distributions
grep "^      " plays/deploy-linux-vm.yml | grep -E "debian-|ubuntu-|rhel-|centos-|rocky-|alma|sles-|opensuse"

# Use exact distribution identifier
-e "os_distribution=debian-12"  # Correct
-e "os_distribution=debian"     # Wrong

Cloud Image Download Failures

Problem: Image download fails or times out

Causes:

  • Network connectivity issues
  • Repository temporarily unavailable
  • Proxy configuration needed

Solutions:

# Test connectivity
curl -I https://cloud.debian.org/
curl -I https://cloud-images.ubuntu.com/

# Manual download
cd /var/lib/libvirt/images
wget <cloud_image_url>

# Configure proxy (if needed)
export https_proxy=http://proxy:port

Checksum Verification Failures

Problem: Checksum verification fails

Causes:

  • Corrupt download
  • Mismatch between image and checksum file
  • Wrong checksum type

Solutions:

# Re-download image
rm /var/lib/libvirt/images/<image-name>
ansible-playbook plays/deploy-linux-vm.yml -e "os_distribution=..." -t download

# Verify manually
cd /var/lib/libvirt/images
sha256sum <image-name>
# Compare with checksum file

VM Boot Issues

Problem: VM created but won't boot or get IP

Causes:

  • Cloud-init configuration error
  • Network misconfiguration
  • Insufficient resources

Solutions:

# Check VM status
virsh list --all
virsh dominfo <vm_name>

# View console
virsh console <vm_name>

# Check cloud-init logs (via console)
tail -f /var/log/cloud-init-output.log
journalctl -u cloud-init

# Restart VM
virsh destroy <vm_name>
virsh start <vm_name>

SSH Connection Issues

Problem: Cannot SSH to deployed VM

Causes:

  • SSH key not configured correctly
  • Firewall blocking
  • cloud-init not completed
  • Wrong IP address

Solutions:

# Verify IP address
virsh domifaddr <vm_name>

# Test connectivity
ping <VM_IP>

# Check SSH service via console
virsh console <vm_name>
# Then: systemctl status ssh   (Debian/Ubuntu)
#   or: systemctl status sshd  (RHEL/SUSE)

# Verify firewall
# Via console:
sudo ufw status  # Debian/Ubuntu
sudo firewall-cmd --list-all  # RHEL/SUSE

# Check cloud-init completion
# Via console:
cloud-init status --wait
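
SSH failures shortly after deployment are often just a matter of timing, since the guest may still be booting or running cloud-init. A simple retry loop, sketched here with a hypothetical `retry` helper, lets a script wait for the guest instead of failing on the first attempt.

```shell
# Hypothetical helper: run a command until it succeeds, at most N times,
# sleeping between attempts.
# Usage: retry <attempts> <delay_seconds> <command> [args...]
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1            # give up after the last allowed attempt
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  return 0
}

# Example: try for up to five minutes to reach the guest through the
# hypervisor, then wait for cloud-init to finish.
#   retry 30 10 ssh -J grokbox -o BatchMode=yes ansible@<VM_IP> true \
#     && ssh -J grokbox ansible@<VM_IP> cloud-init status --wait
```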

SELinux Issues (RHEL Family)

Problem: Services failing due to SELinux denials

Solutions:

# Check SELinux status
getenforce
sestatus

# View denials
sudo ausearch -m avc -ts recent

# Temporarily set to permissive (troubleshooting only)
sudo setenforce 0

# Generate policy from denials
sudo ausearch -m avc -ts recent | audit2allow -M myapp
sudo semodule -i myapp.pp

# Re-enable enforcing
sudo setenforce 1

Package Manager Issues

Debian/Ubuntu:

# Update package cache
sudo apt update

# Fix broken packages
sudo apt --fix-broken install

# Clear cache
sudo apt clean

RHEL Family:

# Update metadata
sudo dnf makecache

# Check for problems
sudo dnf check

# Clean cache
sudo dnf clean all

SUSE:

# Refresh repositories
sudo zypper refresh

# Verify
sudo zypper verify

# Clean cache
sudo zypper clean

Best Practices

Distribution Selection

  1. Use LTS versions for production:

    • Ubuntu 22.04 LTS (support until 2027)
    • Ubuntu 24.04 LTS (support until 2029)
    • RHEL/Rocky/Alma 9 (support until 2032)
  2. Match distribution to workload:

    • Web servers: Ubuntu, Debian
    • Enterprise applications: RHEL, Rocky Linux, AlmaLinux
    • Container hosts: CentOS Stream, Rocky Linux
    • Development: Ubuntu, Debian, openSUSE
  3. Consider support requirements:

    • Commercial support: RHEL, SLES
    • Community support: CentOS Stream, Rocky Linux, AlmaLinux, Debian, Ubuntu, openSUSE

Resource Allocation

Minimum Requirements:

  • 1 vCPU, 1GB RAM, 10GB disk (testing only)

Recommended for Production:

  • 2+ vCPUs, 2GB+ RAM, 20GB+ disk

Workload-Specific:

Web Server:         2-4 vCPUs,  4GB RAM,  40GB disk
Database Server:    4-8 vCPUs, 16GB RAM, 100GB+ disk
Application Server: 4-8 vCPUs,  8GB RAM,  80GB disk
Container Host:     4-8 vCPUs, 16GB RAM,  80GB disk
Development:        2-4 vCPUs,  8GB RAM,  50GB disk
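
These profiles translate directly into the vm_vcpus, vm_memory_mb, and vm_disk_size_gb variables. The sketch below shows one way to encode them. The profile names and the function `profile_vars` are hypothetical, and picking the lower bound of each vCPU range is an assumption made for the example. Memory is converted at 1024 MB per GB, matching the 16384 used for 16GB earlier in this guide.

```shell
# Hypothetical helper: print extra-vars for a workload profile, using the
# lower bound of each vCPU range from the table above.
profile_vars() {
  case "$1" in
    web)       echo "vm_vcpus=2 vm_memory_mb=4096 vm_disk_size_gb=40" ;;
    database)  echo "vm_vcpus=4 vm_memory_mb=16384 vm_disk_size_gb=100" ;;
    app)       echo "vm_vcpus=4 vm_memory_mb=8192 vm_disk_size_gb=80" ;;
    container) echo "vm_vcpus=4 vm_memory_mb=16384 vm_disk_size_gb=80" ;;
    dev)       echo "vm_vcpus=2 vm_memory_mb=8192 vm_disk_size_gb=50" ;;
    *)         echo "unknown profile: $1" >&2
               return 1 ;;
  esac
}

# Usage: ansible-playbook accepts several key=value pairs in one -e string.
#   ansible-playbook plays/deploy-linux-vm.yml \
#     -e "os_distribution=rocky-9" -e "vm_name=db-prod-01" \
#     -e "$(profile_vars database)"
```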

Security Hardening

  1. Change default passwords immediately:

    sudo passwd root  # Change from ChangeMe123!
    
  2. Configure proper SSH keys:

    • Use dedicated key per environment
    • Rotate keys regularly (90-180 days)
    • Use Ed25519 keys when possible
  3. Enable additional security features:

    • CIS benchmarks scanning
    • Intrusion detection (fail2ban, OSSEC)
    • Log forwarding to SIEM
    • Vulnerability scanning
  4. Regular updates:

    • Monitor automatic update logs
    • Schedule manual updates for major versions
    • Test updates in staging first

Operational Excellence

  1. Naming Conventions:

    • Use descriptive, meaningful VM names
    • Include purpose and environment: web-prod-01, db-dev-01
    • Document naming scheme
  2. Inventory Management:

    • Keep Ansible inventory up-to-date
    • Document VM purpose and owner
    • Track VM lifecycle
  3. Monitoring:

    • Set up monitoring for all VMs
    • Configure alerting for critical issues
    • Monitor resource usage trends
  4. Backup Strategy:

    • Regular VM backups or disk snapshots
    • Test restore procedures
    • Document backup retention policy
  5. Documentation:

    • Document VM purpose and configuration
    • Maintain runbooks for common tasks
    • Keep network diagrams current

Performance Optimization

  1. Disk I/O:

    • Use virtio drivers (already configured)
    • Consider separate disk for databases
    • Use appropriate filesystem (xfs for RHEL, ext4 for Debian)
  2. Network:

    • Use virtio network driver (already configured)
    • Consider SR-IOV for high-performance needs
    • Monitor network latency
  3. CPU:

    • Right-size vCPU allocation
    • Avoid overcommitment on critical VMs
    • Use CPU pinning for performance-critical workloads
  4. Memory:

    • Allocate sufficient RAM to avoid swapping
    • Monitor memory usage
    • Consider huge pages for databases

Document Version: 1.0
Last Updated: 2025-11-10
Maintained By: Ansible Infrastructure Team