diff --git a/docs/debian12-vm-deployment.md b/docs/debian12-vm-deployment.md new file mode 100644 index 0000000..7ace82d --- /dev/null +++ b/docs/debian12-vm-deployment.md @@ -0,0 +1,728 @@ +# Debian 12 VM Deployment Documentation + +## Overview + +This document describes the automated deployment process for Debian 12 virtual machines on the grokbox KVM/libvirt hypervisor. The deployment uses cloud-init for unattended configuration and follows the security-first principles outlined in CLAUDE.md. + +## Table of Contents + +1. [Architecture](#architecture) +2. [Prerequisites](#prerequisites) +3. [Deployment Process](#deployment-process) +4. [Configuration](#configuration) +5. [Security Features](#security-features) +6. [Post-Deployment](#post-deployment) +7. [Troubleshooting](#troubleshooting) +8. [Maintenance](#maintenance) + +## Architecture + +### Infrastructure Components + +``` +┌─────────────────────────────────────────────┐ +│ grokbox (KVM Hypervisor) │ +│ │ +│ ┌──────────────────────────────────────┐ │ +│ │ libvirt/QEMU │ │ +│ │ │ │ +│ │ ┌────────────────────────────────┐ │ │ +│ │ │ Debian 12 Guest VM │ │ │ +│ │ │ │ │ │ +│ │ │ - 2 vCPUs / 2GB RAM │ │ │ +│ │ │ - 20GB qcow2 disk │ │ │ +│ │ │ - cloud-init configured │ │ │ +│ │ │ - ansible user ready │ │ │ +│ │ │ - Security hardened │ │ │ +│ │ └────────────────────────────────┘ │ │ +│ │ │ │ +│ │ Network: virbr0 (192.168.122.0/24) │ │ +│ └──────────────────────────────────────┘ │ +└─────────────────────────────────────────────┘ +``` + +### Deployment Workflow + +``` +[Ansible Control Node] + │ + │ 1. SSH to grokbox + ▼ + [grokbox hypervisor] + │ + │ 2. Download Debian cloud image + ├─ 3. Verify checksums + ├─ 4. Create VM disk (qcow2) + ├─ 5. Generate cloud-init ISO + ├─ 6. Create VM with virt-install + │ + │ 7. VM boots with cloud-init + ▼ + [Debian 12 VM] + │ + ├─ 8. Create ansible user + ├─ 9. Configure SSH + ├─ 10. Install packages + ├─ 11. Security hardening + └─ 12. 
System ready +``` + +## Prerequisites + +### Hypervisor Requirements + +On **grokbox**, ensure the following are present: + +1. **Virtualization Support** + ```bash + # Verify CPU virtualization + grep -Ec '(vmx|svm)' /proc/cpuinfo # Should be > 0 + + # Verify KVM module loaded + lsmod | grep kvm + ``` + +2. **Required Packages** + - libvirt-daemon-system + - libvirt-clients + - virtinst + - qemu-kvm + - qemu-utils + - cloud-image-utils + - genisoimage + - python3-libvirt + +3. **Sufficient Resources** + - Storage: ~25GB available in `/var/lib/libvirt/images/` + - Memory: Enough free RAM for VM allocation + - Network: libvirt default network configured + +4. **libvirtd Service Running** + ```bash + systemctl status libvirtd + ``` + +### Ansible Control Node Requirements + +1. Ansible 2.9 or newer +2. SSH access to grokbox hypervisor +3. SSH key configured for grok user +4. Python 3.x installed + +### Network Requirements + +- Connectivity to Debian cloud image repository +- DNS resolution working +- Default libvirt network (virbr0) configured and active + +## Deployment Process + +### Step 1: Pre-flight Checks + +The playbook performs the following validations: + +- **VM Name Uniqueness**: Ensures no VM with the same name exists +- **Virtualization Support**: Validates QEMU/KVM capabilities +- **Package Installation**: Installs required tools if missing +- **Service Status**: Verifies libvirtd is running + +### Step 2: Image Management + +#### Download Debian Cloud Image + +- **Source**: https://cloud.debian.org/images/cloud/bookworm/latest/ +- **Image**: debian-12-generic-amd64.qcow2 +- **Cache Location**: `/var/lib/libvirt/images/debian-12-generic-amd64.qcow2` +- **Checksum Verification**: SHA512SUMS validated + +The base image is downloaded once and cached for subsequent deployments. 
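
The checksum verification listed above can be sketched in shell. This is a minimal illustration under stated assumptions, not the playbook's actual task: it assumes GNU coreutils `sha512sum`, and that the `SHA512SUMS` file published alongside the image has already been saved into the same directory as the image. `verify_image` is a hypothetical helper name.

```shell
#!/bin/sh
# verify_image DIR IMAGE: check IMAGE against the SHA512SUMS file in DIR.
# Hypothetical helper for illustration; the playbook's own task may differ.
verify_image() {
    dir=$1
    image=$2
    # Run in a subshell so the caller's working directory is left unchanged.
    (
        cd "$dir" || exit 1
        # Keep only the entry whose filename field matches exactly, then let
        # sha512sum recompute and compare. With no matching entry,
        # sha512sum -c receives empty input and exits non-zero.
        awk -v f="$image" '$2 == f' SHA512SUMS | sha512sum -c --status -
    )
}
```

A possible invocation on the hypervisor would be `verify_image /var/lib/libvirt/images debian-12-generic-amd64.qcow2 && echo "checksum OK"`. Filtering to a single entry avoids failures for other images listed in `SHA512SUMS` that were never downloaded.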
+ +#### Create VM Disk + +A new copy-on-write (CoW) disk is created using qemu-img: + +```bash +qemu-img create -f qcow2 \ + -F qcow2 \ + -b /var/lib/libvirt/images/debian-12-generic-amd64.qcow2 \ + /var/lib/libvirt/images/debian12-guest.qcow2 \ + 20G +``` + +This creates a thin-provisioned disk backed by the cloud image. + +### Step 3: Cloud-Init Configuration + +Two configuration files are generated: + +#### meta-data +```yaml +instance-id: debian12-guest +local-hostname: debian12 +``` + +#### user-data +Comprehensive cloud-init configuration including: + +- **User Management**: Creates ansible user with SSH keys +- **Security Configuration**: SSH hardening, firewall setup +- **Package Installation**: Essential and security packages +- **System Configuration**: Time sync, locale, timezone +- **Automatic Updates**: Unattended security upgrades + +#### ISO Generation + +The configuration files are packaged into a bootable ISO: + +```bash +genisoimage -output debian12-guest-cloud-init.iso \ + -volid cidata -joliet -rock \ + user-data meta-data +``` + +### Step 4: VM Creation + +VM is created using virt-install: + +```bash +virt-install \ + --name debian12-guest \ + --memory 2048 \ + --vcpus 2 \ + --disk path=/var/lib/libvirt/images/debian12-guest.qcow2,format=qcow2,bus=virtio \ + --disk path=/var/lib/libvirt/images/debian12-guest-cloud-init.iso,device=cdrom \ + --network network=default,model=virtio \ + --os-variant debian11 \ + --graphics none \ + --console pty,target_type=serial \ + --import \ + --noautoconsole +``` + +### Step 5: Boot and Initialization + +1. **VM Boots**: Starts from the qcow2 disk +2. **Cloud-Init Runs**: Reads configuration from ISO +3. **System Configuration**: Applies all settings +4. **Network Configuration**: Obtains IP via DHCP +5. **Package Updates**: Downloads and installs updates +6. 
**Service Initialization**: Starts all configured services + +**Typical boot time**: 60-90 seconds + +### Step 6: Validation + +The playbook validates: + +- VM is running and accessible +- IP address assigned +- SSH port (22) accepting connections +- cloud-init completed successfully +- System resources available + +### Step 7: Post-Deployment Configuration + +Optional second play that: + +- Waits for cloud-init completion +- Gathers system facts +- Displays system information +- Validates disk and memory usage + +## Configuration + +### Default Configuration + +```yaml +# VM Specifications +vm_name: "debian12-guest" +vm_hostname: "debian12" +vm_domain: "localdomain" +vm_vcpus: 2 +vm_memory_mb: 2048 +vm_disk_size_gb: 20 + +# Network +vm_network: "default" +vm_bridge: "virbr0" + +# Storage +vm_disk_path: "/var/lib/libvirt/images/{{ vm_name }}.qcow2" +cloud_init_iso_path: "/var/lib/libvirt/images/{{ vm_name }}-cloud-init.iso" +``` + +### Customization Examples + +#### High-Performance VM + +```bash +ansible-playbook plays/deploy-debian12-vm.yml \ + -e "vm_name=app-server" \ + -e "vm_vcpus=8" \ + -e "vm_memory_mb=16384" \ + -e "vm_disk_size_gb=100" +``` + +#### Development VM + +```bash +ansible-playbook plays/deploy-debian12-vm.yml \ + -e "vm_name=dev-workstation" \ + -e "vm_vcpus=4" \ + -e "vm_memory_mb=8192" \ + -e "vm_disk_size_gb=50" \ + -e "vm_hostname=devbox" \ + -e "vm_domain=dev.local" +``` + +#### Custom SSH Key + +```bash +ansible-playbook plays/deploy-debian12-vm.yml \ + -e "vm_name=secure-vm" \ + -e "ansible_user_ssh_key='ssh-ed25519 AAAA...'" +``` + +### Variable Precedence + +Variables can be set in order of precedence: + +1. **Command-line** (`-e` flag) - Highest +2. **Playbook vars section** +3. **Inventory host_vars** +4. **Inventory group_vars** +5. 
**Defaults in playbook** - Lowest + +## Security Features + +### User Management + +- **ansible user**: Non-root service account + - Passwordless sudo access + - SSH key authentication only + - Member of sudo group + - Home directory: `/home/ansible` + +- **root user**: Console access only + - SSH login disabled + - Password set for emergency console access + - Remote access blocked + +### SSH Hardening + +Configuration in `/etc/ssh/sshd_config.d/99-security.conf`: + +``` +PermitRootLogin no +PasswordAuthentication no +PubkeyAuthentication yes +MaxAuthTries 3 +MaxSessions 10 +ClientAliveInterval 300 +ClientAliveCountMax 2 +``` + +### Firewall Configuration + +- **UFW (Uncomplicated Firewall)** enabled by default +- Default policy: deny incoming, allow outgoing +- SSH (port 22) allowed +- Additional rules can be added post-deployment + +### Automatic Security Updates + +Unattended-upgrades configured for: + +- Automatic installation of security updates +- Daily update checks +- Automatic cleanup of old kernels +- Email notifications (if configured) +- **No automatic reboot** (requires manual intervention) + +### Audit and Monitoring + +- **auditd**: System call auditing enabled +- **aide**: File integrity monitoring installed +- **chrony**: Time synchronization configured +- **Logging**: All cloud-init output logged + +### Compliance Features + +Aligned with CLAUDE.md security requirements: + +- ✅ Principle of least privilege +- ✅ Encryption in transit (SSH) +- ✅ Key-based authentication +- ✅ Automated security updates +- ✅ System auditing enabled +- ✅ Time synchronization +- ✅ Firewall enabled by default + +## Post-Deployment + +### Adding to Inventory + +Update your Ansible inventory: + +```yaml +# inventories/development/hosts.yml +kvm_guests: + children: + application_servers: + hosts: + debian12-guest: + ansible_host: 192.168.122.X + ansible_user: ansible + ansible_ssh_common_args: '-o ProxyJump=grokbox -o StrictHostKeyChecking=accept-new' + 
ansible_python_interpreter: /usr/bin/python3 + host_description: "Application Server - Debian 12" + host_role: application + host_type: virtual_machine + hypervisor: grokbox + vm_vcpus: 2 + vm_memory_mb: 2048 + autostart: true +``` + +### Initial Access + +```bash +# Get VM IP address +ssh grokbox "virsh domifaddr debian12-guest" + +# SSH to VM via ProxyJump +ssh -J grokbox ansible@192.168.122.X + +# Or add to ~/.ssh/config +Host debian12-guest + HostName 192.168.122.X + User ansible + ProxyJump grokbox + StrictHostKeyChecking accept-new +``` + +### Configuration Management + +Run additional roles or playbooks: + +```bash +# Example: Configure web server +ansible-playbook -i inventories/development/hosts.yml \ + playbooks/configure-webserver.yml \ + -l debian12-guest + +# Example: Security hardening +ansible-playbook -i inventories/development/hosts.yml \ + playbooks/security-hardening.yml \ + -l debian12-guest +``` + +### VM Management Commands + +```bash +# Start VM +virsh start debian12-guest + +# Shutdown VM gracefully +virsh shutdown debian12-guest + +# Force shutdown +virsh destroy debian12-guest + +# Reboot VM +virsh reboot debian12-guest + +# Enable autostart +virsh autostart debian12-guest + +# Disable autostart +virsh autostart debian12-guest --disable + +# VM status +virsh dominfo debian12-guest + +# VM resource usage +virsh domstats debian12-guest + +# Console access +virsh console debian12-guest +``` + +## Troubleshooting + +### Common Issues + +#### 1. VM Already Exists + +**Error**: VM with name already exists + +**Solution**: +```bash +# Check existing VMs +virsh list --all + +# Remove existing VM +virsh destroy debian12-guest # if running +virsh undefine debian12-guest --remove-all-storage +``` + +#### 2. 
Image Download Fails + +**Error**: Failed to download cloud image + +**Causes**: +- Network connectivity issues +- Proxy configuration +- DNS resolution problems + +**Solution**: +```bash +# Test connectivity +curl -I https://cloud.debian.org + +# Manual download +cd /var/lib/libvirt/images +wget https://cloud.debian.org/images/cloud/bookworm/latest/debian-12-generic-amd64.qcow2 + +# Re-run playbook +ansible-playbook plays/deploy-debian12-vm.yml -t deploy +``` + +#### 3. VM Won't Get IP Address + +**Error**: IP address not assigned after 10 retries + +**Causes**: +- DHCP server not running +- Network misconfiguration +- VM network interface issues + +**Solution**: +```bash +# Check libvirt network +virsh net-list --all +virsh net-info default +virsh net-start default # if not started + +# Check VM network interface +virsh domiflist debian12-guest + +# Check DHCP leases +virsh net-dhcp-leases default + +# Access console to troubleshoot +virsh console debian12-guest +# Check: ip addr, systemctl status networking +``` + +#### 4. SSH Connection Failed + +**Error**: SSH connection timeout or refused + +**Causes**: +- SSH service not started +- Firewall blocking +- Wrong IP address +- cloud-init not completed + +**Solution**: +```bash +# Verify VM is running +virsh list + +# Check cloud-init status via console +virsh console debian12-guest +# Run: cloud-init status --wait + +# Check SSH service +# Via console: systemctl status ssh + +# Check firewall +# Via console: ufw status + +# Verify SSH key +ssh-add -l +``` + +#### 5. 
Insufficient Resources + +**Error**: Failed to allocate memory or storage + +**Solution**: +```bash +# Check available resources +free -h +df -h /var/lib/libvirt/images/ + +# Adjust VM resources +ansible-playbook plays/deploy-debian12-vm.yml \ + -e "vm_memory_mb=1024" \ + -e "vm_disk_size_gb=10" +``` + +### Debug Mode + +Enable verbose logging: + +```bash +# Ansible verbose mode +ansible-playbook plays/deploy-debian12-vm.yml -vvv + +# Check cloud-init logs on VM +virsh console debian12-guest +# Then: tail -f /var/log/cloud-init-output.log + +# Check libvirt logs +journalctl -u libvirtd -f +``` + +### Health Checks + +```bash +# Verify VM health +virsh dominfo debian12-guest +virsh domstats debian12-guest + +# Network connectivity (-c 3 so ping stops after three probes) +ping -c 3 $(virsh domifaddr debian12-guest | grep -oP '(\d{1,3}\.){3}\d{1,3}' | head -1) + +# SSH connectivity +ssh -J grokbox ansible@$(virsh domifaddr debian12-guest | grep -oP '(\d{1,3}\.){3}\d{1,3}' | head -1) "echo 'VM is accessible'" +``` + +## Maintenance + +### Updating the Base Image + +Periodically update the cached Debian cloud image. Because VM disks are copy-on-write overlays backed by this image, remove it only when no existing VM disk still depends on it: + +```bash +# Remove old image (first check each VM disk with: qemu-img info --backing-chain) +ssh grokbox "rm /var/lib/libvirt/images/debian-12-generic-amd64.qcow2" + +# Download latest +ansible-playbook plays/deploy-debian12-vm.yml -t download,verify +``` + +### VM Snapshots + +Create snapshots before major changes: + +```bash +# Create snapshot +virsh snapshot-create-as debian12-guest \ + snapshot1 \ + "Before application deployment" + +# List snapshots +virsh snapshot-list debian12-guest + +# Revert to snapshot +virsh snapshot-revert debian12-guest snapshot1 + +# Delete snapshot +virsh snapshot-delete debian12-guest snapshot1 +``` + +### Backup and Restore + +#### Backup VM + +```bash +# Stop VM (shutdown is asynchronous; wait until "virsh domstate debian12-guest" reports "shut off") +virsh shutdown debian12-guest + +# Backup disk (an overlay; restoring it also requires the base image it is backed by) +cp /var/lib/libvirt/images/debian12-guest.qcow2 \ + /backup/debian12-guest-$(date +%Y%m%d).qcow2 + +# Backup XML config +virsh dumpxml debian12-guest > /backup/debian12-guest.xml + +# Start VM +virsh start debian12-guest 
+``` + +#### Restore VM + +```bash +# Copy disk back +cp /backup/debian12-guest-20241110.qcow2 \ + /var/lib/libvirt/images/debian12-guest.qcow2 + +# Define VM from XML +virsh define /backup/debian12-guest.xml + +# Start VM +virsh start debian12-guest +``` + +### Resize VM Disk + +```bash +# Shutdown VM +virsh shutdown debian12-guest + +# Resize disk +qemu-img resize /var/lib/libvirt/images/debian12-guest.qcow2 +10G + +# Start VM +virsh start debian12-guest + +# On VM: resize partition and filesystem +growpart /dev/vda 1 +resize2fs /dev/vda1 +``` + +### Resource Adjustment + +Modify VM resources: + +```bash +# Set maximum memory (requires shutdown) +virsh setmaxmem debian12-guest 4194304 --config + +# Set current memory (can be done live) +virsh setmem debian12-guest 4194304 + +# Set vCPUs (requires shutdown) +virsh setvcpus debian12-guest 4 --config --maximum +virsh setvcpus debian12-guest 4 --config +``` + +## Best Practices + +1. **Naming Convention**: Use descriptive VM names indicating purpose +2. **Resource Planning**: Right-size VMs to avoid waste +3. **Documentation**: Document VM purpose and configuration +4. **Monitoring**: Set up monitoring for critical VMs +5. **Backups**: Regular backups of important VMs +6. **Updates**: Keep VMs updated with security patches +7. **Inventory**: Maintain accurate Ansible inventory +8. **Tags**: Use libvirt tags for organization +9. **Networking**: Use appropriate network isolation +10. **Testing**: Test deployment process in development first + +## References + +- [CLAUDE.md](../CLAUDE.md) - Infrastructure guidelines +- [Cheatsheet](../cheatsheets/deploy-debian12-vm.md) - Quick reference +- [Debian Cloud Images](https://cloud.debian.org/images/cloud/) +- [cloud-init Documentation](https://cloudinit.readthedocs.io/) +- [libvirt Documentation](https://libvirt.org/docs.html) +- [virt-install man page](https://linux.die.net/man/1/virt-install) + +## Support and Contact + +For issues or questions: + +1. 
Check troubleshooting section above +2. Review cloud-init logs: `/var/log/cloud-init.log` +3. Review libvirt logs: `journalctl -u libvirtd` +4. Consult Ansible playbook: `plays/deploy-debian12-vm.yml` + +--- + +**Document Version**: 1.0 +**Last Updated**: 2025-11-10 +**Maintained By**: Ansible Infrastructure Team diff --git a/docs/inventory.md b/docs/inventory.md new file mode 100644 index 0000000..b1a911b --- /dev/null +++ b/docs/inventory.md @@ -0,0 +1,516 @@ +# Ansible Inventory Configuration + +This document describes the dynamic and static inventory configurations for the Ansible infrastructure. + +## Table of Contents + +1. [Overview](#overview) +2. [Inventory Structure](#inventory-structure) +3. [Dynamic Inventory Solutions](#dynamic-inventory-solutions) +4. [Static/Hybrid Inventory](#statichybrid-inventory) +5. [Usage Examples](#usage-examples) +6. [Troubleshooting](#troubleshooting) + +--- + +## Overview + +Per the CLAUDE.md guidelines, this infrastructure uses **dynamic inventories** as the primary inventory source, with static inventories permitted for development environments only. 
+ +### Available Inventory Solutions + +| Solution | Type | Use Case | Status | +|----------|------|----------|--------| +| SSH Config Parser | Dynamic | Quick discovery from SSH config | ✅ Active | +| Libvirt/KVM Plugin | Dynamic | Real-time VM discovery | ✅ Active | +| Static YAML | Static/Hybrid | Development environment | ✅ Active | + +--- + +## Inventory Structure + +``` +inventories/ +├── production/ # Production environment (dynamic only) +│ ├── group_vars/ +│ │ └── all.yml +│ └── [dynamic inventory configs] +├── staging/ # Staging environment (dynamic only) +│ ├── group_vars/ +│ │ └── all.yml +│ └── [dynamic inventory configs] +└── development/ # Development environment + ├── hosts.yml # Static/hybrid inventory + ├── libvirt_kvm.yml # Libvirt dynamic config + ├── group_vars/ + │ ├── all.yml + │ ├── kvm_guests.yml + │ └── hypervisors.yml + └── host_vars/ +``` + +--- + +## Dynamic Inventory Solutions + +### 1. SSH Config Parser (`ssh_config_inventory.py`) + +**Location:** `/opt/ansible/plugins/inventory/ssh_config_inventory.py` + +#### Description +Parses `~/.ssh/config` to automatically generate Ansible inventory with proper grouping and connection parameters. 
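
The host-discovery part of this description can be sketched in shell. This is only an illustration of reading `Host` entries, under the assumption that entries use the common `Host alias [alias...]` form; it is not the logic of `ssh_config_inventory.py`, which also groups hosts and emits connection variables. `list_ssh_hosts` is a hypothetical helper name.

```shell
#!/bin/sh
# list_ssh_hosts [CONFIG]: print one concrete Host alias per line.
# Hypothetical helper for illustration; defaults to ~/.ssh/config.
list_ssh_hosts() {
    config=${1:-"$HOME/.ssh/config"}
    awk '
        # The Host keyword is case-insensitive and may list several aliases.
        tolower($1) == "host" {
            for (i = 2; i <= NF; i++) {
                # Skip wildcard and negated patterns; they are not real hosts.
                if ($i !~ /[*?!]/) print $i
            }
        }
    ' "$config"
}
```

Running `list_ssh_hosts` with no argument reads the current user's SSH config. Wildcard entries such as `Host *` are skipped because they carry defaults rather than naming a machine.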
+ +#### Features +- Automatic host discovery from SSH config +- Intelligent grouping by host characteristics +- ProxyJump support for nested VM access +- No external dependencies (pure Python) + +#### Usage + +```bash +# List all hosts and groups +python3 plugins/inventory/ssh_config_inventory.py --list + +# Get variables for specific host +python3 plugins/inventory/ssh_config_inventory.py --host pihole + +# Use with ansible commands +ansible all -i plugins/inventory/ssh_config_inventory.py --list-hosts + +# Use with playbooks +ansible-playbook -i plugins/inventory/ssh_config_inventory.py site.yml +``` + +#### Host Categorization Logic + +| Category | Criteria | +|----------|----------| +| `external_hosts` | Public IPs, no ProxyJump, non-ansible user | +| `hypervisors` | ForwardAgent enabled, specific users (grok) | +| `dns_servers` | ansible user + ProxyJump + hostname contains 'pihole'/'dns' | +| `mail_servers` | ansible user + ProxyJump + hostname contains 'mail'/'mx' | +| `development` | ansible user + ProxyJump + hostname contains 'dev'/'test'/'derp' | + +#### Generated Inventory Structure + +```json +{ + "all": { + "children": ["external_hosts", "hypervisors", "kvm_guests"] + }, + "external_hosts": { + "hosts": ["odin"] + }, + "hypervisors": { + "hosts": ["grokbox"] + }, + "kvm_guests": { + "children": ["dns_servers", "mail_servers", "development", "uncategorized"], + "vars": { + "ansible_user": "ansible", + "ansible_ssh_common_args": "-o StrictHostKeyChecking=accept-new" + } + }, + "_meta": { + "hostvars": { ... } + } +} +``` + +--- + +### 2. Libvirt/KVM Dynamic Inventory (`libvirt_kvm.py`) + +**Location:** `/opt/ansible/plugins/inventory/libvirt_kvm.py` + +#### Description +Queries libvirt hypervisors directly to discover KVM guest VMs in real-time, including their state, resources, and network configuration. 
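
As a rough shell analogue of the state detection described here, the sketch below sorts VMs into running and non-running groups from the table that `virsh list --all` prints. It is an assumption-laden illustration: `libvirt_kvm.py` queries the libvirt API rather than parsing `virsh` output, and `group_vms` is a hypothetical helper name. The group names follow the `running_vms` and `stopped_vms` groups this document defines.

```shell
#!/bin/sh
# group_vms: read `virsh list --all` output on stdin and print
# "<group> <vm-name>" pairs. Hypothetical helper for illustration.
group_vms() {
    awk '
        # Skip the two header lines and any blank trailing line.
        NR > 2 && NF >= 3 {
            name = $2
            # The state can be more than one word, e.g. "shut off".
            state = $3
            for (i = 4; i <= NF; i++) state = state " " $i
            group = (state == "running") ? "running_vms" : "stopped_vms"
            print group, name
        }
    '
}
```

On the hypervisor this could be used as `virsh list --all | group_vms`. Any state other than `running` (shut off, paused, and so on) lands in `stopped_vms`.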
+ +#### Features +- Real-time VM discovery via libvirt API +- VM state detection (running, stopped, paused) +- Automatic IP address detection +- Resource information (vCPUs, memory, networks) +- Multiple hypervisor support +- ProxyJump configuration + +#### Requirements + +```bash +# Debian/Ubuntu +apt-get install python3-libvirt + +# RHEL/Fedora/Rocky +dnf install python3-libvirt +``` + +#### Configuration + +Set environment variables or use configuration file: + +```bash +# Environment variables +export LIBVIRT_DEFAULT_URI="qemu+ssh://grok@grok.home.serneels.xyz/system" +export LIBVIRT_HYPERVISOR_NAME="grokbox" +``` + +Or use YAML configuration file: `inventories/development/libvirt_kvm.yml` + +#### Usage + +```bash +# List all VMs +python3 plugins/inventory/libvirt_kvm.py --list + +# Get specific VM details +python3 plugins/inventory/libvirt_kvm.py --host mymx + +# Use with ansible +ansible running_vms -i plugins/inventory/libvirt_kvm.py -m ping + +# Use with playbooks +ansible-playbook -i plugins/inventory/libvirt_kvm.py playbooks/update.yml +``` + +#### Generated Groups + +| Group | Description | +|-------|-------------| +| `hypervisors` | KVM hypervisor hosts | +| `kvm_guests` | All guest VMs | +| `running_vms` | VMs in running state | +| `stopped_vms` | VMs not running (shutoff, paused, etc.) | + +#### Host Variables + +Each VM includes: +- `vm_name`: VM hostname +- `vm_uuid`: Libvirt UUID +- `vm_state`: Current state (running, shutoff, etc.) 
+- `vm_vcpus`: Number of virtual CPUs +- `vm_memory_mb`: Memory allocation in MB +- `vm_networks`: Network interface details +- `ansible_host`: IP address (if available) +- `ansible_ssh_common_args`: ProxyJump configuration +- `hypervisor`: Parent hypervisor name + +#### Example Output + +```json +{ + "running_vms": { + "hosts": ["mymx", "pihole", "derp"] + }, + "_meta": { + "hostvars": { + "pihole": { + "vm_name": "pihole", + "vm_uuid": "6d714c93-16fb-41c8-8ef8-9001f9066b3a", + "vm_state": "running", + "vm_vcpus": 2, + "vm_memory_mb": 2048, + "ansible_host": "192.168.122.12", + "ansible_ssh_common_args": "-o ProxyJump=grokbox -o StrictHostKeyChecking=accept-new", + "hypervisor": "grokbox" + } + } + } +} +``` + +--- + +## Static/Hybrid Inventory + +**Location:** `/opt/ansible/inventories/development/hosts.yml` + +### Description +Manually maintained static inventory for the development environment with detailed host metadata and configuration. + +### Structure + +```yaml +all: + children: + external_hosts: # Public-facing hosts + hypervisors: # KVM hypervisor hosts + kvm_guests: # Virtual machine guests + children: + dns_servers: + mail_servers: + development: + uncategorized: +``` + +### Group Variables + +Variables are defined in `group_vars/` directory: + +- **`all.yml`**: Global variables for all hosts +- **`kvm_guests.yml`**: Common VM configuration (LVM, networking, ProxyJump) +- **`hypervisors.yml`**: Hypervisor-specific settings (libvirt, QEMU) + +### Host Variables + +Host-specific variables can be placed in `host_vars/` directory: + +``` +host_vars/ +├── pihole.yml +├── mymx.yml +└── derp.yml +``` + +### Usage + +```bash +# List hosts +ansible all -i inventories/development/hosts.yml --list-hosts + +# Run playbook +ansible-playbook -i inventories/development/hosts.yml site.yml + +# Target specific group +ansible dns_servers -i inventories/development/hosts.yml -m ping +``` + +--- + +## Usage Examples + +### Example 1: List All Hosts (Dynamic) + +```bash 
+# Using SSH config parser +ansible all -i plugins/inventory/ssh_config_inventory.py --list-hosts + +# Using libvirt inventory +ansible all -i plugins/inventory/libvirt_kvm.py --list-hosts +``` + +### Example 2: Ping All Running VMs + +```bash +ansible running_vms -i plugins/inventory/libvirt_kvm.py -m ping +``` + +### Example 3: Run Playbook Against KVM Guests + +```bash +ansible-playbook -i inventories/development/hosts.yml \ + --limit kvm_guests \ + playbooks/system-update.yml +``` + +### Example 4: Check Host Variables + +```bash +# Using dynamic inventory +ansible-inventory -i plugins/inventory/libvirt_kvm.py --host pihole + +# Using static inventory +ansible-inventory -i inventories/development/hosts.yml --host pihole --yaml +``` + +### Example 5: Multiple Inventory Sources + +You can combine multiple inventory sources: + +```bash +ansible-playbook -i inventories/development/hosts.yml \ + -i plugins/inventory/libvirt_kvm.py \ + site.yml +``` + +### Example 6: Filter by Group + +```bash +# Target only mail servers +ansible mail_servers -i plugins/inventory/ssh_config_inventory.py -m setup + +# Target only hypervisors +ansible hypervisors -i inventories/development/hosts.yml -m shell -a "virsh list --all" +``` + +--- + +## Ansible Configuration + +Configure default inventory in `ansible.cfg`: + +```ini +[defaults] +inventory = ./inventories/development/hosts.yml +# Or use dynamic: +# inventory = ./plugins/inventory/libvirt_kvm.py + +# Enable multiple inventory sources +# inventory = ./inventories/development/hosts.yml,./plugins/inventory/libvirt_kvm.py + +# Inventory plugins path +inventory_plugins = ./plugins/inventory + +# Enable fact caching for performance +gathering = smart +fact_caching = jsonfile +fact_caching_connection = /tmp/ansible_facts +fact_caching_timeout = 86400 +``` + +--- + +## Troubleshooting + +### SSH Config Parser Issues + +**Problem:** Hosts not appearing in inventory + +**Solution:** +- Check `~/.ssh/config` exists and is readable +- 
Verify Host declarations are properly formatted +- Run with `--list` to see parsed output +- Check for Python syntax errors + +**Problem:** Incorrect host categorization + +**Solution:** +- Review categorization logic in `_categorize_host()` method +- Add custom categorization rules +- Use static inventory for specific grouping needs + +### Libvirt Inventory Issues + +**Problem:** `python3-libvirt` not installed + +**Solution:** +```bash +# Debian/Ubuntu +sudo apt-get install python3-libvirt + +# RHEL/Rocky/Fedora +sudo dnf install python3-libvirt +``` + +**Problem:** Connection to hypervisor fails + +**Solution:** +- Verify SSH access to hypervisor: `ssh grok@grok.home.serneels.xyz` +- Check libvirt URI: `virsh -c qemu+ssh://grok@grok.home.serneels.xyz/system list` +- Ensure SSH keys are properly configured +- Check SSH agent forwarding if needed + +**Problem:** VMs discovered but no IP addresses + +**Solution:** +- VMs may not have DHCP leases yet +- Check VM is fully booted: `virsh dominfo VM_NAME` +- Manually query: `virsh domifaddr VM_NAME` +- Use static inventory with known IP addresses + +### Static Inventory Issues + +**Problem:** YAML syntax errors + +**Solution:** +- Validate YAML syntax: `yamllint inventories/development/hosts.yml` +- Check indentation (use 2 spaces) +- Verify with: `ansible-inventory -i inventories/development/hosts.yml --list` + +**Problem:** Variables not being applied + +**Solution:** +- Check variable precedence order +- Verify `group_vars/` and `host_vars/` file names match group/host names +- Use `ansible-inventory --host HOSTNAME` to debug variable merging + +### General Debugging + +Replace `INVENTORY` with an inventory file or script path and `HOSTNAME` with an inventory host name: + +```bash +# Verify inventory parsing +ansible-inventory -i INVENTORY --list + +# Check specific host variables +ansible-inventory -i INVENTORY --host HOSTNAME + +# Graph inventory structure +ansible-inventory -i INVENTORY --graph + +# Test connectivity +ansible all -i INVENTORY -m ping -vvv + +# Dry run playbook +ansible-playbook -i INVENTORY site.yml --check --diff +``` + +--- + +## Security Considerations + +### SSH Config 
Parser +- ✅ No credentials stored in inventory +- ✅ Uses existing SSH configuration +- ⚠️ Ensure `~/.ssh/config` has proper permissions (600) + +### Libvirt Inventory +- ✅ Uses SSH key authentication +- ✅ No passwords in configuration +- ⚠️ Requires SSH access to hypervisor +- ⚠️ Libvirt connection string may be logged + +### Static Inventory +- ✅ Version controlled and auditable +- ⚠️ Use Ansible Vault for sensitive variables +- ⚠️ Never commit unencrypted credentials + +### Best Practices +- Use Ansible Vault for secrets: `ansible-vault encrypt group_vars/all/vault.yml` +- Rotate SSH keys regularly (90-180 days per CLAUDE.md) +- Use ProxyJump/bastion hosts for nested VM access +- Enable SSH ControlMaster for connection reuse +- Implement inventory caching for large infrastructures + +--- + +## Performance Optimization + +### Caching +Enable fact caching in `ansible.cfg`: +```ini +[defaults] +gathering = smart +fact_caching = jsonfile +fact_caching_connection = /tmp/ansible_facts +fact_caching_timeout = 86400 +``` + +### Parallelism +Adjust fork count: +```ini +[defaults] +forks = 20 +``` + +### SSH Connection Reuse +Configure ControlMaster in `~/.ssh/config`: +``` +Host * + ControlMaster auto + ControlPath ~/.ssh/sockets/%r@%h-%p + ControlPersist 600s +``` + +--- + +## References + +- [Ansible Dynamic Inventory](https://docs.ansible.com/ansible/latest/user_guide/intro_dynamic_inventory.html) +- [Libvirt Python API](https://libvirt.org/python.html) +- [SSH Config Documentation](https://man.openbsd.org/ssh_config) +- [CLAUDE.md Guidelines](/opt/ansible/CLAUDE.md) + +--- + +**Document Version:** 1.0.0 +**Last Updated:** 2025-11-10 +**Maintainer:** Ansible Infrastructure Team diff --git a/docs/linux-vm-deployment.md b/docs/linux-vm-deployment.md new file mode 100644 index 0000000..acaffeb --- /dev/null +++ b/docs/linux-vm-deployment.md @@ -0,0 +1,944 @@ +# Multi-Distribution Linux VM Deployment Documentation + +## Overview + +This document describes the automated 
deployment process for multiple Linux distributions on KVM/libvirt hypervisors. The deployment supports major server distributions including Debian, Ubuntu, RHEL, CentOS Stream, Rocky Linux, AlmaLinux, SLES, and openSUSE Leap. + +## Table of Contents + +1. [Supported Distributions](#supported-distributions) +2. [Architecture](#architecture) +3. [Prerequisites](#prerequisites) +4. [Cloud Image Sources](#cloud-image-sources) +5. [Deployment Process](#deployment-process) +6. [Distribution-Specific Configuration](#distribution-specific-configuration) +7. [Security Features](#security-features) +8. [Post-Deployment](#post-deployment) +9. [Troubleshooting](#troubleshooting) +10. [Best Practices](#best-practices) + +## Supported Distributions + +### Debian Family + +| Distribution | Version | Package Manager | Firewall | Cloud Image | +|--------------|---------|----------------|----------|-------------| +| Debian | 11 (Bullseye) | apt | ufw | ✅ Auto-download | +| Debian | 12 (Bookworm) | apt | ufw | ✅ Auto-download | +| Ubuntu | 20.04 LTS (Focal) | apt | ufw | ✅ Auto-download | +| Ubuntu | 22.04 LTS (Jammy) | apt | ufw | ✅ Auto-download | +| Ubuntu | 24.04 LTS (Noble) | apt | ufw | ✅ Auto-download | + +### RHEL Family + +| Distribution | Version | Package Manager | Firewall | SELinux | Cloud Image | +|--------------|---------|----------------|----------|---------|-------------| +| RHEL | 8, 9 | dnf | firewalld | Enforcing | ⚠️ Manual download | +| CentOS Stream | 8, 9 | dnf | firewalld | Enforcing | ✅ Auto-download | +| Rocky Linux | 8, 9 | dnf | firewalld | Enforcing | ✅ Auto-download | +| AlmaLinux | 8, 9 | dnf | firewalld | Enforcing | ✅ Auto-download | + +### SUSE Family + +| Distribution | Version | Package Manager | Firewall | Cloud Image | +|--------------|---------|----------------|----------|-------------| +| SLES | 15 | zypper | firewalld | ⚠️ Manual download | +| openSUSE Leap | 15.5, 15.6 | zypper | firewalld | ✅ Auto-download | + +**Legend:** +- ✅ = 
Automatically downloaded from official repositories +- ⚠️ = Requires subscription and manual download + +## Architecture + +### Multi-Distribution Support Design + +``` +┌─────────────────────────────────────────────────────────┐ +│ Ansible Control Node │ +│ │ +│ ┌────────────────────────────────────────────────┐ │ +│ │ deploy-linux-vm.yml │ │ +│ │ │ │ +│ │ • Distribution Selection Logic │ │ +│ │ • Cloud Image Repository Map │ │ +│ │ • OS Family Detection │ │ +│ │ • Package Manager Adaptation │ │ +│ └────────────────────────────────────────────────┘ │ +└─────────────────────────────────────────────────────────┘ + │ + │ SSH + ▼ +┌─────────────────────────────────────────────────────────┐ +│ KVM Hypervisor (grokbox) │ +│ │ +│ ┌──────────────────────────────────────────────┐ │ +│ │ Cloud Image Cache │ │ +│ │ /var/lib/libvirt/images/ │ │ +│ │ ├─ debian-12-*.qcow2 │ │ +│ │ ├─ ubuntu-22.04-*.img │ │ +│ │ ├─ centos-stream-9-*.qcow2 │ │ +│ │ ├─ rocky-9-*.qcow2 │ │ +│ │ └─ almalinux-9-*.qcow2 │ │ +│ └──────────────────────────────────────────────┘ │ +│ │ +│ ┌──────────────────────────────────────────────┐ │ +│ │ libvirt/QEMU │ │ +│ │ │ │ +│ │ ┌──────────────┐ ┌──────────────┐ │ │ +│ │ │ Debian VM │ │ Ubuntu VM │ │ │ +│ │ │ ufw enabled │ │ ufw enabled │ │ │ +│ │ └──────────────┘ └──────────────┘ │ │ +│ │ │ │ +│ │ ┌──────────────┐ ┌──────────────┐ │ │ +│ │ │ Rocky VM │ │ Alma VM │ │ │ +│ │ │ SELinux=Enf │ │ SELinux=Enf │ │ │ +│ │ │ firewalld │ │ firewalld │ │ │ +│ │ └──────────────┘ └──────────────┘ │ │ +│ └──────────────────────────────────────────────┘ │ +└─────────────────────────────────────────────────────────┘ +``` + +### Deployment Workflow + +``` +[Start] + │ + ▼ +[Validate Distribution Selection] + │ + ├─ Check distribution in supported list + ├─ Set distribution facts (family, package manager, etc.) 
+ └─ Display deployment configuration + │ + ▼ +[Pre-flight Checks] + │ + ├─ Verify VM doesn't already exist + ├─ Validate virtualization support + └─ Install required packages on hypervisor + │ + ▼ +[Download Cloud Image] + │ + ├─ Check if image cached + ├─ Download from official repository + └─ Verify checksum (SHA256/SHA512) + │ + ▼ +[Create VM Storage] + │ + ├─ Create qcow2 disk (CoW from base image) + └─ Set proper permissions (libvirt-qemu/qemu) + │ + ▼ +[Generate Cloud-Init Configuration] + │ + ├─ Select template based on OS family: + │ ├─ Debian/Ubuntu → apt, ufw, unattended-upgrades + │ ├─ RHEL family → dnf, firewalld, SELinux, dnf-automatic + │ └─ SUSE family → zypper, firewalld + │ + ├─ Create meta-data (hostname, instance-id) + ├─ Create user-data (users, packages, security) + └─ Generate cloud-init ISO + │ + ▼ +[Deploy VM] + │ + ├─ Run virt-install with appropriate os-variant + ├─ Attach disk and cloud-init ISO + └─ Start VM + │ + ▼ +[Wait for Boot] + │ + ├─ VM boots from qcow2 disk + ├─ Cloud-init runs configuration + ├─ Network configured via DHCP + └─ Get IP address from libvirt + │ + ▼ +[Validation] + │ + ├─ Test SSH connectivity + ├─ Verify cloud-init completion + ├─ Display VM information + └─ System health checks + │ + ▼ +[Cleanup] + │ + └─ Remove temporary files + │ + ▼ +[Complete] +``` + +## Prerequisites + +### Hypervisor Requirements + +**Hardware:** +- CPU with virtualization extensions (Intel VT-x or AMD-V) +- Sufficient RAM for host + guest VMs +- Adequate storage for cloud images and VM disks + +**Software:** + +**For Debian/Ubuntu hypervisors:** +```bash +apt install -y \ + libvirt-daemon-system \ + libvirt-clients \ + virtinst \ + qemu-kvm \ + qemu-utils \ + cloud-image-utils \ + genisoimage \ + python3-libvirt +``` + +**For RHEL/CentOS/Rocky/Alma hypervisors:** +```bash +dnf install -y \ + libvirt \ + libvirt-client \ + virt-install \ + qemu-kvm \ + qemu-img \ + cloud-utils \ + genisoimage \ + python3-libvirt +``` + +**Services:** 
+```bash +systemctl enable --now libvirtd +``` + +### Network Requirements + +- Internet connectivity for cloud image downloads +- DNS resolution working +- libvirt default network active: + ```bash + virsh net-list + virsh net-start default # if not started + virsh net-autostart default + ``` + +### Ansible Control Node + +- Ansible 2.9 or newer +- SSH access to hypervisor +- Python 3.x installed + +## Cloud Image Sources + +### Official Repositories + +**Debian:** +- URL: https://cloud.debian.org/images/cloud/ +- Format: qcow2 +- Checksum: SHA512SUMS provided +- Update Frequency: Regular (stable releases) + +**Ubuntu:** +- URL: https://cloud-images.ubuntu.com/ +- Format: img (qcow2 compatible) +- Checksum: SHA256SUMS provided +- Update Frequency: Daily builds available + +**CentOS Stream:** +- URL: https://cloud.centos.org/centos/ +- Format: qcow2 +- Checksum: SHA256 CHECKSUM file +- Update Frequency: Regular updates + +**Rocky Linux:** +- URL: https://download.rockylinux.org/pub/rocky/ +- Format: qcow2 +- Checksum: SHA256 CHECKSUM file +- Update Frequency: Regular with point releases + +**AlmaLinux:** +- URL: https://repo.almalinux.org/almalinux/ +- Format: qcow2 +- Checksum: SHA256 CHECKSUM file +- Update Frequency: Regular with point releases + +**openSUSE Leap:** +- URL: https://download.opensuse.org/distribution/leap/ +- Format: qcow2 +- Checksum: SHA256 per-file +- Update Frequency: Per release cycle + +### Manual Download Required + +**Red Hat Enterprise Linux (RHEL):** +- Requires: Red Hat subscription +- Portal: https://access.redhat.com/downloads/ +- Steps: + 1. Log in to Red Hat Customer Portal + 2. Navigate to Downloads + 3. Select "Red Hat Enterprise Linux" + 4. Download KVM Guest Image + 5. Place at: `/var/lib/libvirt/images/rhel-X-x86_64-kvm.qcow2` + +**SUSE Linux Enterprise Server (SLES):** +- Requires: SUSE subscription +- Portal: https://scc.suse.com/ +- Steps: + 1. Log in to SUSE Customer Center + 2. Download cloud image for SLES 15 + 3. 
Place at: `/var/lib/libvirt/images/sles-15-genericcloud-amd64.qcow2` + +## Deployment Process + +### Basic Deployment + +```bash +ansible-playbook plays/deploy-linux-vm.yml \ + -e "os_distribution=DISTRIBUTION" \ + -e "vm_name=VM_NAME" # DISTRIBUTION and VM_NAME are placeholders +``` + +### Distribution Selection + +The `os_distribution` variable determines which Linux distribution to deploy. Its format is `distro-version`. + +**Examples:** +```bash +# Debian +-e "os_distribution=debian-12" + +# Ubuntu +-e "os_distribution=ubuntu-22.04" + +# CentOS Stream +-e "os_distribution=centos-stream-9" + +# Rocky Linux +-e "os_distribution=rocky-9" + +# AlmaLinux +-e "os_distribution=almalinux-9" + +# openSUSE +-e "os_distribution=opensuse-leap-15.6" +``` + +### Resource Customization + +```bash +ansible-playbook plays/deploy-linux-vm.yml \ + -e "os_distribution=rocky-9" \ + -e "vm_name=database-server" \ + -e "vm_hostname=db01" \ + -e "vm_domain=production.local" \ + -e "vm_vcpus=8" \ + -e "vm_memory_mb=16384" \ + -e "vm_disk_size_gb=200" +``` + +### Configuration Variables + +| Variable | Type | Default | Description | +|----------|------|---------|-------------| +| `os_distribution` | String | debian-12 | Distribution identifier (**required**) | +| `vm_name` | String | linux-guest | VM name in libvirt | +| `vm_hostname` | String | linux-vm | Guest hostname | +| `vm_domain` | String | localdomain | DNS domain | +| `vm_vcpus` | Integer | 2 | Number of virtual CPUs | +| `vm_memory_mb` | Integer | 2048 | RAM in megabytes | +| `vm_disk_size_gb` | Integer | 20 | Disk size in gigabytes | +| `vm_network` | String | default | Libvirt network name | +| `vm_bridge` | String | virbr0 | Bridge interface | +| `ansible_user_ssh_key` | String | (preset) | SSH public key for ansible user | + +## Distribution-Specific Configuration + +### Debian/Ubuntu Systems + +**Package Manager:** apt + +**Cloud-Init Packages:** +- sudo, vim, htop, tmux, curl, wget, rsync, git +- python3, python3-pip, jq, bc +- aide, auditd, chrony, ufw, lvm2 +- cloud-guest-utils, parted +-
unattended-upgrades, apt-listchanges + +**Security Configuration:** +- **Firewall:** ufw enabled, SSH allowed +- **SSH:** Root login disabled, key-only auth +- **Updates:** unattended-upgrades for security updates +- **Audit:** auditd enabled +- **Time Sync:** chrony configured + +**User Management:** +- ansible user → member of `sudo` group +- Passwordless sudo access + +**Automatic Updates Configuration:** +``` +/etc/apt/apt.conf.d/50unattended-upgrades +/etc/apt/apt.conf.d/20auto-upgrades +``` + +**Post-Boot Commands:** +```bash +systemctl enable ssh && systemctl restart ssh +systemctl enable chrony && systemctl start chrony +ufw --force enable && ufw allow ssh +systemctl enable auditd && systemctl start auditd +growpart /dev/vda 1 && resize2fs /dev/vda1 +``` + +### RHEL Family Systems + +**Package Manager:** dnf + +**Cloud-Init Packages:** +- sudo, vim, htop, tmux, curl, wget, rsync, git +- python3, python3-pip, jq, bc +- aide, audit, chrony, firewalld, lvm2 +- cloud-utils-growpart, gdisk +- dnf-automatic +- policycoreutils-python-utils + +**Security Configuration:** +- **Firewall:** firewalld enabled, SSH service allowed +- **SELinux:** Enforcing mode +- **SSH:** Root login disabled, key-only auth +- **Updates:** dnf-automatic for security updates +- **Audit:** auditd enabled +- **Time Sync:** chronyd configured + +**User Management:** +- ansible user → member of `wheel` group +- Passwordless sudo access + +**SELinux Configuration:** +```bash +setenforce 1 +sed -i 's/^SELINUX=.*/SELINUX=enforcing/' /etc/selinux/config +``` + +**Automatic Updates Configuration:** +``` +/etc/dnf/automatic.conf + upgrade_type = security + apply_updates = yes +``` + +**Post-Boot Commands:** +```bash +systemctl enable sshd && systemctl restart sshd +systemctl enable chronyd && systemctl start chronyd +systemctl enable firewalld && systemctl start firewalld +firewall-cmd --permanent --add-service=ssh && firewall-cmd --reload +systemctl enable auditd && systemctl start auditd 
+systemctl enable dnf-automatic.timer && systemctl start dnf-automatic.timer +setenforce 1 +growpart /dev/vda 1 && xfs_growfs / +``` + +### SUSE Family Systems + +**Package Manager:** zypper + +**Cloud-Init Packages:** +- sudo, vim, htop, tmux, curl, wget, rsync, git +- python3, python3-pip, jq, bc +- aide, audit, chrony, firewalld, lvm2 +- cloud-utils-growpart, gdisk + +**Security Configuration:** +- **Firewall:** firewalld enabled, SSH service allowed +- **SSH:** Root login disabled, key-only auth +- **Audit:** auditd enabled +- **Time Sync:** chronyd configured + +**User Management:** +- ansible user → member of `wheel` group +- Passwordless sudo access + +**Post-Boot Commands:** +```bash +systemctl enable sshd && systemctl restart sshd +systemctl enable chronyd && systemctl start chronyd +systemctl enable firewalld && systemctl start firewalld +firewall-cmd --permanent --add-service=ssh && firewall-cmd --reload +systemctl enable auditd && systemctl start auditd +growpart /dev/vda 1 && xfs_growfs / || resize2fs /dev/vda1 || btrfs filesystem resize max / +``` + +## Security Features + +### Universal Security Measures + +All deployed VMs, regardless of distribution, include: + +1. **User Security:** + - Dedicated `ansible` service account + - SSH key-based authentication only + - Passwordless sudo (with logging) + - Root SSH login disabled + - Emergency console access available (password: ChangeMe123!) + +2. **Network Security:** + - Host-based firewall enabled and configured + - SSH service allowed + - Default deny policy for incoming traffic + - Outgoing traffic allowed + +3. **System Security:** + - Audit daemon (auditd) enabled + - Automatic security updates configured + - Time synchronization enabled (chrony) + - File integrity monitoring installed (AIDE) + - Secure SSH configuration applied + +4. 
**SSH Hardening:** + ``` + PermitRootLogin no + PasswordAuthentication no + PubkeyAuthentication yes + MaxAuthTries 3 + MaxSessions 10 + ClientAliveInterval 300 + ClientAliveCountMax 2 + ``` + +### Distribution-Specific Security + +**RHEL Family:** +- SELinux in enforcing mode +- firewalld with rich rules support +- dnf-automatic for security updates +- Subscription management for certified packages (RHEL) + +**Debian/Ubuntu:** +- AppArmor profiles (Ubuntu) +- UFW for simplified firewall management +- unattended-upgrades for security updates +- Automatic security patch installation + +**SUSE Family:** +- AppArmor support +- firewalld with zones +- YaST integration for security management + +### Compliance Alignment + +The deployment follows CLAUDE.md security principles: + +✅ Principle of least privilege +✅ Encryption in transit (SSH) +✅ Key-based authentication +✅ Automated security updates +✅ System auditing enabled +✅ Time synchronization +✅ Firewall enabled by default +✅ Regular security patching + +## Post-Deployment + +### Adding to Ansible Inventory + +**Debian/Ubuntu VM:** +```yaml +debian_servers: + hosts: + debian12-vm: + ansible_host: 192.168.122.X + ansible_user: ansible + ansible_ssh_common_args: '-o ProxyJump=grokbox -o StrictHostKeyChecking=accept-new' + os_distribution: debian-12 + os_family: debian + package_manager: apt +``` + +**RHEL Family VM:** +```yaml +rhel_servers: + hosts: + rocky9-vm: + ansible_host: 192.168.122.X + ansible_user: ansible + ansible_ssh_common_args: '-o ProxyJump=grokbox -o StrictHostKeyChecking=accept-new' + os_distribution: rocky-9 + os_family: rhel + package_manager: dnf + selinux_mode: enforcing +``` + +### Initial Configuration + +After deployment, run configuration management: + +```bash +# Update system packages (VM_NAME is a placeholder for the host pattern) +ansible VM_NAME -m package -a "name=* state=latest" -b + +# Install additional packages +ansible VM_NAME -m package -a "name=nginx state=present" -b + +# Run configuration playbooks +ansible-playbook -i
inventories/development/hosts.yml \ + playbooks/configure-webserver.yml \ + -l VM_NAME # VM_NAME: placeholder for the new host's inventory name +``` + +### Verification Steps + +1. **SSH Access:** + ```bash + ssh -J grokbox ansible@VM_IP # VM_IP: placeholder for the guest's address + ``` + +2. **Cloud-Init Status:** + ```bash + cloud-init status --wait + cloud-init status --long + ``` + +3. **System Information:** + ```bash + cat /etc/os-release + uname -r + ``` + +4. **Security Checks:** + ```bash + # Firewall + sudo ufw status verbose # Debian/Ubuntu + sudo firewall-cmd --list-all # RHEL/SUSE + + # SELinux (RHEL family) + getenforce + + # Audit daemon + sudo systemctl status auditd + + # Automatic updates + sudo systemctl status unattended-upgrades # Debian/Ubuntu + sudo systemctl status dnf-automatic.timer # RHEL + ``` + +5. **Disk and Memory:** + ```bash + df -h + free -h + lsblk + ``` + +## Troubleshooting + +### Distribution Selection Issues + +**Problem:** Invalid distribution error + +**Solution:** +```bash +# List supported distributions +grep "^ " plays/deploy-linux-vm.yml | grep -E "debian-|ubuntu-|rhel-|centos-|rocky-|alma|sles-|opensuse" + +# Use exact distribution identifier +-e "os_distribution=debian-12" # Correct +-e "os_distribution=debian" # Wrong +``` + +### Cloud Image Download Failures + +**Problem:** Image download fails or times out + +**Causes:** +- Network connectivity issues +- Repository temporarily unavailable +- Proxy configuration needed + +**Solutions:** +```bash +# Test connectivity +curl -I https://cloud.debian.org/ +curl -I https://cloud-images.ubuntu.com/ + +# Manual download (IMAGE_URL is a placeholder; see Cloud Image Sources) +cd /var/lib/libvirt/images +wget IMAGE_URL + +# Configure proxy (if needed) +export https_proxy=http://proxy:port +``` + +### Checksum Verification Failures + +**Problem:** Checksum verification fails + +**Causes:** +- Corrupt download +- Mismatch between image and checksum file +- Wrong checksum type + +**Solutions:** +```bash +# Re-download image (IMAGE_FILE is a placeholder for the cached image name) +rm /var/lib/libvirt/images/IMAGE_FILE +ansible-playbook plays/deploy-linux-vm.yml -e "os_distribution=..."
-t download + +# Verify manually (IMAGE_FILE is a placeholder for the cached image name) +cd /var/lib/libvirt/images +sha256sum IMAGE_FILE +# Compare with checksum file +``` + +### VM Boot Issues + +**Problem:** VM is created but does not boot or obtain an IP address + +**Causes:** +- Cloud-init configuration error +- Network misconfiguration +- Insufficient resources + +**Solutions:** +```bash +# Check VM status (VM_NAME is a placeholder for the libvirt domain name) +virsh list --all +virsh dominfo VM_NAME + +# View console +virsh console VM_NAME + +# Check cloud-init logs (via console) +tail -f /var/log/cloud-init-output.log +journalctl -u cloud-init + +# Restart VM +virsh destroy VM_NAME +virsh start VM_NAME +``` + +### SSH Connection Issues + +**Problem:** Cannot SSH to deployed VM + +**Causes:** +- SSH key not configured correctly +- Firewall blocking +- cloud-init not completed +- Wrong IP address + +**Solutions:** +```bash +# Verify IP address (VM_NAME is a placeholder for the libvirt domain name) +virsh domifaddr VM_NAME + +# Test connectivity (VM_IP is the address reported above) +ping VM_IP + +# Check SSH service via console +virsh console VM_NAME +# Then: systemctl status ssh (Debian/Ubuntu) or systemctl status sshd (RHEL/SUSE) + +# Verify firewall +# Via console: +sudo ufw status # Debian/Ubuntu +sudo firewall-cmd --list-all # RHEL/SUSE + +# Check cloud-init completion +# Via console: +cloud-init status --wait +``` + +### SELinux Issues (RHEL Family) + +**Problem:** Services failing due to SELinux denials + +**Solutions:** +```bash +# Check SELinux status +getenforce +sestatus + +# View denials +sudo ausearch -m avc -ts recent + +# Temporarily set to permissive (troubleshooting only) +sudo setenforce 0 + +# Generate policy from denials +sudo ausearch -m avc -ts recent | audit2allow -M myapp +sudo semodule -i myapp.pp + +# Re-enable enforcing +sudo setenforce 1 +``` + +### Package Manager Issues + +**Debian/Ubuntu:** +```bash +# Update package cache +sudo apt update + +# Fix broken packages +sudo apt --fix-broken install + +# Clear cache +sudo apt clean +``` + +**RHEL Family:** +```bash +# Update metadata +sudo dnf makecache + +# Check for problems +sudo dnf check + +# Clean cache +sudo dnf clean all +``` + +**SUSE:** +```bash +# Refresh repositories +sudo zypper refresh + +# Verify
+sudo zypper verify + +# Clean cache +sudo zypper clean +``` + +## Best Practices + +### Distribution Selection + +1. **Use LTS versions for production:** + - Ubuntu 22.04 LTS (support until 2027) + - Ubuntu 24.04 LTS (support until 2029) + - RHEL/Rocky/Alma 9 (support until 2032) + +2. **Match distribution to workload:** + - Web servers: Ubuntu, Debian + - Enterprise applications: RHEL, Rocky Linux, AlmaLinux + - Container hosts: CentOS Stream, Rocky Linux + - Development: Ubuntu, Debian, openSUSE + +3. **Consider support requirements:** + - Commercial support: RHEL, SLES + - Community support: CentOS Stream, Rocky Linux, AlmaLinux, Debian, Ubuntu, openSUSE + +### Resource Allocation + +**Minimum Requirements:** +- 1 vCPU, 1GB RAM, 10GB disk (testing only) + +**Recommended for Production:** +- 2+ vCPUs, 2GB+ RAM, 20GB+ disk + +**Workload-Specific:** +``` +Web Server: 2-4 vCPUs, 4GB RAM, 40GB disk +Database Server: 4-8 vCPUs, 16GB RAM, 100GB+ disk +Application Server: 4-8 vCPUs, 8GB RAM, 80GB disk +Container Host: 4-8 vCPUs, 16GB RAM, 80GB disk +Development: 2-4 vCPUs, 8GB RAM, 50GB disk +``` + +### Security Hardening + +1. **Change default passwords immediately:** + ```bash + sudo passwd root # Change from ChangeMe123! + ``` + +2. **Configure proper SSH keys:** + - Use dedicated key per environment + - Rotate keys regularly (90-180 days) + - Use Ed25519 keys when possible + +3. **Enable additional security features:** + - CIS benchmarks scanning + - Intrusion detection (fail2ban, OSSEC) + - Log forwarding to SIEM + - Vulnerability scanning + +4. **Regular updates:** + - Monitor automatic update logs + - Schedule manual updates for major versions + - Test updates in staging first + +### Operational Excellence + +1. **Naming Conventions:** + - Use descriptive, meaningful VM names + - Include purpose and environment: `web-prod-01`, `db-dev-01` + - Document naming scheme + +2. 
**Inventory Management:** + - Keep Ansible inventory up-to-date + - Document VM purpose and owner + - Track VM lifecycle + +3. **Monitoring:** + - Set up monitoring for all VMs + - Configure alerting for critical issues + - Monitor resource usage trends + +4. **Backup Strategy:** + - Regular VM backups or disk snapshots + - Test restore procedures + - Document backup retention policy + +5. **Documentation:** + - Document VM purpose and configuration + - Maintain runbooks for common tasks + - Keep network diagrams current + +### Performance Optimization + +1. **Disk I/O:** + - Use virtio drivers (already configured) + - Consider separate disk for databases + - Use appropriate filesystem (xfs for RHEL, ext4 for Debian) + +2. **Network:** + - Use virtio network driver (already configured) + - Consider SR-IOV for high-performance needs + - Monitor network latency + +3. **CPU:** + - Right-size vCPU allocation + - Avoid overcommitment on critical VMs + - Use CPU pinning for performance-critical workloads + +4. **Memory:** + - Allocate sufficient RAM to avoid swapping + - Monitor memory usage + - Consider huge pages for databases + +## References + +- [CLAUDE.md](../CLAUDE.md) - Infrastructure guidelines +- [Cheatsheet](../cheatsheets/deploy-linux-vm.md) - Quick reference +- [Debian Cloud Images](https://cloud.debian.org/images/cloud/) +- [Ubuntu Cloud Images](https://cloud-images.ubuntu.com/) +- [CentOS Stream](https://www.centos.org/centos-stream/) +- [Rocky Linux](https://rockylinux.org/) +- [AlmaLinux](https://almalinux.org/) +- [openSUSE](https://www.opensuse.org/) +- [cloud-init Documentation](https://cloudinit.readthedocs.io/) +- [libvirt Documentation](https://libvirt.org/docs.html) + +--- + +**Document Version**: 1.0 +**Last Updated**: 2025-11-10 +**Maintained By**: Ansible Infrastructure Team