Add comprehensive documentation

- Add linux-vm-deployment.md with complete deployment guide
  - Architecture overview and security model
  - Supported distributions matrix
  - LVM partitioning specifications
  - Distribution-specific configurations
  - Troubleshooting procedures
  - Performance tuning guidelines
Infrastructure Team
2025-11-10 22:52:03 +01:00
parent 82796a18e4
commit 04a381e0d5
3 changed files with 2188 additions and 0 deletions


@@ -0,0 +1,728 @@
# Debian 12 VM Deployment Documentation
## Overview
This document describes the automated deployment process for Debian 12 virtual machines on the grokbox KVM/libvirt hypervisor. The deployment uses cloud-init for unattended configuration and follows the security-first principles outlined in CLAUDE.md.
## Table of Contents
1. [Architecture](#architecture)
2. [Prerequisites](#prerequisites)
3. [Deployment Process](#deployment-process)
4. [Configuration](#configuration)
5. [Security Features](#security-features)
6. [Post-Deployment](#post-deployment)
7. [Troubleshooting](#troubleshooting)
8. [Maintenance](#maintenance)
## Architecture
### Infrastructure Components
```
┌─────────────────────────────────────────────┐
│ grokbox (KVM Hypervisor) │
│ │
│ ┌──────────────────────────────────────┐ │
│ │ libvirt/QEMU │ │
│ │ │ │
│ │ ┌────────────────────────────────┐ │ │
│ │ │ Debian 12 Guest VM │ │ │
│ │ │ │ │ │
│ │ │ - 2 vCPUs / 2GB RAM │ │ │
│ │ │ - 20GB qcow2 disk │ │ │
│ │ │ - cloud-init configured │ │ │
│ │ │ - ansible user ready │ │ │
│ │ │ - Security hardened │ │ │
│ │ └────────────────────────────────┘ │ │
│ │ │ │
│ │ Network: virbr0 (192.168.122.0/24) │ │
│ └──────────────────────────────────────┘ │
└─────────────────────────────────────────────┘
```
### Deployment Workflow
```
[Ansible Control Node]
│ 1. SSH to grokbox
[grokbox hypervisor]
│ 2. Download Debian cloud image
├─ 3. Verify checksums
├─ 4. Create VM disk (qcow2)
├─ 5. Generate cloud-init ISO
├─ 6. Create VM with virt-install
│ 7. VM boots with cloud-init
[Debian 12 VM]
├─ 8. Create ansible user
├─ 9. Configure SSH
├─ 10. Install packages
├─ 11. Security hardening
└─ 12. System ready
```
## Prerequisites
### Hypervisor Requirements
On **grokbox**, ensure the following are present:
1. **Virtualization Support**
```bash
# Verify CPU virtualization
grep -Ec '(vmx|svm)' /proc/cpuinfo # Should be > 0 (egrep is deprecated in current GNU grep)
# Verify KVM module loaded
lsmod | grep kvm
```
2. **Required Packages**
- libvirt-daemon-system
- libvirt-clients
- virtinst
- qemu-kvm
- qemu-utils
- cloud-image-utils
- genisoimage
- python3-libvirt
3. **Sufficient Resources**
- Storage: ~25GB available in `/var/lib/libvirt/images/`
- Memory: Enough free RAM for VM allocation
- Network: libvirt default network configured
4. **libvirtd Service Running**
```bash
systemctl status libvirtd
```
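The same pre-flight check can be scripted. The following is a minimal sketch, not the playbook's actual implementation; the function names `count_virt_flags` and `kvm_device_present` are hypothetical, and the check only inspects `flags` lines, so its count can differ from the `grep` output on kernels that also print a `vmx flags` line.

```python
import re
from pathlib import Path


def count_virt_flags(cpuinfo_text: str) -> int:
    """Count CPU 'flags' lines that advertise Intel VT-x (vmx) or AMD-V (svm)."""
    count = 0
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and re.search(r"\b(vmx|svm)\b", line):
            count += 1
    return count


def kvm_device_present(dev: str = "/dev/kvm") -> bool:
    """Return True when the KVM device node exists (module loaded and usable)."""
    return Path(dev).exists()


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print("CPUs with virtualization flags:", count_virt_flags(cpuinfo.read_text()))
    print("/dev/kvm present:", kvm_device_present())
```

A result of zero CPUs, or a missing `/dev/kvm`, suggests that virtualization is disabled in firmware or that the KVM module is not loaded.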
### Ansible Control Node Requirements
1. Ansible 2.9 or newer
2. SSH access to grokbox hypervisor
3. SSH key configured for grok user
4. Python 3.x installed
### Network Requirements
- Connectivity to Debian cloud image repository
- DNS resolution working
- Default libvirt network (virbr0) configured and active
## Deployment Process
### Step 1: Pre-flight Checks
The playbook performs the following validations:
- **VM Name Uniqueness**: Ensures no VM with the same name exists
- **Virtualization Support**: Validates QEMU/KVM capabilities
- **Package Installation**: Installs required tools if missing
- **Service Status**: Verifies libvirtd is running
### Step 2: Image Management
#### Download Debian Cloud Image
- **Source**: https://cloud.debian.org/images/cloud/bookworm/latest/
- **Image**: debian-12-generic-amd64.qcow2
- **Cache Location**: `/var/lib/libvirt/images/debian-12-generic-amd64.qcow2`
- **Checksum Verification**: SHA512SUMS validated
The base image is downloaded once and cached for subsequent deployments.
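As an illustration of the checksum step, the sketch below compares a cached image against an entry in a `SHA512SUMS` file. It assumes the usual `<hex digest>  <filename>` line format; the helper names (`expected_digest`, `sha512_of`, `verify_image`) are hypothetical and are not taken from the playbook.

```python
import hashlib
from pathlib import Path
from typing import Optional


def expected_digest(sums_text: str, filename: str) -> Optional[str]:
    """Return the digest listed for filename in a SHA512SUMS-style text, if any."""
    for line in sums_text.splitlines():
        parts = line.split(maxsplit=1)
        # A leading '*' marks binary mode in coreutils-style checksum files.
        if len(parts) == 2 and parts[1].strip().lstrip("*") == filename:
            return parts[0].lower()
    return None


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large images are not loaded into memory."""
    digest = hashlib.sha512()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_image(path: Path, sums_text: str) -> bool:
    """True only if the file is listed and its digest matches."""
    expected = expected_digest(sums_text, path.name)
    return expected is not None and sha512_of(path) == expected
```

A cached image that fails this check should be deleted and downloaded again rather than reused.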
#### Create VM Disk
A new copy-on-write (CoW) disk is created using qemu-img:
```bash
qemu-img create -f qcow2 \
-F qcow2 \
-b /var/lib/libvirt/images/debian-12-generic-amd64.qcow2 \
/var/lib/libvirt/images/debian12-guest.qcow2 \
20G
```
This creates a thin-provisioned disk backed by the cloud image.
### Step 3: Cloud-Init Configuration
Two configuration files are generated:
#### meta-data
```yaml
instance-id: debian12-guest
local-hostname: debian12
```
#### user-data
The user-data file holds the main cloud-init configuration, including:
- **User Management**: Creates ansible user with SSH keys
- **Security Configuration**: SSH hardening, firewall setup
- **Package Installation**: Essential and security packages
- **System Configuration**: Time sync, locale, timezone
- **Automatic Updates**: Unattended security upgrades
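For orientation, a reduced user-data document covering some of the items above could be generated as follows. This is a sketch under assumptions: the real template in the playbook is not shown in this document, the package list is only a guess based on the Security Features section, and the output is emitted as JSON after the `#cloud-config` header on the assumption that the YAML parser used by cloud-init accepts JSON syntax. `build_user_data` is a hypothetical helper.

```python
import json


def build_user_data(ssh_public_key: str) -> str:
    """Build a minimal cloud-config document for an 'ansible' service account."""
    config = {
        "users": [
            {
                "name": "ansible",
                "groups": ["sudo"],
                "shell": "/bin/bash",
                "sudo": "ALL=(ALL) NOPASSWD:ALL",  # passwordless sudo
                "lock_passwd": True,               # key-based login only
                "ssh_authorized_keys": [ssh_public_key],
            }
        ],
        "ssh_pwauth": False,       # disable SSH password authentication
        "disable_root": True,      # block root login over SSH
        "package_update": True,
        # Illustrative package set; the real list lives in the playbook template.
        "packages": ["ufw", "unattended-upgrades", "chrony", "auditd"],
        "timezone": "UTC",
    }
    return "#cloud-config\n" + json.dumps(config, indent=2) + "\n"
```

The first line must be exactly `#cloud-config`; otherwise cloud-init does not treat the file as cloud-config data.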
#### ISO Generation
The configuration files are packaged into a NoCloud seed ISO. The ISO is not bootable; cloud-init detects it by its volume label, which must be `cidata`:
```bash
genisoimage -output debian12-guest-cloud-init.iso \
-volid cidata -joliet -rock \
user-data meta-data
```
### Step 4: VM Creation
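The VM is created with virt-install. The example passes `--os-variant debian11`; if `osinfo-query os` on the hypervisor lists `debian12`, that value is the closer match: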
```bash
virt-install \
--name debian12-guest \
--memory 2048 \
--vcpus 2 \
--disk path=/var/lib/libvirt/images/debian12-guest.qcow2,format=qcow2,bus=virtio \
--disk path=/var/lib/libvirt/images/debian12-guest-cloud-init.iso,device=cdrom \
--network network=default,model=virtio \
--os-variant debian11 \
--graphics none \
--console pty,target_type=serial \
--import \
--noautoconsole
```
### Step 5: Boot and Initialization
1. **VM Boots**: Starts from the qcow2 disk
2. **Cloud-Init Runs**: Reads configuration from ISO
3. **System Configuration**: Applies all settings
4. **Network Configuration**: Obtains IP via DHCP
5. **Package Updates**: Downloads and installs updates
6. **Service Initialization**: Starts all configured services
**Typical boot time**: 60-90 seconds
### Step 6: Validation
The playbook validates:
- VM is running and accessible
- IP address assigned
- SSH port (22) accepting connections
- cloud-init completed successfully
- System resources available
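The SSH port check can be approximated with a short polling loop. This is a sketch rather than the playbook's task; `wait_for_port` is a hypothetical helper, and because the guest sits on the NATed `virbr0` network it would normally run on the hypervisor (or through a tunnel), not directly on the control node.

```python
import socket
import time


def wait_for_port(host: str, port: int = 22, timeout: float = 120.0,
                  interval: float = 2.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means sshd (or something) is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

An open port only shows that a listener is up; cloud-init may still be running, which is why the playbook also checks for cloud-init completion.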
### Step 7: Post-Deployment Configuration
Optional second play that:
- Waits for cloud-init completion
- Gathers system facts
- Displays system information
- Validates disk and memory usage
## Configuration
### Default Configuration
```yaml
# VM Specifications
vm_name: "debian12-guest"
vm_hostname: "debian12"
vm_domain: "localdomain"
vm_vcpus: 2
vm_memory_mb: 2048
vm_disk_size_gb: 20
# Network
vm_network: "default"
vm_bridge: "virbr0"
# Storage
vm_disk_path: "/var/lib/libvirt/images/{{ vm_name }}.qcow2"
cloud_init_iso_path: "/var/lib/libvirt/images/{{ vm_name }}-cloud-init.iso"
```
### Customization Examples
#### High-Performance VM
```bash
ansible-playbook plays/deploy-debian12-vm.yml \
-e "vm_name=app-server" \
-e "vm_vcpus=8" \
-e "vm_memory_mb=16384" \
-e "vm_disk_size_gb=100"
```
#### Development VM
```bash
ansible-playbook plays/deploy-debian12-vm.yml \
-e "vm_name=dev-workstation" \
-e "vm_vcpus=4" \
-e "vm_memory_mb=8192" \
-e "vm_disk_size_gb=50" \
-e "vm_hostname=devbox" \
-e "vm_domain=dev.local"
```
#### Custom SSH Key
```bash
ansible-playbook plays/deploy-debian12-vm.yml \
-e "vm_name=secure-vm" \
-e "ansible_user_ssh_key='ssh-ed25519 AAAA...'"
```
### Variable Precedence
Variables can be set at several levels. From highest to lowest precedence:
1. **Command-line** (`-e` flag) - Highest
2. **Playbook vars section**
3. **Inventory host_vars**
4. **Inventory group_vars**
5. **Defaults in playbook** - Lowest
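The ordering above can be modelled as a layered lookup in which the first layer that defines a variable wins. The sketch below is a deliberately simplified model of the five levels listed here, not Ansible's actual resolution logic (which has many more levels); `resolve` is a hypothetical name.

```python
from collections import ChainMap


def resolve(extra_vars, play_vars, host_vars, group_vars, defaults):
    """Merge variable layers; earlier arguments override later ones."""
    return dict(ChainMap(extra_vars, play_vars, host_vars, group_vars, defaults))


merged = resolve(
    extra_vars={"vm_vcpus": 8},                      # -e on the command line
    play_vars={},
    host_vars={"vm_memory_mb": 4096},
    group_vars={"vm_memory_mb": 1024, "vm_network": "default"},
    defaults={"vm_vcpus": 2, "vm_memory_mb": 2048, "vm_disk_size_gb": 20},
)
print(merged["vm_vcpus"], merged["vm_memory_mb"], merged["vm_disk_size_gb"])
```

Here the command-line value for `vm_vcpus` overrides the default, the host-level memory setting overrides the group-level one, and `vm_disk_size_gb` falls through to the default.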
## Security Features
### User Management
- **ansible user**: Non-root service account
- Passwordless sudo access
- SSH key authentication only
- Member of sudo group
- Home directory: `/home/ansible`
- **root user**: Console access only
- SSH login disabled
- Password set for emergency console access
- Remote access blocked
### SSH Hardening
Configuration in `/etc/ssh/sshd_config.d/99-security.conf`:
```
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
MaxAuthTries 3
MaxSessions 10
ClientAliveInterval 300
ClientAliveCountMax 2
```
### Firewall Configuration
- **UFW (Uncomplicated Firewall)** enabled by default
- Default policy: deny incoming, allow outgoing
- SSH (port 22) allowed
- Additional rules can be added post-deployment
### Automatic Security Updates
Unattended-upgrades configured for:
- Automatic installation of security updates
- Daily update checks
- Automatic cleanup of old kernels
- Email notifications (if configured)
- **No automatic reboot** (requires manual intervention)
### Audit and Monitoring
- **auditd**: System call auditing enabled
- **aide**: File integrity monitoring installed
- **chrony**: Time synchronization configured
- **Logging**: All cloud-init output logged
### Compliance Features
Aligned with CLAUDE.md security requirements:
- ✅ Principle of least privilege
- ✅ Encryption in transit (SSH)
- ✅ Key-based authentication
- ✅ Automated security updates
- ✅ System auditing enabled
- ✅ Time synchronization
- ✅ Firewall enabled by default
## Post-Deployment
### Adding to Inventory
Update your Ansible inventory:
```yaml
# inventories/development/hosts.yml
kvm_guests:
children:
application_servers:
hosts:
debian12-guest:
ansible_host: 192.168.122.X
ansible_user: ansible
ansible_ssh_common_args: '-o ProxyJump=grokbox -o StrictHostKeyChecking=accept-new'
ansible_python_interpreter: /usr/bin/python3
host_description: "Application Server - Debian 12"
host_role: application
host_type: virtual_machine
hypervisor: grokbox
vm_vcpus: 2
vm_memory_mb: 2048
autostart: true
```
### Initial Access
```bash
# Get VM IP address
ssh grokbox "virsh domifaddr debian12-guest"
# SSH to VM via ProxyJump
ssh -J grokbox ansible@192.168.122.X
# Or add to ~/.ssh/config
Host debian12-guest
HostName 192.168.122.X
User ansible
ProxyJump grokbox
StrictHostKeyChecking accept-new
```
### Configuration Management
Run additional roles or playbooks:
```bash
# Example: Configure web server
ansible-playbook -i inventories/development/hosts.yml \
playbooks/configure-webserver.yml \
-l debian12-guest
# Example: Security hardening
ansible-playbook -i inventories/development/hosts.yml \
playbooks/security-hardening.yml \
-l debian12-guest
```
### VM Management Commands
```bash
# Start VM
virsh start debian12-guest
# Shutdown VM gracefully
virsh shutdown debian12-guest
# Force shutdown
virsh destroy debian12-guest
# Reboot VM
virsh reboot debian12-guest
# Enable autostart
virsh autostart debian12-guest
# Disable autostart
virsh autostart debian12-guest --disable
# VM status
virsh dominfo debian12-guest
# VM resource usage
virsh domstats debian12-guest
# Console access
virsh console debian12-guest
```
## Troubleshooting
### Common Issues
#### 1. VM Already Exists
**Error**: VM with name already exists
**Solution**:
```bash
# Check existing VMs
virsh list --all
# Remove existing VM
virsh destroy debian12-guest # if running
virsh undefine debian12-guest --remove-all-storage
```
#### 2. Image Download Fails
**Error**: Failed to download cloud image
**Causes**:
- Network connectivity issues
- Proxy configuration
- DNS resolution problems
**Solution**:
```bash
# Test connectivity
curl -I https://cloud.debian.org
# Manual download
cd /var/lib/libvirt/images
wget https://cloud.debian.org/images/cloud/bookworm/latest/debian-12-generic-amd64.qcow2
# Re-run playbook
ansible-playbook plays/deploy-debian12-vm.yml -t deploy
```
#### 3. VM Won't Get IP Address
**Error**: IP address not assigned after 10 retries
**Causes**:
- DHCP server not running
- Network misconfiguration
- VM network interface issues
**Solution**:
```bash
# Check libvirt network
virsh net-list --all
virsh net-info default
virsh net-start default # if not started
# Check VM network interface
virsh domiflist debian12-guest
# Check DHCP leases
virsh net-dhcp-leases default
# Access console to troubleshoot
virsh console debian12-guest
# Check: ip addr, systemctl status networking
```
#### 4. SSH Connection Failed
**Error**: SSH connection timeout or refused
**Causes**:
- SSH service not started
- Firewall blocking
- Wrong IP address
- cloud-init not completed
**Solution**:
```bash
# Verify VM is running
virsh list
# Check cloud-init status via console
virsh console debian12-guest
# Run: cloud-init status --wait
# Check SSH service
# Via console: systemctl status ssh
# Check firewall
# Via console: ufw status
# Verify SSH key
ssh-add -l
```
#### 5. Insufficient Resources
**Error**: Failed to allocate memory or storage
**Solution**:
```bash
# Check available resources
free -h
df -h /var/lib/libvirt/images/
# Adjust VM resources
ansible-playbook plays/deploy-debian12-vm.yml \
-e "vm_memory_mb=1024" \
-e "vm_disk_size_gb=10"
```
### Debug Mode
Enable verbose logging:
```bash
# Ansible verbose mode
ansible-playbook plays/deploy-debian12-vm.yml -vvv
# Check cloud-init logs on VM
virsh console debian12-guest
# Then: tail -f /var/log/cloud-init-output.log
# Check libvirt logs
journalctl -u libvirtd -f
```
### Health Checks
```bash
# Verify VM health (run on grokbox)
virsh dominfo debian12-guest
virsh domstats debian12-guest
# Network connectivity (run on grokbox; -c limits the number of probes)
ping -c 3 "$(virsh domifaddr debian12-guest | grep -oP '(\d{1,3}\.){3}\d{1,3}' | head -1)"
# SSH connectivity (run from the control node; the IP is queried via grokbox)
VM_IP=$(ssh grokbox "virsh domifaddr debian12-guest" | grep -oP '(\d{1,3}\.){3}\d{1,3}' | head -1)
ssh -J grokbox "ansible@${VM_IP}" "echo 'VM is accessible'"
```
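When the address is needed in a script, parsing the `virsh domifaddr` table explicitly is less fragile than a regular expression. The sketch below assumes the usual four-column data rows (name, MAC, protocol, address with prefix); `parse_domifaddr` is a hypothetical helper and the sample values are made up.

```python
import ipaddress
from typing import List


def parse_domifaddr(output: str) -> List[str]:
    """Extract IPv4 addresses (without prefix length) from virsh domifaddr output."""
    addresses = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows have exactly four fields; the header and separator do not.
        if len(fields) == 4 and fields[2] == "ipv4":
            addresses.append(str(ipaddress.ip_interface(fields[3]).ip))
    return addresses


SAMPLE = """\
 Name       MAC address          Protocol     Address
-------------------------------------------------------------------------------
 vnet0      52:54:00:aa:bb:cc    ipv4         192.168.122.45/24
"""
print(parse_domifaddr(SAMPLE))
```

An empty list means the guest has no lease yet, which matches the "VM Won't Get IP Address" case in the troubleshooting section.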
## Maintenance
### Updating the Base Image
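Periodically update the cached Debian cloud image. VM disks created as copy-on-write overlays use the cached image as their backing file, so removing or replacing it breaks those VMs. Do this only when no existing VM disk is backed by the cached image (check with `qemu-img info --backing-chain <disk>`):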
```bash
# Remove old image
ssh grokbox "rm /var/lib/libvirt/images/debian-12-generic-amd64.qcow2"
# Download latest
ansible-playbook plays/deploy-debian12-vm.yml -t download,verify
```
### VM Snapshots
Create snapshots before major changes:
```bash
# Create snapshot
virsh snapshot-create-as debian12-guest \
snapshot1 \
"Before application deployment"
# List snapshots
virsh snapshot-list debian12-guest
# Revert to snapshot
virsh snapshot-revert debian12-guest snapshot1
# Delete snapshot
virsh snapshot-delete debian12-guest snapshot1
```
### Backup and Restore
#### Backup VM
```bash
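# Stop VM (virsh shutdown returns immediately; wait until the domain is shut off)
virsh shutdown debian12-guest
until [ "$(virsh domstate debian12-guest)" = "shut off" ]; do sleep 2; done
# Backup disk (the overlay still depends on the base image as its backing file)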
cp /var/lib/libvirt/images/debian12-guest.qcow2 \
/backup/debian12-guest-$(date +%Y%m%d).qcow2
# Backup XML config
virsh dumpxml debian12-guest > /backup/debian12-guest.xml
# Start VM
virsh start debian12-guest
```
#### Restore VM
```bash
# Copy disk back
cp /backup/debian12-guest-20241110.qcow2 \
/var/lib/libvirt/images/debian12-guest.qcow2
# Define VM from XML
virsh define /backup/debian12-guest.xml
# Start VM
virsh start debian12-guest
```
### Resize VM Disk
```bash
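# Shut down the VM (virsh shutdown returns immediately; wait until the domain is shut off)
virsh shutdown debian12-guest
until [ "$(virsh domstate debian12-guest)" = "shut off" ]; do sleep 2; done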
# Resize disk
qemu-img resize /var/lib/libvirt/images/debian12-guest.qcow2 +10G
# Start VM
virsh start debian12-guest
# On VM: resize partition and filesystem
growpart /dev/vda 1
resize2fs /dev/vda1
```
### Resource Adjustment
Modify VM resources:
```bash
# Set maximum memory in KiB (4194304 KiB = 4 GiB; requires shutdown)
virsh setmaxmem debian12-guest 4194304 --config
# Set current memory in KiB (can be changed live, up to the configured maximum)
virsh setmem debian12-guest 4194304
# Set vCPUs (requires shutdown)
virsh setvcpus debian12-guest 4 --config --maximum
virsh setvcpus debian12-guest 4 --config
```
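The memory values passed to `virsh setmaxmem` and `virsh setmem` are in KiB by default, which is where 4194304 comes from. A small worked conversion, with hypothetical helper names:

```python
def gib_to_kib(gib: int) -> int:
    """Convert GiB to KiB (1 GiB = 1024 * 1024 KiB)."""
    return gib * 1024 * 1024


def mib_to_kib(mib: int) -> int:
    """Convert MiB to KiB, e.g. for a vm_memory_mb value."""
    return mib * 1024


print(gib_to_kib(4))     # value used in the virsh commands above
print(mib_to_kib(2048))  # the default vm_memory_mb expressed in KiB
```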
## Best Practices
1. **Naming Convention**: Use descriptive VM names indicating purpose
2. **Resource Planning**: Right-size VMs to avoid waste
3. **Documentation**: Document VM purpose and configuration
4. **Monitoring**: Set up monitoring for critical VMs
5. **Backups**: Regular backups of important VMs
6. **Updates**: Keep VMs updated with security patches
7. **Inventory**: Maintain accurate Ansible inventory
8. **Metadata**: Use libvirt domain titles and descriptions (`virsh desc`) for organization
9. **Networking**: Use appropriate network isolation
10. **Testing**: Test deployment process in development first
## References
- [CLAUDE.md](../CLAUDE.md) - Infrastructure guidelines
- [Cheatsheet](../cheatsheets/deploy-debian12-vm.md) - Quick reference
- [Debian Cloud Images](https://cloud.debian.org/images/cloud/)
- [cloud-init Documentation](https://cloudinit.readthedocs.io/)
- [libvirt Documentation](https://libvirt.org/docs.html)
- [virt-install man page](https://linux.die.net/man/1/virt-install)
## Support and Contact
For issues or questions:
1. Check troubleshooting section above
2. Review cloud-init logs: `/var/log/cloud-init.log`
3. Review libvirt logs: `journalctl -u libvirtd`
4. Consult Ansible playbook: `plays/deploy-debian12-vm.yml`
---
**Document Version**: 1.0
**Last Updated**: 2025-11-10
**Maintained By**: Ansible Infrastructure Team

docs/inventory.md Normal file

@@ -0,0 +1,516 @@
# Ansible Inventory Configuration
This document describes the dynamic and static inventory configurations for the Ansible infrastructure.
## Table of Contents
1. [Overview](#overview)
2. [Inventory Structure](#inventory-structure)
3. [Dynamic Inventory Solutions](#dynamic-inventory-solutions)
4. [Static/Hybrid Inventory](#statichybrid-inventory)
5. [Usage Examples](#usage-examples)
6. [Troubleshooting](#troubleshooting)
---
## Overview
Per the CLAUDE.md guidelines, this infrastructure uses **dynamic inventories** as the primary inventory source, with static inventories permitted for development environments only.
### Available Inventory Solutions
| Solution | Type | Use Case | Status |
|----------|------|----------|--------|
| SSH Config Parser | Dynamic | Quick discovery from SSH config | ✅ Active |
| Libvirt/KVM Plugin | Dynamic | Real-time VM discovery | ✅ Active |
| Static YAML | Static/Hybrid | Development environment | ✅ Active |
---
## Inventory Structure
```
inventories/
├── production/ # Production environment (dynamic only)
│ ├── group_vars/
│ │ └── all.yml
│ └── [dynamic inventory configs]
├── staging/ # Staging environment (dynamic only)
│ ├── group_vars/
│ │ └── all.yml
│ └── [dynamic inventory configs]
└── development/ # Development environment
├── hosts.yml # Static/hybrid inventory
├── libvirt_kvm.yml # Libvirt dynamic config
├── group_vars/
│ ├── all.yml
│ ├── kvm_guests.yml
│ └── hypervisors.yml
└── host_vars/
```
---
## Dynamic Inventory Solutions
### 1. SSH Config Parser (`ssh_config_inventory.py`)
**Location:** `/opt/ansible/plugins/inventory/ssh_config_inventory.py`
#### Description
Parses `~/.ssh/config` to automatically generate Ansible inventory with proper grouping and connection parameters.
#### Features
- Automatic host discovery from SSH config
- Intelligent grouping by host characteristics
- ProxyJump support for nested VM access
- No external dependencies (pure Python)
#### Usage
```bash
# List all hosts and groups
python3 plugins/inventory/ssh_config_inventory.py --list
# Get variables for specific host
python3 plugins/inventory/ssh_config_inventory.py --host pihole
# Use with ansible commands
ansible all -i plugins/inventory/ssh_config_inventory.py --list-hosts
# Use with playbooks
ansible-playbook -i plugins/inventory/ssh_config_inventory.py site.yml
```
#### Host Categorization Logic
| Category | Criteria |
|----------|----------|
| `external_hosts` | Public IPs, no ProxyJump, non-ansible user |
| `hypervisors` | ForwardAgent enabled, specific users (grok) |
| `dns_servers` | ansible user + ProxyJump + hostname contains 'pihole'/'dns' |
| `mail_servers` | ansible user + ProxyJump + hostname contains 'mail'/'mx' |
| `development` | ansible user + ProxyJump + hostname contains 'dev'/'test'/'derp' |
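The table can be read as a chain of rules. The sketch below is one possible reading, not the script's actual `_categorize_host()` method: it assumes SSH options arrive as a dict with lower-case keys, treats the hypervisor criteria as "ForwardAgent enabled and user `grok`", and omits the public-IP test for `external_hosts`. The `admin` user in the example is made up.

```python
def categorize_host(name: str, options: dict) -> str:
    """Assign an inventory group from a host alias and its parsed SSH options."""
    user = options.get("user", "")
    has_proxy = "proxyjump" in options
    lowered = name.lower()

    # Guests: ansible user reached through a ProxyJump, grouped by name hints.
    if user == "ansible" and has_proxy:
        if any(token in lowered for token in ("pihole", "dns")):
            return "dns_servers"
        if any(token in lowered for token in ("mail", "mx")):
            return "mail_servers"
        if any(token in lowered for token in ("dev", "test", "derp")):
            return "development"
        return "uncategorized"

    # Hypervisors: agent forwarding enabled for a known hypervisor user.
    if options.get("forwardagent", "no").lower() == "yes" and user == "grok":
        return "hypervisors"

    # External hosts: reached directly, not managed through the ansible user.
    if not has_proxy and user != "ansible":
        return "external_hosts"
    return "uncategorized"


print(categorize_host("pihole", {"user": "ansible", "proxyjump": "grokbox"}))
print(categorize_host("grokbox", {"user": "grok", "forwardagent": "yes"}))
print(categorize_host("odin", {"user": "admin"}))
```

Because matching is by substring, a host named `devmail` would land in `mail_servers` under this ordering; hosts like that are better pinned in the static inventory.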
#### Generated Inventory Structure
```json
{
"all": {
"children": ["external_hosts", "hypervisors", "kvm_guests"]
},
"external_hosts": {
"hosts": ["odin"]
},
"hypervisors": {
"hosts": ["grokbox"]
},
"kvm_guests": {
"children": ["dns_servers", "mail_servers", "development", "uncategorized"],
"vars": {
"ansible_user": "ansible",
"ansible_ssh_common_args": "-o StrictHostKeyChecking=accept-new"
}
},
"_meta": {
"hostvars": { ... }
}
}
```
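A script-based inventory only has to answer `--list` and `--host <name>` with JSON in this shape. The following skeleton is a sketch with hard-coded sample data, not the contents of `ssh_config_inventory.py`; `build_inventory` and `main` are hypothetical names, and the host variables shown are illustrative.

```python
import argparse
import json


def build_inventory(groups: dict, hostvars: dict) -> dict:
    """Assemble the JSON structure Ansible expects from an inventory script."""
    inventory = {name: {"hosts": sorted(hosts)} for name, hosts in groups.items()}
    inventory["all"] = {"children": sorted(groups)}
    # Supplying _meta.hostvars lets Ansible skip one --host call per host.
    inventory["_meta"] = {"hostvars": hostvars}
    return inventory


def main(argv=None) -> str:
    parser = argparse.ArgumentParser(description="Minimal inventory script sketch")
    parser.add_argument("--list", action="store_true")
    parser.add_argument("--host")
    args = parser.parse_args(argv)

    # Sample data standing in for whatever the real script discovers.
    groups = {"hypervisors": ["grokbox"], "external_hosts": ["odin"]}
    hostvars = {"grokbox": {"ansible_user": "grok"}, "odin": {}}

    if args.host:
        return json.dumps(hostvars.get(args.host, {}))
    return json.dumps(build_inventory(groups, hostvars), indent=2)


if __name__ == "__main__":
    print(main())
```

Ansible runs the file directly, so an inventory script like this must be executable.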
---
### 2. Libvirt/KVM Dynamic Inventory (`libvirt_kvm.py`)
**Location:** `/opt/ansible/plugins/inventory/libvirt_kvm.py`
#### Description
Queries libvirt hypervisors directly to discover KVM guest VMs in real-time, including their state, resources, and network configuration.
#### Features
- Real-time VM discovery via libvirt API
- VM state detection (running, stopped, paused)
- Automatic IP address detection
- Resource information (vCPUs, memory, networks)
- Multiple hypervisor support
- ProxyJump configuration
#### Requirements
```bash
# Debian/Ubuntu
apt-get install python3-libvirt
# RHEL/Fedora/Rocky
dnf install python3-libvirt
```
#### Configuration
Set environment variables or use a configuration file:
```bash
# Environment variables
export LIBVIRT_DEFAULT_URI="qemu+ssh://grok@grok.home.serneels.xyz/system"
export LIBVIRT_HYPERVISOR_NAME="grokbox"
```
Or use the YAML configuration file: `inventories/development/libvirt_kvm.yml`
#### Usage
```bash
# List all VMs
python3 plugins/inventory/libvirt_kvm.py --list
# Get specific VM details
python3 plugins/inventory/libvirt_kvm.py --host mymx
# Use with ansible
ansible running_vms -i plugins/inventory/libvirt_kvm.py -m ping
# Use with playbooks
ansible-playbook -i plugins/inventory/libvirt_kvm.py playbooks/update.yml
```
#### Generated Groups
| Group | Description |
|-------|-------------|
| `hypervisors` | KVM hypervisor hosts |
| `kvm_guests` | All guest VMs |
| `running_vms` | VMs in running state |
| `stopped_vms` | VMs not running (shutoff, paused, etc.) |
#### Host Variables
Each VM includes:
- `vm_name`: VM hostname
- `vm_uuid`: Libvirt UUID
- `vm_state`: Current state (running, shutoff, etc.)
- `vm_vcpus`: Number of virtual CPUs
- `vm_memory_mb`: Memory allocation in MB
- `vm_networks`: Network interface details
- `ansible_host`: IP address (if available)
- `ansible_ssh_common_args`: ProxyJump configuration
- `hypervisor`: Parent hypervisor name
#### Example Output
```json
{
"running_vms": {
"hosts": ["mymx", "pihole", "derp"]
},
"_meta": {
"hostvars": {
"pihole": {
"vm_name": "pihole",
"vm_uuid": "6d714c93-16fb-41c8-8ef8-9001f9066b3a",
"vm_state": "running",
"vm_vcpus": 2,
"vm_memory_mb": 2048,
"ansible_host": "192.168.122.12",
"ansible_ssh_common_args": "-o ProxyJump=grokbox -o StrictHostKeyChecking=accept-new",
"hypervisor": "grokbox"
}
}
}
}
```
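To show how such host variables can be derived, the sketch below maps the five-element list returned by the libvirt Python binding's `virDomain.info()` (state, maximum memory in KiB, current memory in KiB, vCPU count, CPU time) onto the variables listed above. It is not the code of `libvirt_kvm.py`; `hostvars_from_info` and `group_for` are hypothetical helpers, and the libvirt calls appear only in comments so the block runs without the binding installed.

```python
# Numeric domain states as defined by libvirt's virDomainState enum.
STATE_NAMES = {
    0: "nostate", 1: "running", 2: "blocked", 3: "paused",
    4: "shutdown", 5: "shutoff", 6: "crashed", 7: "pmsuspended",
}


def hostvars_from_info(name, uuid, info, hypervisor, ip=None):
    """Translate virDomain.info() output into inventory host variables."""
    state, max_mem_kib, _mem_kib, vcpus, _cpu_time_ns = info
    hostvars = {
        "vm_name": name,
        "vm_uuid": uuid,
        "vm_state": STATE_NAMES.get(state, "unknown"),
        "vm_vcpus": vcpus,
        "vm_memory_mb": max_mem_kib // 1024,
        "hypervisor": hypervisor,
        "ansible_ssh_common_args":
            f"-o ProxyJump={hypervisor} -o StrictHostKeyChecking=accept-new",
    }
    if ip:  # only set ansible_host when an address was discovered
        hostvars["ansible_host"] = ip
    return hostvars


def group_for(state_name: str) -> str:
    """Everything that is not running is grouped as stopped."""
    return "running_vms" if state_name == "running" else "stopped_vms"


# With python3-libvirt installed, the inputs would come from something like:
#   conn = libvirt.open(uri)
#   for dom in conn.listAllDomains():
#       hostvars_from_info(dom.name(), dom.UUIDString(), dom.info(), "grokbox")
```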
---
## Static/Hybrid Inventory
**Location:** `/opt/ansible/inventories/development/hosts.yml`
### Description
Manually maintained static inventory for the development environment with detailed host metadata and configuration.
### Structure
```yaml
all:
children:
external_hosts: # Public-facing hosts
hypervisors: # KVM hypervisor hosts
kvm_guests: # Virtual machine guests
children:
dns_servers:
mail_servers:
development:
uncategorized:
```
### Group Variables
Variables are defined in `group_vars/` directory:
- **`all.yml`**: Global variables for all hosts
- **`kvm_guests.yml`**: Common VM configuration (LVM, networking, ProxyJump)
- **`hypervisors.yml`**: Hypervisor-specific settings (libvirt, QEMU)
### Host Variables
Host-specific variables can be placed in `host_vars/` directory:
```
host_vars/
├── pihole.yml
├── mymx.yml
└── derp.yml
```
### Usage
```bash
# List hosts
ansible all -i inventories/development/hosts.yml --list-hosts
# Run playbook
ansible-playbook -i inventories/development/hosts.yml site.yml
# Target specific group
ansible dns_servers -i inventories/development/hosts.yml -m ping
```
---
## Usage Examples
### Example 1: List All Hosts (Dynamic)
```bash
# Using SSH config parser
ansible all -i plugins/inventory/ssh_config_inventory.py --list-hosts
# Using libvirt inventory
ansible all -i plugins/inventory/libvirt_kvm.py --list-hosts
```
### Example 2: Ping All Running VMs
```bash
ansible running_vms -i plugins/inventory/libvirt_kvm.py -m ping
```
### Example 3: Run Playbook Against KVM Guests
```bash
ansible-playbook -i inventories/development/hosts.yml \
--limit kvm_guests \
playbooks/system-update.yml
```
### Example 4: Check Host Variables
```bash
# Using dynamic inventory
ansible-inventory -i plugins/inventory/libvirt_kvm.py --host pihole
# Using static inventory
ansible-inventory -i inventories/development/hosts.yml --host pihole --yaml
```
### Example 5: Multiple Inventory Sources
You can combine multiple inventory sources:
```bash
ansible-playbook -i inventories/development/hosts.yml \
-i plugins/inventory/libvirt_kvm.py \
site.yml
```
### Example 6: Filter by Group
```bash
# Target only mail servers
ansible mail_servers -i plugins/inventory/ssh_config_inventory.py -m setup
# Target only hypervisors
ansible hypervisors -i inventories/development/hosts.yml -m shell -a "virsh list --all"
```
---
## Ansible Configuration
Configure default inventory in `ansible.cfg`:
```ini
[defaults]
inventory = ./inventories/development/hosts.yml
# Or use dynamic:
# inventory = ./plugins/inventory/libvirt_kvm.py
# Enable multiple inventory sources
# inventory = ./inventories/development/hosts.yml,./plugins/inventory/libvirt_kvm.py
# Inventory plugins path
inventory_plugins = ./plugins/inventory
# Enable fact caching for performance
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 86400
```
---
## Troubleshooting
### SSH Config Parser Issues
**Problem:** Hosts not appearing in inventory
**Solution:**
- Check `~/.ssh/config` exists and is readable
- Verify Host declarations are properly formatted
- Run with `--list` to see parsed output
- Check for Python syntax errors
**Problem:** Incorrect host categorization
**Solution:**
- Review categorization logic in `_categorize_host()` method
- Add custom categorization rules
- Use static inventory for specific grouping needs
### Libvirt Inventory Issues
**Problem:** `python3-libvirt` not installed
**Solution:**
```bash
# Debian/Ubuntu
sudo apt-get install python3-libvirt
# RHEL/Rocky/Fedora
sudo dnf install python3-libvirt
```
**Problem:** Connection to hypervisor fails
**Solution:**
- Verify SSH access to hypervisor: `ssh grok@grok.home.serneels.xyz`
- Check libvirt URI: `virsh -c qemu+ssh://grok@grok.home.serneels.xyz/system list`
- Ensure SSH keys are properly configured
- Check SSH agent forwarding if needed
**Problem:** VMs discovered but no IP addresses
**Solution:**
- VMs may not have DHCP leases yet
- Check VM is fully booted: `virsh dominfo <vm_name>`
- Manually query: `virsh domifaddr <vm_name>`
- Use static inventory with known IP addresses
### Static Inventory Issues
**Problem:** YAML syntax errors
**Solution:**
- Validate YAML syntax: `yamllint inventories/development/hosts.yml`
- Check indentation (use 2 spaces)
- Verify with: `ansible-inventory -i inventories/development/hosts.yml --list`
**Problem:** Variables not being applied
**Solution:**
- Check variable precedence order
- Verify `group_vars/` and `host_vars/` file names match group/host names
- Use `ansible-inventory --host <hostname>` to debug variable merging
### General Debugging
```bash
# Verify inventory parsing
ansible-inventory -i <inventory_source> --list
# Check specific host variables
ansible-inventory -i <inventory_source> --host <hostname>
# Graph inventory structure
ansible-inventory -i <inventory_source> --graph
# Test connectivity
ansible all -i <inventory_source> -m ping -vvv
# Dry run playbook
ansible-playbook -i <inventory_source> site.yml --check --diff
```
---
## Security Considerations
### SSH Config Parser
- ✅ No credentials stored in inventory
- ✅ Uses existing SSH configuration
- ⚠️ Ensure `~/.ssh/config` has proper permissions (600)
### Libvirt Inventory
- ✅ Uses SSH key authentication
- ✅ No passwords in configuration
- ⚠️ Requires SSH access to hypervisor
- ⚠️ Libvirt connection string may be logged
### Static Inventory
- ✅ Version controlled and auditable
- ⚠️ Use Ansible Vault for sensitive variables
- ⚠️ Never commit unencrypted credentials
### Best Practices
- Use Ansible Vault for secrets: `ansible-vault encrypt group_vars/all/vault.yml`
- Rotate SSH keys regularly (90-180 days per CLAUDE.md)
- Use ProxyJump/bastion hosts for nested VM access
- Enable SSH ControlMaster for connection reuse
- Implement inventory caching for large infrastructures
---
## Performance Optimization
### Caching
Enable fact caching in `ansible.cfg`:
```ini
[defaults]
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 86400
```
### Parallelism
Adjust fork count:
```ini
[defaults]
forks = 20
```
### SSH Connection Reuse
Configure ControlMaster in `~/.ssh/config`. The `~/.ssh/sockets` directory must already exist, because ssh does not create it:
```
Host *
ControlMaster auto
ControlPath ~/.ssh/sockets/%r@%h-%p
ControlPersist 600s
```
---
## References
- [Ansible Dynamic Inventory](https://docs.ansible.com/ansible/latest/user_guide/intro_dynamic_inventory.html)
- [Libvirt Python API](https://libvirt.org/python.html)
- [SSH Config Documentation](https://man.openbsd.org/ssh_config)
- [CLAUDE.md Guidelines](/opt/ansible/CLAUDE.md)
---
**Document Version:** 1.0.0
**Last Updated:** 2025-11-10
**Maintainer:** Ansible Infrastructure Team

docs/linux-vm-deployment.md Normal file

@@ -0,0 +1,944 @@
# Multi-Distribution Linux VM Deployment Documentation
## Overview
This document describes the automated deployment process for multiple Linux distributions on KVM/libvirt hypervisors. The deployment supports major server distributions including Debian, Ubuntu, RHEL, CentOS Stream, Rocky Linux, AlmaLinux, SLES, and openSUSE Leap.
## Table of Contents
1. [Supported Distributions](#supported-distributions)
2. [Architecture](#architecture)
3. [Prerequisites](#prerequisites)
4. [Cloud Image Sources](#cloud-image-sources)
5. [Deployment Process](#deployment-process)
6. [Distribution-Specific Configuration](#distribution-specific-configuration)
7. [Security Features](#security-features)
8. [Post-Deployment](#post-deployment)
9. [Troubleshooting](#troubleshooting)
10. [Best Practices](#best-practices)
## Supported Distributions
### Debian Family
| Distribution | Version | Package Manager | Firewall | Cloud Image |
|--------------|---------|----------------|----------|-------------|
| Debian | 11 (Bullseye) | apt | ufw | ✅ Auto-download |
| Debian | 12 (Bookworm) | apt | ufw | ✅ Auto-download |
| Ubuntu | 20.04 LTS (Focal) | apt | ufw | ✅ Auto-download |
| Ubuntu | 22.04 LTS (Jammy) | apt | ufw | ✅ Auto-download |
| Ubuntu | 24.04 LTS (Noble) | apt | ufw | ✅ Auto-download |
### RHEL Family
| Distribution | Version | Package Manager | Firewall | SELinux | Cloud Image |
|--------------|---------|----------------|----------|---------|-------------|
| RHEL | 8, 9 | dnf | firewalld | Enforcing | ⚠️ Manual download |
| CentOS Stream | 8, 9 | dnf | firewalld | Enforcing | ✅ Auto-download |
| Rocky Linux | 8, 9 | dnf | firewalld | Enforcing | ✅ Auto-download |
| AlmaLinux | 8, 9 | dnf | firewalld | Enforcing | ✅ Auto-download |
### SUSE Family
| Distribution | Version | Package Manager | Firewall | Cloud Image |
|--------------|---------|----------------|----------|-------------|
| SLES | 15 | zypper | firewalld | ⚠️ Manual download |
| openSUSE Leap | 15.5, 15.6 | zypper | firewalld | ✅ Auto-download |
**Legend:**
- ✅ = Automatically downloaded from official repositories
- ⚠️ = Requires subscription and manual download
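The three tables amount to a lookup from distribution to family-level facts. The sketch below encodes them as plain dictionaries; it illustrates the idea and is not the data structure used in `deploy-linux-vm.yml`. The keys and the `distro_facts` helper are hypothetical.

```python
DISTRO_FAMILIES = {
    "debian": "debian", "ubuntu": "debian",
    "rhel": "rhel", "centos-stream": "rhel", "rocky": "rhel", "almalinux": "rhel",
    "sles": "suse", "opensuse-leap": "suse",
}

FAMILY_FACTS = {
    "debian": {"package_manager": "apt", "firewall": "ufw", "selinux": None},
    "rhel": {"package_manager": "dnf", "firewall": "firewalld", "selinux": "enforcing"},
    "suse": {"package_manager": "zypper", "firewall": "firewalld", "selinux": None},
}

# Distributions whose images need a subscription and a manual download.
MANUAL_DOWNLOAD = {"rhel", "sles"}


def distro_facts(distribution: str) -> dict:
    """Return family, package manager, firewall and download mode for a distribution."""
    key = distribution.lower()
    if key not in DISTRO_FAMILIES:
        raise ValueError(f"unsupported distribution: {distribution}")
    family = DISTRO_FAMILIES[key]
    return {
        "family": family,
        **FAMILY_FACTS[family],
        "auto_download": key not in MANUAL_DOWNLOAD,
    }


print(distro_facts("rocky"))
```

Rejecting unknown names early mirrors the "Check distribution in supported list" step of the deployment workflow.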
## Architecture
### Multi-Distribution Support Design
```
┌─────────────────────────────────────────────────────────┐
│ Ansible Control Node │
│ │
│ ┌────────────────────────────────────────────────┐ │
│ │ deploy-linux-vm.yml │ │
│ │ │ │
│ │ • Distribution Selection Logic │ │
│ │ • Cloud Image Repository Map │ │
│ │ • OS Family Detection │ │
│ │ • Package Manager Adaptation │ │
│ └────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
│ SSH
┌─────────────────────────────────────────────────────────┐
│ KVM Hypervisor (grokbox) │
│ │
│ ┌──────────────────────────────────────────────┐ │
│ │ Cloud Image Cache │ │
│ │ /var/lib/libvirt/images/ │ │
│ │ ├─ debian-12-*.qcow2 │ │
│ │ ├─ ubuntu-22.04-*.img │ │
│ │ ├─ centos-stream-9-*.qcow2 │ │
│ │ ├─ rocky-9-*.qcow2 │ │
│ │ └─ almalinux-9-*.qcow2 │ │
│ └──────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────┐ │
│ │ libvirt/QEMU │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ Debian VM │ │ Ubuntu VM │ │ │
│ │ │ ufw enabled │ │ ufw enabled │ │ │
│ │ └──────────────┘ └──────────────┘ │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ Rocky VM │ │ Alma VM │ │ │
│ │ │ SELinux=Enf │ │ SELinux=Enf │ │ │
│ │ │ firewalld │ │ firewalld │ │ │
│ │ └──────────────┘ └──────────────┘ │ │
│ └──────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
```
### Deployment Workflow
```
[Start]
[Validate Distribution Selection]
  ├─ Check distribution in supported list
  ├─ Set distribution facts (family, package manager, etc.)
  └─ Display deployment configuration
[Pre-flight Checks]
  ├─ Verify VM doesn't already exist
  ├─ Validate virtualization support
  └─ Install required packages on hypervisor
[Download Cloud Image]
  ├─ Check if image cached
  ├─ Download from official repository
  └─ Verify checksum (SHA256/SHA512)
[Create VM Storage]
  ├─ Create qcow2 disk (CoW from base image)
  └─ Set proper permissions (libvirt-qemu/qemu)
[Generate Cloud-Init Configuration]
  ├─ Select template based on OS family:
  │   ├─ Debian/Ubuntu → apt, ufw, unattended-upgrades
  │   ├─ RHEL family → dnf, firewalld, SELinux, dnf-automatic
  │   └─ SUSE family → zypper, firewalld
  ├─ Create meta-data (hostname, instance-id)
  ├─ Create user-data (users, packages, security)
  └─ Generate cloud-init ISO
[Deploy VM]
  ├─ Run virt-install with appropriate os-variant
  ├─ Attach disk and cloud-init ISO
  └─ Start VM
[Wait for Boot]
  ├─ VM boots from qcow2 disk
  ├─ Cloud-init runs configuration
  ├─ Network configured via DHCP
  └─ Get IP address from libvirt
[Validation]
  ├─ Test SSH connectivity
  ├─ Verify cloud-init completion
  ├─ Display VM information
  └─ System health checks
[Cleanup]
  └─ Remove temporary files
[Complete]
```
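The "Generate Cloud-Init Configuration" stage can be sketched in shell roughly as follows. This is an illustration under stated assumptions, not the playbook's actual tasks: the helper names `write_seed_files` and `build_seed_iso`, the `instance-id` format, and the minimal user-data content are all hypothetical. What is standard is the `#cloud-config` header and the `cidata` volume label that cloud-init's NoCloud datasource looks for.

```shell
#!/usr/bin/env bash
# Sketch: write NoCloud seed files and pack them into an ISO.
# write_seed_files and build_seed_iso are hypothetical helper names.

write_seed_files() {
    local dir="$1" hostname="$2" ssh_key="$3"
    mkdir -p "$dir"

    # meta-data: instance identity (the "-001" suffix is an assumption)
    cat > "$dir/meta-data" <<EOF
instance-id: ${hostname}-001
local-hostname: ${hostname}
EOF

    # user-data: the first line must be the "#cloud-config" header
    cat > "$dir/user-data" <<EOF
#cloud-config
users:
  - name: ansible
    sudo: ALL=(ALL) NOPASSWD:ALL
    ssh_authorized_keys:
      - ${ssh_key}
EOF
}

build_seed_iso() {
    local dir="$1" iso="$2"
    # NoCloud expects an ISO9660 volume labelled "cidata"
    genisoimage -output "$iso" -volid cidata -joliet -rock \
        "$dir/user-data" "$dir/meta-data"
}
```

A call such as `write_seed_files /tmp/seed web01 "$(cat ~/.ssh/id_ed25519.pub)"` followed by `build_seed_iso /tmp/seed /tmp/seed.iso` would produce an ISO that can be attached to the VM as a CD-ROM.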
## Prerequisites
### Hypervisor Requirements
**Hardware:**
- CPU with virtualization extensions (Intel VT-x or AMD-V)
- Sufficient RAM for host + guest VMs
- Adequate storage for cloud images and VM disks
**Software:**
**For Debian/Ubuntu hypervisors:**
```bash
apt install -y \
libvirt-daemon-system \
libvirt-clients \
virtinst \
qemu-kvm \
qemu-utils \
cloud-image-utils \
genisoimage \
python3-libvirt
```
**For RHEL/CentOS/Rocky/Alma hypervisors:**
```bash
dnf install -y \
libvirt \
libvirt-client \
virt-install \
qemu-kvm \
qemu-img \
cloud-utils \
genisoimage \
python3-libvirt
```
**Services:**
```bash
systemctl enable --now libvirtd
```
### Network Requirements
- Internet connectivity for cloud image downloads
- DNS resolution working
- libvirt default network active:
```bash
virsh net-list
virsh net-start default # if not started
virsh net-autostart default
```
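The hardware and network prerequisites above can be checked before a deployment with a short script. The sketch below is illustrative only; the function names `has_virt_extensions` and `network_is_active` are hypothetical and are not part of the playbook.

```shell
#!/usr/bin/env bash
# Sketch: pre-flight checks for the hypervisor (hypothetical helper names).

# Succeeds if the cpuinfo file lists Intel VT-x (vmx) or AMD-V (svm).
has_virt_extensions() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    grep -Eqw 'vmx|svm' "$cpuinfo"
}

# Reads `virsh net-list` output on stdin and succeeds if the named
# network is listed with state "active".
network_is_active() {
    local net="$1"
    awk -v net="$net" '$1 == net && $2 == "active" { found = 1 } END { exit !found }'
}
```

Typical use on the hypervisor would be `has_virt_extensions || echo "no VT-x/AMD-V"` and `virsh net-list | network_is_active default || virsh net-start default`.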
### Ansible Control Node
- Ansible 2.9 or newer
- SSH access to hypervisor
- Python 3.x installed
## Cloud Image Sources
### Official Repositories
**Debian:**
- URL: https://cloud.debian.org/images/cloud/
- Format: qcow2
- Checksum: SHA512SUMS provided
- Update Frequency: Regular (stable releases)
**Ubuntu:**
- URL: https://cloud-images.ubuntu.com/
- Format: img (qcow2 compatible)
- Checksum: SHA256SUMS provided
- Update Frequency: Daily builds available
**CentOS Stream:**
- URL: https://cloud.centos.org/centos/
- Format: qcow2
- Checksum: SHA256 CHECKSUM file
- Update Frequency: Regular updates
**Rocky Linux:**
- URL: https://download.rockylinux.org/pub/rocky/
- Format: qcow2
- Checksum: SHA256 CHECKSUM file
- Update Frequency: Regular with point releases
**AlmaLinux:**
- URL: https://repo.almalinux.org/almalinux/
- Format: qcow2
- Checksum: SHA256 CHECKSUM file
- Update Frequency: Regular with point releases
**openSUSE Leap:**
- URL: https://download.opensuse.org/distribution/leap/
- Format: qcow2
- Checksum: SHA256 per-file
- Update Frequency: Per release cycle
### Manual Download Required
**Red Hat Enterprise Linux (RHEL):**
- Requires: Red Hat subscription
- Portal: https://access.redhat.com/downloads/
- Steps:
  1. Log in to Red Hat Customer Portal
  2. Navigate to Downloads
  3. Select "Red Hat Enterprise Linux"
  4. Download KVM Guest Image
  5. Place at: `/var/lib/libvirt/images/rhel-X-x86_64-kvm.qcow2`
**SUSE Linux Enterprise Server (SLES):**
- Requires: SUSE subscription
- Portal: https://scc.suse.com/
- Steps:
  1. Log in to SUSE Customer Center
  2. Download cloud image for SLES 15
  3. Place at: `/var/lib/libvirt/images/sles-15-genericcloud-amd64.qcow2`
## Deployment Process
### Basic Deployment
```bash
ansible-playbook plays/deploy-linux-vm.yml \
-e "os_distribution=<distro-version>" \
-e "vm_name=<name>"
```
### Distribution Selection
The `os_distribution` variable determines which Linux distribution to deploy. Format: `distro-version`
**Examples:**
```bash
# Debian
-e "os_distribution=debian-12"
# Ubuntu
-e "os_distribution=ubuntu-22.04"
# CentOS Stream
-e "os_distribution=centos-stream-9"
# Rocky Linux
-e "os_distribution=rocky-9"
# AlmaLinux
-e "os_distribution=almalinux-9"
# openSUSE
-e "os_distribution=opensuse-leap-15.6"
```
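The mapping from an `os_distribution` identifier to its OS family can be pictured with a small shell function. This is a sketch, not the playbook's own lookup: `os_family_for` is a hypothetical name, and the family label `suse` is an assumption (the labels `debian` and `rhel` match the inventory examples in this document).

```shell
#!/usr/bin/env bash
# Sketch: map a distro-version identifier to an OS family.
# os_family_for is a hypothetical helper.

os_family_for() {
    case "$1" in
        debian-*|ubuntu-*)
            echo "debian" ;;
        rhel-*|centos-stream-*|rocky-*|almalinux-*)
            echo "rhel" ;;
        sles-*|opensuse-leap-*)
            echo "suse" ;;
        *)
            # Identifiers without a version (e.g. "debian") are rejected
            echo "unsupported distribution: $1" >&2
            return 1 ;;
    esac
}
```

For example, `os_family_for rocky-9` prints `rhel`, while `os_family_for debian` fails because the version suffix is missing.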
### Resource Customization
```bash
ansible-playbook plays/deploy-linux-vm.yml \
-e "os_distribution=rocky-9" \
-e "vm_name=database-server" \
-e "vm_hostname=db01" \
-e "vm_domain=production.local" \
-e "vm_vcpus=8" \
-e "vm_memory_mb=16384" \
-e "vm_disk_size_gb=200"
```
### Configuration Variables
| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| `os_distribution` | String (required) | debian-12 | Distribution identifier in `distro-version` format |
| `vm_name` | String | linux-guest | VM name in libvirt |
| `vm_hostname` | String | linux-vm | Guest hostname |
| `vm_domain` | String | localdomain | DNS domain |
| `vm_vcpus` | Integer | 2 | Number of virtual CPUs |
| `vm_memory_mb` | Integer | 2048 | RAM in megabytes |
| `vm_disk_size_gb` | Integer | 20 | Disk size in gigabytes |
| `vm_network` | String | default | Libvirt network name |
| `vm_bridge` | String | virbr0 | Bridge interface |
| `ansible_user_ssh_key` | String | (preset) | SSH public key for ansible user |
## Distribution-Specific Configuration
### Debian/Ubuntu Systems
**Package Manager:** apt
**Cloud-Init Packages:**
- sudo, vim, htop, tmux, curl, wget, rsync, git
- python3, python3-pip, jq, bc
- aide, auditd, chrony, ufw, lvm2
- cloud-guest-utils, parted
- unattended-upgrades, apt-listchanges
**Security Configuration:**
- **Firewall:** ufw enabled, SSH allowed
- **SSH:** Root login disabled, key-only auth
- **Updates:** unattended-upgrades for security updates
- **Audit:** auditd enabled
- **Time Sync:** chrony configured
**User Management:**
- ansible user → member of `sudo` group
- Passwordless sudo access
**Automatic Updates Configuration:**
```
/etc/apt/apt.conf.d/50unattended-upgrades
/etc/apt/apt.conf.d/20auto-upgrades
```
**Post-Boot Commands:**
```bash
systemctl enable ssh && systemctl restart ssh
systemctl enable chrony && systemctl start chrony
ufw --force enable && ufw allow ssh
systemctl enable auditd && systemctl start auditd
growpart /dev/vda 1 && resize2fs /dev/vda1
```
### RHEL Family Systems
**Package Manager:** dnf
**Cloud-Init Packages:**
- sudo, vim, htop, tmux, curl, wget, rsync, git
- python3, python3-pip, jq, bc
- aide, audit, chrony, firewalld, lvm2
- cloud-utils-growpart, gdisk
- dnf-automatic
- policycoreutils-python-utils
**Security Configuration:**
- **Firewall:** firewalld enabled, SSH service allowed
- **SELinux:** Enforcing mode
- **SSH:** Root login disabled, key-only auth
- **Updates:** dnf-automatic for security updates
- **Audit:** auditd enabled
- **Time Sync:** chronyd configured
**User Management:**
- ansible user → member of `wheel` group
- Passwordless sudo access
**SELinux Configuration:**
```bash
setenforce 1
sed -i 's/^SELINUX=.*/SELINUX=enforcing/' /etc/selinux/config
```
**Automatic Updates Configuration:**
```
/etc/dnf/automatic.conf
upgrade_type = security
apply_updates = yes
```
**Post-Boot Commands:**
```bash
systemctl enable sshd && systemctl restart sshd
systemctl enable chronyd && systemctl start chronyd
systemctl enable firewalld && systemctl start firewalld
firewall-cmd --permanent --add-service=ssh && firewall-cmd --reload
systemctl enable auditd && systemctl start auditd
systemctl enable dnf-automatic.timer && systemctl start dnf-automatic.timer
setenforce 1
growpart /dev/vda 1 && xfs_growfs /
```
### SUSE Family Systems
**Package Manager:** zypper
**Cloud-Init Packages:**
- sudo, vim, htop, tmux, curl, wget, rsync, git
- python3, python3-pip, jq, bc
- aide, audit, chrony, firewalld, lvm2
- cloud-utils-growpart, gdisk
**Security Configuration:**
- **Firewall:** firewalld enabled, SSH service allowed
- **SSH:** Root login disabled, key-only auth
- **Audit:** auditd enabled
- **Time Sync:** chronyd configured
**User Management:**
- ansible user → member of `wheel` group
- Passwordless sudo access
**Post-Boot Commands:**
```bash
systemctl enable sshd && systemctl restart sshd
systemctl enable chronyd && systemctl start chronyd
systemctl enable firewalld && systemctl start firewalld
firewall-cmd --permanent --add-service=ssh && firewall-cmd --reload
systemctl enable auditd && systemctl start auditd
growpart /dev/vda 1 && { xfs_growfs / || resize2fs /dev/vda1 || btrfs filesystem resize max /; }
```
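Because SUSE images may use XFS, ext4, or Btrfs for the root filesystem, an alternative to trying each resize tool in turn is to pick the tool from the detected filesystem type. The following is a sketch of that idea, not what the playbook does; `grow_command_for` is a hypothetical helper, and `/dev/vda1` is the root partition assumed throughout this document.

```shell
#!/usr/bin/env bash
# Sketch: choose the filesystem grow command from the filesystem type.
# grow_command_for is a hypothetical helper.

grow_command_for() {
    local fstype="$1" device="$2"
    case "$fstype" in
        xfs)            echo "xfs_growfs /" ;;
        ext2|ext3|ext4) echo "resize2fs $device" ;;
        btrfs)          echo "btrfs filesystem resize max /" ;;
        *)
            echo "no grow command known for filesystem: $fstype" >&2
            return 1 ;;
    esac
}

# Example use inside the guest (run as root after growpart):
#   fstype=$(findmnt -n -o FSTYPE /)
#   cmd=$(grow_command_for "$fstype" /dev/vda1) && sh -c "$cmd"
```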
## Security Features
### Universal Security Measures
All deployed VMs, regardless of distribution, include:
1. **User Security:**
- Dedicated `ansible` service account
- SSH key-based authentication only
- Passwordless sudo (with logging)
- Root SSH login disabled
- Emergency console access available (default root password `ChangeMe123!`; change it immediately after deployment)
2. **Network Security:**
- Host-based firewall enabled and configured
- SSH service allowed
- Default deny policy for incoming traffic
- Outgoing traffic allowed
3. **System Security:**
- Audit daemon (auditd) enabled
- Automatic security updates configured
- Time synchronization enabled (chrony)
- File integrity monitoring installed (AIDE)
- Secure SSH configuration applied
4. **SSH Hardening:**
```
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
MaxAuthTries 3
MaxSessions 10
ClientAliveInterval 300
ClientAliveCountMax 2
```
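One way to confirm that the SSH hardening values listed above are present in a configuration file is a small check like the one below. It is a simplified sketch: `sshd_option_is` is a hypothetical helper, and it ignores `Include` and `Match` directives. On a live system, `sudo sshd -T` prints the effective configuration and is the more reliable source.

```shell
#!/usr/bin/env bash
# Sketch: check one option in an sshd_config-style file.
# sshd_option_is is a hypothetical helper.

# Usage: sshd_option_is FILE KEYWORD EXPECTED_VALUE
sshd_option_is() {
    local file="$1" key="$2" expected="$3" actual
    # Keywords are case-insensitive; sshd uses the first value it finds.
    # Commented lines do not match because their first field starts with "#".
    actual=$(awk -v key="$key" 'tolower($1) == tolower(key) { print $2; exit }' "$file")
    [ "$actual" = "$expected" ]
}
```

For example, `sshd_option_is /etc/ssh/sshd_config PermitRootLogin no && echo ok` prints `ok` only when the first active `PermitRootLogin` line is set to `no`.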
### Distribution-Specific Security
**RHEL Family:**
- SELinux in enforcing mode
- firewalld with rich rules support
- dnf-automatic for security updates
- Subscription management for certified packages (RHEL)
**Debian/Ubuntu:**
- AppArmor profiles (Ubuntu)
- UFW for simplified firewall management
- unattended-upgrades for security updates
- Automatic security patch installation
**SUSE Family:**
- AppArmor support
- firewalld with zones
- YaST integration for security management
### Compliance Alignment
The deployment follows CLAUDE.md security principles:
- ✅ Principle of least privilege
- ✅ Encryption in transit (SSH)
- ✅ Key-based authentication
- ✅ Automated security updates
- ✅ System auditing enabled
- ✅ Time synchronization
- ✅ Firewall enabled by default
- ✅ Regular security patching
## Post-Deployment
### Adding to Ansible Inventory
**Debian/Ubuntu VM:**
```yaml
debian_servers:
  hosts:
    debian12-vm:
      ansible_host: 192.168.122.X
      ansible_user: ansible
      ansible_ssh_common_args: '-o ProxyJump=grokbox -o StrictHostKeyChecking=accept-new'
      os_distribution: debian-12
      os_family: debian
      package_manager: apt
```
**RHEL Family VM:**
```yaml
rhel_servers:
  hosts:
    rocky9-vm:
      ansible_host: 192.168.122.X
      ansible_user: ansible
      ansible_ssh_common_args: '-o ProxyJump=grokbox -o StrictHostKeyChecking=accept-new'
      os_distribution: rocky-9
      os_family: rhel
      package_manager: dnf
      selinux_mode: enforcing
```
### Initial Configuration
After deployment, run configuration management:
```bash
# Update system packages
ansible <vm_name> -m package -a "name=* state=latest" -b
# Install additional packages
ansible <vm_name> -m package -a "name=nginx state=present" -b
# Run configuration playbooks
ansible-playbook -i inventories/development/hosts.yml \
playbooks/configure-webserver.yml \
-l <vm_name>
```
### Verification Steps
1. **SSH Access:**
```bash
ssh -J grokbox ansible@<VM_IP>
```
2. **Cloud-Init Status:**
```bash
cloud-init status --wait
cloud-init status --long
```
3. **System Information:**
```bash
cat /etc/os-release
uname -r
```
4. **Security Checks:**
```bash
# Firewall
sudo ufw status verbose # Debian/Ubuntu
sudo firewall-cmd --list-all # RHEL/SUSE
# SELinux (RHEL family)
getenforce
# Audit daemon
sudo systemctl status auditd
# Automatic updates
sudo systemctl status unattended-upgrades # Debian/Ubuntu
sudo systemctl status dnf-automatic.timer # RHEL
```
5. **Disk and Memory:**
```bash
df -h
free -h
lsblk
```
## Troubleshooting
### Distribution Selection Issues
**Problem:** Invalid distribution error
**Solution:**
```bash
# List supported distributions
grep "^ " plays/deploy-linux-vm.yml | grep -E "debian-|ubuntu-|rhel-|centos-|rocky-|alma|sles-|opensuse"
# Use exact distribution identifier
-e "os_distribution=debian-12" # Correct
-e "os_distribution=debian" # Wrong
```
### Cloud Image Download Failures
**Problem:** Image download fails or times out
**Causes:**
- Network connectivity issues
- Repository temporarily unavailable
- Proxy configuration needed
**Solutions:**
```bash
# Test connectivity
curl -I https://cloud.debian.org/
curl -I https://cloud-images.ubuntu.com/
# Manual download
cd /var/lib/libvirt/images
wget <cloud_image_url>
# Configure proxy (if needed)
export https_proxy=http://<proxy_host>:<proxy_port>
```
### Checksum Verification Failures
**Problem:** Checksum verification fails
**Causes:**
- Corrupt download
- Mismatch between image and checksum file
- Wrong checksum type
**Solutions:**
```bash
# Re-download image
rm /var/lib/libvirt/images/<image-name>
ansible-playbook plays/deploy-linux-vm.yml -e "os_distribution=..." -t download
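# Verify manually (Debian publishes SHA512SUMS, so use sha512sum for Debian images)
cd /var/lib/libvirt/images
sha256sum <image-name>
# Compare the printed hash with the matching entry in the checksum file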
```
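The manual comparison can also be scripted. The sketch below assumes a GNU coreutils-style checksum file (`<hash>  <filename>` per line), as used by the Debian and Ubuntu `SHA512SUMS`/`SHA256SUMS` files; CHECKSUM files for the RHEL family may use the BSD-style `SHA256 (file) = hash` layout, which this sketch does not parse. `verify_image` is a hypothetical helper, not part of the playbook.

```shell
#!/usr/bin/env bash
# Sketch: verify an image against a coreutils-style checksum file.
# verify_image is a hypothetical helper.

# Usage: verify_image IMAGE SUMS_FILE [TOOL]   (TOOL defaults to sha256sum)
verify_image() {
    local image="$1" sums="$2" tool="${3:-sha256sum}"
    local name expected actual
    name=$(basename "$image")

    # Each line is "<hash>  <filename>"; a leading "*" marks binary mode.
    expected=$(awk -v f="$name" '{ n = $2; sub(/^\*/, "", n) } n == f { print $1; exit }' "$sums")
    if [ -z "$expected" ]; then
        echo "no entry for $name in $sums" >&2
        return 2
    fi

    actual=$("$tool" "$image" | awk '{ print $1 }')
    [ "$expected" = "$actual" ]
}
```

For a Debian image the call would look like `verify_image <image-name> SHA512SUMS sha512sum`; a non-zero exit status means the image should be deleted and downloaded again.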
### VM Boot Issues
**Problem:** VM created but won't boot or get IP
**Causes:**
- Cloud-init configuration error
- Network misconfiguration
- Insufficient resources
**Solutions:**
```bash
# Check VM status
virsh list --all
virsh dominfo <vm_name>
# View console
virsh console <vm_name>
# Check cloud-init logs (via console)
tail -f /var/log/cloud-init-output.log
journalctl -u cloud-init
# Restart VM (virsh destroy forces a power-off; it does not delete the VM)
virsh destroy <vm_name>
virsh start <vm_name>
```
### SSH Connection Issues
**Problem:** Cannot SSH to deployed VM
**Causes:**
- SSH key not configured correctly
- Firewall blocking
- cloud-init not completed
- Wrong IP address
**Solutions:**
```bash
# Verify IP address
virsh domifaddr <vm_name>
# Test connectivity
ping <VM_IP>
# Check SSH service via console
virsh console <vm_name>
# Then: systemctl status ssh (Debian/Ubuntu) or systemctl status sshd (RHEL/SUSE)
# Verify firewall
# Via console:
sudo ufw status # Debian/Ubuntu
sudo firewall-cmd --list-all # RHEL/SUSE
# Check cloud-init completion
# Via console:
cloud-init status --wait
```
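When scripting these checks, the guest's IPv4 address can be pulled out of the `virsh domifaddr` table instead of being copied by hand. The sketch below assumes the usual four-column output (Name, MAC address, Protocol, Address); `extract_ipv4` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: extract the first IPv4 address from `virsh domifaddr` output.
# extract_ipv4 is a hypothetical helper.

extract_ipv4() {
    # Data rows look like: vnet0  52:54:00:..  ipv4  192.168.122.45/24
    # Strip the "/prefix" suffix and print only the address.
    awk '$3 == "ipv4" { sub(/\/[0-9]+$/, "", $4); print $4; exit }'
}
```

Usage: `VM_IP=$(virsh domifaddr <vm_name> | extract_ipv4)`, after which `ssh -J grokbox ansible@"$VM_IP"` reaches the guest. An empty result usually means the VM has not obtained a DHCP lease yet.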
### SELinux Issues (RHEL Family)
**Problem:** Services failing due to SELinux denials
**Solutions:**
```bash
# Check SELinux status
getenforce
sestatus
# View denials
sudo ausearch -m avc -ts recent
# Temporarily set to permissive (troubleshooting only)
sudo setenforce 0
# Generate policy from denials
sudo ausearch -m avc -ts recent | audit2allow -M myapp
sudo semodule -i myapp.pp
# Re-enable enforcing
sudo setenforce 1
```
### Package Manager Issues
**Debian/Ubuntu:**
```bash
# Update package cache
sudo apt update
# Fix broken packages
sudo apt --fix-broken install
# Clear cache
sudo apt clean
```
**RHEL Family:**
```bash
# Update metadata
sudo dnf makecache
# Check for problems
sudo dnf check
# Clean cache
sudo dnf clean all
```
**SUSE:**
```bash
# Refresh repositories
sudo zypper refresh
# Verify
sudo zypper verify
# Clean cache
sudo zypper clean
```
## Best Practices
### Distribution Selection
1. **Use LTS versions for production:**
- Ubuntu 22.04 LTS (support until 2027)
- Ubuntu 24.04 LTS (support until 2029)
- RHEL/Rocky/Alma 9 (support until 2032)
2. **Match distribution to workload:**
- Web servers: Ubuntu, Debian
- Enterprise applications: RHEL, Rocky Linux, AlmaLinux
- Container hosts: CentOS Stream, Rocky Linux
- Development: Ubuntu, Debian, openSUSE
3. **Consider support requirements:**
- Commercial support: RHEL, SLES
- Community support: CentOS Stream, Rocky Linux, AlmaLinux, Debian, Ubuntu, openSUSE
### Resource Allocation
**Minimum Requirements:**
- 1 vCPU, 1GB RAM, 10GB disk (testing only)
**Recommended for Production:**
- 2+ vCPUs, 2GB+ RAM, 20GB+ disk
**Workload-Specific:**
```
Web Server: 2-4 vCPUs, 4GB RAM, 40GB disk
Database Server: 4-8 vCPUs, 16GB RAM, 100GB+ disk
Application Server: 4-8 vCPUs, 8GB RAM, 80GB disk
Container Host: 4-8 vCPUs, 16GB RAM, 80GB disk
Development: 2-4 vCPUs, 8GB RAM, 50GB disk
```
### Security Hardening
1. **Change default passwords immediately:**
```bash
sudo passwd root # Change from ChangeMe123!
```
2. **Configure proper SSH keys:**
- Use dedicated key per environment
- Rotate keys regularly (90-180 days)
- Use Ed25519 keys when possible
3. **Enable additional security features:**
- CIS benchmarks scanning
- Intrusion detection (fail2ban, OSSEC)
- Log forwarding to SIEM
- Vulnerability scanning
4. **Regular updates:**
- Monitor automatic update logs
- Schedule manual updates for major versions
- Test updates in staging first
### Operational Excellence
1. **Naming Conventions:**
- Use descriptive, meaningful VM names
- Include purpose and environment: `web-prod-01`, `db-dev-01`
- Document naming scheme
2. **Inventory Management:**
- Keep Ansible inventory up-to-date
- Document VM purpose and owner
- Track VM lifecycle
3. **Monitoring:**
- Set up monitoring for all VMs
- Configure alerting for critical issues
- Monitor resource usage trends
4. **Backup Strategy:**
- Regular VM backups or disk snapshots
- Test restore procedures
- Document backup retention policy
5. **Documentation:**
- Document VM purpose and configuration
- Maintain runbooks for common tasks
- Keep network diagrams current
### Performance Optimization
1. **Disk I/O:**
- Use virtio drivers (already configured)
- Consider separate disk for databases
- Use appropriate filesystem (xfs for RHEL, ext4 for Debian)
2. **Network:**
- Use virtio network driver (already configured)
- Consider SR-IOV for high-performance needs
- Monitor network latency
3. **CPU:**
- Right-size vCPU allocation
- Avoid overcommitment on critical VMs
- Use CPU pinning for performance-critical workloads
4. **Memory:**
- Allocate sufficient RAM to avoid swapping
- Monitor memory usage
- Consider huge pages for databases
## References
- [CLAUDE.md](../CLAUDE.md) - Infrastructure guidelines
- [Cheatsheet](../cheatsheets/deploy-linux-vm.md) - Quick reference
- [Debian Cloud Images](https://cloud.debian.org/images/cloud/)
- [Ubuntu Cloud Images](https://cloud-images.ubuntu.com/)
- [CentOS Stream](https://www.centos.org/centos-stream/)
- [Rocky Linux](https://rockylinux.org/)
- [AlmaLinux](https://almalinux.org/)
- [openSUSE](https://www.opensuse.org/)
- [cloud-init Documentation](https://cloudinit.readthedocs.io/)
- [libvirt Documentation](https://libvirt.org/docs.html)
---
**Document Version**: 1.0
**Last Updated**: 2025-11-10
**Maintained By**: Ansible Infrastructure Team