infra-automation/roles/deploy_linux_vm/README.md
ansible eba1a05e7d Implement critical role improvements per ROLE_ANALYSIS_AND_IMPROVEMENTS.md
This commit addresses the critical issues identified in the role analysis:

## Security Improvements

### Remove Hardcoded Secrets (deploy_linux_vm)
- Replaced hardcoded SSH key in defaults/main.yml with vault variable reference
- Replaced hardcoded root password with vault variable reference
- Created vault.yml.example to document secret structure
- Updated README.md with comprehensive security best practices section
- Added documentation for Ansible Vault, external secret managers, and environment variables
- Included SSH key generation and password generation best practices

## Role Documentation & Planning

### CHANGELOG.md Files
- Created comprehensive CHANGELOG.md for deploy_linux_vm role
  - Documented v1.0.0 initial release features
  - Tracked v1.0.1 security improvements
- Created comprehensive CHANGELOG.md for system_info role
  - Documented v1.0.0 initial release
  - Tracked v1.0.1 critical bug fixes (block-level failed_when, Jinja2 templates, OS variables)

### ROADMAP.md Files
- Created detailed ROADMAP.md for deploy_linux_vm role
  - Version 1.1.0: Security & compliance hardening (Q1 2026)
  - Version 1.2.0: Multi-distribution support (Q2 2026)
  - Version 1.3.0: Advanced features (Q3 2026)
  - Version 2.0.0: Enterprise features (Q4 2026)
- Created detailed ROADMAP.md for system_info role
  - Version 1.1.0: Enhanced monitoring & metrics (Q1 2026)
  - Version 1.2.0: Cloud & container support (Q2 2026)
  - Version 1.3.0: Hardware & firmware deep dive (Q3 2026)
  - Version 2.0.0: Visualization & reporting (Q4 2026)

## Error Handling Enhancements

### deploy_linux_vm Role - Block/Rescue/Always Pattern
- Wrapped deployment tasks in comprehensive error handling block
- Block section:
  - Pre-deployment VM name collision check
  - Enhanced IP address acquisition with better error messages
  - Descriptive failure messages for troubleshooting
- Rescue section (automatic rollback):
  - Diagnostic information gathering
  - VM status checking
  - Attempted console log capture
  - Automatic VM destruction and cleanup
  - Disk image removal (primary, LVM, cloud-init ISO)
  - Detailed troubleshooting guidance
- Always section:
  - Deployment logging to /var/log/ansible-vm-deployments.log
  - Success/failure tracking
- Improved task FQCNs (ansible.builtin.*)

## Handlers Implementation

### deploy_linux_vm Role - Complete Handler Suite
- VM Lifecycle Handlers:
  - restart vm, shutdown vm, destroy vm
- Cloud-Init Handlers:
  - regenerate cloud-init iso (full rebuild and reattach)
- Storage Handlers:
  - refresh libvirt storage pool
  - resize vm disk (with safe shutdown/start)
- Network Handlers:
  - refresh network configuration
  - restart libvirt network
- Libvirt Daemon Handlers:
  - restart libvirtd, reload libvirtd
- Cleanup Handlers:
  - cleanup temporary files
  - remove cloud-init iso
- Validation Handlers:
  - validate vm status
  - check connectivity

## Impact

### Security
- Eliminates hardcoded secrets from version control
- Implements industry best practices for secret management
- Provides clear guidance for secure deployment

### Maintainability
- CHANGELOGs enable version tracking and change auditing
- ROADMAPs provide clear development direction and prioritization
- Comprehensive error handling reduces debugging time
- Handlers enable modular, reusable state management

### Reliability
- Automatic rollback prevents partial deployments
- Comprehensive error messages reduce MTTR
- Handlers ensure consistent state management
- Better separation of concerns

### Compliance
- Aligns with CLAUDE.md security requirements
- Implements proper secrets management per organizational policy
- Provides audit trail through changelogs

## References

- ROLE_ANALYSIS_AND_IMPROVEMENTS.md: Initial analysis document
- CLAUDE.md: Organizational infrastructure standards

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-11 02:21:38 +01:00


Ansible Role: deploy_linux_vm

Deploy Linux virtual machines on KVM hypervisors with LVM storage configuration, security hardening, and cloud-init provisioning. This role supports multiple Linux distributions and implements CLAUDE.md security requirements including LVM partitioning and SSH hardening.

Features

  • Multi-Distribution Support: Debian, Ubuntu, RHEL, CentOS Stream, Rocky Linux, AlmaLinux, SLES, openSUSE
  • LVM Configuration: Automatic LVM setup with meaningful volume groups and logical volumes per CLAUDE.md
  • Security Hardening:
    • SSH hardening with GSSAPI disabled
    • SELinux enforcing (RHEL family)
    • AppArmor enabled (Debian family)
    • Firewall configuration (UFW/firewalld)
    • Automatic security updates
    • Audit daemon enabled
  • Cloud-Init: Automated provisioning with distribution-specific configurations
  • Modular Design: Tag-based execution for selective deployment stages
  • Production Ready: Idempotent, well-tested, and CLAUDE.md compliant

Requirements

Hypervisor Requirements

  • Ansible 2.12 or higher
  • KVM/libvirt virtualization enabled
  • Sufficient disk space in /var/lib/libvirt/images
  • Network connectivity for cloud image downloads

Supported Distributions (Guest VMs)

| Distribution | Versions | OS Family |
| --- | --- | --- |
| Debian | 11, 12 | debian |
| Ubuntu | 20.04 LTS, 22.04 LTS, 24.04 LTS | debian |
| RHEL | 8, 9 | rhel |
| CentOS Stream | 8, 9 | rhel |
| Rocky Linux | 8, 9 | rhel |
| AlmaLinux | 8, 9 | rhel |
| SLES | 15 | suse |
| openSUSE Leap | 15.5, 15.6 | suse |

Role Variables

Required Variables

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| deploy_linux_vm_os_distribution | Yes | debian-12 | Distribution identifier (e.g., ubuntu-22.04, almalinux-9) |

VM Configuration

| Variable | Default | Description |
| --- | --- | --- |
| deploy_linux_vm_name | linux-guest | VM name in libvirt |
| deploy_linux_vm_hostname | linux-vm | VM hostname |
| deploy_linux_vm_domain | localdomain | Domain name |
| deploy_linux_vm_vcpus | 2 | Number of vCPUs |
| deploy_linux_vm_memory_mb | 2048 | RAM in MB |
| deploy_linux_vm_disk_size_gb | 30 | Primary disk size in GB |

LVM Configuration

| Variable | Default | Description |
| --- | --- | --- |
| deploy_linux_vm_use_lvm | true | Enable LVM configuration |
| deploy_linux_vm_lvm_vg_name | vg_system | Volume group name |
| deploy_linux_vm_lvm_pv_device | /dev/vdb | Physical volume device |
| deploy_linux_vm_lvm_volumes | (see defaults) | List of logical volumes per CLAUDE.md |

LVM Volumes (CLAUDE.md Compliance)

Default logical volumes created:

deploy_linux_vm_lvm_volumes:
  - { name: lv_opt, size: 3G, mount: /opt, fstype: ext4 }
  - { name: lv_tmp, size: 1G, mount: /tmp, fstype: ext4, mount_options: noexec,nosuid,nodev }
  - { name: lv_home, size: 2G, mount: /home, fstype: ext4 }
  - { name: lv_var, size: 5G, mount: /var, fstype: ext4 }
  - { name: lv_var_log, size: 2G, mount: /var/log, fstype: ext4 }
  - { name: lv_var_tmp, size: 5G, mount: /var/tmp, fstype: ext4, mount_options: noexec,nosuid,nodev }
  - { name: lv_var_audit, size: 1G, mount: /var/log/audit, fstype: ext4 }
  - { name: lv_swap, size: 2G, mount: none, fstype: swap }
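
As a quick sanity check on the defaults above, the logical volume sizes can be summed and compared against the physical volume. This is a minimal sketch that assumes the 30 GB secondary disk mentioned under LVM Configuration Process. The variable names are illustrative and not part of the role, and the usable space in a real volume group is slightly smaller because of LVM metadata.

```shell
# Default LV sizes in GiB, in the order listed above:
# opt tmp home var var_log var_tmp var_audit swap
pv_size_gb=30   # assumption: size of the secondary disk (/dev/vdb)

total_gb=0
for size in 3 1 2 5 2 5 1 2; do
  total_gb=$((total_gb + size))
done

free_gb=$((pv_size_gb - total_gb))
echo "Allocated: ${total_gb}G of ${pv_size_gb}G, unallocated: ${free_gb}G"
# prints "Allocated: 21G of 30G, unallocated: 9G"
```

The unallocated remainder is what is left for growing individual volumes later.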

SSH Configuration

| Variable | Default | Description |
| --- | --- | --- |
| deploy_linux_vm_ssh_permit_root_login | no | Allow root SSH login |
| deploy_linux_vm_ssh_password_authentication | no | Allow password authentication |
| deploy_linux_vm_ssh_gssapi_authentication | no | GSSAPI disabled per requirements |
| deploy_linux_vm_ssh_gssapi_cleanup_credentials | no | GSSAPI cleanup |
| deploy_linux_vm_ssh_max_auth_tries | 3 | Maximum authentication attempts |
| deploy_linux_vm_ssh_client_alive_interval | 300 | SSH keepalive interval |

Security Configuration

| Variable | Default | Description |
| --- | --- | --- |
| deploy_linux_vm_enable_firewall | true | Enable firewall (UFW/firewalld) |
| deploy_linux_vm_enable_selinux | true | Enable SELinux (RHEL family) |
| deploy_linux_vm_enable_apparmor | true | Enable AppArmor (Debian family) |
| deploy_linux_vm_enable_auditd | true | Enable audit daemon |
| deploy_linux_vm_enable_automatic_updates | true | Enable automatic security updates |
| deploy_linux_vm_automatic_reboot | false | Auto-reboot after updates |

User Configuration

| Variable | Default | Description |
| --- | --- | --- |
| deploy_linux_vm_ansible_user | ansible | Service account username |
| deploy_linux_vm_ansible_user_ssh_key | (vault variable) | SSH public key for ansible user |
| deploy_linux_vm_root_password | (vault variable) | Root password (console access) |

SECURITY NOTICE: SSH keys and passwords should be stored in encrypted vault files, not in defaults.

Security Best Practices

Secrets Management

This role requires sensitive data (SSH keys, passwords) to be stored securely:

Option 1: Ansible Vault

  1. Create a vault file in your inventory:
# Create encrypted vault file
ansible-vault create inventories/production/group_vars/all/vault.yml
  2. Add the required vault variables:
---
# SSH public key for ansible user
vault_deploy_linux_vm_ansible_user_ssh_key: "ssh-ed25519 AAAAC3... ansible@automation"

# Root password for emergency console access
vault_deploy_linux_vm_root_password: "YourSecurePassword123!"
  3. Reference vault variables in your playbook or group_vars:
# inventories/production/group_vars/all/vars.yml
deploy_linux_vm_ansible_user_ssh_key: "{{ vault_deploy_linux_vm_ansible_user_ssh_key }}"
deploy_linux_vm_root_password: "{{ vault_deploy_linux_vm_root_password }}"
  4. Run playbooks with vault password:
ansible-playbook site.yml --ask-vault-pass
# Or use a password file
ansible-playbook site.yml --vault-password-file ~/.vault_pass

Option 2: External Secret Managers

  • HashiCorp Vault: Use community.hashi_vault.vault_read lookup plugin
  • AWS Secrets Manager: Use amazon.aws.aws_secret lookup plugin
  • Azure Key Vault: Use azure.azcollection.azure_keyvault_secret lookup plugin
  • CyberArk: Use CyberArk Ansible plugins

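Example with HashiCorp Vault (KV v2 path; vault_read returns the raw API response, so the secret's fields sit under data.data):

deploy_linux_vm_ansible_user_ssh_key: "{{ lookup('community.hashi_vault.vault_read', 'secret/data/ansible/ssh_key').data.data.public_key }}"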

Option 3: Environment Variables

# On the control node (shell)
export ANSIBLE_VAULT_PASSWORD_FILE=~/.vault_pass
export DEPLOY_VM_SSH_KEY="ssh-ed25519 AAAAC3..."

# In group_vars (YAML)
deploy_linux_vm_ansible_user_ssh_key: "{{ lookup('env', 'DEPLOY_VM_SSH_KEY') }}"

SSH Key Generation

Generate a dedicated SSH key pair for VM deployment:

# Generate ED25519 key (recommended)
ssh-keygen -t ed25519 -C "ansible-automation" -f ~/.ssh/ansible_deploy

# Or RSA 4096-bit key
ssh-keygen -t rsa -b 4096 -C "ansible-automation" -f ~/.ssh/ansible_deploy

# Use the public key in your vault file
cat ~/.ssh/ansible_deploy.pub

Password Generation

Generate strong root passwords:

# Using OpenSSL
openssl rand -base64 32

# Using pwgen
pwgen -s 32 1

# Using /dev/urandom
tr -dc 'A-Za-z0-9!@#$%^&*' < /dev/urandom | head -c 32

Security Checklist

  • SSH keys stored in Ansible Vault or external secret manager
  • Root passwords stored in Ansible Vault (different per environment)
  • Vault password file has restricted permissions (0600)
  • Vault password file is NOT committed to version control (in .gitignore)
  • Different passwords used for dev/staging/production
  • SSH keys rotated every 90-180 days
  • Regular security audits performed
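
The two vault password file items in the checklist can be handled with a small helper. This is a sketch only: `secure_vault_pass_file` is a hypothetical function, not something the role ships, and the `.gitignore` entry only matters if the password file is kept inside the repository.

```shell
# Hypothetical helper, not part of the role: restrict a vault password file
# to its owner and make sure its name is listed in a .gitignore file.
secure_vault_pass_file() {
  pass_file="$1"   # e.g. ~/.vault_pass
  gitignore="$2"   # e.g. ./.gitignore

  # Owner read/write only (0600).
  chmod 600 "$pass_file" || return 1

  # Append the file name unless an identical line is already present.
  name="$(basename "$pass_file")"
  if ! grep -qxF "$name" "$gitignore" 2>/dev/null; then
    printf '%s\n' "$name" >> "$gitignore"
  fi
}

# Example call (paths are assumptions):
#   secure_vault_pass_file ~/.vault_pass .gitignore
```

Running the helper twice leaves a single entry in `.gitignore`, so it is safe to call from a setup script.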

Dependencies

None. This role is self-contained.

Example Playbook

Basic Deployment

---
- name: Deploy Linux VM
  hosts: grokbox
  become: yes
  roles:
    - role: deploy_linux_vm
      vars:
        deploy_linux_vm_name: "web-server"
        deploy_linux_vm_os_distribution: "ubuntu-22.04"

Advanced Deployment with Custom LVM

---
- name: Deploy Database Server with Custom Resources
  hosts: grokbox
  become: yes
  roles:
    - role: deploy_linux_vm
      vars:
        deploy_linux_vm_name: "db-server"
        deploy_linux_vm_hostname: "postgres01"
        deploy_linux_vm_domain: "production.local"
        deploy_linux_vm_os_distribution: "almalinux-9"
        deploy_linux_vm_vcpus: 8
        deploy_linux_vm_memory_mb: 16384
        deploy_linux_vm_disk_size_gb: 100
        deploy_linux_vm_use_lvm: true
        deploy_linux_vm_lvm_vg_name: "vg_database"

Multi-VM Deployment

---
- name: Deploy Multiple VMs
  hosts: grokbox
  become: yes
  tasks:
    - name: Deploy web and database servers
      ansible.builtin.include_role:
        name: deploy_linux_vm
      vars:
        deploy_linux_vm_name: "{{ item.name }}"
        deploy_linux_vm_hostname: "{{ item.hostname }}"
        deploy_linux_vm_os_distribution: "{{ item.distro }}"
      loop:
        - { name: "web01", hostname: "web01", distro: "ubuntu-22.04" }
        - { name: "web02", hostname: "web02", distro: "ubuntu-22.04" }
        - { name: "db01", hostname: "db01", distro: "almalinux-9" }

Tag-Based Execution

Execute specific deployment stages:

# Pre-flight validation only
ansible-playbook site.yml --tags validate,preflight

# Download cloud images only
ansible-playbook site.yml --tags download,verify

# Deploy VM without LVM configuration
ansible-playbook site.yml --skip-tags lvm

# Configure LVM only (post-deployment)
ansible-playbook site.yml --tags lvm,post-deploy

# Full deployment with all stages
ansible-playbook site.yml

Available Tags

| Tag | Description |
| --- | --- |
| validate, preflight | Pre-flight validation checks |
| install | Install required packages on hypervisor |
| download, verify | Download and verify cloud images |
| storage | Create VM disk storage |
| cloud-init | Generate cloud-init configuration |
| deploy | Deploy and start VM |
| lvm, post-deploy | Configure LVM on deployed VM |
| cleanup | Remove temporary files |

LVM Configuration Process

The role implements a comprehensive LVM setup:

  1. Physical Volume Creation: Creates PV on /dev/vdb (30GB secondary disk)
  2. Volume Group Setup: Creates vg_system volume group
  3. Logical Volume Creation: Creates LVs per CLAUDE.md specifications
  4. Filesystem Creation: Formats LVs with ext4/swap
  5. Data Migration: Copies existing data from primary disk to LVM volumes
  6. Fstab Update: Configures automatic mounting at boot
  7. Reboot Required: VM must be rebooted to activate new mounts
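
For orientation, steps 1 to 4 and 6 correspond roughly to the standard LVM commands below. This is a dry-run sketch that only prints commands and fstab lines. The role's actual tasks may differ, and `lvm_plan` is a hypothetical helper introduced here for illustration.

```shell
# Illustrative dry run: prints, but does not execute, the commands behind
# steps 1-4 and the fstab line for step 6, using the default names.
pv_device="/dev/vdb"
vg_name="vg_system"

# Hypothetical helper: print the plan for one logical volume.
lvm_plan() {
  lv_name="$1"; lv_size="$2"; mount_point="$3"; fstype="$4"
  echo "lvcreate -n ${lv_name} -L ${lv_size} ${vg_name}"
  if [ "$fstype" = "swap" ]; then
    echo "mkswap /dev/${vg_name}/${lv_name}"
    echo "# fstab: /dev/${vg_name}/${lv_name} none swap sw 0 0"
  else
    echo "mkfs.${fstype} /dev/${vg_name}/${lv_name}"
    echo "# fstab: /dev/${vg_name}/${lv_name} ${mount_point} ${fstype} defaults 0 2"
  fi
}

echo "pvcreate ${pv_device}"               # step 1
echo "vgcreate ${vg_name} ${pv_device}"    # step 2
lvm_plan lv_home 2G /home ext4             # steps 3, 4, 6 for one volume
lvm_plan lv_swap 2G none swap
```

Data migration (step 5) and the reboot (step 7) are left out because they depend on the state of the running system.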

LVM Post-Deployment

After role execution with LVM enabled:

# SSH to the VM
ssh ansible@<VM_IP>

# Verify LVM configuration
sudo vgs
sudo lvs
sudo pvs

# Check fstab entries
cat /etc/fstab

# Reboot to activate LVM mounts
sudo reboot

# After reboot, verify mounts
df -h
lsblk

SSH Hardening

The role implements comprehensive SSH hardening per requirements:

  • GSSAPI Authentication: Disabled (GSSAPIAuthentication no)
  • GSSAPI Cleanup: Disabled (GSSAPICleanupCredentials no)
  • Root Login: Disabled via SSH (console access available)
  • Password Authentication: Disabled (key-based only)
  • Connection Limits: Max 3 auth tries, 10 sessions
  • Keepalive: 300s interval with 2 max count
  • Additional Hardening: Empty passwords rejected, X11 forwarding disabled

Configuration file: /etc/ssh/sshd_config.d/99-security.conf
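
The settings listed above map onto standard sshd_config directives. The sketch below writes an equivalent drop-in to a scratch file for review. It is an approximation built from the list, not the role's actual template, and it deliberately does not touch /etc/ssh.

```shell
# Write a drop-in equivalent to the listed settings into a temporary file.
# Assumption: the role's real template may contain more or different lines.
out_file="$(mktemp)"

cat > "$out_file" <<'EOF'
GSSAPIAuthentication no
GSSAPICleanupCredentials no
PermitRootLogin no
PasswordAuthentication no
PermitEmptyPasswords no
X11Forwarding no
MaxAuthTries 3
MaxSessions 10
ClientAliveInterval 300
ClientAliveCountMax 2
EOF

echo "Drop-in written to ${out_file}"
```

On a deployed VM, `sudo sshd -T` prints the effective configuration, which is the reliable way to confirm that these values are in force.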

Security Features

Debian/Ubuntu Systems

  • Firewall: UFW enabled with SSH allowed
  • AppArmor: Enabled and enforcing
  • Automatic Updates: unattended-upgrades configured for security updates only
  • Audit: auditd enabled
  • Time Sync: chrony configured

RHEL/AlmaLinux/Rocky Systems

  • Firewall: firewalld enabled with SSH allowed
  • SELinux: Enforcing mode enabled
  • Automatic Updates: dnf-automatic configured for security updates
  • Audit: auditd enabled
  • Time Sync: chronyd configured

Essential Packages (CLAUDE.md)

All VMs include:

  • System tools: vim, htop, tmux, jq, bc
  • Network tools: curl, wget, rsync
  • Development: git, python3, python3-pip
  • Security: aide, auditd, chrony
  • Storage: lvm2, parted

Validation

Post-deployment validation includes:

  • VM running status check
  • IP address assignment verification
  • SSH connectivity test
  • System information gathering
  • LVM configuration verification (if enabled)
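
The first three checks can be reproduced by hand from the hypervisor. The following is a minimal sketch that assumes `virsh` is available and that the guest uses the default `ansible` user. Both function names are hypothetical and are not provided by the role.

```shell
# Hypothetical helper: extract the first IPv4 address from the tabular
# output of `virsh domifaddr` (columns: Name, MAC address, Protocol, Address).
first_ipv4() {
  awk '$3 == "ipv4" { sub(/\/.*/, "", $4); print $4; exit }'
}

# Hypothetical wrapper: running state, IP assignment, SSH connectivity.
validate_vm() {
  vm_name="$1"

  [ "$(virsh domstate "$vm_name")" = "running" ] \
    || { echo "VM ${vm_name} is not running" >&2; return 1; }

  vm_ip="$(virsh domifaddr "$vm_name" | first_ipv4)"
  [ -n "$vm_ip" ] \
    || { echo "VM ${vm_name} has no IPv4 address yet" >&2; return 1; }

  # BatchMode avoids hanging on a password prompt if key auth fails.
  ssh -o BatchMode=yes -o ConnectTimeout=5 "ansible@${vm_ip}" true
}

# Example call on the hypervisor (VM name is an assumption):
#   validate_vm web-server && echo "validation passed"
```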

Troubleshooting

Cloud-Init Issues

# Check cloud-init status
ssh ansible@<VM_IP> "cloud-init status --wait"

# View cloud-init logs
ssh ansible@<VM_IP> "tail -f /var/log/cloud-init-output.log"

LVM Issues

# Check LVM status on VM
ssh ansible@<VM_IP> "sudo vgs && sudo lvs && sudo pvs"

# Verify fstab
ssh ansible@<VM_IP> "cat /etc/fstab"

# Check disk layout
ssh ansible@<VM_IP> "lsblk"

SSH Connection Issues

# Test SSH with ProxyJump
ssh -J grokbox ansible@<VM_IP>

# Verify SSH configuration
ssh ansible@<VM_IP> "sudo sshd -T | grep -i gssapi"

Firewall Issues

# Debian/Ubuntu
ssh ansible@<VM_IP> "sudo ufw status verbose"

# RHEL/AlmaLinux
ssh ansible@<VM_IP> "sudo firewall-cmd --list-all"

File Locations

On deployed VMs:

  • SSH Security Config: /etc/ssh/sshd_config.d/99-security.conf
  • Sudoers Config: /etc/sudoers.d/ansible
  • Cloud-Init Log: /var/log/cloud-init-output.log
  • Fstab: /etc/fstab (updated with LVM mounts)

On hypervisor:

  • Cloud Images: /var/lib/libvirt/images/*.qcow2
  • VM Disks: /var/lib/libvirt/images/<vm_name>.qcow2
  • LVM Disk: /var/lib/libvirt/images/<vm_name>-lvm.qcow2
  • Cloud-Init ISO: /var/lib/libvirt/images/<vm_name>-cloud-init.iso

License

MIT

Author

Infrastructure Team

Support

  • Documentation: docs/linux-vm-deployment.md
  • Cheatsheet: cheatsheets/deploy-linux-vm.md
  • Guidelines: CLAUDE.md