Files
infra-automation/roles/deploy_linux_vm/tasks/deploy.yml
ansible eba1a05e7d Implement critical role improvements per ROLE_ANALYSIS_AND_IMPROVEMENTS.md
This commit addresses the critical issues identified in the role analysis:

## Security Improvements

### Remove Hardcoded Secrets (deploy_linux_vm)
- Replaced hardcoded SSH key in defaults/main.yml with vault variable reference
- Replaced hardcoded root password with vault variable reference
- Created vault.yml.example to document secret structure
- Updated README.md with comprehensive security best practices section
- Added documentation for Ansible Vault, external secret managers, and environment variables
- Included SSH key generation and password generation best practices

## Role Documentation & Planning

### CHANGELOG.md Files
- Created comprehensive CHANGELOG.md for deploy_linux_vm role
  - Documented v1.0.0 initial release features
  - Tracked v1.0.1 security improvements
- Created comprehensive CHANGELOG.md for system_info role
  - Documented v1.0.0 initial release
  - Tracked v1.0.1 critical bug fixes (block-level failed_when, Jinja2 templates, OS variables)

### ROADMAP.md Files
- Created detailed ROADMAP.md for deploy_linux_vm role
  - Version 1.1.0: Security & compliance hardening (Q1 2026)
  - Version 1.2.0: Multi-distribution support (Q2 2026)
  - Version 1.3.0: Advanced features (Q3 2026)
  - Version 2.0.0: Enterprise features (Q4 2026)
- Created detailed ROADMAP.md for system_info role
  - Version 1.1.0: Enhanced monitoring & metrics (Q1 2026)
  - Version 1.2.0: Cloud & container support (Q2 2026)
  - Version 1.3.0: Hardware & firmware deep dive (Q3 2026)
  - Version 2.0.0: Visualization & reporting (Q4 2026)

## Error Handling Enhancements

### deploy_linux_vm Role - Block/Rescue/Always Pattern
- Wrapped deployment tasks in comprehensive error handling block
- Block section:
  - Pre-deployment VM name collision check
  - Enhanced IP address acquisition with better error messages
  - Descriptive failure messages for troubleshooting
- Rescue section (automatic rollback):
  - Diagnostic information gathering
  - VM status checking
  - Attempted console log capture
  - Automatic VM destruction and cleanup
  - Disk image removal (primary, LVM, cloud-init ISO)
  - Detailed troubleshooting guidance
- Always section:
  - Deployment logging to /var/log/ansible-vm-deployments.log
  - Success/failure tracking
- Improved task FQCNs (ansible.builtin.*)

## Handlers Implementation

### deploy_linux_vm Role - Complete Handler Suite
- VM Lifecycle Handlers:
  - restart vm, shutdown vm, destroy vm
- Cloud-Init Handlers:
  - regenerate cloud-init iso (full rebuild and reattach)
- Storage Handlers:
  - refresh libvirt storage pool
  - resize vm disk (with safe shutdown/start)
- Network Handlers:
  - refresh network configuration
  - restart libvirt network
- Libvirt Daemon Handlers:
  - restart libvirtd, reload libvirtd
- Cleanup Handlers:
  - cleanup temporary files
  - remove cloud-init iso
- Validation Handlers:
  - validate vm status
  - check connectivity

## Impact

### Security
- Eliminates hardcoded secrets from version control
- Implements industry best practices for secret management
- Provides clear guidance for secure deployment

### Maintainability
- CHANGELOGs enable version tracking and change auditing
- ROADMAPs provide clear development direction and prioritization
- Comprehensive error handling reduces debugging time
- Handlers enable modular, reusable state management

### Reliability
- Automatic rollback prevents partial deployments
- Comprehensive error messages reduce MTTR
- Handlers ensure consistent state management
- Better separation of concerns

### Compliance
- Aligns with CLAUDE.md security requirements
- Implements proper secrets management per organizational policy
- Provides audit trail through changelogs

## References

- ROLE_ANALYSIS_AND_IMPROVEMENTS.md: Initial analysis document
- CLAUDE.md: Organizational infrastructure standards

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-11 02:21:38 +01:00

212 lines
7.5 KiB
YAML

---
# =============================================================================
# Deployment Tasks - Create and Start VM
# =============================================================================
- name: Build virt-install disk parameters
set_fact:
deploy_linux_vm_disk_params: >-
--disk path={{ deploy_linux_vm_disk_path }},format=qcow2,bus=virtio
{% if deploy_linux_vm_use_lvm | bool %}
--disk path={{ deploy_linux_vm_images_dir }}/{{ deploy_linux_vm_name }}-lvm.qcow2,format=qcow2,bus=virtio
{% endif %}
--disk path={{ deploy_linux_vm_cloud_init_iso_path }},device=cdrom
tags: [deploy]
- name: Deploy VM with error handling
block:
- name: Check if VM already exists
community.libvirt.virt:
command: list_vms
register: existing_vms
changed_when: false
- name: Fail if VM already exists
ansible.builtin.fail:
msg: "VM '{{ deploy_linux_vm_name }}' already exists. Use a different name or remove the existing VM."
when: deploy_linux_vm_name in existing_vms.list_vms
- name: Create VM using virt-install
ansible.builtin.command: >
virt-install
--name {{ deploy_linux_vm_name }}
--memory {{ deploy_linux_vm_memory_mb }}
--vcpus {{ deploy_linux_vm_vcpus }}
{{ deploy_linux_vm_disk_params }}
--network network={{ deploy_linux_vm_network }},model=virtio
--os-variant {{ deploy_linux_vm_distro_config.os_variant }}
--graphics none
--console pty,target_type=serial
--import
--noautoconsole
register: deploy_linux_vm_create
changed_when: deploy_linux_vm_create.rc == 0
- name: Display VM creation result
ansible.builtin.debug:
msg:
- "=== VM Created ==="
- "VM Name: {{ deploy_linux_vm_name }}"
- "Distribution: {{ deploy_linux_vm_os_distribution }}"
- "Waiting for boot and cloud-init..."
- name: Wait for VM to boot and cloud-init to complete
ansible.builtin.pause:
seconds: "{{ deploy_linux_vm_wait_for_boot_seconds }}"
prompt: "Waiting for VM to boot and cloud-init to complete configuration..."
- name: Get VM IP address
ansible.builtin.shell: |
virsh domifaddr {{ deploy_linux_vm_name }} | grep -oP '(\d{1,3}\.){3}\d{1,3}' | head -1
register: deploy_linux_vm_ip_result
retries: 15
delay: 10
until: deploy_linux_vm_ip_result.stdout != ""
changed_when: false
failed_when: false
- name: Check if IP address was obtained
ansible.builtin.fail:
msg: |
Failed to obtain IP address for VM {{ deploy_linux_vm_name }}.
Possible causes:
- VM failed to boot
- DHCP not configured properly
- Network interface not up
- Cloud-init configuration error
Check VM console: virsh console {{ deploy_linux_vm_name }}
when: deploy_linux_vm_ip_result.stdout == ""
- name: Set VM IP fact
ansible.builtin.set_fact:
deploy_linux_vm_ip: "{{ deploy_linux_vm_ip_result.stdout }}"
- name: Display VM information
ansible.builtin.debug:
msg:
- "=== VM Deployment Successful ==="
- "VM Name: {{ deploy_linux_vm_name }}"
- "Distribution: {{ deploy_linux_vm_os_distribution }}"
- "IP Address: {{ deploy_linux_vm_ip }}"
- "vCPUs: {{ deploy_linux_vm_vcpus }}"
- "Memory: {{ deploy_linux_vm_memory_mb }} MB"
- "Disk: {{ deploy_linux_vm_disk_size_gb }} GB"
- "OS Variant: {{ deploy_linux_vm_distro_config.os_variant }}"
- "Package Manager: {{ deploy_linux_vm_distro_config.package_manager }}"
- "LVM Enabled: {{ deploy_linux_vm_use_lvm }}"
- "Access: ssh {{ deploy_linux_vm_ansible_user }}@{{ deploy_linux_vm_ip }}"
- name: Test SSH connectivity to new VM
ansible.builtin.wait_for:
host: "{{ deploy_linux_vm_ip }}"
port: 22
timeout: "{{ deploy_linux_vm_ssh_wait_timeout }}"
state: started
rescue:
- name: VM deployment failed - gathering diagnostic information
ansible.builtin.debug:
msg:
- "=== VM Deployment Failed ==="
- "VM Name: {{ deploy_linux_vm_name }}"
- "Distribution: {{ deploy_linux_vm_os_distribution }}"
- "Error occurred during deployment"
- "Checking VM status..."
- name: Check if VM was partially created
ansible.builtin.command: virsh list --all
register: vm_list_all
changed_when: false
failed_when: false
- name: Display all VMs for debugging
ansible.builtin.debug:
var: vm_list_all.stdout_lines
- name: Check VM state if it exists
community.libvirt.virt:
name: "{{ deploy_linux_vm_name }}"
command: status
register: vm_status
failed_when: false
changed_when: false
- name: Display VM status
ansible.builtin.debug:
msg: "VM {{ deploy_linux_vm_name }} status: {{ vm_status.status | default('not found') }}"
when: vm_status is defined
- name: Attempt to get VM console log
ansible.builtin.command: virsh console {{ deploy_linux_vm_name }} --force
register: console_log
failed_when: false
changed_when: false
async: 5
poll: 0
- name: Rollback - Destroy partially created VM
community.libvirt.virt:
name: "{{ deploy_linux_vm_name }}"
state: destroyed
when:
- vm_status is defined
- vm_status.status is defined
failed_when: false
- name: Rollback - Undefine VM
community.libvirt.virt:
name: "{{ deploy_linux_vm_name }}"
command: undefine
when:
- vm_status is defined
- vm_status.status is defined
failed_when: false
- name: Rollback - Remove disk images
ansible.builtin.file:
path: "{{ item }}"
state: absent
loop:
- "{{ deploy_linux_vm_disk_path }}"
- "{{ deploy_linux_vm_cloud_init_iso_path }}"
- "{{ deploy_linux_vm_images_dir }}/{{ deploy_linux_vm_name }}-lvm.qcow2"
failed_when: false
- name: Display rollback completion message
ansible.builtin.debug:
msg:
- "=== Rollback Completed ==="
- "VM artifacts have been cleaned up"
- "Review error messages above for root cause"
- "Common issues:"
- " - Insufficient resources (disk space, memory)"
- " - Network configuration errors"
- " - Cloud-init syntax errors"
- " - OS variant not recognized"
- name: Fail with detailed error message
ansible.builtin.fail:
msg: |
VM deployment failed and rollback completed.
VM Name: {{ deploy_linux_vm_name }}
Please review the error messages above and verify:
1. Hypervisor has sufficient resources
2. Network '{{ deploy_linux_vm_network }}' exists
3. Cloud-init configuration is valid
4. OS variant '{{ deploy_linux_vm_distro_config.os_variant }}' is supported
Run 'virsh capabilities' to see supported OS variants.
always:
- name: Log deployment attempt
ansible.builtin.lineinfile:
path: /var/log/ansible-vm-deployments.log
line: "{{ ansible_date_time.iso8601 }} | {{ deploy_linux_vm_name }} | {{ deploy_linux_vm_os_distribution }} | {{ 'SUCCESS' if deploy_linux_vm_ip is defined else 'FAILED' }}"
create: yes
mode: '0644'
delegate_to: localhost
become: false
failed_when: false
tags: [deploy]