Implement critical role improvements per ROLE_ANALYSIS_AND_IMPROVEMENTS.md

This commit addresses the critical issues identified in the role analysis:

## Security Improvements

### Remove Hardcoded Secrets (deploy_linux_vm)
- Replaced hardcoded SSH key in defaults/main.yml with vault variable reference
- Replaced hardcoded root password with vault variable reference
- Created vault.yml.example to document secret structure
- Updated README.md with a security best practices section
- Documented three ways to supply secrets: Ansible Vault, external secret managers, and environment variables
- Included best practices for generating SSH keys and passwords
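
For illustration only, a vault variable reference of the kind described above might look like the sketch below. The variable names are hypothetical and are not taken from the role; only the file names `defaults/main.yml` and `vault.yml.example` come from this commit.

```yaml
# defaults/main.yml (sketch; variable names are assumptions)
deploy_linux_vm_ssh_public_key: "{{ vault_deploy_linux_vm_ssh_public_key }}"
deploy_linux_vm_root_password: "{{ vault_deploy_linux_vm_root_password }}"
---
# vault.yml.example (sketch): copy to vault.yml, fill in real values,
# then encrypt the copy with `ansible-vault encrypt vault.yml`
vault_deploy_linux_vm_ssh_public_key: "REPLACE_WITH_PUBLIC_KEY"
vault_deploy_linux_vm_root_password: "REPLACE_WITH_PASSWORD"
```

Keeping a `vault_` prefix on the encrypted names makes it easy to see which defaults resolve to secrets.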

## Role Documentation & Planning

### CHANGELOG.md Files
- Created comprehensive CHANGELOG.md for deploy_linux_vm role
  - Documented v1.0.0 initial release features
  - Tracked v1.0.1 security improvements
- Created comprehensive CHANGELOG.md for system_info role
  - Documented v1.0.0 initial release
  - Tracked v1.0.1 critical bug fixes (block-level failed_when, Jinja2 templates, OS variables)

### ROADMAP.md Files
- Created detailed ROADMAP.md for deploy_linux_vm role
  - Version 1.1.0: Security & compliance hardening (Q1 2026)
  - Version 1.2.0: Multi-distribution support (Q2 2026)
  - Version 1.3.0: Advanced features (Q3 2026)
  - Version 2.0.0: Enterprise features (Q4 2026)
- Created detailed ROADMAP.md for system_info role
  - Version 1.1.0: Enhanced monitoring & metrics (Q1 2026)
  - Version 1.2.0: Cloud & container support (Q2 2026)
  - Version 1.3.0: Hardware & firmware deep dive (Q3 2026)
  - Version 2.0.0: Visualization & reporting (Q4 2026)

## Error Handling Enhancements

### deploy_linux_vm Role - Block/Rescue/Always Pattern
- Wrapped deployment tasks in comprehensive error handling block
- Block section:
  - Pre-deployment VM name collision check
  - Enhanced IP address acquisition with better error messages
  - Descriptive failure messages for troubleshooting
- Rescue section (automatic rollback):
  - Diagnostic information gathering
  - VM status checking
  - Attempted console log capture
  - Automatic VM destruction and cleanup
  - Disk image removal (primary, LVM, cloud-init ISO)
  - Detailed troubleshooting guidance
- Always section:
  - Deployment logging to /var/log/ansible-vm-deployments.log
  - Success/failure tracking
- Switched tasks to fully qualified collection names (ansible.builtin.*)
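
The block/rescue/always pattern described above can be sketched roughly as follows. This is a minimal illustration, not the role's actual tasks: the variables `vm_name` and `vm_disk_path`, the `deployment_status` fact, and the specific `virsh` commands are assumptions. Only the log path comes from this commit.

```yaml
# Sketch only: variable names and virsh commands are assumptions.
- name: Deploy VM with automatic rollback
  block:
    - name: Check for an existing VM with the same name
      ansible.builtin.command: "virsh dominfo {{ vm_name }}"
      register: vm_exists
      changed_when: false
      failed_when: false

    - name: Abort on VM name collision
      ansible.builtin.fail:
        msg: "VM '{{ vm_name }}' already exists; pick another name or remove it first."
      when: vm_exists.rc == 0

    # ... deployment tasks would go here ...

    - name: Record success
      ansible.builtin.set_fact:
        deployment_status: success

  rescue:
    # Skip cleanup when the failure was a name collision, so a
    # pre-existing VM is never destroyed by the rollback.
    - name: Destroy the partially deployed VM
      ansible.builtin.command: "virsh destroy {{ vm_name }}"
      failed_when: false
      when: vm_exists.rc != 0

    - name: Remove the primary disk image
      ansible.builtin.file:
        path: "{{ vm_disk_path }}"
        state: absent
      when: vm_exists.rc != 0

    - name: Fail with troubleshooting guidance
      ansible.builtin.fail:
        msg: "Deployment of '{{ vm_name }}' failed and was rolled back; check the libvirt logs."

  always:
    # ansible_date_time requires gathered facts.
    - name: Log the deployment result
      ansible.builtin.lineinfile:
        path: /var/log/ansible-vm-deployments.log
        line: "{{ ansible_date_time.iso8601 }} vm={{ vm_name }} status={{ deployment_status | default('failed') }}"
        create: true
        mode: "0640"
```

The final `fail` task in `rescue` keeps the play marked as failed after cleanup; without it, a successful rescue would report the host as OK.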

## Handlers Implementation

### deploy_linux_vm Role - Complete Handler Suite
- VM Lifecycle Handlers:
  - restart vm, shutdown vm, destroy vm
- Cloud-Init Handlers:
  - regenerate cloud-init iso (full rebuild and reattach)
- Storage Handlers:
  - refresh libvirt storage pool
  - resize vm disk (with safe shutdown/start)
- Network Handlers:
  - refresh network configuration
  - restart libvirt network
- Libvirt Daemon Handlers:
  - restart libvirtd, reload libvirtd
- Cleanup Handlers:
  - cleanup temporary files
  - remove cloud-init iso
- Validation Handlers:
  - validate vm status
  - check connectivity

## Impact

### Security
- Eliminates hardcoded secrets from version control
- Implements industry best practices for secret management
- Provides clear guidance for secure deployment

### Maintainability
- CHANGELOGs enable version tracking and change auditing
- ROADMAPs provide clear development direction and prioritization
- Comprehensive error handling reduces debugging time
- Handlers enable modular, reusable state management

### Reliability
- Automatic rollback cleans up partially deployed VMs after a failure
- Comprehensive error messages reduce MTTR
- Handlers ensure consistent state management
- Better separation of concerns

### Compliance
- Aligns with CLAUDE.md security requirements
- Implements proper secrets management per organizational policy
- Provides audit trail through changelogs

## References

- ROLE_ANALYSIS_AND_IMPROVEMENTS.md: Initial analysis document
- CLAUDE.md: Organizational infrastructure standards

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-11 02:21:38 +01:00
parent cfad67a3a1
commit eba1a05e7d
9 changed files with 1138 additions and 67 deletions

@@ -0,0 +1,120 @@
# Changelog
All notable changes to the `system_info` role will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
### Added
- Initial CHANGELOG.md creation
- ROADMAP.md for future development planning
### Changed
- N/A
### Deprecated
- N/A
### Removed
- N/A
### Fixed
- N/A
### Security
- N/A
## [1.0.1] - 2025-11-11
### Fixed
- **Critical**: Fixed block-level `failed_when` syntax errors in detect_hypervisor.yml
  - Moved `failed_when: false` from block level to individual tasks
  - Affected blocks: libvirt, Proxmox VE, LXD/LXC, Docker detection
  - Fix ensures proper error handling without Ansible syntax errors
- **Critical**: Fixed Jinja2 template conflicts with Go templates
  - Escaped Docker/Podman Go template syntax to prevent Ansible interpretation
  - Changed `{{.Field}}` to `{{ "{{" }}.Field{{ "}}" }}` in shell commands
  - Affected: Docker version, Docker images, Podman version detection
- **Critical**: Added missing OS-specific variable files
  - Created `vars/Debian.yml` for Debian/Ubuntu family
  - Created `vars/RedHat.yml` for RHEL/CentOS/Rocky/AlmaLinux family
  - Created `vars/Suse.yml` for SUSE/openSUSE family
  - Files define OS-specific package names and paths
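
A minimal sketch of what these three fixes might look like in a task file. The task names, the `docker_version` register name, and the use of `include_vars` to load the per-family files are assumptions for illustration, not the role's actual code.

```yaml
# Sketch only: names and the variable-loading mechanism are assumptions.

# Load vars/Debian.yml, vars/RedHat.yml or vars/Suse.yml based on the
# gathered ansible_os_family fact.
- name: Load OS-specific variables
  ansible.builtin.include_vars: "{{ ansible_os_family }}.yml"

# `failed_when` is not a valid keyword on a block, so it is set on each
# task inside the block instead.
- name: Detect Docker
  block:
    - name: Get Docker server version
      # The escaped braces render to '{{.Server.Version}}', which Docker
      # then evaluates as a Go template.
      ansible.builtin.command:
        cmd: docker version --format '{{ "{{" }}.Server.Version{{ "}}" }}'
      register: docker_version
      changed_when: false
      failed_when: false
```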
### Security
- All shell commands use `changed_when: false` to prevent false change reporting
- No sensitive data exposed in task output
## [1.0.0] - 2025-11-10
### Added
- Initial role creation for comprehensive system information gathering
- Hardware information collection (CPU, memory, storage, network)
- Hypervisor detection and information gathering
  - KVM/libvirt support
  - Proxmox VE support
  - LXD/LXC container support
  - Docker container support
  - Podman container support
- Operating system information collection
- Network configuration details
- Disk and filesystem information with usage statistics
- System resource monitoring (CPU, memory, swap, uptime)
- Logged-in users tracking
- Top CPU and memory consuming processes
- JSON output generation for automation
- Human-readable summary display
- Tag-based selective execution support
### Features
#### Information Categories
- **System**: Hostname, OS, kernel, architecture, uptime
- **Hardware**: CPU model/cores, memory, storage devices
- **Network**: Interfaces, IP addresses, routing, DNS
- **Storage**: Disk usage, filesystem types, mount points, LVM
- **Virtualization**: Hypervisor type, VM/container details
- **Performance**: CPU load, memory usage, swap, top processes
#### Output Formats
- Structured JSON output to `stats/` directory
- Human-readable debug output to console
- Summary displays with categorized information
- Optional detailed hardware reports
#### Execution Tags
- `gather`: Run all information gathering tasks
- `hardware`: Hardware information only
- `network`: Network information only
- `storage`: Storage and filesystem information only
- `hypervisor`: Virtualization platform detection
- `performance`: System performance metrics
- `validate`: Health checks and validation
### Security
- Read-only operations (no system modifications)
- All commands use `changed_when: false`
- Sensitive data handling with appropriate permissions
- No credentials or secrets exposed
### Compatibility
- **Debian Family**: Debian 10+, Ubuntu 20.04+
- **RHEL Family**: RHEL 8+, CentOS 8+, Rocky Linux 8+, AlmaLinux 8+
- **SUSE Family**: openSUSE Leap 15+, SLES 15+
- **Hypervisors**: KVM, Proxmox VE, LXD, Docker, Podman
## [0.9.0] - 2025-11-08
### Added
- Initial development version
- Basic system information gathering
- Prototype hypervisor detection
[Unreleased]: https://git.mymx.me/ansible/infra-automation/compare/v1.0.1...HEAD
[1.0.1]: https://git.mymx.me/ansible/infra-automation/compare/v1.0.0...v1.0.1
[1.0.0]: https://git.mymx.me/ansible/infra-automation/compare/v0.9.0...v1.0.0
[0.9.0]: https://git.mymx.me/ansible/infra-automation/releases/tag/v0.9.0