ansible eba1a05e7d Implement critical role improvements per ROLE_ANALYSIS_AND_IMPROVEMENTS.md
This commit addresses the critical issues identified in the role analysis:

## Security Improvements

### Remove Hardcoded Secrets (deploy_linux_vm)
- Replaced hardcoded SSH key in defaults/main.yml with vault variable reference
- Replaced hardcoded root password with vault variable reference
- Created vault.yml.example to document secret structure
- Updated README.md with comprehensive security best practices section
- Added documentation for Ansible Vault, external secret managers, and environment variables
- Included SSH key generation and password generation best practices

## Role Documentation & Planning

### CHANGELOG.md Files
- Created comprehensive CHANGELOG.md for deploy_linux_vm role
  - Documented v1.0.0 initial release features
  - Tracked v1.0.1 security improvements
- Created comprehensive CHANGELOG.md for system_info role
  - Documented v1.0.0 initial release
  - Tracked v1.0.1 critical bug fixes (block-level failed_when, Jinja2 templates, OS variables)

### ROADMAP.md Files
- Created detailed ROADMAP.md for deploy_linux_vm role
  - Version 1.1.0: Security & compliance hardening (Q1 2026)
  - Version 1.2.0: Multi-distribution support (Q2 2026)
  - Version 1.3.0: Advanced features (Q3 2026)
  - Version 2.0.0: Enterprise features (Q4 2026)
- Created detailed ROADMAP.md for system_info role
  - Version 1.1.0: Enhanced monitoring & metrics (Q1 2026)
  - Version 1.2.0: Cloud & container support (Q2 2026)
  - Version 1.3.0: Hardware & firmware deep dive (Q3 2026)
  - Version 2.0.0: Visualization & reporting (Q4 2026)

## Error Handling Enhancements

### deploy_linux_vm Role - Block/Rescue/Always Pattern
- Wrapped deployment tasks in comprehensive error handling block
- Block section:
  - Pre-deployment VM name collision check
  - Enhanced IP address acquisition with better error messages
  - Descriptive failure messages for troubleshooting
- Rescue section (automatic rollback):
  - Diagnostic information gathering
  - VM status checking
  - Attempted console log capture
  - Automatic VM destruction and cleanup
  - Disk image removal (primary, LVM, cloud-init ISO)
  - Detailed troubleshooting guidance
- Always section:
  - Deployment logging to /var/log/ansible-vm-deployments.log
  - Success/failure tracking
- Improved task FQCNs (ansible.builtin.*)

## Handlers Implementation

### deploy_linux_vm Role - Complete Handler Suite
- VM Lifecycle Handlers:
  - restart vm, shutdown vm, destroy vm
- Cloud-Init Handlers:
  - regenerate cloud-init iso (full rebuild and reattach)
- Storage Handlers:
  - refresh libvirt storage pool
  - resize vm disk (with safe shutdown/start)
- Network Handlers:
  - refresh network configuration
  - restart libvirt network
- Libvirt Daemon Handlers:
  - restart libvirtd, reload libvirtd
- Cleanup Handlers:
  - cleanup temporary files
  - remove cloud-init iso
- Validation Handlers:
  - validate vm status
  - check connectivity

## Impact

### Security
- Eliminates hardcoded secrets from version control
- Implements industry best practices for secret management
- Provides clear guidance for secure deployment

### Maintainability
- CHANGELOGs enable version tracking and change auditing
- ROADMAPs provide clear development direction and prioritization
- Comprehensive error handling reduces debugging time
- Handlers enable modular, reusable state management

### Reliability
- Automatic rollback prevents partial deployments
- Comprehensive error messages reduce MTTR
- Handlers ensure consistent state management
- Better separation of concerns

### Compliance
- Aligns with CLAUDE.md security requirements
- Implements proper secrets management per organizational policy
- Provides audit trail through changelogs

## References

- ROLE_ANALYSIS_AND_IMPROVEMENTS.md: Initial analysis document
- CLAUDE.md: Organizational infrastructure standards

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-11 02:21:38 +01:00

System Information Gathering Role

Comprehensive Ansible role for gathering detailed system information including CPU, GPU, RAM, disk, network, and hypervisor details. Statistics are exported to JSON files organized by machine FQDN.

Description

This role performs a thorough scan of system hardware and software configurations, collecting detailed metrics and storing them in structured JSON format. It's designed to create a complete inventory of infrastructure resources for documentation, monitoring, and capacity planning purposes.

Requirements

Ansible Version

  • Ansible >= 2.9

OS Compatibility

  • Debian 11 (Bullseye), 12 (Bookworm)
  • Ubuntu 20.04 (Focal), 22.04 (Jammy), 24.04 (Noble)
  • RHEL 8, 9
  • Rocky Linux 8, 9
  • AlmaLinux 8, 9

Dependencies

  • Root/sudo privileges for hardware information gathering
  • Internet access for package installation (if required packages are missing)

Required Packages

The role will automatically install these packages if they're not present:

  • lshw - Hardware lister
  • dmidecode - DMI/SMBIOS information
  • pciutils - PCI utilities (lspci)
  • usbutils - USB utilities
  • smartmontools - SMART disk monitoring
  • ethtool - Network interface information

Role Variables

Main Configuration

Variable                      Default                Description                                 Required
system_info_stats_base_dir    ./stats/machines       Base directory for statistics storage       Yes
system_info_create_stats_dir  true                   Create stats directory if it doesn't exist  No
system_info_timestamp_format  %Y-%m-%d %H:%M:%S UTC  Timestamp format for statistics             No
system_info_json_indent       2                      JSON output indentation                     No
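
To illustrate what the default system_info_timestamp_format produces, the same strftime pattern can be passed to date. This is only a sketch: the function name format_timestamp is hypothetical, and it assumes the role formats UTC time, which the literal "UTC" suffix suggests but the documentation does not state.

```shell
# Sketch: render the default timestamp pattern with date(1).
# Assumption: the role applies the pattern to UTC time; "UTC" is a literal suffix.
format_timestamp() {
  date -u +"%Y-%m-%d %H:%M:%S UTC"
}

format_timestamp
```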

Feature Toggles

Variable                        Default  Description
system_info_gather_cpu          true     Gather CPU information
system_info_gather_gpu          true     Gather GPU information
system_info_gather_memory       true     Gather memory information
system_info_gather_disk         true     Gather disk information
system_info_gather_network      true     Gather network information
system_info_gather_system       true     Gather OS and system information
system_info_detect_hypervisor   true     Detect hypervisor capabilities
system_info_include_raw_output  false    Include raw command outputs in JSON

Information Collected

System Information

  • Hostname and FQDN
  • Operating system details (distribution, version, release)
  • Kernel version and architecture
  • System uptime and boot time
  • Hardware manufacturer, model, serial number, UUID
  • Security modules status (SELinux/AppArmor)

CPU Information

  • Model name and vendor
  • Architecture and CPU family
  • Physical CPUs, cores, and vCPUs count
  • Current, maximum, and minimum frequencies
  • CPU cache details (L1, L2, L3)
  • CPU flags and features
  • Virtualization support (Intel VT-x, AMD-V)
  • Current load average
  • CPU vulnerability mitigations

GPU Information

  • GPU detection and device listing
  • NVIDIA GPU details (via nvidia-smi)
  • AMD GPU details (via rocm-smi)
  • Intel integrated graphics detection
  • IOMMU/VT-d status for GPU passthrough
  • Detailed PCI information for graphics devices

Memory Information

  • Total, free, used, and available memory
  • Buffers and cached memory
  • Memory usage percentage
  • Physical memory modules count
  • Memory hardware details (type, speed, manufacturer)
  • Swap configuration and usage
  • Memory pressure statistics
  • Huge pages configuration

Disk Information

  • Disk usage (all filesystems)
  • Block device listing with details
  • LVM configuration (PVs, VGs, LVs)
  • Mount points and filesystem types
  • Software RAID (mdadm) status
  • Hardware RAID controller detection
  • Physical disk listing (SSD vs HDD detection)
  • SMART health status
  • I/O statistics

Network Information

  • Network interfaces and their states
  • IP addresses (IPv4 and IPv6)
  • MAC addresses and MTU settings
  • Routing table
  • DNS configuration
  • Listening ports
  • Network interface statistics

Hypervisor Detection

  • Virtualization type and role (guest/host)
  • KVM/Libvirt: Version, running VMs, networks, storage pools
  • Proxmox VE: Version, cluster status, VMs, containers, storage
  • LXD/LXC: Version, containers, storage, networks, cluster
  • Docker: Version, running/total containers, images count
  • Podman: Version and availability
  • VMware ESXi: Detection and version
  • Hyper-V: Detection via kernel modules

Output Structure

JSON File Location

Statistics are saved to:

<system_info_stats_base_dir>/<fqdn>/system_info.json
<system_info_stats_base_dir>/<fqdn>/system_info_<timestamp>.json (backup)
<system_info_stats_base_dir>/<fqdn>/summary.txt (human-readable)

JSON Structure

{
  "collection_info": {
    "timestamp": "ISO8601 timestamp",
    "collected_by": "ansible",
    "role_version": "1.0.0"
  },
  "host_info": { ... },
  "system": { ... },
  "kernel": { ... },
  "hardware": { ... },
  "security": { ... },
  "cpu": { ... },
  "gpu": { ... },
  "memory": { ... },
  "swap": { ... },
  "disk": { ... },
  "network": { ... },
  "hypervisor": { ... }
}
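
A quick way to see which of the documented sections are absent from an exported file is a textual key check. The sketch below is deliberately crude: the function name missing_sections is hypothetical, it only tests whether each section name appears as a quoted key anywhere in the file (not specifically at the top level), and it assumes the key list shown above is complete. Sections disabled through the feature toggles may legitimately be reported as missing.

```shell
# Sketch: print documented section names that never appear as a key in the file.
# Crude textual check; it does not parse JSON or verify nesting depth.
missing_sections() {
  for key in collection_info host_info system kernel hardware security \
             cpu gpu memory swap disk network hypervisor; do
    # Match the quoted key followed by a colon, e.g. "cpu": or "cpu" :
    grep -q "\"$key\"[[:space:]]*:" "$1" || echo "$key"
  done
}

# Example call (path is illustrative):
# missing_sections ./stats/machines/web01.example.com/system_info.json
```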

Dependencies

None. This role is standalone and has no dependencies on other roles.

Example Playbook

Basic Usage

---
- hosts: all
  become: true
  roles:
    - role: system_info

Custom Statistics Directory

---
- hosts: all
  become: true
  roles:
    - role: system_info
      vars:
        system_info_stats_base_dir: /var/lib/ansible/inventory

Selective Information Gathering

---
- hosts: servers
  become: true
  roles:
    - role: system_info
      vars:
        system_info_gather_cpu: true
        system_info_gather_gpu: false
        system_info_gather_memory: true
        system_info_detect_hypervisor: true

Using Tags for Partial Execution

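# Ansible runs every task that matches ANY listed tag, so combining a
# role-wide tag such as system_info with cpu would not narrow the run.
# Pass only the specific tag to limit execution.

# Gather only CPU information
ansible-playbook site.yml -t cpu

# Gather only hypervisor information
ansible-playbook site.yml -t hypervisor

# Run validation/health checks only
ansible-playbook site.yml -t validate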

# Skip installation, only gather information
ansible-playbook site.yml -t system_info --skip-tags install

Available Tags

Tag           Purpose
install       Install required packages
gather        All information gathering tasks
system        System and OS information
cpu           CPU information
gpu           GPU information
memory        Memory information
disk          Disk information
network       Network information
hypervisor    Hypervisor detection
export        Export statistics to JSON
statistics    Statistics aggregation
validate      Validation and health checks
health-check  System health monitoring
security      Security-related information

Security Considerations

Privileges

  • Requires root/sudo access for hardware information gathering
  • Uses become: true for privileged commands
  • DMI/SMBIOS information requires root access

Sensitive Data

  • Serial numbers and UUIDs are collected (can identify specific hardware)
  • Network configuration may reveal internal IP addressing
  • No secrets or credentials are collected
  • All data is stored locally on the control node

Data Privacy

  • Statistics files contain detailed system information
  • Restrict access to the statistics directory appropriately
  • Consider encryption for the statistics directory if storing sensitive infrastructure details
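
One way to restrict access is to make the statistics tree readable by its owner only. The following is a minimal sketch, not something the role is documented to do itself; the function name restrict_stats_dir is hypothetical and the directory argument stands in for whatever system_info_stats_base_dir points to.

```shell
# Sketch: limit the statistics tree to its owner.
# Capital X keeps directories traversable without marking JSON files executable.
restrict_stats_dir() {
  chmod -R u=rwX,go= "$1"
}

# Example call (path is the documented default):
# restrict_stats_dir ./stats/machines
```

Symbolic modes are used here instead of a single octal value so that one recursive call yields 700 on directories and 600 on regular files.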

Performance Impact

  • Execution Time: 30-60 seconds per host (depends on hardware complexity)
  • Network Impact: Minimal - only package installation requires network
  • System Load: Very low - read-only operations
  • Disk I/O: Minimal - small JSON files (<100KB typically)

Troubleshooting

Common Issues

Issue: "dmidecode: command not found"

  • Solution: Role will install it automatically. Ensure internet access or pre-stage packages.

Issue: "Permission denied" errors

  • Solution: Ensure become: true is set in the playbook or role invocation.

Issue: SMART data not available

  • Solution: Not all systems/disks support SMART. This is expected and won't fail the role.

Issue: GPU information showing "No GPU detected"

  • Solution: Normal for VMs and servers without GPUs. Not an error condition.

Issue: Hypervisor commands timing out

  • Solution: Some hypervisor checks may be slow. Increase task timeout if needed.

Debug Mode

Run with verbose output:

ansible-playbook site.yml -t system_info -vvv

Compliance Requirements

  • Follows CIS Benchmark recommendations for system auditing
  • Supports security compliance documentation (NIST, PCI-DSS)
  • Enables infrastructure inventory for CMDB integration
  • Facilitates capacity planning and resource optimization

Testing

Manual Testing

# Test on a single host
ansible-playbook -i inventory/production site.yml -l testhost -t system_info

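# Dry-run mode (command and shell tasks are skipped in check mode unless a
# task sets check_mode: false, so the gathered output may be sparse)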
ansible-playbook site.yml -t system_info --check

Validation

After execution, verify:

  1. Statistics directory created: ./stats/machines/<fqdn>/
  2. JSON file present and valid: system_info.json
  3. Summary file created: summary.txt
  4. No errors in Ansible output
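
The first three checks can be scripted per host. This is a sketch under stated assumptions: the function name verify_stats is hypothetical, python3 is assumed to be available on the control node for JSON parsing, and the arguments are the base directory and the host FQDN. The fourth check (no errors in the Ansible output) still has to be read from the playbook run itself.

```shell
# Sketch: verify the exported files for one host.
# $1 = base directory (system_info_stats_base_dir), $2 = host FQDN.
verify_stats() {
  stats_dir="$1/$2"
  [ -d "$stats_dir" ] || { echo "missing directory: $stats_dir"; return 1; }
  # json.tool exits non-zero when the file is absent or not valid JSON.
  python3 -m json.tool "$stats_dir/system_info.json" >/dev/null 2>&1 \
    || { echo "missing or invalid JSON: $stats_dir/system_info.json"; return 1; }
  [ -s "$stats_dir/summary.txt" ] \
    || { echo "missing or empty summary: $stats_dir/summary.txt"; return 1; }
  echo "ok: $stats_dir"
}

# Example call (host name is illustrative):
# verify_stats ./stats/machines web01.example.com
```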

Maintenance

Updates

  • Review and update the role quarterly
  • Test against new OS versions before production deployment
  • Keep documentation synchronized with code changes

Monitoring

  • Track execution time trends (performance degradation may indicate issues)
  • Monitor statistics file sizes (unexpected growth may indicate problems)
  • Validate JSON file integrity periodically
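
File-size monitoring can start from a simple threshold scan. The sketch below lists exported files larger than a given number of KiB; the function name oversized_stats is hypothetical, and the 100 KiB default is an assumption taken from the "<100KB typically" figure in the Performance Impact section, not a limit enforced by the role.

```shell
# Sketch: list system_info.json files above a size threshold.
# $1 = base directory, $2 = threshold in KiB (defaults to 100).
oversized_stats() {
  find "$1" -type f -name 'system_info.json' -size +"${2:-100}"k
}

# Example call (path is the documented default):
# oversized_stats ./stats/machines 100
```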

Version History

  • 1.0.0 (2025-01-11): Initial release
    • Complete system information gathering
    • CPU, GPU, RAM, Disk, Network detection
    • Hypervisor detection (KVM, Proxmox, LXD, Docker, etc.)
    • JSON export with timestamped backups
    • Human-readable summary generation

License

MIT

Author Information

Created by the Ansible Infrastructure Team for comprehensive system inventory and monitoring.

For issues, questions, or contributions, please refer to the project repository.

Related Roles

  • system_baseline - System hardening and baseline configuration
  • monitoring - System monitoring setup
  • inventory_sync - Dynamic inventory management

Additional Resources