infra-automation/roles/system_info/tasks/gather_gpu.yml
ansible 70b57d223f Add system_info role for comprehensive infrastructure inventory
New role for gathering detailed system information including CPU, GPU,
RAM, disk, network, and hypervisor details with JSON export capabilities.

Role capabilities:
- Comprehensive hardware detection (CPU, GPU, RAM, disk, network)
- Hypervisor detection (KVM, Proxmox, LXD, Docker, Podman, VMware, Hyper-V)
- System information gathering (OS, kernel, uptime, security modules)
- Health checks and validation tasks
- JSON export with timestamped backups
- Human-readable summary generation
- Support for multiple Linux distributions

Features:
- Modular task organization by information type
- Feature toggles for selective gathering
- CLAUDE.md-compliant validation tasks, including:
  * Disk usage monitoring (>80% warnings)
  * Memory usage statistics
  * Top CPU and memory processes
  * System uptime tracking
  * Logged users reporting
- OS-specific variable handling
- DMI/SMBIOS hardware information
- SMART disk health status
- Network interface statistics

File structure:
roles/system_info/
├── README.md              # Comprehensive documentation
├── defaults/main.yml      # Configurable defaults
├── vars/main.yml          # Role variables
├── meta/main.yml          # Galaxy metadata
├── tasks/
│   ├── main.yml          # Main task coordinator
│   ├── install.yml       # Package installation
│   ├── gather_system.yml # OS and system info
│   ├── gather_cpu.yml    # CPU details
│   ├── gather_gpu.yml    # GPU detection
│   ├── gather_memory.yml # RAM information
│   ├── gather_disk.yml   # Disk and LVM info
│   ├── gather_network.yml # Network configuration
│   ├── detect_hypervisor.yml # Virtualization detection
│   ├── export_stats.yml  # JSON export
│   └── validate.yml      # Health checks (CLAUDE.md compliant)
├── templates/
│   └── summary.txt.j2    # Human-readable summary
├── handlers/
│   └── main.yml          # Service handlers
└── tests/
    └── test.yml          # Basic test playbook

Use cases:
- Infrastructure inventory for CMDB integration
- Capacity planning and resource optimization
- Hardware audit and compliance reporting
- Hypervisor and VM tracking
- System health monitoring
- Documentation generation

Output:
- JSON: ./stats/machines/<fqdn>/system_info.json
- Backup: ./stats/machines/<fqdn>/system_info_<timestamp>.json
- Summary: ./stats/machines/<fqdn>/summary.txt

Requirements:
- Ansible >= 2.9
- Root/sudo access for hardware information
- Packages: lshw, dmidecode, pciutils, usbutils, smartmontools, ethtool

Compliance:
- CLAUDE.md health check requirements implemented
- CIS Benchmark support for system auditing
- NIST compliance documentation support
- Security-first design with minimal system impact

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-11 01:36:01 +01:00

97 lines
3.1 KiB
YAML

---
# GPU information gathering tasks
- name: Detect GPU devices using lspci
  shell: lspci | grep -iE "VGA|3D|Display" || echo "No GPU detected"
  register: system_info_gpu_lspci_raw
  changed_when: false
  tags: [gather, gpu]
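
- name: Gather detailed GPU information
  # lspci -s takes a single slot, so each detected GPU is queried separately
  shell: |
    slots=$(lspci | grep -iE "VGA|3D|Display" | cut -d' ' -f1)
    [ -n "$slots" ] || { echo "No detailed GPU info available"; exit 0; }
    for slot in $slots; do lspci -v -s "$slot" 2>/dev/null || true; done
  register: system_info_gpu_detailed_raw
  changed_when: false
  tags: [gather, gpu]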

- name: Check for NVIDIA GPU
  shell: lspci | grep -i nvidia
  register: system_info_nvidia_check
  changed_when: false
  failed_when: false
  tags: [gather, gpu]

- name: Gather NVIDIA GPU details (if available)
  shell: nvidia-smi --query-gpu=name,driver_version,memory.total,compute_cap --format=csv,noheader 2>/dev/null || echo "nvidia-smi not available"
  register: system_info_nvidia_smi_raw
  changed_when: false
  failed_when: false
  when: system_info_nvidia_check.rc == 0
  tags: [gather, gpu]
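
# Illustrative sketch, not part of the original role: one way the CSV rows
# registered above could be split into named fields. The fact name
# system_info_nvidia_parsed and the key names are hypothetical; the field
# order follows the --query-gpu list used by the nvidia-smi task. It assumes
# no field (for example the GPU name) contains a comma.
- name: Parse NVIDIA CSV rows into dictionaries (example)
  set_fact:
    system_info_nvidia_parsed: >-
      {{ system_info_nvidia_parsed | default([])
         + [dict(['name', 'driver_version', 'memory_total', 'compute_cap']
                 | zip(item.split(',') | map('trim')))] }}
  loop: "{{ system_info_nvidia_smi_raw.stdout_lines | default([]) }}"
  when:
    - system_info_nvidia_check.rc == 0
    - item != 'nvidia-smi not available'
  tags: [gather, gpu]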
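
- name: Check for AMD GPU
  # Match display-class devices only, and case-sensitively: with -i, "ATI"
  # also matches the "ati" in "Corporation", which appears on most lspci lines
  shell: lspci | grep -iE "VGA|3D|Display" | grep -E "AMD|ATI"
  register: system_info_amd_check
  changed_when: false
  failed_when: false
  tags: [gather, gpu]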

- name: Gather AMD GPU details (if available)
  # Uses POSIX redirection; "&>" is a bashism that /bin/sh may parse as "&" plus ">"
  shell: |
    if command -v rocm-smi >/dev/null 2>&1; then
      rocm-smi --showproductname --showdriverversion
    else
      echo "rocm-smi not available"
    fi
  register: system_info_amd_rocm_raw
  changed_when: false
  failed_when: false
  when: system_info_amd_check.rc == 0
  tags: [gather, gpu]

- name: Check for Intel GPU
  shell: lspci | grep -i "intel.*graphics"
  register: system_info_intel_gpu_check
  changed_when: false
  failed_when: false
  tags: [gather, gpu]
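
- name: Parse GPU information
  # Keep the device list empty when only the "No GPU detected" placeholder was printed
  set_fact:
    system_info_gpu_detected: "{{ system_info_gpu_lspci_raw.stdout != 'No GPU detected' }}"
    system_info_gpu_list: "{{ system_info_gpu_lspci_raw.stdout_lines | default([]) if system_info_gpu_lspci_raw.stdout != 'No GPU detected' else [] }}"
  tags: [gather, gpu]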

- name: Build GPU details structure
  set_fact:
    system_info_gpu:
      detected: "{{ system_info_gpu_detected }}"
      devices: "{{ system_info_gpu_list }}"
      nvidia:
        present: "{{ system_info_nvidia_check.rc == 0 }}"
        details: "{{ system_info_nvidia_smi_raw.stdout_lines | default([]) if system_info_nvidia_check.rc == 0 else [] }}"
      amd:
        present: "{{ system_info_amd_check.rc == 0 }}"
        details: "{{ system_info_amd_rocm_raw.stdout_lines | default([]) if system_info_amd_check.rc == 0 else [] }}"
      intel:
        present: "{{ system_info_intel_gpu_check.rc == 0 }}"
      detailed_info: "{{ system_info_gpu_detailed_raw.stdout_lines | default([]) }}"
  tags: [gather, gpu]

- name: Check for GPU passthrough support (IOMMU)
  # grep -q keeps the matched kernel log lines out of stdout, so the
  # registered value is only the one-line status message
  shell: |
    if dmesg | grep -iE "IOMMU|AMD-Vi|Intel VT-d" | grep -qi enabled; then
      echo "IOMMU enabled"
    else
      echo "IOMMU disabled or not available"
    fi
  register: system_info_iommu_status_raw
  changed_when: false
  become: true
  failed_when: false
  tags: [gather, gpu]

- name: Add IOMMU status to GPU info
  set_fact:
    system_info_gpu: "{{ system_info_gpu | combine({'iommu_status': system_info_iommu_status_raw.stdout | default('Unknown')}) }}"
  tags: [gather, gpu]
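
# Illustrative sketch, not part of the original role: a minimal example of how
# a later task might report the assembled fact. It assumes system_info_gpu was
# built by the tasks above; the task name is hypothetical.
- name: Show gathered GPU summary (example)
  debug:
    msg:
      - "GPU detected: {{ system_info_gpu.detected }}"
      - "Device count: {{ system_info_gpu.devices | length }}"
      - "NVIDIA present: {{ system_info_gpu.nvidia.present }}"
      - "AMD present: {{ system_info_gpu.amd.present }}"
      - "IOMMU: {{ system_info_gpu.iommu_status }}"
  tags: [gather, gpu]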