infra-automation/roles/system_info/tasks/validate.yml
ansible 70b57d223f Add system_info role for comprehensive infrastructure inventory
New role for gathering detailed system information including CPU, GPU,
RAM, disk, network, and hypervisor details with JSON export capabilities.

Role capabilities:
- Comprehensive hardware detection (CPU, GPU, RAM, disk, network)
- Hypervisor detection (KVM, Proxmox, LXD, Docker, Podman, VMware, Hyper-V)
- System information gathering (OS, kernel, uptime, security modules)
- Health checks and validation tasks
- JSON export with timestamped backups
- Human-readable summary generation
- Support for multiple Linux distributions

Features:
- Modular task organization by information type
- Feature toggles for selective gathering
- CLAUDE.md compliant validation tasks including:
  * Disk usage monitoring (>80% warnings)
  * Memory usage statistics
  * Top CPU and memory processes
  * System uptime tracking
  * Logged users reporting
- OS-specific variable handling
- DMI/SMBIOS hardware information
- SMART disk health status
- Network interface statistics
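
The feature toggles listed above could look roughly like the sketch below. The variable names (`system_info_gather_cpu`, `system_info_gather_gpu`) are hypothetical and not taken from the role; only the task file names come from the file structure in this commit.

```yaml
---
# defaults/main.yml (hypothetical toggle names, assumed for illustration)
system_info_gather_cpu: true
system_info_gather_gpu: false
---
# tasks/main.yml (sketch): include a gather file only when its toggle is on
- name: Gather CPU details
  include_tasks: gather_cpu.yml
  when: system_info_gather_cpu | bool

- name: Gather GPU details
  include_tasks: gather_gpu.yml
  when: system_info_gather_gpu | bool
```

With this layout, a host or group could disable a gather step by overriding the toggle in inventory, without editing the role.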

File structure:
roles/system_info/
├── README.md              # Comprehensive documentation
├── defaults/main.yml      # Configurable defaults
├── vars/main.yml          # Role variables
├── meta/main.yml          # Galaxy metadata
├── tasks/
│   ├── main.yml          # Main task coordinator
│   ├── install.yml       # Package installation
│   ├── gather_system.yml # OS and system info
│   ├── gather_cpu.yml    # CPU details
│   ├── gather_gpu.yml    # GPU detection
│   ├── gather_memory.yml # RAM information
│   ├── gather_disk.yml   # Disk and LVM info
│   ├── gather_network.yml # Network configuration
│   ├── detect_hypervisor.yml # Virtualization detection
│   ├── export_stats.yml  # JSON export
│   └── validate.yml      # Health checks (CLAUDE.md compliant)
├── templates/
│   └── summary.txt.j2    # Human-readable summary
├── handlers/
│   └── main.yml          # Service handlers
└── tests/
    └── test.yml          # Basic test playbook

Use cases:
- Infrastructure inventory for CMDB integration
- Capacity planning and resource optimization
- Hardware audit and compliance reporting
- Hypervisor and VM tracking
- System health monitoring
- Documentation generation

Output:
- JSON: ./stats/machines/<fqdn>/system_info.json
- Backup: ./stats/machines/<fqdn>/system_info_<timestamp>.json
- Summary: ./stats/machines/<fqdn>/summary.txt
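
A minimal sketch of how the JSON file and its timestamped backup might be written on the controller. It assumes `system_info_stats_dir` resolves to `./stats/machines/<fqdn>` and that the collected data lives in a variable called `system_info_data`; that variable name is hypothetical, and the actual `export_stats.yml` may work differently.

```yaml
# Sketch only: system_info_data is an assumed variable name.
- name: Ensure the per-host stats directory exists on the controller
  file:
    path: "{{ system_info_stats_dir }}"
    state: directory
    mode: "0755"
  delegate_to: localhost
  become: false

- name: Write the current snapshot
  copy:
    content: "{{ system_info_data | to_nice_json }}"
    dest: "{{ system_info_stats_dir }}/system_info.json"
    mode: "0644"
  delegate_to: localhost
  become: false

- name: Write a timestamped backup (requires gathered facts)
  copy:
    content: "{{ system_info_data | to_nice_json }}"
    dest: "{{ system_info_stats_dir }}/system_info_{{ ansible_date_time.iso8601_basic_short }}.json"
    mode: "0644"
  delegate_to: localhost
  become: false
```

Note that `ansible_date_time` reflects the managed host's clock at fact-gathering time, not the controller's.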

Requirements:
- Ansible >= 2.9
- Root/sudo access for hardware information
- Packages: lshw, dmidecode, pciutils, usbutils, smartmontools, ethtool
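
The package requirement above could be satisfied with a task along these lines. This is a sketch, not the role's actual `install.yml`; it assumes the package names are identical across the target distributions, which may not hold everywhere.

```yaml
# Sketch only: package names may need per-distribution overrides.
- name: Install hardware inspection tools
  package:
    name:
      - lshw
      - dmidecode
      - pciutils
      - usbutils
      - smartmontools
      - ethtool
    state: present
  become: true
```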

Compliance:
- CLAUDE.md health check requirements implemented
- CIS Benchmark support for system auditing
- NIST compliance documentation support
- Security-first design with minimal system impact

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-11 01:36:01 +01:00

102 lines
3.0 KiB
YAML

---
# Validation and health check tasks

- name: Gather disk usage statistics
  shell: df -h | grep -vE '^Filesystem|tmpfs|cdrom'
  register: validate_disk_usage
  changed_when: false
  failed_when: false
  tags: [validate, health-check]

- name: Gather memory usage statistics
  shell: free -h
  register: validate_memory_usage
  changed_when: false
  tags: [validate, health-check]

- name: Gather swap usage statistics
  shell: swapon --show
  register: validate_swap_usage
  changed_when: false
  failed_when: false
  tags: [validate, health-check]

- name: Gather system uptime
  shell: uptime
  register: validate_system_uptime
  changed_when: false
  tags: [validate, health-check]

- name: Gather logged-in users
  shell: who
  register: validate_logged_users
  changed_when: false
  failed_when: false
  tags: [validate, health-check]

- name: Check high CPU processes
  shell: ps aux --sort=-%cpu | head -10
  register: validate_top_cpu_processes
  changed_when: false
  tags: [validate, health-check]

- name: Check high memory processes
  shell: ps aux --sort=-%mem | head -10
  register: validate_top_mem_processes
  changed_when: false
  tags: [validate, health-check]
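
# Copy the Use% column into a variable before stripping "%", so the printed
# line stays intact, and add 0 to force a numeric (not string) comparison.
- name: Check for disk usage warnings (>80%)
  shell: df -h | awk 'NR>1 {use=$5; gsub(/%/,"",use); if (use+0 > 80) print $0}'
  register: validate_disk_warnings
  changed_when: false
  failed_when: false
  tags: [validate, health-check]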

- name: Verify statistics directory exists
  stat:
    path: "{{ system_info_stats_dir }}"
  register: validate_stats_dir
  delegate_to: localhost
  become: false
  tags: [validate]

- name: Verify JSON file was created
  stat:
    path: "{{ system_info_stats_dir }}/system_info.json"
  register: validate_json_file
  delegate_to: localhost
  become: false
  tags: [validate]

- name: Display system health summary
  debug:
    msg:
      - "=== System Health Check for {{ ansible_fqdn }} ==="
      - "Uptime: {{ validate_system_uptime.stdout }}"
      - ""
      - "=== Disk Usage ==="
      - "{{ validate_disk_usage.stdout_lines }}"
      - ""
      - "=== Memory Usage ==="
      - "{{ validate_memory_usage.stdout_lines }}"
      - ""
      - "{% if validate_swap_usage.stdout_lines | length > 0 %}=== Swap Usage === {{ validate_swap_usage.stdout_lines }}{% else %}No swap configured{% endif %}"
      - ""
      - "{% if validate_disk_warnings.stdout_lines | length > 0 %}=== DISK WARNINGS (>80% usage) === {{ validate_disk_warnings.stdout_lines }}{% endif %}"
      - ""
      - "=== Logged Users ==="
      - "{{ validate_logged_users.stdout_lines if validate_logged_users.stdout_lines | length > 0 else ['No users logged in'] }}"
      - ""
      - "=== Top CPU Processes ==="
      - "{{ validate_top_cpu_processes.stdout_lines[:5] }}"
      - ""
      - "=== Top Memory Processes ==="
      - "{{ validate_top_mem_processes.stdout_lines[:5] }}"
      - ""
      - "=== Statistics Files ==="
      - "Directory exists: {{ validate_stats_dir.stat.exists }}"
      - "JSON file created: {{ validate_json_file.stat.exists }}"
      - "Location: {{ system_info_stats_dir }}"
  tags: [validate, health-check]
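
As an alternative to parsing `df` output, the >80% disk check could be sketched from gathered facts instead. This is not how the role does it; it assumes `gather_facts` is enabled so that `ansible_mounts` is populated, and it needs Ansible 2.5 or later for `loop`.

```yaml
# Sketch only: a fact-based variant of the disk warning, not part of the role.
- name: Warn about mounts above 80% usage
  debug:
    msg: >-
      {{ item.mount }} is
      {{ ((1 - item.size_available / item.size_total) * 100) | round(1) }}% full
  loop: "{{ ansible_mounts }}"
  loop_control:
    label: "{{ item.mount }}"
  # The size_total guard comes first so zero-sized mounts never reach the division.
  when: item.size_total > 0 and (1 - item.size_available / item.size_total) * 100 > 80
  tags: [validate, health-check]
```

Working from facts avoids depending on the column layout of `df`. Unlike the `df` task it yields no registered `stdout_lines`, so the summary task would need adapting.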