infra-automation/roles/system_info/README.md
ansible 70b57d223f Add system_info role for comprehensive infrastructure inventory
New role for gathering detailed system information including CPU, GPU,
RAM, disk, network, and hypervisor details with JSON export capabilities.

Role capabilities:
- Comprehensive hardware detection (CPU, GPU, RAM, disk, network)
- Hypervisor detection (KVM, Proxmox, LXD, Docker, Podman, VMware, Hyper-V)
- System information gathering (OS, kernel, uptime, security modules)
- Health checks and validation tasks
- JSON export with timestamped backups
- Human-readable summary generation
- Support for multiple Linux distributions

Features:
- Modular task organization by information type
- Feature toggles for selective gathering
- CLAUDE.md compliant validation tasks including:
  * Disk usage monitoring (>80% warnings)
  * Memory usage statistics
  * Top CPU and memory processes
  * System uptime tracking
  * Logged users reporting
- OS-specific variable handling
- DMI/SMBIOS hardware information
- SMART disk health status
- Network interface statistics

File structure:
roles/system_info/
├── README.md              # Comprehensive documentation
├── defaults/main.yml      # Configurable defaults
├── vars/main.yml          # Role variables
├── meta/main.yml          # Galaxy metadata
├── tasks/
│   ├── main.yml          # Main task coordinator
│   ├── install.yml       # Package installation
│   ├── gather_system.yml # OS and system info
│   ├── gather_cpu.yml    # CPU details
│   ├── gather_gpu.yml    # GPU detection
│   ├── gather_memory.yml # RAM information
│   ├── gather_disk.yml   # Disk and LVM info
│   ├── gather_network.yml # Network configuration
│   ├── detect_hypervisor.yml # Virtualization detection
│   ├── export_stats.yml  # JSON export
│   └── validate.yml      # Health checks (CLAUDE.md compliant)
├── templates/
│   └── summary.txt.j2    # Human-readable summary
├── handlers/
│   └── main.yml          # Service handlers
└── tests/
    └── test.yml          # Basic test playbook

Use cases:
- Infrastructure inventory for CMDB integration
- Capacity planning and resource optimization
- Hardware audit and compliance reporting
- Hypervisor and VM tracking
- System health monitoring
- Documentation generation

Output:
- JSON: ./stats/machines/<fqdn>/system_info.json
- Backup: ./stats/machines/<fqdn>/system_info_<timestamp>.json
- Summary: ./stats/machines/<fqdn>/summary.txt

Requirements:
- Ansible >= 2.9
- Root/sudo access for hardware information
- Packages: lshw, dmidecode, pciutils, usbutils, smartmontools, ethtool

Compliance:
- CLAUDE.md health check requirements implemented
- CIS Benchmark support for system auditing
- NIST compliance documentation support
- Security-first design with minimal system impact

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-11 01:36:01 +01:00

# System Information Gathering Role
Comprehensive Ansible role for gathering detailed system information including CPU, GPU, RAM, disk, network, and hypervisor details. Statistics are exported to JSON files organized by machine FQDN.
## Description
This role performs a thorough scan of system hardware and software configurations, collecting detailed metrics and storing them in structured JSON format. It's designed to create a complete inventory of infrastructure resources for documentation, monitoring, and capacity planning purposes.
## Requirements
### Ansible Version
- Ansible >= 2.9
### OS Compatibility
- Debian 11 (Bullseye), 12 (Bookworm)
- Ubuntu 20.04 (Focal), 22.04 (Jammy), 24.04 (Noble)
- RHEL 8, 9
- Rocky Linux 8, 9
- AlmaLinux 8, 9
### Dependencies
- Root/sudo privileges for hardware information gathering
- Access to a package repository (internet or an internal mirror) if required packages are missing
### Required Packages
The role will automatically install these packages if they're not present:
- `lshw` - Hardware lister
- `dmidecode` - DMI/SMBIOS information
- `pciutils` - PCI utilities (lspci)
- `usbutils` - USB utilities
- `smartmontools` - SMART disk monitoring
- `ethtool` - Network interface information
## Role Variables
### Main Configuration
| Variable | Default | Description | Required |
|----------|---------|-------------|----------|
| `system_info_stats_base_dir` | `./stats/machines` | Base directory for statistics storage | Yes |
| `system_info_create_stats_dir` | `true` | Create stats directory if it doesn't exist | No |
| `system_info_timestamp_format` | `%Y-%m-%d %H:%M:%S UTC` | Timestamp format for statistics | No |
| `system_info_json_indent` | `2` | JSON output indentation | No |
### Feature Toggles
| Variable | Default | Description |
|----------|---------|-------------|
| `system_info_gather_cpu` | `true` | Gather CPU information |
| `system_info_gather_gpu` | `true` | Gather GPU information |
| `system_info_gather_memory` | `true` | Gather memory information |
| `system_info_gather_disk` | `true` | Gather disk information |
| `system_info_gather_network` | `true` | Gather network information |
| `system_info_gather_system` | `true` | Gather OS and system information |
| `system_info_detect_hypervisor` | `true` | Detect hypervisor capabilities |
| `system_info_include_raw_output` | `false` | Include raw command outputs in JSON |
## Information Collected
### System Information
- Hostname and FQDN
- Operating system details (distribution, version, release)
- Kernel version and architecture
- System uptime and boot time
- Hardware manufacturer, model, serial number, UUID
- Security modules status (SELinux/AppArmor)
### CPU Information
- Model name and vendor
- Architecture and CPU family
- Physical CPUs, cores, and vCPUs count
- Current, maximum, and minimum frequencies
- CPU cache details (L1, L2, L3)
- CPU flags and features
- Virtualization support (Intel VT-x, AMD-V)
- Current load average
- CPU vulnerability mitigations
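
As an illustration of the virtualization-support item above, here is a minimal sketch that looks for the `vmx` (Intel VT-x) and `svm` (AMD-V) flags in cpuinfo-formatted text. The function name `cpu_virt_support` is hypothetical, and the role's own tasks may detect this differently.

```bash
# Report hardware virtualization support from cpuinfo-formatted text on stdin.
# vmx = Intel VT-x, svm = AMD-V; prints "none" when neither flag is present.
cpu_virt_support() {
  awk '
    /^flags[ \t]*:/ {
      for (i = 1; i <= NF; i++) {
        if ($i == "vmx") { found = "Intel VT-x" }
        if ($i == "svm") { found = "AMD-V" }
      }
    }
    END { if (found == "") { print "none" } else { print found } }
  '
}
```

On a live x86 Linux host this could be called as `cpu_virt_support < /proc/cpuinfo`. Inside a guest the flag is usually absent unless nested virtualization is enabled.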
### GPU Information
- GPU detection and device listing
- NVIDIA GPU details (via nvidia-smi)
- AMD GPU details (via rocm-smi)
- Intel integrated graphics detection
- IOMMU/VT-d status for GPU passthrough
- Detailed PCI information for graphics devices
### Memory Information
- Total, free, used, and available memory
- Buffers and cached memory
- Memory usage percentage
- Physical memory modules count
- Memory hardware details (type, speed, manufacturer)
- Swap configuration and usage
- Memory pressure statistics
- Huge pages configuration
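
The memory usage percentage listed above can be derived from `/proc/meminfo`. The sketch below assumes "used" means `MemTotal - MemAvailable`; the role may use a different definition, and the function name `mem_used_percent` is hypothetical.

```bash
# Compute the used-memory percentage from meminfo-formatted text on stdin.
# Assumption: used = MemTotal - MemAvailable (both reported in kB).
mem_used_percent() {
  awk '
    $1 == "MemTotal:"     { total = $2 }
    $1 == "MemAvailable:" { avail = $2 }
    END {
      if (total > 0) { printf "%.1f\n", (total - avail) * 100 / total }
      else { exit 1 }
    }
  '
}
```

On a live host: `mem_used_percent < /proc/meminfo`. Using `MemAvailable` rather than `MemFree` avoids counting reclaimable page cache as used memory.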
### Disk Information
- Disk usage (all filesystems)
- Block device listing with details
- LVM configuration (PVs, VGs, LVs)
- Mount points and filesystem types
- Software RAID (mdadm) status
- Hardware RAID controller detection
- Physical disk listing (SSD vs HDD detection)
- SMART health status
- I/O statistics
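
The role's health checks warn when disk usage exceeds 80%. A minimal sketch of such a threshold check over `df -P` output follows; the function name `disk_usage_warnings` is hypothetical and this is not the role's actual task code.

```bash
# Print mount points whose usage exceeds a threshold (default 80%).
# Reads `df -P` formatted text on stdin, so it can be tried on saved output.
# Limitation: mount points containing spaces are not handled.
disk_usage_warnings() {
  threshold=${1:-80}
  awk -v limit="$threshold" '
    NR > 1 {
      pct = $5
      sub(/%$/, "", pct)
      if (pct + 0 > limit + 0) { print $6 " " pct "%" }
    }
  '
}
```

On a live host: `df -P | disk_usage_warnings`, or `df -P | disk_usage_warnings 90` for a stricter limit. The comparison is strict, so a filesystem at exactly 80% is not reported.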
### Network Information
- Network interfaces and their states
- IP addresses (IPv4 and IPv6)
- MAC addresses and MTU settings
- Routing table
- DNS configuration
- Listening ports
- Network interface statistics
### Hypervisor Detection
- Virtualization type and role (guest/host)
- **KVM/Libvirt**: Version, running VMs, networks, storage pools
- **Proxmox VE**: Version, cluster status, VMs, containers, storage
- **LXD/LXC**: Version, containers, storage, networks, cluster
- **Docker**: Version, running/total containers, images count
- **Podman**: Version and availability
- **VMware ESXi**: Detection and version
- **Hyper-V**: Detection via kernel modules
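
To illustrate detection via kernel modules, the sketch below checks `lsmod`-formatted text for common Hyper-V guest drivers such as `hv_vmbus`. The module list and the function name `has_hyperv_modules` are assumptions for illustration; the role's own check may look at other modules.

```bash
# Succeed if lsmod-formatted text on stdin lists a Hyper-V guest module.
# hv_vmbus is the VMBus driver that Linux guests load when running on Hyper-V.
has_hyperv_modules() {
  awk '
    $1 ~ /^hv_(vmbus|netvsc|storvsc|utils)$/ { found = 1 }
    END { exit !found }
  '
}
```

On a live host: `lsmod | has_hyperv_modules && echo "Hyper-V guest"`. Drivers built into the kernel do not appear in `lsmod`, so a negative result is not conclusive.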
## Output Structure
### JSON File Location
Statistics are saved to:
```
<system_info_stats_base_dir>/<fqdn>/system_info.json
<system_info_stats_base_dir>/<fqdn>/system_info_<timestamp>.json (backup)
<system_info_stats_base_dir>/<fqdn>/summary.txt (human-readable)
```
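Given this layout, a small helper can summarize which hosts have been collected and how many timestamped backups each has. This is a sketch only; the function name `list_collected_hosts` is hypothetical.

```bash
# List each host directory under the stats base dir with its backup count.
# Output per line: "<fqdn> <number of system_info_<timestamp>.json files>"
list_collected_hosts() {
  base=${1:-./stats/machines}
  for dir in "$base"/*/; do
    # Skip directories that hold no current export.
    [ -f "${dir}system_info.json" ] || continue
    fqdn=$(basename "$dir")
    count=0
    for backup in "$dir"system_info_*.json; do
      [ -f "$backup" ] && count=$((count + 1))
    done
    echo "$fqdn $count"
  done
}
```

Called without arguments, it reads the default `./stats/machines` directory relative to the current working directory.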
### JSON Structure
```json
{
  "collection_info": {
    "timestamp": "ISO8601 timestamp",
    "collected_by": "ansible",
    "role_version": "1.0.0"
  },
  "host_info": { ... },
  "system": { ... },
  "kernel": { ... },
  "hardware": { ... },
  "security": { ... },
  "cpu": { ... },
  "gpu": { ... },
  "memory": { ... },
  "swap": { ... },
  "disk": { ... },
  "network": { ... },
  "hypervisor": { ... }
}
```
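A quick way to spot an incomplete export is to check for the top-level sections shown above. The sketch below is a rough text match rather than a JSON parse, and the function name `missing_sections` is hypothetical; a JSON-aware tool would be more robust.

```bash
# Report which expected top-level sections are absent from a system_info.json.
# Rough text match on `"key":`; it does not parse JSON, so a nested key with
# the same name would also satisfy the check.
missing_sections() {
  file=$1
  for key in collection_info host_info system kernel hardware security \
             cpu gpu memory swap disk network hypervisor; do
    grep -q "\"$key\"[ ]*:" "$file" || echo "$key"
  done
}
```

It prints one missing section per line and nothing when all sections are found. Whether a section is omitted when its feature toggle is disabled is not stated here, so treat a missing section as a prompt to look closer rather than as a definite error.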
## Dependencies
None. This role is standalone and has no dependencies on other roles.
## Example Playbook
### Basic Usage
```yaml
---
- hosts: all
  become: true
  roles:
    - role: system_info
```
### Custom Statistics Directory
```yaml
---
- hosts: all
  become: true
  roles:
    - role: system_info
      vars:
        system_info_stats_base_dir: /var/lib/ansible/inventory
```
### Selective Information Gathering
```yaml
---
- hosts: servers
  become: true
  roles:
    - role: system_info
      vars:
        system_info_gather_cpu: true
        system_info_gather_gpu: false
        system_info_gather_memory: true
        system_info_detect_hypervisor: true
```
### Using Tags for Partial Execution
```bash
# Note: --tags runs tasks matching ANY listed tag, so pass only the specific tag
# Gather only CPU information
ansible-playbook site.yml -t cpu
# Gather only hypervisor information
ansible-playbook site.yml -t hypervisor
# Run validation/health checks only
ansible-playbook site.yml -t validate
# Skip installation, only gather information
ansible-playbook site.yml -t system_info --skip-tags install
```
## Available Tags
| Tag | Purpose |
|-----|---------|
| `install` | Install required packages |
| `gather` | All information gathering tasks |
| `system` | System and OS information |
| `cpu` | CPU information |
| `gpu` | GPU information |
| `memory` | Memory information |
| `disk` | Disk information |
| `network` | Network information |
| `hypervisor` | Hypervisor detection |
| `export` | Export statistics to JSON |
| `statistics` | Statistics aggregation |
| `validate` | Validation and health checks |
| `health-check` | System health monitoring |
| `security` | Security-related information |
## Security Considerations
### Privileges
- Requires root/sudo access for hardware information gathering
- Uses `become: true` for privileged commands
- DMI/SMBIOS information requires root access
### Sensitive Data
- Serial numbers and UUIDs are collected (can identify specific hardware)
- Network configuration may reveal internal IP addressing
- No secrets or credentials are collected
- All data is stored locally on the control node
### Data Privacy
- Statistics files contain detailed system information
- Restrict access to the statistics directory appropriately
- Consider encryption for the statistics directory if storing sensitive infrastructure details
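
One way to restrict access to the statistics directory is to remove all group and other permissions. This is a minimal sketch, with `restrict_stats_dir` as a hypothetical helper name; adapt the ownership model to your control node.

```bash
# Restrict a stats directory tree to its owner.
# Capital X grants execute only on directories (and files that were already
# executable), so directories end up 700 and plain files 600.
restrict_stats_dir() {
  dir=${1:-./stats/machines}
  [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
  chmod -R u=rwX,go= "$dir"
}
```

Files created by later runs follow the umask of the user running Ansible, so the helper may need to be re-run or the umask tightened.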
## Performance Impact
- **Execution Time**: 30-60 seconds per host (depends on hardware complexity)
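- **Network Impact**: Minimal - apart from Ansible's own connection to the host, only package installation uses the network
- **System Load**: Very low - gathering tasks are read-only; only package installation changes the host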
- **Disk I/O**: Minimal - small JSON files (<100KB typically)
## Troubleshooting
### Common Issues
**Issue**: "dmidecode: command not found"
- **Solution**: Role will install it automatically. Ensure internet access or pre-stage packages.

**Issue**: "Permission denied" errors
- **Solution**: Ensure `become: true` is set in the playbook or role invocation.

**Issue**: SMART data not available
- **Solution**: Not all systems/disks support SMART. This is expected and won't fail the role.

**Issue**: GPU information showing "No GPU detected"
- **Solution**: Normal for VMs and servers without GPUs. Not an error condition.

**Issue**: Hypervisor commands timing out
- **Solution**: Some hypervisor checks may be slow. Increase task timeout if needed.
### Debug Mode
Run with verbose output:
```bash
ansible-playbook site.yml -t system_info -vvv
```
## Compliance Requirements
- Follows CIS Benchmark recommendations for system auditing
- Supports security compliance documentation (NIST, PCI-DSS)
- Enables infrastructure inventory for CMDB integration
- Facilitates capacity planning and resource optimization
## Testing
### Manual Testing
```bash
# Test on a single host
ansible-playbook -i inventory/production site.yml -l testhost -t system_info
# Dry-run mode (command/shell tasks are normally skipped in check mode, so little data is gathered)
ansible-playbook site.yml -t system_info --check
```
### Validation
After execution, verify:
1. Statistics directory created: `./stats/machines/<fqdn>/`
2. JSON file present and valid: `system_info.json`
3. Summary file created: `summary.txt`
4. No errors in Ansible output
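
The first three checks can be scripted. The sketch below is one possible version, with `verify_host_stats` as a hypothetical helper name; it parses the JSON only when `python3` happens to be available on the control node.

```bash
# Check the expected output files for one host. Prints each problem found
# and returns non-zero if any check fails.
verify_host_stats() {
  base=$1
  fqdn=$2
  dir="$base/$fqdn"
  status=0
  [ -d "$dir" ] || { echo "missing directory: $dir"; return 1; }
  [ -s "$dir/system_info.json" ] || { echo "missing or empty: system_info.json"; status=1; }
  [ -s "$dir/summary.txt" ] || { echo "missing or empty: summary.txt"; status=1; }
  # Optional JSON parse; skipped silently when python3 is not installed.
  if [ "$status" -eq 0 ] && command -v python3 >/dev/null 2>&1; then
    python3 -m json.tool "$dir/system_info.json" >/dev/null 2>&1 \
      || { echo "invalid JSON: system_info.json"; status=1; }
  fi
  return "$status"
}
```

For example, `verify_host_stats ./stats/machines web01.example.com`, where the host name is a placeholder.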
## Maintenance
### Updates
- Review and update the role quarterly
- Test against new OS versions before production deployment
- Keep documentation synchronized with code changes
### Monitoring
- Track execution time trends (performance degradation may indicate issues)
- Monitor statistics file sizes (unexpected growth may indicate problems)
- Validate JSON file integrity periodically
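
File-size monitoring can be as simple as flagging exports above a byte limit. The sketch below defaults to 102400 bytes, based on the "<100KB typically" figure in this README; the function name `oversized_stats` is hypothetical.

```bash
# Print system_info.json files larger than a byte limit (default 100 KiB).
# Output per line: "<path> <size in bytes>"
oversized_stats() {
  base=${1:-./stats/machines}
  limit=${2:-102400}
  for file in "$base"/*/system_info.json; do
    [ -f "$file" ] || continue
    # Arithmetic expansion strips any padding some wc implementations add.
    size=$(( $(wc -c < "$file") ))
    if [ "$size" -gt "$limit" ]; then
      echo "$file $size"
    fi
  done
  return 0
}
```

Running it from a scheduled job and alerting on any output is one way to catch unexpected growth early.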
## Version History
- **1.0.0** (2025-01-11): Initial release
  - Complete system information gathering
  - CPU, GPU, RAM, Disk, Network detection
  - Hypervisor detection (KVM, Proxmox, LXD, Docker, etc.)
  - JSON export with timestamped backups
  - Human-readable summary generation
## License
MIT
## Author Information
Created by the Ansible Infrastructure Team for comprehensive system inventory and monitoring.
For issues, questions, or contributions, please refer to the project repository.
## Related Roles
- `system_baseline` - System hardening and baseline configuration
- `monitoring` - System monitoring setup
- `inventory_sync` - Dynamic inventory management
## Additional Resources
- [Ansible Best Practices](https://docs.ansible.com/ansible/latest/user_guide/playbooks_best_practices.html)
- [Linux Kernel Hardware Vulnerabilities Guide](https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/index.html)
- [Virtualization Detection](https://people.redhat.com/~rjones/virt-what/)