# System Information Gathering Role Documentation

## Overview

The `system_info` role provides comprehensive hardware and software inventory capabilities for infrastructure automation. It collects detailed metrics about CPU, GPU, memory, storage, network, and virtualization/hypervisor configurations.

## Purpose

- **Infrastructure Inventory**: Maintain up-to-date hardware and software inventory
- **Capacity Planning**: Track resource utilization and plan for scaling
- **Compliance Documentation**: Support audit requirements with detailed system information
- **Troubleshooting**: Provide baseline configuration data for issue resolution
- **Monitoring Integration**: Feed data into monitoring and CMDB systems

## Architecture

### Data Collection Flow

```
┌─────────────────┐
│  Ansible Facts  │
│   (gathered)    │
└────────┬────────┘
         │
         ▼
┌─────────────────┐       ┌──────────────────┐
│  Hardware Info  │──────▶│  CPU Details     │
│  Collection     │       │  GPU Detection   │
│                 │       │  Memory Info     │
└────────┬────────┘       │  Disk Layout     │
         │                └──────────────────┘
         ▼
┌─────────────────┐       ┌──────────────────┐
│  Hypervisor     │──────▶│  KVM/Libvirt     │
│  Detection      │       │  Proxmox VE      │
│                 │       │  LXD/Docker      │
└────────┬────────┘       │  VMware/Hyper-V  │
         │                └──────────────────┘
         ▼
┌─────────────────┐       ┌──────────────────┐
│  Aggregation    │──────▶│  JSON Export     │
│  & Export       │       │  Summary Report  │
│                 │       │  Timestamped     │
└─────────────────┘       └──────────────────┘
         │
         ▼
┌─────────────────────────────────────┐
│  ./stats/machines//                 │
│  ├── system_info.json               │
│  ├── system_info_.json              │
│  └── summary.txt                    │
└─────────────────────────────────────┘
```

### Task Organization

The role is organized into modular task files:

- `main.yml`: Orchestration and task inclusion
- `install.yml`: Package installation (OS-specific)
- `gather_system.yml`: OS and system information
- `gather_cpu.yml`: CPU details and capabilities
- `gather_gpu.yml`: GPU detection and details
- `gather_memory.yml`: Memory and swap information
- `gather_disk.yml`: Disk, LVM, and RAID information
- `gather_network.yml`: Network interfaces and configuration
- `detect_hypervisor.yml`: Virtualization platform detection
- `export_stats.yml`: JSON aggregation and export
- `validate.yml`: Health checks and validation

## Integration Points

### With Other Roles

The `system_info` role can be used in conjunction with:

- **Monitoring roles**: Feed collected data into Prometheus, Grafana, or other monitoring systems
- **CMDB integration**: Export to ServiceNow, NetBox, or other CMDBs
- **Capacity planning tools**: Provide data for capacity analysis
- **Compliance scanning**: Support CIS, NIST, or custom compliance checks

### With External Systems

#### Example: Export to NetBox

```yaml
- name: Sync to NetBox CMDB
  hosts: all
  tasks:
    - name: Include system_info role
      include_role:
        name: system_info

    - name: Push to NetBox
      uri:
        url: "https://netbox.example.com/api/dcim/devices/"
        method: POST
        body_format: json
        status_code: [200, 201]  # a successful create returns 201 Created
        headers:
          Authorization: "Token {{ netbox_api_token }}"
        # Depending on the NetBox version, further required fields
        # (such as site and role) may have to be added to the body.
        body:
          name: "{{ ansible_fqdn }}"
          device_type: "{{ system_info_hardware.product }}"
          custom_fields:
            cpu_model: "{{ system_info_cpu.model }}"
            memory_mb: "{{ system_info_memory.total_mb }}"
      delegate_to: localhost
```

#### Example: Prometheus Exporter

```yaml
# Runs on each managed host and writes a textfile-collector metrics file there
- name: Export metrics for Prometheus
  copy:
    content: |
      # HELP system_info_cpu_count Number of CPU cores
      # TYPE system_info_cpu_count gauge
      system_info_cpu_count{host="{{ ansible_fqdn }}"} {{ system_info_cpu.count.vcpus }}
      # HELP system_info_memory_total_mb Total memory in MB
      # TYPE system_info_memory_total_mb gauge
      system_info_memory_total_mb{host="{{ ansible_fqdn }}"} {{ system_info_memory.total_mb }}
    dest: "/var/lib/node_exporter/textfile_collector/system_info.prom"
```

## Data Dictionary

### JSON Schema

The exported JSON follows this structure:

```json
{
  "collection_info": {
    "timestamp": "ISO8601 datetime",
    "timestamp_epoch": "Unix epoch",
    "collected_by": "ansible",
    "role_version": "semver",
    "ansible_version": "version string"
  },
  "host_info": {
    "hostname": "short hostname",
    "fqdn": "fully qualified domain name",
    "uptime": "human readable uptime",
    "boot_time": "boot timestamp"
  },
  "system": {
    "distribution": "OS name",
    "distribution_version": "version",
    "distribution_release": "codename",
    "distribution_major_version": "major version",
    "os_family": "Debian|RedHat"
  },
  "kernel": {
    "version": "kernel version",
    "architecture": "x86_64|aarch64|etc"
  },
  "hardware": {
    "manufacturer": "hardware vendor",
    "product": "product name",
    "serial": "serial number",
    "uuid": "system UUID"
  },
  "security": {
    "selinux": "Enforcing|Permissive|Disabled|N/A",
    "apparmor": "Enabled|Disabled|N/A"
  },
  "cpu": { /* detailed CPU information */ },
  "gpu": { /* GPU detection and details */ },
  "memory": { /* memory statistics */ },
  "swap": { /* swap configuration */ },
  "disk": { /* disk and storage information */ },
  "network": { /* network configuration */ },
  "hypervisor": { /* virtualization details */ }
}
```

## Use Cases

### 1. Infrastructure Audit

Generate a complete inventory of all infrastructure:

```bash
# Gather information from all hosts
ansible-playbook playbooks/gather_system_info.yml

# Generate CSV report (-s slurps all files so the header row is printed once)
jq -rs '["FQDN","OS","CPU","Memory","Disk","Hypervisor"],
  (.[] | [.host_info.fqdn, .system.distribution, .cpu.model,
   (.memory.total_mb|tostring), (.disk.physical_disks|length|tostring),
   (.hypervisor.is_hypervisor|tostring)]) | @csv' \
  stats/machines/*/system_info.json > infrastructure_inventory.csv
```

### 2. License Compliance

Track CPU cores for license management:

```bash
# Count total CPU cores across infrastructure
jq -s 'map(.cpu.count.total_cores | tonumber) | add' \
  stats/machines/*/system_info.json
```

### 3. Capacity Planning

Identify hosts nearing resource limits:

```bash
# Find hosts with >80% memory usage
jq -r 'select(.memory.usage_percent > 80) |
  "\(.host_info.fqdn): \(.memory.usage_percent)%"' \
  stats/machines/*/system_info.json

# Find hosts with low disk space (any filesystem at 90% or more)
# test() does a regex match; contains() would only match the literal text
jq -r 'select(any(.disk.usage_human[]; test("(9[0-9]|100)%"))) |
  .host_info.fqdn' \
  stats/machines/*/system_info.json
```

### 4. Hypervisor Inventory

List all hypervisors and their VM counts:

```bash
# KVM/Libvirt hypervisors
jq -r 'select(.hypervisor.kvm_libvirt.installed == true) |
  "\(.host_info.fqdn): \(.hypervisor.kvm_libvirt.running_vms) running, \(.hypervisor.kvm_libvirt.total_vms) total"' \
  stats/machines/*/system_info.json

# Proxmox hosts
jq -r 'select(.hypervisor.proxmox.installed == true) |
  "\(.host_info.fqdn): \(.hypervisor.proxmox.version)"' \
  stats/machines/*/system_info.json
```

### 5. Security Compliance

Verify SELinux/AppArmor status:

```bash
# Check SELinux enforcement
jq -r 'select(.security.selinux != "Enforcing" and .security.selinux != "N/A") |
  "\(.host_info.fqdn): SELinux is \(.security.selinux)"' \
  stats/machines/*/system_info.json

# List CPU vulnerabilities
jq -r '"\(.host_info.fqdn):", .cpu.vulnerabilities[]' \
  stats/machines/*/system_info.json
```

## Performance Considerations

### Execution Time

Typical execution times per host:

- **Minimal gathering** (CPU, memory only): 15-20 seconds
- **Standard gathering** (all defaults): 30-45 seconds
- **Comprehensive** (with raw outputs): 45-60 seconds

Factors affecting performance:

- Number of network interfaces
- Number of disk devices
- Hypervisor API response time
- SMART disk scanning (slowest component)

### Optimization Strategies

1. **Parallel execution**: Use the `-f` flag to increase the number of forks

   ```bash
   ansible-playbook site.yml -t system_info -f 20
   ```

2. **Skip slow components**: Disable unnecessary gathering

   ```yaml
   system_info_gather_network: false  # Skip if not needed
   ```

3. **Cache facts**: Enable fact caching in `ansible.cfg`

   ```ini
   [defaults]
   fact_caching = jsonfile
   fact_caching_connection = /tmp/ansible_facts
   fact_caching_timeout = 3600
   ```

## Security Best Practices

### Data Protection

- **Sensitive information**: Statistics include serial numbers, UUIDs, and network topology
- **Access control**: Restrict read access to the statistics directory
- **Encryption**: Consider encrypting the statistics directory for sensitive environments
- **Retention**: Implement a rotation policy for timestamped backups

### Execution Security

- **Privilege escalation**: The role requires sudo/root to read hardware information
- **Audit logging**: Executions are logged by Ansible when logging (for example `log_path`) is configured
- **Read-only**: Apart from the package installation step (`install.yml`), the role performs no modifications to managed systems
- **No secrets**: The role does not collect or expose credentials

## Troubleshooting Guide

### Common Problems

#### Problem: "Package installation failed"

**Symptoms**: Role fails during install phase

**Cause**: No internet access or repository issues

**Solution**:

```bash
# Pre-install packages manually
ansible all -m package -a "name=lshw,dmidecode,pciutils state=present" --become

# Or skip installation
ansible-playbook site.yml -t system_info --skip-tags install
```

#### Problem: "Statistics directory not created"

**Symptoms**: No output files generated

**Cause**: Permission issues on control node

**Solution**:

```bash
# Check permissions
mkdir -p ./stats/machines
chmod 755 ./stats/machines

# Or specify writable directory
ansible-playbook site.yml -e "system_info_stats_base_dir=/tmp/stats"
```

#### Problem: "Invalid JSON output"

**Symptoms**: jq reports parsing errors

**Cause**: Incomplete execution or disk full

**Solution**:

```bash
# Validate JSON files
for f in ./stats/machines/*/system_info.json; do
  jq empty "$f" 2>&1 || echo "Invalid: $f"
done

# Re-run for failed hosts
ansible-playbook site.yml -l failed_host -t system_info
```

## Maintenance

### Regular Updates

- **Quarterly review**: Update role for new hypervisor versions
- **OS compatibility**: Test with new OS releases
- **Package updates**: Verify new package versions don't break collection
- **Documentation**: Keep examples and use cases current

### Monitoring

Track role health metrics:

- Execution success rate
- Average execution time
- Output file sizes
- JSON validation failures

### Backup Strategy

```bash
# crontab entries: each job must stay on a single line (cron has no line continuation)

# Daily backup of statistics
0 3 * * * tar -czf /backup/ansible-stats-$(date +\%Y\%m\%d).tar.gz /opt/ansible/stats/machines/

# Cleanup old backups (keep 30 days)
0 4 * * * find /backup -name 'ansible-stats-*.tar.gz' -mtime +30 -delete
```

## Advanced Usage

### Custom Filters

Create custom Ansible filters for data processing:

```python
# filter_plugins/system_info_filters.py

def format_memory(value_mb):
    """Convert MB to human readable format"""
    if value_mb < 1024:
        return f"{value_mb} MB"
    elif value_mb < 1048576:
        return f"{value_mb/1024:.1f} GB"
    else:
        return f"{value_mb/1048576:.1f} TB"


class FilterModule(object):
    def filters(self):
        return {
            'format_memory': format_memory
        }
```

### Dynamic Inventory Integration

Use collected data for dynamic grouping:

```python
#!/usr/bin/env python3
# system_info_inventory.py
# Inventory script: creates dynamic groups based on collected information.
# Make it executable and pass it to Ansible with -i.
import glob
import json

groups = {
    'hypervisors': [],
    'virtual_machines': [],
    'high_memory': [],
    'gpu_enabled': []
}

for stats_file in glob.glob('stats/machines/*/system_info.json'):
    with open(stats_file) as f:
        data = json.load(f)

    fqdn = data['host_info']['fqdn']

    if data['hypervisor']['is_hypervisor']:
        groups['hypervisors'].append(fqdn)

    if data['hypervisor']['is_virtual']:
        groups['virtual_machines'].append(fqdn)

    if data['memory']['total_mb'] > 64000:
        groups['high_memory'].append(fqdn)

    if data['gpu']['detected']:
        groups['gpu_enabled'].append(fqdn)

# Print the JSON structure Ansible expects from an inventory script
inventory = {name: {'hosts': hosts} for name, hosts in groups.items()}
inventory['_meta'] = {'hostvars': {}}
print(json.dumps(inventory, indent=2))
```

## Related Documentation

- [Main README](../../roles/system_info/README.md)
- [Cheatsheet](../../cheatsheets/system_info.md)
- [Ansible Best Practices](https://docs.ansible.com/ansible/latest/user_guide/playbooks_best_practices.html)

## Changelog

See role README.md for version history and changes.

---

**Document Version**: 1.0.0

**Last Updated**: 2025-01-11

**Maintained By**: Ansible Infrastructure Team
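
## Appendix: Checking Exported Files Against the Schema

The `jq empty` check in the Troubleshooting Guide only confirms that a file parses. As a complement, the following minimal sketch also reports which top-level sections from the JSON Schema section are missing. The names `REQUIRED_SECTIONS`, `missing_sections`, and `check_export` are hypothetical helpers introduced for illustration and are not part of the role.

```python
import json

# Top-level sections as listed in the JSON Schema section of this document.
REQUIRED_SECTIONS = (
    "collection_info", "host_info", "system", "kernel", "hardware",
    "security", "cpu", "gpu", "memory", "swap", "disk", "network",
    "hypervisor",
)


def missing_sections(document):
    """Return the schema sections that are absent from one parsed export."""
    return [name for name in REQUIRED_SECTIONS if name not in document]


def check_export(text):
    """Parse one export and list its problems; an empty list means it passed."""
    try:
        document = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(document, dict):
        return ["top-level value is not an object"]
    return [f"missing section: {name}" for name in missing_sections(document)]
```

One way to use it is to loop over `glob.glob('stats/machines/*/system_info.json')`, read each file, and print the path together with whatever `check_export` returns. The check deliberately stops at section names, because the nested fields of sections such as `cpu` or `disk` are not spelled out in the schema above.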