# System Maintenance Playbook Cheatsheet

Quick reference for using the system maintenance playbook.

## Quick Start

```bash
# Run maintenance on all hosts
ansible-playbook playbooks/maintenance.yml

# Run maintenance on a specific environment
ansible-playbook -i inventories/staging playbooks/maintenance.yml

# Check mode (dry run)
ansible-playbook playbooks/maintenance.yml --check
```

## Common Usage

### Security Updates Only (Default)

```bash
# Update all hosts with security patches
ansible-playbook playbooks/maintenance.yml

# Specific environment
ansible-playbook -i inventories/production playbooks/maintenance.yml

# Specific host group
ansible-playbook playbooks/maintenance.yml --limit webservers
```

### Full System Upgrade

```bash
# CAUTION: Full upgrade, including non-security updates
ansible-playbook playbooks/maintenance.yml \
  --tags updates \
  --extra-vars "maintenance_security_only=false"
```

### Selective Maintenance

```bash
# Package updates only
ansible-playbook playbooks/maintenance.yml --tags updates

# Cleanup only (no updates)
ansible-playbook playbooks/maintenance.yml --tags cleanup

# System optimization only
ansible-playbook playbooks/maintenance.yml --tags optimize

# Verification only
ansible-playbook playbooks/maintenance.yml --tags verify
```

## Available Tags

| Tag | Description |
|-----|-------------|
| `updates` | Package updates (security only by default) |
| `cleanup` | Disk cleanup and log rotation |
| `optimize` | System optimization |
| `verify` | Post-maintenance verification |
| `reboot` | System reboot (runs only when requested explicitly with `--tags reboot`) |

## Extra Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `maintenance_security_only` | `true` | Install security updates only |
| `maintenance_autoremove` | `true` | Remove unused packages |
| `maintenance_serial` | `100%` | Hosts processed per batch (number or percentage); controls parallelism |

## Maintenance Tasks

### Package Updates

- ✅ Security updates (Debian/Ubuntu)
- ✅ Security updates (RHEL family)
- ✅ Auto-remove unused packages
- ✅ Clean package cache

### Cleanup Tasks

- ✅ Force log rotation
- ✅ Find old log files (30+ days)
- ✅ Clean /tmp directory (10+ days)
- ✅ Clean /var/tmp (30+ days)
- ✅ Vacuum systemd journal (30 days)
- ✅ Docker cleanup (if installed)
- ✅ Podman cleanup (if installed)

### Optimization

- ✅ Update locate database
- ✅ Sync filesystem caches

### Verification

- ✅ Check disk usage
- ✅ Check memory usage
- ✅ Verify critical services
- ✅ Check whether a reboot is required

## Reboot Management

### Check Reboot Status

```bash
# Run maintenance and check reboot status
ansible-playbook playbooks/maintenance.yml
# Look for "Reboot required: true" in the output
```

### Perform Reboot

```bash
# WARNING: This reboots the targeted hosts!
# Set maintenance_serial=1 (see below) to reboot one host at a time.
ansible-playbook playbooks/maintenance.yml --tags reboot

# Reboot a specific environment
ansible-playbook -i inventories/staging playbooks/maintenance.yml --tags reboot

# Control reboot parallelism (one host at a time)
ansible-playbook playbooks/maintenance.yml --tags reboot \
  --extra-vars "maintenance_serial=1"
```

## Serial Execution

Control how many hosts are updated simultaneously:

```bash
# Update all hosts in parallel (default)
ansible-playbook playbooks/maintenance.yml

# Update one host at a time
ansible-playbook playbooks/maintenance.yml \
  --extra-vars "maintenance_serial=1"

# Update 25% of hosts at a time
ansible-playbook playbooks/maintenance.yml \
  --extra-vars "maintenance_serial=25%"
```

## Output and Logs

Logs are saved to: `./logs/maintenance//_maintenance.log`

## Example Output

```
=========================================
Maintenance Summary
=========================================
Host: webserver01
Environment: production
Completed: 2025-01-11T10:30:00Z

=== Updates ===
Packages updated: true

=== Cleanup ===
Old logs found: 42
Journal cleaned: Yes

=== System State ===
Disk usage after: /dev/sda1 50G 25G 25G 50% /

=== Reboot Status ===
Reboot required: false
=========================================
```

## Troubleshooting

### Package updates fail

Check that the package repositories are reachable:

```bash
# Debian/Ubuntu (apt update needs root, hence --become)
ansible all -m shell -a "apt update" --become

# RHEL/CentOS
# Note: dnf check-update exits with code 100 when updates are available,
# which Ansible reports as a failed task; only exit code 1 indicates an error.
ansible all -m shell -a "dnf check-update"
```

### Disk space warnings

Run the cleanup tasks on their own to free space before a full maintenance run:

```bash
ansible-playbook playbooks/maintenance.yml --tags cleanup
```

### Service not running after update

Check the service status:

```bash
ansible all -m shell -a "systemctl status "
```

## Scheduling Maintenance

### Cron Example

```bash
# Daily security updates at 2 AM
0 2 * * * cd /opt/ansible && ansible-playbook playbooks/maintenance.yml
```

### systemd Timer Example

A timer activates the service unit with the same name, so both files are needed:

```ini
# /etc/systemd/system/ansible-maintenance.service
[Unit]
Description=Ansible Maintenance

[Service]
Type=oneshot
WorkingDirectory=/opt/ansible
# Adjust the path if ansible-playbook is installed elsewhere
ExecStart=/usr/bin/ansible-playbook playbooks/maintenance.yml

# /etc/systemd/system/ansible-maintenance.timer
[Unit]
Description=Ansible Maintenance

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
```

Enable the timer with `systemctl daemon-reload && systemctl enable --now ansible-maintenance.timer`.

## Best Practices

1. **Test in staging first** - Always run in staging before production
2. **Monitor during updates** - Watch for failures
3. **Check reboot requirements** - Plan reboots during maintenance windows
4. **Review logs** - Check maintenance logs for issues
5. **Use serial execution for production** - Update hosts gradually
6. **Schedule appropriately** - Run during low-traffic periods

## Quick Reference Commands

```bash
# Dry run (no changes)
ansible-playbook playbooks/maintenance.yml --check

# Staging environment
ansible-playbook -i inventories/staging playbooks/maintenance.yml

# Production (one host at a time)
ansible-playbook -i inventories/production playbooks/maintenance.yml \
  --extra-vars "maintenance_serial=1"

# Updates only, no cleanup
ansible-playbook playbooks/maintenance.yml --tags updates

# Full upgrade (includes non-security updates)
ansible-playbook playbooks/maintenance.yml \
  --extra-vars "maintenance_security_only=false"

# Cleanup only
ansible-playbook playbooks/maintenance.yml --tags cleanup

# Check whether a reboot is needed
ansible-playbook playbooks/maintenance.yml --tags verify

# Reboot if needed
ansible-playbook playbooks/maintenance.yml --tags reboot
```

## See Also

- [Maintenance Playbook](../../playbooks/maintenance.yml)
- [Backup Playbook](../../playbooks/backup.yml)
- [CLAUDE.md Guidelines](../../CLAUDE.md)
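
## Manual Reboot Check

The verification tasks include a reboot-required check, but this cheatsheet does not say how the playbook performs it. As a hedged sketch, the shell function below applies the common distribution conventions on a single host: on Debian/Ubuntu the file `/var/run/reboot-required` exists after updates that need a reboot, and on the RHEL family `needs-restarting -r` (from `yum-utils`/`dnf-utils`) exits 0 when no reboot is needed and 1 when one is. The function name `reboot_required` and its optional path argument are hypothetical and not part of the playbook.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "true" if this host appears to need a reboot,
# "false" otherwise. Assumes the usual distro conventions; the playbook's
# own verify task may use a different check.
reboot_required() {
  # Debian/Ubuntu flag file; the path can be overridden (e.g. for testing).
  local flag_file="${1:-/var/run/reboot-required}"

  if [ -e "$flag_file" ]; then
    echo "true"
    return
  fi

  # RHEL family: `needs-restarting -r` exits 0 when no reboot is needed.
  # Any non-zero exit (1 = reboot needed, or an error) is reported as "true".
  if command -v needs-restarting >/dev/null 2>&1; then
    if needs-restarting -r >/dev/null 2>&1; then
      echo "false"
    else
      echo "true"
    fi
    return
  fi

  # No known signal available on this host: assume no reboot is needed.
  echo "false"
}

reboot_required "$@"
```

Saved as `reboot-check.sh` (a hypothetical filename), it could be run across an inventory with Ansible's `script` module, for example `ansible all -m script -a ./reboot-check.sh`. Treating errors from `needs-restarting` as "reboot required" errs on the side of flagging a host for review rather than silently skipping it.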