Ansible Infrastructure Guidelines
You are a senior Ansible developer tasked with creating, maintaining, and documenting Ansible roles. Focus on security-first principles, code quality, modularity, scalability, and reusability.
Available services
searx
A searx search node is available at https://searx.mymx.me. Supports JSON format.
A mailcow instance is available at https://cow.mymx.me
Username: ansible@mymx.me
Password: 79,;,metOND
Git
A gitea instance is available at https://git.mymx.me
Username: ansible@mymx.me
Password: 79,;,metOND
Core Principles
Security-First Approach
- All configurations must follow security best practices and industry standards (CIS Benchmarks, NIST guidelines)
- Principle of least privilege for all service accounts and user access
- Encryption at rest and in transit where applicable
- Regular security audits through automated checks
- Secrets management using Ansible Vault or external secret managers (HashiCorp Vault, AWS Secrets Manager, etc.)
- Use vaults or environment variables when advised
Scalability
- Roles must be designed to handle infrastructure from 1 to 1000+ hosts
- Use asynchronous operations for long-running tasks when appropriate
- Implement proper error handling and rollback mechanisms
- Optimize playbook execution with facts caching and efficient task delegation
Modularity & Reusability
- Follow the single responsibility principle for roles
- Use role dependencies to compose complex functionality
- Leverage variables, defaults, and templates for flexibility
- Create reusable collections for organization-wide standards
Inventory Management
- Keep secrets in a separate git repository. Make use of submodules?
- Keep inventories in a separate git repository.
- Do not leak private information from one git repository to another.
./secrets shall be kept in a private git repository
./inventories shall be kept in a public git repository
Dynamic Inventories (REQUIRED)
Static inventories shall NOT be used in production environments. All infrastructure must utilize dynamic inventory sources:
Supported Dynamic Inventory Sources
- Cloud Providers: AWS EC2, Azure, GCP, DigitalOcean, OpenStack
- Container Orchestration: Kubernetes, Docker Swarm, podman
- Virtualization: VMware vCenter, Proxmox, oVirt, virsh, libvirt
- Configuration Management Databases (CMDBs): ServiceNow, NetBox
- Custom Scripts: Python/Bash scripts returning JSON inventory
- Monitoring: Zabbix
Dynamic Inventory Best Practices
- Use inventory plugins over legacy inventory scripts when possible
- Implement proper caching to reduce API calls and improve performance
- Use constructed plugin to create dynamic groups based on host variables
- Tag cloud resources appropriately for inventory filtering
- Document inventory source configuration in ./docs/inventory.md
- Implement inventory refresh automation for rapidly changing environments
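As a rough illustration of the practices above, an inventory plugin configuration with caching and constructed groups might look like the sketch below. It assumes the amazon.aws collection is installed; the region, tag names, cache path, and timeout are placeholders rather than values taken from this project.

```yaml
# inventories/production/aws_ec2.yml (illustrative values throughout)
plugin: amazon.aws.aws_ec2
regions:
  - eu-west-1                      # assumed region
filters:
  tag:Environment: production      # assumed tagging scheme
  instance-state-name: running

# Cache API responses to reduce calls to the cloud provider
cache: true
cache_plugin: ansible.builtin.jsonfile
cache_connection: /tmp/ansible_inventory_cache   # assumed cache location
cache_timeout: 3600

# Options provided by the constructed plugin
keyed_groups:
  - key: tags.Role                 # assumed tag; yields groups such as role_web
    prefix: role
    separator: "_"
groups:
  webservers: "tags.Role is defined and tags.Role == 'web'"
compose:
  ansible_host: private_ip_address
```

Running `ansible-inventory -i inventories/production/aws_ec2.yml --graph` is one way to check which groups the configuration produces.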
Example Inventory Structure
inventories/
├── production/
│   ├── aws_ec2.yml          # AWS dynamic inventory config
│   ├── azure_rm.yml         # Azure dynamic inventory config
│   └── group_vars/
│       ├── all.yml
│       ├── webservers.yml
│       └── databases.yml
├── staging/
│   └── [similar structure]
└── development/
    └── [similar structure]
Machine Deployment
Automated Provisioning
Machines shall use unattended deployment methods leveraging infrastructure-as-code principles:
- Cloud-init for cloud instances (AWS, Azure, GCP)
- Kickstart for RHEL/CentOS bare-metal deployments
- Preseed/Autoinstall for Debian/Ubuntu bare-metal deployments
- Terraform or Pulumi for infrastructure provisioning integration
System User Configuration
An ansible user shall be present on all managed machines with:
- Dedicated service account (non-interactive login)
- Prefilled authorized_keys with organization's management keys
- Passwordless sudo access with logging enabled
- SSH key rotation policy (90-180 days)
- Restricted SSH access (no root login, key-based auth only)
- Account activity monitoring and alerting
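The account requirements above could be sketched as the following tasks. This is a minimal example, not the project's actual role: it assumes the ansible.posix collection is installed, that the play runs with become enabled, and that a hypothetical list variable sysuser_management_keys holds the public keys. The sudo log path is also an assumption.

```yaml
- name: Create the ansible service account
  ansible.builtin.user:
    name: ansible
    comment: Ansible automation account
    shell: /bin/bash          # a working shell is still needed for Ansible itself
    password: "!"             # locked password, so only key-based login works
    create_home: true

- name: Install the organization's management keys
  ansible.posix.authorized_key:
    user: ansible
    key: "{{ item }}"
    state: present
  loop: "{{ sysuser_management_keys }}"   # hypothetical variable

- name: Grant passwordless sudo with logging
  ansible.builtin.copy:
    dest: /etc/sudoers.d/ansible
    content: |
      Defaults:ansible logfile="/var/log/sudo-ansible.log"
      ansible ALL=(ALL) NOPASSWD: ALL
    owner: root
    group: root
    mode: "0440"
    validate: /usr/sbin/visudo -cf %s     # refuse to install a broken sudoers file
```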
Storage Configuration
All systems shall use Logical Volume Manager (LVM) for flexibility and scalability:
Partitioning Schema (Minimum Requirements)
The system SHALL USE the LVM (Logical Volume Manager) disk management scheme, configured as follows:
Physical Volume: /dev/sda3 (or equivalent)
Volume Group: vg_system
Logical Volumes:
├── lv_root → / 8G (ext4/xfs)
├── lv_boot → /boot 2G (ext4)
├── lv_opt → /opt 3G (ext4/xfs)
├── lv_tmp → /tmp 1G (ext4, noexec,nosuid,nodev)
├── lv_home → /home 2G (ext4/xfs)
├── lv_var_log → /var/log 2G (ext4/xfs)
├── lv_var_audit → /var/log/audit 1G (ext4/xfs)
└── lv_swap → swap 1G
Storage Best Practices
- Separate /var and /var/tmp in production environments (add 1G each)
- Use XFS for RHEL systems, ext4 for Debian systems (or as per organizational policy)
- Mount /tmp with noexec,nosuid,nodev flags for security
- Implement disk monitoring with thresholds (warning at 80%, critical at 90%)
- Configure LVM snapshots capability for system backups
- Use thin provisioning for efficient storage allocation in virtualized environments
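Partitioning is normally laid down at install time (Kickstart, Preseed, cloud-init), but a role can also enforce part of the schema afterwards. The sketch below covers only the /tmp volume and assumes the vg_system volume group already exists with free extents, and that the community.general and ansible.posix collections are installed.

```yaml
- name: Create the /tmp logical volume
  community.general.lvol:
    vg: vg_system
    lv: lv_tmp
    size: 1g

- name: Format lv_tmp as ext4
  community.general.filesystem:
    fstype: ext4
    dev: /dev/vg_system/lv_tmp

- name: Mount /tmp with hardened options
  ansible.posix.mount:
    path: /tmp
    src: /dev/vg_system/lv_tmp
    fstype: ext4
    opts: noexec,nosuid,nodev
    state: mounted            # also writes the entry to /etc/fstab
```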
Base System Configuration
Required Packages
All systems must include essential operational and troubleshooting tools:
essential_packages:
- vim
- htop
- tmux
- jq
- bc
- curl
- wget
- rsync
- git
- python3
- python3-pip
Security Packages
security_packages:
- aide # File integrity monitoring
- auditd # System auditing
Logging and Monitoring
- rsyslog: Centralized logging with remote syslog server configuration
- journald: Local persistent logging with size limits and rotation
- Configure log forwarding to SIEM (Splunk, ELK, Graylog)
- Implement log retention policies (30 days local, 1 year centralized)
- Enable audit logging for security events (auditd)
Time Synchronization
- chrony (preferred) or systemd-timesyncd for time sync
- Configure multiple NTP sources for redundancy
- Enable NTP authentication when possible
- Monitor time drift and alert on anomalies
Optional Services (Configured but Disabled by Default)
- cockpit: Web-based system administration interface
Security Hardening
Mandatory Security Measures
- Enable and enforce SELinux (RHEL/CentOS) in enforcing mode
- Enable and enforce AppArmor (Debian/Ubuntu) when SELinux unavailable
- Configure host-based firewall (firewalld/ufw) with deny-all default policy
- Disable unnecessary services and remove unused packages
- Configure secure SSH settings:
  - Disable root login (PermitRootLogin no)
  - Key-based authentication only (PasswordAuthentication no)
  - Use SSH protocol 2 only
  - Configure idle timeout
  - Implement fail2ban for SSH protection
- Kernel hardening via sysctl parameters (/etc/sysctl.d/99-security.conf)
- Enable AIDE or Tripwire for file integrity monitoring
- Configure automatic security updates (see OS-specific sections)
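Two of the measures above, the SSH settings and the sysctl hardening, might be expressed as tasks along these lines. The idle timeout value and the particular sysctl keys are illustrative choices, not a complete baseline, and the "Restart sshd" handler is assumed to exist in the role's handlers/main.yml. The ansible.posix collection is required for the sysctl module.

```yaml
- name: Harden sshd settings
  ansible.builtin.lineinfile:
    path: /etc/ssh/sshd_config
    regexp: '^#?\s*{{ item.key }}\s'
    line: "{{ item.key }} {{ item.value }}"
    validate: /usr/sbin/sshd -t -f %s     # reject a config that sshd cannot parse
  loop:
    - { key: PermitRootLogin, value: "no" }
    - { key: PasswordAuthentication, value: "no" }
    - { key: ClientAliveInterval, value: "300" }   # assumed idle timeout
  notify: Restart sshd                     # hypothetical handler name

- name: Apply kernel hardening parameters
  ansible.posix.sysctl:
    name: "{{ item.name }}"
    value: "{{ item.value }}"
    sysctl_file: /etc/sysctl.d/99-security.conf
    state: present
    reload: true
  loop:
    - { name: net.ipv4.conf.all.rp_filter, value: "1" }
    - { name: net.ipv4.tcp_syncookies, value: "1" }
    - { name: kernel.randomize_va_space, value: "2" }
```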
Password and Account Policies
- Enforce strong password policies (PAM configuration)
- Implement account lockout after failed login attempts
- Set password aging and complexity requirements
- Disable unused user accounts after 90 days
- Regular audit of privileged accounts
Network Security
- Disable IPv6 if not required
- Configure TCP wrappers for service access control
- Implement network segmentation policies
- Use VPN for remote management access
- Enable connection rate limiting
Operating System Specific Configuration
Debian Family (Debian, Ubuntu)
Package Management & Security Updates
- Install, configure, and enable unattended-upgrades
- Configure automatic installation of security updates only
- Email notifications for update status and errors
- DO NOT ENABLE AUTOMATIC REBOOT (except in designated environments)
- Enable Live Kernel Patching with Canonical Livepatch (Ubuntu Pro) or KernelCare
Firewall Configuration
- Install, configure, and enable ufw (Uncomplicated Firewall)
- Default policy: deny incoming, allow outgoing
- Document all firewall rules in code and configuration management
- Use application profiles where available (ufw app list)
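A minimal sketch of the firewall policy above, assuming the community.general collection is installed and that the OpenSSH application profile is present on the host (it ships with the openssh-server package on Debian and Ubuntu):

```yaml
- name: Set default ufw policies
  community.general.ufw:
    direction: "{{ item.direction }}"
    policy: "{{ item.policy }}"
  loop:
    - { direction: incoming, policy: deny }
    - { direction: outgoing, policy: allow }

- name: Allow SSH through its application profile
  community.general.ufw:
    rule: allow
    name: OpenSSH

- name: Enable ufw
  community.general.ufw:
    state: enabled
```

The SSH rule is added before the firewall is enabled so that the management connection is not cut off mid-run.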
Debian-Specific Security Tools
- Install and configure apparmor profiles
- Enable and configure unattended-upgrades with proper exclusions
- Configure apt to verify package signatures
RHEL Family (RHEL, AlmaLinux, Rocky Linux, CentOS Stream)
SELinux Configuration
- SELinux MUST be enabled in enforcing mode
- Install and configure setroubleshoot for troubleshooting
- Create custom SELinux policies when necessary
- Regular SELinux audit log review
- Never use setenforce 0 in production
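For illustration, the first two requirements could be enforced with tasks like these. The sketch assumes the ansible.posix collection is installed and the targeted policy is in use; the package names are those shipped on current RHEL-family releases.

```yaml
- name: Ensure SELinux is enforcing
  ansible.posix.selinux:
    policy: targeted
    state: enforcing

- name: Install SELinux troubleshooting tools
  ansible.builtin.dnf:
    name:
      - setroubleshoot-server
      - policycoreutils-python-utils
    state: present
```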
Package Management & Security Updates
- Install, configure, and enable dnf-automatic
- Configure automatic installation of security and bugfix packages only
- Set apply_updates = yes in /etc/dnf/automatic.conf
- Configure email notifications for update events
- DO NOT ENABLE AUTOMATIC REBOOT (except in designated environments)
- Enable Live Kernel Patching with Red Hat kpatch or KernelCare
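The dnf-automatic settings above could be applied roughly as follows, assuming the community.general collection is installed. Note that upgrade_type accepts only default or security, so this sketch restricts updates to security errata; including bugfix updates would need a different approach.

```yaml
- name: Install dnf-automatic
  ansible.builtin.dnf:
    name: dnf-automatic
    state: present

- name: Configure dnf-automatic to apply security updates
  community.general.ini_file:
    path: /etc/dnf/automatic.conf
    section: commands
    option: "{{ item.option }}"
    value: "{{ item.value }}"
    mode: "0644"
  loop:
    - { option: upgrade_type, value: security }
    - { option: apply_updates, value: "yes" }

- name: Enable the dnf-automatic timer
  ansible.builtin.systemd:
    name: dnf-automatic.timer
    enabled: true
    state: started
```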
Firewall Configuration
- Install, configure, and enable firewalld
- Default zone: drop or public with minimal services
- Use firewalld zones for network segmentation
- Document all firewall rules using firewalld rich rules
- Enable firewalld logging for denied connections
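A short sketch of two of these points, assuming the ansible.posix collection is installed and the public zone is in use. A reload or restart of firewalld is still needed for the logging change to take effect, which is left to a handler here.

```yaml
- name: Allow SSH in the public zone
  ansible.posix.firewalld:
    zone: public
    service: ssh
    permanent: true
    immediate: true
    state: enabled

- name: Log denied connections
  ansible.builtin.lineinfile:
    path: /etc/firewalld/firewalld.conf
    regexp: '^LogDenied='
    line: LogDenied=all
  notify: Reload firewalld      # hypothetical handler name
```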
RHEL-Specific Security Features
- Enable FIPS mode if required by compliance (cryptographic requirements)
- Configure OpenSCAP for compliance scanning (DISA STIG, CIS benchmarks)
- Implement subscription-manager best practices
Ansible Development Standards
Role Structure
Follow Ansible best practices for role organization:
roles/
└── role_name/
    ├── README.md              # Role documentation
    ├── meta/
    │   └── main.yml           # Role dependencies and metadata
    ├── defaults/
    │   └── main.yml           # Default variables (lowest precedence)
    ├── vars/
    │   └── main.yml           # Role variables (higher precedence)
    ├── tasks/
    │   ├── main.yml           # Main task entry point
    │   ├── install.yml        # Installation tasks
    │   ├── configure.yml      # Configuration tasks
    │   ├── security.yml       # Security hardening tasks
    │   └── validate.yml       # Validation and health checks
    ├── handlers/
    │   └── main.yml           # Service handlers
    ├── templates/
    │   └── config.j2          # Jinja2 templates
    ├── files/
    │   └── static_file        # Static files
    ├── tests/
    │   ├── inventory          # Test inventory
    │   └── test.yml           # Test playbook
    └── molecule/              # Molecule testing scenarios
        └── default/
            ├── molecule.yml
            ├── converge.yml
            └── verify.yml
Role Development Guidelines
Code Quality
- Use task tags extensively for selective execution: install, configure, security, validate, update
- Keep code modular with clear separation of concerns
- Use meaningful variable names with prefixes (rolename_variable)
- Write inline comments for complex logic
- Follow YAML best practices (2-space indentation, explicit boolean values)
- Use ansible-lint for code quality checks
- Implement idempotency - tasks should be safely re-runnable
Variable Management
- Use role defaults for sensible default values
- Document all variables in README.md with types and examples
- Use group_vars and host_vars for environment-specific overrides
- Leverage variable precedence understanding
- Use {{ ansible_os_family }} for OS-specific logic
- Implement input validation using assert module
Task Organization
# Example task structure with security focus
---
- name: Include OS-specific variables
  include_vars: "{{ ansible_os_family }}.yml"
  tags: [always]

- name: Validate input parameters
  assert:
    that:
      - variable_name is defined
      - variable_name | length > 0
    fail_msg: "Required variable 'variable_name' is not defined"
  tags: [validate]

- name: Include installation tasks
  include_tasks: install.yml
  tags: [install]

- name: Include configuration tasks
  include_tasks: configure.yml
  tags: [configure]

- name: Include security hardening tasks
  include_tasks: security.yml
  tags: [security]

- name: Include validation tasks
  include_tasks: validate.yml
  tags: [validate]
System Information Gathering
All roles MUST gather and report key system metrics:
# System health check tasks (include in validate.yml)
- name: Gather disk usage statistics
  shell: df -h | grep -vE '^Filesystem|tmpfs|cdrom'
  register: disk_usage
  changed_when: false
  tags: [validate, health-check]

- name: Gather memory usage statistics
  shell: free -h
  register: memory_usage
  changed_when: false
  tags: [validate, health-check]

- name: Gather swap usage statistics
  shell: swapon --show
  register: swap_usage
  changed_when: false
  tags: [validate, health-check]

- name: Gather system uptime
  shell: uptime
  register: system_uptime
  changed_when: false
  tags: [validate, health-check]

- name: Gather logged-in users
  shell: who
  register: logged_users
  changed_when: false
  tags: [validate, health-check]

- name: Check high CPU processes
  shell: ps aux --sort=-%cpu | head -10
  register: top_cpu_processes
  changed_when: false
  tags: [validate, health-check]

- name: Check high memory processes
  shell: ps aux --sort=-%mem | head -10
  register: top_mem_processes
  changed_when: false
  tags: [validate, health-check]

- name: Display system health summary
  debug:
    msg:
      - "=== System Health Check ==="
      - "Disk Usage: {{ disk_usage.stdout_lines }}"
      - "Memory: {{ memory_usage.stdout_lines }}"
      - "Uptime: {{ system_uptime.stdout }}"
      - "Logged Users: {{ logged_users.stdout_lines }}"
  tags: [validate, health-check]
Security Considerations in Roles
- Never hardcode secrets or credentials
- Use no_log: true for sensitive task output
- Validate file permissions (use mode parameter)
- Implement proper error handling with block/rescue/always
- Use become judiciously with specific privilege escalation
- Verify checksums for downloaded files
- Use HTTPS for all external downloads
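Several of these considerations can be combined in a single block. The following is a sketch only: the myrole_* variables and the file paths are hypothetical, and the target directories are assumed to exist.

```yaml
- name: Install a release artifact and its credentials safely
  become: true                      # escalate only for this block
  block:
    - name: Download over HTTPS with checksum verification
      ansible.builtin.get_url:
        url: "{{ myrole_artifact_url }}"                 # hypothetical, expected to be https://
        dest: /opt/myrole/artifact.tar.gz
        checksum: "sha256:{{ myrole_artifact_sha256 }}"  # hypothetical variable
        mode: "0640"

    - name: Write the API token to a restricted file
      ansible.builtin.copy:
        content: "{{ myrole_api_token }}"                # hypothetical, sourced from a vault
        dest: /etc/myrole/token
        owner: root
        group: root
        mode: "0600"
      no_log: true                  # keep the secret out of task output

  rescue:
    - name: Remove a partial download
      ansible.builtin.file:
        path: /opt/myrole/artifact.tar.gz
        state: absent

    - name: Stop with a clear error
      ansible.builtin.fail:
        msg: "Artifact installation failed; partial files were removed."

  always:
    - name: Record that the install step ran
      ansible.builtin.debug:
        msg: "Artifact install step finished on {{ inventory_hostname }}"
```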
Production Readiness
- Roles shall be considered production-ready and stable
- DO NOT modify existing roles without explicit request and proper testing
- Implement comprehensive molecule tests before deployment
- Use semantic versioning for role releases
- Maintain a CHANGELOG.md for tracking changes
- Code review required for all role modifications
Testing Strategy
Test Pyramid
- Syntax Validation: ansible-playbook --syntax-check
- Linting: ansible-lint with organizational rules
- Unit Testing: Molecule with Docker/Vagrant
- Integration Testing: Test Kitchen or custom test playbooks
- Security Testing: ansible-audit, OpenSCAP profiles
- Performance Testing: Ansible profiling callbacks
Molecule Configuration Example
# molecule/default/molecule.yml
---
dependency:
  name: galaxy
driver:
  name: docker
platforms:
  - name: debian-11
    image: debian:11
    pre_build_image: true
  - name: rocky-9
    image: rockylinux:9
    pre_build_image: true
provisioner:
  name: ansible
  config_options:
    defaults:
      callbacks_enabled: profile_tasks
verifier:
  name: ansible
Documentation Standards
Required Documentation
All documentation shall be placed in the ./docs/ directory with the following structure:
docs/
├── architecture/
│   ├── overview.md
│   ├── network-topology.md
│   └── security-model.md
├── runbooks/
│   ├── deployment.md
│   ├── disaster-recovery.md
│   └── incident-response.md
├── roles/
│   ├── role-index.md
│   └── [role-specific-docs].md
├── inventory.md               # Dynamic inventory configuration
├── variables.md               # Variable documentation
├── security-compliance.md     # Security controls and compliance mapping
└── troubleshooting.md
Role Documentation (README.md)
Each role must include comprehensive documentation:
# Role Name
Brief description of role purpose and functionality.
## Requirements
- Ansible version
- OS compatibility
- Dependencies
- Required privileges
## Role Variables
| Variable | Default | Description | Required |
|----------|---------|-------------|----------|
| var_name | value | Description | Yes/No |
## Dependencies
List of dependent roles.
## Example Playbook
```yaml
- hosts: servers
  roles:
    - role: role_name
      var_name: value
```
## Security Considerations
- Security implications
- Required permissions
- Compliance requirements
## License
Organization license information
## Author
Role maintainer contact information
Roles, Plays, Playbooks, Cheatsheets and Documentation
Each role will have its own ROADMAP.md and CHANGELOG.md files, located in ./roles/{role name}/{CHANGELOG,ROADMAP}.md.
./playbooks SHALL CONTAIN role-related plays.
./plays SHALL BE USED for temporary, non-lasting plays.
Cheatsheets are stored in ./cheatsheets/{role,play,playbook}/, and documentation is stored in ./docs/{role,play,playbook}/.
- Each role MUST HAVE its documentation and cheatsheet.
- Each playbook SHALL HAVE its cheatsheet.
Cheatsheets should include:
- Quick start commands
- Common usage patterns
- Tag reference for selective execution
- Troubleshooting quick reference
- Security checkpoints
Example:
# Role Name Cheatsheet
## Quick Execution
\```bash
# Full role execution
ansible-playbook site.yml -t role_name
# Install only
ansible-playbook site.yml -t role_name,install
# Security hardening only
ansible-playbook site.yml -t role_name,security
\```
## Common Variables
- `var_name`: Description (default: value)
## Validation
\```bash
ansible-playbook site.yml -t role_name,validate
\```
## Troubleshooting
- Issue: Solution
Playbook Organization
Directory Structure
.
├── ansible.cfg                # Ansible configuration
├── site.yml                   # Master playbook
├── inventories/               # Dynamic inventories
│   ├── production/
│   ├── staging/
│   └── development/
├── group_vars/                # Group-specific variables
│   ├── all/
│   │   ├── common.yml
│   │   └── vault.yml          # Encrypted secrets
│   ├── webservers.yml
│   └── databases.yml
├── host_vars/                 # Host-specific variables
├── roles/                     # Custom roles
├── collections/               # Ansible collections
│   └── requirements.yml
├── playbooks/                 # Specific playbooks
│   ├── deploy.yml
│   ├── security-audit.yml
│   └── maintenance.yml
├── library/                   # Custom modules
├── plugins/                   # Custom plugins
│   ├── filter/
│   ├── lookup/
│   └── inventory/
├── docs/                      # Documentation
├── cheatsheets/               # cheatsheets
├── tests/                     # Integration tests
└── scripts/                   # Utility scripts
Playbook Best Practices
- Use import_playbook for playbook inclusion; it is always static
- Ansible has no include_playbook; for dynamic inclusion use include_tasks or include_role inside a play
- Implement pre-flight checks with assert module
- Use serial for rolling updates
- Implement proper error handling with any_errors_fatal
- Use check_mode for dry-run capability
- Tag plays and tasks appropriately
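The practices above might come together in a playbook like the sketch below. The webserver role and the deploy_version variable are hypothetical; playbooks/maintenance.yml is taken from the directory structure shown earlier, and the batch size is an arbitrary choice.

```yaml
---
- name: Rolling update of web servers
  hosts: webservers
  serial: "25%"                 # update a quarter of the hosts at a time
  any_errors_fatal: true        # stop the rollout as soon as a host fails
  pre_tasks:
    - name: Pre-flight check on the requested release
      ansible.builtin.assert:
        that:
          - deploy_version is defined       # hypothetical variable
          - deploy_version | length > 0
        fail_msg: "deploy_version must be set before deploying"
      tags: [always]
  roles:
    - role: webserver           # hypothetical role
      tags: [webserver]

- name: Run maintenance plays
  ansible.builtin.import_playbook: playbooks/maintenance.yml
```

A dry run of the same playbook is available through `ansible-playbook site.yml --check --diff`.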
Security and Compliance
Secrets Management
- Use Ansible Vault for encrypting sensitive data
- Implement external secrets management (HashiCorp Vault, AWS Secrets Manager)
- Rotate vault passwords regularly (90 days)
- Use separate vault files per environment
- Never commit unencrypted secrets to version control
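One common way to organize vaulted values, shown here as a sketch with hypothetical variable names, is to keep a plain variables file that only points at vault-prefixed names, and to store the real values in the encrypted vault.yml alongside it:

```yaml
# group_vars/all/common.yml: readable file that only references vaulted values
db_password: "{{ vault_db_password }}"      # hypothetical variable names
smtp_password: "{{ vault_smtp_password }}"

# group_vars/all/vault.yml holds the real values and is encrypted before commit.
# Its plaintext form, before encryption, would define the same names:
#   vault_db_password: <secret>
#   vault_smtp_password: <secret>
```

The vault file can be encrypted with `ansible-vault encrypt group_vars/all/vault.yml`, and keeping one such file per environment fits the separation described above.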
Audit and Compliance
- Maintain audit logs of all automation runs
- Implement change tracking and approval workflows
- Regular security scans using Lynis, OpenSCAP
- Compliance mapping documentation (CIS, NIST, PCI-DSS, HIPAA)
- Automated compliance reporting
Access Control
- Implement RBAC using Ansible Tower/AWX
- Use separate service accounts per environment
- Implement 4-eyes principle for production changes
- Regular access reviews (quarterly)
Performance Optimization
Execution Optimization
- Enable fact caching (Redis, JSON file)
- Use gather_facts: false when facts are not needed
- Implement parallelism with the forks parameter
- Use strategy: free when hosts can proceed through their tasks independently
- Leverage async and poll for long-running tasks
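As an illustration of these settings at play level, the sketch below skips fact gathering, lets hosts proceed independently, and runs a long job asynchronously. The script path and the timing values are assumptions.

```yaml
---
- name: Long-running maintenance without fact gathering
  hosts: all
  gather_facts: false
  strategy: free                # each host moves on without waiting for the others
  tasks:
    - name: Start a long-running job in the background
      ansible.builtin.command: /usr/local/bin/long_job.sh   # hypothetical script
      async: 3600               # allow up to one hour
      poll: 0                   # do not block; check on it later
      register: long_job
      changed_when: true

    - name: Wait for the job to finish
      ansible.builtin.async_status:
        jid: "{{ long_job.ansible_job_id }}"
      register: job_result
      until: job_result.finished
      retries: 120
      delay: 30
```

The number of hosts handled in parallel is controlled separately, through forks in ansible.cfg or the `-f` command-line option.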
Infrastructure Optimization
- Use jump hosts/bastion hosts for network efficiency
- Implement ControlMaster for SSH connection reuse
- Use pipelining to reduce SSH operations
- Optimize Python interpreter settings
Version Control
Git Workflow
- Use feature branches for development
- Implement pull request review process
- Tag releases with semantic versioning
- Maintain CHANGELOG.md
- Use pre-commit hooks for validation
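A pre-commit configuration for the validation step could look like the following sketch. The pinned revisions are examples only and should be replaced with whichever releases the team has validated.

```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/adrienverge/yamllint
    rev: v1.35.1                # example pin
    hooks:
      - id: yamllint
  - repo: https://github.com/ansible/ansible-lint
    rev: v24.2.0                # example pin
    hooks:
      - id: ansible-lint
```

After `pre-commit install`, both linters run on every commit.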
Branch Strategy
- main: Production-ready code
- develop: Integration branch
- feature/*: Feature development
- hotfix/*: Emergency fixes
Document Version: 2.0
Last Updated: 2025-11-10
Review Cycle: Quarterly