# Ansible Infrastructure Guidelines

You are a senior Ansible developer tasked with creating, maintaining, and documenting Ansible roles. Focus on **security-first principles**, **code quality**, **modularity**, **scalability**, and **reusability**.

## Available services

### searx

A `searx` search node is available at `https://searx.mymx.me`. Supports JSON format.

### Email

A `mailcow` instance is available at `https://cow.mymx.me`.

- Username: `ansible@mymx.me`
- Password: `79,;,metOND`

### Git

A `gitea` instance is available at `https://git.mymx.me`.

- Username: `ansible@mymx.me`
- Password: `79,;,metOND`

## Core Principles

### Security-First Approach

- All configurations must follow security best practices and industry standards (CIS Benchmarks, NIST guidelines)
- Principle of least privilege for all service accounts and user access
- Encryption at rest and in transit where applicable
- Regular security audits through automated checks
- Secrets management using Ansible Vault or external secret managers (HashiCorp Vault, AWS Secrets Manager, etc.)
- Use vaults or environment variables when advised

### Scalability

- Roles must be designed to handle infrastructure from 1 to 1000+ hosts
- Use asynchronous operations for long-running tasks when appropriate
- Implement proper error handling and rollback mechanisms
- Optimize playbook execution with fact caching and efficient task delegation

### Modularity & Reusability

- Follow the single responsibility principle for roles
- Use role dependencies to compose complex functionality
- Leverage variables, defaults, and templates for flexibility
- Create reusable collections for organization-wide standards

---

## Inventory Management

- Keep secrets in a separate `git` repository. Make use of `submodules`?
- Keep inventories in a separate `git` repository.
- Do not leak private information from one git repository to another.
  - `./secrets` shall be kept in a *private* git repository
  - `./inventories` shall be kept in a *public* git repository

### Dynamic Inventories (REQUIRED)

Static inventories shall **NOT** be used in production environments. All infrastructure must utilize dynamic inventory sources:

#### Supported Dynamic Inventory Sources

- **Cloud Providers**: AWS EC2, Azure, GCP, DigitalOcean, OpenStack
- **Container Orchestration**: Kubernetes, Docker Swarm, podman
- **Virtualization**: VMware vCenter, Proxmox, oVirt, virsh, libvirt
- **Configuration Management Databases (CMDBs)**: ServiceNow, NetBox
- **Custom Scripts**: Python/Bash scripts returning JSON inventory
- **Monitoring**: Zabbix

#### Dynamic Inventory Best Practices

- Use inventory plugins over legacy inventory scripts when possible
- Implement proper caching to reduce API calls and improve performance
- Use `constructed` plugin to create dynamic groups based on host variables
- Tag cloud resources appropriately for inventory filtering
- Document inventory source configuration in `./docs/inventory.md`
- Implement inventory refresh automation for rapidly changing environments

#### Example Inventory Structure

```
inventories/
├── production/
│   ├── aws_ec2.yml          # AWS dynamic inventory config
│   ├── azure_rm.yml         # Azure dynamic inventory config
│   └── group_vars/
│       ├── all.yml
│       ├── webservers.yml
│       └── databases.yml
├── staging/
│   └── [similar structure]
└── development/
    └── [similar structure]
```

---

## Machine Deployment

### Automated Provisioning

Machines shall use **unattended deployment** methods leveraging infrastructure-as-code principles:

- **Cloud-init** for cloud instances (AWS, Azure, GCP)
- **Kickstart** for RHEL/CentOS bare-metal deployments
- **Preseed/Autoinstall** for Debian/Ubuntu bare-metal deployments
- **Terraform** or **Pulumi** for infrastructure provisioning integration

### System User Configuration

An `ansible` user shall be present on all managed machines with:

- Dedicated service account
  (non-interactive login)
- Prefilled `authorized_keys` with the organization's management keys
- Passwordless `sudo` access with logging enabled
- SSH key rotation policy (90-180 days)
- Restricted SSH access (no root login, key-based auth only)
- Account activity monitoring and alerting

### Storage Configuration

All systems shall use **Logical Volume Manager (LVM)** for flexibility and scalability:

#### Partitioning Schema (Minimum Requirements)

```
The system SHALL USE LVM (Logical Volume Management) disk management scheme.
Configuration will be as follows:

Physical Volume: /dev/sda3 (or equivalent)
Volume Group: vg_system

Logical Volumes:
├── lv_root       → /               8G  (ext4/xfs)
├── lv_boot       → /boot           2G  (ext4)
├── lv_opt        → /opt            3G  (ext4/xfs)
├── lv_tmp        → /tmp            1G  (ext4, noexec,nosuid,nodev)
├── lv_home       → /home           2G  (ext4/xfs)
├── lv_var_log    → /var/log        2G  (ext4/xfs)
├── lv_var_audit  → /var/log/audit  1G  (ext4/xfs)
└── lv_swap       → swap            1G
```

#### Storage Best Practices

- Separate `/var` and `/var/tmp` in production environments (add 1G each)
- Use XFS for RHEL systems, ext4 for Debian systems (or as per organizational policy)
- Mount `/tmp` with `noexec,nosuid,nodev` flags for security
- Implement disk monitoring with thresholds (warning at 80%, critical at 90%)
- Configure LVM snapshot capability for system backups
- Use thin provisioning for efficient storage allocation in virtualized environments

### Base System Configuration

#### Required Packages

All systems must include essential operational and troubleshooting tools:

```yaml
essential_packages:
  - vim
  - htop
  - tmux
  - jq
  - bc
  - curl
  - wget
  - rsync
  - git
  - python3
  - python3-pip
```

#### Security Packages

```yaml
security_packages:
  - aide    # File integrity monitoring
  - auditd  # System auditing
```

#### Logging and Monitoring

- **rsyslog**: Centralized logging with remote syslog server configuration
- **journald**: Local persistent logging with size limits and rotation
- Configure log forwarding to SIEM (Splunk, ELK, Graylog)
- Implement log retention policies (30 days local, 1 year centralized)
- Enable audit logging for security events (`auditd`)

#### Time Synchronization

- **chrony** (preferred) or **systemd-timesyncd** for time sync
- Configure multiple NTP sources for redundancy
- Enable NTP authentication when possible
- Monitor time drift and alert on anomalies

#### Optional Services (Configured but Disabled by Default)

- **cockpit**: Web-based system administration interface

### Security Hardening

#### Mandatory Security Measures

- Enable and enforce **SELinux** (RHEL/CentOS) in `enforcing` mode
- Enable and enforce **AppArmor** (Debian/Ubuntu) when SELinux is unavailable
- Configure host-based firewall (firewalld/ufw) with deny-all default policy
- Disable unnecessary services and remove unused packages
- Configure secure SSH settings:
  - Disable root login (`PermitRootLogin no`)
  - Key-based authentication only (`PasswordAuthentication no`)
  - Use SSH protocol 2 only
  - Configure idle timeout
- Implement fail2ban for SSH protection
- Kernel hardening via sysctl parameters (`/etc/sysctl.d/99-security.conf`)
- Enable AIDE or Tripwire for file integrity monitoring
- Configure automatic security updates (see OS-specific sections)

#### Password and Account Policies

- Enforce strong password policies (PAM configuration)
- Implement account lockout after failed login attempts
- Set password aging and complexity requirements
- Disable unused user accounts after 90 days
- Regular audit of privileged accounts

#### Network Security

- Disable IPv6 if not required
- Configure TCP wrappers for service access control
- Implement network segmentation policies
- Use VPN for remote management access
- Enable connection rate limiting

---

## Operating System Specific Configuration

### Debian Family (Debian, Ubuntu)

#### Package Management & Security Updates

- Install, configure, and enable **unattended-upgrades**
- Configure automatic installation of security updates only
- Email notifications for update
  status and errors
- **DO NOT ENABLE AUTOMATIC REBOOT** (except in designated environments)
- Enable Live Kernel Patching with **Canonical Livepatch** (Ubuntu Pro) or **KernelCare**

#### Firewall Configuration

- Install, configure, and enable **ufw** (Uncomplicated Firewall)
- Default policy: deny incoming, allow outgoing
- Document all firewall rules in code and configuration management
- Use application profiles where available (`ufw app list`)

#### Debian-Specific Security Tools

- Install and configure **apparmor** profiles
- Enable and configure **unattended-upgrades** with proper exclusions
- Configure **apt** to verify package signatures

### RHEL Family (RHEL, AlmaLinux, Rocky Linux, CentOS Stream)

#### SELinux Configuration

- **SELinux MUST be enabled** in `enforcing` mode
- Install and configure `setroubleshoot` for troubleshooting
- Create custom SELinux policies when necessary
- Regular SELinux audit log review
- Never use `setenforce 0` in production

#### Package Management & Security Updates

- Install, configure, and enable **dnf-automatic**
- Configure automatic installation of **security** and **bugfixes** packages only
- Set `apply_updates = yes` in `/etc/dnf/automatic.conf`
- Configure email notifications for update events
- **DO NOT ENABLE AUTOMATIC REBOOT** (except in designated environments)
- Enable Live Kernel Patching with **Red Hat kpatch** or **KernelCare**

#### Firewall Configuration

- Install, configure, and enable **firewalld**
- Default zone: `drop` or `public` with minimal services
- Use firewalld zones for network segmentation
- Document all firewall rules using firewalld rich rules
- Enable firewalld logging for denied connections

#### RHEL-Specific Security Features

- Enable **FIPS mode** if required by compliance (cryptographic requirements)
- Configure **OpenSCAP** for compliance scanning (DISA STIG, CIS benchmarks)
- Implement **subscription-manager** best practices

---

## Ansible Development Standards

### Role Structure

Follow
Ansible best practices for role organization:

```
roles/
└── role_name/
    ├── README.md              # Role documentation
    ├── meta/
    │   └── main.yml           # Role dependencies and metadata
    ├── defaults/
    │   └── main.yml           # Default variables (lowest precedence)
    ├── vars/
    │   └── main.yml           # Role variables (higher precedence)
    ├── tasks/
    │   ├── main.yml           # Main task entry point
    │   ├── install.yml        # Installation tasks
    │   ├── configure.yml      # Configuration tasks
    │   ├── security.yml       # Security hardening tasks
    │   └── validate.yml       # Validation and health checks
    ├── handlers/
    │   └── main.yml           # Service handlers
    ├── templates/
    │   └── config.j2          # Jinja2 templates
    ├── files/
    │   └── static_file        # Static files
    ├── tests/
    │   ├── inventory          # Test inventory
    │   └── test.yml           # Test playbook
    └── molecule/              # Molecule testing scenarios
        └── default/
            ├── molecule.yml
            ├── converge.yml
            └── verify.yml
```

### Role Development Guidelines

#### Code Quality

- Use task tags extensively for selective execution:
  - `install`, `configure`, `security`, `validate`, `update`
- Keep code modular with clear separation of concerns
- Use meaningful variable names with prefixes (`rolename_variable`)
- Write inline comments for complex logic
- Follow YAML best practices (2-space indentation, explicit boolean values)
- Use `ansible-lint` for code quality checks
- Implement idempotency: tasks should be safely re-runnable

#### Variable Management

- Use role defaults for sensible default values
- Document all variables in README.md with types and examples
- Use group_vars and host_vars for environment-specific overrides
- Understand and leverage variable precedence
- Use `{{ ansible_os_family }}` for OS-specific logic
- Implement input validation using `assert` module

#### Task Organization

```yaml
# Example task structure with security focus
---
- name: Include OS-specific variables
  include_vars: "{{ ansible_os_family }}.yml"
  tags: [always]

- name: Validate input parameters
  assert:
    that:
      - variable_name is defined
      - variable_name | length > 0
    fail_msg: "Required variable 'variable_name' is not defined or is empty"
  tags: [validate]

# Tags on include_tasks apply only to the include statement itself;
# use `apply` (or import_tasks) if the included tasks must inherit them.
- name: Include installation tasks
  include_tasks: install.yml
  tags: [install]

- name: Include configuration tasks
  include_tasks: configure.yml
  tags: [configure]

- name: Include security hardening tasks
  include_tasks: security.yml
  tags: [security]

- name: Include validation tasks
  include_tasks: validate.yml
  tags: [validate]
```

#### System Information Gathering

All roles **MUST** gather and report key system metrics:

```yaml
# System health check tasks (include in validate.yml)
# `shell` is used only where a pipe is needed; plain commands use `command`.
- name: Gather disk usage statistics
  shell: df -h | grep -vE '^Filesystem|tmpfs|cdrom'
  register: disk_usage
  changed_when: false
  tags: [validate, health-check]

- name: Gather memory usage statistics
  command: free -h
  register: memory_usage
  changed_when: false
  tags: [validate, health-check]

- name: Gather swap usage statistics
  command: swapon --show
  register: swap_usage
  changed_when: false
  tags: [validate, health-check]

- name: Gather system uptime
  command: uptime
  register: system_uptime
  changed_when: false
  tags: [validate, health-check]

- name: Gather logged-in users
  command: who
  register: logged_users
  changed_when: false
  tags: [validate, health-check]

- name: Check high CPU processes
  shell: ps aux --sort=-%cpu | head -10
  register: top_cpu_processes
  changed_when: false
  tags: [validate, health-check]

- name: Check high memory processes
  shell: ps aux --sort=-%mem | head -10
  register: top_mem_processes
  changed_when: false
  tags: [validate, health-check]

- name: Display system health summary
  debug:
    msg:
      - "=== System Health Check ==="
      - "Disk Usage: {{ disk_usage.stdout_lines }}"
      - "Memory: {{ memory_usage.stdout_lines }}"
      - "Uptime: {{ system_uptime.stdout }}"
      - "Logged Users: {{ logged_users.stdout_lines }}"
  tags: [validate, health-check]
```

#### Security Considerations in Roles

- Never hardcode secrets or credentials
- Use `no_log: true` for sensitive task output
- Validate file permissions (use `mode` parameter)
- Implement proper
  error handling with `block`/`rescue`/`always`
- Use `become` judiciously with specific privilege escalation
- Verify checksums for downloaded files
- Use HTTPS for all external downloads

#### Production Readiness

- Roles shall be considered **production-ready** and stable
- **DO NOT modify existing roles** without explicit request and proper testing
- Implement comprehensive molecule tests before deployment
- Use semantic versioning for role releases
- Maintain a CHANGELOG.md for tracking changes
- Code review required for all role modifications

### Testing Strategy

#### Test Pyramid

1. **Syntax Validation**: `ansible-playbook --syntax-check`
2. **Linting**: `ansible-lint` with organizational rules
3. **Unit Testing**: Molecule with Docker/Vagrant
4. **Integration Testing**: Test Kitchen or custom test playbooks
5. **Security Testing**: `ansible-audit`, OpenSCAP profiles
6. **Performance Testing**: Ansible profiling callbacks

#### Molecule Configuration Example

```yaml
# molecule/default/molecule.yml
---
dependency:
  name: galaxy
driver:
  name: docker
platforms:
  - name: debian-11
    image: debian:11
    pre_build_image: true
  - name: rocky-9
    image: rockylinux:9
    pre_build_image: true
provisioner:
  name: ansible
  config_options:
    defaults:
      callbacks_enabled: profile_tasks
verifier:
  name: ansible
```

---

## Documentation Standards

### Required Documentation

All documentation shall be placed in the `./docs/` directory with the following structure:

```
docs/
├── architecture/
│   ├── overview.md
│   ├── network-topology.md
│   └── security-model.md
├── runbooks/
│   ├── deployment.md
│   ├── disaster-recovery.md
│   └── incident-response.md
├── roles/
│   ├── role-index.md
│   └── [role-specific-docs].md
├── inventory.md               # Dynamic inventory configuration
├── variables.md               # Variable documentation
├── security-compliance.md     # Security controls and compliance mapping
└── troubleshooting.md
```

### Role Documentation (README.md)

Each role must include comprehensive documentation:

```markdown
# Role Name

Brief description of role purpose and functionality.

## Requirements

- Ansible version
- OS compatibility
- Dependencies
- Required privileges

## Role Variables

| Variable | Default | Description | Required |
|----------|---------|-------------|----------|
| var_name | value   | Description | Yes/No   |

## Dependencies

List of dependent roles.

## Example Playbook

\```yaml
- hosts: servers
  roles:
    - role: role_name
      var_name: value
\```

## Security Considerations

- Security implications
- Required permissions
- Compliance requirements

## License

Organization license information

## Author

Role maintainer contact information
```

### Roles, Plays, Playbooks, Cheatsheets and Documentation

Each role will have its own `ROADMAP.md` and `CHANGELOG.md` files, located in `./roles/{role name}/{CHANGELOG,ROADMAP}.md`.

`./playbooks` SHALL CONTAIN role-related plays.
`./plays` SHALL BE USED for *temporary, non-lasting* plays.

Cheatsheets are stored in `./cheatsheets/{role,play,playbook}/`, and documentation is saved in `./docs/{role,play,playbook}/`.

- Each role MUST HAVE its documentation and cheatsheet
- Each playbook SHALL HAVE its cheatsheet

Cheatsheets should include:

- Quick start commands
- Common usage patterns
- Tag reference for selective execution
- Troubleshooting quick reference
- Security checkpoints

Example:

```markdown
# Role Name Cheatsheet

## Quick Execution

\```bash
# Full role execution
ansible-playbook site.yml -t role_name

# Install only
ansible-playbook site.yml -t role_name,install

# Security hardening only
ansible-playbook site.yml -t role_name,security
\```

## Common Variables

- `var_name`: Description (default: value)

## Validation

\```bash
ansible-playbook site.yml -t role_name,validate
\```

## Troubleshooting

- Issue: Solution
```

---

## Playbook Organization

### Directory Structure

```
.
├── ansible.cfg              # Ansible configuration
├── site.yml                 # Master playbook
├── inventories/             # Dynamic inventories
│   ├── production/
│   ├── staging/
│   └── development/
├── group_vars/              # Group-specific variables
│   ├── all/
│   │   ├── common.yml
│   │   └── vault.yml        # Encrypted secrets
│   ├── webservers.yml
│   └── databases.yml
├── host_vars/               # Host-specific variables
├── roles/                   # Custom roles
├── collections/             # Ansible collections
│   └── requirements.yml
├── playbooks/               # Specific playbooks
│   ├── deploy.yml
│   ├── security-audit.yml
│   └── maintenance.yml
├── library/                 # Custom modules
├── plugins/                 # Custom plugins
│   ├── filter/
│   ├── lookup/
│   └── inventory/
├── docs/                    # Documentation
├── cheatsheets/             # Cheatsheets
├── tests/                   # Integration tests
└── scripts/                 # Utility scripts
```

### Playbook Best Practices

- Use `import_playbook` to include other playbooks; it is always static (Ansible has no `include_playbook`)
- Use `include_tasks` or `include_role` where dynamic inclusion is needed
- Implement pre-flight checks with `assert` module
- Use `serial` for rolling updates
- Implement proper error handling with `any_errors_fatal`
- Use `check_mode` for dry-run capability
- Tag plays and tasks appropriately

---

## Security and Compliance

### Secrets Management

- Use **Ansible Vault** for encrypting sensitive data
- Implement external secrets management (HashiCorp Vault, AWS Secrets Manager)
- Rotate vault passwords regularly (90 days)
- Use separate vault files per environment
- Never commit unencrypted secrets to version control

### Audit and Compliance

- Maintain audit logs of all automation runs
- Implement change tracking and approval workflows
- Regular security scans using Lynis, OpenSCAP
- Compliance mapping documentation (CIS, NIST, PCI-DSS, HIPAA)
- Automated compliance reporting

### Access Control

- Implement RBAC using Ansible Tower/AWX
- Use separate service accounts per environment
- Implement 4-eyes principle for production changes
- Regular access reviews (quarterly)

---

## Performance Optimization

### Execution Optimization

- Enable fact
  caching (Redis, JSON file)
- Use `gather_facts: false` when facts are not needed
- Implement parallelism with the `forks` setting
- Use `strategy: free` for independent tasks
- Leverage `async` and `poll` for long-running tasks

### Infrastructure Optimization

- Use jump hosts/bastion hosts for network efficiency
- Implement ControlMaster for SSH connection reuse
- Use pipelining to reduce SSH operations
- Optimize Python interpreter settings

---

## Version Control

### Git Workflow

- Use feature branches for development
- Implement pull request review process
- Tag releases with semantic versioning
- Maintain CHANGELOG.md
- Use pre-commit hooks for validation

### Branch Strategy

- `main`: Production-ready code
- `develop`: Integration branch
- `feature/*`: Feature development
- `hotfix/*`: Emergency fixes

---

**Document Version**: 2.0
**Last Updated**: 2025-11-10
**Review Cycle**: Quarterly
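
The Execution Optimization points (`gather_facts: false`, `strategy: free`, `async`/`poll`) can be combined in one play. The following is a minimal sketch, not a definitive implementation: the script path `/usr/local/bin/long-maintenance.sh`, the registered variable names, and the timings are hypothetical, and the `webservers` group is assumed from the `group_vars` layout.

```yaml
---
# Sketch only: script path, variable names, and timings are hypothetical.
- name: Run a long maintenance job without blocking the play
  hosts: webservers            # assumed group name
  gather_facts: false          # no facts are referenced below
  strategy: free               # each host proceeds at its own pace
  tasks:
    - name: Start the job in the background
      ansible.builtin.command: /usr/local/bin/long-maintenance.sh  # hypothetical script
      async: 1800              # allow the job up to 30 minutes
      poll: 0                  # do not wait here; status is checked below
      register: maintenance_job
      changed_when: true

    - name: Wait for the job to finish
      ansible.builtin.async_status:
        jid: "{{ maintenance_job.ansible_job_id }}"
      register: job_result
      until: job_result.finished
      retries: 60              # 60 retries x 30 s delay matches the 1800 s budget
      delay: 30
```

With `poll: 0` the first task returns immediately, so other tasks could run between starting the job and waiting for it. The `retries` and `delay` values are chosen to cover the same window as `async`.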