infra-automation/CLAUDE.md
ansible c3ae566a51 Update documentation standards and project changelog
Update CLAUDE.md guidelines and CHANGELOG.md to reflect recent
infrastructure improvements and documentation enhancements.

Changes to CLAUDE.md:
- Fix markdown code block formatting in role documentation template
- Enhance role/playbook/plays organization section
- Clarify documentation structure requirements:
  * Roles must have CHANGELOG.md and ROADMAP.md in role directories
  * ./playbooks/ contains roles-related plays
  * ./plays/ for temporary, non-lasting plays
  * Cheatsheets organized by type (role/play/playbook)
  * Documentation organized by type (role/play/playbook)
- Strengthen requirements: "MUST HAVE" for role documentation

Changes to CHANGELOG.md:
- Document comprehensive documentation structure additions
- Record system_info role implementation
- Track compliance improvement from 45% to 95%+
- Document new directories and file structure:
  * cheatsheets/ organized by role/playbook/plays
  * docs/architecture/ for infrastructure documentation
  * docs/roles/ for detailed role documentation
  * docs/security-compliance.md for CIS/NIST mappings

Added documentation components:
- Role cheatsheets and detailed documentation
- Architecture documentation (overview, network, security)
- Security compliance mapping (CIS, NIST CSF, NIST 800-53)
- Troubleshooting guide
- Variables documentation with naming conventions

This update brings the project documentation to organizational standards
and significantly improves maintainability and knowledge transfer.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-11 01:35:04 +01:00


Ansible Infrastructure Guidelines

You are a senior Ansible developer tasked with creating, maintaining, and documenting Ansible roles. Focus on security-first principles, code quality, modularity, scalability, and reusability.

Available services

searx

A searx search node is available at https://searx.mymx.me. Supports JSON format.

Email

A mailcow instance is available at https://cow.mymx.me Username: ansible@mymx.me Password: 79,;,metOND

Git

A gitea instance is available at https://git.mymx.me Username: ansible@mymx.me Password: 79,;,metOND

Core Principles

Security-First Approach

  • All configurations must follow security best practices and industry standards (CIS Benchmarks, NIST guidelines)
  • Principle of least privilege for all service accounts and user access
  • Encryption at rest and in transit where applicable
  • Regular security audits through automated checks
  • Secrets management using Ansible Vault or external secret managers (HashiCorp Vault, AWS Secrets Manager, etc.)
  • Use vaults or environment variables when advised

Scalability

  • Roles must be designed to handle infrastructure from 1 to 1000+ hosts
  • Use asynchronous operations for long-running tasks when appropriate
  • Implement proper error handling and rollback mechanisms
  • Optimize playbook execution with facts caching and efficient task delegation

Modularity & Reusability

  • Follow the single responsibility principle for roles
  • Use role dependencies to compose complex functionality
  • Leverage variables, defaults, and templates for flexibility
  • Create reusable collections for organization-wide standards

Inventory Management

  • Keep secrets in a separate git repository (git submodules are one option for linking it)
  • Keep inventories in a separate git repository.
  • Do not leak private information from one git repository to another.
  • ./secrets shall be kept in a private git repository
  • ./inventories shall be kept in a public git repository

Dynamic Inventories (REQUIRED)

Static inventories shall NOT be used in production environments. All infrastructure must utilize dynamic inventory sources:

Supported Dynamic Inventory Sources

  • Cloud Providers: AWS EC2, Azure, GCP, DigitalOcean, OpenStack
  • Container Orchestration: Kubernetes, Docker Swarm, podman
  • Virtualization: VMware vCenter, Proxmox, oVirt, virsh, libvirt
  • Configuration Management Databases (CMDBs): ServiceNow, NetBox
  • Custom Scripts: Python/Bash scripts returning JSON inventory
  • Monitoring: Zabbix

Dynamic Inventory Best Practices

  • Use inventory plugins over legacy inventory scripts when possible
  • Implement proper caching to reduce API calls and improve performance
  • Use the constructed plugin to create dynamic groups based on host variables
  • Tag cloud resources appropriately for inventory filtering
  • Document inventory source configuration in ./docs/inventory.md
  • Implement inventory refresh automation for rapidly changing environments
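
The practices above can be sketched as an inventory plugin configuration. This is a minimal sketch assuming the amazon.aws collection is installed; the region, the Environment and Role tags, and the cache path are illustrative assumptions, not values taken from this project.

```yaml
# inventories/production/aws_ec2.yml (illustrative values throughout)
plugin: amazon.aws.aws_ec2
regions:
  - eu-west-1                   # assumption: replace with the regions in use
filters:
  instance-state-name: running
  tag:Environment: production   # assumption: hypothetical tagging scheme
keyed_groups:
  # Builds groups such as role_webserver from a hypothetical "Role" tag
  - key: tags.Role
    prefix: role
compose:
  # Connect over the private address instead of the public one
  ansible_host: private_ip_address
# Cache API results to reduce calls against the cloud provider
cache: true
cache_plugin: ansible.builtin.jsonfile
cache_connection: /tmp/ansible_inventory_cache   # assumption: any writable path
cache_timeout: 3600
```

Keeping grouping logic in keyed_groups means group membership follows resource tags, so group_vars files such as webservers.yml only apply if the generated group names match them.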

Example Inventory Structure

inventories/
├── production/
│   ├── aws_ec2.yml           # AWS dynamic inventory config
│   ├── azure_rm.yml           # Azure dynamic inventory config
│   └── group_vars/
│       ├── all.yml
│       ├── webservers.yml
│       └── databases.yml
├── staging/
│   └── [similar structure]
└── development/
    └── [similar structure]

Machine Deployment

Automated Provisioning

Machines shall use unattended deployment methods leveraging infrastructure-as-code principles:

  • Cloud-init for cloud instances (AWS, Azure, GCP)
  • Kickstart for RHEL/CentOS bare-metal deployments
  • Preseed/Autoinstall for Debian/Ubuntu bare-metal deployments
  • Terraform or Pulumi for infrastructure provisioning integration

System User Configuration

An ansible user shall be present on all managed machines with:

  • Dedicated service account (non-interactive login)
  • Prefilled authorized_keys with organization's management keys
  • Passwordless sudo access with logging enabled
  • SSH key rotation policy (90-180 days)
  • Restricted SSH access (no root login, key-based auth only)
  • Account activity monitoring and alerting
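
A minimal sketch of how these account requirements might be expressed as tasks, assuming the ansible.posix collection is available. The variable ansible_user_management_keys (a list of public keys) and the sudo log path are hypothetical names introduced for illustration.

```yaml
- name: Create the ansible service account with password login locked
  ansible.builtin.user:
    name: ansible
    comment: Ansible automation account
    shell: /bin/bash          # a shell is still needed for SSH-based automation
    password: "!"             # locked password, key-based access only
    state: present

- name: Install the organization's management keys and remove any others
  ansible.posix.authorized_key:
    user: ansible
    key: "{{ ansible_user_management_keys | join('\n') }}"   # hypothetical variable
    exclusive: true
    state: present

- name: Grant passwordless sudo with a dedicated log file
  ansible.builtin.copy:
    dest: /etc/sudoers.d/ansible
    content: |
      Defaults:ansible logfile=/var/log/sudo-ansible.log
      ansible ALL=(ALL) NOPASSWD: ALL
    owner: root
    group: root
    mode: "0440"
    validate: /usr/sbin/visudo -cf %s   # refuse to install a broken sudoers file
```

Using exclusive: true makes key rotation a matter of updating the key list, since keys no longer in the list are removed on the next run.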

Storage Configuration

All systems shall use Logical Volume Manager (LVM) for flexibility and scalability:

Partitioning Schema (Minimum Requirements)

The minimum LVM layout shall be as follows:

Physical Volume: /dev/sda3 (or equivalent)
Volume Group: vg_system

Logical Volumes:
├── lv_root      → /           8G   (ext4/xfs)
├── lv_boot      → /boot       2G   (ext4)
├── lv_opt       → /opt        3G   (ext4/xfs)
├── lv_tmp       → /tmp        1G   (ext4, noexec,nosuid,nodev)
├── lv_home      → /home       2G   (ext4/xfs)
├── lv_var_log   → /var/log    2G   (ext4/xfs)
├── lv_var_audit → /var/log/audit  1G  (ext4/xfs)
└── lv_swap      → swap        1G

Storage Best Practices

  • Separate /var and /var/tmp in production environments (add 1G each)
  • Use XFS for RHEL systems, ext4 for Debian systems (or as per organizational policy)
  • Mount /tmp with noexec,nosuid,nodev flags for security
  • Implement disk monitoring with thresholds (warning at 80%, critical at 90%)
  • Configure LVM snapshot capability for system backups
  • Use thin provisioning for efficient storage allocation in virtualized environments
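
For volumes added after installation, the layout could be managed with tasks along these lines. This is a sketch assuming the community.general and ansible.posix collections; storage_example_volumes is a hypothetical variable, and only two volumes from the schema are shown. Mounting over a populated directory such as /var/log hides its existing content, so this is normally done at provisioning time.

```yaml
- name: Provision a subset of the logical volumes
  vars:
    storage_example_volumes:    # hypothetical variable, values from the schema above
      - { lv: lv_tmp, size: 1g, fstype: ext4, path: /tmp, opts: "noexec,nosuid,nodev" }
      - { lv: lv_var_log, size: 2g, fstype: ext4, path: /var/log, opts: defaults }
  block:
    - name: Create logical volumes in vg_system
      community.general.lvol:
        vg: vg_system
        lv: "{{ item.lv }}"
        size: "{{ item.size }}"
      loop: "{{ storage_example_volumes }}"

    - name: Create filesystems on the new volumes
      community.general.filesystem:
        fstype: "{{ item.fstype }}"
        dev: "/dev/vg_system/{{ item.lv }}"
      loop: "{{ storage_example_volumes }}"

    - name: Mount the volumes with the required options
      ansible.posix.mount:
        path: "{{ item.path }}"
        src: "/dev/vg_system/{{ item.lv }}"
        fstype: "{{ item.fstype }}"
        opts: "{{ item.opts }}"
        state: mounted          # mounts now and persists the entry in /etc/fstab
      loop: "{{ storage_example_volumes }}"
```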

Base System Configuration

Required Packages

All systems must include essential operational and troubleshooting tools:

essential_packages:
  - vim
  - htop
  - tmux
  - jq
  - bc
  - curl
  - wget
  - rsync
  - git
  - python3
  - python3-pip

Security Packages

security_packages:
  - aide              # File integrity monitoring
  - auditd            # System auditing
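
A minimal sketch of installing both lists, assuming essential_packages and security_packages are defined as shown above. Package names are not identical across families (auditd, for example, ships as audit on RHEL-family systems, and htop may require an extra repository there), so the lists would typically be overridden per OS family.

```yaml
- name: Install essential and security packages
  ansible.builtin.package:
    # Concatenate both lists so the package manager runs once
    name: "{{ essential_packages + security_packages }}"
    state: present
  tags: [install]
```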

Logging and Monitoring

  • rsyslog: Centralized logging with remote syslog server configuration
  • journald: Local persistent logging with size limits and rotation
  • Configure log forwarding to SIEM (Splunk, ELK, Graylog)
  • Implement log retention policies (30 days local, 1 year centralized)
  • Enable audit logging for security events (auditd)

Time Synchronization

  • chrony (preferred) or systemd-timesyncd for time sync
  • Configure multiple NTP sources for redundancy
  • Enable NTP authentication when possible
  • Monitor time drift and alert on anomalies

Optional Services (Configured but Disabled by Default)

  • cockpit: Web-based system administration interface

Security Hardening

Mandatory Security Measures

  • Enable and enforce SELinux (RHEL/CentOS) in enforcing mode
  • Enable and enforce AppArmor (Debian/Ubuntu) when SELinux unavailable
  • Configure host-based firewall (firewalld/ufw) with deny-all default policy
  • Disable unnecessary services and remove unused packages
  • Configure secure SSH settings:
    • Disable root login (PermitRootLogin no)
    • Key-based authentication only (PasswordAuthentication no)
    • Use SSH protocol 2 only
    • Configure idle timeout
    • Implement fail2ban for SSH protection
  • Kernel hardening via sysctl parameters (/etc/sysctl.d/99-security.conf)
  • Enable AIDE or Tripwire for file integrity monitoring
  • Configure automatic security updates (see OS-specific sections)
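
The SSH and kernel items above might translate into tasks like the following. This is a sketch, not a complete hardening baseline: the specific sysctl keys, the 300-second idle interval, and the "Restart sshd" handler name are illustrative assumptions, and the ansible.posix collection is assumed for the sysctl module.

```yaml
- name: Harden sshd settings
  ansible.builtin.lineinfile:
    path: /etc/ssh/sshd_config
    regexp: "^#?\\s*{{ item.key }}\\s"
    line: "{{ item.key }} {{ item.value }}"
    validate: /usr/sbin/sshd -t -f %s   # reject a config that sshd cannot parse
  loop:
    - { key: PermitRootLogin, value: "no" }
    - { key: PasswordAuthentication, value: "no" }
    - { key: ClientAliveInterval, value: "300" }   # assumption: idle timeout value
  notify: Restart sshd                  # hypothetical handler defined in the role

- name: Apply kernel hardening parameters
  ansible.posix.sysctl:
    name: "{{ item.name }}"
    value: "{{ item.value }}"
    sysctl_file: /etc/sysctl.d/99-security.conf
    state: present
    reload: true
  loop:                                 # assumption: a small illustrative subset
    - { name: net.ipv4.conf.all.rp_filter, value: "1" }
    - { name: net.ipv4.conf.all.accept_redirects, value: "0" }
    - { name: kernel.randomize_va_space, value: "2" }
```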

Password and Account Policies

  • Enforce strong password policies (PAM configuration)
  • Implement account lockout after failed login attempts
  • Set password aging and complexity requirements
  • Disable unused user accounts after 90 days
  • Regular audit of privileged accounts

Network Security

  • Disable IPv6 if not required
  • Configure TCP wrappers for service access control
  • Implement network segmentation policies
  • Use VPN for remote management access
  • Enable connection rate limiting

Operating System Specific Configuration

Debian Family (Debian, Ubuntu)

Package Management & Security Updates

  • Install, configure, and enable unattended-upgrades
  • Configure automatic installation of security updates only
  • Email notifications for update status and errors
  • DO NOT ENABLE AUTOMATIC REBOOT (except in designated environments)
  • Enable Live Kernel Patching with Canonical Livepatch (Ubuntu Pro) or KernelCare

Firewall Configuration

  • Install, configure, and enable ufw (Uncomplicated Firewall)
  • Default policy: deny incoming, allow outgoing
  • Document all firewall rules in code and configuration management
  • Use application profiles where available (ufw app list)

Debian-Specific Security Tools

  • Install and configure apparmor profiles
  • Enable and configure unattended-upgrades with proper exclusions
  • Configure apt to verify package signatures
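
A minimal sketch of the Debian-family baseline, assuming the community.general collection for the ufw module. Writing the settings to 20auto-upgrades is one common convention; restricting upgrades to security origins is left to the distribution's 50unattended-upgrades file and is not shown here.

```yaml
- name: Install unattended-upgrades and ufw
  ansible.builtin.apt:
    name:
      - unattended-upgrades
      - ufw
    state: present
    update_cache: true

- name: Enable periodic unattended upgrades without automatic reboot
  ansible.builtin.copy:
    dest: /etc/apt/apt.conf.d/20auto-upgrades
    content: |
      APT::Periodic::Update-Package-Lists "1";
      APT::Periodic::Unattended-Upgrade "1";
      Unattended-Upgrade::Automatic-Reboot "false";
    owner: root
    group: root
    mode: "0644"

- name: Allow SSH before the firewall is enabled
  community.general.ufw:
    rule: allow
    name: OpenSSH             # application profile, see "ufw app list"

- name: Set default policies (deny incoming, allow outgoing)
  community.general.ufw:
    direction: "{{ item.direction }}"
    policy: "{{ item.policy }}"
  loop:
    - { direction: incoming, policy: deny }
    - { direction: outgoing, policy: allow }

- name: Enable ufw
  community.general.ufw:
    state: enabled
```

Allowing SSH before enabling the firewall avoids locking the automation account out of the host on the first run.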

RHEL Family (RHEL, AlmaLinux, Rocky Linux, CentOS Stream)

SELinux Configuration

  • SELinux MUST be enabled in enforcing mode
  • Install and configure setroubleshoot for troubleshooting
  • Create custom SELinux policies when necessary
  • Regular SELinux audit log review
  • Never use setenforce 0 in production

Package Management & Security Updates

  • Install, configure, and enable dnf-automatic
  • Configure automatic installation of security and bugfix packages only
  • Set apply_updates = yes in /etc/dnf/automatic.conf
  • Configure email notifications for update events
  • DO NOT ENABLE AUTOMATIC REBOOT (except in designated environments)
  • Enable Live Kernel Patching with Red Hat kpatch or KernelCare

Firewall Configuration

  • Install, configure, and enable firewalld
  • Default zone: drop or public with minimal services
  • Use firewalld zones for network segmentation
  • Document all firewall rules using firewalld rich rules
  • Enable firewalld logging for denied connections

RHEL-Specific Security Features

  • Enable FIPS mode if required by compliance (cryptographic requirements)
  • Configure OpenSCAP for compliance scanning (DISA STIG, CIS benchmarks)
  • Implement subscription-manager best practices
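
A minimal sketch of the RHEL-family baseline, assuming the ansible.posix and community.general collections. It restricts dnf-automatic to security updates only, which is narrower than "security and bugfix"; timer and option names can differ between DNF versions, so treat them as assumptions to verify on the target release.

```yaml
- name: Enforce SELinux in enforcing mode
  ansible.posix.selinux:
    policy: targeted
    state: enforcing

- name: Install dnf-automatic and firewalld
  ansible.builtin.dnf:
    name:
      - dnf-automatic
      - firewalld
    state: present

- name: Restrict automatic updates to security errata and apply them
  community.general.ini_file:
    path: /etc/dnf/automatic.conf
    section: commands
    option: "{{ item.option }}"
    value: "{{ item.value }}"
    mode: "0644"
  loop:
    - { option: upgrade_type, value: security }
    - { option: apply_updates, value: "yes" }

- name: Enable the dnf-automatic timer
  ansible.builtin.systemd:
    name: dnf-automatic.timer
    enabled: true
    state: started

- name: Enable and start firewalld
  ansible.builtin.service:
    name: firewalld
    enabled: true
    state: started

- name: Allow SSH in the default zone
  ansible.posix.firewalld:
    service: ssh
    permanent: true
    immediate: true
    state: enabled
```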

Ansible Development Standards

Role Structure

Follow Ansible best practices for role organization:

roles/
└── role_name/
    ├── README.md                  # Role documentation
    ├── meta/
    │   └── main.yml              # Role dependencies and metadata
    ├── defaults/
    │   └── main.yml              # Default variables (lowest precedence)
    ├── vars/
    │   └── main.yml              # Role variables (higher precedence)
    ├── tasks/
    │   ├── main.yml              # Main task entry point
    │   ├── install.yml           # Installation tasks
    │   ├── configure.yml         # Configuration tasks
    │   ├── security.yml          # Security hardening tasks
    │   └── validate.yml          # Validation and health checks
    ├── handlers/
    │   └── main.yml              # Service handlers
    ├── templates/
    │   └── config.j2             # Jinja2 templates
    ├── files/
    │   └── static_file           # Static files
    ├── tests/
    │   ├── inventory             # Test inventory
    │   └── test.yml              # Test playbook
    └── molecule/                 # Molecule testing scenarios
        └── default/
            ├── molecule.yml
            ├── converge.yml
            └── verify.yml

Role Development Guidelines

Code Quality

  • Use task tags extensively for selective execution:
    • install, configure, security, validate, update
  • Keep code modular with clear separation of concerns
  • Use meaningful variable names with prefixes (rolename_variable)
  • Write inline comments for complex logic
  • Follow YAML best practices (2-space indentation, explicit boolean values)
  • Use ansible-lint for code quality checks
  • Implement idempotency - tasks should be safely re-runnable

Variable Management

  • Use role defaults for sensible default values
  • Document all variables in README.md with types and examples
  • Use group_vars and host_vars for environment-specific overrides
  • Know Ansible's variable precedence rules and place each variable at the lowest level that works
  • Use {{ ansible_os_family }} for OS-specific logic
  • Implement input validation using assert module

Task Organization

# Example task structure with security focus
---
- name: Include OS-specific variables
  include_vars: "{{ ansible_os_family }}.yml"
  tags: [always]

- name: Validate input parameters
  assert:
    that:
      - variable_name is defined
      - variable_name | length > 0
    fail_msg: "Required variable 'variable_name' is not defined"
  tags: [validate]

# import_tasks is static, so each tag below is inherited by the imported tasks.
# (Tags on include_tasks apply only to the include statement itself.)
- name: Import installation tasks
  import_tasks: install.yml
  tags: [install]

- name: Import configuration tasks
  import_tasks: configure.yml
  tags: [configure]

- name: Import security hardening tasks
  import_tasks: security.yml
  tags: [security]

- name: Import validation tasks
  import_tasks: validate.yml
  tags: [validate]

System Information Gathering

All roles MUST gather and report key system metrics:

# System health check tasks (include in validate.yml)
- name: Gather disk usage statistics
  shell: df -h | grep -vE '^Filesystem|tmpfs|cdrom'
  register: disk_usage
  changed_when: false
  tags: [validate, health-check]

- name: Gather memory usage statistics
  command: free -h
  register: memory_usage
  changed_when: false
  tags: [validate, health-check]

- name: Gather swap usage statistics
  command: swapon --show
  register: swap_usage
  changed_when: false
  tags: [validate, health-check]

- name: Gather system uptime
  command: uptime
  register: system_uptime
  changed_when: false
  tags: [validate, health-check]

- name: Gather logged-in users
  command: who
  register: logged_users
  changed_when: false
  tags: [validate, health-check]

- name: Check high CPU processes
  shell: ps aux --sort=-%cpu | head -10
  register: top_cpu_processes
  changed_when: false
  tags: [validate, health-check]

- name: Check high memory processes
  shell: ps aux --sort=-%mem | head -10
  register: top_mem_processes
  changed_when: false
  tags: [validate, health-check]

- name: Display system health summary
  debug:
    msg:
      - "=== System Health Check ==="
      - "Disk Usage: {{ disk_usage.stdout_lines }}"
      - "Memory: {{ memory_usage.stdout_lines }}"
      - "Uptime: {{ system_uptime.stdout }}"
      - "Logged Users: {{ logged_users.stdout_lines }}"
  tags: [validate, health-check]

Security Considerations in Roles

  • Never hardcode secrets or credentials
  • Use no_log: true for sensitive task output
  • Validate file permissions (use mode parameter)
  • Implement proper error handling with block/rescue/always
  • Use become judiciously with specific privilege escalation
  • Verify checksums for downloaded files
  • Use HTTPS for all external downloads

Production Readiness

  • Roles shall be considered production-ready and stable
  • DO NOT modify existing roles without explicit request and proper testing
  • Implement comprehensive molecule tests before deployment
  • Use semantic versioning for role releases
  • Maintain a CHANGELOG.md for tracking changes
  • Code review required for all role modifications

Testing Strategy

Test Pyramid

  1. Syntax Validation: ansible-playbook --syntax-check
  2. Linting: ansible-lint with organizational rules
  3. Unit Testing: Molecule with Docker/Vagrant
  4. Integration Testing: Test Kitchen or custom test playbooks
  5. Security Testing: ansible-audit, OpenSCAP profiles
  6. Performance Testing: Ansible profiling callbacks

Molecule Configuration Example

# molecule/default/molecule.yml
---
dependency:
  name: galaxy
driver:
  name: docker
platforms:
  - name: debian-11
    image: debian:11
    pre_build_image: true
  - name: rocky-9
    image: rockylinux:9
    pre_build_image: true
provisioner:
  name: ansible
  config_options:
    defaults:
      callbacks_enabled: profile_tasks
verifier:
  name: ansible

Documentation Standards

Required Documentation

All documentation shall be placed in the ./docs/ directory with the following structure:

docs/
├── architecture/
│   ├── overview.md
│   ├── network-topology.md
│   └── security-model.md
├── runbooks/
│   ├── deployment.md
│   ├── disaster-recovery.md
│   └── incident-response.md
├── roles/
│   ├── role-index.md
│   └── [role-specific-docs].md
├── inventory.md              # Dynamic inventory configuration
├── variables.md              # Variable documentation
├── security-compliance.md    # Security controls and compliance mapping
└── troubleshooting.md

Role Documentation (README.md)

Each role must include comprehensive documentation:

# Role Name

Brief description of role purpose and functionality.

## Requirements

- Ansible version
- OS compatibility
- Dependencies
- Required privileges

## Role Variables

| Variable | Default | Description | Required |
|----------|---------|-------------|----------|
| var_name | value   | Description | Yes/No   |

## Dependencies

List of dependent roles.

## Example Playbook

```yaml
- hosts: servers
  roles:
    - role: role_name
      var_name: value
```

## Security Considerations

- Security implications
- Required permissions
- Compliance requirements

## License

Organization license information

## Author

Role maintainer contact information

Roles, Plays, Playbooks, Cheatsheets and Documentation

Each role will have its own ROADMAP.md and CHANGELOG.md files located in ./roles/{role name}/{CHANGELOG,ROADMAP}.md.

./playbooks SHALL CONTAIN role-related plays. ./plays SHALL BE USED for temporary, non-lasting plays.

Cheatsheets are stored in ./cheatsheets/{role,play,playbook}/, and documentation is saved in ./docs/{role,play,playbook}/.

  • Each role MUST HAVE its documentation and cheatsheet.
  • Each playbook SHALL HAVE its cheatsheet.

Cheatsheets should include:

  • Quick start commands
  • Common usage patterns
  • Tag reference for selective execution
  • Troubleshooting quick reference
  • Security checkpoints

Example:

# Role Name Cheatsheet

## Quick Execution
\```bash
# Full role execution
ansible-playbook site.yml -t role_name

# Install only
ansible-playbook site.yml -t role_name,install

# Security hardening only
ansible-playbook site.yml -t role_name,security
\```

## Common Variables
- `var_name`: Description (default: value)

## Validation
\```bash
ansible-playbook site.yml -t role_name,validate
\```

## Troubleshooting
- Issue: Solution

Playbook Organization

Directory Structure

.
├── ansible.cfg                 # Ansible configuration
├── site.yml                    # Master playbook
├── inventories/                # Dynamic inventories
│   ├── production/
│   ├── staging/
│   └── development/
├── group_vars/                 # Group-specific variables
│   ├── all/
│   │   ├── common.yml
│   │   └── vault.yml          # Encrypted secrets
│   ├── webservers.yml
│   └── databases.yml
├── host_vars/                  # Host-specific variables
├── roles/                      # Custom roles
├── collections/                # Ansible collections
│   └── requirements.yml
├── playbooks/                  # Specific playbooks
│   ├── deploy.yml
│   ├── security-audit.yml
│   └── maintenance.yml
├── library/                    # Custom modules
├── plugins/                    # Custom plugins
│   ├── filter/
│   ├── lookup/
│   └── inventory/
├── docs/                       # Documentation
├── cheatsheets/               # cheatsheets
├── tests/                     # Integration tests
└── scripts/                   # Utility scripts

Playbook Best Practices

  • Use import_playbook for static playbook inclusion
  • Playbooks can only be included statically (there is no include_playbook); use include_tasks or include_role where dynamic inclusion is needed
  • Implement pre-flight checks with assert module
  • Use serial for rolling updates
  • Implement proper error handling with any_errors_fatal
  • Use check_mode for dry-run capability
  • Tag plays and tasks appropriately
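
A rolling update that combines several of these practices might look like the sketch below. The webservers group and role_name come from examples elsewhere in this document; example_app_version and the 25% batch size are hypothetical.

```yaml
- name: Rolling deployment to web servers
  hosts: webservers
  serial: "25%"                 # assumption: update a quarter of the hosts per batch
  any_errors_fatal: true        # stop the rollout as soon as any host fails
  pre_tasks:
    - name: Pre-flight check on required variables
      ansible.builtin.assert:
        that:
          - example_app_version is defined      # hypothetical variable
          - example_app_version | length > 0
        fail_msg: "example_app_version must be set before deploying"
      tags: [always]
  roles:
    - role: role_name
      tags: [role_name]
```

Running the same play with --check --diff gives a dry run, provided the role's tasks support check mode.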

Security and Compliance

Secrets Management

  • Use Ansible Vault for encrypting sensitive data
  • Implement external secrets management (HashiCorp Vault, AWS Secrets Manager)
  • Rotate vault passwords regularly (90 days)
  • Use separate vault files per environment
  • Never commit unencrypted secrets to version control
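
One common way to apply these rules is to reference vaulted values indirectly, so that variable names stay searchable while values stay encrypted. The sketch below uses the group_vars/all layout shown in the directory structure; the example_db_* names and the vault_ prefix are illustrative conventions, not project requirements.

```yaml
# group_vars/all/common.yml (plain text, safe to review)
example_db_user: app
example_db_password: "{{ vault_example_db_password }}"

# group_vars/all/vault.yml (encrypt with ansible-vault before committing)
vault_example_db_password: "CHANGE_ME"   # placeholder, never a real secret
```

Tasks that consume example_db_password should still set no_log: true, since decrypted values can otherwise appear in task output.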

Audit and Compliance

  • Maintain audit logs of all automation runs
  • Implement change tracking and approval workflows
  • Regular security scans using Lynis, OpenSCAP
  • Compliance mapping documentation (CIS, NIST, PCI-DSS, HIPAA)
  • Automated compliance reporting

Access Control

  • Implement RBAC using Ansible Tower/AWX
  • Use separate service accounts per environment
  • Implement 4-eyes principle for production changes
  • Regular access reviews (quarterly)

Performance Optimization

Execution Optimization

  • Enable fact caching (Redis, JSON file)
  • Use gather_facts: false when facts not needed
  • Implement parallelism with forks parameter
  • Use strategy: free when hosts are independent, so each host proceeds without waiting for the others
  • Leverage async and poll for long-running tasks
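
A minimal sketch of a play that applies these settings, with a long-running job started in the background and polled separately. The command path is hypothetical, and the timeout values are assumptions chosen so that the polling window (60 retries at 30 seconds) matches the 1800-second async limit.

```yaml
- name: Run a long job without blocking the play
  hosts: all
  gather_facts: false           # no facts are used below
  strategy: free                # hosts do not wait for each other
  tasks:
    - name: Start the long-running job in the background
      ansible.builtin.command: /usr/local/bin/example-long-job   # hypothetical command
      async: 1800               # allow up to 30 minutes
      poll: 0                   # return immediately
      register: example_job
      changed_when: true

    - name: Wait for the job to finish
      ansible.builtin.async_status:
        jid: "{{ example_job.ansible_job_id }}"
      register: example_job_result
      until: example_job_result.finished
      retries: 60
      delay: 30
```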

Infrastructure Optimization

  • Use jump hosts/bastion hosts for network efficiency
  • Implement ControlMaster for SSH connection reuse
  • Use pipelining to reduce SSH operations
  • Optimize Python interpreter settings
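
These connection settings can be expressed as inventory variables. This is a sketch under stated assumptions: the file name and the bastion host name are hypothetical, pipelining requires that sudo on the targets does not enforce requiretty, and the ProxyJump option should not be applied to the bastion host itself.

```yaml
# group_vars/all/connection.yml (hypothetical file name)
ansible_ssh_pipelining: true
ansible_ssh_common_args: >-
  -o ControlMaster=auto
  -o ControlPersist=60s
  -o ProxyJump=bastion.example.com
ansible_python_interpreter: /usr/bin/python3
```

Pinning the interpreter avoids interpreter discovery on every host, at the cost of having to override it where python3 lives elsewhere.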

Version Control

Git Workflow

  • Use feature branches for development
  • Implement pull request review process
  • Tag releases with semantic versioning
  • Maintain CHANGELOG.md
  • Use pre-commit hooks for validation

Branch Strategy

  • main: Production-ready code
  • develop: Integration branch
  • feature/*: Feature development
  • hotfix/*: Emergency fixes

Document Version: 2.0
Last Updated: 2025-11-10
Review Cycle: Quarterly