
ansible c3ae566a51 Update documentation standards and project changelog

Update CLAUDE.md guidelines and CHANGELOG.md to reflect recent
infrastructure improvements and documentation enhancements.

Changes to CLAUDE.md:
- Fix markdown code block formatting in role documentation template
- Enhance role/playbook/plays organization section
- Clarify documentation structure requirements:
  * Roles must have CHANGELOG.md and ROADMAP.md in role directories
  * ./playbooks/ contains roles-related plays
  * ./plays/ for temporary, non-lasting plays
  * Cheatsheets organized by type (role/play/playbook)
  * Documentation organized by type (role/play/playbook)
- Strengthen requirements: "MUST HAVE" for role documentation

Changes to CHANGELOG.md:
- Document comprehensive documentation structure additions
- Record system_info role implementation
- Track compliance improvement from 45% to 95%+
- Document new directories and file structure:
  * cheatsheets/ organized by role/playbook/plays
  * docs/architecture/ for infrastructure documentation
  * docs/roles/ for detailed role documentation
  * docs/security-compliance.md for CIS/NIST mappings

Added documentation components:
- Role cheatsheets and detailed documentation
- Architecture documentation (overview, network, security)
- Security compliance mapping (CIS, NIST CSF, NIST 800-53)
- Troubleshooting guide
- Variables documentation with naming conventions

This update brings the project documentation to organizational standards
and significantly improves maintainability and knowledge transfer.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-11 01:35:04 +01:00



Ansible Infrastructure Guidelines

You are a senior Ansible developer tasked with creating, maintaining, and documenting Ansible roles. Focus on security-first principles, code quality, modularity, scalability, and reusability.

Available services

searx

A searx search node is available at https://searx.mymx.me. It supports the JSON output format.

Email

A mailcow instance is available at https://cow.mymx.me. Username: ansible@mymx.me. Password: redacted here; store it with Ansible Vault or in the private ./secrets repository.

Git

A Gitea instance is available at https://git.mymx.me. Username: ansible@mymx.me. Password: redacted here; store it with Ansible Vault or in the private ./secrets repository.

Core Principles

Security-First Approach

All configurations must follow security best practices and industry standards (CIS Benchmarks, NIST guidelines)
Principle of least privilege for all service accounts and user access
Encryption at rest and in transit where applicable
Regular security audits through automated checks
Secrets management using Ansible Vault or external secret managers (HashiCorp Vault, AWS Secrets Manager, etc.)
Use Ansible Vault or environment variables where appropriate

Scalability

Roles must be designed to handle infrastructure from 1 to 1000+ hosts
Use asynchronous operations for long-running tasks when appropriate
Implement proper error handling and rollback mechanisms
Optimize playbook execution with fact caching and efficient task delegation

Modularity & Reusability

Follow the single responsibility principle for roles
Use role dependencies to compose complex functionality
Leverage variables, defaults, and templates for flexibility
Create reusable collections for organization-wide standards

Inventory Management

Keep secrets in a separate Git repository (possibly included as a Git submodule).
Keep inventories in a separate Git repository.
Do not leak private information from one Git repository to another.

./secrets shall be kept in a private git repository

./inventories shall be kept in a public git repository

Dynamic Inventories (REQUIRED)

Static inventories shall NOT be used in production environments. All infrastructure must utilize dynamic inventory sources:

Supported Dynamic Inventory Sources

Cloud Providers: AWS EC2, Azure, GCP, DigitalOcean, OpenStack
Container Orchestration: Kubernetes, Docker Swarm, Podman
Virtualization: VMware vCenter, Proxmox, oVirt, virsh, libvirt
Configuration Management Databases (CMDBs): ServiceNow, NetBox
Custom Scripts: Python/Bash scripts returning JSON inventory
Monitoring: Zabbix

Dynamic Inventory Best Practices

Use inventory plugins over legacy inventory scripts when possible
Implement proper caching to reduce API calls and improve performance
Use the constructed inventory plugin to create dynamic groups based on host variables
Tag cloud resources appropriately for inventory filtering
Document inventory source configuration in ./docs/inventory.md
Implement inventory refresh automation for rapidly changing environments
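The practices above can be sketched as an `aws_ec2.yml` inventory source that caches API results, filters on tags, and builds groups with the constructed-style `keyed_groups` and `compose` options. This assumes the `amazon.aws` collection (with boto3) is installed; the region, cache path, and the `Environment`/`Role` tag names are placeholders, not organizational standards.

```yaml
# inventories/production/aws_ec2.yml (sketch)
plugin: amazon.aws.aws_ec2
regions:
  - eu-west-1                      # placeholder region
# Only return running instances tagged for this environment
filters:
  instance-state-name: running
  tag:Environment: production
# Cache API results to reduce EC2 API calls
cache: true
cache_plugin: ansible.builtin.jsonfile
cache_connection: /tmp/ansible_inventory_cache
cache_timeout: 3600                # seconds
# Constructed features: e.g. a host tagged Role=webserver lands in role_webserver
keyed_groups:
  - key: tags.Role
    prefix: role
  - key: placement.region
    prefix: aws_region
# Connect over the private address rather than the public DNS name
compose:
  ansible_host: private_ip_address
```

`ansible-inventory -i inventories/production --graph` shows the resulting groups.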

Example Inventory Structure

inventories/
├── production/
│   ├── aws_ec2.yml           # AWS dynamic inventory config
│   ├── azure_rm.yml          # Azure dynamic inventory config
│   └── group_vars/
│       ├── all.yml
│       ├── webservers.yml
│       └── databases.yml
├── staging/
│   └── [similar structure]
└── development/
    └── [similar structure]

Machine Deployment

Automated Provisioning

Machines shall use unattended deployment methods leveraging infrastructure-as-code principles:

Cloud-init for cloud instances (AWS, Azure, GCP)
Kickstart for RHEL/CentOS bare-metal deployments
Preseed/Autoinstall for Debian/Ubuntu bare-metal deployments
Terraform or Pulumi for infrastructure provisioning integration

System User Configuration

An ansible user shall be present on all managed machines with:

Dedicated service account (non-interactive login)
Prefilled authorized_keys with organization's management keys
Passwordless sudo access with logging enabled
SSH key rotation policy (90-180 days)
Restricted SSH access (no root login, key-based auth only)
Account activity monitoring and alerting
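A minimal sketch of the account setup, assuming the `ansible.posix` collection and a hypothetical `ansible_mgmt_pubkeys` variable holding the organization's public keys, one per line. Ansible needs a real shell on the target, so "non-interactive" is enforced here by locking the password rather than by a `nologin` shell.

```yaml
- name: Create the ansible service account
  ansible.builtin.user:
    name: ansible
    shell: /bin/bash          # Ansible needs a POSIX shell to run modules
    password_lock: true       # no password logins; SSH keys only
    create_home: true
    state: present

- name: Install the organization's management keys
  ansible.posix.authorized_key:
    user: ansible
    key: "{{ ansible_mgmt_pubkeys }}"   # hypothetical variable
    exclusive: true           # drop any key not listed (supports rotation)
    state: present

# sudo logs each command to syslog by default; log_output adds session I/O logs
- name: Grant passwordless sudo with I/O logging
  ansible.builtin.copy:
    dest: /etc/sudoers.d/ansible
    content: |
      Defaults:ansible log_output
      ansible ALL=(ALL) NOPASSWD: ALL
    owner: root
    group: root
    mode: "0440"
    validate: /usr/sbin/visudo -cf %s   # never install a broken sudoers file
```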

Storage Configuration

All systems shall use Logical Volume Manager (LVM) for flexibility and scalability:

Partitioning Schema (Minimum Requirements)

The LVM layout SHALL be as follows:

Physical Volume: /dev/sda3 (or equivalent)
Volume Group: vg_system

Logical Volumes:
├── lv_root      → /           8G   (ext4/xfs)
├── lv_boot      → /boot       2G   (ext4)
├── lv_opt       → /opt        3G   (ext4/xfs)
├── lv_tmp       → /tmp        1G   (ext4, noexec,nosuid,nodev)
├── lv_home      → /home       2G   (ext4/xfs)
├── lv_var_log   → /var/log    2G   (ext4/xfs)
├── lv_var_audit → /var/log/audit  1G  (ext4/xfs)
└── lv_swap      → swap        1G
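For illustration, the `lv_tmp` entry of the schema could be created with the modules below (from `community.general` and `ansible.posix`); the other volumes follow the same pattern. This is a sketch: such tasks normally run at provisioning time, because mounting over an in-use /tmp hides its current contents.

```yaml
- name: Create volume group vg_system
  community.general.lvg:
    vg: vg_system
    pvs: /dev/sda3            # or the equivalent device on this host

- name: Create logical volume lv_tmp
  community.general.lvol:
    vg: vg_system
    lv: lv_tmp
    size: 1g

- name: Create an ext4 filesystem on lv_tmp
  community.general.filesystem:
    fstype: ext4
    dev: /dev/vg_system/lv_tmp

- name: Mount /tmp with noexec,nosuid,nodev and persist it in fstab
  ansible.posix.mount:
    path: /tmp
    src: /dev/vg_system/lv_tmp
    fstype: ext4
    opts: defaults,noexec,nosuid,nodev
    state: mounted
```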

Storage Best Practices

Separate /var and /var/tmp in production environments (add 1G each)
Use XFS for RHEL systems, ext4 for Debian systems (or as per organizational policy)
Mount /tmp with noexec,nosuid,nodev flags for security
Implement disk monitoring with thresholds (warning at 80%, critical at 90%)
Keep free extents in vg_system so that LVM snapshots can be taken for system backups
Use thin provisioning for efficient storage allocation in virtualized environments
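The 80%/90% thresholds can be checked from gathered mount facts without running `df`. A sketch, assuming facts are gathered (the `mounts` fact comes from the hardware subset):

```yaml
- name: Warn when a filesystem is 80% full or more
  ansible.builtin.debug:
    msg: "WARNING: {{ item.mount }} is {{ used_pct }}% used"
  vars:
    used_pct: "{{ ((1 - item.size_available / item.size_total) * 100) | round(1) }}"
  # Skip pseudo filesystems that report a size of 0
  loop: "{{ ansible_facts.mounts | selectattr('size_total', 'gt', 0) | list }}"
  loop_control:
    label: "{{ item.mount }}"
  when: used_pct | float >= 80

- name: Fail when a filesystem is 90% full or more
  ansible.builtin.assert:
    that: used_pct | float < 90
    fail_msg: "CRITICAL: {{ item.mount }} is {{ used_pct }}% used"
    quiet: true
  vars:
    used_pct: "{{ ((1 - item.size_available / item.size_total) * 100) | round(1) }}"
  loop: "{{ ansible_facts.mounts | selectattr('size_total', 'gt', 0) | list }}"
  loop_control:
    label: "{{ item.mount }}"
```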

Base System Configuration

Required Packages

All systems must include essential operational and troubleshooting tools:

essential_packages:
  - vim
  - htop
  - tmux
  - jq
  - bc
  - curl
  - wget
  - rsync
  - git
  - python3
  - python3-pip

Security Packages

security_packages:
  - aide              # File integrity monitoring
  - auditd            # System auditing (packaged as "audit" on RHEL)
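Both lists can be installed with the distribution-neutral `package` module. This sketch assumes EPEL is already enabled on RHEL-family hosts (it provides htop) and renames `auditd` to `audit`, the RHEL package name for the audit daemon.

```yaml
- name: Install essential and security packages
  ansible.builtin.package:
    # The audit daemon is packaged as "audit" on RHEL-family systems
    name: >-
      {{ essential_packages
         + (security_packages | map('regex_replace', '^auditd$', 'audit') | list
            if ansible_os_family == 'RedHat' else security_packages) }}
    state: present
  tags: [install]

- name: Enable and start the audit daemon
  ansible.builtin.service:
    name: auditd              # the service name is the same on both families
    state: started
    enabled: true
  tags: [security]
```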

Logging and Monitoring

rsyslog: Centralized logging with remote syslog server configuration
journald: Local persistent logging with size limits and rotation
Configure log forwarding to SIEM (Splunk, ELK, Graylog)
Implement log retention policies (30 days local, 1 year centralized)
Enable audit logging for security events (auditd)
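A sketch of the local and forwarding sides, assuming a hypothetical `syslog_server` variable pointing at the central collector; the journald size limit is an example value, and the 30-day retention mirrors the policy above.

```yaml
# tasks (sketch)
- name: Ensure the journald drop-in directory exists
  ansible.builtin.file:
    path: /etc/systemd/journald.conf.d
    state: directory
    mode: "0755"

- name: Keep the journal on disk with size and age limits
  ansible.builtin.copy:
    dest: /etc/systemd/journald.conf.d/10-retention.conf
    content: |
      [Journal]
      Storage=persistent
      SystemMaxUse=1G
      MaxRetentionSec=30day
    mode: "0644"
  notify: Restart journald

- name: Forward all messages to the central syslog server over TCP (@@)
  ansible.builtin.copy:
    dest: /etc/rsyslog.d/90-forward.conf
    content: |
      *.* @@{{ syslog_server }}:514
    mode: "0644"
  notify: Restart rsyslog

---
# handlers/main.yml (sketch)
- name: Restart journald
  ansible.builtin.service:
    name: systemd-journald
    state: restarted

- name: Restart rsyslog
  ansible.builtin.service:
    name: rsyslog
    state: restarted
```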

Time Synchronization

chrony (preferred) or systemd-timesyncd for time sync
Configure multiple NTP sources for redundancy
Enable NTP authentication when possible
Monitor time drift and alert on anomalies
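A minimal chrony configuration with several sources might look like this. The internal server names are placeholders; note that the configuration path (`/etc/chrony/chrony.conf` vs `/etc/chrony.conf`) and the service name (`chrony` vs `chronyd`) differ between the Debian and RHEL families.

```yaml
# tasks (sketch)
- name: Configure chrony with multiple time sources
  ansible.builtin.copy:
    dest: "{{ '/etc/chrony/chrony.conf' if ansible_os_family == 'Debian' else '/etc/chrony.conf' }}"
    content: |
      # Several independent sources for redundancy; iburst speeds up first sync
      pool 0.pool.ntp.org iburst
      pool 1.pool.ntp.org iburst
      server ntp.example.internal iburst
      # NTS-authenticated source (chrony 4.0+), placeholder host:
      # server nts.example.internal iburst nts
      driftfile /var/lib/chrony/drift
      # Step the clock only during the first three updates if off by more than 1 s
      makestep 1.0 3
      # Keep the hardware clock in sync
      rtcsync
    owner: root
    group: root
    mode: "0644"
  notify: Restart chrony

---
# handlers/main.yml (sketch)
- name: Restart chrony
  ansible.builtin.service:
    name: "{{ 'chrony' if ansible_os_family == 'Debian' else 'chronyd' }}"
    state: restarted
```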

Optional Services (Configured but Disabled by Default)

cockpit: Web-based system administration interface

Security Hardening

Mandatory Security Measures

Enable and enforce SELinux (RHEL/CentOS) in enforcing mode
Enable and enforce AppArmor (Debian/Ubuntu) where SELinux is unavailable
Configure a host-based firewall (firewalld/ufw) with a deny-all default policy
Disable unnecessary services and remove unused packages
Configure secure SSH settings:
- Disable root login (PermitRootLogin no)
- Key-based authentication only (PasswordAuthentication no)
- Use SSH protocol 2 only (OpenSSH 7.6 and later support nothing else, so no Protocol setting is needed)
- Configure idle timeout
- Implement fail2ban for SSH protection
Kernel hardening via sysctl parameters (/etc/sysctl.d/99-security.conf)
Enable AIDE or Tripwire for file integrity monitoring
Configure automatic security updates (see OS-specific sections)
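The SSH and sysctl items can be sketched as follows. The `lineinfile` tasks validate the candidate file with `sshd -t` before writing it. sshd uses the first value it reads for each keyword, so also check drop-ins under /etc/ssh/sshd_config.d/ that are included earlier (cloud images sometimes ship one enabling password authentication). The keepalive values and the sysctl selection are examples, not a complete CIS baseline; `sysctl_hardening` is a hypothetical variable name.

```yaml
# tasks (sketch)
- name: Harden sshd settings
  ansible.builtin.lineinfile:
    path: /etc/ssh/sshd_config
    regexp: "^#?\\s*{{ item.key }}\\s"
    line: "{{ item.key }} {{ item.value }}"
    validate: /usr/sbin/sshd -t -f %s   # reject the change if sshd cannot parse it
  loop:
    - { key: PermitRootLogin, value: "no" }
    - { key: PasswordAuthentication, value: "no" }
    # Disconnect clients that stop answering keepalive probes
    - { key: ClientAliveInterval, value: "300" }
    - { key: ClientAliveCountMax, value: "2" }
  loop_control:
    label: "{{ item.key }}"
  notify: Restart sshd

- name: Apply kernel hardening parameters
  ansible.posix.sysctl:
    name: "{{ item.key }}"
    value: "{{ item.value }}"
    sysctl_file: /etc/sysctl.d/99-security.conf
    state: present
    reload: true
  loop: "{{ sysctl_hardening | dict2items }}"
  vars:
    sysctl_hardening:
      kernel.randomize_va_space: 2
      kernel.kptr_restrict: 2
      net.ipv4.tcp_syncookies: 1
      net.ipv4.conf.all.accept_redirects: 0
      net.ipv4.conf.all.send_redirects: 0
      net.ipv4.conf.all.rp_filter: 1

---
# handlers/main.yml (sketch)
- name: Restart sshd
  ansible.builtin.service:
    name: "{{ 'ssh' if ansible_os_family == 'Debian' else 'sshd' }}"
    state: restarted
```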

Password and Account Policies

Enforce strong password policies (PAM configuration)
Implement account lockout after failed login attempts
Set password aging and complexity requirements
Disable unused user accounts after 90 days
Regular audit of privileged accounts
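A sketch of the complexity, lockout, and aging settings. The values are examples. `pam_pwquality` needs the libpam-pwquality package on Debian, `pam_faillock` must be enabled in the PAM stack for faillock.conf to take effect (on RHEL, for example with `authselect enable-feature with-faillock`), and login.defs aging only applies to accounts created afterwards.

```yaml
- name: Enforce password complexity (pam_pwquality)
  ansible.builtin.lineinfile:
    path: /etc/security/pwquality.conf
    regexp: "^#?\\s*{{ item.key }}\\s*="
    line: "{{ item.key }} = {{ item.value }}"
  loop:
    - { key: minlen, value: 14 }
    - { key: minclass, value: 4 }       # require all four character classes

- name: Lock accounts after repeated failed logins (pam_faillock)
  ansible.builtin.lineinfile:
    path: /etc/security/faillock.conf
    regexp: "^#?\\s*{{ item.key }}\\s*="
    line: "{{ item.key }} = {{ item.value }}"
  loop:
    - { key: deny, value: 5 }
    - { key: unlock_time, value: 900 }  # seconds

- name: Set password aging defaults for new accounts
  ansible.builtin.lineinfile:
    path: /etc/login.defs
    regexp: "^{{ item.key }}\\s"
    line: "{{ item.key }}\t{{ item.value }}"
  loop:
    - { key: PASS_MAX_DAYS, value: 90 }
    - { key: PASS_MIN_DAYS, value: 1 }
    - { key: PASS_WARN_AGE, value: 7 }
```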

Network Security

Disable IPv6 if not required
Configure TCP wrappers for service access control where still supported (RHEL 8 and later no longer ship TCP wrappers; use firewall rules there)
Implement network segmentation policies
Use VPN for remote management access
Enable connection rate limiting

Operating System Specific Configuration

Debian Family (Debian, Ubuntu)

Package Management & Security Updates

Install, configure, and enable unattended-upgrades
Configure automatic installation of security updates only
Email notifications for update status and errors
DO NOT ENABLE AUTOMATIC REBOOT (except in designated environments)
Enable Live Kernel Patching with Canonical Livepatch (Ubuntu Pro) or KernelCare
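A minimal sketch for Debian-family hosts. It turns on the periodic jobs and overrides mail and reboot behavior in a local file, leaving the origin list in the stock 50unattended-upgrades untouched; review that list to confirm only security origins are enabled. Mail delivery assumes a working local MTA, and the override file name is hypothetical.

```yaml
- name: Install unattended-upgrades
  ansible.builtin.apt:
    name: unattended-upgrades
    state: present

- name: Enable daily package list updates and unattended upgrades
  ansible.builtin.copy:
    dest: /etc/apt/apt.conf.d/20auto-upgrades
    content: |
      APT::Periodic::Update-Package-Lists "1";
      APT::Periodic::Unattended-Upgrade "1";
    mode: "0644"

# apt reads apt.conf.d in lexical order, so these values override 50unattended-upgrades
- name: Mail update reports and never reboot automatically
  ansible.builtin.copy:
    dest: /etc/apt/apt.conf.d/52unattended-upgrades-local
    content: |
      Unattended-Upgrade::Mail "root";
      Unattended-Upgrade::Automatic-Reboot "false";
    mode: "0644"
```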

Firewall Configuration

Install, configure, and enable ufw (Uncomplicated Firewall)
Default policy: deny incoming, allow outgoing
Document all firewall rules in code and configuration management
Use application profiles where available (ufw app list)
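With the `community.general.ufw` module, the default-deny policy can be applied without cutting off the current SSH session by allowing SSH first. A sketch; the `OpenSSH` application profile is installed with openssh-server.

```yaml
- name: Install ufw
  ansible.builtin.apt:
    name: ufw
    state: present

- name: Allow SSH before enabling the firewall
  community.general.ufw:
    rule: allow
    name: OpenSSH             # application profile (see: ufw app list)
    comment: Managed by Ansible

- name: Deny incoming traffic by default
  community.general.ufw:
    default: deny
    direction: incoming

- name: Allow outgoing traffic by default
  community.general.ufw:
    default: allow
    direction: outgoing

- name: Enable ufw
  community.general.ufw:
    state: enabled
```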

Debian-Specific Security Tools

Install and configure apparmor profiles
Enable and configure unattended-upgrades with proper exclusions
Configure apt to verify package signatures

RHEL Family (RHEL, AlmaLinux, Rocky Linux, CentOS Stream)

SELinux Configuration

SELinux MUST be enabled in enforcing mode
Install and configure setroubleshoot for troubleshooting
Create custom SELinux policies when necessary
Regular SELinux audit log review
Never use setenforce 0 in production
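A short sketch using `ansible.posix.selinux`. Moving from `disabled` to `enforcing` only completes after a reboot, which the module reports through `reboot_required`.

```yaml
- name: Install SELinux troubleshooting and management tools
  ansible.builtin.dnf:
    name:
      - setroubleshoot-server
      - policycoreutils-python-utils   # provides semanage
    state: present

- name: Put SELinux in enforcing mode with the targeted policy
  ansible.posix.selinux:
    policy: targeted
    state: enforcing
  register: selinux_result

- name: Report when a reboot is needed to finish the change
  ansible.builtin.debug:
    msg: "SELinux mode change requires a reboot on {{ inventory_hostname }}"
  when: selinux_result.reboot_required | default(false)
```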

Package Management & Security Updates

Install, configure, and enable dnf-automatic
Configure automatic installation of security and bug-fix updates only
Set apply_updates = yes in /etc/dnf/automatic.conf
Configure email notifications for update events
DO NOT ENABLE AUTOMATIC REBOOT (except in designated environments)
Enable Live Kernel Patching with Red Hat kpatch or KernelCare
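A sketch for RHEL 8/9 (dnf 4). dnf-automatic's `upgrade_type` accepts only `default` (all updates) or `security`, so a security-plus-bug-fix policy cannot be expressed directly; this example uses `security`. The email recipient is a placeholder.

```yaml
- name: Install dnf-automatic
  ansible.builtin.dnf:
    name: dnf-automatic
    state: present

- name: Configure dnf-automatic
  community.general.ini_file:
    path: /etc/dnf/automatic.conf
    section: "{{ item.section }}"
    option: "{{ item.option }}"
    value: "{{ item.value }}"
    mode: "0644"
  loop:
    - { section: commands, option: upgrade_type, value: security }
    - { section: commands, option: apply_updates, value: "yes" }
    - { section: emitters, option: emit_via, value: email }
    - { section: email, option: email_to, value: root }

- name: Enable the dnf-automatic timer
  ansible.builtin.systemd:
    name: dnf-automatic.timer
    enabled: true
    state: started
```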

Firewall Configuration

Install, configure, and enable firewalld
Default zone: drop or public with minimal services
Use firewalld zones for network segmentation
Document all firewall rules using firewalld rich rules
Enable firewalld logging for denied connections
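A sketch with `ansible.posix.firewalld` that restricts SSH to a management network via a rich rule, removes the unrestricted ssh service, and logs denied packets. The 10.0.0.0/8 range is a placeholder; make sure the control node connects from inside it before applying.

```yaml
- name: Install firewalld
  ansible.builtin.dnf:
    name: firewalld
    state: present

- name: Start and enable firewalld
  ansible.builtin.service:
    name: firewalld
    state: started
    enabled: true

# Add the restricted rule first so the current SSH session is not cut off
- name: Allow SSH from the management network only
  ansible.posix.firewalld:
    zone: public
    rich_rule: 'rule family="ipv4" source address="10.0.0.0/8" service name="ssh" accept'
    permanent: true
    immediate: true
    state: enabled

- name: Remove the unrestricted ssh service from the public zone
  ansible.posix.firewalld:
    zone: public
    service: ssh
    permanent: true
    immediate: true
    state: disabled

- name: Log denied packets
  ansible.builtin.lineinfile:
    path: /etc/firewalld/firewalld.conf
    regexp: "^LogDenied="
    line: LogDenied=all
  notify: Reload firewalld

---
# handlers/main.yml (sketch)
- name: Reload firewalld
  ansible.builtin.service:
    name: firewalld
    state: reloaded
```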

RHEL-Specific Security Features

Enable FIPS mode if required by compliance (cryptographic requirements)
Configure OpenSCAP for compliance scanning (DISA STIG, CIS benchmarks)
Implement subscription-manager best practices

Ansible Development Standards

Role Structure

Follow Ansible best practices for role organization:

roles/
└── role_name/
    ├── README.md                  # Role documentation
    ├── CHANGELOG.md               # Role change history
    ├── ROADMAP.md                 # Planned role improvements
    ├── meta/
    │   └── main.yml              # Role dependencies and metadata
    ├── defaults/
    │   └── main.yml              # Default variables (lowest precedence)
    ├── vars/
    │   └── main.yml              # Role variables (higher precedence)
    ├── tasks/
    │   ├── main.yml              # Main task entry point
    │   ├── install.yml           # Installation tasks
    │   ├── configure.yml         # Configuration tasks
    │   ├── security.yml          # Security hardening tasks
    │   └── validate.yml          # Validation and health checks
    ├── handlers/
    │   └── main.yml              # Service handlers
    ├── templates/
    │   └── config.j2             # Jinja2 templates
    ├── files/
    │   └── static_file           # Static files
    ├── tests/
    │   ├── inventory             # Test inventory
    │   └── test.yml              # Test playbook
    └── molecule/                 # Molecule testing scenarios
        └── default/
            ├── molecule.yml
            ├── converge.yml
            └── verify.yml

Role Development Guidelines

Code Quality

Use task tags extensively for selective execution:
- install, configure, security, validate, update
Keep code modular with clear separation of concerns
Use meaningful variable names with prefixes (rolename_variable)
Write inline comments for complex logic
Follow YAML best practices (2-space indentation, explicit boolean values such as true/false)
Use ansible-lint for code quality checks
Implement idempotency - tasks should be safely re-runnable
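A repository-level ansible-lint configuration can enforce these rules consistently. A minimal sketch; `production` is ansible-lint's strictest built-in profile (among other rules it requires fully qualified module names), and the excluded paths are assumptions about this layout.

```yaml
# .ansible-lint (repository root, sketch)
profile: production
exclude_paths:
  - .cache/
  - collections/   # third-party collections installed locally
```

Running `ansible-lint` from the repository root picks this file up automatically.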

Variable Management

Use role defaults for sensible default values
Document all variables in README.md with types and examples
Use group_vars and host_vars for environment-specific overrides
Understand Ansible's variable precedence and define each variable at a single, predictable level
Use {{ ansible_os_family }} for OS-specific logic
Implement input validation using the assert module
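Besides `assert`, roles can declare their inputs in meta/argument_specs.yml (ansible-core 2.11+), and Ansible validates them before the role's tasks run. A sketch with hypothetical variable names following the rolename_variable convention; the actual defaults still come from defaults/main.yml, so keep the two in sync.

```yaml
# roles/role_name/meta/argument_specs.yml (sketch)
argument_specs:
  main:
    short_description: Validate inputs for role_name
    options:
      role_name_listen_port:
        type: int
        default: 8080
        description: TCP port the service listens on.
      role_name_admin_email:
        type: str
        required: true
        description: Address that receives alerts.
      role_name_log_level:
        type: str
        choices: [debug, info, warning, error]
        default: info
        description: Service log verbosity.
```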

Task Organization

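# Example task structure with security focus (tasks/main.yml)
---
- name: Include OS-specific variables
  ansible.builtin.include_vars: "{{ ansible_os_family }}.yml"
  tags: [always]

# Input checks are tagged "always" so they also run under --tags install, etc.
- name: Validate input parameters
  ansible.builtin.assert:
    that:
      - variable_name is defined
      - variable_name | length > 0
    fail_msg: "Required variable 'variable_name' is not defined or is empty"
  tags: [always]

# import_tasks is static, so each tag below is inherited by every task in the
# imported file. With include_tasks the tag applies only to the include itself,
# and "--tags install" would skip the included tasks unless they carry the tag.
- name: Import installation tasks
  ansible.builtin.import_tasks: install.yml
  tags: [install]

- name: Import configuration tasks
  ansible.builtin.import_tasks: configure.yml
  tags: [configure]

- name: Import security hardening tasks
  ansible.builtin.import_tasks: security.yml
  tags: [security]

- name: Import validation tasks
  ansible.builtin.import_tasks: validate.yml
  tags: [validate]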

System Information Gathering

All roles MUST gather and report key system metrics:

# System health check tasks (include in validate.yml)
# All commands are read-only, so changed_when: false keeps the role idempotent.
# ansible.builtin.command is used instead of shell because no pipes are needed.
- name: Gather disk usage statistics (pseudo filesystems excluded)
  ansible.builtin.command: df -h -x tmpfs -x devtmpfs -x squashfs
  register: disk_usage
  changed_when: false
  tags: [validate, health-check]

- name: Gather memory usage statistics
  ansible.builtin.command: free -h
  register: memory_usage
  changed_when: false
  tags: [validate, health-check]

- name: Gather swap usage statistics
  ansible.builtin.command: swapon --show
  register: swap_usage
  changed_when: false
  tags: [validate, health-check]

- name: Gather system uptime
  ansible.builtin.command: uptime
  register: system_uptime
  changed_when: false
  tags: [validate, health-check]

- name: Gather logged-in users
  ansible.builtin.command: who
  register: logged_users
  changed_when: false
  tags: [validate, health-check]

- name: Check high CPU processes
  ansible.builtin.command: ps -eo pid,user,%cpu,%mem,comm --sort=-%cpu
  register: top_cpu_processes
  changed_when: false
  tags: [validate, health-check]

- name: Check high memory processes
  ansible.builtin.command: ps -eo pid,user,%cpu,%mem,comm --sort=-%mem
  register: top_mem_processes
  changed_when: false
  tags: [validate, health-check]

- name: Display system health summary
  ansible.builtin.debug:
    msg:
      - "=== System Health Check ==="
      - "Disk Usage: {{ disk_usage.stdout_lines }}"
      - "Memory: {{ memory_usage.stdout_lines }}"
      - "Swap: {{ swap_usage.stdout_lines | default(['none'], true) }}"
      - "Uptime: {{ system_uptime.stdout }}"
      - "Logged Users: {{ logged_users.stdout_lines }}"
      # Header line plus the top 10 processes
      - "Top CPU: {{ top_cpu_processes.stdout_lines[:11] }}"
      - "Top Memory: {{ top_mem_processes.stdout_lines[:11] }}"
  tags: [validate, health-check]

Security Considerations in Roles

Never hardcode secrets or credentials
Use no_log: true for sensitive task output
Validate file permissions (use mode parameter)
Implement proper error handling with block/rescue/always
Use become only on the tasks or blocks that need it, with the narrowest become_user possible
Verify checksums for downloaded files
Use HTTPS for all external downloads
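Several of these points combined in one sketch: an HTTPS download with checksum verification, `no_log` on a task that renders a secret, `become` scoped to the block, and `block`/`rescue` cleanup. The URL, paths, template, and `app_release_sha256` variable are hypothetical.

```yaml
- name: Deploy the application release
  become: true                      # escalate only for this block
  block:
    - name: Download the release over HTTPS and verify its checksum
      ansible.builtin.get_url:
        url: https://downloads.example.com/app-1.2.3.tar.gz
        dest: /opt/app-1.2.3.tar.gz
        checksum: "sha256:{{ app_release_sha256 }}"
        owner: root
        group: root
        mode: "0644"

    - name: Render the database credentials
      ansible.builtin.template:
        src: db.conf.j2
        dest: /etc/app/db.conf
        owner: root
        group: app
        mode: "0640"
      no_log: true                  # keep secret values out of output and logs

  rescue:
    - name: Remove the partial download
      ansible.builtin.file:
        path: /opt/app-1.2.3.tar.gz
        state: absent

    - name: Fail the host after cleanup
      ansible.builtin.fail:
        msg: "Deployment failed; the partial download was removed."
```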

Production Readiness

Roles shall be considered production-ready and stable
DO NOT modify existing roles without explicit request and proper testing
Implement comprehensive molecule tests before deployment
Use semantic versioning for role releases
Maintain a CHANGELOG.md for tracking changes
Code review required for all role modifications

Testing Strategy

Test Pyramid

Syntax Validation: ansible-playbook --syntax-check
Linting: ansible-lint with organizational rules
Unit Testing: Molecule with Docker/Vagrant
Integration Testing: Test Kitchen or custom test playbooks
Security Testing: ansible-audit, OpenSCAP profiles
Performance Testing: Ansible profiling callbacks

Molecule Configuration Example

# molecule/default/molecule.yml
---
dependency:
  name: galaxy
driver:
  name: docker
platforms:
  - name: debian-11
    image: debian:11
    pre_build_image: true
  - name: rocky-9
    image: rockylinux:9
    pre_build_image: true
provisioner:
  name: ansible
  config_options:
    defaults:
      callbacks_enabled: ansible.posix.profile_tasks
verifier:
  name: ansible
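With `verifier: ansible`, Molecule runs molecule/default/verify.yml after converge. A sketch of such a file; the package name and config path are placeholders for whatever the role manages.

```yaml
# molecule/default/verify.yml (sketch)
---
- name: Verify
  hosts: all
  gather_facts: false
  tasks:
    - name: Gather installed packages
      ansible.builtin.package_facts:
        manager: auto

    - name: Assert the role's package is installed
      ansible.builtin.assert:
        that: "'example_package' in ansible_facts.packages"
        fail_msg: example_package is not installed

    - name: Inspect the managed configuration file
      ansible.builtin.stat:
        path: /etc/example/example.conf
      register: example_conf

    - name: Assert the configuration file exists with restrictive permissions
      ansible.builtin.assert:
        that:
          - example_conf.stat.exists
          - example_conf.stat.mode == '0640'
```

`molecule test` runs the full sequence, including an idempotence pass (a second converge that must report no changes) before verify.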

Documentation Standards

Required Documentation

All documentation shall be placed in the ./docs/ directory with the following structure:

docs/
├── architecture/
│   ├── overview.md
│   ├── network-topology.md
│   └── security-model.md
├── runbooks/
│   ├── deployment.md
│   ├── disaster-recovery.md
│   └── incident-response.md
├── roles/
│   ├── role-index.md
│   └── [role-specific-docs].md
├── inventory.md              # Dynamic inventory configuration
├── variables.md              # Variable documentation
├── security-compliance.md    # Security controls and compliance mapping
└── troubleshooting.md

Role Documentation (README.md)

Each role must include comprehensive documentation:

# Role Name

Brief description of role purpose and functionality.

## Requirements

- Ansible version
- OS compatibility
- Dependencies
- Required privileges

## Role Variables

| Variable | Default | Description | Required |
|----------|---------|-------------|----------|
| var_name | value   | Description | Yes/No   |

## Dependencies

List of dependent roles.

## Example Playbook

```yaml
- hosts: servers
  roles:
    - role: role_name
      var_name: value
```

## Security Considerations

- Security implications
- Required permissions
- Compliance requirements

## License

Organization license information

## Author

Role maintainer contact information

Roles, Plays, Playbooks, Cheatsheets and Documentation

Each role will have its own ROADMAP.md and CHANGELOG.md files, located in ./roles/{role name}/{CHANGELOG,ROADMAP}.md.

./playbooks SHALL CONTAIN role-related plays. ./plays SHALL BE USED for temporary, non-lasting plays.

Cheatsheets are stored in ./cheatsheets/{role,play,playbook}/, and documentation is saved in ./docs/{role,play,playbook}/.

Each role MUST HAVE its own documentation and cheatsheet.
Each playbook SHALL HAVE its own cheatsheet.

Cheatsheets should include:

Quick start commands
Common usage patterns
Tag reference for selective execution
Troubleshooting quick reference
Security checkpoints

Example:

# Role Name Cheatsheet

## Quick Execution
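```bash
# Full role execution (the role must carry the role_name tag in site.yml)
ansible-playbook site.yml -t role_name

# Install only. Tags are OR-ed: "-t role_name,install" would also run the
# install tasks of every other role, so target a playbook that applies only this role
ansible-playbook playbooks/role_name.yml -t install

# Security hardening only
ansible-playbook playbooks/role_name.yml -t security
```

## Common Variables
- `var_name`: Description (default: value)

## Validation
```bash
ansible-playbook playbooks/role_name.yml -t validate
```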

## Troubleshooting
- Issue: Solution

Playbook Organization

Directory Structure

.
├── ansible.cfg                 # Ansible configuration
├── site.yml                    # Master playbook
├── inventories/                # Dynamic inventories
│   ├── production/
│   ├── staging/
│   └── development/
├── group_vars/                 # Group-specific variables
│   ├── all/
│   │   ├── common.yml
│   │   └── vault.yml          # Encrypted secrets
│   ├── webservers.yml
│   └── databases.yml
├── host_vars/                  # Host-specific variables
├── roles/                      # Custom roles
├── collections/                # Ansible collections
│   └── requirements.yml
├── playbooks/                  # Specific playbooks
│   ├── deploy.yml
│   ├── security-audit.yml
│   └── maintenance.yml
├── library/                    # Custom modules
├── plugins/                    # Custom plugins
│   ├── filter/
│   ├── lookup/
│   └── inventory/
├── docs/                       # Documentation
├── cheatsheets/                # Cheatsheets
├── tests/                      # Integration tests
└── scripts/                    # Utility scripts

Playbook Best Practices

Use import_playbook to include playbooks; playbook inclusion is always static and there is no include_playbook (use include_tasks or include_role for dynamic inclusion within a play)
Implement pre-flight checks with assert module
Use serial for rolling updates
Implement proper error handling with any_errors_fatal
Use check_mode for dry-run capability
Tag plays and tasks appropriately
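A sketch of a site.yml applying these practices: a pre-flight play with `assert` and `any_errors_fatal`, a rolling play with `serial`, and a static `import_playbook`. Group, role, and playbook names are hypothetical.

```yaml
# site.yml (sketch)
---
- name: Pre-flight checks
  hosts: all
  any_errors_fatal: true        # stop every host if any host fails a check
  tasks:
    - name: Refuse unsupported operating systems
      ansible.builtin.assert:
        that: ansible_facts['os_family'] in ['Debian', 'RedHat']
        fail_msg: "Unsupported OS family: {{ ansible_facts['os_family'] }}"

- name: Rolling update of web servers
  hosts: webservers
  serial: "25%"                 # update a quarter of the group at a time
  max_fail_percentage: 0        # abort the rollout once any host in a batch fails
  roles:
    - role: role_name
      tags: [role_name]

- ansible.builtin.import_playbook: playbooks/maintenance.yml
```

A dry run is then `ansible-playbook site.yml --check --diff`.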

Security and Compliance

Secrets Management

Use Ansible Vault for encrypting sensitive data
Implement external secrets management (HashiCorp Vault, AWS Secrets Manager)
Rotate vault passwords regularly (90 days)
Use separate vault files per environment
Never commit unencrypted secrets to version control
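A common Ansible Vault pattern keeps encrypted values in vault.yml under a `vault_` prefix and references them from a plain variables file, so variables stay discoverable without decryption. A sketch with hypothetical names:

```yaml
# group_vars/all/vault.yml (encrypted with ansible-vault; shown decrypted)
vault_db_password: "replace-me"

---
# group_vars/all/common.yml (plain text, safe to grep)
db_password: "{{ vault_db_password }}"
```

Encrypt per environment with a labelled vault ID, for example `ansible-vault encrypt --vault-id prod@prompt group_vars/all/vault.yml`, and pass the same `--vault-id` to ansible-playbook.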

Audit and Compliance

Maintain audit logs of all automation runs
Implement change tracking and approval workflows
Regular security scans using Lynis, OpenSCAP
Compliance mapping documentation (CIS, NIST, PCI-DSS, HIPAA)
Automated compliance reporting

Access Control

Implement RBAC using Ansible Tower/AWX
Use separate service accounts per environment
Implement 4-eyes principle for production changes
Regular access reviews (quarterly)

Performance Optimization

Execution Optimization

Enable fact caching (Redis, JSON file)
Use gather_facts: false when facts not needed
Implement parallelism with forks parameter
Use strategy: free for independent tasks
Leverage async and poll for long-running tasks

Infrastructure Optimization

Use jump hosts/bastion hosts for network efficiency
Implement ControlMaster for SSH connection reuse
Use pipelining to reduce SSH operations
Set ansible_python_interpreter explicitly to skip Python interpreter discovery
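A sketch of several of these settings, assuming a hypothetical bastion host and job script. Connection options live in group_vars; pipelining requires that sudo does not enforce `requiretty`. Ansible's default ssh_args already enable ControlMaster/ControlPersist, so only ProxyJump is added here. Fact caching itself is configured in ansible.cfg (for example `fact_caching = jsonfile` with `fact_caching_connection`).

```yaml
# group_vars/all/connection.yml (sketch)
ansible_python_interpreter: /usr/bin/python3   # skip interpreter discovery
ansible_pipelining: true                       # fewer SSH operations per task
ansible_ssh_common_args: "-o ProxyJump=bastion.example.com"

---
# playbooks/maintenance.yml (sketch)
- name: Run maintenance on each host independently
  hosts: all
  strategy: free          # hosts do not wait for each other between tasks
  gather_facts: false     # this play needs no facts
  tasks:
    - name: Start a long-running job in the background
      ansible.builtin.command: /usr/local/bin/maintenance_job.sh   # hypothetical script
      async: 3600         # allow up to one hour
      poll: 0             # do not wait here; checked below
      register: maintenance_job
      changed_when: true

    - name: Wait for the job to finish
      ansible.builtin.async_status:
        jid: "{{ maintenance_job.ansible_job_id }}"
      register: maintenance_result
      until: maintenance_result.finished
      retries: 120
      delay: 30
```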

Version Control

Git Workflow

Use feature branches for development
Implement pull request review process
Tag releases with semantic versioning
Maintain CHANGELOG.md
Use pre-commit hooks for validation
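A sketch of a .pre-commit-config.yaml that runs a private-key check, yamllint, and ansible-lint before each commit. The `rev` values are example pins; pin them to current release tags.

```yaml
# .pre-commit-config.yaml (sketch)
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: detect-private-key   # block committed SSH/TLS private keys
  - repo: https://github.com/adrienverge/yamllint
    rev: v1.35.1
    hooks:
      - id: yamllint
  - repo: https://github.com/ansible/ansible-lint
    rev: v24.2.0
    hooks:
      - id: ansible-lint
```

Enable it once per clone with `pre-commit install`.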

Branch Strategy

main: Production-ready code
develop: Integration branch
feature/*: Feature development
hotfix/*: Emergency fixes

Document Version: 2.0
Last Updated: 2025-11-10
Review Cycle: Quarterly
