infra-automation/TASKS_WEEK_47.md
ansible f6d0ac0a9d Add comprehensive project improvement planning documents
Strategic and tactical planning documents for 12-week improvement
initiative across 7 key improvement areas.

IMPROVEMENT_PLAN.md (831 lines):
- Strategic 12-week improvement roadmap
- 7 improvement areas with priorities
- Infrastructure operations (P0/P1)
- Development quality & testing (P1/P2)
- Security & compliance (P1)
- Role development & expansion (P2/P3)
- Documentation & standards (P2/P3)
- Performance & scalability (P3)
- Detailed task breakdowns with time estimates
- Success metrics and KPIs
- Risk assessment and mitigation strategies
- Resource requirements (136 hours over 6 weeks)

TASKS_WEEK_47.md (832 lines):
- Detailed executable task plan for Week 47
- Day-by-day breakdown (Monday-Friday)
- Copy-paste ready bash commands
- Acceptance criteria for each task
- Rollback procedures
- Metrics tracking table
- Blocker identification

ASSESSMENT_SUMMARY.md (455 lines):
- Comprehensive project assessment
- Current state analysis (72/100 health score)
- Strengths and critical gaps identified
- Priority classification (P0-P3)
- Infrastructure status (67% connectivity)
- Role inventory (2 production-ready)
- Development quality gaps highlighted
- Next steps and immediate actions

Key Insights:
- Infrastructure: 67% operational (2/3 VMs reachable)
- Role compliance: 95% (excellent)
- Testing: 0% coverage (critical gap)
- CI/CD: Not implemented (critical gap)
- Documentation: 100% (excellent)

Planning Approach:
- Prioritized by impact and urgency
- Executable tasks with clear deliverables
- Time-boxed milestones
- Risk-aware with mitigation strategies
- Realistic resource estimates

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-11 07:47:37 +01:00


Week 47 - Executable Task Plan

Week: November 11-17, 2025
Focus: Critical Infrastructure Recovery & Security
Status: 🔴 ACTIVE


Overview

This week focuses on restoring full infrastructure operational status and addressing critical security concerns. All tasks are executable and have clear acceptance criteria.

Goals:

  • 100% VM connectivity (3/3 operational)
  • Git operations unblocked
  • Docker security baseline established
  • Documentation current

Daily Breakdown

Monday, Nov 11 (Day 1)

Task 1.1: Recover derp VM Connectivity [P0 - CRITICAL]

Priority: P0 - CRITICAL
Estimated Time: 3-4 hours
Status: 🔴 NOT STARTED

Issue:

  • derp VM (192.168.122.99) unreachable via SSH
  • Error: Permission denied (publickey,password)
  • Blocking system analysis and compliance verification

Execution Steps:

# Step 1: Access VM console
virsh console derp
# Login with root or available credentials

# Step 2: Verify ansible user exists
id ansible
# If the user does not exist: useradd -m -s /bin/bash ansible

# Step 3: Configure sudo
echo "ansible ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/ansible
chmod 0440 /etc/sudoers.d/ansible
visudo -cf /etc/sudoers.d/ansible  # syntax-check the drop-in before relying on it

# Step 4: Create .ssh directory
mkdir -p /home/ansible/.ssh
chmod 700 /home/ansible/.ssh
chown ansible:ansible /home/ansible/.ssh

# Step 5: Deploy SSH public key
# From control node:
cat ~/.ssh/id_rsa.pub
# Copy and paste into derp:/home/ansible/.ssh/authorized_keys

# On derp:
vi /home/ansible/.ssh/authorized_keys
# Paste public key
chmod 600 /home/ansible/.ssh/authorized_keys
chown ansible:ansible /home/ansible/.ssh/authorized_keys

# Step 6: Verify SSH configuration
grep -E "PubkeyAuthentication|PasswordAuthentication" /etc/ssh/sshd_config
sshd -t                  # validate the configuration before restarting
systemctl restart sshd   # the unit is named "ssh" on Debian/Ubuntu

# Step 7: Test from control node
ansible derp -m ping
ansible derp -m setup -a "filter=ansible_distribution*"

Acceptance Criteria:

  • ansible derp -m ping returns SUCCESS
  • Can execute playbooks against derp
  • Passwordless sudo works
  • SSH key authentication functional
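
Key authentication commonly fails when the `.ssh` directory or `authorized_keys` has looser permissions than Steps 4 and 5 set. The sketch below checks those two modes from the console session; `check_ssh_perms` is a hypothetical helper name, and it assumes GNU `stat` (the `-c '%a'` form).

```shell
# Hypothetical helper: confirm the modes set in Steps 4 and 5.
# Assumes GNU coreutils stat (-c '%a' prints the octal mode).
check_ssh_perms() {
  local ssh_dir="$1" rc=0 dir_mode file_mode
  dir_mode=$(stat -c '%a' "$ssh_dir") || return 2
  file_mode=$(stat -c '%a' "$ssh_dir/authorized_keys") || return 2
  if [ "$dir_mode" != "700" ]; then
    echo "unexpected mode on $ssh_dir: $dir_mode (want 700)"
    rc=1
  fi
  if [ "$file_mode" != "600" ]; then
    echo "unexpected mode on authorized_keys: $file_mode (want 600)"
    rc=1
  fi
  return "$rc"
}

# Usage on derp (as root):
#   check_ssh_perms /home/ansible/.ssh && echo "permissions OK"
```

The check covers modes only; ownership still needs the `chown` calls from the steps above.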

Deliverables:

  • derp VM accessible via Ansible
  • Recovery procedure documented in docs/runbooks/vm-recovery.md

Rollback Plan:

  • Console access remains available if SSH fails
  • Can rebuild VM using deploy_linux_vm role if unrecoverable

Task 1.2: Fix Git Push Permission Issue [P0 - CRITICAL]

Priority: P0 - CRITICAL
Estimated Time: 1-2 hours
Status: 🔴 NOT STARTED

Issue:

  • Git push blocked by Gitea pre-receive hook
  • Blocking version control and collaboration

Execution Steps:

# Step 1: Attempt push with verbose output
GIT_TRACE=1 GIT_SSH_COMMAND="ssh -vvv" git push origin master 2>&1 | tee git-push-debug.log

# Step 2: Check repository permissions on Gitea
# Access Gitea web UI: https://git.mymx.me
# Login as ansible@mymx.me
# Check repository settings → Collaborators & permissions

# Step 3: Verify SSH key registered
# Gitea UI → Settings → SSH Keys
# Ensure control node's public key is registered

# Step 4: Check pre-receive hooks on server
ssh ansible@cow.mymx.me
find /path/to/gitea/repositories -name "pre-receive" -exec ls -la {} \;

# Step 5: Review hook script
# Git hooks live in the hooks/ directory of the bare repository
cat /path/to/gitea/repositories/ansible/infrastructure.git/hooks/pre-receive
# Check for permission/ownership requirements

# Step 6: Test with minimal commit
echo "# Test" > TEST.md
git add TEST.md
git commit -m "Test commit for debugging git push"
git push origin master

# Step 7: If successful, remove test file
git rm TEST.md
git commit -m "Remove test file"
git push origin master

Acceptance Criteria:

  • git push succeeds without errors
  • Can push to master branch
  • Pre-receive hooks pass
  • Remote repository updated
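
The debug log from Step 1 can be long once `ssh -vvv` is enabled. As a rough triage aid, the sketch below looks for two well-known messages in `git-push-debug.log`: an SSH key rejection and a pre-receive hook rejection. The function name and the category labels are hypothetical.

```shell
# Hypothetical triage helper for the log captured in Step 1.
# Prints one label: ssh-auth, hook-rejected, or unknown.
classify_push_log() {
  local log="$1"
  if grep -q 'Permission denied (publickey' "$log"; then
    # SSH refused the key before git ever talked to the server
    echo "ssh-auth"
  elif grep -q 'pre-receive hook declined' "$log"; then
    # The server accepted the connection but a hook rejected the push
    echo "hook-rejected"
  else
    echo "unknown"
  fi
}

# Usage:
#   classify_push_log git-push-debug.log
```

An `ssh-auth` result points at Step 3 (key registration), while `hook-rejected` points at Steps 4 and 5.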

Deliverables:

  • Git push operational
  • Git workflow documented
  • Issue root cause identified

Rollback Plan:

  • Local repository remains intact
  • Can work locally until resolved
  • Can use alternative git hosting if needed

Tuesday, Nov 12 (Day 2)

Task 2.1: Execute System Info Against derp [P1 - HIGH]

Priority: P1 - HIGH
Estimated Time: 30 minutes
Status: 🟡 DEPENDS ON: Task 1.1
Prerequisites: derp connectivity restored

Execution Steps:

# Step 1: Test connectivity
ansible derp -m ping

# Step 2: Run system info playbook
ansible-playbook playbooks/gather_system_info.yml --limit derp

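# Step 3: Review collected data
# Ad-hoc debug calls do not gather facts, so ask the setup module for the FQDN
fqdn=$(ansible derp -m setup -a "filter=ansible_fqdn" | grep -oP '"ansible_fqdn": "\K[^"]+' | head -1)
cat "stats/machines/${fqdn}/summary.txt"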

# Step 4: Analyze compliance gaps
# Compare against CLAUDE.md requirements
# Check for LVM configuration
# Check for swap configuration
# Check for QEMU agent

# Step 5: Update SYSTEM_ANALYSIS_AND_REMEDIATION.md
# Add derp section with findings

Acceptance Criteria:

  • System info collected successfully
  • JSON and summary files created
  • Compliance gaps identified
  • Remediation tasks added to TODO.md

Deliverables:

  • stats/machines/derp.*/system_info.json
  • stats/machines/derp.*/summary.txt
  • Updated SYSTEM_ANALYSIS_AND_REMEDIATION.md with derp findings
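
The two file deliverables can be verified mechanically. This is a minimal sketch, assuming the `stats/machines/<host>*/` layout used in the steps above; `check_machine_stats` is a hypothetical helper name.

```shell
# Hypothetical helper: confirm that system_info.json and summary.txt exist
# and are non-empty for a host under the given stats root.
check_machine_stats() {
  local root="$1" host="$2" dir f found=0 missing=0
  for dir in "$root"/"$host"*/; do
    [ -d "$dir" ] || continue   # the glob matched nothing
    found=1
    for f in system_info.json summary.txt; do
      if [ ! -s "$dir$f" ]; then
        echo "missing or empty: $dir$f"
        missing=1
      fi
    done
  done
  if [ "$found" -ne 1 ]; then
    echo "no stats directory for $host under $root"
    return 1
  fi
  return "$missing"
}

# Usage from the repository root:
#   check_machine_stats stats/machines derp
```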

Task 2.2: Install QEMU Guest Agent on mymx [P1 - HIGH]

Priority: P1 - HIGH
Estimated Time: 30-45 minutes
Status: 🔴 NOT STARTED

Issue:

  • mymx missing QEMU agent functionality
  • Cannot perform graceful shutdowns via libvirt
  • Limited resource monitoring

Execution Steps:

# Step 1: Verify VM has virtio-serial channel
virsh dumpxml mymx | grep -A5 "channel type"

# Step 2: Add channel if missing
virsh edit mymx
# Add inside <devices> section:
#   <channel type='unix'>
#     <target type='virtio' name='org.qemu.guest_agent.0'/>
#     <address type='virtio-serial' controller='0' bus='0' port='1'/>
#   </channel>

# Step 3: Verify controller exists
virsh dumpxml mymx | grep virtio-serial

# Step 4: If controller missing, add:
#   <controller type='virtio-serial' index='0'>
#     <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
#   </controller>

# Step 5: Restart VM if XML changed
virsh shutdown mymx
# Wait for graceful shutdown (may timeout without agent)
virsh destroy mymx  # Force if timeout
virsh start mymx

# Step 6: Execute playbook
ansible-playbook playbooks/install_qemu_agent.yml --limit mymx

# Step 7: Verify agent is running
virsh qemu-agent-command mymx '{"execute":"guest-ping"}'
virsh domifaddr mymx --source agent

# Step 8: Test guest commands
ansible mymx -m setup -a "filter=ansible_virtualization*"
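
Steps 1 and 5 involve two manual judgments: whether the guest agent channel is already present, and how long to wait before forcing the VM off. The sketch below is one way to script both; `has_guest_agent_channel`, `wait_until`, and `vm_is_off` are hypothetical helper names, and the 60-second timeout in the usage comment is an arbitrary assumption.

```shell
# Hypothetical helper: read domain XML on stdin and succeed if the
# guest agent channel target is present.
has_guest_agent_channel() {
  grep -Eq "name=['\"]org\.qemu\.guest_agent\.0['\"]"
}

# Hypothetical helper: run a command once per second until it succeeds
# or the timeout (in seconds) is reached.
wait_until() {
  local timeout="$1" waited=0
  shift
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Usage on the hypervisor:
#   virsh dumpxml mymx | has_guest_agent_channel || echo "channel missing"
#   vm_is_off() { [ "$(virsh domstate "$1")" = "shut off" ]; }
#   virsh shutdown mymx
#   wait_until 60 vm_is_off mymx || virsh destroy mymx
```

Polling the domain state avoids forcing the VM off while a graceful shutdown is still in progress.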

Acceptance Criteria:

  • virtio-serial channel configured in VM XML
  • qemu-guest-agent package installed
  • Service running and enabled
  • Agent responds to libvirt queries
  • Can retrieve IP via guest agent

Deliverables:

  • mymx QEMU agent operational
  • Can use virsh qemu-agent-command
  • Graceful shutdowns possible

Rollback Plan:

  • Remove channel from XML if issues
  • Agent package can be removed: apt remove qemu-guest-agent

Wednesday, Nov 13 (Day 3)

Task 3.1: Configure Swap on derp [P1 - HIGH]

Priority: P1 - HIGH
Estimated Time: 15 minutes
Status: 🟡 DEPENDS ON: Task 1.1
Prerequisites: derp connectivity restored

Execution Steps:

# Step 1: Execute swap configuration playbook
ansible-playbook playbooks/configure_swap.yml --limit derp

# Step 2: Verify swap is active
ansible derp -m shell -a "swapon --show"
ansible derp -m shell -a "free -h | grep -i swap"

# Step 3: Verify persistence
ansible derp -m shell -a "grep swap /etc/fstab"

# Step 4: Test reboot persistence (optional)
# virsh reboot derp
# Wait 1 minute
# ansible derp -m shell -a "swapon --show"

# Step 5: Update compliance metrics
# Update SUMMARY.md: derp compliance score

Acceptance Criteria:

  • 2GB swap configured
  • Swap active and persistent
  • /etc/fstab entry correct
  • Survives reboot
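
A plain `grep swap /etc/fstab` also matches commented-out lines, so it can report persistence that is not really there. The sketch below only accepts an active entry whose filesystem type field is `swap`; `fstab_has_swap` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the given fstab file has an
# uncommented entry whose third field (filesystem type) is "swap".
fstab_has_swap() {
  awk '
    /^[ \t]*#/ { next }              # skip comment lines
    $3 == "swap" { found = 1 }
    END { exit !found }
  ' "$1"
}

# Usage on derp:
#   fstab_has_swap /etc/fstab && echo "swap entry is persistent"
```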

Deliverables:

  • derp has compliant swap configuration
  • Compliance score updated

Task 3.2: Docker Security Audit Playbook - Part 1 [P1 - HIGH]

Priority: P1 - HIGH
Estimated Time: 3-4 hours
Status: 🔴 NOT STARTED

Objective: Create comprehensive Docker security audit playbook

Execution Steps:

# Step 1: Create playbook structure
mkdir -p playbooks/roles/audit_docker
cd playbooks

# Step 2: Create playbooks/audit_docker.yml
cat > audit_docker.yml <<'EOF'
---
- name: Docker Security Audit
  hosts: all
  become: true
  gather_facts: true

  vars:
    # Anchor to the repository root so reports land in stats/docker_audits/
    # no matter which directory ansible-playbook is started from.
    audit_output_dir: "{{ playbook_dir }}/../stats/docker_audits"

  tasks:
    # Docker --format strings use Go-template {{ }} delimiters, which Jinja2
    # would try to render; each one is wrapped in raw/endraw tags below.
    # Read-only commands set check_mode: false so their results exist under --check.
    - name: Check if Docker is installed
      ansible.builtin.command: docker --version
      register: docker_version
      failed_when: false
      changed_when: false
      check_mode: false

    - name: Skip audit if Docker not installed
      ansible.builtin.meta: end_host
      when: docker_version.rc != 0

    - name: Create audit output directory
      ansible.builtin.file:
        path: "{{ audit_output_dir }}/{{ inventory_hostname }}"
        state: directory
        mode: '0755'
      delegate_to: localhost
      become: false  # write reports as the invoking user, not root

    - name: Audit Docker daemon configuration
      ansible.builtin.slurp:
        src: /etc/docker/daemon.json
      register: docker_daemon_config
      failed_when: false

    - name: Check Docker daemon security options
      ansible.builtin.shell: |
        docker info --format '{% raw %}{{ .SecurityOptions }}{% endraw %}'
      register: docker_security_options
      changed_when: false
      check_mode: false

    - name: List running containers
      ansible.builtin.command: docker ps --format "{% raw %}table {{.ID}}\t{{.Names}}\t{{.Status}}{% endraw %}"
      register: docker_containers
      changed_when: false
      check_mode: false

    # docker inspect fails when no containers are running; failed_when: false
    # lets the audit continue with empty output in that case.
    - name: Audit container privileges
      ansible.builtin.shell: |
        docker inspect $(docker ps -q) --format '{% raw %}{{.Name}}: Privileged={{.HostConfig.Privileged}}{% endraw %}'
      register: container_privileges
      changed_when: false
      failed_when: false
      check_mode: false

    - name: Check user namespace remapping
      ansible.builtin.shell: |
        docker info --format '{% raw %}{{ .SecurityOptions }}{% endraw %}' | grep -i userns
      register: userns_check
      changed_when: false
      failed_when: false
      check_mode: false

    - name: Audit AppArmor/SELinux profiles
      ansible.builtin.shell: |
        docker inspect $(docker ps -q) --format '{% raw %}{{.Name}}: AppArmor={{.AppArmorProfile}} SELinux={{.HostConfig.SecurityOpt}}{% endraw %}'
      register: security_profiles
      changed_when: false
      failed_when: false
      check_mode: false

    - name: Check network modes
      ansible.builtin.shell: |
        docker inspect $(docker ps -q) --format '{% raw %}{{.Name}}: NetworkMode={{.HostConfig.NetworkMode}}{% endraw %}'
      register: network_modes
      changed_when: false
      failed_when: false
      check_mode: false

    - name: Check resource limits
      ansible.builtin.shell: |
        docker inspect $(docker ps -q) --format '{% raw %}{{.Name}}: Memory={{.HostConfig.Memory}} CPU={{.HostConfig.CpuShares}}{% endraw %}'
      register: resource_limits
      changed_when: false
      failed_when: false
      check_mode: false

    - name: Check for exposed privileged ports
      ansible.builtin.shell: |
        docker ps --format "{% raw %}{{.Names}}: {{.Ports}}{% endraw %}"
      register: exposed_ports
      changed_when: false
      check_mode: false

    - name: Generate audit report
      ansible.builtin.template:
        src: templates/docker_audit_report.j2
        dest: "{{ audit_output_dir }}/{{ inventory_hostname }}/docker_audit_{{ ansible_date_time.epoch }}.txt"
        mode: '0644'
      delegate_to: localhost
      become: false

    - name: Display audit summary
      ansible.builtin.debug:
        msg:
          - "=== Docker Security Audit Summary ==="
          - "Host: {{ inventory_hostname }}"
          - "Docker Version: {{ docker_version.stdout }}"
          # The table format prints a header row, so subtract it from the count
          - "Running Containers: {{ (docker_containers.stdout_lines | length) - 1 }}"
          - "Security Options: {{ docker_security_options.stdout }}"
          - "Report saved to: {{ audit_output_dir }}/{{ inventory_hostname }}/"
EOF

# Step 3: Create template for audit report
mkdir -p templates
cat > templates/docker_audit_report.j2 <<'EOF'
Docker Security Audit Report
========================================
Host: {{ inventory_hostname }}
Date: {{ ansible_date_time.iso8601 }}
Auditor: Ansible Automation

System Information
------------------
Hostname: {{ ansible_hostname }}
OS: {{ ansible_distribution }} {{ ansible_distribution_version }}
Kernel: {{ ansible_kernel }}

Docker Information
------------------
Version: {{ docker_version.stdout }}
Security Options: {{ docker_security_options.stdout }}

Running Containers
------------------
{{ docker_containers.stdout }}

Container Privilege Audit
--------------------------
{{ container_privileges.stdout | default('No containers running', true) }}

User Namespace Remapping
-------------------------
{{ userns_check.stdout | default('Not configured', true) }}

Security Profiles (AppArmor/SELinux)
-------------------------------------
{{ security_profiles.stdout | default('No containers running', true) }}

Network Modes
-------------
{{ network_modes.stdout | default('No containers running', true) }}

Resource Limits
---------------
{{ resource_limits.stdout | default('No containers running', true) }}

Exposed Ports
-------------
{{ exposed_ports.stdout }}

Security Findings
-----------------
{% if container_privileges.stdout is defined %}
  {% if 'Privileged=true' in container_privileges.stdout %}
⚠️  CRITICAL: Privileged containers detected!
  {% endif %}
{% endif %}

{% if network_modes.stdout is defined %}
  {% if 'NetworkMode=host' in network_modes.stdout %}
⚠️  WARNING: Containers using host network mode detected!
  {% endif %}
{% endif %}

{% if 'userns' not in (userns_check.stdout | default('')) %}
⚠️  WARNING: User namespace remapping not configured!
{% endif %}

Recommendations
---------------
1. Disable privileged mode unless absolutely necessary
2. Use bridge network mode instead of host mode
3. Configure user namespace remapping
4. Set resource limits on all containers
5. Use AppArmor/SELinux profiles
6. Regular image vulnerability scanning
7. Minimize exposed ports

EOF
chmod 644 templates/docker_audit_report.j2

Acceptance Criteria:

  • playbooks/audit_docker.yml created
  • Template file created
  • Playbook syntax valid
  • Can run in check mode

Deliverables:

  • playbooks/audit_docker.yml
  • playbooks/templates/docker_audit_report.j2
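
Docker `--format` strings use Go-template `{{ }}` delimiters, which Ansible will try to render as Jinja2 unless they are escaped, for example with raw tags. `--syntax-check` may not surface this because templating happens at run time. The sketch below is a heuristic pre-flight check; `find_unescaped_go_templates` is a hypothetical helper, and it assumes escaped lines carry a `{% raw %}` tag on the same line.

```shell
# Hypothetical lint: print (with line numbers) lines that contain a
# Go-template field such as {{.Name}} or {{ .SecurityOptions }} without
# a raw tag on the same line. Jinja2 variables never start with a dot,
# so a leading dot is a reasonable signal for a Go template.
# Exit status is 0 when offending lines were found, 1 otherwise.
find_unescaped_go_templates() {
  grep -nE '\{\{ ?\.[A-Za-z]' "$1" | grep -v '{% raw %}'
}

# Usage from the repository root:
#   if find_unescaped_go_templates playbooks/audit_docker.yml; then
#     echo "escape the lines above before running the playbook"
#   fi
```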

Thursday, Nov 14 (Day 4)

Task 4.1: Execute Docker Security Audit [P1 - HIGH]

Priority: P1 - HIGH
Estimated Time: 1-2 hours
Status: 🟡 DEPENDS ON: Task 3.2
Prerequisites: Audit playbook created

Execution Steps:

# Step 1: Test playbook syntax
ansible-playbook playbooks/audit_docker.yml --syntax-check

# Step 2: Run in check mode
ansible-playbook playbooks/audit_docker.yml --check

# Step 3: Execute against pihole (has Docker)
ansible-playbook playbooks/audit_docker.yml --limit pihole

# Step 4: Review audit report
# The directory is named after the inventory hostname, which may or may not be an FQDN
cat stats/docker_audits/pihole*/docker_audit_*.txt

# Step 5: Analyze findings
# Document critical issues
# Create remediation tasks

# Step 6: Execute against all hosts
ansible-playbook playbooks/audit_docker.yml

# Step 7: Create summary document
# Consolidate findings
# Prioritize remediation actions

Acceptance Criteria:

  • Audit completed successfully on pihole
  • Audit report generated
  • Critical findings documented
  • Remediation tasks created

Deliverables:

  • Audit reports in stats/docker_audits/
  • Summary of findings
  • Remediation plan for Docker security
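
For the summary of findings, the `CRITICAL:` and `WARNING:` markers written by the report template can be counted per report. This is a minimal sketch, assuming the `stats/docker_audits/<host>/docker_audit_*.txt` layout from the playbook; `summarize_audit_findings` is a hypothetical helper name.

```shell
# Hypothetical helper: print one line per audit report with the number
# of CRITICAL and WARNING findings it contains.
summarize_audit_findings() {
  local root="$1" report crit warn
  for report in "$root"/*/docker_audit_*.txt; do
    [ -f "$report" ] || continue   # the glob matched nothing
    # grep -c prints 0 but exits non-zero when nothing matches
    crit=$(grep -c 'CRITICAL:' "$report" || true)
    warn=$(grep -c 'WARNING:' "$report" || true)
    printf '%s critical=%s warning=%s\n' "$report" "$crit" "$warn"
  done
}

# Usage from the repository root:
#   summarize_audit_findings stats/docker_audits
```

Hosts with a non-zero critical count are natural candidates for the first remediation tasks.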

Task 4.2: Update CHANGELOG.md [P2 - MEDIUM]

Priority: P2 - MEDIUM
Estimated Time: 1 hour
Status: 🔴 NOT STARTED

Objective: Document Week 46 achievements

Execution Steps:

# Edit CHANGELOG.md and add Week 46 section

Additions to CHANGELOG.md:

## [0.2.0] - 2025-11-11

### Added - Week 46 Achievements

#### Infrastructure Improvements
- System analysis and remediation framework (SYSTEM_ANALYSIS_AND_REMEDIATION.md)
- Automated remediation playbooks:
  - playbooks/configure_swap.yml (automated swap configuration)
  - playbooks/install_qemu_agent.yml (QEMU guest agent deployment)
- SSH jump host / bastion documentation (543 lines)
- Dynamic inventory migration (removed static inventory files)

#### Role Compliance Improvements
- deploy_linux_vm role: 70% → 95% CLAUDE.md compliance
  - Added comprehensive error handling (block/rescue/always)
  - Complete handler suite (15 handlers)
  - Vault variable integration for secrets
  - CHANGELOG.md and ROADMAP.md
  - Enhanced documentation (899 lines)
- system_info role: 70% → 95% CLAUDE.md compliance
  - Added validation tasks
  - Health check implementation
  - CHANGELOG.md and ROADMAP.md
  - Production-ready status

#### Documentation
- Project tracking documents:
  - TODO.md (85 lines)
  - SUMMARY.md (95 lines)
  - ROADMAP.md updates (537 lines)
- Network access patterns documentation
- Role-specific documentation expansion
- Cheatsheet updates

### Changed - Week 46
- Removed static inventory files (inventory-debian-vm.ini, etc.)
- Improved SSH connectivity (mymx restored from 0% to 90% compliance)
- Fixed Jinja2 template conflicts in Docker/Podman detection

### Fixed - Week 46
- Critical playbook execution errors in system_info role
- Block-level failed_when syntax errors
- SSH authentication issues on mymx
- GSSAPI SSH warnings

### Infrastructure Status - Week 46
- pihole: 60% → 75% compliance (+15%)
  - ✅ Swap configured (2GB)
  - ✅ QEMU agent operational
  - ⏳ LVM migration pending
- mymx: 0% → 90% compliance (+90%)
  - ✅ SSH access restored
  - ✅ LVM configured
  - ✅ Swap configured
  - ⏳ QEMU agent needs channel configuration
- derp: Unreachable (pending recovery)

### Metrics - Week 46
- **Time to Resolution:** <3 minutes for critical remediations
  - Swap configuration: 12 seconds
  - QEMU agent installation: 7 seconds
- **Documentation Growth:** 2,100+ lines added
- **Role Compliance:** +25% improvement average
- **Infrastructure Connectivity:** 67% (2/3 VMs operational)

Acceptance Criteria:

  • CHANGELOG.md updated with Week 46 achievements
  • Version 0.2.0 tagged
  • All improvements documented
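
To confirm the first criterion, the section for a given version can be printed back out of the changelog for review. The sketch below assumes version headings of the form `## [x.y.z]`, as in the additions above; `changelog_section` is a hypothetical helper name.

```shell
# Hypothetical helper: print the section of a changelog that starts at
# the "## [<version>]" heading and ends before the next "## [" heading.
# Empty output means the version has no section yet.
changelog_section() {
  awk -v head="## [$2]" '
    index($0, "## [") == 1 { in_sec = (index($0, head) == 1) }
    in_sec { print }
  ' "$1"
}

# Usage from the repository root:
#   changelog_section CHANGELOG.md 0.2.0
```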

Friday, Nov 15 (Day 5)

Task 5.1: Fix Ansible Galaxy Configuration [P2 - MEDIUM]

Priority: P2 - MEDIUM
Estimated Time: 30 minutes
Status: 🔴 NOT STARTED

Issue:

ERROR! No setting was provided for required configuration plugin_type: galaxy_server plugin: automation_hub setting: url

Execution Steps:

# Step 1: Review current ansible.cfg
grep -A10 "galaxy_server" ansible.cfg

# Step 2: Fix galaxy_server configuration
# Edit ansible.cfg and remove/comment out incomplete sections

# Step 3: Test configuration
ansible-galaxy collection list

# Step 4: Verify collections are installed
ansible-galaxy collection install -r collections/requirements.yml --force

# Step 5: List installed collections
ansible-galaxy collection list | head -20

Fix for ansible.cfg:

[galaxy]
server_list = galaxy

[galaxy_server.galaxy]
url = https://galaxy.ansible.com

# Remove or comment out incomplete automation_hub section
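
The error above appears when a server is configured without a `url` setting. The sketch below checks that every name in `server_list` has a matching `[galaxy_server.<name>]` section with a `url`; `check_galaxy_servers` is a hypothetical helper, and it assumes a simple INI layout with section headers at the start of the line.

```shell
# Hypothetical check: every entry in [galaxy] server_list needs a
# [galaxy_server.<name>] section that defines url.
check_galaxy_servers() {
  local cfg="$1" rc=0 name list
  # Collect the comma-separated server_list value from the [galaxy] section
  list=$(awk -F= '
    /^\[/ { in_galaxy = ($0 == "[galaxy]") }
    in_galaxy && $1 ~ /^[ \t]*server_list[ \t]*$/ { print $2 }
  ' "$cfg" | tr ',' ' ')
  for name in $list; do
    if ! awk -v sec="[galaxy_server.${name}]" '
      /^\[/ { in_sec = ($0 == sec) }
      in_sec && /^[ \t]*url[ \t]*=/ { found = 1 }
      END { exit !found }
    ' "$cfg"; then
      echo "missing url for galaxy server: ${name}"
      rc=1
    fi
  done
  return "$rc"
}

# Usage from the repository root:
#   check_galaxy_servers ansible.cfg
```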

Acceptance Criteria:

  • ansible-galaxy commands work without errors
  • Can list installed collections
  • Can install new collections

Deliverables:

  • ansible.cfg corrected
  • Collections verified

Task 5.2: Weekly Review and Planning [P2 - MEDIUM]

Priority: P2 - MEDIUM
Estimated Time: 1-2 hours
Status: 🔴 NOT STARTED

Execution Steps:

# Step 1: Review completed tasks
# Check TODO.md completion status
# Verify all Week 47 P0/P1 tasks complete

# Step 2: Update metrics in SUMMARY.md
# VM connectivity: should be 3/3 = 100%
# Compliance scores updated
# New playbooks added to count

# Step 3: Update TODO.md
# Move completed items to done
# Add new items from audit findings
# Plan Week 48 tasks

# Step 4: Git commit and push (if unblocked)
git add CHANGELOG.md TODO.md SUMMARY.md IMPROVEMENT_PLAN.md TASKS_WEEK_47.md
git commit -m "Week 47 completion: Infrastructure recovery and security audit"
git push origin master

# Step 5: Create Week 48 task plan
# Copy this file structure
# Update tasks based on IMPROVEMENT_PLAN.md Week 48 section

Acceptance Criteria:

  • All P0/P1 tasks completed or documented as blocked
  • Metrics updated
  • Week 48 plan created
  • Changes committed to git

Deliverables:

  • Updated TODO.md
  • Updated SUMMARY.md
  • TASKS_WEEK_48.md created

Success Criteria

Must Complete (P0 - Critical)

  • derp VM connectivity restored
  • Git push permissions fixed

Should Complete (P1 - High Priority)

  • System info collected from all 3 VMs
  • QEMU agent installed on mymx
  • Swap configured on derp
  • Docker security audit playbook created
  • Docker security audit executed

Nice to Have (P2 - Medium Priority)

  • CHANGELOG.md updated
  • Ansible Galaxy configuration fixed
  • Weekly review completed
  • Week 48 plan created

Metrics Tracking

| Metric | Start of Week | Target | Current |
|---|---|---|---|
| VM Connectivity | 67% (2/3) | 100% (3/3) | ___ |
| Git Operations | 0% (blocked) | 100% | ___ |
| QEMU Agent Coverage | 33% (1/3) | 67% (2/3) | ___ |
| Swap Coverage | 67% (2/3) | 100% (3/3) | ___ |
| Docker Security Audit | 0% | 100% | ___ |
| Documentation Current | 90% | 100% | ___ |
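
The coverage figures are simple ratios rounded to the nearest whole percent, for example 2 of 3 VMs gives 67%. A small sketch for filling in the Current column consistently; `pct` is a hypothetical helper name.

```shell
# Hypothetical helper: percentage of count out of total, rounded to the
# nearest integer using integer arithmetic only.
pct() {
  local count="$1" total="$2"
  if [ "$total" -le 0 ]; then
    echo "total must be positive" >&2
    return 1
  fi
  echo $(( (100 * count + total / 2) / total ))
}

# Usage:
#   echo "VM Connectivity: $(pct 3 3)% (3/3)"
```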

Blockers and Risks

Current Blockers

  • None at start of week

Potential Risks

  1. derp VM console access issues

    • Mitigation: Can rebuild VM if unrecoverable
  2. Git push issue requires Gitea server access

    • Mitigation: Can work locally, push later
  3. Docker audit findings may require extensive remediation

    • Mitigation: Document findings, plan Week 48 remediation
  4. Time constraints

    • Mitigation: Focus on P0/P1, defer P2 if needed

Daily Standup Template

What was completed yesterday:

What will be done today:

Blockers:

Updated Metrics:



Week Start: 2025-11-11 (Monday)
Week End: 2025-11-17 (Sunday)
Review Date: 2025-11-15 (Friday)
Next Planning: 2025-11-18 (Monday) - Week 48