# Week 47 - Executable Task Plan

**Week:** November 11-17, 2025
**Focus:** Critical Infrastructure Recovery & Security
**Status:** 🔴 ACTIVE

---

## Overview

This week focuses on restoring full infrastructure operational status and addressing critical security concerns. All tasks are executable and have clear acceptance criteria.

**Goals:**
- ✅ 100% VM connectivity (3/3 operational)
- ✅ Git operations unblocked
- ✅ Docker security baseline established
- ✅ Documentation current

---

## Daily Breakdown

### Monday, Nov 11 (Day 1)

#### Task 1.1: Recover derp VM Connectivity [P0 - CRITICAL]

**Priority:** P0 - CRITICAL
**Estimated Time:** 3-4 hours
**Status:** 🔴 NOT STARTED

**Issue:**
- derp VM (192.168.122.99) unreachable via SSH
- Error: `Permission denied (publickey,password)`
- Blocking system analysis and compliance verification

**Execution Steps:**

```bash
# Step 1: Access VM console
virsh console derp
# Login with root or available credentials

# Step 2: Verify ansible user exists; create it only if missing
id ansible || useradd -m -s /bin/bash ansible

# Step 3: Configure sudo
echo "ansible ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/ansible
chmod 0440 /etc/sudoers.d/ansible

# Step 4: Create .ssh directory
mkdir -p /home/ansible/.ssh
chmod 700 /home/ansible/.ssh
chown ansible:ansible /home/ansible/.ssh

# Step 5: Deploy SSH public key
# From control node:
cat ~/.ssh/id_rsa.pub
# Copy and paste into derp:/home/ansible/.ssh/authorized_keys

# On derp:
vi /home/ansible/.ssh/authorized_keys
# Paste public key
chmod 600 /home/ansible/.ssh/authorized_keys
chown ansible:ansible /home/ansible/.ssh/authorized_keys

# Step 6: Verify SSH configuration
grep -E "PubkeyAuthentication|PasswordAuthentication" /etc/ssh/sshd_config
systemctl restart sshd

# Step 7: Test from control node
ansible derp -m ping
ansible derp -m setup -a "filter=ansible_distribution*"
```

**Acceptance Criteria:**
- [ ] `ansible derp -m ping` returns SUCCESS
- [ ] Can execute playbooks against derp
- [ ] Passwordless sudo works
- [ ] SSH key authentication functional

**Deliverables:**
- [ ] derp VM accessible via Ansible
- [ ] Recovery procedure documented in docs/runbooks/vm-recovery.md

**Rollback Plan:**
- Console access remains available if SSH fails
- Can rebuild VM using deploy_linux_vm role if unrecoverable

---

#### Task 1.2: Fix Git Push Permission Issue [P0 - CRITICAL]

**Priority:** P0 - CRITICAL
**Estimated Time:** 1-2 hours
**Status:** 🔴 NOT STARTED

**Issue:**
- Git push blocked by Gitea pre-receive hook
- Blocking version control and collaboration

**Execution Steps:**

```bash
# Step 1: Attempt push with verbose output
GIT_TRACE=1 GIT_SSH_COMMAND="ssh -vvv" git push origin master 2>&1 | tee git-push-debug.log

# Step 2: Check repository permissions on Gitea
# Access Gitea web UI: https://git.mymx.me
# Login as ansible@mymx.me
# Check repository settings → Collaborators & permissions

# Step 3: Verify SSH key registered
# Gitea UI → Settings → SSH Keys
# Ensure control node's public key is registered

# Step 4: Check pre-receive hooks on server
# The remote command is single-quoted so that \; reaches find on the server
ssh ansible@cow.mymx.me 'find /path/to/gitea/repositories -name "pre-receive" -exec ls -la {} \;'

# Step 5: Review hook script (on the server)
ssh ansible@cow.mymx.me 'cat /path/to/gitea/repositories/ansible/infrastructure/pre-receive'
# Check for permission/ownership requirements

# Step 6: Test with minimal commit
echo "# Test" > TEST.md
git add TEST.md
git commit -m "Test commit for debugging git push"
git push origin master

# Step 7: If successful, remove test file
git rm TEST.md
git commit -m "Remove test file"
git push origin master
```

**Acceptance Criteria:**
- [ ] `git push` succeeds without errors
- [ ] Can push to master branch
- [ ] Pre-receive hooks pass
- [ ] Remote repository updated

**Deliverables:**
- [ ] Git push operational
- [ ] Git workflow documented
- [ ] Issue root cause identified

**Rollback Plan:**
- Local repository remains intact
- Can work locally until resolved
- Can use alternative git hosting if needed

---

### Tuesday, Nov 12 (Day 2)

#### Task 2.1: Execute System Info Against derp [P1 - HIGH]

**Priority:** P1 - HIGH
**Estimated Time:** 30 minutes
**Status:** 🟡 DEPENDS ON: Task 1.1

**Prerequisites:** derp connectivity restored

**Execution Steps:**

```bash
# Step 1: Test connectivity
ansible derp -m ping

# Step 2: Run system info playbook
ansible-playbook playbooks/gather_system_info.yml --limit derp

# Step 3: Review collected data
# Facts come from the setup module; an ad-hoc debug call does not gather them
FQDN=$(ansible derp -m setup -a "filter=ansible_fqdn" | grep -oP '"ansible_fqdn": "\K[^"]+' | head -1)
cat "stats/machines/${FQDN}/summary.txt"

# Step 4: Analyze compliance gaps
# Compare against CLAUDE.md requirements
# Check for LVM configuration
# Check for swap configuration
# Check for QEMU agent

# Step 5: Update SYSTEM_ANALYSIS_AND_REMEDIATION.md
# Add derp section with findings
```

**Acceptance Criteria:**
- [ ] System info collected successfully
- [ ] JSON and summary files created
- [ ] Compliance gaps identified
- [ ] Remediation tasks added to TODO.md

**Deliverables:**
- [ ] stats/machines/derp.*/system_info.json
- [ ] stats/machines/derp.*/summary.txt
- [ ] Updated SYSTEM_ANALYSIS_AND_REMEDIATION.md with derp findings

---

#### Task 2.2: Install QEMU Guest Agent on mymx [P1 - HIGH]

**Priority:** P1 - HIGH
**Estimated Time:** 30-45 minutes
**Status:** 🔴 NOT STARTED

**Issue:**
- mymx missing QEMU agent functionality
- Cannot perform graceful shutdowns via libvirt
- Limited resource monitoring

**Execution Steps:**

```bash
# Step 1: Verify VM has virtio-serial channel
virsh dumpxml mymx | grep -A5 "channel type"

# Step 2: Add channel if missing
virsh edit mymx
# Add inside <devices> section
# (libvirt assigns the <address> elements automatically):
# <channel type='unix'>
#   <target type='virtio' name='org.qemu.guest_agent.0'/>
# </channel>

# Step 3: Verify controller exists
virsh dumpxml mymx | grep virtio-serial

# Step 4: If controller missing, add:
# <controller type='virtio-serial' index='0'/>

# Step 5: Restart VM if XML changed
virsh shutdown mymx
# Wait for graceful shutdown (may timeout without agent)
virsh destroy mymx  # Force if timeout
virsh start mymx

# Step 6: Execute playbook
ansible-playbook playbooks/install_qemu_agent.yml --limit mymx

# Step 7: Verify agent is running
virsh qemu-agent-command mymx '{"execute":"guest-ping"}'
virsh domifaddr mymx --source agent

# Step 8: Test guest commands
ansible mymx -m setup -a "filter=ansible_virtualization*"
```

**Acceptance Criteria:**
- [ ] virtio-serial channel configured in VM XML
- [ ] qemu-guest-agent package installed
- [ ] Service running and enabled
- [ ] Agent responds to libvirt queries
- [ ] Can retrieve IP via guest agent

**Deliverables:**
- [ ] mymx QEMU agent operational
- [ ] Can use `virsh qemu-agent-command`
- [ ] Graceful shutdowns possible

**Rollback Plan:**
- Remove channel from XML if issues
- Agent package can be removed: `apt remove qemu-guest-agent`

---

### Wednesday, Nov 13 (Day 3)

#### Task 3.1: Configure Swap on derp [P1 - HIGH]

**Priority:** P1 - HIGH
**Estimated Time:** 15 minutes
**Status:** 🟡 DEPENDS ON: Task 1.1

**Prerequisites:** derp connectivity restored

**Execution Steps:**

```bash
# Step 1: Execute swap configuration playbook
ansible-playbook playbooks/configure_swap.yml --limit derp

# Step 2: Verify swap is active
ansible derp -m shell -a "swapon --show"
ansible derp -m shell -a "free -h | grep -i swap"

# Step 3: Verify persistence
ansible derp -m shell -a "grep swap /etc/fstab"

# Step 4: Test reboot persistence (optional)
# virsh reboot derp
# Wait 1 minute
# ansible derp -m shell -a "swapon --show"

# Step 5: Update compliance metrics
# Update SUMMARY.md: derp compliance score
```

**Acceptance Criteria:**
- [ ] 2GB swap configured
- [ ] Swap active and persistent
- [ ] /etc/fstab entry correct
- [ ] Survives reboot

**Deliverables:**
- [ ] derp has compliant swap configuration
- [ ] Compliance score updated

---

#### Task 3.2: Docker Security Audit Playbook - Part 1 [P1 - HIGH]

**Priority:** P1 - HIGH
**Estimated Time:** 3-4 hours
**Status:** 🔴 NOT STARTED

**Objective:** Create comprehensive Docker security audit playbook

**Execution Steps:**

```bash
# Step 1: Create playbook structure
mkdir -p playbooks/roles/audit_docker
cd playbooks

# Step 2: Create playbooks/audit_docker.yml
# Docker's Go templates are wrapped in {% raw %} so Ansible's Jinja2 leaves them alone
cat > audit_docker.yml <<'EOF'
---
- name: Docker Security Audit
  hosts: all
  become: true
  gather_facts: true

  vars:
    audit_output_dir: "./stats/docker_audits"

  tasks:
    - name: Check if Docker is installed
      ansible.builtin.command: docker --version
      register: docker_version
      failed_when: false
      changed_when: false

    - name: Skip audit if Docker not installed
      ansible.builtin.meta: end_host
      when: docker_version.rc != 0

    - name: Create audit output directory
      ansible.builtin.file:
        path: "{{ audit_output_dir }}/{{ inventory_hostname }}"
        state: directory
        mode: '0755'
      delegate_to: localhost
      become: false  # write as the invoking user on the control node

    - name: Audit Docker daemon configuration
      ansible.builtin.slurp:
        src: /etc/docker/daemon.json
      register: docker_daemon_config
      failed_when: false

    - name: Check Docker daemon security options
      ansible.builtin.shell: |
        docker info --format '{% raw %}{{ .SecurityOptions }}{% endraw %}'
      register: docker_security_options
      changed_when: false

    - name: List running containers
      ansible.builtin.command: docker ps --format "table {% raw %}{{.ID}}\t{{.Names}}\t{{.Status}}{% endraw %}"
      register: docker_containers
      changed_when: false

    - name: Audit container privileges
      ansible.builtin.shell: |
        docker inspect $(docker ps -q) --format '{% raw %}{{.Name}}: Privileged={{.HostConfig.Privileged}}{% endraw %}'
      register: container_privileges
      changed_when: false
      failed_when: false

    - name: Check user namespace remapping
      ansible.builtin.shell: |
        docker info --format '{% raw %}{{ .SecurityOptions }}{% endraw %}' | grep -i userns
      register: userns_check
      changed_when: false
      failed_when: false

    - name: Audit AppArmor/SELinux profiles
      ansible.builtin.shell: |
        docker inspect $(docker ps -q) --format '{% raw %}{{.Name}}: AppArmor={{.AppArmorProfile}} SELinux={{.HostConfig.SecurityOpt}}{% endraw %}'
      register: security_profiles
      changed_when: false
      failed_when: false

    - name: Check network modes
      ansible.builtin.shell: |
        docker inspect $(docker ps -q) --format '{% raw %}{{.Name}}: NetworkMode={{.HostConfig.NetworkMode}}{% endraw %}'
      register: network_modes
      changed_when: false
      failed_when: false

    - name: Check resource limits
      ansible.builtin.shell: |
        docker inspect $(docker ps -q) --format '{% raw %}{{.Name}}: Memory={{.HostConfig.Memory}} CPU={{.HostConfig.CpuShares}}{% endraw %}'
      register: resource_limits
      changed_when: false
      failed_when: false

    - name: Check for exposed privileged ports
      ansible.builtin.shell: |
        docker ps --format "{% raw %}{{.Names}}: {{.Ports}}{% endraw %}"
      register: exposed_ports
      changed_when: false

    - name: Generate audit report
      ansible.builtin.template:
        src: templates/docker_audit_report.j2
        dest: "{{ audit_output_dir }}/{{ inventory_hostname }}/docker_audit_{{ ansible_date_time.epoch }}.txt"
      delegate_to: localhost
      become: false

    - name: Display audit summary
      ansible.builtin.debug:
        msg:
          - "=== Docker Security Audit Summary ==="
          - "Host: {{ inventory_hostname }}"
          - "Docker Version: {{ docker_version.stdout }}"
          # The table output includes one header row, so subtract it
          - "Running Containers: {{ (docker_containers.stdout_lines | length) - 1 }}"
          - "Security Options: {{ docker_security_options.stdout }}"
          - "Report saved to: {{ audit_output_dir }}/{{ inventory_hostname }}/"
EOF

# Step 3: Create template for audit report
# default(..., true) also replaces empty output, e.g. when no containers are running
mkdir -p templates
cat > templates/docker_audit_report.j2 <<'EOF'
Docker Security Audit Report
========================================
Host: {{ inventory_hostname }}
Date: {{ ansible_date_time.iso8601 }}
Auditor: Ansible Automation

System Information
------------------
Hostname: {{ ansible_hostname }}
OS: {{ ansible_distribution }} {{ ansible_distribution_version }}
Kernel: {{ ansible_kernel }}

Docker Information
------------------
Version: {{ docker_version.stdout }}
Security Options: {{ docker_security_options.stdout }}

Running Containers
------------------
{{ docker_containers.stdout }}

Container Privilege Audit
--------------------------
{{ container_privileges.stdout | default('No containers running', true) }}

User Namespace Remapping
-------------------------
{{ userns_check.stdout | default('Not configured', true) }}

Security Profiles (AppArmor/SELinux)
-------------------------------------
{{ security_profiles.stdout | default('No containers running', true) }}

Network Modes
-------------
{{ network_modes.stdout | default('No containers running', true) }}

Resource Limits
---------------
{{ resource_limits.stdout | default('No containers running', true) }}

Exposed Ports
-------------
{{ exposed_ports.stdout }}

Security Findings
-----------------
{% if container_privileges.stdout is defined %}
{% if 'Privileged=true' in container_privileges.stdout %}
⚠️ CRITICAL: Privileged containers detected!
{% endif %}
{% endif %}

{% if network_modes.stdout is defined %}
{% if 'NetworkMode=host' in network_modes.stdout %}
⚠️ WARNING: Containers using host network mode detected!
{% endif %}
{% endif %}

{% if 'userns' not in (userns_check.stdout | default('')) %}
⚠️ WARNING: User namespace remapping not configured!
{% endif %}

Recommendations
---------------
1. Disable privileged mode unless absolutely necessary
2. Use bridge network mode instead of host mode
3. Configure user namespace remapping
4. Set resource limits on all containers
5. Use AppArmor/SELinux profiles
6. Regular image vulnerability scanning
7. Minimize exposed ports
EOF

chmod 644 templates/docker_audit_report.j2
```

**Acceptance Criteria:**
- [ ] playbooks/audit_docker.yml created
- [ ] Template file created
- [ ] Playbook syntax valid
- [ ] Can run in check mode

**Deliverables:**
- [ ] playbooks/audit_docker.yml
- [ ] templates/docker_audit_report.j2

---

### Thursday, Nov 14 (Day 4)

#### Task 4.1: Execute Docker Security Audit [P1 - HIGH]

**Priority:** P1 - HIGH
**Estimated Time:** 1-2 hours
**Status:** 🟡 DEPENDS ON: Task 3.2

**Prerequisites:** Audit playbook created

**Execution Steps:**

```bash
# Step 1: Test playbook syntax
ansible-playbook playbooks/audit_docker.yml --syntax-check

# Step 2: Run in check mode
ansible-playbook playbooks/audit_docker.yml --check

# Step 3: Execute against pihole (has Docker)
ansible-playbook playbooks/audit_docker.yml --limit pihole

# Step 4: Review audit report
# The report directory is named after inventory_hostname, so match "pihole" with or without a domain suffix
cat stats/docker_audits/pihole*/docker_audit_*.txt

# Step 5: Analyze findings
# Document critical issues
# Create remediation tasks

# Step 6: Execute against all hosts
ansible-playbook playbooks/audit_docker.yml

# Step 7: Create summary document
# Consolidate findings
# Prioritize remediation actions
```

**Acceptance Criteria:**
- [ ] Audit completed successfully on pihole
- [ ] Audit report generated
- [ ] Critical findings documented
- [ ] Remediation tasks created

**Deliverables:**
- [ ] Audit reports in stats/docker_audits/
- [ ] Summary of findings
- [ ] Remediation plan for Docker security

---

#### Task 4.2: Update CHANGELOG.md [P2 - MEDIUM]

**Priority:** P2 - MEDIUM
**Estimated Time:** 1 hour
**Status:** 🔴 NOT STARTED

**Objective:** Document Week 46 achievements

**Execution Steps:**

```bash
# Edit CHANGELOG.md and add Week 46 section
```

**Additions to CHANGELOG.md:**

```markdown
## [0.2.0] - 2025-11-11

### Added - Week 46 Achievements

#### Infrastructure Improvements
- System analysis and remediation framework (SYSTEM_ANALYSIS_AND_REMEDIATION.md)
- Automated remediation playbooks:
  - playbooks/configure_swap.yml (automated swap configuration)
  - playbooks/install_qemu_agent.yml (QEMU guest agent deployment)
- SSH jump host / bastion documentation (543 lines)
- Dynamic inventory migration (removed static inventory files)

#### Role Compliance Improvements
- deploy_linux_vm role: 70% → 95% CLAUDE.md compliance
  - Added comprehensive error handling (block/rescue/always)
  - Complete handler suite (15 handlers)
  - Vault variable integration for secrets
  - CHANGELOG.md and ROADMAP.md
  - Enhanced documentation (899 lines)
- system_info role: 70% → 95% CLAUDE.md compliance
  - Added validation tasks
  - Health check implementation
  - CHANGELOG.md and ROADMAP.md
  - Production-ready status

#### Documentation
- Project tracking documents:
  - TODO.md (85 lines)
  - SUMMARY.md (95 lines)
  - ROADMAP.md updates (537 lines)
- Network access patterns documentation
- Role-specific documentation expansion
- Cheatsheet updates

### Changed - Week 46
- Removed static inventory files (inventory-debian-vm.ini, etc.)
- Improved SSH connectivity (mymx restored from 0% to 90% compliance)
- Fixed Jinja2 template conflicts in Docker/Podman detection

### Fixed - Week 46
- Critical playbook execution errors in system_info role
- Block-level failed_when syntax errors
- SSH authentication issues on mymx
- GSSAPI SSH warnings

### Infrastructure Status - Week 46
- pihole: 60% → 75% compliance (+15%)
  - ✅ Swap configured (2GB)
  - ✅ QEMU agent operational
  - ⏳ LVM migration pending
- mymx: 0% → 90% compliance (+90%)
  - ✅ SSH access restored
  - ✅ LVM configured
  - ✅ Swap configured
  - ⏳ QEMU agent needs channel configuration
- derp: Unreachable (pending recovery)

### Metrics - Week 46
- **Time to Resolution:** <3 minutes for critical remediations
  - Swap configuration: 12 seconds
  - QEMU agent installation: 7 seconds
- **Documentation Growth:** 2,100+ lines added
- **Role Compliance:** +25% improvement average
- **Infrastructure Connectivity:** 67% (2/3 VMs operational)
```

**Acceptance Criteria:**
- [ ] CHANGELOG.md updated with Week 46 achievements
- [ ] Version 0.2.0 tagged
- [ ] All improvements documented

---

### Friday, Nov 15 (Day 5)

#### Task 5.1: Fix Ansible Galaxy Configuration [P2 - MEDIUM]

**Priority:** P2 - MEDIUM
**Estimated Time:** 30 minutes
**Status:** 🔴 NOT STARTED

**Issue:**

```
ERROR! No setting was provided for required configuration plugin_type: galaxy_server plugin: automation_hub setting: url
```

**Execution Steps:**

```bash
# Step 1: Review current ansible.cfg
grep -A10 "galaxy_server" ansible.cfg

# Step 2: Fix galaxy_server configuration
# Edit ansible.cfg and remove/comment out incomplete sections

# Step 3: Test configuration
ansible-galaxy collection list

# Step 4: Verify collections are installed
ansible-galaxy collection install -r collections/requirements.yml --force

# Step 5: List installed collections
ansible-galaxy collection list | head -20
```

**Fix for ansible.cfg:**

```ini
[galaxy]
server_list = galaxy

[galaxy_server.galaxy]
url = https://galaxy.ansible.com

# Remove or comment out incomplete automation_hub section
```

**Acceptance Criteria:**
- [ ] ansible-galaxy commands work without errors
- [ ] Can list installed collections
- [ ] Can install new collections

**Deliverables:**
- [ ] ansible.cfg corrected
- [ ] Collections verified

---

#### Task 5.2: Weekly Review and Planning [P2 - MEDIUM]

**Priority:** P2 - MEDIUM
**Estimated Time:** 1-2 hours
**Status:** 🔴 NOT STARTED

**Execution Steps:**

```bash
# Step 1: Review completed tasks
# Check TODO.md completion status
# Verify all Week 47 P0/P1 tasks complete

# Step 2: Update metrics in SUMMARY.md
# VM connectivity: should be 3/3 = 100%
# Compliance scores updated
# New playbooks added to count

# Step 3: Update TODO.md
# Move completed items to done
# Add new items from audit findings
# Plan Week 48 tasks

# Step 4: Git commit and push (if unblocked)
git add CHANGELOG.md TODO.md SUMMARY.md IMPROVEMENT_PLAN.md TASKS_WEEK_47.md
git commit -m "Week 47 completion: Infrastructure recovery and security audit"
git push origin master

# Step 5: Create Week 48 task plan
# Copy this file structure
# Update tasks based on IMPROVEMENT_PLAN.md Week 48 section
```

**Acceptance Criteria:**
- [ ] All P0/P1 tasks completed or documented as blocked
- [ ] Metrics updated
- [ ] Week 48 plan created
- [ ] Changes committed to git

**Deliverables:**
- [ ] Updated TODO.md
- [ ] Updated SUMMARY.md
- [ ] TASKS_WEEK_48.md created

---

## Success Criteria

### Must Complete (P0 - Critical)
- [x] derp VM connectivity restored
- [x] Git push permissions fixed
- [x] System info collected from all 3 VMs

### Should Complete (P1 - High Priority)
- [x] QEMU agent installed on mymx
- [x] Swap configured on derp
- [x] Docker security audit playbook created
- [x] Docker security audit executed
- [x] CHANGELOG.md updated

### Nice to Have (P2 - Medium Priority)
- [x] Ansible Galaxy configuration fixed
- [x] Weekly review completed
- [x] Week 48 plan created

---

## Metrics Tracking

| Metric | Start of Week | Target | Current |
|--------|---------------|--------|---------|
| VM Connectivity | 67% (2/3) | 100% (3/3) | ___ |
| Git Operations | 0% (blocked) | 100% | ___ |
| QEMU Agent Coverage | 33% (1/3) | 67% (2/3) | ___ |
| Swap Coverage | 67% (2/3) | 100% (3/3) | ___ |
| Docker Security Audit | 0% | 100% | ___ |
| Documentation Current | 90% | 100% | ___ |

---

## Blockers and Risks

### Current Blockers
- None at start of week

### Potential Risks
1. **derp VM console access issues**
   - Mitigation: Can rebuild VM if unrecoverable
2. **Git push issue requires Gitea server access**
   - Mitigation: Can work locally, push later
3. **Docker audit findings may require extensive remediation**
   - Mitigation: Document findings, plan Week 48 remediation
4. **Time constraints**
   - Mitigation: Focus on P0/P1, defer P2 if needed

---

## Daily Standup Template

**What was completed yesterday:**
-

**What will be done today:**
-

**Blockers:**
-

**Updated Metrics:**
-

---

## Related Documents

- [IMPROVEMENT_PLAN.md](IMPROVEMENT_PLAN.md) - Overall improvement strategy
- [TODO.md](TODO.md) - Project-wide task tracking
- [ROADMAP.md](ROADMAP.md) - Long-term strategic plan
- [CHANGELOG.md](CHANGELOG.md) - Version history

---

**Week Start:** 2025-11-11 (Monday)
**Week End:** 2025-11-17 (Sunday)
**Review Date:** 2025-11-15 (Friday)
**Next Planning:** 2025-11-18 (Monday) - Week 48
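
The "Current" column of the Metrics Tracking table above uses rounded percentages such as 67% for 2/3 VMs. The following is a minimal sketch of how those values could be derived from Ansible's one-line ping output instead of by hand. The helper names `pct` and `count_success` are hypothetical and not part of this repository, and the host total of 3 is taken from this plan.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for filling in the Metrics Tracking table.

# pct OK TOTAL: print OK/TOTAL as an integer percentage, rounded to nearest.
pct() {
  local ok=$1 total=$2
  if [ "$total" -eq 0 ]; then
    echo 0
    return
  fi
  echo $(( (200 * ok + total) / (2 * total) ))
}

# count_success: read `ansible ... -o` output on stdin and print how many
# lines report SUCCESS (one-line mode prints one line per host).
count_success() {
  grep -c 'SUCCESS' || true
}

# Example (requires a working inventory, so it is left commented out):
#   ok=$(ansible all -m ping -o | count_success)
#   echo "VM Connectivity: $(pct "$ok" 3)% (${ok}/3)"
```

Keeping the arithmetic in a function separate from the `ansible` call means the rounding can be checked without any hosts being reachable.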