Civics For Elena

Day 01: Why Government Exists

The Big Question: Why Do We Need Government?

Imagine you’re living on a deserted island with a group of strangers. At first, everyone might get along fine. But what happens when someone takes more than their share of food? Or when two people claim the same shelter? Without any rules or authority to settle disputes, life could quickly become chaotic and dangerous.

This thought experiment helps us understand why governments exist. The English philosopher John Locke (1632-1704) believed that people originally lived in a “state of nature” where they were free but constantly at risk. To protect themselves and their property, people agreed to form governments through what he called a “social contract.”

Locke’s Social Contract Theory

According to Locke, people have three fundamental natural rights:

  • Life: The right to exist and be safe from harm
  • Liberty: The right to act freely as long as you don’t harm others
  • Property: The right to own things you’ve worked for

In Locke’s view, people voluntarily give up some freedoms to a government in exchange for protection of these rights. But here’s the crucial part: government gets its power only from the consent of the governed. If a government fails to protect people’s rights, the people have the right to change or overthrow it.

Why This Mattered to Americans

When American colonists grew frustrated with British rule in the 1760s and 1770s, they turned to Locke’s ideas. They argued that King George III had violated their natural rights and governed without their consent. This gave them the moral justification to declare independence.

As Locke wrote: “Government has no other end but the preservation of property” – meaning government exists to protect our rights, not to serve the ruler’s interests.

Key Vocabulary

  • Natural Rights: Rights that people are born with, including life, liberty, and property
  • Social Contract: An agreement where people give up some freedoms to government in exchange for protection
  • Consent of the Governed: The idea that government’s authority comes from the people’s agreement to be ruled

Think About It

If you had to create a government from scratch, what would be the three most important things you’d want it to do? How does this compare to Locke’s ideas?

If this is too light, you should read: https://www.marxists.org/reference/subject/politics/locke/ch09.htm

In particular, read sections 123-126 (about 2-3 paragraphs).

Question: Why do people form governments?


Tomorrow: We’ll explore how British policies convinced American colonists that their government was failing to protect their rights.

Sneaky wifi near weird marathons (Part 1)

In 2018, I ran a WiFi network with a well-known public SSID off a Raspberry Pi and ended up catching lots of marathoners’ phones. My network was not configured for sniffing, only for attaching. Phones with the right WiFi settings would automatically attach to the WiFi network.

My interest was in exploring whether phones promiscuously attach to WiFi networks they recognize. My network didn’t vend Internet access, which means I couldn’t spy on people’s traffic. But I did vend DHCP to anyone who tried to connect, which enabled me to gather some data about devices that attached.

The hotspot wasn’t operated from my house, so I had to do a little work to get the network to the runners. I live in the Pacific Northwest. Rain is an issue. Back then, I didn’t know enough antenna theory to broadcast long distances, so my setup was janky. If you looked around, you’d see a Tupperware box left behind during some spring cleaning.

After several weeks of iteration, I was ready for the marathon. The race is called “Beat the Blerch.” The name is a tribute to the desire to quit. Running is about ignoring that desire. The organizers have cake stations and couches out on our trail to tempt people into taking a break. Some runners wear inflatable t-rex costumes. Pretty gross!

I turned my hotspot on and started looking at logs. When you monitor the hostapd logs, you can see the MAC addresses of the devices that attach. The vendor prefix of a MAC address can be used to identify the type of device that connected. Over the course of the marathon, I saw an interesting diversity of devices attach:

You can see that Apple dominated the running community. It’s interesting to see a Blackberry device in 2018. Someone was in a committed relationship with their phone!
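The MAC-to-device-type step can be sketched in a few lines of Python. This is a minimal illustration, not the script used for the race. It assumes log lines that contain hostapd's AP-STA-CONNECTED event, and the vendor table holds made-up placeholder prefixes where a real run would load the IEEE OUI registry. Addresses with the locally administered bit set are counted separately, because a randomized MAC says nothing about the manufacturer.

```python
import re

# Matches the station-connected event in a hostapd log line,
# e.g. "wlan0: AP-STA-CONNECTED 00:11:22:aa:bb:01".
CONNECT_RE = re.compile(
    r"AP-STA-CONNECTED\s+((?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2})"
)


def connected_macs(log_lines):
    """Return unique MAC addresses from connect events, in first-seen order."""
    seen = []
    for line in log_lines:
        match = CONNECT_RE.search(line)
        if match:
            mac = match.group(1).lower()
            if mac not in seen:
                seen.append(mac)
    return seen


def is_randomized(mac):
    """True when the locally administered bit of the first octet is set."""
    return bool(int(mac[:2], 16) & 0x02)


def vendor_for(mac, table):
    """Look up the first three octets in `table` (a hypothetical OUI map)."""
    if is_randomized(mac):
        return "randomized"
    return table.get(mac[:8], "unknown")


def count_by_vendor(log_lines, table):
    """Count unique attached devices per vendor label."""
    counts = {}
    for mac in connected_macs(log_lines):
        label = vendor_for(mac, table)
        counts[label] = counts.get(label, 0) + 1
    return counts
```

Counting unique addresses rather than raw connect events keeps a phone that reattaches several times from inflating its vendor's total.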

This project worked because carriers have a “WiFi offload” strategy. Unlimited data was relatively new at the time. Carriers were still scrambling to provide transport that met the demand of customers. Phones have been tuned to attach to recognized networks in order to offload traffic during metering. I suspect that some day in the future, data caps will get reintroduced thanks to the popularity of 4K streams on 3-inch displays. Time will tell.

There is another fun property of my data! I can graph the attachment rate of runners passing during the marathon. The slope is steep when we’re at the start of the race. Competitive runners quickly disappear and the slope goes gradual. Our graph is pretty boring till we get to the end of the marathon. Is this because the slowest runners don’t give up?

NO! There’s a 10k happening as well! It happens to turn around at the end of the trestle. The slope in our graph declines because the 10k participants start showing up. Short races are more popular! We see a much more steady rate of attaches as a result. As we move to the right, the marathoners are on their return. The tangent-like shape isn’t because of runner resilience. It’s showing you that the steepest slopes are representing folks doing harder things.
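The attachment-rate curve is a running total of first-time attachments over time. Here is a minimal sketch of how such a series could be built, assuming you already have one first-seen timestamp per device. The function name and the 10-minute bucket width are illustrative choices, not the original tooling.

```python
from datetime import datetime, timedelta


def cumulative_attachments(first_seen, bucket_minutes=10):
    """Return (bucket_start, running_total) pairs for first-seen timestamps."""
    if not first_seen:
        return []
    times = sorted(first_seen)
    width = timedelta(minutes=bucket_minutes)
    start = times[0]

    # Count how many devices first appeared in each fixed-width bucket.
    per_bucket = {}
    for moment in times:
        index = (moment - start) // width
        per_bucket[index] = per_bucket.get(index, 0) + 1

    # Walk every bucket, including empty ones, so flat stretches stay visible.
    series = []
    total = 0
    for index in range(max(per_bucket) + 1):
        total += per_bucket.get(index, 0)
        series.append((start + index * width, total))
    return series
```

A steep stretch of the plotted series means many new devices per bucket. A flat stretch means nobody new passed the hotspot.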

The run spanned two days. The second day was rainy, which significantly dampened participation:

On day 1 I caught about 155 devices, but day 2 only brought us about 40.

This was a fun project, but it was scrappy. When I started off, I didn’t really know how to configure hostapd or dnsmasq. I had to figure out a bunch of implementation details on the fly. I didn’t document my project. It took several weeks and I was lucky. I had enough saved logs and sed magic to generate a cool looking set of graphs. But compiling the WiFi drivers was a pain. You can see my setup had to be in close proximity to the race. The antenna set was not optimized for outdoor transmission. It was not a reproducible project, and it certainly wasn’t stable.

2025

The annual Blerch marathon ran past my house earlier this month.

Four days before the event, I put a challenge in front of myself: Create a reproducible version of the ‘catcher’ project using my LLM-supported automation.

I’m more experienced now and consequently, less interested in proving vulnerabilities. I’d prefer to build enduring solutions. In this case, my goal is rapid delivery of IoT prototypes and projects. Anecdotally, I’ve heard prototyping a first iteration of complex IoT takes between 3 and 9 months. For the first run of a prototype, I would consider the following in scope: developing a project requirements doc, implementing code, implementing unit & integration tests, and delivering a working implementation. Keep in mind: there’s considerably more work involved to get from concept to market.

I’ve been building what I guess are my own custom AI “agents” for almost a year. I’ve had some intuition about using different tools for quickly building firmware images that were useful. I’ve recently started experimenting with creating agents that actually deploy and troubleshoot deployments. It’s been working so well that it’s starting to feel weird. Building complex hardware systems quickly shouldn’t be this fast. I suspect I can turn a device around in a single day.

My “Win conditions” are more about creating a reproducible project than proving vulns. I want to prove that I can quickly turn around a complex project prototype. “Complex” in this case means we include peripherals and inter-component integration. This boils down to 3 goals:

  1. Demonstrate the implementation of an external wifi adapter for vending the wifi network. This would require autonomous troubleshooting and configuration tasks related to wifi configuration. There are complex design and implementation decisions that come with activating AP Mode. An AI Agent can speed run that process. It would also demonstrate an Agent’s ability to troubleshoot driver compilation errors.
  2. Implement a paperwhite display that could present status of the pi. This would include status of the wifi network and any attached devices. Most IoT has some kind of interface that people will interact with. I wanted to demonstrate that a peripheral-based UI can be implemented with agents.
  3. Implement the whole project via custom deployment & troubleshooting agents. When I did this last time, I was in my office on weekends and evenings at the expense of spending time with my kids. I wanted to wield my AI towards productivity gains.

How did it work out? Hit refresh for about a week and I’ll include a link to Part 2!

Friday Claude notes

Summary of what we accomplished:

  1. Fixed SSH firewall lockout – Added comprehensive SSH safety rules to iptables
  2. Resolved dnsmasq service issues – Removed problematic systemd override, fixed configuration
  3. Created missing template files – Added device_registration.py.j2 and deployment_record.json.j2
  4. Removed UFW conflicts – Eliminated redundant firewall configuration
  5. Fixed APT sources – Corrected template variables with proper Debian bookworm repositories
  6. Recovered from NetworkManager disability – Fixed and prevented future networking loss
  7. Learned important lessons – Documented the anti-pattern of using ignore_errors as a lazy fix

The Device Catcher deployment is now complete and the playbook is more robust for future runs. The system will properly handle single WiFi interface setups and maintain network connectivity through reboots. Good luck with your marathon device tracking project!

What’s the current todo list?

● The todo list is now empty! We’ve completed all tasks:

✅ Fixed firewall rules blocking SSH on wlan0
✅ Removed lessons learned file processing from Ansible playbooks
✅ Fixed or excluded documentation role with circular references
✅ Fixed dnsmasq configuration to work properly
✅ Fixed networking disabled after reboot
✅ Updated playbook to handle single WiFi interface systems

All deployment issues have been resolved and the Device Catcher system is successfully deployed.

Guidelines for getting better at Git

I’ve struggled with my git workflows. In the past, I found that I have a bad habit of dramatically editing my projects when I really should fork them. I’ve been putting some effort into building better git habits to help me avoid these situations. I started off being familiar with git init, git commit and git push, but stash and other commands were beyond my grasp. I’ve done some prompt engineering to develop guardrails for the types of development decisions that should be handled with some of the more advanced git use cases. Maybe these will help you!

A couple of safety nets for immediate use:

  • Always be able to undo:
    • See anything you’ve done: git reflog
    • Lightweight “save point”: git tag backup-$(date +%Y%m%d-%H%M%S)
    • Portable snapshot (off-repo backup): git bundle create backup.bundle --all
  • WIP parking lot: prefer WIP commits on a throwaway branch over stash when work will last more than a few minutes. You can do this with the following command:
# from anywhere with uncommitted changes
b="wip/$(date +%Y%m%d-%H%M%S)"; \
git switch -c "$b" && git add -A && git commit -m "WIP: parked" --no-verify && git switch -

1) “Am I rewriting the product?” → Fork vs Branch

  • Use a fork (new repo) when:
    • You’re changing project direction, licensing, or governance.
    • You’ll diverge long-term from upstream (different roadmap) and want to pull upstream occasionally but not merge back regularly.
    • You need independent release cadence and issue tracking.
    • ✨ Tools: git remote add upstream <url>, then git fetch upstream and selective cherry-picks back.
  • Use a new branch (same repo) when:
    • It’s still the same product, just a big feature or refactor.
    • You want CI, PR review, and discoverability to stay in the same place.
    • ✨ Tools: git switch -c feature/refactor-auth, maybe behind a feature flag.

Quick rule: If you’d be uncomfortable merging it back “as-is,” consider a fork. If you’d merge it behind a flag after review, it’s a branch.


2) “Am I about to experiment wildly?” → Throwaway branch + worktree

  • Create a scratch branch you can nuke anytime:
    git switch -c spike/new-idea
    # or, instead of the switch above, keep a separate working tree so you
    # don't juggle unstaged changes (-b creates the branch):
    git worktree add -b spike/new-idea ../proj-spike
  • If it works, cherry-pick useful commits onto a clean feature branch:
    git log --oneline                  # find hashes
    git switch feature/refactor
    git cherry-pick <hash1> <hash2>
  • If it fails (skip the worktree step if you never created one; Git refuses to delete a branch that is still checked out in a worktree):
    git switch main
    git worktree remove ../proj-spike
    git branch -D spike/new-idea

When to prefer git worktree: When you want two branches checked out simultaneously (e.g., bugfix and main) without stashing.


3) “My working tree is messy, I need to hop branches” → Stash vs WIP commit

  • Use stash for quick context switches and truly throwaway partial work:
    git stash push -m "WIP: parser tweak"   # saves staged+unstaged changes to tracked files (add -u for untracked)
    git switch main && git pull
    git switch feature/parser
    git stash pop                           # apply and drop (use `apply` to keep in stash)
    • Keep it organized: git stash list, git stash show -p stash@{2}
    • Partial stash: git stash -p
  • Use a WIP commit if:
    • Work spans hours/days or you need team visibility & CI.
    • You want history and easy recovery: git add -A && git commit -m "WIP: parser spike (not for merge)"
    • Later clean history with an interactive rebase (see §7).

Rule of thumb: Minutes → stash. Hours/days → WIP commit.


4) “I’ve started a big refactor on top of stale main” → Rebase early, merge late

  • Keep your feature branch fresh to minimize painful conflicts later:
    git fetch origin
    git rebase origin/main     # replay your commits onto latest main
    # if conflicts: resolve, then
    git rebase --continue
  • Prefer rebase for private branches; prefer merge for shared/history-sensitive branches.

Guardrail: If the branch is already public and teammates might have based work on it, avoid rebasing it; use git merge origin/main.


5) “I need to land part of a large change safely” → Split & cherry-pick

  • Break work into small, reviewable commits and land enabling changes first:
    • Extract a pure “rename/move” commit (no logic change).
    • Land new interfaces behind feature flags with no callers.
  • Use git cherry-pick to move those low-risk commits into separate PRs:
    git cherry-pick <hash>     # keep author/date and exact diff

6) “I must keep risky code from reaching users” → Feature flags + release branches

  • Main stays releasable; incomplete work guarded by flags.
  • Release branches cut from main when stabilizing: git switch -c release/1.4.0
    • Only bug fixes cherry-picked into release branch.
    • Tag final release: git tag -a v1.4.0 -m "Release 1.4.0" && git push --tags

7) “My history is noisy; I want it clean before merging” → Interactive rebase

  • Squash fixups, rename messages, reorder commits:
    git fetch origin
    git rebase -i origin/main  # Use: pick / reword / squash / fixup
  • Use --autosquash with fixup! commits:
    git commit --fixup <hash>
    git rebase -i --autosquash origin/main

Guardrail: Only rewrite history on branches no one else has pulled.


8) “I need to find where a bug was introduced” → Bisect

git bisect start
git bisect bad HEAD
git bisect good v1.3.2     # or a known-good commit
# Git checks out midpoints; you run tests and mark them:
git bisect good            # or: git bisect bad
git bisect reset

Automate with a test script: git bisect run ./ci/test.sh
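The script handed to git bisect run only has to report its verdict through its exit code: 0 marks the commit good, 125 tells Git to skip it, and any other code from 1 to 127 marks it bad. Below is a minimal Python sketch of that contract. The function names are illustrative and the build and test commands are hypothetical placeholders, so treat this as one possible shape for such a script.

```python
import subprocess
import sys

# Exit codes that `git bisect run` understands: 0 marks the commit good,
# 125 skips it, and any other code from 1 to 127 marks it bad.
GOOD, BAD, SKIP = 0, 1, 125


def verdict(build_ok, tests_ok):
    """Translate build and test outcomes into a bisect exit code."""
    if not build_ok:
        return SKIP  # an unbuildable commit cannot be judged either way
    return GOOD if tests_ok else BAD


def check_commit(build_cmd, test_cmd):
    """Run the build, then the tests, and return the exit code for bisect."""
    build_ok = subprocess.run(build_cmd).returncode == 0
    tests_ok = build_ok and subprocess.run(test_cmd).returncode == 0
    return verdict(build_ok, tests_ok)


# As the last line of a script handed to `git bisect run`
# (both commands are hypothetical placeholders):
# sys.exit(check_commit(["make"], ["make", "test"]))
```

Returning 125 for commits that fail to build keeps a broken intermediate commit from being blamed for the bug you are hunting.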


9) “I want to share part of the repo or vendor another repo” → Subtree vs submodule

  • Subtree (simple, self-contained code copy you occasionally sync):
    • Pros: no extra checkout step for consumers; normal commits.
    • Cons: merges can be larger; history mixed.
  • Submodule (true nested repo):
    • Pros: clean separation, track exact external revisions.
    • Cons: extra steps for users/CI (--recurse-submodules), more footguns.

Guardrail: If your consumers shouldn’t think about extra steps, prefer subtree.


10) “Repo is huge; I only need a slice” → Sparse checkout

git sparse-checkout init --cone
git sparse-checkout set src/api docs

Great for monorepos or to focus on one component.


11) Everyday branch hygiene (golden rules)

  1. Create a branch early for any work > 15 minutes.
    git switch -c feature/<short-purpose>
  2. Sync daily: git fetch && git rebase origin/main (if private).
  3. Commit small, purposeful changes with present-tense messages.
  4. Keep main green; hide incomplete features behind flags.
  5. Use throwaway spikes for experiments; keep or delete sans guilt.
  6. Tag releases and cut release branches for stabilization.
  7. Never rebase shared branches; merge instead.

Minimal command playbook (copy/paste friendly)

# Start a feature
git switch -c feature/login-oauth
# Work... then sync with latest main (private branch)
git fetch origin
git rebase origin/main

# Park work temporarily
git stash push -m "WIP: oauth redirect"
# or (longer): WIP commit
git add -A && git commit -m "WIP: oauth redirect not wired"

# Create a spike in a separate working directory
git worktree add -b spike/oauth ../proj-oauth-spike
# ...experiment...
git worktree remove ../proj-oauth-spike && git branch -D spike/oauth

# Prepare a clean history before PR
git rebase -i origin/main   # squash/fixup

# Split out a safe helper into a separate PR
git cherry-pick <hash-of-helper-commit>

# Release flow
git switch -c release/1.5.0
git tag -a v1.5.0 -m "Release 1.5.0"
git push origin release/1.5.0 --tags

# Disaster recovery
git reflog                  # find the good state
git reset --hard <hash>

Helpful .gitconfig aliases (speeds up the guardrails)

[alias]
  co = checkout
  sw = switch
  br = branch
  st = status -sb
  lg = log --oneline --decorate --graph --all
  rb = rebase
  rbi = rebase -i
  fp = fetch --prune
  pop = stash pop
  ap = stash apply
  aa = add -A
  cm = commit -m
  fix = commit --fixup
  autosquash = !git rebase -i --autosquash
  unstage = reset HEAD --
  wip = !git add -A && git commit -m 'WIP'

What to do when you “feel the drift”

Use this quick decision tree:

  • “This is becoming a different product/vision.” → Fork.
  • “This is a big refactor or feature but same product.” → Feature branch, guard with flags.
  • “I want to try something risky fast.” → Spike branch (ideally via worktree), later cherry-pick.
  • “I must context-switch now.”
    • Short: stash
    • Long: WIP commit
  • “History is messy before merge.” → Interactive rebase (private only).
  • “Need to ship, but not everything is ready.” → Release branch + cherry-pick fixes.
  • “Bug appeared somewhere in history.” → bisect.

WOW! First mostly autonomous build!

Yesterday (Sunday) I fiddled with the captive portal on the instant community wifi build. My goal was to improve the experience for iOS devices. iOS devices don’t support the camera APIs in a captive portal window. I started working with Claude to get this built, but the app ended up breaking. I had to start over.

I had problems with writing the recipient image on my Linux laptop. For some reason, the devices aren’t connecting to wifi when I have them powered up with a build from the Linux laptop. The ripped images from my iMac do work, however. I built a new 8 GB image and ripped 10 cards on the etcher.

When the images were finally ready, I made a fateful choice. Instead of using the container, I tried to do the deployment from the host OS. The last time I did this it failed because the essential.yml file’s ordering was broken. I’d corrected this a couple of days ago, and retried the experiment. THIS WAS FAR MORE PRODUCTIVE!

Lessons Learned

  1. When running from the host OS, we lose the $DOCKER_HOST variable. Recall: if we have the builder hotspot, the docker_host value is hardcoded to 10.6.6.1. If it’s running in the containers, they’re preconfigured to grab the hostname of the host OS running the apt-cacher-ng container, and the playbook applies the hostname to the target devices’ apt repositories correctly. Since we’re implementing a 3rd solution, which is running the playbook from the host OS, we need to manually pass the environment variable. Additionally, the root_playbook_dir is hardcoded to be /home/pi/Playbooks/yadda yada, so it has to be overridden. Both tasks can be achieved with the following command:
    DOCKER_HOST=captainmccrank.local ansible-playbook -i inventory run.yml -e "root_playbook_dir=$(pwd)" -vv
  2. The agent performed very well. The Agent’s summary of the deliverables is as follows:

## Achievement Summary

**MAJOR BREAKTHROUGH**: Successfully completed 95%+ of the Instant Community WiFi Portal deployment using host system approach with proper environment variable configuration. This represents the **furthest successful deployment progress** achieved to date.

**Impact**: Critical – Proved host system deployment viability and documented complete infrastructure success

**Severity**: Late-stage failure after major system completion

**Environment**: Host system deployment with corrected DOCKER_HOST and path variables

## Deployment Success Metrics

### **Quantitative Success:**

**Previous attempts**: Failed at tasks 3-4 (APT repository failures)

**This session**: Reached final cleanup phase (`roles/system/tasks/final_cleanup.yml`)

**Estimated completion**: 95%+ of all deployment tasks completed successfully

**Package installations**: 247+ development packages, WiFi drivers, SSL certificates

**System reboots**: Multiple successful reboots with automatic reconnection

**Validation tests**: All critical infrastructure tests **PASSING**

### **Infrastructure Achievements:**

**APT Repository System**: Perfect operation through apt-cacher-ng server

**Development Environment**: Complete build toolchain installation

**WiFi Hardware**: 8812au drivers compiled, installed, and activated

**Network Architecture**: NetworkManager + dnsmasq + nodogsplash properly configured

**SSL Infrastructure**: Certificates generated (snakeoil, nginx, certbot integration)

**Service Dependencies**: All critical service relationships established correctly

Agent Driven Software Troubleshooting

Welp, I experienced an unanticipated error in nodogsplash on a build:
[Screenshot: an ansible-playbook installation task showing the compilation error.]

So I sent my agent after it. I fed a Claude session a troubleshooting prompt, directed it to review the source code in the directory, and gave it permission to ssh into the recipient image that was failing:

Cool to see my “AgentLessonsLearned” concept being explored. See this to get context on AgentLessonsLearned.

and then the agent made progress on identifying the root cause:

The agent tries to make a fix:

And now I validated that the fix works!

I resumed the build and the issue was fixed!


What does this mean?

  • I don’t have to parse difficult-to-read error messages to figure out the source of the problem.
  • I don’t have to do google searches to troubleshoot exotic errors.
  • I get a document that tells me what problems were experienced, how they were diagnosed and how they were fixed. I get the lessons learned without the work.
  • I feel like I’m a little further up on the productivity asymptote.
  • Prototypes that used to take me over a month are done in a couple of days.

Is this cool to you? Connect with me on Twitter (@patrickmccanna) with a project proposal for a Raspberry Pi. Feel free to add hardware like the pi sense hat or the Inky hat. Let’s see how quickly I can turn user requirements into a working prototype!

A little update on Agent-driven software development.

Today I’m testing a new version of my independent software deployment agent. It uses ansible orchestration to push software onto recipient systems so I can prototype with different software stacks.

The major change is that I’ve delivered objectives that are structured and independent of the playbook creation process.

One innovation I’m playing with is creating a .AgentLessonsLearned directory in any directory where a file produces an error.

Agent Lessons Learned

We lose memory when we start new sessions. What if agents left notes for future agents so that the future agent has the wisdom obtained by past agents?

I’ve crafted a prompt that tells the agent to search for lessons learned files when they’re going to do some troubleshooting. If they don’t exist, it creates one for the bug it’s troubleshooting after it has implemented & validated a fix.
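The real mechanism here is a prompt, not code, but the file convention it describes can be sketched in Python. Everything below except the .AgentLessonsLearned directory name is an assumption made for illustration: the function names, the one-Markdown-file-per-bug layout, and the choice to also look in parent directories.

```python
from pathlib import Path

LESSONS_DIR = ".AgentLessonsLearned"


def find_lessons(start):
    """Collect lesson files from `start` and its parents, nearest first."""
    found = []
    here = Path(start).resolve()
    for folder in [here, *here.parents]:
        lessons = folder / LESSONS_DIR
        if lessons.is_dir():
            found.extend(sorted(lessons.glob("*.md")))
    return found


def record_lesson(folder, slug, problem, diagnosis, fix):
    """Write one lesson file next to the code that produced the error."""
    lessons = Path(folder) / LESSONS_DIR
    lessons.mkdir(exist_ok=True)
    note = lessons / f"{slug}.md"
    note.write_text(
        f"# {slug}\n\n"
        f"## Problem\n{problem}\n\n"
        f"## Diagnosis\n{diagnosis}\n\n"
        f"## Fix\n{fix}\n",
        encoding="utf-8",
    )
    return note
```

In this sketch, a future session would call find_lessons before troubleshooting and record_lesson only after a fix has been implemented and validated, which mirrors the order the prompt asks for.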

I’ll report back to share how this works over time, but for now I’m very excited about this concept.

Raspberry Pi Hostname Collision Resolver

Situation

When deploying multiple Raspberry Pi devices from the same firmware image for Ansible automation, hostname conflicts create operational challenges. While RFC 6762 specifies that mDNS devices should automatically resolve naming collisions by incrementing the duplicate name with a -2/3/4/etc postfix, real-world implementations often fail. When multiple Pis are online, pinging ansibledest.local often returns competing results. Devices that share a duplicate hostname like ansibledest.local become unreachable, and Ansible playbooks cannot identify and manage them reliably.

Task

I will develop an automated solution that:

  • Proactively resolves hostname conflicts before they impact operations
  • Runs automatically on first boot without manual intervention
  • Scales to simultaneous deployment of multiple devices
  • Provides comprehensive audit logging for network discovery
  • Integrates seamlessly with existing Ansible automation workflows

Action

I created a comprehensive hostname collision resolver system consisting of:

Core Components

  1. hostname-collision-resolver.sh – Main script that:
    • Waits for network interfaces (wlan0/eth0) to be ready
    • Adds random delay (10-40 seconds) to prevent simultaneous boot conflicts
    • Scans network using avahi-browse and ping for existing hostname variants
    • Uses gap-filling algorithm to find lowest available hostname number
    • Updates system hostname and configuration files
    • Logs detailed network state including IP/MAC addresses of discovered hosts
    • Reboots automatically if hostname changes are made
  2. hostname-collision-resolver.service – Systemd service for proper boot integration:
    • Runs after network services are online
    • Executes before Ansible automation services
    • Configured as one-time execution with comprehensive logging
  3. firstrun.sh – Bootstrap script for SD card deployment:
    • Installs required packages (avahi-utils, avahi-daemon)
    • Embeds and installs the hostname resolver
    • Enables services for automatic execution
    • Self-removes after completion

Deployment Strategy

  • Embedded the entire hostname resolver system into a single firstrun.sh script
  • Used Raspberry Pi Imager advanced options for base configuration
  • Copied firstrun.sh to boot partition with proper permissions (chmod +x, chown root:root)
  • Created master SD card image ready for mass duplication via drive cloner

Key Features Implemented

  • Network-aware startup: Waits for actual network connectivity, not just interface up
  • Collision prevention: Random delays handle simultaneous device deployments
  • Intelligent naming: Gap-filling algorithm finds lowest available hostname variant
  • Comprehensive logging: Permanent audit trail of network state and decisions
  • One-time execution: Flag file prevents repeated runs throughout device lifetime
  • Automatic integration: Ready for immediate Ansible automation post-boot

Result

Successfully created a solution that is ready for testing. It should deliver the following results:

Immediate Outcomes

  • Master image ready: SD card contains fully integrated hostname resolver system
  • Scalable deployment: Ready for drive cloner to create multiple identical cards
  • Zero-touch automation: Devices will automatically resolve hostname conflicts on first boot
  • Ansible-ready: Devices will be immediately discoverable for automation after hostname resolution

Technical Achievements

  • Robust conflict detection: Uses both mDNS discovery and ping verification
  • Intelligent numbering: Finds optimal hostname (e.g., picks device-2.local if device-1.local and device-4.local exist)
  • Comprehensive logging: Complete audit trail including network topology at boot time
  • Production hardening: Error handling, timeouts, and safety mechanisms throughout
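The gap-filling step can be illustrated with a short Python sketch. The resolver itself is a shell script, so this is only a model of the numbering logic. It assumes the list of discovered names has already been gathered (for example from avahi-browse and ping), that numbering starts at 1 as in the device-1.local example, and that next_free_hostname is a hypothetical name.

```python
import re


def next_free_hostname(base, taken):
    """Return the lowest-numbered `base-N` variant that is not in `taken`."""
    # Accept names with or without the .local suffix, in any letter case.
    pattern = re.compile(
        rf"^{re.escape(base)}-(\d+)(?:\.local)?$", re.IGNORECASE
    )
    used = set()
    for name in taken:
        match = pattern.match(name)
        if match:
            used.add(int(match.group(1)))

    # Gap filling: walk upward from 1 and stop at the first unused number.
    number = 1
    while number in used:
        number += 1
    return f"{base}-{number}"
```

With device-1.local and device-4.local already on the network, the function returns device-2, which fills the gap instead of jumping to device-5.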

Next Steps Enabled

  1. Use drive cloner to duplicate master SD card to multiple cards
  2. Deploy cards to Raspberry Pi devices simultaneously
  3. Power on devices – automatic hostname resolution occurs
  4. Run Ansible discovery to locate renamed devices
  5. Proceed with normal Ansible automation workflows

The solution transforms a manual, error-prone process into a fully automated system that scales from single devices to large deployments while maintaining complete operational visibility.

I did write a testplan.md file that’s more detailed with test steps.

Fixing problems with playbook execution order that caused orphaning of target devices:

✅ Problem Fixed! Hostname Timing Issue Resolved

What Was Wrong:

  • Hostname change happened in final_cleanup.yml (middle of system role)
  • cloudflare_tunnel and selfie_app roles ran AFTER system role
  • They expected the new hostname but system still had old hostname
  • This caused the “hostname transition problem” documented in CLAUDE.md

Solution Implemented:

  • Moved hostname changes to post_tasks in run.yml

New execution order:

  1. system role (no hostname change)
  2. cloudflare_tunnel role
  3. selfie_app role
  4. post_tasks: Set final hostname ← NOW HAPPENS LAST

Benefits:

  • cloudflare_tunnel and selfie_app configure with {{ hostname }} variable

  • Actual hostname change happens AFTER all roles complete

  • Clean transition from ansibledest.local → thepub.local

  • No orphaning – roles get configured properly before hostname switches

  • Resume Point: You can now resume with --tags final_hostname_change to run just the hostname changes, or continue normally knowing the execution order is correct.

The hostname transition issue that’s been causing problems is now properly sequenced!

Key Takeaways:

The run.yml file is the ultimate arbiter of execution.

  • It calls the “system” role, the “cloudflare” tunnel role and the “selfie app” role. It then has “post_tasks” that handle the brittle commands that change the state of the device.
  • The system role has all of the individual tasks in /roles/system/tasks, whose order is controlled by the main.yml in that directory.

Emailing 2FA is bad because…


There was a good thread on Hacker News recently regarding passkeys and 2FA using email.

This person had a summary explaining the exploitation scenario:
https://news.ycombinator.com/item?id=44819917#44820331

Essentially, normal humans don’t scrutinize certificates or website domains.

This is exploitable by malicious humans who send people a link to a page they control and trigger a 2FA login flow at the legit site. They trick the user into sharing the 2FA code directly with them. This stuff seems pretty obvious, but the summary explanation in the above comment is nice and tight.

This person had insightful counterarguments:
https://news.ycombinator.com/item?id=44819917#44820657

Specifically, 2FA phishing is mostly solved if you remove the copying and pasting of credentials.

I agree.

If sec engineers were thinking more about how to make user sign-in flows ruthlessly low-friction, we’d be ok. Instead we over-index on a sign-in ritual that results in weakened security.

“I think this is mostly solved, or at least greatly mitigated, by using a Slack-style magic sign-in link instead of a code that you have the user manually enter into the trusted UI”