Reproducible Sneaky Wifi Part 2

Last week I left you with a nail-biter. In 2018, I ran a sneaky WiFi network near a weird marathon and captured close to 200 devices. I reproduced the experiment this fall. How did it go in 2025? Terrible in some regards, but awesome in terms of prototyping acceleration: an experiment that took 2 months in 2018 took me 4 days in 2025.

Time lapse of runners

The Bad

In the 2025 experiment, I caught a grand total of 18 devices.

Does this mean mobile phones are more secure? No! It wasn’t the exact same experiment.

Low participant turnout: My WiFi hotspot was active starting at 7 am. The marathon was scheduled to start at 8 am, but we didn’t see a single runner till ~9:15 am. When runners did start arriving, there were far fewer than in past years: the 2018 marathon spanned two days, this year’s race was only one day, and the cohort of runners was significantly smaller.

Bad SSID choices: This attack depends on your ability to anticipate a WiFi SSID that your targets have an affinity for. The SSID I used in 2018 wasn’t going to work because it has been deprecated. I went with “Starbucks WiFi” initially, but this only caught 2 devices. The lack of “Starbucks WiFi”-tuned devices is an interesting indicator of how times have changed. It used to be that mobile phone owners needed to attach to WiFi to use email or browse the web with their phones: cellular plans did not have unlimited data, so you either ran out of data for the month or were hit with a large bill if you used cellular for data. People used to go to coffee shops to “work” on their phones and laptops. Now you’re really there to socialize or caffeinate. I also wonder if Starbucks’ popularity has declined; in the last 10 years, I’ve only drunk Starbucks out of necessity.

So after a couple of hours of watching only 2 attaches, I yielded to temptation and changed the SSID to “xfinitywifi.” The xfinitywifi SSID is a controversial WiFi network vended by Comcast, exclusive to Comcast customers.

You can use wigle.net to see the most popular active SSIDs:

Changing to xfinitywifi felt like desperation! Comcast does not have much presence in the Snoqualmie Valley, but I reasoned that most of the runners were probably coming from cities where Comcast is dominant, e.g. Bellevue, Issaquah, and Redmond. I managed to catch 16 more devices over the next 4 hours. The count was so small I didn’t bother to keep my logs, but here are some screenshots to give you a feel for what I experienced:

Raspberry Pi with AWUS036ACH WiFi adapter & home built dual yagis
Paperwhite display
Custom status monitor


This experiment agitated me greatly. I know there are still problems related to WiFi offloading, but I only caught 18 devices. I didn’t spend enough time researching SSIDs, and the end result was a low attach count.

Despite my grumpiness about the data, this experiment was a major success.

Did you notice the external WiFi adapter above? How about the nice Paperwhite display presenting the device’s status? My monitoring script was far more sophisticated than a tail of hostapd logs. I didn’t have to write this code or fiddle with hostapd configurations or nftables rules. I didn’t have to find the right kernel headers and compile WiFi drivers. I didn’t have to flex my terrible design skills. I knew the features I wanted, and I gave my agents direction on how to deploy them.

I was able to successfully produce an IoT prototype with complex hardware dependencies in 4 days.

The Good


I implemented a working prototype of a custom WiFi hotspot with a Paperwhite display, an external WiFi adapter, and a Yagi antenna in 4 days.

Methodology

Claude Code & Pre-prompting strategies

I leveraged Claude Code for most of my work. I created a working directory and invoked Claude with a 1,500-line pre-prompt for requirements analysis and planning. This pre-prompt produced Ansible playbooks that take advantage of my firmware-development caching containers. The pre-prompt addresses Requirement Exploration, Architecture Safety, Known Good Deployment Patterns, Domain-Specific Knowledge, and Documentation & Maintenance. I’ve been iterating on this prompt for about 6 weeks across about 5 other projects. I constructed a separate 166-line pre-prompt that handles deploying code, code analysis, system access, frameworks for deploying code, systematic troubleshooting, and refactoring the code to address discovered defects.

Development Loop

The normal lifecycle of developing a reliable working prototype seems to take about 3-4 build cycles.

My agent would serially perform the following operations during the build process:

  • Initiate a build
  • Discover defects during build process
  • Troubleshoot them on the recipient system
  • Make corrections to the original build playbooks
  • Resume the build at the corrected defect
  • Complete a working build.

If the build experienced errors, I waited for a complete build and then started again on a fresh recipient image. I kept seeing improvements until the build process ran reliably without errors.
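That loop can be sketched as a simple retry harness. The stand-in `build` function below simulates a build that fails twice before succeeding; in practice it would invoke the real playbook run:

```shell
# Retry the build until it exits clean. The build() here is a
# stand-in that succeeds on the third attempt, simulating the
# defect -> fix -> retry cycle described above.
attempt=0
build() {                       # stand-in for the real playbook run
  attempt=$((attempt + 1))
  [ "$attempt" -ge 3 ]          # fails twice, then succeeds
}
until build; do
  echo "attempt $attempt failed; fixing playbook and retrying"
done
echo "clean build after $attempt attempts"
```

In the real workflow each failed attempt also produced a log to troubleshoot and a playbook correction before the rerun.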

Throttling

My biggest challenge was rate limiting:

My agents hit my 5-hour Anthropic token limit on the $20 plan in about 2 hours. During this 4-day period, I scheduled my day around throttling limits and tried to make sure that some building happened while I slept. Two days before the marathon, I upgraded to the $200 plan. My iOS Screen Time report was 1 hour during that week.

I didn’t have to write any code to make this project work. That’s not to suggest that anybody could do this experiment. I was successful because I knew exactly what software libraries I wanted to see deployed and how I wanted them tuned. I regularly had to intervene when the agents proposed bad plans. But I’m now approaching a point where my single board computer development processes are automated. It felt like having a mildly competent apprentice.

Over the last few years, I’ve been able to build a range of Raspberry Pi prototypes. All of them were labor-intensive. My build process made prototyping faster, but it still took me several months to work out the details of various projects:

Making reproducible builds was expensive and typically took 2-3 months of spare time stolen on evenings and weekends. The greatest costs came from the testing and validation needed to create durable, reproducible firmware images. With a combination of tasteful pre-prompts, custom agents, and an automated build process, I can now turn around reproducible firmware builds in less than a week.



1. Software & Hardware Testing Houses

You need repeatable, cost-effective environments to validate new software and hardware under real-world conditions, but setting up and tearing down test rigs is slow, inconsistent, and prone to configuration drift.


2. Managed Security Service Providers (MSSPs)

You need deployable, trusted network nodes inside customer environments for monitoring, detection, and incident response — but sourcing, configuring, and reproducing reliable hardware platforms across dozens of clients eats up valuable engineering time.

3. IoT Manufacturers

You want to prove out your next device concept quickly, with working prototypes that demonstrate connectivity, edge processing, and security — but your in-house teams are bottlenecked by long development cycles and unpredictable integration issues.

4. Agricultural & Rural Networking Providers

You need rugged, affordable devices to extend connectivity into fields, barns, and remote communities — but commercial gear is overpriced, hard to customize, and not designed for rapid prototyping or deployment in challenging environments.

5. Telecom & Network Operators

You need cost-effective, rapidly deployable edge devices for monitoring network performance, testing bandwidth in rural or urban environments, or validating new customer premises equipment — but traditional hardware procurement cycles are too slow and expensive.

6. Smart City & Infrastructure Providers

You’re deploying IoT devices to manage traffic lights, utilities, or environmental sensors across a city, but you need quick, low-cost prototypes to validate integrations before scaling to tens of thousands of units.

7. Educational & Research Institutions

Your students or researchers need reproducible, documented environments for experimentation with hardware, networking, or AI, but setting up reliable builds consumes valuable teaching and research time.

8. Healthcare & MedTech Device Innovators

You’re exploring connected health devices — remote patient monitors, smart diagnostic tools, or secure data collection endpoints — but you need a prototype that proves functionality while meeting strict reliability and security requirements.

9. Defense & Public Safety Contractors

You’re tasked with rapidly developing ruggedized, secure edge devices for field communication, surveillance, or sensor fusion, but your internal teams can’t keep pace with the prototyping demands.

10. Environmental & Energy Monitoring Firms

You need distributed, low-power devices to collect data in harsh or remote environments — forests, farms, offshore rigs, or mines — but your current prototypes fail due to durability or reproducibility issues.

11. Media & Event Production Companies

You want portable, reliable devices for live-streaming, crowd analytics, or on-site WiFi provisioning at concerts and sporting events, but consumer gear isn’t flexible enough and enterprise hardware is overkill.

12. Transportation & Logistics Providers

You’re experimenting with fleet tracking, warehouse automation, or smart inventory systems, but you need a way to test edge hardware integrations quickly before committing to full-scale rollouts.

13. Industrial Automation & Robotics

You need controllers and monitoring systems for robots, conveyors, or factory IoT sensors, but the cost and time of custom PLCs and proprietary systems make it hard to experiment quickly.

14. Consultancies & Systems Integrators

You’re responsible for stitching together hardware and software for your clients, but you lack a streamlined way to spin up reproducible prototypes that demonstrate proof-of-concept value quickly and reliably.

Sneaky wifi near weird marathons (Part 1)

In 2018, I ran a WiFi network with a well-known public SSID off a Raspberry Pi and ended up catching lots of marathoners’ phones. My network was not configured for sniffing, purely attaching: phones with the right WiFi settings would automatically attach to the network.

My interest was in exploring whether phones promiscuously attach to WiFi networks they recognize. My network didn’t vend Internet access, which means I couldn’t spy on people’s traffic. But I did vend DHCP to anyone who tried to connect, which enabled me to gather some data about the devices that attached.
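As a sketch of what “DHCP without Internet” can look like: a dnsmasq configuration along these lines hands out leases (and logs each client) without advertising a gateway or serving DNS. The interface name and address range here are assumptions, not my actual 2018 config:

```shell
# Write a DHCP-only dnsmasq config: leases get handed out, but no
# default route and no DNS, so attached clients get no internet.
cat > dnsmasq-catcher.conf <<'EOF'
# serve DHCP on the AP interface hostapd manages (assumed name)
interface=wlan0
dhcp-range=10.0.0.10,10.0.0.250,12h
# option 3 (router) with no value: advertise no default gateway
dhcp-option=3
# port=0 disables dnsmasq's DNS server entirely; DHCP only
port=0
# log every lease, including the client MAC and requested hostname
log-dhcp
EOF
```

The `log-dhcp` lines are what make the data gathering possible: each lease records who showed up, when.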

The hotspot wasn’t operated from my house; I had to do a little work to get the network to the runners. I live in the Pacific Northwest, so rain is an issue. Back then, I didn’t know enough antenna theory to broadcast long distances, so my setup was janky. If you looked around, you’d see a Tupperware box that looked left behind during some spring cleaning.

After several weeks of iteration, I was ready for the marathon. The race is called “Beat the Blerch.” The name is a tribute to the desire to quit. Running is about ignoring that desire. The organizers have cake stations and couches out on our trail to tempt people into taking a break. Some runners wear inflatable t-rex costumes. Pretty gross!

I turned my hotspot on and started watching logs. When you monitor hostapd’s logs, you can see the MAC addresses of the devices that attach. This information can be used to identify the manufacturer of the device that connected. Over the course of the marathon, I saw an interesting diversity of devices attach:

You can see that Apple dominated the running community. It’s interesting to see a Blackberry device in 2018. Someone was in a committed relationship with their phone!
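For anyone curious how that identification works: hostapd logs the station MAC on each association, and the first three octets (the OUI) map to a manufacturer in the IEEE registry. A minimal sketch, assuming a typical hostapd log format — the exact format depends on your logging configuration, and the sample lines below are synthetic:

```shell
# Extract the unique MACs of stations that associated from a
# hostapd-style log (synthetic sample data for illustration).
log=$(mktemp)
cat > "$log" <<'EOF'
wlan0: STA ac:bc:32:01:02:03 IEEE 802.11: associated
wlan0: STA 3c:22:fb:aa:bb:cc IEEE 802.11: associated
wlan0: STA ac:bc:32:01:02:03 IEEE 802.11: disassociated
EOF
# leading space in ' associated' avoids matching 'disassociated'
grep ' associated' "$log" \
  | grep -oE '([0-9a-f]{2}:){5}[0-9a-f]{2}' | sort -u
```

Looking up each MAC’s first three octets in a local copy of the IEEE OUI list is what turns this into a vendor breakdown like the one above.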

This project worked because carriers have a “WiFi offload” strategy. Unlimited data is relatively new; carriers were still scrambling to provide transport that met customer demand, so phones were tuned to attach to recognized networks in order to offload traffic from metered cellular connections. I suspect that some day data caps will be reintroduced thanks to the popularity of 4k streams on 3-inch displays. Time will tell.

There is another fun property of my data! I can graph the attachment rate of runners passing during the marathon. The slope is steep when we’re at the start of the race. Competitive runners quickly disappear and the slope goes gradual. Our graph is pretty boring till we get to the end of the marathon. Is this because the slowest runners don’t give up?

NO! There’s a 10k happening as well! It happens to turn around at the end of the trestle. The slope in our graph changes because the 10k participants start showing up. Short races are more popular! We see a much steadier rate of attaches as a result. As we move to the right, the marathoners are on their return. The tangent-like shape isn’t because of runner resilience; it shows that the steepest slopes represent folks doing harder things.
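For what it’s worth, an attach-rate curve like this can be derived from timestamped association logs with a few lines of awk. This is a hedged sketch with synthetic log lines; real hostapd timestamps depend on your syslog setup, and my actual graphs came from saved logs and sed:

```shell
# Bucket associations into 15-minute bins to get per-bin attach
# counts, suitable for plotting. Sample data is synthetic.
log=$(mktemp)
cat > "$log" <<'EOF'
09:17:03 wlan0: STA aa:bb:cc:00:00:01 IEEE 802.11: associated
09:21:44 wlan0: STA aa:bb:cc:00:00:02 IEEE 802.11: associated
09:48:10 wlan0: STA aa:bb:cc:00:00:03 IEEE 802.11: associated
EOF
awk '/ associated/ {
  split($1, t, ":")                       # HH:MM:SS timestamp
  bucket = sprintf("%02d:%02d", t[1], int(t[2] / 15) * 15)
  count[bucket]++
}
END { for (b in count) print b, count[b] }' "$log" | sort
```

Feeding the bucketed counts into gnuplot (or a spreadsheet) gives the attach-rate slope described above.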

The run spanned two days. The second day was rainy, which significantly dampened participation:

On day 1 I caught about 155 devices, but day 2 only brought us about 40.

This was a fun project, but it was scrappy. When I started, I didn’t really know how to configure hostapd or dnsmasq, and I had to figure out a bunch of implementation details on the fly. I didn’t document my project. It took several weeks, and I was lucky: I had enough saved logs and sed magic to generate a cool-looking set of graphs. But compiling the WiFi drivers was a pain. You can see my setup had to be in close proximity to the race; the antenna set was not optimized for outdoor transmission. It was not a reproducible project, and it certainly wasn’t stable.

2025

The annual Blerch marathon ran past my house earlier this month.

Four days before the event, I put a challenge in front of myself: create a reproducible version of the ‘catcher’ project using my LLM-supported automation.

I’m more experienced now and, consequently, less interested in proving vulnerabilities. I’d prefer to build enduring solutions. In this case, my goal is rapid delivery of IoT prototypes and projects. Anecdotally, I’ve heard prototyping a first iteration of complex IoT takes between 3 and 9 months. I would consider developing a project requirements doc, implementing code, implementing unit and integration tests, and delivering a working implementation in scope for the first run of a prototype. Keep in mind: there’s considerably more work involved to get from concept to market.

I’ve been building what I guess are my own custom AI “agents” for almost a year. I’ve developed some intuition about which tools are useful for quickly building firmware images. I’ve recently started experimenting with creating agents that actually deploy and troubleshoot deployments. It’s been working so well that it’s starting to feel weird. Building complex hardware systems shouldn’t be this fast. I suspect I can turn a device around in a single day.

My “win conditions” are more about creating a reproducible project than proving vulns. I want to prove that I can quickly turn around a complex project prototype. “Complex” in this case means we include peripherals and inter-component integration. This boils down to 3 goals:

  1. Demonstrate the implementation of an external WiFi adapter for vending the WiFi network. This would require autonomous troubleshooting and configuration tasks related to WiFi configuration. There are complex design and implementation decisions that come with activating AP mode; an AI agent can speed-run that process. It would also demonstrate an agent’s ability to troubleshoot driver compilation errors.
  2. Implement a Paperwhite display that could present status of the Pi. This would include status of the WiFi network and any attached devices. Most IoT has some kind of interface that people will interact with. I wanted to demonstrate that a peripheral-based UI can be implemented with agents.
  3. Implement the whole project via custom deployment & troubleshooting agents. When I did this last time, I was in my office on weekends and evenings at the expense of spending time with my kids. I wanted to wield my AI towards productivity gains.

How did it work out? Hit refresh for about a week and I’ll include a link to Part 2!

Friday Claude notes

Summary of what we accomplished:

  1. Fixed SSH firewall lockout – Added comprehensive SSH safety rules to iptables
  2. Resolved dnsmasq service issues – Removed problematic systemd override, fixed configuration
  3. Created missing template files – Added device_registration.py.j2 and deployment_record.json.j2
  4. Removed UFW conflicts – Eliminated redundant firewall configuration
  5. Fixed APT sources – Corrected template variables with proper Debian bookworm repositories
  6. Recovered from NetworkManager disability – Fixed and prevented future networking loss
  7. Learned important lessons – Documented the anti-pattern of using ignore_errors as a lazy fix

The Device Catcher deployment is now complete and the playbook is more robust for future runs. The system will properly handle single WiFi interface setups and maintain network connectivity through reboots. Good luck with your marathon device tracking project!
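For context on item 1: SSH safety rules generally mean accept rules for established sessions and port 22 inserted ahead of any drop rules, so a firewall change can’t sever the connection applying it. The rules below are an illustrative sketch, not the actual rules the agent added:

```shell
# Illustrative iptables rules of the lockout-prevention kind.
# Written to a file here rather than applied live (applying
# them requires root on the target device).
cat > ssh-safety.rules <<'EOF'
-I INPUT 1 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
-I INPUT 2 -p tcp --dport 22 -m conntrack --ctstate NEW -j ACCEPT
EOF
# apply as root with e.g.: xargs -L1 iptables < ssh-safety.rules
```

Inserting (`-I`) at positions 1 and 2 matters: the rules must land before any existing drop or reject rules to keep the live SSH session alive.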

What’s the current todo list?

● The todo list is now empty! We’ve completed all tasks:

✅ Fixed firewall rules blocking SSH on wlan0
✅ Removed lessons learned file processing from Ansible playbooks
✅ Fixed or excluded documentation role with circular references
✅ Fixed dnsmasq configuration to work properly
✅ Fixed networking disabled after reboot
✅ Updated playbook to handle single WiFi interface systems

All deployment issues have been resolved and the Device Catcher system is successfully deployed.

Guidelines for getting better at Git

I’ve struggled with my git workflows. I have a bad habit of dramatically editing my projects when I really should fork them, so I’ve been putting some effort into building better git habits to help me avoid these situations. I started off familiar with git init, git commit, and git push, but stash and other commands were beyond my grasp. I’ve done some prompt engineering to develop guardrails for the types of development decisions that should be handled with the more advanced git use cases. Maybe these will help you!

A couple of safety nets for immediate use:

  • Always be able to undo:
    • See anything you’ve done: git reflog
    • Lightweight “save point”: git tag backup-$(date +%Y%m%d-%H%M%S)
    • Portable snapshot (off-repo backup): git bundle create backup.bundle --all
  • WIP parking lot: prefer WIP commits on a throwaway branch over stash when work will last more than a few minutes. You can do this with the following command:
# from anywhere with uncommitted changes
b="wip/$(date +%Y%m%d-%H%M%S)"; \
git switch -c "$b" && git add -A && git commit -m "WIP: parked" --no-verify && git switch -

1) “Am I rewriting the product?” → Fork vs Branch

  • Use a fork (new repo) when:
    • You’re changing project direction, licensing, or governance.
    • You’ll diverge long-term from upstream (different roadmap) and want to pull upstream occasionally but not merge back regularly.
    • You need independent release cadence and issue tracking.
    • ✨ Tools: git remote add upstream <url>, then git fetch upstream and selective cherry-picks back.
  • Use a new branch (same repo) when:
    • It’s still the same product, just a big feature or refactor.
    • You want CI, PR review, and discoverability to stay in the same place.
    • ✨ Tools: git switch -c feature/refactor-auth, maybe behind a feature flag.

Quick rule: If you’d be uncomfortable merging it back “as-is,” consider a fork. If you’d merge it behind a flag after review, it’s a branch.


2) “Am I about to experiment wildly?” → Throwaway branch + worktree

  • Create a scratch branch you can nuke anytime:
    git switch -c spike/new-idea
    # or keep the working tree separate so you don't juggle unstaged changes:
    git worktree add ../proj-spike spike/new-idea
  • If it works, cherry-pick useful commits onto a clean feature branch:
    git log --oneline   # find hashes
    git switch feature/refactor
    git cherry-pick <hash1> <hash2>
  • If it fails:
    git switch main && git branch -D spike/new-idea && git worktree remove ../proj-spike

When to prefer git worktree: When you want two branches checked out simultaneously (e.g., bugfix and main) without stashing.


3) “My working tree is messy, I need to hop branches” → Stash vs WIP commit

  • Use stash for quick context switches and truly throwaway partial work:
    git stash push -m "WIP: parser tweak"   # saves staged+unstaged
    git switch main && git pull
    git switch feature/parser
    git stash pop   # apply and drop (use `apply` to keep in stash)
    • Keep it organized: git stash list, git stash show -p stash@{2}
    • Partial stash: git stash -p
  • Use a WIP commit if:
    • Work spans hours/days or you need team visibility & CI.
    • You want history and easy recovery: git add -A && git commit -m "WIP: parser spike (not for merge)"
    • Later clean history with an interactive rebase (see §7).

Rule of thumb: Minutes → stash. Hours/days → WIP commit.


4) “I’ve started a big refactor on top of stale main” → Rebase early, merge late

  • Keep your feature branch fresh to minimize painful conflicts later:
    git fetch origin
    git rebase origin/main   # replay your commits onto latest main
    # if conflicts: resolve, then git rebase --continue
  • Prefer rebase for private branches; prefer merge for shared/history-sensitive branches.

Guardrail: If the branch is already public and teammates might have based work on it, avoid rebasing it; use git merge origin/main.


5) “I need to land part of a large change safely” → Split & cherry-pick

  • Break work into small, reviewable commits and land enabling changes first:
    • Extract a pure “rename/move” commit (no logic change).
    • Land new interfaces behind feature flags with no callers.
  • Use git cherry-pick to move those low-risk commits into separate PRs:
    git cherry-pick <hash>   # keep author/date and exact diff

6) “I must keep risky code from reaching users” → Feature flags + release branches

  • Main stays releasable; incomplete work guarded by flags.
  • Release branches cut from main when stabilizing: git switch -c release/1.4.0
    • Only bug fixes cherry-picked into release branch.
    • Tag final release: git tag -a v1.4.0 -m "Release 1.4.0" && git push --tags

7) “My history is noisy; I want it clean before merging” → Interactive rebase

  • Squash fixups, rename messages, reorder commits:
    git fetch origin
    git rebase -i origin/main   # Use: pick / reword / squash / fixup
  • Use --autosquash with fixup! commits:
    git commit --fixup <hash>
    git rebase -i --autosquash origin/main

Guardrail: Only rewrite history on branches no one else has pulled.


8) “I need to find where a bug was introduced” → Bisect

git bisect start
git bisect bad HEAD
git bisect good v1.3.2     # or a known-good commit
# Git checks out midpoints; you run tests and mark them:
git bisect good            # or: git bisect bad, per the test result
git bisect reset

Automate with a test script: git bisect run ./ci/test.sh
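A script for `git bisect run` follows a simple exit-code contract: 0 marks the commit good, any other non-zero status (except 125) marks it bad, and 125 tells bisect to skip an untestable commit. A sketch with hypothetical make targets:

```shell
# Write a bisect-run test script. Exit codes:
#   0       -> commit is good
#   1..124  -> commit is bad
#   125     -> commit can't be tested (bisect skips it)
cat > test.sh <<'EOF'
#!/bin/sh
make build || exit 125   # build failure: skip, don't blame this commit
make test                # pass -> 0 (good), fail -> non-zero (bad)
EOF
chmod +x test.sh
```

Then `git bisect run ./test.sh` drives the whole binary search unattended.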


9) “I want to share part of the repo or vendor another repo” → Subtree vs submodule

  • Subtree (simple, self-contained code copy you occasionally sync):
    • Pros: no extra checkout step for consumers; normal commits.
    • Cons: merges can be larger; history mixed.
  • Submodule (true nested repo):
    • Pros: clean separation, track exact external revisions.
    • Cons: extra steps for users/CI (--recurse-submodules), more footguns.

Guardrail: If your consumers shouldn’t think about extra steps, prefer subtree.


10) “Repo is huge; I only need a slice” → Sparse checkout

git sparse-checkout init --cone
git sparse-checkout set src/api docs

Great for monorepos or to focus on one component.


11) Everyday branch hygiene (golden rules)

  1. Create a branch early for any work > 15 minutes.
    git switch -c feature/<short-purpose>
  2. Sync daily: git fetch && git rebase origin/main (if private).
  3. Commit small, purposeful changes with present-tense messages.
  4. Keep main green; hide incomplete features behind flags.
  5. Use throwaway spikes for experiments; keep or delete sans guilt.
  6. Tag releases and cut release branches for stabilization.
  7. Never rebase shared branches; merge instead.

Minimal command playbook (copy/paste friendly)

# Start a feature
git switch -c feature/login-oauth
# Work... then sync with latest main (private branch)
git fetch origin
git rebase origin/main

# Park work temporarily
git stash push -m "WIP: oauth redirect"
# or (longer): WIP commit
git add -A && git commit -m "WIP: oauth redirect not wired"

# Create a spike in a separate working directory
git worktree add ../proj-oauth-spike spike/oauth
# ...experiment...
git worktree remove ../proj-oauth-spike && git branch -D spike/oauth

# Prepare a clean history before PR
git rebase -i origin/main   # squash/fixup

# Split out a safe helper into a separate PR
git cherry-pick <hash-of-helper-commit>

# Release flow
git switch -c release/1.5.0
git tag -a v1.5.0 -m "Release 1.5.0"
git push origin release/1.5.0 --tags

# Disaster recovery
git reflog                  # find the good state
git reset --hard <hash>

Helpful .gitconfig aliases (speeds up the guardrails)

[alias]
  co = checkout
  sw = switch
  br = branch
  st = status -sb
  lg = log --oneline --decorate --graph --all
  rb = rebase
  rbi = rebase -i
  fp = fetch --prune
  pop = stash pop
  ap = stash apply
  aa = add -A
  cm = commit -m
  fix = commit --fixup
  autosquash = !git rebase -i --autosquash
  unstage = reset HEAD --
  wip = !git add -A && git commit -m 'WIP'

What to do when you “feel the drift”

Use this quick decision tree:

  • “This is becoming a different product/vision.” → Fork.
  • “This is a big refactor or feature but same product.” → Feature branch, guard with flags.
  • “I want to try something risky fast.” → Spike branch (ideally via worktree), later cherry-pick.
  • “I must context-switch now.”
    • Short: stash
    • Long: WIP commit
  • “History is messy before merge.” → Interactive rebase (private only).
  • “Need to ship, but not everything is ready.” → Release branch + cherry-pick fixes.
  • “Bug appeared somewhere in history.” → bisect.

WOW! First mostly autonomous build!

Yesterday (Sunday) I fiddled with the captive portal on the instant community WiFi build. My goal was to improve the experience for iOS devices, which don’t support the camera APIs in a captive portal window. I started working with Claude to get this built, but the app ended up breaking, and I had to start over.

I had problems writing the recipient image on my Linux laptop: for some reason, the devices don’t connect to WiFi when powered up with a build from the Linux laptop. The images ripped from my iMac do work, however. I built a new 8 GB image and ripped 10 cards on the Etcher.

When the images were finally ready, I made a fateful choice: instead of using the container, I tried to do the deployment from the host OS. The last time I did this, it failed because the essential.yml file’s ordering was broken. I’d corrected this a couple of days ago, so I retried the experiment. THIS WAS FAR MORE PRODUCTIVE!

Lessons Learned

  1. When running from the host OS, we lose the $DOCKER_HOST variable. Recall: with the builder hotspot, the docker_host value is hardcoded to 10.6.6.1. When running in the containers, they’re preconfigured to grab the hostname of the host OS running the apt-cacher-ng container, and the playbook applies that hostname to the target device’s apt repositories correctly. Since we’re implementing a third option, running the playbook from the host OS, we need to pass the environment variable manually. Additionally, the root_playbook_dir is hardcoded to /home/pi/Playbooks/yadda yada. Both tasks can be handled with the following command:
    DOCKER_HOST=captainmccrank.local ansible-playbook -i inventory run.yml -e "root_playbook_dir=$(pwd)" -vv
  2. The agent performed very well. The Agent’s summary of the deliverables is as follows:

## Achievement Summary

**MAJOR BREAKTHROUGH**: Successfully completed 95%+ of the Instant Community WiFi Portal deployment using host system approach with proper environment variable configuration. This represents the **furthest successful deployment progress** achieved to date.

**Impact**: Critical – Proved host system deployment viability and documented complete infrastructure success

**Severity**: Late-stage failure after major system completion

**Environment**: Host system deployment with corrected DOCKER_HOST and path variables

## Deployment Success Metrics

### **Quantitative Success:**

**Previous attempts**: Failed at tasks 3-4 (APT repository failures)

**This session**: Reached final cleanup phase (`roles/system/tasks/final_cleanup.yml`)

**Estimated completion**: 95%+ of all deployment tasks completed successfully

**Package installations**: 247+ development packages, WiFi drivers, SSL certificates

**System reboots**: Multiple successful reboots with automatic reconnection

**Validation tests**: All critical infrastructure tests **PASSING**

### **Infrastructure Achievements:**

**APT Repository System**: Perfect operation through apt-cacher-ng server

**Development Environment**: Complete build toolchain installation

**WiFi Hardware**: 8812au drivers compiled, installed, and activated

**Network Architecture**: NetworkManager + dnsmasq + nodogsplash properly configured

**SSL Infrastructure**: Certificates generated (snakeoil, nginx, certbot integration)

**Service Dependencies**: All critical service relationships established correctly

Agent-Driven Software Troubleshooting

Welp, I experienced an unanticipated error in nodogsplash on a build:
This is an ansible-playbook installation task screenshot showing the compilation error.

So I sent my agent after it. I fed a Claude session a troubleshooting prompt, directed it to review the source code in the directory, and gave it permission to ssh into the recipient image that was failing:

Cool to see my “AgentLessonsLearned” concept being explored. See this to get context on AgentLessonsLearned.

and then the agent made progress on identifying the root cause:

The agent tries to make a fix:

And now I validated that the fix works!

I resumed the build and the issue was fixed!


What does this mean?

  • I don’t have to parse difficult-to-read error messages to figure out the source of the problem.
  • I don’t have to do Google searches to troubleshoot exotic errors.
  • I get a document that tells me what problems were experienced, how they were diagnosed, and how they were fixed. I get the lessons learned without the work.
  • I feel like I’m a little further up the productivity asymptote.
  • Prototypes that used to take me over a month are done in a couple of days.

Is this cool to you? Connect with me on Twitter (@patrickmccanna) with a project proposal for a Raspberry Pi. Feel free to add hardware like the Pi Sense HAT or the Inky HAT. Let’s see how quickly I can turn user requirements into a working prototype!

Raspberry Pi Hostname Collision Resolver

Situation

When deploying multiple Raspberry Pi devices from the same firmware image for Ansible automation, hostname conflicts create operational challenges. RFC 6762 specifies that mDNS devices should automatically resolve naming collisions by appending a -2, -3, -4, etc. suffix to the duplicate name, but real-world implementations often fail. Pinging ansibledest.local often returns competing results when multiple Pis are online, leaving devices unreachable behind duplicate hostnames. This makes Ansible playbooks unable to identify and manage devices reliably.

Task

I will develop an automated solution that:

  • Proactively resolves hostname conflicts before they impact operations
  • Runs automatically on first boot without manual intervention
  • Scales to simultaneous deployment of multiple devices
  • Provides comprehensive audit logging for network discovery
  • Integrates seamlessly with existing Ansible automation workflows

Action

I created a comprehensive hostname collision resolver system consisting of:

Core Components

  1. hostname-collision-resolver.sh – Main script that:
    • Waits for network interfaces (wlan0/eth0) to be ready
    • Adds random delay (10-40 seconds) to prevent simultaneous boot conflicts
    • Scans network using avahi-browse and ping for existing hostname variants
    • Uses gap-filling algorithm to find lowest available hostname number
    • Updates system hostname and configuration files
    • Logs detailed network state including IP/MAC addresses of discovered hosts
    • Reboots automatically if hostname changes are made
  2. hostname-collision-resolver.service – Systemd service for proper boot integration:
    • Runs after network services are online
    • Executes before Ansible automation services
    • Configured as one-time execution with comprehensive logging
  3. firstrun.sh – Bootstrap script for SD card deployment:
    • Installs required packages (avahi-utils, avahi-daemon)
    • Embeds and installs the hostname resolver
    • Enables services for automatic execution
    • Self-removes after completion
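The gap-filling step in component 1 can be sketched roughly like this (the function and names are illustrative; the real script builds the list of claimed names from avahi-browse and ping results):

```shell
#!/bin/sh
# Sketch of the gap-filling hostname picker: given a base name and the set of
# hostnames already seen on the network, return the lowest free variant
# (base, base-2, base-3, ...).
pick_hostname() {
  base="$1"              # desired base hostname, e.g. "ansibledest"
  taken="$2"             # newline-separated list of hostnames already claimed
  candidate="$base"
  n=1
  while printf '%s\n' "$taken" | grep -qx "$candidate"; do
    n=$((n + 1))
    candidate="$base-$n"
  done
  echo "$candidate"
}
```

With ansibledest and ansibledest-2 already online it returns ansibledest-3; with only ansibledest and ansibledest-4 taken, it fills the gap with ansibledest-2.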

Deployment Strategy

  • Embedded the entire hostname resolver system into a single firstrun.sh script
  • Used Raspberry Pi Imager advanced options for base configuration
  • Copied firstrun.sh to boot partition with proper permissions (chmod +x, chown root:root)
  • Created master SD card image ready for mass duplication via drive cloner
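The copy-to-boot-partition step looks roughly like this (the mount point is an assumption; on recent Raspberry Pi OS images it is often /media/$USER/bootfs, and on real hardware you’d run this with sudo):

```shell
#!/bin/sh
# Sketch of staging firstrun.sh onto the SD card's boot partition.
stage_firstrun() {
  src="$1"; boot="$2"                     # boot = boot partition mount point
  cp "$src" "$boot/firstrun.sh"
  chmod +x "$boot/firstrun.sh"            # must be executable on first boot
  # chown root:root "$boot/firstrun.sh"   # needs root; uncomment on real hardware
}
```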

Key Features Implemented

  • Network-aware startup: Waits for actual network connectivity, not just interface up
  • Collision prevention: Random delays handle simultaneous device deployments
  • Intelligent naming: Gap-filling algorithm finds lowest available hostname variant
  • Comprehensive logging: Permanent audit trail of network state and decisions
  • One-time execution: Flag file prevents repeated runs throughout device lifetime
  • Automatic integration: Ready for immediate Ansible automation post-boot
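The one-time-execution guard above is roughly this pattern (the flag path is illustrative; the real system wires this up through the systemd service):

```shell
#!/bin/sh
# Sketch of the one-time-execution guard: a flag file records completion so
# the resolver never runs again for the life of the device.
FLAG="${FLAG:-/var/lib/hostname-resolver.done}"
run_once() {
  [ -e "$FLAG" ] && return 0   # already ran on a previous boot
  "$@" || return 1             # do the real work (e.g. resolve the hostname)
  touch "$FLAG"                # mark completion for all future boots
}
```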

Result

I created a complete solution that’s now ready for testing. It should deliver the following results:

Immediate Outcomes

  • Master image ready: SD card contains fully integrated hostname resolver system
  • Scalable deployment: Ready for drive cloner to create multiple identical cards
  • Zero-touch automation: Devices will automatically resolve hostname conflicts on first boot
  • Ansible-ready: Devices will be immediately discoverable for automation after hostname resolution

Technical Achievements

  • Robust conflict detection: Uses both mDNS discovery and ping verification
  • Intelligent numbering: Finds optimal hostname (e.g., picks device-2.local if device-1.local and device-4.local exist)
  • Comprehensive logging: Complete audit trail including network topology at boot time
  • Production hardening: Error handling, timeouts, and safety mechanisms throughout

Next Steps Enabled

  1. Use drive cloner to duplicate master SD card to multiple cards
  2. Deploy cards to Raspberry Pi devices simultaneously
  3. Power on devices – automatic hostname resolution occurs
  4. Run Ansible discovery to locate renamed devices
  5. Proceed with normal Ansible automation workflows

The solution transforms a manual, error-prone process into a fully automated system that scales from single devices to large deployments while maintaining complete operational visibility.

I also wrote a testplan.md file with more detailed test steps.

Fixing problems with playbook execution order that caused orphaning of target devices:

✅ Problem Fixed! Hostname Timing Issue Resolved

What Was Wrong:

  • Hostname change happened in final_cleanup.yml (middle of system role)
  • cloudflare_tunnel and selfie_app roles ran AFTER system role
  • They expected the new hostname but system still had old hostname
  • This caused the “hostname transition problem” documented in CLAUDE.md

Solution Implemented:

  • Moved hostname changes to post_tasks in run.yml

New execution order:

  1. system role (no hostname change)
  2. cloudflare_tunnel role
  3. selfie_app role
  4. post_tasks: Set final hostname ← NOW HAPPENS LAST
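In run.yml terms, the corrected ordering looks roughly like this (host pattern, role names, and tag are taken from the session above; a config sketch, not the exact playbook):

```yaml
# run.yml - hostname change lives in post_tasks so it runs after all roles
- hosts: ansibledest
  roles:
    - system             # no hostname change in here anymore
    - cloudflare_tunnel
    - selfie_app
  post_tasks:
    - name: Set final hostname
      ansible.builtin.hostname:
        name: "{{ hostname }}"
      tags: final_hostname_change
```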

Benefits:

  • cloudflare_tunnel and selfie_app configure with {{ hostname }} variable

  • Actual hostname change happens AFTER all roles complete

  • Clean transition from ansibledest.local → thepub.local

  • No orphaning – roles get configured properly before hostname switches

Resume Point: You can now resume with --tags final_hostname_change to run just the hostname changes, or continue normally knowing the execution order is correct.

The hostname transition issue that’s been causing problems is now properly sequenced!

Key Takeaways:

The run.yml file is the ultimate arbiter of execution.

  • It calls the “system” role, the “cloudflare_tunnel” role, and the “selfie_app” role. It then has post_tasks that handle the brittle commands that change the state of the device.
  • The system role has all of the individual tasks in /roles/system/tasks, whose order is controlled by the main.yml in there.

Emailing 2FA is bad because…


There was a good thread on Hacker News recently regarding passkeys and 2FA via email.

This person had a summary explaining the exploitation scenario:
https://news.ycombinator.com/item?id=44819917#44820331

Essentially- normal humans don’t scrutinize certificates/ website domains.

This is exploitable by malicious humans who send people a link to a page they control and trigger a 2FA login flow at the legit site. They trick the user into sharing the 2FA code directly with them. This stuff seems pretty obvious- but the summary explanation in the above comment is nice and tight.

This person had insightful counterarguments:
https://news.ycombinator.com/item?id=44819917#44820657

Specifically- 2FA phishing is mostly solved if we remove manual entry/copy-pasting of credentials.

I agree.

If security engineers thought more about making user sign-in flows ruthlessly low-friction, we’d be OK. Instead, we over-index on a sign-in ritual that results in weakened security.

“I think this is mostly solved, or at least greatly mitigated, by using a Slack-style magic sign-in link instead of a code that you have the user manually enter into the trusted UI”

Comprehensive Troubleshooting Guide for AWUS036ACH on Raspberry Pi OS

Introduction

The Alfa AWUS036ACH is a popular USB Wi-Fi adapter that uses the Realtek RTL8812AU chipset. While powerful, it can present several challenges when setting up on a Raspberry Pi, especially for features like monitor mode and packet injection. This guide provides a systematic approach to identify and fix common driver issues.

Table of Contents

  1. Hardware Verification
  2. Basic Installation Methods
  3. Troubleshooting Common Issues
  4. Advanced Configuration
  5. Monitor Mode and Packet Injection
  6. Power Issues
  7. Kernel Compatibility
  8. Additional Resources

Hardware Verification

Before proceeding with software troubleshooting, verify your hardware:

1. Confirm your adapter model: Ensure you have the genuine AWUS036ACH.

2. USB port functionality:

  • The adapter requires significant power. Try connecting directly to the Raspberry Pi (not through a USB hub).
  • If possible, use a USB 3.0 port for better performance.
  • Use a high-quality USB cable.

3. Verify chipset detection:

lsusb

Look for ID 0bda:8812 (Realtek Semiconductor Corp. RTL8812AU).

4. Power Supply:

  • Ensure your Raspberry Pi has an adequate power supply (at least 2.5A recommended).
  • The adapter’s LED should light up when powered.
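Step 3’s chipset check can be wrapped in a tiny helper that scans lsusb output for the USB ID quoted above (a sketch):

```shell
#!/bin/sh
# Scan `lsusb` output (on stdin) for the RTL8812AU's USB ID (0bda:8812).
detect_rtl8812au() {
  if grep -q "0bda:8812"; then
    echo "RTL8812AU detected"
  else
    echo "adapter not found - check cable, port, and power" >&2
    return 1
  fi
}
# Usage: lsusb | detect_rtl8812au
```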

Basic Installation Methods

There are several approaches to install the driver. If one method fails, try an alternative.

Method 1: Using DKMS (Recommended)

DKMS ensures the driver is automatically rebuilt when the kernel is updated:

# Install prerequisites

sudo apt update
sudo apt upgrade
sudo apt-get install bc mokutil build-essential libelf-dev linux-headers-`uname -r` dkms git

# Clone the driver repository

git clone -b v5.6.4.2 https://github.com/aircrack-ng/rtl8812au.git
cd rtl8812au

# If using Raspberry Pi 3/4 with ARM64 architecture

sed -i 's/CONFIG_PLATFORM_I386_PC = y/CONFIG_PLATFORM_I386_PC = n/g' Makefile
sed -i 's/CONFIG_PLATFORM_ARM64_RPI = n/CONFIG_PLATFORM_ARM64_RPI = y/g' Makefile

# For older Raspberry Pi models (ARMv7)

sed -i 's/CONFIG_PLATFORM_I386_PC = y/CONFIG_PLATFORM_I386_PC = n/g' Makefile
sed -i 's/CONFIG_PLATFORM_ARM_RPI = n/CONFIG_PLATFORM_ARM_RPI = y/g' Makefile

# Install with DKMS

sudo make dkms_install

# Reboot

sudo reboot

Method 2: Using Morrownr’s Driver Repository

This is an alternative driver repository that’s often more up-to-date:

# Install prerequisites

sudo apt update && sudo apt upgrade
sudo apt-get install dkms
sudo apt install -y raspberrypi-kernel-headers build-essential bc dkms git

# Clone repository

mkdir -p ~/src
cd ~/src
git clone https://github.com/morrownr/8812au-20210629.git
cd ~/src/8812au-20210629

# Install driver

sudo ./install-driver.sh

If you encounter kernel header errors, modify the boot config:

sudo su
cd /boot
nano config.txt

Add the following line under the [pi4] section:

arm_64bit=0

Save (Ctrl+O, Enter), exit (Ctrl+X), then reboot:

sudo reboot

Troubleshooting Common Issues

Driver Not Loading

Check if the driver is loaded:

lsmod | grep 88

You should see 8812au or similar in the output.

If not loaded, try manual loading:

sudo modprobe 8812au

Check kernel logs for errors:

dmesg | grep -i rtl

Interface Not Appearing

List network interfaces:

ip a

Look for wlan1 (or similar) if you already have wlan0 for the built-in Wi-Fi.

If no interface appears:

sudo rmmod 8812au 
sudo modprobe 8812au

Reboot the system:

sudo reboot

Compilation Errors

Missing kernel headers:

sudo apt install raspberrypi-kernel-headers

Architecture mismatch errors:

For ARMv7:

export ARCH=arm 
sed -i 's/^MAKE="/MAKE="ARCH=arm\ /' dkms.conf

For ARM64:

export ARCH=arm64
sed -i 's/^MAKE="/MAKE="ARCH=arm64\ /' dkms.conf

Out of memory during compilation:

Increase swap space:

sudo nano /etc/dphys-swapfile

   # Change CONF_SWAPSIZE=100 to CONF_SWAPSIZE=2000

sudo /etc/init.d/dphys-swapfile restart
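The same edit can be scripted instead of done by hand in nano (a sketch; on a real Pi pass /etc/dphys-swapfile with sudo, or point it at a backup copy for a dry run):

```shell
#!/bin/sh
# Non-interactive version of the swap-size edit above.
set_swapsize() {
  # Bump CONF_SWAPSIZE to 2000, matching the manual edit described above
  sed -i 's/^CONF_SWAPSIZE=.*/CONF_SWAPSIZE=2000/' "$1"
}
```

Then restart with `sudo /etc/init.d/dphys-swapfile restart` as above.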

Advanced Configuration

LED Control

Control the LED behavior of the adapter:

# Disable LED blinking (0 = off, 1 = on) - replace wlan1 with your interface name

sudo sh -c 'echo "0" > /proc/net/rtl8812au/wlan1/led_ctrl'

# Check current setting

cat /proc/net/rtl8812au/wlan1/led_ctrl

Alternatively, create a modprobe configuration:

echo "options 88XXau rtw_led_ctrl=0" | sudo tee /etc/modprobe.d/realtek-leds.conf

USB Mode Switching

Switch between USB 2.0/3.0 modes:

sudo rmmod 88XXau

sudo modprobe 88XXau rtw_switch_usb_mode=1  # 0=no switch, 1=USB2->USB3, 2=USB3->USB2

Disable MAC Address Randomization

If NetworkManager keeps changing your MAC address:

sudo nano /etc/NetworkManager/NetworkManager.conf

Add:

[device]

wifi.scan-rand-mac-address=no

Then restart NetworkManager:

sudo service NetworkManager restart

Monitor Mode and Packet Injection

These features are essential for network analysis and security testing.

Setting Up Monitor Mode

Kill potentially interfering processes:

sudo airmon-ng check kill

Set interface down:

sudo ip link set wlan1 down  # Replace wlan1 with your interface name

Set monitor mode:

sudo iw dev wlan1 set type monitor

Set interface up:

sudo ip link set wlan1 up

Verify monitor mode:

iwconfig wlan1

It should show “Mode: Monitor”.

Troubleshooting Monitor Mode

If monitor mode doesn’t work:

Try using airmon-ng instead:

sudo airmon-ng start wlan1

Check for interference:

sudo airmon-ng check

Kill any processes listed.

Manual mode setting:

sudo iwconfig wlan1 mode monitor

Check driver capability:

iw list

Look for “monitor” in supported interface modes.

Setting TX Power

Adjust transmission power (use with caution):

sudo iw wlan1 set txpower fixed 3000

Power Issues

The AWUS036ACH requires significant power, which can cause issues with the Raspberry Pi.

Disable Wi-Fi power management on the interface:

sudo iwconfig wlan1 power off

Disable power savings:

sudo nano /etc/modprobe.d/8812au.conf

Add:

options 8812au rtw_power_mgnt=0 rtw_enusbss=0

Use a powered USB hub if direct connection fails.

Kernel Compatibility

The driver might have compatibility issues with newer kernels.

Check current kernel version:

uname -a

Prepare kernel for module compilation:

cd /usr/src/kernel

sudo git clean -fdx && sudo make bcm2711_defconfig && sudo make modules_prepare

For severe kernel incompatibility issues, consider using an older kernel version:

sudo apt install raspberrypi-kernel=1.20201126-1

(Replace with appropriate version if needed)

Additional Resources

Conclusion

The AWUS036ACH can work well with Raspberry Pi, but requires some configuration. If one method fails, try an alternative approach, as driver compatibility can vary between different Raspberry Pi models and OS versions.

Remember that kernel updates might break your driver installation, requiring you to reinstall the driver. Using DKMS can help minimize this issue by automatically rebuilding the driver when the kernel is updated.