My rack lives far enough from the Xfinity gateway that running a clean Cat6 between them is a project, not a ten-minute job. So for the last few months, every node in the cluster – three mini-PCs and two HP servers with USB wifi dongles – has been on wifi, all five associating with the SSID broadcast by an ASUS RT-AC2400 sitting on the rack in Media Bridge mode. The ASUS connects wirelessly to the Xfinity AP and provides a local SSID for the cluster.
It worked. It was fragile. Time to cable it.
Why a Wifi Bridge and Not Powerline
The obvious alternative was a powerline (PLC) adapter pair – one near the Xfinity gateway, one in the rack, using the home’s electrical wiring as the physical layer. I evaluated this. I did not buy it.
The pitch for powerline is “wired without pulling cable.” The reality, especially in older buildings with mixed-vintage wiring, is:
- Bandwidth is a marketing number. “AV2000” adapters advertise 2 Gbps. Real-world throughput on a noisy home circuit is closer to 100-300 Mbps – comparable to wifi, sometimes worse.
- Copper noise is real and unpredictable. A vacuum cleaner on a different outlet, an LED dimmer, a charging laptop – any of these can drop your link rate by half. You only find this out after you’ve installed the things.
- It’s the same single shared medium as wifi. PLC is a contention-based protocol over a noisy bus. The “wired” feeling is misleading.
The ASUS in Media Bridge mode at least has a clean radio and the bridge logic is well-understood. The trade-off is one extra wireless hop between the rack and the gateway, which is exactly the bottleneck that came back to bite me later. More on that.
The Cutover Plan
Goal: pull every node off wifi, put each one on Ethernet through a small in-rack switch, with the switch’s uplink going to a LAN port on the ASUS (still in Media Bridge mode – the wifi-to-Xfinity hop stays).
before:

    Xfinity
      ^ wifi
    ASUS (Media Bridge)  <-- 5x wifi --  mini-pc-1 ... hp-server-2

after:

    Xfinity
      ^ wifi
    ASUS (Media Bridge) -- switch --+-- mini-pc-1
                                    +-- mini-pc-2
                                    +-- mini-pc-3
                                    +-- hp-server-1
                                    +-- hp-server-2
The control plane node has its IP (10.0.0.11) hard-coded in kubelet, kube-apiserver, etcd manifests, and the apiserver TLS cert SANs. Losing that IP means rebuilding the cluster, so the cutover for it had to preserve the IP across the interface migration – not a fresh DHCP lease, not a new value.
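Where those hard-coded references live is easy to audit before touching anything – a quick sketch, assuming a kubeadm-style layout (paths differ on other installs):

```bash
# find everywhere the control plane IP is baked into config and manifests
sudo grep -rn '10.0.0.11' /etc/kubernetes/ /var/lib/kubelet/config.yaml

# confirm the IP is in the apiserver cert SANs – regenerating these is
# the painful part if the IP changes
sudo openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text \
  | grep -A1 'Subject Alternative Name'
```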
The plan: pre-stage netplan on every node (Ethernet stanza with the existing IP, drop the wifi block), then netplan apply with a 60-second auto-rollback safety net via systemd-run. If the new config doesn’t bring up the expected IP on the expected interface, the rollback restores the wifi config from a backup and re-applies. Self-healing if I screwed up.
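The pre-staged file looked roughly like this – a sketch, assuming the NIC is named enp1s0 (the actual name varies per node) and using the control plane node’s address as the example:

```bash
# ethernet-only netplan carrying the node's existing static IP;
# the old wifi stanza is dropped entirely (gateway is 10.0.0.1)
sudo tee /etc/netplan/50-cloud-init.yaml >/dev/null <<'EOF'
network:
  version: 2
  ethernets:
    enp1s0:
      dhcp4: false
      addresses: [10.0.0.11/24]
      routes:
        - to: default
          via: 10.0.0.1
      nameservers:
        addresses: [10.0.0.1]
EOF
```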
I queued up a $25 unmanaged 8-port switch – a Netgear GS108 – and started.
The Switch That Wasn’t On
I cabled the rack, plugged in the switch, and powered it on. Power LED came on. Blinking. No port LEDs lit. From software:
$ for nic in enp1s0 eno1; do sudo ethtool $nic | grep "Link detected"; done
Link detected: no
Link detected: no
All five nodes: zero link. Re-seated cables, swapped a cable for a known-good one, tried different switch ports. Nothing.
GS108 spec: the power LED should be solid green when healthy. There is no normal blinking state. Combined with no port LEDs lighting, the switch was failing self-test.
The PSU brick said 12V 0.5A – correct spec for the GS108 (you can read it off the label on the bottom of the switch). But “labeled correctly” and “outputting correctly” are not the same thing, especially for cheap wall warts that have been sitting in a drawer for years.
I swapped in a known-good 12 V supply and the switch came straight up – the brick was the failure. With the fresh supply, every port negotiated 1000 Mb/s full duplex on first plug.
The switch was fine. The brick was lying.
The Cutover, in One Loop
With the switch alive, the cutover itself was anticlimactic. For each worker:
# pre-stage netplan: ethernet only, current static IP
#   1. back up the live config to /etc/netplan/50-cloud-init.yaml.bak.<ts>
#   2. write the new ethernet-only netplan
#   3. arm a 60s auto-rollback via systemd-run, then apply
# $ts, $ETH, $IP, $BACKUP are set earlier in the loop; because the script
# below is double-quoted, they expand when the unit is armed, so the
# rollback carries its values even if the shell session dies
sudo systemd-run --unit=netplan-rollback-$ts --on-active=60s \
  /bin/bash -c "
    if ! ip -4 addr show $ETH | grep -q '$IP/' \
        || ! ping -c 3 -W 2 10.0.0.1 >/dev/null; then
      cp $BACKUP /etc/netplan/50-cloud-init.yaml
      netplan apply
    fi
  "
sudo netplan apply
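After each apply, a quick link sanity check before moving on to the next node – same enp1s0 naming assumption as above:

```bash
# expect: Speed: 1000Mb/s, Duplex: Full, Link detected: yes
sudo ethtool enp1s0 | grep -E 'Speed|Duplex|Link detected'
```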
All four workers cut over in sequence in under two minutes. Then the control plane node, where I held my breath because the SSH session for that one would drop the moment the IP migrated. It came back on Ethernet within ten seconds and the cluster never saw a NotReady.
kubectl get nodes: 5 / 5 Ready. Done.
The Bandwidth Surprise
I ran a curl of a 50 MB Cloudflare file from each node before and after. This is what I expected: latency improves a touch, throughput stays the same. This is what I got:
| Node | Wi-Fi | Wired | Δ |
|---|---|---|---|
| mini-pc-3 | 86 Mbps | 36 Mbps | -58% |
| mini-pc-1 | 246 Mbps | 37 Mbps | -85% |
| mini-pc-2 | 74 Mbps | 35 Mbps | -53% |
| hp-server-1 | 76 Mbps | 53 Mbps | -30% |
| hp-server-2 | 56 Mbps | 56 Mbps | 0% |
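For reference, the probe was nothing fancy – a sketch, assuming Cloudflare’s speed-test host and the __down endpoint discussed below:

```bash
# pull 50 MB from Cloudflare's speed-test endpoint and report throughput
curl -s -o /dev/null -w 'download: %{speed_download} bytes/s\n' \
  'https://speed.cloudflare.com/__down?bytes=52428800'
```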
Every node got slower. The biggest regression – mini-pc-1 going from 246 Mbps to 37 – was the one that puzzled me until I thought about the topology.
Before: mini-pc-1 had a 2-stream HE wifi link, probably negotiating directly with the Xfinity gateway’s AP rather than the ASUS bridge. Each node had its own air-time allocation.
After: every node funnels through the same ASUS-to-Xfinity wireless uplink, which is now the choke point. The ASUS in Media Bridge mode is essentially a wifi repeater for upstream – one radio doing the LAN-side bridge and the upstream side. Five clients sharing one uplink will not match five clients each negotiating their own.
So why am I keeping it?
I Almost Reverted. Then I Tested the Other Direction.
Looking at that table, I had my finger on the rollback button. Five out of five nodes regressed, one of them by 85%. That’s exactly the revert trigger I’d written down in my own runbook.
Before pulling the trigger, I ran the same test in the other direction – uploading to Cloudflare’s __up endpoint instead of downloading from __down. I expected upload to be the worse direction. Residential broadband is heavily asymmetric – Xfinity gives me ~500 Mbps down and a fraction of that up. So if download is capped, upload should be hopeless.
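The upload probe is the mirror image of the download one – a sketch, assuming the matching __up endpoint on the same host:

```bash
# push 50 MB of zeroes to the __up endpoint and report upload throughput
head -c 52428800 /dev/zero \
  | curl -s -o /dev/null -w 'upload: %{speed_upload} bytes/s\n' \
      --data-binary @- 'https://speed.cloudflare.com/__up'
```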
| Node | Download (avg of 3) | Upload (avg of 2) |
|---|---|---|
| mini-pc-3 | 34 Mbps | 82 Mbps |
| mini-pc-1 | 38 Mbps | 98 Mbps |
| mini-pc-2 | 35 Mbps | 117 Mbps |
| hp-server-1 | 50 Mbps | 69 Mbps |
| hp-server-2 | 44 Mbps | 93 Mbps |
Upload is two to three times faster than download. That’s backwards for residential. When upload beats download on a Comcast line, the download isn’t really bandwidth-bound – something is throttling it. Most likely: all five nodes share one NAT’d public IP, and Cloudflare’s __down endpoint applies per-source-IP rate limiting. The mini-pc-1 Wi-Fi outlier of 246 Mbps is probably a node that was associating directly with the Xfinity AP and showed up at Cloudflare with a different network position.
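The shared-IP part of that hypothesis is cheap to sanity-check – a sketch, using one of the usual what’s-my-IP services (icanhazip.com here) and assuming SSH access to each node by hostname:

```bash
# every node should print the same public IP if they all egress
# through the single NAT'd Comcast address
for host in mini-pc-1 mini-pc-2 mini-pc-3 hp-server-1 hp-server-2; do
  echo -n "$host: "; ssh "$host" curl -s https://icanhazip.com
done
```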
That changed the calculus. The “bad” number is in a direction I don’t actually use heavily. The “good” number is in the direction that matters.
Why Reduced Download Doesn’t Mean Reduced Performance
Two paths matter for what cps.joshuaantony.com and the blog actually do.
Path 1: serving a visitor. A Cloudflare Tunnel doesn’t open ports on your router; it’s an outbound-only QUIC connection from the cluster to Cloudflare’s edge. When a visitor hits the site, the request flows into the cluster (small, ~1 KB). The HTML response flows back out of the cluster through the same tunnel (~10-100 KB). The bandwidth-heavy part of serving a page is the response going out – which uses my upload direction. The one that’s healthy at 80-117 Mbps. That’s why the site feels fast: I’m using the fast direction for the data-heavy work.
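You can see the outbound-only shape from the node running cloudflared – a sketch; cloudflared dials out to Cloudflare’s edge on port 7844 (QUIC by default), and there is no inbound listener for visitor traffic:

```bash
# established outbound QUIC sessions from cloudflared to the edge;
# note the absence of any listening socket for incoming requests
sudo ss -uapn | grep 7844
```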
Path 2: database queries. cps is a DB-backed app. Each page render fires several SQL queries from the cps pod to the cps-postgres pod. These queries never leave the cluster. They go pod-to-pod over the Flannel CNI overlay, which now rides on the wired switch fabric. No upstream involvement at all.
I measured inter-node ping over the new wired path:
mini-pc-1 -> hp-server-1: rtt min/avg/max/mdev = 0.296/0.565/1.079/0.200 ms
hp-server-2 -> hp-server-1: rtt min/avg/max/mdev = 0.377/0.589/0.750/0.116 ms
Sub-millisecond. That’s gigabit-switch wire territory.
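The pod-level path adds Flannel’s encapsulation on top of that wire. A spot check looks like this – hypothetical label and deployment names, and it assumes the cps image ships ping (not all images do):

```bash
# RTT from the cps pod to the cps-postgres pod over the Flannel overlay
PG_IP=$(kubectl get pod -l app=cps-postgres \
  -o jsonpath='{.items[0].status.podIP}')
kubectl exec deploy/cps -- ping -c 10 "$PG_IP"
```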
Compare to wifi: the gateway hop was 4-11 ms with high jitter under contention. Inter-node would have been similar or worse, because every pod-to-pod packet competed for airtime against five chatty cluster clients. Spikes to 50-100 ms when the cluster got busy.
For a five-query page render:
| | Wi-Fi era | Wired era |
|---|---|---|
| Per-query network round trip | ~5-10 ms, variable | ~0.5 ms, consistent |
| 5 queries × RTT | 25-50 ms | 2.5 ms |
| Tail latency under load | spikes to 100+ ms | basically none |
DB-backed page renders are measurably faster on wired than they were on wifi – by 20-50 ms per request, with the tail-latency outliers gone. The “internet got slower” headline is in the one direction my workload barely uses. The two paths I actually depend on – response upload through Cloudflare, and intra-cluster pod-to-pod – are both better than before.
Why “Worse” is Still Better Here
Beyond the per-request math, two reasons the wired setup wins on net:
- Stability. No more random kubelet disconnects. No more “wifi powersave killed my database” type debugging. The radios are no longer load-bearing for cluster control plane traffic.
- Headroom for internal work. NFS reads from the NAS, kubelet ↔ apiserver, kube-proxy, cluster DNS, image pulls between nodes – all of this used to compete for wifi airtime. Now it’s wire-speed and uncontended.
The internet download bottleneck is solvable. The right fix is to stop bridging over wifi at all – either run a Cat6 across the apartment (the original problem we were avoiding), or move the ASUS into AP mode and connect its WAN port to the Xfinity gateway with a cable, eliminating the wireless hop. That is a different weekend’s problem.
What I Would Tell Past Me
- A blinking power LED on a consumer switch is a fault, not a feature. Don’t spend an hour reseating cables.
- Test cheap power bricks with a multimeter before you trust them. Output spec on the label and output voltage at the barrel are two different facts.
- Pre-stage your netplan with an auto-rollback safety net. `netplan try` is fine when you are at the local console; for remote work, a `systemd-run` timer that restores a backup if the expected IP isn’t on the expected interface is the difference between “self-heals” and “drive 30 minutes to plug in a keyboard.”
- Measure before you change. I would not have known the wired config regressed if I hadn’t taken a one-minute Wi-Fi baseline first. The “what changed?” question is much easier when you have numbers from before.
- Measure both directions. A regression in download alone almost made me revert. Testing upload took thirty extra seconds and changed the verdict. If you only measure the direction you suspect, you’ll miss the direction that actually matters for your workload.
- Headline numbers are not workload numbers. Internet speedtests are one signal. For a DB-backed app served through a tunnel, the workload-relevant signal is intra-cluster latency and outbound response throughput, not “how fast can I pull a 50 MB file from a CDN.”
- Powerline adapters look attractive when you don’t want to pull cable. They almost never deliver what they advertise. A wireless bridge with a clean radio path beats a “wired” connection over noisy electrical wiring.