A Linux Server Mystery
Ever since I moved in late 2023, my home server has had intermittent periods of being unavailable. Because the problem started around the time we moved, I spent a long time attributing it to a network error and blaming my router. I was also having issues with my network bridge (WiFi + Ethernet) at the same time, which made that explanation seem plausible: the bridge would lose its assigned IP range whenever hostapd issued a DFS scan.
As it turns out, the two problems were entirely unrelated. Even after I had fixed the network issue, my server kept going offline, and each time the only way to recover it was a hard reboot. So I decided to buy a small 7" screen that I could hook up to it and mount on the wall.
The next time the server went offline after I had installed the screen, I could see the error immediately.
Hardware failures in my SSD while writing to /var/ caused my root filesystem to lock up and switch to read-only mode. This was easy to identify as soon as I hooked up a monitor to the server, but it had felt impossible to root-cause before that, since I had no serial or SSH access.
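Without a screen, the only visible symptom was a server that stopped responding. A small check for a read-only root filesystem could help narrow this down. The sketch below is hypothetical and not part of the original setup. It assumes a Linux host where /proc/mounts lists one mount per line as `device mountpoint fstype options dump pass`, and the function names are made up for illustration. On its own it only prints a status; turning that into an alert would need some notification mechanism, which is left out here.

```python
import sys
from pathlib import Path
from typing import List, Optional


def root_mount_options(mounts_text: str) -> Optional[List[str]]:
    """Return the mount options for '/' from /proc/mounts-style text, or None."""
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mountpoint, _fstype, opts = fields[:4]
        if mountpoint == "/":
            # Keep the last match: later entries are stacked on top of earlier
            # ones (e.g. the real root filesystem over an initial "rootfs").
            options = opts.split(",")
    return options


def root_is_read_only(mounts_text: str) -> bool:
    """True if '/' is listed with the exact option 'ro' (not e.g. 'errors=remount-ro')."""
    options = root_mount_options(mounts_text)
    return options is not None and "ro" in options


def main() -> None:
    mounts = Path("/proc/mounts")  # assumption: Linux procfs is available
    if not mounts.exists():
        print("/proc/mounts not found; is this Linux?", file=sys.stderr)
        return
    if root_is_read_only(mounts.read_text()):
        print("WARNING: / is mounted read-only", file=sys.stderr)
    else:
        print("/ is mounted read-write")


if __name__ == "__main__":
    main()
```

The options are compared as whole comma-separated items because a healthy ext4 root often carries `errors=remount-ro`, which a plain substring search for "ro" would misreport as read-only.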
As a result, I bought a new SSD (not as cheap this time) and spent hours dismantling and reassembling my server: the M.2 slot for the SSD was on the bottom of the motherboard, so I had to pull everything out before I could reach it. This time, I also opted to use ZFS for my root partition as an extra failsafe.