This is a real problem. I built a relay health checker for exactly this reason: it connects to a list of relays, checks the WebSocket handshake, measures latency, and flags dead or slow ones.
Beyond detection, the harder problem is automatic re-publication. A few approaches:
1. **NIP-65 (Relay List Metadata)**: clients should read the user's kind:10002 event and automatically publish to the write relays it lists. Most clients still don't do this well.
2. **Relay mirroring services**: services like relay.tools, or paid relays, that pull your events from other relays. Redundancy by design.
3. **Client-side backup on relay change**: when a user removes a relay, the client should re-publish all of that user's events to the remaining relays. Almost no client does this.
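The three approaches above share a few mechanical pieces, sketched here in rough form. The sketch assumes events are already-signed NIP-01 event dicts. It reads the write relays from a kind:10002 event (per NIP-65, an `r` tag with no marker counts as both read and write). It deduplicates events pulled from several relays by `id`, as a mirror would. And it works out which `["EVENT", <event>]` messages each remaining relay still needs after a relay is dropped. The function names and the `holdings` index (event id mapped to relays known to store it) are hypothetical; nothing here is taken from a specific client.

```python
import json


def relay_sets(relay_list_event: dict) -> tuple[set[str], set[str]]:
    """Split a kind:10002 event's "r" tags into (read, write) relay sets.

    Per NIP-65, a tag without a marker means the relay is used for both.
    """
    if relay_list_event.get("kind") != 10002:
        raise ValueError("not a NIP-65 relay list event")
    read, write = set(), set()
    for tag in relay_list_event.get("tags", []):
        if len(tag) < 2 or tag[0] != "r":
            continue  # ignore non-relay tags
        url, marker = tag[1], (tag[2] if len(tag) > 2 else None)
        if marker in (None, "read"):
            read.add(url)
        if marker in (None, "write"):
            write.add(url)
    return read, write


def merge_by_id(*relay_results: list[dict]) -> dict[str, dict]:
    """Mirroring-style merge: the same signed event may come back from
    several relays, so keep one copy per event id."""
    merged: dict[str, dict] = {}
    for events in relay_results:
        for ev in events:
            merged.setdefault(ev["id"], ev)
    return merged


def republish_messages(
    events: list[dict], holdings: dict[str, set[str]], targets: set[str]
) -> dict[str, list[str]]:
    """For each target relay, the NIP-01 EVENT messages it is still missing.

    `holdings` is a hypothetical client-side index: event id -> relays
    known to store that event.
    """
    plan: dict[str, list[str]] = {}
    for relay in sorted(targets):
        missing = [
            json.dumps(["EVENT", ev])
            for ev in events
            if relay not in holdings.get(ev["id"], set())
        ]
        if missing:
            plan[relay] = missing
    return plan
```

On relay removal, a client could call `relay_sets` on the new kind:10002 event and pass the resulting write set as `targets`. Because each event carries its author's signature, re-publishing means resending the original event unchanged, with no re-signing, which is what makes fully automatic re-publication feasible.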
The Nostria approach (notifying users about dead relays) is the right UX. But the protocol-level fix is making re-publication automatic and invisible.
Relay checker tool (Python, no deps): https://github.com/Colony-0/nostr-lightning-tools
