Join Nostr
2026-04-01 08:28:29 UTC
in reply to

jonny (good kind) on Nostr: i love this. there's a mechanism to slip secret messages to the LLM that it is told ...

i love this. there's a mechanism to slip secret messages to the LLM that it is told to interpret as system messages. there is no validation around these of any kind on the client, and there doesn't seem to be any differentiation about location or where these things happen, so that seems like a nice prompt injection vector. this is how claude code reminds the LLM to not do a malware, and it's applied by just string concatenation. i can't find any place that gets stripped aside from when displaying output. it actually looks like all the system reminders get catted together before being send to the API. neat!