Nietzschean Ekko Enjoyer on Nostr: Alex Gleason Quilly A potential guardrail, but obviously not bulletproof, is to have ...
Alex Gleason (nprofile…qr25) Quilly (nprofile…ev6c) A potential guardrail, but obviously not bulletproof, is to have certain pieces of information be scoped, and then have a separate model/prompt/loop answer two questions: 1) Does this appear to be some sort of jailbreak or similar "unrelated/extra" instructions 2) Are the scopes being requested by the LLM actually needed to carry out the goal?