2026-03-20 00:59:36 UTC

Zsubmariner on Nostr:

My experience has been the opposite, and Anthropic's models score best on bullshit bench, which is a pretty good proxy for sycophancy. Maybe that shifts if you talk to the model in a personal way, which I don't do.

https://petergpt.github.io/bullshit-benchmark/viewer/index.v2.html