My experience has been the opposite, and Anthropic's models score best on bullshit ...

2026-03-20 00:59:36 UTC

My experience has been the opposite, and Anthropic's models score best on Bullshit Bench, which is a pretty good proxy for sycophancy. Maybe that shifts if you talk to them in a personal way, which I don't do.

https://petergpt.github.io/bullshit-benchmark/viewer/index.v2.html
