Integration or e2e tests, however, become much easier for humans to reason about and review effectively.
I think this is especially true for anything whose output can be expressed as a "golden", such as a compiler. It's really easy for a human to review whether goldens are reasonable.
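To make "golden" concrete, here is a rough sketch in Python of what such a test might look like. Everything in it is made up for illustration: `compile_expr` is a hypothetical toy "compiler" that emits stack-machine text, and `check_golden` is a hypothetical helper, not an API from any real test framework.

```python
import difflib
import tempfile
from pathlib import Path


def compile_expr(source: str) -> str:
    """Hypothetical toy 'compiler': turns '1 + 2 - 3' into stack-machine text."""
    tokens = source.split()
    out = [f"PUSH {tokens[0]}"]
    # Remaining tokens come in (operator, operand) pairs, handled left to right.
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        out.append(f"PUSH {operand}")
        out.append({"+": "ADD", "-": "SUB"}[op])
    return "\n".join(out) + "\n"


def check_golden(actual: str, golden_path: Path, update: bool = False) -> str:
    """Return "" if actual matches the golden file, else a unified diff.

    With update=True the golden file is rewritten instead, so the change
    shows up as an ordinary file diff in code review.
    """
    if update:
        golden_path.write_text(actual)
        return ""
    expected = golden_path.read_text()
    diff = difflib.unified_diff(
        expected.splitlines(keepends=True),
        actual.splitlines(keepends=True),
        fromfile="golden",
        tofile="actual",
    )
    return "".join(diff)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        golden = Path(tmp) / "add_sub.golden"
        # First run: record the golden. A reviewer reads this file directly.
        check_golden(compile_expr("1 + 2 - 3"), golden, update=True)
        print(golden.read_text(), end="")
        # Later runs: any behaviour change surfaces as a readable diff.
        print(check_golden(compile_expr("1 + 2 + 3"), golden), end="")
```

The point is that the reviewer never has to read assertion logic: they look at the golden file (or the diff against it) and judge whether the output is reasonable.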