For instance, the interface of a queue is front()/back()/push()/pop(), but there's no obvious way to make that truly atomic without locking: reading front() and then calling pop() is two separate steps, and another thread can get in between them. The interface also requires non-const overloads of front() and back(), which means I have to hand back a mutable reference to something that logically must not be modified, or the invariants of the lock-free code break.
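To make that concrete, here is a rough sketch of the kind of interface that sidesteps both problems. It assumes the simplest lock-free case, a fixed-size ring buffer with exactly one producer thread and one consumer thread. The names SpscQueue, try_push and try_pop are made up for illustration and don't come from any particular library. The point is that elements only ever leave the queue by value, so no reference into the internal storage escapes, and "look at the front and remove it" is a single operation.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>
#include <utility>

// Hypothetical single-producer/single-consumer ring buffer (C++17).
// Assumes T is default-constructible and movable.
template <typename T, std::size_t Capacity>
class SpscQueue {
public:
    // Call from the producer thread only. Returns false if the queue is full.
    bool try_push(T value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % Capacity;
        if (next == head_.load(std::memory_order_acquire)) {
            return false;  // full: one slot is always left unused
        }
        slots_[tail] = std::move(value);
        // Publish the slot to the consumer.
        tail_.store(next, std::memory_order_release);
        return true;
    }

    // Call from the consumer thread only. Combines front() and pop():
    // the element is moved out and returned by value, never by reference.
    std::optional<T> try_pop() {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) {
            return std::nullopt;  // empty
        }
        std::optional<T> out(std::move(slots_[head]));
        // Hand the slot back to the producer.
        head_.store((head + 1) % Capacity, std::memory_order_release);
        return out;
    }

private:
    std::array<T, Capacity> slots_{};
    std::atomic<std::size_t> head_{0};  // next slot to read
    std::atomic<std::size_t> tail_{0};  // next slot to write
};
```

With this layout a queue declared as SpscQueue<int, 4> holds at most three elements, since one slot stays empty to distinguish full from empty. A multi-producer or multi-consumer queue needs considerably more machinery than this, but the shape of the interface (try_push/try_pop, values in and out) would stay the same.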
I think most of the performance concerns are overblown if you're already skimming the generated code, which is normal practice for high-performance work.