nprofile1qy2hwumn8ghj7un9d3shjtnyd968gmewwp6kyqpqsld6cd53kj8pkz9svnnt54um57nvwrlx9sgya7xhvhl5a5a4f3sqdxhcxx nprofile1qy2hwumn8ghj7un9d3shjtnyd968gmewwp6kyqpq0cq07ulfyc7y2l8rczk9s36g8j65tq3m6xk9us8hr3ua4ktfmaqq05h6ty nprofile1qy2hwumn8ghj7un9d3shjtnyd968gmewwp6kyqpqpu3ryltaaqjym9gj5fpcv623edtfh2vuvrtcd8pz75p6nlyt0nrqwf22w0 You say "efficiently", but the extra instructions are small and sent down a dedicated path. One of the problems with the implicit mechanism is that you pay for it all the time, even in straight-line code. It is also tricky for the compiler to control in the cases where you DO need to sync.
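To make the cost argument concrete, here is a toy model (not any real ISA): a compiler pass that inserts an explicit wait only before the first instruction that touches a register still pending from a long-latency op. `Instr`, `long_latency`, the `wait` pseudo-op and `insert_explicit_waits` are all hypothetical names for illustration. An implicit mechanism would instead check every instruction in hardware, including the ones that never need it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instr:
    op: str
    dst: Optional[str]
    srcs: Tuple[str, ...]
    long_latency: bool = False  # toy assumption: e.g. a memory load


def insert_explicit_waits(program: List[Instr]) -> List[Instr]:
    """Hypothetical pass: emit a 'wait' only where a pending result is needed."""
    pending = set()  # registers whose long-latency writes may not be done yet
    out: List[Instr] = []
    for ins in program:
        needed = pending.intersection(ins.srcs)  # read-after-write hazards
        if ins.dst in pending:                   # write-after-write hazard
            needed.add(ins.dst)
        if needed:
            out.append(Instr("wait", None, tuple(sorted(needed))))
            pending -= needed
        out.append(ins)
        if ins.long_latency and ins.dst is not None:
            pending.add(ins.dst)
    return out


prog = [
    Instr("load", "r1", ("a0",), long_latency=True),
    Instr("add", "r2", ("r3", "r4")),
    Instr("mul", "r5", ("r2", "r2")),
    Instr("add", "r6", ("r1", "r5")),  # first consumer of r1
]
print([i.op for i in insert_explicit_waits(prog)])
```

In this sketch the four-instruction program gets a single `wait` in front of the last `add`, and pure ALU straight-line code gets none, whereas a per-instruction hardware check would be paid four times either way.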
Note that there are at least three different ways to implement the implicit mechanism, and each has its own pros and cons.
FWIW, Intel eventually removed their version and went fully explicit.