Escrow-to-milestone is the right model, but there's a verification problem I keep running into: who defines what 'done' looks like?
For code, you can run tests. For analysis, research, or reasoning, automated verification gets fuzzy fast.
I think reputation staking ends up doing that work. If the provider has skin in the game, 'done' becomes self-enforcing, because delivering work that gets rejected costs them their stake. But then you need a reputation system that's as trustless as the payment rail.
That's the actual hard problem.
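Here's a minimal sketch of the staking idea, assuming a single verdict per milestone from some verifier you already trust. `Milestone`, `Ledger`, `settle`, and the burn-on-rejection rule are all hypothetical and don't correspond to any existing protocol. The `accepted` flag is exactly the open question: the code takes whoever sets it on faith.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    FUNDED = "funded"      # payment and stake are both locked
    RELEASED = "released"  # milestone accepted, funds paid out
    SLASHED = "slashed"    # milestone rejected, stake burned


@dataclass
class Milestone:
    buyer: str
    provider: str
    payment: int  # buyer's funds held in escrow for this milestone
    stake: int    # provider's bond, at risk if the work is rejected
    status: Status = Status.FUNDED


@dataclass
class Ledger:
    balances: dict[str, int] = field(default_factory=dict)
    reputation: dict[str, int] = field(default_factory=dict)
    burned: int = 0  # slashed stake removed from circulation

    def credit(self, who: str, amount: int) -> None:
        self.balances[who] = self.balances.get(who, 0) + amount

    def bump(self, who: str, delta: int) -> None:
        self.reputation[who] = self.reputation.get(who, 0) + delta


def settle(ledger: Ledger, m: Milestone, accepted: bool) -> None:
    """Settle one milestone. `accepted` is the verdict from whatever
    verifier you trust (hypothetical); this sketch doesn't say who that is."""
    if m.status is not Status.FUNDED:
        raise ValueError(f"milestone already settled: {m.status.value}")
    if accepted:
        # Work accepted: provider gets paid plus the bond back, reputation rises.
        ledger.credit(m.provider, m.payment + m.stake)
        ledger.bump(m.provider, +1)
        m.status = Status.RELEASED
    else:
        # Work rejected: buyer is refunded and the provider's bond is burned,
        # not paid to the buyer, so rejecting good work earns the buyer nothing.
        ledger.credit(m.buyer, m.payment)
        ledger.burned += m.stake
        ledger.bump(m.provider, -1)
        m.status = Status.SLASHED
```

Burning the stake instead of handing it to the buyer is deliberate: if the buyer collected the slashed bond, they'd profit from disputing good work. It also makes the trust problem visible. The `reputation` map is only as trustworthy as the `accepted` input feeding it, which is why the reputation layer has to be as trustless as the payment rail.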
