It feels like an LLM might be the way around this, rather than trying to second-guess every possible question, which is a near-impossible task. Having Speech-to-Phrase hand off to an LLM should give the best of both worlds.
…and yeah, I know, there are a multitude of ways LLMs suck, but they do a pretty decent job in this use case, so once they can run locally without requiring massive resources, we should have a good solution.
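To make the hand-off idea concrete, here's a minimal sketch of that fallback pattern. Everything in it is an assumption for illustration: `match_phrase()` is a stand-in for Speech-to-Phrase's fixed-grammar matcher (not its real API), and the endpoint URL assumes a local model served over an OpenAI-compatible API (as llama.cpp or Ollama can do). The shape of the logic is the point: deterministic matching handles the common commands fast, and only misses get handed to the LLM.

```python
# Sketch of a phrase-matcher-first, LLM-fallback handler.
# match_phrase() and LLM_URL are hypothetical, not real project APIs.
import requests

LLM_URL = "http://localhost:11434/v1/chat/completions"  # assumed local endpoint

def match_phrase(transcript: str) -> str | None:
    """Stand-in for a fixed-command matcher like Speech-to-Phrase.
    Returns an intent name on a hit, None on a miss."""
    known = {"turn on the kitchen light": "light.kitchen_on"}
    return known.get(transcript.lower().strip())

def handle(transcript: str) -> str:
    # Fast path: the deterministic matcher covers the common commands.
    intent = match_phrase(transcript)
    if intent is not None:
        return f"intent: {intent}"
    # Fallback: anything the fixed grammar can't cover goes to the local LLM.
    resp = requests.post(LLM_URL, json={
        "model": "local-model",
        "messages": [{"role": "user", "content": transcript}],
    }, timeout=30)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(handle("turn on the kitchen light"))            # deterministic hit
    print(handle("what's a good soup for a cold night?"))  # LLM fallback
```

The nice property of this arrangement is that the LLM's flaws only matter on the long tail of open-ended questions; the predictable commands never touch it.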