whisper.cpp with a small or medium English model may be a better fit for you than passing your audio through a general-purpose LLM.
https://github.com/ggml-org/whisper.cpp
Once you have the transcript, an LLM can remove your filler words and produce a cleaned-up version.
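Roughly, the two steps could look like this in Python. Treat it as a sketch, not a recipe: the binary and model paths are assumptions about where you built whisper.cpp and put the model, and strip_fillers is a crude regex stand-in I made up for the LLM cleanup pass, not an actual LLM call.

```python
import re
import subprocess
from pathlib import Path

# Assumed locations: adjust to wherever you built whisper.cpp and saved the model.
WHISPER_CLI = "./build/bin/whisper-cli"
MODEL = "models/ggml-small.en.bin"


def whisper_command(wav_path, out_base):
    """Build the whisper-cli call: no timestamps, plain-text output to <out_base>.txt."""
    return [WHISPER_CLI, "-m", MODEL, "-f", str(wav_path),
            "-nt", "-otxt", "-of", str(out_base)]


def transcribe(wav_path):
    """Run whisper-cli and return the transcript text.

    Depending on your build you may need to convert the audio to 16 kHz WAV first.
    """
    out_base = Path(wav_path).with_suffix("")
    subprocess.run(whisper_command(wav_path, out_base), check=True)
    return Path(f"{out_base}.txt").read_text(encoding="utf-8")


# Hypothetical stand-in for the LLM pass: only drops a few obvious fillers.
FILLERS = re.compile(r"\b(?:um+|uh+|you know)\b,?\s*", re.IGNORECASE)


def strip_fillers(text):
    """Remove the fillers above and collapse any leftover double spaces."""
    return re.sub(r"\s{2,}", " ", FILLERS.sub("", text)).strip()
```

You would call strip_fillers(transcribe("talk.wav")) and then, if you want more than filler removal, hand the result to whatever LLM you prefer for the final tidy-up.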
My note was mostly about the voice of a writer: when you write, your voice is captured in the words, grammar, and pacing. LLMs produce a voice that is very recognizable. You can tell when a piece of writing has been prompted into existence and published without revision.
