If you are running a Mac, this is a little easier because of Apple's unified memory architecture: the CPU and GPU share the same pool of RAM, which makes running small and medium-sized models with Ollama both possible and reasonably performant.
These models are hungry for GPU memory and RAM: the total footprint comes from both the model weights and the context size (the KV cache grows with every token in the context window).
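To make that concrete, here is a rough back-of-envelope sketch of the memory math. All the numbers below are assumptions for an 8B-parameter, Llama-style model at 4-bit quantization (32 layers, grouped-query attention with 8 KV heads, head dimension 128, fp16 KV cache); plug in your own model's config to get a feel for what fits in your RAM.

```python
def estimate_memory_gb(
    n_params: float = 8e9,         # total parameters (assumed 8B model)
    bytes_per_param: float = 0.5,  # 4-bit quantization ~ 0.5 bytes/param
    n_layers: int = 32,            # assumed transformer depth
    n_kv_heads: int = 8,           # assumed grouped-query attention
    head_dim: int = 128,           # assumed per-head dimension
    context_len: int = 8192,       # tokens in the context window
    kv_bytes: int = 2,             # fp16 KV cache entries
) -> float:
    """Rough estimate of weights + KV cache memory, in GB."""
    weights = n_params * bytes_per_param
    # KV cache: 2 tensors (K and V) per layer, per token, per KV head
    kv_cache = 2 * n_layers * context_len * n_kv_heads * head_dim * kv_bytes
    return (weights + kv_cache) / 1e9

print(f"~{estimate_memory_gb():.1f} GB")  # ~5.1 GB for this config
```

The takeaway: the weights are a fixed cost, but the KV cache scales linearly with context length, which is why cranking up the context window can blow past your RAM even when the model itself fits comfortably.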
The best video I have seen for getting this to fit in my head is: https://www.youtube.com/watch?v=QfFRNF5AhME