2026-04-13 22:47:13 UTC
MachuPikacchu on Nostr:

Haven’t run Qwen in a minute, but it’s surprising you’re not getting higher throughput for gemma4 on your 3090s.

For what it’s worth, if you use llama.cpp and disable reasoning you should see a faster time to first token, at the cost of a slight degradation in quality. I haven’t used vLLM so I can’t comment there. For comparison, I get 70-75 tok/s on a MacBook M3, and that only has 40 GPU cores.
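Rough sketch of what I mean, if you go the llama.cpp route. The model path here is made up, and whether the reasoning flag exists depends on how recent your build is, so check `llama-server --help` first:

```shell
# Sketch only: the GGUF path is hypothetical, and flag availability
# depends on your llama.cpp build (verify with `llama-server --help`).
# In recent builds, --reasoning-budget 0 disables thinking tokens.
llama-server -m ./qwen3-8b-q4_k_m.gguf -ngl 99 --reasoning-budget 0

# Rough tok/s comparison using the bundled benchmark tool:
llama-bench -m ./qwen3-8b-q4_k_m.gguf -ngl 99
```

`-ngl 99` just offloads all layers to the GPU, which you'd want on a 3090 anyway.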