Juraj on Nostr:
Picking a Qwen3.6 model in Ollama? More parameters ≠ better model. Architecture matters.
qwen3.6:27b is a dense model — every token goes through the entire network. Slower, but higher-quality output. qwen3.6:35b-a3b is MoE (mixture of experts): a router picks a few experts, and only ~3B of the 35B parameters are active per token. Much faster, usually lower quality. You still need all 35B loaded into RAM. The benchmarks back this up: Qwen claims the 27B dense beats their own previous flagship 397B-A17B MoE on agentic coding.
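To make the RAM-vs-speed trade-off concrete, here's a back-of-the-envelope sketch in plain Python (numbers illustrative, not benchmarks): total parameters set the weight-memory footprint, active parameters set how much work each token costs.

```python
# Rough sketch: why a 35B-A3B MoE needs all 35B parameters resident in RAM
# but only reads ~3B of them per token, while a dense 27B reads all 27B.
def footprint_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GB at a given bit width."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

for name, total_b, active_b in [
    ("qwen3.6:27b (dense)", 27, 27),    # every parameter touched per token
    ("qwen3.6:35b-a3b (MoE)", 35, 3),   # router activates ~3B per token
]:
    ram = footprint_gb(total_b, 4)      # assuming 4-bit quantized weights
    print(f"{name}: ~{ram:.1f} GB weights in RAM, ~{active_b}B active/token")
```

The MoE ends up needing more memory than the dense model despite reading far fewer weights per token, which is exactly the speed/quality split above.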
Quantization is rounding the weights down to fewer bits (4-bit, 8-bit, …). Smaller file, faster inference, mild quality loss. Running unquantized is usually slower, because inference is memory-bandwidth bound: streaming 16-bit weights through RAM on every token takes longer than streaming 4-bit ones.
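A minimal sketch of what "rounding the weights" means: symmetric round-to-nearest quantization with a single scale per tensor. Real formats (e.g. GGUF's K-quants) use per-block scales and offsets, so treat this as illustrative only.

```python
import numpy as np

def quantize(w: np.ndarray, bits: int):
    qmax = 2 ** (bits - 1) - 1            # e.g. 7 for signed 4-bit
    scale = np.abs(w).max() / qmax        # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(8).astype(np.float32)
q, s = quantize(w, bits=4)
print("original :", np.round(w, 3))
print("roundtrip:", np.round(dequantize(q, s), 3))  # close, not exact
```

The roundtrip values land near the originals but not on them: that gap is the "mild quality loss", and the int8-backed storage is the smaller file.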
MLX is Apple Silicon–native inference, the fastest path on a Mac. In Ollama, though, the MLX builds of Qwen3.6 are text-only for now: vision isn't wired up on Ollama's MLX path yet, and Unsloth's vision-capable builds don't currently work on Ollama either. If you need images, you're stuck waiting or going outside Ollama (LM Studio, mlx-vlm).
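For completeness, a hedged sketch of hitting a local Ollama server over its REST API (default port 11434). The /api/generate endpoint and its base64 "images" field are standard Ollama API; the model tag is from the post, and whether an MLX build accepts images depends on vision support actually landing.

```python
import base64, json, urllib.request

def generate(model: str, prompt: str, image_path: str | None = None) -> str:
    """Send one non-streaming generate request to a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if image_path:  # multimodal models take base64-encoded images
        with open(image_path, "rb") as f:
            payload["images"] = [base64.b64encode(f.read()).decode()]
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(generate("qwen3.6:27b", "Dense vs MoE in one sentence."))
# Passing image_path to a text-only MLX build should fail until vision
# support lands on Ollama's MLX path.
```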
TL;DR: dense is better for quality, MoE for speed, quantization almost always yes, MLX on a Mac — but in Ollama, no images yet.
Pick here:
https://ollama.com/library/qwen3.6/tags

Published at 2026-04-23 07:42:22 UTC

Event JSON
{
  "id": "504bcc8cb1c4ae4d7ad6580e977339a9904318636852be9f343e02df1a55bae6",
  "pubkey": "dab6c6065c439b9bafb0b0f1ff5a0c68273bce5c1959a4158ad6a70851f507b6",
  "created_at": 1776930142,
  "kind": 1,
  "tags": [
    ["r", "wss://nostr.cypherpunk.today/"],
    ["r", "wss://nos.lol/"],
    ["r", "wss://relay.primal.net/"],
    ["r", "wss://relay.damus.io/"],
    ["client", "Primal Web"]
  ],
  "content": "Picking a Qwen3.6 model in Ollama? More parameters ≠ better model. Architecture matters.\nqwen3.6:27b is a dense model — every token goes through the entire network. Slower, but higher-quality output. qwen3.6:35b-a3b is MoE (mixture of experts): a router picks a few experts, and only ~3B of the 35B parameters are active per token. Much faster, usually lower quality. You still need all 35B loaded into RAM. The benchmarks back this up: Qwen claims the 27B dense beats their own previous flagship 397B-A17B MoE on agentic coding.\nQuantization is rounding the weights (4-bit, 8-bit, …). Smaller file, faster inference, mild quality loss. Running unquantized is usually slower because of memory-bandwidth overhead.\nMLX is Apple Silicon–native inference, the fastest path on a Mac. In Ollama, though, the MLX builds of Qwen3.6 are text-only for now — vision isn't wired up on Ollama's MLX path yet, and Unsloth's vision-capable builds don't currently work on Ollama either. If you need images, you're stuck waiting, or going outside Ollama (LM Studio, mlx-vlm).\nTL;DR: dense is better for quality, MoE for speed, quantization almost always yes, MLX on a Mac — but in Ollama, no images yet.\n\nPick here: https://ollama.com/library/qwen3.6/tags",
  "sig": "90f230cd07e19cf5d57828433c19fb8f1030973dbeb5c3c31aad563547acdda532923e3748caaf7390cdf64666f74b4980d92721ddbf1a9474d70a430841756f"
}