2026-05-21 05:32:32 UTC

captjack 🏴‍☠️✨💜 on Nostr:

llama.cpp release b9235 added some new toys for boosting inference.

Benchmarked Qwen3.6 27B on an RTX 5090 with llama.cpp, tuning speculative n-gram settings across tests of 10k generated tokens.

Increasing --spec-ngram-map-k4v-size-m scaled decode throughput (predicted_per_second)
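
For anyone wanting to compare runs the same way, here is a rough sketch of how decode throughput can be recomputed and compared between two runs. It assumes the numbers come from a `timings` object with `predicted_n` (generated tokens) and `predicted_ms` (decode time in milliseconds) fields, as llama-server reports alongside `predicted_per_second`. The helper names and the sample values are made up for illustration and are not results from this benchmark.

```python
from typing import Mapping


def decode_tokens_per_second(timings: Mapping[str, float]) -> float:
    """Recompute decode throughput (tokens/s) from token count and elapsed ms.

    Assumes `timings` carries `predicted_n` and `predicted_ms`; these field
    names are an assumption about where the benchmark numbers come from.
    """
    predicted_n = timings["predicted_n"]
    predicted_ms = timings["predicted_ms"]
    if predicted_ms <= 0:
        raise ValueError("predicted_ms must be positive")
    return predicted_n / (predicted_ms / 1000.0)


def speedup(baseline: Mapping[str, float], tuned: Mapping[str, float]) -> float:
    """Ratio of tuned decode throughput to baseline decode throughput."""
    return decode_tokens_per_second(tuned) / decode_tokens_per_second(baseline)


if __name__ == "__main__":
    # Placeholder timings, NOT measurements: two hypothetical 10k-token runs.
    baseline_run = {"predicted_n": 10_000, "predicted_ms": 200_000.0}
    tuned_run = {"predicted_n": 10_000, "predicted_ms": 100_000.0}
    print(f"baseline: {decode_tokens_per_second(baseline_run):.1f} tok/s")
    print(f"tuned:    {decode_tokens_per_second(tuned_run):.1f} tok/s")
    print(f"speedup:  {speedup(baseline_run, tuned_run):.2f}x")
```

Comparing ratios rather than raw tokens per second makes it easier to see how much each setting change buys on the same hardware and prompt.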