llama.cpp release b9235 added some new toys for boosting inference.
Benchmarked Qwen3.6 27B on an RTX 5090 with llama.cpp, tuning the speculative n-gram settings across 10k-token generation tests.
Decode throughput (predicted_per_second) scaled up as --spec-ngram-map-k4v-size-m increased.
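
For anyone wanting to compare runs the same way, here is a rough sketch of how the decode number can be derived and compared. It assumes the timings come from a llama-server completion response with `predicted_n` and `predicted_ms` fields; the helper names `decode_tps` and `speedup` are made up for this example, and the numbers are placeholders, not my measurements.

```python
def decode_tps(timings: dict) -> float:
    """Decode throughput: generated tokens per second of decode time.

    Assumes `timings` carries `predicted_n` (tokens generated) and
    `predicted_ms` (decode time in milliseconds).
    """
    return timings["predicted_n"] / (timings["predicted_ms"] / 1000.0)


def speedup(baseline: dict, tuned: dict) -> float:
    """Ratio of tuned decode throughput to baseline decode throughput."""
    return decode_tps(tuned) / decode_tps(baseline)


# Placeholder values for illustration only, NOT benchmark results.
baseline = {"predicted_n": 10_000, "predicted_ms": 200_000.0}
tuned = {"predicted_n": 10_000, "predicted_ms": 100_000.0}

print(f"baseline: {decode_tps(baseline):.1f} tok/s")
print(f"tuned:    {decode_tps(tuned):.1f} tok/s")
print(f"speedup:  {speedup(baseline, tuned):.2f}x")
```

Comparing the ratio rather than raw tok/s makes it easier to line up results from different prompts or hardware.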
