llama.cpp release b9235 added some new toys for boosting inference. Benchmarked ...

2026-05-21 05:32:32 UTC

llama.cpp release b9235 added some new toys for boosting inference.

Benchmarked Qwen3.6 27B on an RTX 5090 with llama.cpp, tuning speculative n-gram decoding across tests of 10k generated tokens.

Increasing --spec-ngram-map-k4v-size-m scaled decode throughput (predicted_per_second).
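
For anyone wanting to compare settings the same way, here is a minimal sketch of how decode throughput could be pulled out of per-run timing data. It assumes each run yields a `timings` object shaped like the one llama.cpp's server returns (with `predicted_n`, `predicted_ms`, and `predicted_per_second`). The helper names `decode_tokens_per_second` and `summarize`, the setting labels, and the sample numbers are hypothetical placeholders, not values from the benchmark above.

```python
from statistics import mean


def decode_tokens_per_second(timings: dict) -> float:
    """Return decode throughput (tokens/s) from one run's timings dict."""
    # Use the reported figure when present; otherwise derive it from the
    # number of generated tokens and the elapsed decode time in milliseconds.
    if "predicted_per_second" in timings:
        return float(timings["predicted_per_second"])
    return timings["predicted_n"] / (timings["predicted_ms"] / 1000.0)


def summarize(runs: dict) -> dict:
    """Map each setting label to its mean decode tokens/s across runs."""
    return {
        label: mean(decode_tokens_per_second(t) for t in timings_list)
        for label, timings_list in runs.items()
    }


if __name__ == "__main__":
    # Placeholder timings only (assumption): NOT measured results.
    example_runs = {
        "setting-a": [{"predicted_n": 1000, "predicted_ms": 20000.0}],
        "setting-b": [{"predicted_n": 1000, "predicted_ms": 10000.0}],
    }
    for label, tps in summarize(example_runs).items():
        print(f"{label}: {tps:.1f} tok/s")
```

Averaging over several runs per setting keeps a single noisy run from dominating the comparison.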

Author Public Key

npub1te0uzs6vj29umjaxlqqct82j8q6ppyefrxq06dhr8d6pvwfatgkqjmjgwp


captjack 🏴‍☠️✨💜 on Nostr