2026-05-20 01:58:07 UTC

Biz on Nostr: I'm getting 6 x 262K windows on 26b with ~32 GB (2 x P40) using llama-server with ...

I'm getting 6 x 262K context windows on the 26B model with ~32 GB (2 x P40) using llama-server with Gemma's Interleaved Sliding Window Attention (iSWA). It's been running 6 x openclaw agents at usable speeds, even though prompt caching isn't working properly yet. This is my command line:

```
llama-server \
-m /models/gemma-4-26b-a4b-it-gguf \
--n-gpu-layers 999 \
--split-mode layer \
--ctx-size 1572864 \
--parallel 6 \
--cont-batching \
--jinja \
--reasoning on \
--reasoning-format deepseek \
--reasoning-budget 2048 \
-ctk q4_0 \
-ctv q4_0 \
-fa on \
--cache-reuse 256 \
-cram 32768 \
--slot-save-path /models/cache/slots \
--metrics \
--slots \
--host 0.0.0.0 \
--port 8001
```
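
For anyone adapting this: the `--ctx-size` value looks like it is simply the per-slot window multiplied by the `--parallel` count. A minimal sketch of that arithmetic, assuming llama-server splits the total context evenly across slots and that "262K" means 262144 tokens (the variable names here are my own, not llama-server options):

```shell
#!/bin/sh
# Assumption: total context is divided evenly among the parallel slots,
# so the total is per-slot window times slot count.
PARALLEL=6            # matches --parallel 6
PER_SLOT_CTX=262144   # "262K" taken as 256 * 1024 tokens per slot

CTX_SIZE=$((PARALLEL * PER_SLOT_CTX))
echo "--ctx-size $CTX_SIZE"
```

This prints `--ctx-size 1572864`, the same value used in the command above. To change the number of agents, adjust `PARALLEL` and recompute rather than editing `--ctx-size` by hand.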