<oembed><type>rich</type><version>1.0</version><title>deadmanoz wrote</title><author_name>deadmanoz (npub12m…6pvdp)</author_name><author_url>https://yabu.me/npub12msugd9s2xz5p6flnm9ygsx2zcgnnr29tw0ezq53z3p06xag70vqv6pvdp</author_url><provider_name>njump</provider_name><provider_url>https://yabu.me</provider_url><html>“The 1.8-bit (UD-TQ1_0) quant will run on a single 24GB GPU if you offload all MoE layers to system RAM (or a fast SSD). With ~256GB RAM, expect ~10 tokens/s. The full Kimi K2.5 model is 630GB and typically requires at least 4× H200 GPUs.&#xA;&#xA;If the model fits, you will get &gt;40 tokens/s when using a B200.”&#xA;&#xA;https://unsloth.ai/docs/models/kimi-k2.5</html></oembed>