

Giovanni Crisalfi

npub1d6…l3fm6

2026-03-29 17:37:43 UTC

in reply to nevent1q…p2k0

nprofile1qy2hwumn8ghj7un9d3shjtnyd968gmewwp6kyqpqh659u7uz26ggxc9t534espqa3scnduw7fqhzcw86z74ejd0xsdpsxfdfxr Yep, bandwidth-bound workloads benefit most (as I found while optimizing this: https://github.com/gicrisf/qwen-asr-rs/tree/bf16-gemm).

In this case, most of the gain comes from the weight matrices: the bf16 GEMM avoids allocating temporary f32 buffers and halves input/output memory traffic.
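To make the point concrete, here is a minimal sketch of the idea, not the actual code from the linked branch. It assumes bf16 values are stored as raw `u16` bits and that accumulation happens in f32; the function names (`bf16_to_f32`, `f32_to_bf16`, `matvec_bf16`) are hypothetical and chosen only for illustration. Each element is widened in registers as it is read, so no temporary f32 copy of the weights or inputs is ever allocated, and every value moved through memory is 2 bytes instead of 4.

```rust
/// Widen one bf16 value (raw bits in a u16) to f32.
/// bf16 is the upper 16 bits of an IEEE-754 f32, so widening is a shift.
#[inline]
fn bf16_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

/// Narrow an f32 to bf16 using round-to-nearest-even.
#[inline]
fn f32_to_bf16(x: f32) -> u16 {
    let bits = x.to_bits();
    if x.is_nan() {
        // Keep it a NaN: force a mantissa bit that survives truncation.
        return ((bits >> 16) as u16) | 0x0040;
    }
    // Add 0x7FFF plus the lowest kept bit, then drop the low 16 bits.
    let rounding_bias = 0x7FFF + ((bits >> 16) & 1);
    (bits.wrapping_add(rounding_bias) >> 16) as u16
}

/// Hypothetical matrix-vector product: `w` is a row-major `rows x cols`
/// bf16 matrix, `x` is a bf16 vector of length `cols`, output is bf16.
/// Values are widened one at a time while streaming through memory,
/// so there is no temporary f32 buffer for `w` or `x`.
fn matvec_bf16(w: &[u16], x: &[u16], rows: usize, cols: usize) -> Vec<u16> {
    assert_eq!(w.len(), rows * cols);
    assert_eq!(x.len(), cols);
    if cols == 0 {
        // Empty dot products are zero; 0u16 is bf16 +0.0.
        return vec![0; rows];
    }
    w.chunks_exact(cols)
        .map(|row| {
            // Accumulate in f32 for accuracy, store the result as bf16.
            let acc: f32 = row
                .iter()
                .zip(x)
                .map(|(&a, &b)| bf16_to_f32(a) * bf16_to_f32(b))
                .sum();
            f32_to_bf16(acc)
        })
        .collect()
}
```

Calling `matvec_bf16` with a 2x2 weight matrix and a length-2 input reads 12 bytes in total, where an f32 version would read 24. For a bandwidth-bound kernel that ratio is roughly what bounds the speedup; a real GEMM would additionally tile and vectorize the loop.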

Author Public Key

npub1d6jul9s08n9ucxpn9kj04gcvkcxuqdtp6cm3sm82f2rr8wh2qg2qnl3fm6

Seen on

wss://relay.ditto.pub
