nprofile1qy2hwumn8ghj7un9d3shjtnddaehgu3wwp6kyqpqkr8yrx4h6t40jvsx6lll4wxkelf582t7jxqk3x98rcr23mufyqhqawhaal (nprofile…haal)
They're talking about Llama 3. To fit the 70B model into 24 GB of VRAM, you have to quantize it down to something like IQ2_S, which works out to roughly 2 to 2.5 bits per weight. You can get about 10 tokens per second at that level of quantization.
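As a back-of-the-envelope check that this fits, here's a quick sketch of the weight-storage math. The 2.5 bits-per-weight figure for IQ2_S is an assumption (llama.cpp quant types land around there); real usage also needs room for the KV cache and activations on top of the weights:

```python
def quantized_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight-storage size in GiB.

    Ignores KV cache, activations, and runtime overhead, so treat the
    result as a lower bound on VRAM needed.
    """
    return n_params * bits_per_weight / 8 / 2**30

# Llama 3 70B at ~2.5 bits per weight (assumed rate for IQ2_S-class quants):
size = quantized_size_gib(70e9, 2.5)
print(f"{size:.1f} GiB")  # weights alone; the rest of the 24 GiB goes to KV cache etc.
```

At ~2.5 bpw the weights come out around 20 GiB, which is why 24 GB cards are the usual floor for a 2-bit 70B, while an unquantized FP16 copy of the same model (~130 GiB) wouldn't come close to fitting.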