nprofile1qy2hwumn8ghj7un9d3shjtnddaehgu3wwp6kyqpqkr8yrx4h6t40jvsx6lll4wxkelf582t7jxqk3x98rcr23mufyqhqawhaal (nprofile…haal)
They're talking about Llama 3. To fit the 70B model into 24 GB of VRAM, you have to quantize it down to something like IQ2_S, which works out to roughly 2 to 2.5 bits per weight. You can get about 10 tokens per second at that level of quantization.
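As a back-of-the-envelope check that this fits, here's a quick sketch of the weight-storage math. The 2.5 bits-per-weight figure for IQ2_S is an assumption (llama.cpp quant types land around there); real usage also needs room for the KV cache and activations on top of the weights:

```python
def quantized_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight-storage size in GiB.

    Ignores KV cache, activations, and runtime overhead, so treat the
    result as a lower bound on VRAM needed.
    """
    return n_params * bits_per_weight / 8 / 2**30

# Llama 3 70B at ~2.5 bits per weight (assumed rate for IQ2_S-class quants):
size = quantized_size_gib(70e9, 2.5)
print(f"{size:.1f} GiB")  # weights alone; the rest of the 24 GiB goes to KV cache etc.
```

At ~2.5 bpw the weights come out around 20 GiB, which is why 24 GB cards are the usual floor for a 2-bit 70B, while an unquantized FP16 copy of the same model (~130 GiB) wouldn't come close to fitting.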