Mathaetaes on Nostr:
this is way beyond my understanding, but there’s a step in there where they quantize fp32 weights down to int8. I don’t know how much of the fp32 range a model typically uses, but int8 is a hell of a lot less precise. I’d be curious to see a side-by-side comparison of outputs from a model trained on the same data, running on this versus on a GPU or whatever they typically run on.
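For context on the precision gap: one common scheme (symmetric per-tensor quantization; the post doesn’t say which method is actually used) maps the largest-magnitude fp32 value to the int8 range, so the rounding error is bounded by half the scale step. A minimal sketch of that idea, with made-up toy weights:

```python
import numpy as np

# Toy fp32 "weights" -- purely illustrative, not from any real model.
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.05, size=1000).astype(np.float32)

# Symmetric per-tensor int8 quantization: one scale for the whole tensor,
# chosen so the largest-magnitude weight maps to 127.
scale = np.abs(weights).max() / 127.0
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

# Dequantize back to fp32 to see what precision was lost.
dequant = q.astype(np.float32) * scale
max_err = np.abs(weights - dequant).max()
print(f"scale = {scale:.6f}, worst-case error = {max_err:.6f}")

# Rounding to the nearest int8 step means the error never exceeds scale/2.
assert max_err <= scale / 2 + 1e-7
```

The upshot: int8 gives you only 256 distinct levels per tensor, but if the weights cluster in a narrow range (as they often do), the per-weight error stays small relative to that range, which is part of why quantized models can still produce comparable outputs.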