gpt-oss-120b doesn't fit on a 5090 without offloading or crazy quants -- or did ...

jszymborski · 2025-12-22T15:04:00 1766415840

I'm running the MXFP4 [0] quants at around 10-13 tok/s. It's actually really good. I'm starting to think it's a problem with Cline, since I just tried it with Qwen3 and the same thing happened. Turns out Cline _hates_ empty files in my projects, although the problem can also happen without them.

[0] https://huggingface.co/blog/RakshitAralimatti/learn-ai-with-...

kube-system · 2025-12-22T07:11:50 1766387510

Sounds like a crazy quant. IME 2-bit quants are pretty dumb.
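
For anyone wanting to sanity-check the "doesn't fit" claim, here is a rough back-of-the-envelope sketch. It assumes about 117B total parameters for gpt-oss-120b, 32 GB of VRAM on a 5090, and treats every weight as MXFP4 (4-bit elements plus one shared 8-bit scale per block of 32, so about 4.25 bits per weight). It ignores the KV cache, activations, and any layers kept at higher precision, so the real footprint would be larger. The function names are made up for illustration.

```python
# Rough VRAM arithmetic; all constants below are assumptions, not measurements.

def weight_gb(params: float, bits_per_weight: float) -> float:
    """Size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def max_bits_per_weight(params: float, vram_gb: float) -> float:
    """Largest average bits per weight that fits in vram_gb, weights only."""
    return vram_gb * 1e9 * 8 / params


PARAMS = 117e9            # assumed total parameter count for gpt-oss-120b
MXFP4_BITS = 4 + 8 / 32   # 4-bit element + shared 8-bit scale per 32 elements
VRAM_GB = 32              # RTX 5090

if __name__ == "__main__":
    print(f"MXFP4 weights: ~{weight_gb(PARAMS, MXFP4_BITS):.1f} GB")
    print(f"Budget to fit in {VRAM_GB} GB: "
          f"~{max_bits_per_weight(PARAMS, VRAM_GB):.2f} bits/weight")
```

Under those assumptions the MXFP4 weights alone come to roughly 62 GB, about twice what the card holds, and squeezing everything into 32 GB would need an average of a bit over 2 bits per weight. That is consistent with both points in the thread: MXFP4 on a 5090 implies offloading, and fitting entirely in VRAM implies something around a 2-bit quant.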