
gpt-oss-120b doesn't fit on a 5090 without offloading or crazy quants -- or did you mean you ran it via openrouter or something?


I'm running the MXFP4 [0] quants at around 10-13 toks/sec. It is actually really good. I'm starting to think it's a problem with Cline, since I just tried it with Qwen3 and the same thing happened. Turns out Cline _hates_ empty files in my projects, although they aren't required for this to happen.

[0] https://huggingface.co/blog/RakshitAralimatti/learn-ai-with-...


Sounds like a crazy quant. IME 2-bit quants are pretty dumb.
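
For anyone wanting to sanity-check the sizing claims in this thread, here is a rough back-of-the-envelope sketch. It is not a measurement. The figures are assumptions: roughly 117B total parameters for gpt-oss-120b, 32 GB of VRAM on a 5090, and about 4.25 bits per weight for MXFP4 (4-bit values plus one shared 8-bit scale per block of 32). It also pretends every weight is stored at the same precision and ignores KV cache and activations, so real footprints will differ.

```python
# Rough weight-memory estimate. All constants are assumptions for illustration.
N_PARAMS = 117e9   # assumed total parameter count for gpt-oss-120b
VRAM_GB = 32.0     # assumed VRAM on an RTX 5090, in decimal GB


def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


def fits(bits_per_weight: float) -> bool:
    """True if the weights alone fit in VRAM (ignores KV cache and activations)."""
    return weight_gb(N_PARAMS, bits_per_weight) <= VRAM_GB


if __name__ == "__main__":
    for label, bits in [("BF16", 16.0), ("MXFP4 (approx.)", 4.25), ("2-bit", 2.0)]:
        size = weight_gb(N_PARAMS, bits)
        verdict = "fits" if fits(bits) else "does not fit"
        print(f"{label:>16}: {size:6.1f} GB of weights, {verdict} in {VRAM_GB:.0f} GB")
```

Under those assumptions the MXFP4 weights come out at roughly twice the card's VRAM, which is consistent with needing offloading, while something near 2 bits per weight would only barely squeeze in before any context is allocated.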



