
> Together they debate, challenge each other, and synthesize the best solution

Do they? How much better are multiple agents on your evals, and what sort of evals are you running? I've also seen research suggesting that adding more agents degrades the output after a point.



Haven’t done evals yet, but I've measured it in a few real-world situations where projects got stuck and brainstorm mode solved it. Running evals is definitely worth doing, and contributions are welcome.
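
For what it's worth, a minimal A/B harness for this is not much code. Here is a sketch, assuming a list of tasks with pass/fail checkers; run_single_agent and run_brainstorm are hypothetical stubs standing in for the two modes, not the project's actual API:

    # Sketch: compare single-agent vs. brainstorm mode on the same tasks.
    # run_single_agent / run_brainstorm are placeholders for the two modes.
    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Task:
        prompt: str
        check: Callable[[str], bool]  # True if the output solves the task

    def run_single_agent(prompt: str) -> str:
        raise NotImplementedError  # wire up a single-agent call here

    def run_brainstorm(prompt: str) -> str:
        raise NotImplementedError  # wire up the debate/synthesize mode here

    def pass_rate(run: Callable[[str], str], tasks: list[Task]) -> float:
        return sum(t.check(run(t.prompt)) for t in tasks) / len(tasks)

    def compare(tasks: list[Task]) -> None:
        single = pass_rate(run_single_agent, tasks)
        multi = pass_rate(run_brainstorm, tasks)
        print(f"single: {single:.0%}  brainstorm: {multi:.0%}  delta: {multi - single:+.0%}")

Even a couple dozen stuck-project tasks run through both modes would turn the anecdote into a number.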

I think what really degrades the output is context length pushing up against the context window limits; check out NoLiMa.
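
A cheap first step is just logging how full the window is on each call, so degradation can be correlated with fill level. A sketch, assuming tiktoken for counting; the window size is illustrative:

    # Sketch: log context-window utilization per call. Assumes tiktoken is
    # installed; CONTEXT_WINDOW is illustrative, set it to your model's window.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    CONTEXT_WINDOW = 128_000  # tokens

    def context_utilization(messages: list[str]) -> float:
        used = sum(len(enc.encode(m)) for m in messages)
        return used / CONTEXT_WINDOW

    history = ["system: ...", "user: ...", "assistant: ..."]  # your transcript
    if context_utilization(history) > 0.5:
        print("warning: over half the window used; expect long-context degradation")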


https://www.arxiv.org/abs/2512.08296

> coordination yields diminishing or negative returns once single-agent baselines exceed ~45%

This is going to be the big thing to overcome, and without actually measuring it, all we're doing is AI astrology.
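
The finding also suggests an obvious runtime gate: measure the single-agent baseline on a held-out slice first, and only turn coordination on below the threshold. A sketch; the constant comes from the quoted ~45% figure, and the names are placeholders, not the paper's code:

    # Sketch: gate multi-agent coordination on a measured single-agent baseline,
    # using the ~45% threshold quoted above. Names are illustrative.
    BASELINE_THRESHOLD = 0.45

    def choose_mode(baseline_pass_rate: float) -> str:
        if baseline_pass_rate < BASELINE_THRESHOLD:
            return "multi-agent"   # coordination still has headroom here
        return "single-agent"      # coordination tends to add noise above this

    assert choose_mode(0.30) == "multi-agent"
    assert choose_mode(0.60) == "single-agent"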


This is why context optimization is going to be critical, and thank you for sharing this paper; it validates what we're trying to do. If we can keep the baseline below 40% through context optimization, then coordination might actually work well and help scale agentic systems.

I agree on measuring, and it's planned, especially once we integrate context optimization. I think the value of context optimization will go beyond avoiding compaction and reducing cost: it should also give us more reliable agents.
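
To make "context optimization" concrete, the simplest version is a token budget that keeps the system prompt and the most recent turns and summarizes the middle. A sketch with a stubbed summarizer; none of this is the project's actual implementation:

    # Sketch: keep context under a token budget by retaining head (system prompt)
    # and tail (recent turns), replacing the middle with a summary. All stubs.
    def count_tokens(text: str) -> int:
        return len(text) // 4  # crude ~4 chars/token estimate; swap in a tokenizer

    def summarize(messages: list[str]) -> str:
        # placeholder: in practice an LLM call that compresses these turns
        return f"[summary of {len(messages)} earlier messages]"

    def compact(messages: list[str], budget: int) -> list[str]:
        if len(messages) <= 5 or sum(count_tokens(m) for m in messages) <= budget:
            return messages
        head, middle, tail = messages[:1], messages[1:-4], messages[-4:]
        return head + [summarize(middle)] + tail

Whether this actually improves reliability, and not just cost, is exactly the kind of thing the evals above would show.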



