> Codex is more hands-off, I personally prefer that over Claude's more hands-on approach
Agree, and it's a nice reflection of the individual companies' goals. OpenAI is about AGI, and they are under insane pressure from investors to show that that is still the goal. Hence Codex: when it works, they can say "look, it worked for 5 hours!", discarding that 90% of the time the output is pure trash.
Anthropic/Boris, meanwhile, is more about value now, more grounded/realistic, providing a more consistent and hence more trustable/intuitive experience that you can steer (even if Dario says the opposite). The ceiling/best-case scenario of a Claude Code session is maybe a bit lower than Codex's, but with less variance.
Well, if you had tried using GPT/Codex for development, you would know that the output from those 5 hours would not be 90% trash; it would be close to 100% pure magic. I'm not kidding. It's incredible as long as you use a proper analyze-plan-implement-test-document process.
It's definitely cool, and engineering-wise close to SOTA given Lovable and all of the other app generators.
But assuming you are trying to sit in between Lovable and Google, how are you not going to get steamrolled by Google or Perplexity etc. the moment you get solid traction? Like, if your insight for v3 was that the model should make its own tools, so even less is hardcoded, then I just don't see a moat or any vertical direction. What really is the difference?
Thanks, and great question. The custom Phind models are really key here -- off-the-shelf models (even SOTA models from big labs) are slow and error-prone when it comes to generating full websites on-the-fly.
Our long-term vision is to build a fully personalized internet. For Google this is an innovator's dilemma, as Google currently serves as a portal to the current internet.
Because the secret is that the web runs on advertising/targeted recommendations. Brezos(tm) wants you to actively browse Ramazon so he can harvest your data, search patterns, etc. Amazon and most sites like it are deliberately not crawl-friendly for this reason. Why would Brezos let Saltman get all the juicy preference data?
On the foundational level: test-time compute (reasoning), heavy RL post-training, 1M+ context lengths, etc.
On the application layer, connecting LLMs with sandboxes/VMs is one of the biggest shifts (Cloudflare's Code Mode, etc.). Giving an LLM a sandbox unlocks on-the-fly computation, calculations, RPA, anything really.
MCPs, or rather standardized function calling, are another one.
Also, local LLMs are becoming almost viable thanks to better and better distillation, relying on quick web search for facts, etc.
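To make the sandbox/function-calling point concrete, here is a rough sketch (TypeScript/Node, not any particular vendor's API) of what "give the model a code tool" can look like; the tool schema shape and the runInSandbox helper are illustrative, and a real setup would point at an isolated VM/container instead of a local subprocess:

    import { execFile } from "node:child_process";
    import { promisify } from "node:util";

    const exec = promisify(execFile);

    // A tool description in the usual function-calling shape: the model picks the
    // tool and emits JSON arguments, the application decides what actually runs.
    const runPythonTool = {
      name: "run_python",
      description: "Execute a short Python snippet and return stdout",
      parameters: {
        type: "object",
        properties: { code: { type: "string" } },
        required: ["code"],
      },
    };

    // "Sandbox" here is just a subprocess with a timeout; in production this would
    // be an isolated VM/container (Firecracker, a hosted sandbox service, etc.).
    async function runInSandbox(code: string): Promise<string> {
      const { stdout } = await exec("python3", ["-c", code], { timeout: 5_000 });
      return stdout;
    }

    // When the model emits a tool call like { name: "run_python", arguments: {...} },
    // dispatch it and feed the result back into the conversation.
    async function handleToolCall(name: string, args: { code: string }): Promise<string> {
      if (name === runPythonTool.name) return runInSandbox(args.code);
      throw new Error(`unknown tool: ${name}`);
    }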
Their page itself looks like classic v0/AI-generated output: that yellow/orange warning box, plus the general shadows/borders, screams LLM slop. Is it too hard these days to spend 30 minutes thinking about UI/user experience?
I actually like the idea, not sure about monetization.
It also requires access to all the data?? And it's not even open source.
> I actually like the idea, not sure about monetization.
To be fair, we're not sure about monetization either :) We just had a lot of fun building it and have enjoyed seeing what people make with it.
> It also requires access to all the data??
Think of us like Tampermonkey or another userscript manager. The scripts you run go through our script engine, which means that any data/permission your script needs access to, our script engine needs access to. We do try to make the scripting transparent: if you're familiar with the Greasemonkey API, we show you which permissions a given script requests (e.g. this one, https://www.tweeks.io/share/script/d856f07a2cb843c5bfa1b455, requires GM_addStyle).
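For anyone unfamiliar with the Greasemonkey API, a hypothetical script requesting that permission looks roughly like this (the name, match pattern, and selector are made up; the @grant line is what surfaces as the requested permission):

    // ==UserScript==
    // @name   Hide promoted posts
    // @match  https://example.com/*
    // @grant  GM_addStyle
    // ==/UserScript==

    // Type shim for the global the userscript manager injects (TypeScript only).
    declare function GM_addStyle(css: string): void;

    // GM_addStyle injects a stylesheet into the page; because the script asked for
    // it via @grant, the engine can show that permission to the user before running.
    GM_addStyle(".promoted-post { display: none; }");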
I'm working on Flavia, an ultra-low-latency voice AI data analyst that can join your meetings. You can throw in data (CSVs, Postgres DBs, BigQuery, PostHog analytics for now) and then just talk and ask questions. Using Cerebras (2,000 tokens per second) and very low-latency sandboxes spun up on the fly, you get back charts/tables/analysis in under 1 second (excluding the time of the actual SQL query if you are on BigQuery).
She can also join your Google Meet or Teams meetings, share her screen, and then everyone in the meeting can ask questions and see live results. It's currently being used by product managers and executives, mainly for analytics and data science use cases.
We plan to open-source it soon if there is demand. Very fast voice+actions is the future imo
Great feedback, thanks! We have added a synthetic e-commerce dataset as an example when you sign up, so you can test it without your own data first. Will also add a demo video ASAP.
What kind of plan do you have with Cerebras? It seems like something like that would need one of the $1500/month plans at least if there were more than a handful of customers.
They introduced pay-as-you-go recently. The limits on that are similar to the plans (1 million tokens per minute), so if you stack a few keys and do simple load balancing with Redis, you can cover a decent amount of traffic with no upfront cost. Eventually we would have to go enterprise, though, yes!
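A minimal sketch of that kind of key rotation, assuming ioredis and the keys in an env var (all names here are illustrative, not our actual setup):

    import Redis from "ioredis";

    const redis = new Redis();
    // e.g. CEREBRAS_API_KEYS="key1,key2,key3"
    const keys = (process.env.CEREBRAS_API_KEYS ?? "").split(",").filter(Boolean);

    // Atomic round-robin: INCR hands every request a unique counter, so each key
    // sees roughly 1/N of the traffic and therefore stays under its own
    // tokens-per-minute limit longer.
    async function nextApiKey(): Promise<string> {
      const n = await redis.incr("cerebras:key_cursor");
      return keys[n % keys.length];
    }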
OK... when I tried to use pay-as-you-go, it was unusable for me because there were a ton of 429s and 503s. In one test it was just constant 429s or 503s for a few seconds straight.
I am using it for a voice application, though, so retrying causes a delay the user doesn't expect, especially if it stays unavailable for a few seconds.
It is special because of what is being discussed here: it attempted (pretended?) to do so as a non-profit, which arguably gave it early support from people who otherwise may not have provided it. None of the other players you mention did so, which to me makes it an unfair advantage. Or not, given that it seems anything is fair if you can get away with it these days.