Using transformers is not mutually exclusive with keeping other tools up your sleeve.
What about the DINOv2 and DINOv3 (1B and 7B) vision transformer models? This paper [1] suggests significant improvements over traditional YOLO-based object detection.
Indeed, there are even multiple attempts to use both self-attention and convolutions in novel architectures, and there is evidence this works very well and may have significant advantages over pure vision transformer models [1-2].
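For anyone who hasn't looked at these hybrids: the core idea is usually just to interleave convolutional (local) mixing with self-attention (global) mixing. Here's a toy PyTorch sketch of what such a block can look like; it's only an illustration of the idea, not any specific published architecture:

```python
import torch
import torch.nn as nn

class ConvAttentionBlock(nn.Module):
    """Toy hybrid block: depthwise convolution for local features,
    self-attention for global context. Illustrative only."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.local = nn.Conv1d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):  # x: [batch, tokens, dim]
        # Local mixing: depthwise conv over the token dimension.
        x = x + self.local(x.transpose(1, 2)).transpose(1, 2)
        # Global mixing: self-attention with a residual connection.
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)
        x = x + attn_out
        # Standard transformer-style MLP.
        return x + self.mlp(self.norm2(x))

block = ConvAttentionBlock(dim=64)
tokens = torch.randn(2, 196, 64)   # e.g. 14x14 patch tokens
print(block(tokens).shape)         # torch.Size([2, 196, 64])
```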
IMO there is little reason to think transformers are (even today) the best architecture for any deep learning application. Perhaps if a mega-corp poured all its resources into some convolutional-transformer architecture, you'd get something better than the current vision transformer (ViT) models, but since so much optimization and training work has already been done on ViTs, and since we clearly still haven't maxed out their capacity, it makes sense to stick with them at scale.
That being said, ViTs are currently clearly the best option if you want something trained on a near-entire-internet's worth of image or video data.
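To make the DINO-as-backbone idea concrete, here's a minimal sketch of pulling patch features from a DINO-family model via Hugging Face transformers. I'm using the facebook/dinov2-base checkpoint as a stand-in; the paper's DINOv3 detection setup will differ, and the detection head itself is left out:

```python
# Minimal sketch: extract DINOv2 patch features that a detection head could consume.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

processor = AutoImageProcessor.from_pretrained("facebook/dinov2-base")
model = AutoModel.from_pretrained("facebook/dinov2-base").eval()

image = Image.open("example.jpg").convert("RGB")  # any RGB image
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state: [batch, 1 + num_patches, hidden]; token 0 is the CLS token,
# the rest are per-patch embeddings a detection/segmentation head can be trained on.
patch_tokens = outputs.last_hidden_state[:, 1:, :]
print(patch_tokens.shape)
```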
What do you think about Geoffrey Hinton's concerns about AI (setting "AGI" aside)? Do you agree with those concerns, or do you believe that LLMs are limited enough in usefulness that they wouldn't pose a risk to our society?
I actually wish the audience would take the opposite, or perhaps a more balanced, view. Being pragmatic here means not taking an extreme view, and as much as this article is a great resource, and contains some legit advice that is otherwise difficult to find in such a concise form, folks need to be aware that this advice is what Google found gives real-world benefits for their unfathomably large codebase.
The things this article describes are more nuanced than just "think about performance sooner rather than later". I say this as someone who does this kind of optimization for a living, and all too often I see teams wasting time trying to micro-optimize codepaths that, at the end of the day, do not provide any real, demonstrable value.
And this is a real trap you can fall into very easily if you read this article as general wisdom, which it is not.
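To make the "demonstrable value" point concrete: before touching a codepath, measure whether it even shows up in a profile. A generic Python example (nothing to do with any particular codebase):

```python
# Illustration only: profile first, then decide whether a codepath is worth optimizing.
import cProfile
import io
import pstats

def hot_path(n):
    # Stand-in for code that actually dominates runtime.
    return sum(i * i for i in range(n))

def cold_path(n):
    # Stand-in for code that "looks" slow but rarely runs.
    return [str(i) for i in range(n)]

def workload():
    for _ in range(1000):
        hot_path(10_000)
    cold_path(10_000)

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
# If a function isn't near the top of this output, micro-optimizing it won't move the needle.
print(stream.getvalue())
```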
I think it's because of capacity. This race is mainly driven by AI power demand, which is estimated to increase 10x in the next 5 years: currently it's around 5 GW, and by 2030 it is expected to be 50 GW.
Is this taking efficiency gains into account? I would expect a 10x efficiency increase every 3 years, given Moore's Law and the trend toward hardware-appropriate algorithms.
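Taking both sets of numbers at face value (the 5 GW → 50 GW projection above and a hypothetical 10x efficiency gain every 3 years), the implied growth in effective compute works out roughly like this:

```python
# Back-of-the-envelope: what do those assumptions imply for effective compute by 2030?
power_2025_gw = 5.0          # claimed current AI power demand
power_2030_gw = 50.0         # projected demand (10x over 5 years)
efficiency_gain_per_3y = 10  # assumed hardware/algorithm efficiency gain

years = 5
power_growth = power_2030_gw / power_2025_gw                 # 10x
efficiency_growth = efficiency_gain_per_3y ** (years / 3)    # ~46x over 5 years
effective_compute_growth = power_growth * efficiency_growth  # ~464x

print(f"Power growth:      {power_growth:.0f}x")
print(f"Efficiency growth: {efficiency_growth:.0f}x")
print(f"Effective compute: {effective_compute_growth:.0f}x")
```

So if both assumptions held, the 10x power projection would sit on top of a ~46x efficiency gain, i.e. roughly 460x more effective compute; whether the forecasters actually modelled efficiency that way is a separate question.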
They can't hire the best talent because the most experienced people will not leave their homes to chase a high-risk role with questionable remuneration by relocating their whole life to Paris or London.
This goes to show that Mistral's leaders don't quite get that they are not as special as they seem to think. Anthropic or OpenAI also require their talent to relocate, but the stakes at least come with a high reward: $500k or $1M a year is a good start that might be worth the investment.
> They can't hire the best talent because the most experienced people will not leave their homes to chase a high-risk role with questionable remuneration by relocating their whole life to Paris or London.
The best talent has been regularly leaving Paris and London, India and China, for decades. With the US closing its borders, they definitely have a chance to lure some of it.
If somebody is in the EU already that calculation completely flips. We have a strong software startup industry in the US; would it really be that surprising if there was more unallocated talent in the EU, at this point?
> If somebody is in the EU already that calculation completely flips.
Would you find it compelling to move your whole life for ~100k EUR when you can make as much or more in your home city, with a job that is almost certainly more stable?
And I meant the Europeans. People in the EU don't have a culture of moving between cities or countries unless they really have a strong reason to, e.g. they can't find a job at home.
> would it really be that surprising if there was more unallocated talent in the EU, at this point?
I am pretty sure there is. It has changed over the course of the last few years, primarily because of COVID and companies becoming willing to offer remote contracts, but we're still far from being able to fully utilize that talent.
Sure, now imagine answering all the questions of 10 different people. It's the biggest hindrance I have ever seen, but I agree with the above comment that it largely depends on the team.
> Predictive SELECT Statements:
> Added the PRECOGNITION keyword.
> SELECT * FROM sales WHERE date = 'tomorrow' now returns data with 99.4% accuracy by leveraging the built-in 4kB inference engine. The library size has increased by 12 bytes to accommodate this feature.
12 bytes really sounds like something that the lead dev would write!
[1] https://arxiv.org/html/2509.20787v2