Using transformers is not mutually exclusive with keeping other tools up your sleeve.
What about the DINOv2 and DINOv3 (1B and 7B) vision transformer models? This paper [1] suggests significant improvements over traditional YOLO-based object detection.
Indeed, there are even multiple attempts to use both self-attention and convolutions in novel architectures, and there is evidence this works very well and may have significant advantages over pure vision transformer models [1-2].
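For anyone who hasn't looked at these hybrids: the core idea is usually just to interleave convolutional (local) mixing with self-attention (global) mixing. Here's a toy PyTorch sketch of what such a block can look like; it's only an illustration of the idea, not any specific published architecture:

```python
import torch
import torch.nn as nn

class ConvAttentionBlock(nn.Module):
    """Toy hybrid block: depthwise convolution for local features,
    self-attention for global context. Illustrative only."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.local = nn.Conv1d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):  # x: [batch, tokens, dim]
        # Local mixing: depthwise conv over the token dimension.
        x = x + self.local(x.transpose(1, 2)).transpose(1, 2)
        # Global mixing: self-attention with a residual connection.
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)
        x = x + attn_out
        # Standard transformer-style MLP.
        return x + self.mlp(self.norm2(x))

block = ConvAttentionBlock(dim=64)
tokens = torch.randn(2, 196, 64)   # e.g. 14x14 patch tokens
print(block(tokens).shape)         # torch.Size([2, 196, 64])
```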
IMO there is little reason to think transformers are (even today) the best architecture for any deep learning application. Perhaps if a mega-corp poured all its resources into some convolutional-transformer architecture, you'd get something better than the current vision transformer (ViT) models, but since so much optimization and training work has already been done on ViTs, and since we clearly still haven't maxed out their capacity, it makes sense to stick with them at scale.
That being said, ViTs are currently clearly the best option if you want something trained on a near-entire-internet's worth of image or video data.
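To make the DINO-as-backbone idea concrete, here's a minimal sketch of pulling patch features from a DINO-family model via Hugging Face transformers. I'm using the facebook/dinov2-base checkpoint as a stand-in; the paper's DINOv3 detection setup will differ, and the detection head itself is left out:

```python
# Minimal sketch: extract DINOv2 patch features that a detection head could consume.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

processor = AutoImageProcessor.from_pretrained("facebook/dinov2-base")
model = AutoModel.from_pretrained("facebook/dinov2-base").eval()

image = Image.open("example.jpg").convert("RGB")  # any RGB image
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state: [batch, 1 + num_patches, hidden]; token 0 is the CLS token,
# the rest are per-patch embeddings a detection/segmentation head can be trained on.
patch_tokens = outputs.last_hidden_state[:, 1:, :]
print(patch_tokens.shape)
```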
What do you think about Geoffrey Hinton's concerns about AI (setting "AGI" aside)? Do you agree with those concerns, or do you believe that LLMs are limited enough in usefulness that they wouldn't pose a risk to our society?
I actually wish the audience would take the opposite, or perhaps a more balanced, view. Being pragmatic here means not taking an extreme view, and as much as this article is a great resource, and contains some legit advice that is otherwise difficult to find in such a concise form, folks need to be aware that this advice is what Google found gives real-world benefits for their unfathomably large codebase.
The things this article describes are more nuanced than just "think about performance sooner rather than later". I say this as someone who does this kind of optimization for a living, and all too often I see teams wasting time trying to micro-optimize codepaths that, at the end of the day, do not provide any real, demonstrable value.
And this is a real trap you can fall into very easily if you read this article as general wisdom, which it is not.
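To make the "demonstrable value" point concrete: before touching a codepath, measure whether it even shows up in a profile. A generic Python example (nothing to do with any particular codebase):

```python
# Illustration only: profile first, then decide whether a codepath is worth optimizing.
import cProfile
import io
import pstats

def hot_path(n):
    # Stand-in for code that actually dominates runtime.
    return sum(i * i for i in range(n))

def cold_path(n):
    # Stand-in for code that "looks" slow but rarely runs.
    return [str(i) for i in range(n)]

def workload():
    for _ in range(1000):
        hot_path(10_000)
    cold_path(10_000)

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
# If a function isn't near the top of this output, micro-optimizing it won't move the needle.
print(stream.getvalue())
```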
I think it's because of capacity. This race is mainly driven by AI power demand, which is estimated to increase 10x in the next 5 years: currently it's around 5 GW, and by 2030 it is expected to be 50 GW.
Is this taking efficiency gains into account? I would expect a 10x efficiency increase every 3 years, given Moore's Law and the trend toward hardware-appropriate algorithms.
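Taking both sets of numbers at face value (the 5 GW → 50 GW projection above and a hypothetical 10x efficiency gain every 3 years), the implied growth in effective compute works out roughly like this:

```python
# Back-of-the-envelope: what do those assumptions imply for effective compute by 2030?
power_2025_gw = 5.0          # claimed current AI power demand
power_2030_gw = 50.0         # projected demand (10x over 5 years)
efficiency_gain_per_3y = 10  # assumed hardware/algorithm efficiency gain

years = 5
power_growth = power_2030_gw / power_2025_gw                 # 10x
efficiency_growth = efficiency_gain_per_3y ** (years / 3)    # ~46x over 5 years
effective_compute_growth = power_growth * efficiency_growth  # ~464x

print(f"Power growth:      {power_growth:.0f}x")
print(f"Efficiency growth: {efficiency_growth:.0f}x")
print(f"Effective compute: {effective_compute_growth:.0f}x")
```

So if both assumptions held, the 10x power projection would sit on top of a ~46x efficiency gain, i.e. roughly 460x more effective compute; whether the forecasters actually modelled efficiency that way is a separate question.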
They can't hire the best talent because the most experienced people will not leave their homes to chase a high-risk role with questionable remuneration by relocating their whole life to Paris or London.
This goes to show that Mistral's leaders don't quite get that they are not as special as they seem to think. Anthropic or OpenAI also require their talent to relocate, but the stakes at least come with a high reward: $500k or $1M a year is a good start that might be worth the investment.
> They can't hire the best talent because the most experienced people will not leave their homes to chase a high-risk role with questionable remuneration by relocating their whole life to Paris or London.
The best talent has been regularly leaving Paris and London, India and China, for decades. With the US closing its borders, they definitely have a chance to lure some of it.
If somebody is in the EU already that calculation completely flips. We have a strong software startup industry in the US; would it really be that surprising if there was more unallocated talent in the EU, at this point?
> If somebody is in the EU already that calculation completely flips.
Would you find it compelling to move your whole life for ~100k EUR when you can make as much or more in your home city, with a job that is almost certainly more stable?
And I meant the Europeans. People in the EU don't have a culture of moving between cities or countries unless they really have a strong reason to, e.g. they can't find a job at home.
> would it really be that surprising if there was more unallocated talent in the EU, at this point?
I am pretty sure there is. It has changed over the course of the last few years, primarily because of COVID and companies becoming willing to offer remote contracts, but we're still far from being able to fully utilize that talent.
Sure, now imagine answering all the questions of 10 different people. It's the biggest hindrance I have ever seen, but I agree with the above comment that it largely depends on the team.
> Predictive SELECT Statements:
> Added the PRECOGNITION keyword.
> SELECT * FROM sales WHERE date = 'tomorrow' now returns data with 99.4% accuracy by leveraging the built-in 4kB inference engine. The library size has increased by 12 bytes to accommodate this feature.
12 bytes really sounds like something that the lead dev would write!
[1] https://arxiv.org/html/2509.20787v2