Hacker News

https://i.imgur.com/xsFKqsI.png

"Draw a picture of a full glass of wine, ie a wine glass which is full to the brim with red wine and almost at the point of spilling over... Zoom out to show the full wine glass, and add a caption to the top which says "HELL YEAH". Keep the wine level of the glass exactly the same."



Maybe the "HELL YEAH" added a "party implication" which shifted its "thinking" into a just-correct-enough latent space that it was able to actually hunt down some image somewhere in its training data of a truly full glass of wine.

I almost wonder if prompting it "similar to a full glass of beer" would get it shifted just enough.


Can't replicate. Maybe the rollout is staggered? Using Plus from Europe, it's consistently giving me a half full glass.


I am using Plus from Australia, and while I am not getting a full glass, nor am I getting a half full glass. The glass I'm getting is half empty.


Surprised it isn't fully empty for being upside down!


That's funny. HN hates funny. Enjoy your shadowban.


Yeah. I understand that this site doesn’t want to become Reddit, but it really has an allergy to comedy, and it’s sad. God forbid you use sarcasm: half the people here won’t understand it, and the other half will say it’s not appropriate for healthy discussion…


Good example in this very discussion: https://news.ycombinator.com/item?id=43477003


I like this site, but it can become inhuman sometimes.

People get upvoted for pedantry rather than furthering a conversation, e.g.


Is it drawing the image from top to bottom very slowly over the course of at least 30 seconds? If not, then you're using DALL-E, not 4o image generation.


This top-to-bottom drawing – does it tell us anything about the underlying model architecture? AFAIK diffusion models do not work like that; they denoise the full frame over many steps. In the past there were attempts to slowly synthesize a picture by predicting the next pixel, but I wasn't aware of a shift to that kind of architecture at OpenAI.


Yes, the model card explicitly says it's autoregressive, not diffusion. And it's not a separate model; it's a native ability of GPT-4o, which is a multimodal model. They just didn't make this ability public until now. I assume they worked on fine-tuning to improve prompt following.
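The structural difference between the two approaches explains the top-to-bottom rendering. Here's a toy sketch contrasting the two generation loops; `denoise_step` and `next_token` are hypothetical stand-ins for learned networks, not real model code:

```python
def diffusion_generate(noise, denoise_step, steps=50):
    """Diffusion: the WHOLE image is refined at every step,
    so there is never a partially drawn region."""
    img = noise
    for t in range(steps, 0, -1):
        img = denoise_step(img, t)  # full frame updated each iteration
    return img

def autoregressive_generate(next_token, n_tokens):
    """Autoregressive: image tokens are emitted one at a time, in
    raster order, which is why the picture can appear top-to-bottom."""
    tokens = []
    for _ in range(n_tokens):
        tokens.append(next_token(tokens))  # conditioned on the prefix
    return tokens
```

With stubs plugged in (e.g. a `next_token` that just looks at the prefix length), the autoregressive loop produces its output strictly left-to-right, while the diffusion loop touches every pixel on every iteration.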


apparently it's not diffusion, but tokens


Works for me as well https://chatgpt.com/share/67e3f838-63fc-8000-ab94-5d10626397...

USA, but VPN set to exit in Canada at time of request (I think).


The EU got the drunken version. And a good drunk knows never to top off a glass of wine. In that context the glass is already "full".

But aside from that, it would only be comparable if we compared your prompts.


Maybe it's half empty.


ha


You might still be on DALL-E. My account still is if I use ChatGPT.

I switched over to the sora.com domain and now I have access to it.


The free site even has it. Just don't turn on image generation; it works with it off. If you enable it, it uses DALL-E.


The most interesting thing to me is that the spelling is correct.

I'm not a heavy user of AI or image generation in general, so is this also part of the new release or has this been fixed silently since last I tried?


It very much looks like a side effect of this new architecture. In my experience, text looks much better in recent DALL-E images (so what ChatGPT was using before), but it is still noticeably mangled when printing more than a few letters. This model update seems to improve text rendering by a lot, at least as long as the content is clearly specified.

However, when giving a prompt that requires the model to come up with the text itself, it still seems to struggle a bit, as can be seen in this hilarious example from the post: https://images.ctfassets.net/kftzwdyauwt9/21nVyfD2KFeriJXUNL...


The periodic table is absolutely hilarious, I didn't know LLMs had finally mastered absurdist humor.


Yeah who wouldn't love a dip in the sulphur pool. But back to the question, why can't such a model recognize letters as such? It cannot be trained to pay special attention to characters? How come it can print an anatomically correct eye but not differentiate between P and Z?


I think the model has not decided if it should print a P or a Z, so you end up with something halfway between the two.

It's a side effect of the entire model being differentiable - there is always some halfway point.
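That "halfway point" can be illustrated with a toy numpy example. The 4×4 glyph templates below are made up purely for illustration; the point is that a differentiable model's output is a weighted average, so an undecided 50/50 mix of two glyphs renders as a blend that is neither letter:

```python
import numpy as np

# Made-up 4x4 binary glyph templates for "P" and "Z".
P = np.array([[1, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 1, 0],
              [1, 0, 0, 0]], dtype=float)

Z = np.array([[1, 1, 1, 1],
              [0, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 1, 1, 1]], dtype=float)

# An "undecided" output: a differentiable blend of both targets.
# Pixels where the glyphs disagree come out at 0.5 -- gray smudges
# that read as neither a crisp P nor a crisp Z.
undecided = 0.5 * P + 0.5 * Z
```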


The head of foam on that glass of wine is perfect!


I think we're really fscked, because even AI image detectors think the images are genuine. They look great in Photoshop forensics too. I hope the arms race between generators and detectors doesn't stop here.


We're not. This PNG image of a wine glass has JPEG compression artefacts leaking in from JPEG training data. Zoom into the image and you will see the 8x8 boundaries of the blocks used in JPEG compression, which just cannot be in a PNG. This is a common method of detecting AI-generated images, and so far it works: no need for complex Photoshop forensics or AI detectors, just zoom in and check for compression artefacts. Current AI is incapable of getting this right. All the compression algorithms are mixed and mashed together in the training data, so in a generated image you can find artefacts from almost all of them if you're lucky, but JPEG is obviously the most prevalent, since lossless images are rare online.
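That zoom-in check can be roughed out numerically: compare pixel discontinuities at the 8-pixel grid boundaries against discontinuities everywhere else. This is a sketch under assumptions (grayscale image as a numpy array; `block_boundary_score` is a name invented for this example), not a production forensic tool:

```python
import numpy as np

def block_boundary_score(gray: np.ndarray, block: int = 8) -> float:
    """Ratio of mean pixel discontinuity at block-grid column
    boundaries vs. elsewhere. A score well above 1.0 hints that the
    image once went through block-based (JPEG-style) compression."""
    # Absolute differences between neighbouring columns.
    diff = np.abs(np.diff(gray.astype(float), axis=1))
    cols = np.arange(diff.shape[1])
    at_boundary = (cols % block) == (block - 1)  # seams 7|8, 15|16, ...
    boundary_mean = diff[:, at_boundary].mean()
    elsewhere_mean = diff[:, ~at_boundary].mean()
    return boundary_mean / (elsewhere_mean + 1e-9)
```

On a synthetic image built from constant 8x8 blocks the score blows up (all the discontinuity sits on the seams), while a smooth gradient scores about 1.0. A real detector would also look at row seams, DCT statistics, and chroma subsampling, but the idea is the same.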


If JPEG compression is the only evident flaw, this kind of reinforces my point, as most of these images will end up shared as processed JPEG/WebP on social media.


You didn't get it. The image contains compression artifacts from ALL the different algorithms mashed up in a single picture; JPEG is just the most prevalent.


Oh, I see. There's still room for reliable detection then.


Plenty of real PNG images have JPEG artifacts because they were once JPEGs off someone's phone...



