AI models collapse when trained on recursively generated data

floofloof@lemmy.ca · 3 months ago

AI models collapse when trained on recursively generated data

BombOmOm@lemmy.world · 3 months ago

Yep. It leads to a positive feedback loop. They just continue to self-reinforce whatever came out before.

And with increasing amounts of the internet being polluted with AI text output…

Ensign_Crab@lemmy.world · 3 months ago

… AI inbreeding.

skillissuer@discuss.tchncs.de · 3 months ago

hapsburgGPT

Boozilla@lemmy.world · 3 months ago

We call it the GRRM model.

Sibbo@sopuli.xyz · 3 months ago

In the USA, they call it the AlaLlama model.

bionicjoey@lemmy.ca · 3 months ago

GPTargaryen

sp3tr4l@lemmy.zip · 3 months ago

What about the Grrr! model after that astoundingly XD So Random! thing from Invader Zim?

He’s an android or robot, right?

MagicShel@programming.dev · 3 months ago

That seems so obviously predictable.

kevincox@lemmy.ml · 3 months ago

To be fair this doesn’t sound much different than your average human using the internet.

sp3tr4l@lemmy.zip · 3 months ago

2024, Reverse Turing Test Challenge:

Can an LLM AI differentiate between human input and LLM AI input?

Even_Adder@lemmy.dbzer0.com · edit-2 3 months ago

You have to pretty much intentionally give it enough synthetic data to wreck it. OpenAI and Anthropic train their models on generated data to improve them. As long as there’s supervision during training, which there always will be, this isn’t really a problem.

https://openai.com/index/prover-verifier-games-improve-legibility/

https://www.anthropic.com/research/claude-character

Tobberone@lemm.ee · 3 months ago

Well… It's built on statistics, and statistical inference will return to the mean eventually. If all it ever gets to train on is closer and closer to the mean, there will be nothing left to work with. It will all be the average…
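
A toy sketch of that intuition, not anything from the linked paper and nowhere near an actual LLM: the "model" here is just a normal distribution fitted to a handful of samples, and each generation is trained only on samples drawn from the previous generation's fit. The function name `simulate_collapse`, the sample size, and the generation count are all made up for illustration.

```python
import random
import statistics


def simulate_collapse(generations=200, sample_size=10, seed=0):
    """Refit a Gaussian to its own samples, generation after generation.

    Returns the list of fitted standard deviations, starting with the
    original "real data" spread of 1.0 (an arbitrary starting point).
    """
    rng = random.Random(seed)  # local RNG so runs are reproducible
    mean, std = 0.0, 1.0       # generation 0: the "real" distribution
    history = [std]
    for _ in range(generations):
        # Generate synthetic data from the current model only.
        samples = [rng.gauss(mean, std) for _ in range(sample_size)]
        # "Train" the next model on that synthetic data (maximum-likelihood fit).
        mean = statistics.fmean(samples)
        std = statistics.pstdev(samples)
        history.append(std)
    return history


if __name__ == "__main__":
    history = simulate_collapse()
    for gen in (0, 50, 100, 200):
        print(f"generation {gen:3d}: std = {history[gen]:.3g}")
```

With a small sample each round, the fitted spread tends to shrink over generations, so the tails go first and eventually nearly everything sits at one value. Mixing fresh real samples into each round would be one way to slow that down in this toy setup.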