#modelcollapse


👾 Model collapse: when AI systems trained on synthetic content begin degrading like photocopies of photocopies. Recent experiments show recursive training leads to semantic drift and loss of signal—errors accumulate slowly, precision decays steadily.

⚠️ The real risk isn't technical fragility but institutional misalignment. When knowledge production prioritizes scale over truth, collapse becomes a symptom of concentrated power.
#AIResearch #ModelCollapse

🔗 open.substack.com/pub/massimof

Future Frontiers · How AI Is Transforming Our Search for Information · By Massimo Flore

#KINews 🤪

The uncontrolled use of AI-generated content in #Training can cause irreversible errors in models such as #LLMs, a phenomenon called "#ModelCollapse". As a result, rare content disappears, which harms the fairness of the models' predictions. The long-term preservation of genuine human data sources (#Datenquellen) is crucial. To ensure this, the provenance of online content should be tracked.

#KI #Datenqualität #Science #Technologie #AI

tino-eberl.de/ki-news/model-co

Tino Eberl · Model Collapse: The Danger of Flooding AI with Machine-Generated Content
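The "rare content disappears" effect mentioned in the post above can be sketched with a toy simulation (illustrative only, not from the linked article): repeatedly sample a synthetic corpus from an estimated category distribution and re-estimate the distribution from that corpus alone. Once a rare category's count hits zero, zero probability is an absorbing state, so the tail of the distribution is lost for good. All names and parameters here are made up for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

# Three content categories; "rare" starts with 1 of 500 occurrences.
vocab = ["common_a", "common_b", "rare"]
counts0 = np.array([300, 199, 1])
n = counts0.sum()                 # corpus size per generation
probs = counts0 / n               # generation-0 estimate from "human" data

generations = 2000
for _ in range(generations):
    # Sample a fully synthetic corpus from the current estimate ...
    counts = rng.multinomial(n, probs)
    # ... and re-estimate the distribution from that corpus alone.
    # If a category was never sampled, its probability becomes 0
    # and it can never be sampled again (an absorbing state).
    probs = counts / n

print(dict(zip(vocab, probs)))
```

With these parameters the rare category is extinguished within a few generations in virtually every run, while the common categories drift; this is the categorical analogue of the distribution tails vanishing.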
arXiv.org · The Curse of Recursion: Training on Generated Data Makes Models Forget
Stable Diffusion revolutionised image creation from descriptive text. GPT-2, GPT-3(.5) and GPT-4 demonstrated astonishing performance across a variety of language tasks. ChatGPT introduced such language models to the general public. It is now clear that large language models (LLMs) are here to stay, and will bring about drastic change in the whole ecosystem of online text and images. In this paper we consider what the future might hold. What will happen to GPT-{n} once LLMs contribute much of the language found online? We find that use of model-generated content in training causes irreversible defects in the resulting models, where tails of the original content distribution disappear. We refer to this effect as Model Collapse and show that it can occur in Variational Autoencoders, Gaussian Mixture Models and LLMs. We build theoretical intuition behind the phenomenon and portray its ubiquity amongst all learned generative models. We demonstrate that it has to be taken seriously if we are to sustain the benefits of training from large-scale data scraped from the web. Indeed, the value of data collected about genuine human interactions with systems will be increasingly valuable in the presence of content generated by LLMs in data crawled from the Internet.

#AI #GenerativeAI #LLMs #ModelCollapse #SyntheticData: "Stable diffusion revolutionized image creation from descriptive text. GPT-2 (ref. 1), GPT-3(.5) (ref. 2) and GPT-4 (ref. 3) demonstrated high performance across a variety of language tasks. ChatGPT introduced such language models to the public. It is now clear that generative artificial intelligence (AI) such as large language models (LLMs) is here to stay and will substantially change the ecosystem of online text and images. Here we consider what may happen to GPT-{n} once LLMs contribute much of the text found online. We find that indiscriminate use of model-generated content in training causes irreversible defects in the resulting models, in which tails of the original content distribution disappear. We refer to this effect as ‘model collapse’ and show that it can occur in LLMs as well as in variational autoencoders (VAEs) and Gaussian mixture models (GMMs). We build theoretical intuition behind the phenomenon and portray its ubiquity among all learned generative models. We demonstrate that it must be taken seriously if we are to sustain the benefits of training from large-scale data scraped from the web. Indeed, the value of data collected about genuine human interactions with systems will be increasingly valuable in the presence of LLM-generated content in data crawled from the Internet."

nature.com/articles/s41586-024

Nature · AI models collapse when trained on recursively generated data
Analysis shows that indiscriminately training generative artificial intelligence on real and generated content, usually done by scraping data from the Internet, can lead to a collapse in the ability of the models to generate diverse high-quality output.
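The abstract above notes that model collapse already appears in Gaussian mixture models. A minimal sketch of the recursion, assuming a single Gaussian fitted by maximum likelihood and resampled each generation (a deliberate simplification of the paper's GMM/VAE/LLM settings; parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

n = 100            # samples per generation
generations = 500

# Generation 0: "human" data from a standard normal distribution.
data = rng.normal(loc=0.0, scale=1.0, size=n)
stds = [data.std()]

for _ in range(generations):
    # Fit a Gaussian to the current data (maximum-likelihood estimates) ...
    mu, sigma = data.mean(), data.std()
    # ... then train the next generation only on samples from the fitted model.
    data = rng.normal(loc=mu, scale=sigma, size=n)
    stds.append(data.std())

print(f"std of generation 0: {stds[0]:.3f}")
print(f"std of generation {generations}: {stds[-1]:.3f}")
```

Because each maximum-likelihood refit slightly underestimates the variance on a finite sample, the errors compound across generations and the fitted distribution narrows: the tails disappear first, matching the "photocopy of a photocopy" intuition.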