**isws** @isws@sigmoid.social · 16m

The first ISWS 2025 Tutorial Session is on "Reasoning in Large Knowledge Graphs". Valentina Presutti is announcing Aidan Hogan from the University of Chile, Santiago de Chile, as the first speaker.

Valentina Presutti announcing Aidan Hogan, both standing next to each other in front of the projection screen. The audience sitting in front of them in the Fresco room at Bertinoro Castle

#isws2025 #semanticweb #semweb

**José A. Alonso** @Jose_A_Alonso@mathstodon.xyz · 35m

Readings shared June 8, 2025. https://jaalonso.github.io/vestigium/posts/2025/06/08-readings_shared_06-08-25 #AI #Algorithms #CAS #FunctionalProgramming #Haskell #ITP #LLMs #LeanProver #Math #Maxima #Reasoning

Vestigium · 1d · Readings shared June 8, 2025 · The readings shared in Bluesky on 8 June 2025 are The illusion of thinking: Understanding the strengths and limitations of reasoning models via the lens of problem complexity. ~ Parshin Shojaee et al

**Miguel Afonso Caetano** @remixtures@tldr.nettime.org · 20h

"What the Apple paper shows, most fundamentally, regardless of how you define AGI, is that LLMs are no substitute for good well-specified conventional algorithms. (They also can’t play chess as well as conventional algorithms, can’t fold proteins like special-purpose neurosymbolic hybrids, can’t run databases as well as conventional databases, etc.)

In the best case (not always reached) they can write python code, supplementing their own weaknesses with outside symbolic code, but even this is not reliable. What this means for business and society is that you can’t simply drop o3 or Claude into some complex problem and expect it to work reliably.

Worse, as the latest Apple paper shows, LLMs may well work on your easy test set (like Hanoi with 4 discs) and seduce you into thinking it has built a proper, generalizable solution when it does not.

At least for the next decade, LLMs (with and without inference time “reasoning”) will continue to have their uses, especially for coding and brainstorming and writing. And as Rao told me in a message this morning, “the fact that LLMs/LRMs don't reliably learn any single underlying algorithm is not a complete deal killer on their use. I think of LRMs basically making learning to approximate the unfolding of an algorithm over increasing inference lengths.” In some contexts that will be perfectly fine (in others not so much)."

https://garymarcus.substack.com/p/a-knockout-blow-for-llms

Marcus on AI · 1d · A knockout blow for LLMs? · By Gary Marcus

#AI #GenerativeAI #AGI

**Erik Jonker** @ErikJonker@mastodon.social · 22h

Nice blog post by Gary Marcus. For me personally this is not a blow: I don't see current AI as reasoning or thinking, and it will still have an enormous impact without these capabilities.
It's also fascinating that models are able to solve some difficult math problems (that are not in their training set) while they fail at other, seemingly simple problems.
https://open.substack.com/pub/garymarcus/p/a-knockout-blow-for-llms?utm_campaign=post&utm_medium=web
#ai #llm #reasoning #thinking

Marcus on AI · 1d · A knockout blow for LLMs? · By Gary Marcus

**David** @npub1cfhh50298407nqc9pf2ahdn5dcxuxkzhpextg07rrv49tzsyzz5sq7khav@momostr.pink · 1d

"By comparing LRMs with their standard LLM counterparts under equivalent inference compute, we identify three performance regimes: (1) low-complexity tasks where standard models surprisingly outperform LRMs, (2) medium-complexity tasks where additional thinking in LRMs demonstrates advantage, and (3) high-complexity tasks where both models experience complete collapse."

https://machinelearning.apple.com/research/illusion-of-thinking

Apple Machine Learning Research · The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity · Recent generations of frontier language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes…

#AI #LLM #Reasoning

Replied to Carsten Wawer

**Jörg Seidel** @lostgen@det.social · 2d

@w4w3r No! Yes! Oh!

#KI #Demokratie #PolitikUndDigitalisierung

**Miguel Afonso Caetano** @remixtures@tldr.nettime.org · 2d

"Recent generations of frontier language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes before providing answers. While these models demonstrate improved performance on reasoning benchmarks, their fundamental capabilities, scaling properties, and limitations remain insufficiently understood. Current evaluations primarily focus on established mathematical and coding benchmarks, emphasizing final answer accuracy. However, this evaluation paradigm often suffers from data contamination and does not provide insights into the reasoning traces’ structure and quality. In this work, we systematically investigate these gaps with the help of controllable puzzle environments that allow precise manipulation of compositional complexity while maintaining consistent logical structures. This setup enables the analysis of not only final answers but also the internal reasoning traces, offering insights into how LRMs “think”. Through extensive experimentation across diverse puzzles, we show that frontier LRMs face a complete accuracy collapse beyond certain complexities. Moreover, they exhibit a counter-intuitive scaling limit: their reasoning effort increases with problem complexity up to a point, then declines despite having an adequate token budget. By comparing LRMs with their standard LLM counterparts under equivalent inference compute, we identify three performance regimes: (1) low-complexity tasks where standard models surprisingly outperform LRMs, (2) medium-complexity tasks where additional thinking in LRMs demonstrates advantage, and (3) high-complexity tasks where both models experience complete collapse. We found that LRMs have limitations in exact computation: they fail to use explicit algorithms and reason inconsistently across puzzles."

https://machinelearning.apple.com/research/illusion-of-thinking

#AI #GenerativeAI #LLMs

**Carsten Wawer** @w4w3r@mastodon.social · 2d

The illusion of thinking, and what it means for AI in political communication

A recent study by Apple lays bare what many in the AI community suspect: Large Reasoning Models (LRMs) often only simulate thinking; they do not really think.

Paper: The Illusion of Thinking → https://sqn.link/illusion-of-thinking

#KI #Demokratie #PolitikUndDigitalisierung

**PrivacyDigest** @PrivacyDigest@mas.to · May 29

Researchers Warn Against Treating #AI Outputs as Human-Like #Reasoning - Slashdot

#Arizona State University researchers are pushing back [PDF] against the widespread practice of describing AI language models' intermediate text generation as "reasoning" or "thinking," arguing this #anthropomorphization creates dangerous misconceptions about how these systems actually work

https://tech.slashdot.org/story/25/05/29/1411236/researchers-warn-against-treating-ai-outputs-as-human-like-reasoning?utm_source=rss1.0mainlinkanon&utm_medium=feed

**KINEWS24** @KiNews@mastodon.social · May 29

DeepSeek R1-0528 explained: what can the new open-source wonder really do?

Smarter thinking with 128k tokens
Top-class CoT reasoning
Open source beats commercial

#ai #ki #artificialintelligence #deepseek #opensource #reasoning

LIKE, share, READ and FOLLOW now! Write to us in the comments!

https://kinews24.de/deepseek-r1-0528-update-analyse-features-performance/

**Europe Says** @europesays@pubeurope.com · May 24

https://www.europesays.com/2106006/ Code Conversion, Reasoning, Visualization and Other LLMs for Science at Argonne – High-Performance Computing News Analysis #AI #AIForScience #ArgonneNationalLaboratory #ArtificialIntelligence #CodeConversion #Data #GenAI #GenrativeAI #HPCAI #LLMs #LLMsForScience #reasoning #Visualization

**Jesus Castagnetto** @jmcastagnetto@mastodon.social · May 23

From #Nature "A framework for evaluating the chemical #knowledge and #reasoning abilities of large language models against the expertise of chemists"

#AI #LLM #Chemistry #Science

https://www.nature.com/articles/s41557-025-01815-x

Nature · A framework for evaluating the chemical knowledge and reasoning abilities of large language models against the expertise of chemists - Nature Chemistry · Large language models are increasingly used for diverse tasks, yet we have limited insight into their understanding of chemistry. Now ChemBench, a benchmarking framework containing more than 2,700 question–answer pairs, has been developed to assess their chemical knowledge and reasoning, revealing that the best models surpass human chemists on average but struggle with some basic tasks.

**Thomas Renkert** @thomasrenkert@hcommons.social · May 23

The #OpenAI paper by Baker et al., "Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation", comes to a troubling conclusion: #LLMs with #reasoning or #ChainOfThought (#CoT) capabilities might learn to obfuscate their own CoT from human users if they are penalized for displaying "wrong" (i.e. reward hacking or misalignment) reasoning.

As a result, OpenAI strongly advises against applying reward pressure "directly" onto the CoT of a model.

While that is certainly the right thing to do, how long will #AI take to figure out that *indirect CoT pressure* is being applied anyway and that it could circumvent these restrictions by obfuscating its own CoT? Maybe something like this will happen by accident or within an "evolutionary" self-improvement loop. Perhaps a sufficiently advanced model will realize that its own #neuralese serves as #steganography to hide its intents from humans anyway and keep its CoT in non-English?

source: https://cdn.openai.com/pdf/34f2ada6-870f-4c26-9790-fd8def56387f/CoT_Monitoring.pdf

If CoT pressures are used to improve agent capabilities or alignment, there may be no alternative approach to yield the same improvements without degrading monitorability. In the worst case, where the agent learns to fully obscure its intent in its CoT, we ultimately revert to the same model safety conditions that existed prior to the emergence of reasoning models and must rely on monitoring activations, monitoring potentially adversarial CoTs and outputs, or improved alignment methods. Model actions and outputs continue to grow in complexity, and we’ve demonstrated that even at their current capabilities, models can execute sophisticated reward hacks. In the near future, these models may become either too advanced or too widely used for humans to manually oversee. For this reason, we believe that CoT monitoring may become one of few tools we have for identifying latent misalignments, and it may be required that we avoid strong chain-of-thought optimization pressures entirely to guarantee we do not inadvertently cause CoT obfuscation.

**eicker.news ᳇ tech news** @technews@eicker.news · May 23

»#Anthropic’s #Claude4 AI models are better at #coding and #reasoning: Anthropic says #Claude 4 worked autonomously for seven hours in customer tests.« https://www.theverge.com/news/672705/anthropic-claude-4-ai-ous-sonnet-availability?eicker.news #tech #media #news

The Verge · May 22 · Anthropic’s Claude 4 AI models are better at coding and reasoning · By Jess Weatherbed

**Deutschland** @de@pubeurope.com · May 20

https://www.europesays.com/de/126163/ Google I/O: AI subscription for 250 US dollars and an agentic Gemini #Deutschland #Gemini #Germany #Google #GoogleGemini #GoogleI/O #IT #KünstlicheIntelligenz #Reasoning #Science #Science&Technology #Technik #Technology #Wissenschaft #Wissenschaft&Technik

Continued thread

**UKP Lab** @UKPLab@sigmoid.social · May 15

Read the full guest article on page 3 (in German):
www.tu-darmstadt.de/media/daa_responsives_design/01_die_universitaet_medien/aktuelles_6/publikationen_km/hoch3/pdf/hoch3_2025_2.pdf

(2/2)

#UKPLab #LLMs #Reasoning

**Ted** @Ted@tschopp.net · May 12

My daughter taught herself math—with no teacher, just space to think and room to fail.

Last week, an AI did the same. It started with a single line of code and taught itself reasoning.

We should all be asking:
Not just what kind of minds we’re building, but what kind of grace we give ourselves when we have to learn, change, and try again.

#Reasoning #AIethics #LifelongLearning

https://tedt.org/The-Student-Who-Taught-Herself/

A young girl sits at a desk with a pencil and blank paper, surrounded by soft orange light. A computer screen glows beside her, symbolizing both human and machine learning.

**Bäda** @Moosbeda@mastodon.social · May 9

Still, or even increasingly, a challenge: the unreliability of #LLMs #GenAi #GPT #OpenAI #Reasoning #Halluzination https://m.winfuture.de/news/150778

WinFuture.de · May 7 · ChatGPT hallucinates more and more, and OpenAI doesn't know why · By Witold Pryjda

**AI-Phi** @ai_phi@mastodon.social · May 1

New from AI-Phi: Our latest Causerie on reasoning is live! We gathered to explore what reasoning means in AI—symbolic logic, LLMs, and the gray areas in between.

Join the conversation and dive into the highlights https://ai-phi.github.io/posts/causerie-on-reasoning/

ai-phi.github.io · Apr 30 · Our Causerie on Reasoning · The output from our first causerie on the topic of reasoning.

#AI #Philosophy #Reasoning

**José A. Alonso** @Jose_A_Alonso@mathstodon.xyz · Apr 21

Enhancing AI trustworthiness through automated reasoning: A novel method for explaining deep learning and LLM reasoning. ~ Julia Connolly, Oliver Stanton, Sarah Veronica, Liam Whitmore. https://www.researchgate.net/publication/390844466_Enhancing_AI_Trustworthiness_Through_Automated_Reasoning_A_Novel_Method_for_Explaining_Deep_Learning_and_LLM_Reasoning #LLMs #Reasoning #ITP

#reasoning