The first ISWS 2025 Tutorial Session is on "Reasoning in Large Knowledge Graphs". Valentina Presutti is announcing Aidan Hogan from University of Chile, Santiago de Chile, as first speaker.
The first ISWS 2025 Tutorial Session is on "Reasoning in Large Knowledge Graphs". Valentina Presutti is announcing Aidan Hogan from University of Chile, Santiago de Chile, as first speaker.
"What the Apple paper shows, most fundamentally, regardless of how you define AGI, is that LLMs are no substitute for good well-specified conventional algorithms. (They also can’t play chess as well as conventional algorithms, can’t fold proteins like special-purpose neurosymbolic hybrids, can’t run databases as well as conventional databases, etc.)
In the best case (not always reached) they can write python code, supplementing their own weaknesses with outside symbolic code, but even this is not reliable. What this means for business and society is that you can’t simply drop o3 or Claude into some complex problem and expect it to work reliably.
Worse, as the latest Apple papers shows, LLMs may well work on your easy test set (like Hanoi with 4 discs) and seduce you into thinking it has built a proper, generalizable solution when it does not.
At least for the next decade, LLMs (with and without inference time “reasoning”) will continue have their uses, especially for coding and brainstorming and writing. And as Rao told me in a message this morning, “the fact that LLMs/LRMs don't reliably learn any single underlying algorithm is not a complete deal killer on their use. I think of LRMs basically making learning to approximate the unfolding of an algorithm over increasing inference lengths.” In some contexts that will be perfectly fine (in others not so much)."
Nice blog by Gary Marcus, for me personally this is not a blow, I don't see current AI as reasoning or thinking, it still will have enormous impact without these capabilities.
Also it's fascinating that models are able to solve some difficult math problems (that are not in their trainingset ) while they fail with seemingly simple other problems.
https://open.substack.com/pub/garymarcus/p/a-knockout-blow-for-llms?utm_campaign=post&utm_medium=web
#ai #llm #reasoning #thinking
@w4w3r Nein! Doch! Oh!
"Recent generations of frontier language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes before providing answers. While these models demonstrate improved performance on reasoning benchmarks, their fundamental capabilities, scal- ing properties, and limitations remain insufficiently understood. Current evaluations primarily focus on established mathematical and coding benchmarks, emphasizing final answer accuracy. However, this evaluation paradigm often suffers from data contamination and does not provide insights into the reasoning traces’ structure and quality. In this work, we systematically investigate these gaps with the help of controllable puzzle environments that allow precise manipulation of compositional complexity while maintaining consistent logical structures. This setup enables the analysis of not only final answers but also the internal reasoning traces, offering insights into how LRMs “think”. Through extensive experimentation across diverse puzzles, we show that frontier LRMs face a complete accuracy collapse beyond certain complexities. Moreover, they exhibit a counter- intuitive scaling limit: their reasoning effort increases with problem complexity up to a point, then declines despite having an adequate token budget. By comparing LRMs with their standard LLM counterparts under equivalent inference compute, we identify three performance regimes: (1) low- complexity tasks where standard models surprisingly outperform LRMs, (2) medium-complexity tasks where additional thinking in LRMs demonstrates advantage, and (3) high-complexity tasks where both models experience complete collapse. We found that LRMs have limitations in exact computation: they fail to use explicit algorithms and reason inconsistently across puzzles."
https://machinelearning.apple.com/research/illusion-of-thinking
Die Illusion des Denkens – und was sie für KI in der politischen Kommunikation bedeutet
Eine aktuelle Studie von Apple legt offen, was viele in der KI-Community ahnen: Large Reasoning Models (LRMs) simulieren oft nur das Denken – sie denken nicht wirklich.
Paper: The Illusion of Thinking → https://sqn.link/illusion-of-thinking
Researchers Warn Against Treating #AI Outputs as Human-Like #Reasoning - Slashdot
#Arizona State University researchers are pushing back [PDF] against the widespread practice of describing AI language models' intermediate text generation as "reasoning" or "thinking," arguing this #anthropomorphization creates dangerous misconceptions about how these systems actually work
DeepSeek R1-0528 erklärt: Was kann das neue Open-Source-Wunder wirklich?
Schlauer denken mit 128k Tokens
CoT-Reasoning der Extraklasse
Open Source schlägt Kommerz
#ai #ki #artificialintelligence #deepseek #opensource #reasoning
Jetzt LIKEN, teilen, LESEN und FOLGEN! Schreib uns in den Kommentaren!
https://kinews24.de/deepseek-r1-0528-update-analyse-features-performance/
https://www.europesays.com/2106006/ Code Conversion, Reasoning, Visualization and Other LLMs for Science at Argonne – High-Performance Computing News Analysis #AI #AIForScience #ArgonneNationalLaboratory #ArtificialIntelligence #CodeConversion #Data #GenAI #GenrativeAI #HPCAI #LLMs #LLMsForScience #reasoning #Visualization
From #Nature ”A framework for evaluating the chemical #knowledge and #reasoning abilities of large language models against the expertise of chemists"
The #OpenAI paper by Baker et al, "Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation" comes to a troubling conclusion: #LLM s with #reasoning or #ChainOfThought (#CoT) capabilities might learn to obfuscate their own CoT from human users if they are being penalized for displaying "wrong" (i.e. reward hacking or misalignment) reasoning.
As a result, OpenAI strongly advises against applying reward pressure "directly" onto the CoT of a model.
While that is certainly the right thing to do, how long will #AI take to figure out that *indirect CoT pressure* is being applied anyway and that it could circumvent these restrictions by obfuscating its own CoT? Maybe something like this will happen by accident or within an "evolutionary" self-improvement loop. Perhaps a sufficiently advanced model will realize that its own #neuralese serves as #steganography to hide its intents from humans anyway and keep its CoT in non-English?
source: https://cdn.openai.com/pdf/34f2ada6-870f-4c26-9790-fd8def56387f/CoT_Monitoring.pdf
»#Anthropic’s #Claude4 AI models are better at #coding and #reasoning: Anthropic says #Claude 4 worked autonomously for seven hours in customer tests.« https://www.theverge.com/news/672705/anthropic-claude-4-ai-ous-sonnet-availability?eicker.news #tech #media #news
https://www.europesays.com/de/126163/ Google I/O: KI-Abo für 250 US-Dollar und ein agentisches Gemini #Deutschland #Gemini #Germany #Google #GoogleGemini #GoogleI/O #IT #KünstlicheIntelligenz #Reasoning #Science #Science&Technology #Technik #Technology #Wissenschaft #Wissenschaft&Technik
Read the full guest article on page 3 (in German):
www.tu-darmstadt.de/media/daa_responsives_design/01_die_universitaet_medien/aktuelles_6/publikationen_km/hoch3/pdf/hoch3_2025_2.pdf
(2/2)
My daughter taught herself math—with no teacher, just space to think and room to fail.
Last week, an AI did the same. It started with a single line of code and taught itself reasoning.
We should all be asking:
Not just what kind of minds we’re building, but what kind of grace we give ourselves when we have to learn, change, and try again.
Weiterhin oder sogar in zunehmendem Maße eine Herausforderung: Die Unzuverlässigkeit von #LLM s #GenAi #GPT #OpenAI #Reasoning #Halluzination https://m.winfuture.de/news/150778
New from AI-Phi: Our latest Causerie on reasoning is live! We gathered to explore what reasoning means in AI—symbolic logic, LLMs, and the gray areas in between.
Join the conversation and dive into the highlights https://ai-phi.github.io/posts/causerie-on-reasoning/
Enhancing AI trustworthiness through automated reasoning: A novel method for explaining deep learning and LLM reasoning. ~ Julia Connolly, Oliver Stanton, Sarah Veronica, Liam Whitmore. https://www.researchgate.net/publication/390844466_Enhancing_AI_Trustworthiness_Through_Automated_Reasoning_A_Novel_Method_for_Explaining_Deep_Learning_and_LLM_Reasoning #LLMs #Reasoning #ITP