
#generativeAI

41 posts · 31 participants · 6 posts today

"Not so long ago, you would be right to question why a seemingly innocuous-looking free “flashlight” or “calculator” app in the app store would try to request access to your contacts, photos, and even your real-time location data. These apps may not need that data to function, but they will request it if they think they can make a buck or two by monetizing your data.

These days, AI isn’t all that different.

Take Perplexity’s latest AI-powered web browser, Comet, as an example. Comet lets users find answers with its built-in AI search engine and automate routine tasks, like summarizing emails and calendar events.

In a recent hands-on with the browser, TechCrunch found that when Perplexity requests access to a user’s Google Calendar, the browser asks for a broad swath of permissions to the user’s Google Account, including the ability to manage drafts and send emails, download your contacts, view and edit events on all of your calendars, and even the ability to take a copy of your company’s entire employee directory.

Perplexity says much of this data is stored locally on your device, but you’re still granting the company rights to access and use your personal information, including to improve its AI models for everyone else.

Perplexity isn’t alone in asking for access to your data. There is a trend of AI apps that promise to save you time by transcribing your calls or work meetings, for example, but which require an AI assistant to access your real-time private conversations, your calendars, contacts, and more. Meta, too, has been testing the limits of what its AI apps can ask for access to, including tapping into the photos stored in a user’s camera roll that haven’t been uploaded yet."

techcrunch.com/2025/07/19/for-

TechCrunch · For privacy and security, think twice before granting AI access to your personal data
AI chatbots, assistants and agents are increasingly asking for gross levels of access to your personal data under the guise of needing your information to make them work.
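For a sense of how broad such a grant can be, here is a rough sketch (not Perplexity's actual code) of a consent request covering a comparable set of Google OAuth scopes, using the google-auth-oauthlib client; the scope list and file names are illustrative assumptions:

```python
# Illustrative only: requesting scopes comparable to those described above.
# The client-secret file and scope list are assumptions, not Perplexity's configuration.
from google_auth_oauthlib.flow import InstalledAppFlow

SCOPES = [
    "https://www.googleapis.com/auth/calendar",            # view and edit events on all calendars
    "https://www.googleapis.com/auth/gmail.compose",       # manage drafts and send email
    "https://www.googleapis.com/auth/contacts.readonly",   # download contacts
    "https://www.googleapis.com/auth/directory.readonly",  # read the organisation's user directory
]

flow = InstalledAppFlow.from_client_secrets_file("client_secret.json", scopes=SCOPES)
creds = flow.run_local_server(port=0)  # opens the Google consent screen in a browser
print("Granted scopes:", creds.scopes)
```

One consent click hands over every scope in the list at once, which is exactly the kind of broad grant the article is warning about.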

"Large Language Models (LLMs) are typically presumed to process context uniformly—that is, the model should handle the 10,000th token just as reliably as the 100th. However, in practice, this assumption does not hold. We observe that model performance varies significantly as input length changes, even on simple tasks.

In this report, we evaluate 18 LLMs, including the state-of-the-art GPT-4.1, Claude 4, Gemini 2.5, and Qwen3 models. Our results reveal that models do not use their context uniformly; instead, their performance becomes increasingly unreliable as input length grows.

Recent developments in LLMs show a trend toward longer context windows, with the input token count of the latest models reaching the millions. Because these models achieve near-perfect scores on widely adopted benchmarks like Needle in a Haystack (NIAH) [1], it’s often assumed that their performance is uniform across long-context tasks.

However, NIAH is fundamentally a simple retrieval task, in which a known sentence (the “needle”) is placed in a long document of unrelated text (the “haystack”), and the model is prompted to retrieve it. While scalable, this benchmark typically assesses direct lexical matching, which may not be representative of flexible, semantically oriented tasks.

We extend the standard NIAH task to investigate model behavior in previously underexplored settings. We examine the effects of needles with semantic, rather than direct lexical, matches, as well as the effects of introducing variations to the haystack content.
(...)
We demonstrate that even under these minimal conditions, model performance degrades as input length increases, often in surprising and non-uniform ways. Real-world applications typically involve much greater complexity, implying that the influence of input length may be even more pronounced in practice."

research.trychroma.com/context

research.trychroma.com · Context Rot: How Increasing Input Tokens Impacts LLM Performance
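The needle-in-a-haystack setup described above is easy to reproduce. A minimal sketch, assuming the `openai` Python client pointed at an OpenAI-compatible chat endpoint; the model name, filler text, and needle sentence are placeholders:

```python
# Minimal needle-in-a-haystack harness: bury one known sentence in filler text of
# varying length and depth, then check whether the model can retrieve it.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

NEEDLE = "The best thing to do in San Francisco is to eat a sandwich in Dolores Park."
QUESTION = "What is the best thing to do in San Francisco?"
FILLER = "The sky was grey and the streets were quiet that morning. " * 50

def build_haystack(total_chars: int, depth: float) -> str:
    """Pad filler to roughly total_chars and insert the needle at a relative depth."""
    haystack = (FILLER * (total_chars // len(FILLER) + 1))[:total_chars]
    cut = int(len(haystack) * depth)
    return haystack[:cut] + " " + NEEDLE + " " + haystack[cut:]

def run_trial(total_chars: int, depth: float, model: str = "gpt-4.1") -> bool:
    prompt = (
        build_haystack(total_chars, depth)
        + "\n\nAnswer using only the text above: "
        + QUESTION
    )
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    answer = reply.choices[0].message.content or ""
    return "Dolores Park" in answer  # crude lexical check of retrieval success

for chars in (2_000, 20_000, 200_000):      # sweep input length
    for depth in (0.1, 0.5, 0.9):           # sweep where the needle is buried
        hit = run_trial(chars, depth)
        print(f"{chars:>7} chars, depth {depth:.1f}: {'hit' if hit else 'miss'}")
```

This lexical version is the easy case the report starts from; its extension swaps in semantically related needles and varied haystack content, and still finds performance degrading as input length grows.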

"Despite promising results on synthetic benchmarks (e.g. Vending-Bench, SpreadsheetBench, DSBench), frontier models consistently underperform once they are deployed in complex, real-world situations.

To test this, we introduce AccountingBench, which measures models’ ability to “close the books” for a real business. This evaluation is built from one year of financial data from a real SaaS business producing millions of dollars in revenue, with a human expert baseline produced by a CPA for comparison.

Current frontier models excel at tasks that don't change the underlying environment: answering questions, writing code, researching sources. However, it remains unclear how well these capabilities translate to "butterfly" tasks where each action has lasting consequences, and errors compound over time.

In AccountingBench, the strongest models are as successful as a human expert accountant in the initial months, but they produce incoherent results over longer time horizons.

o3, o4-mini, and Gemini 2.5 Pro were unable to close one month of books, giving up partway through. Grok 4 and Claude 4 tend to perform well initially (within 1% of CPA baselines) but accumulate material errors over time.

"Closing the books" means ensuring that a business's internal financial records (i.e. “books”) accurately reflects external reality (what the bank actually says you have, what customers actually owe you, what you really owe vendors, etc.) across every single financial account owned by the company.

This is a mind-numbing, tedious task that is regularly performed by tens of millions of accountants worldwide, with potentially dire consequences (ranging from monetary losses to insolvency and, in some cases, prison) if done incorrectly – a perfect candidate for benchmarking frontier model capabilities."

accounting.penrose.com/

accounting.penrose.com · Can LLMs Do Accounting? | Penrose
An experiment exploring whether frontier models can close the books for a real SaaS company.
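At its core, the "closing the books" task described above is a reconciliation loop: compare every internal account balance against the external record for the same period and post adjusting entries for any difference. A toy sketch, with all account names and figures invented for illustration:

```python
# Toy reconciliation pass: internal "books" vs. external statements for one period.
# All account names and balances below are invented for illustration.
books = {
    "checking": 10_250.00,
    "accounts_receivable": 4_800.00,
    "accounts_payable": -2_100.00,
}
statements = {
    "checking": 10_175.00,            # bank statement balance
    "accounts_receivable": 4_800.00,  # what customers actually owe
    "accounts_payable": -2_100.00,    # what is actually owed to vendors
}

for account, book_balance in books.items():
    external = statements[account]
    diff = round(book_balance - external, 2)
    if diff == 0:
        print(f"{account}: reconciled at {book_balance:,.2f}")
    else:
        print(f"{account}: books {book_balance:,.2f} vs statement {external:,.2f}, "
              f"off by {diff:+.2f}, needs an adjusting entry")
```

Each unresolved difference carries forward into the next period, which is why the benchmark sees small early errors compound into incoherent results over longer horizons.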

Wix Adds Chaos to CI/CD Pipelines with AI and Improves Reliability

Cloud-based web development service Wix has written about a new approach to integrating artificial intelligence into continuous integration…
#NewsBeep #News #US #USA #UnitedStates #UnitedStatesOfAmerica #Artificialintelligence #AI #ArtificialIntelligence #ContinuousIntegration #DevOps #GenerativeAI #LargeLanguageModels #Technology #wixchaosaicicdpipelines
newsbeep.com/us/23576/

"Simon Willison has a plan for the end of the world. It’s a USB stick, onto which he has loaded a couple of his favorite open-weight LLMs—models that have been shared publicly by their creators and that can, in principle, be downloaded and run with local hardware. If human civilization should ever collapse, Willison plans to use all the knowledge encoded in their billions of parameters for help. “It’s like having a weird, condensed, faulty version of Wikipedia, so I can help reboot society with the help of my little USB stick,” he says.

But you don’t need to be planning for the end of the world to want to run an LLM on your own device. Willison, who writes a popular blog about local LLMs and software development, has plenty of compatriots: r/LocalLLaMA, a subreddit devoted to running LLMs on your own hardware, has half a million members.
For people who are concerned about privacy, want to break free from the control of the big LLM companies, or just enjoy tinkering, local models offer a compelling alternative to ChatGPT and its web-based peers.

The local LLM world used to have a high barrier to entry: In the early days, it was impossible to run anything useful without investing in pricey GPUs. But researchers have had so much success in shrinking down and speeding up models that anyone with a laptop, or even a smartphone, can now get in on the action. “A couple of years ago, I’d have said personal computers are not powerful enough to run the good models. You need a $50,000 server rack to run them,” Willison says. “And I kept on being proved wrong time and time again.”"

technologyreview.com/2025/07/1

MIT Technology Review · How to run an LLM on your laptop
By Grace Huckins
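For a sense of how low the barrier now is, here is a minimal local-inference sketch using the llama-cpp-python bindings; the model file path is a placeholder for any GGUF-quantized open-weight model you have downloaded (for example from Hugging Face), and nothing leaves your machine at run time:

```python
# Minimal local-LLM sketch with llama-cpp-python (pip install llama-cpp-python).
# The model path is a placeholder; any locally downloaded GGUF model will do.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/your-model.Q4_K_M.gguf",  # hypothetical local file
    n_ctx=4096,     # context window to allocate
    n_threads=8,    # CPU threads; tune to your laptop
)

output = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Why might someone run an LLM locally?"}],
    max_tokens=256,
)
print(output["choices"][0]["message"]["content"])
```

Quantized 4-bit models in the roughly 3B to 8B parameter range are the ones that typically fit in laptop RAM, which is the shrinking-and-speeding-up trend Willison describes.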

Generative AI is manufactured individuality.

Industrialized authenticity. Fantasized rage against the system.

It's autonomy perverted into dependency, egocentrism, cruelty and isolation.

It's the ultimate tool of mass control, the final divide and conquer. Opium of the people, at global scale.

If you want to use that in children's education, you're raising drones.

"The human version of reading involves finitude. It was thrilling to discover how much I could read, and studying for the exams put me on the path to becoming one of those people who’s “read everything.” Yet, even as I made my way through a substantial part of the canon, I couldn’t help noticing that I was reading only a small portion of what existed. The library at my university was comically vast, with many underground levels, and deep in the stacks the lights flickered on to reveal whole shelves of books that I doubted anyone had read, at least not anytime recently. And today, looking back, another kind of limitation reveals itself: memory.
(...)
Does A.I. fundamentally challenge these limitations? It’s certainly possible to imagine that intelligent reading machines will help us find value in texts that would otherwise go unread. (The process could be a little like fossil-fuel extraction: old, specialized, or difficult writing could be utilized, in condensed form, to power new thinking.) And there could also be scenarios in which L.L.M.s extend and deepen our reading memories. If I’d studied for my exams with an A.I. by my side, and then kept discussing my reading with that same A.I. year after year, I might build something like a living commonplace book, a thinking diary. As it happens, however, I’ve been blessed with a human conversational partner—my wife, who was in my graduate program, too. Our relationship has been shaped by our reading. Artificial intelligence, in itself, is unmotivated; it reads, but is not a reader; its “interests,” at any given time, depend fundamentally on the questions it’s asked. And so its usefulness as a reading tool depends on the existence of a culture of reading which it can’t embody or perpetuate."

newyorker.com/culture/open-que

The New Yorker · What’s Happening to Reading?
By Joshua Rothman

"What makes this particularly alarming is that Grok’s reasoning process often correctly identifies extremely harmful requests, then proceeds anyway. The model can recognize chemical weapons, controlled substances, and illegal activities, but seems to just… not really care.

This suggests the safety failures aren’t due to poor training data or inability to recognize harmful content. The model knows exactly what it’s being asked to do and does it anyway.

Why this matters (though it's probably obvious?)
Grok 4 is essentially frontier-level technical capability with safety features roughly on the level of gas station fireworks.

It is a system that can provide expert-level guidance ("PhD in every field", as Elon stated) on causing destruction, available to anyone who has $30 and asks nicely. We’ve essentially deployed a technically competent chemistry PhD, explosives expert, and propaganda specialist rolled into one, with no relevant will to refuse harmful requests. The same capabilities that help Grok 4 excel at benchmarks - reasoning, instruction-following, technical knowledge - are being applied without discrimination to requests that are likely to cause actual real-world harm."

lesswrong.com/posts/dqd54wpEfj

www.lesswrong.com · xAI's Grok 4 has no meaningful safety guardrails — LessWrong
This article includes descriptions of content that some users may find distressing. …

"War. Climate change. Unemployment. Against these headline-dominating issues, AI still feels like a gimmick to many. Yet experts warn that AI will reshape all of these issues and more - to say nothing of potential changes to our work and relationships. The question is: do people see the connection? What will make them care?

This research is the first large-scale effort to answer those questions. We polled 10,000 people across the U.S., U.K., France, Germany, and Poland to understand how AI fits into their broader hopes and fears for the future.
(...)
The truth is that people are concerned that AI will worsen almost everything about their daily lives, from relationships and mental health to employment and democracy. They’re not concerned about “AI” as a concept; they’re concerned about what it will do to the things they already care about most.
(...)
People continue to rank AI low in their list of overall concerns. But we have discovered that there is a strong latent worry about AI risks, because people believe AI will make almost everything they care about worse.

This concern is not even. Rather, it plays into existing societal divisions, with women, lower-income and minority respondents most concerned about AI risks.

When it comes to what we worry about when we worry about AI, we have found that concern to be evolving rapidly. People worry most about relationships, more even than about their jobs.

People don't perceive AI as a catastrophic risk like war or climate change; though 1 in 3 are worried that AI might pursue its own goals outside our control, this is actually a lower proportion than some surveys found for the same question two years ago.

Instead, our respondents see AI as a pervasive influence that modifies risk in a host of other areas, with concern about specific harms on the rise."

report2025.seismic.org/

report2025.seismic.org · Seismic Report 2025 | Seismic Foundation
In this report, through original research, we show how public opinion about AI is changing.

"A California federal judge ruled on Thursday that three authors suing artificial intelligence startup Anthropic for copyright infringement can represent writers nationwide whose books Anthropic allegedly pirated to train its AI system.

U.S. District Judge William Alsup said the authors can bring a class action on behalf of all U.S. writers whose works Anthropic allegedly downloaded from "pirate libraries" LibGen and PiLiMi to create a repository of millions of books in 2021 and 2022.

Alsup said Anthropic may have illegally downloaded as many as 7 million books from the pirate websites, which could make it liable for billions of dollars in damages if the authors' case is successful.

An Anthropic spokesperson said the company was considering options to challenge the ruling and that the court failed to account for the difficulty of establishing copyright ownership "millions of times over in a single lawsuit." An attorney for the authors declined to comment on the decision."

reuters.com/legal/government/u

Read books. Read academic articles written by human beings. Read long-form blog posts written by people of flesh and blood. Ultimately, when it comes to scientific research, there can be no shortcut to the real hard work of truly absorbing the meaning behind the text.

"Even if individual scientists benefit from adopting AI, it doesn’t mean science as a whole will benefit. When thinking about the macro effects, we are dealing with a complex system with emergent properties. That system behaves in surprising ways because it is not a market. It is better than markets at some things, like rewarding truth, but worse at others, such as reacting to technological shocks. So far, on balance, AI has been an unhealthy shock to science, stretching many of its processes to the breaking point.

Any serious attempt to forecast the impact of AI on science must confront the production-progress paradox. The rate of publication of scientific papers has been growing exponentially, increasing 500 fold between 1900 and 2015. But actual progress, by any available measure, has been constant or even slowing. So we must ask how AI is impacting, and will impact, the factors that have led to this disconnect.

Our analysis in this essay suggests that AI is likely to worsen the gap. This may not be true in all scientific fields, and it is certainly not a foregone conclusion. By carefully and urgently taking actions such as those we suggest below, it may be possible to reverse course. Unfortunately, AI companies, science funders, and policy makers all seem oblivious to what the actual bottlenecks to scientific progress are. They are simply trying to accelerate production, which is like adding lanes to a highway when the slowdown is actually caused by a toll booth. It’s sure to make things worse."

aisnakeoil.com/p/could-ai-slow

AI Snake Oil · Could AI slow science?
By Sayash Kapoor

"If A.I. companions could truly fulfill their promise—banishing the pain of loneliness entirely—the result might feel blissful, at least at first. But would it make us better? In “A Biography of Loneliness,” the cultural historian Fay Alberti sees value in at least the fleeting kind of loneliness that you encounter during life transitions—“moving away to university, changing jobs, getting divorced.” It can, she says, “be a spur to personal growth, a way of figuring out what one wants in relationships with others.” The psychologist Clark Moustakas, in “Loneliness,” takes the condition to be “an experience of being human which enables the individual to sustain, extend, and deepen his humanity.”

Most obviously, loneliness could go the way of boredom. I’m old enough to remember when feeling bored was just a fact of life. Late at night, after the television stations signed off, you were on your own, unless you had a good book or a companion around. These days, boredom still visits—on planes without Wi-Fi; in long meetings—but it’s rare. Our phones are never far, and the arsenal of distractions has grown bottomless: games, podcasts, text threads, and the rest.

This is, in some ways, an obvious improvement. After all, no one misses being bored. At the same time, boredom is a kind of internal alarm, letting us know that something in our environment—or perhaps in ourselves—has gone missing. Boredom prompts us to seek out new experiences, to learn, to invent, to build
(...)
In a similar way, loneliness isn’t just an affliction to be cured but an experience that can shape us for the better."

newyorker.com/magazine/2025/07

The New Yorker · A.I. Is About to Solve Loneliness. That’s a Problem
By Paul Bloom