#arxiv

Important post on building a (dark, offline) mirror of arXiv, by TIB, the Technische Informationsbibliothek (Leibniz Information Centre for Science and Technology) in Hannover, Germany. The blog also explains the license / copyright challenges, and has some 🔥:

> “So when the Trump administration makes decisions that have fatal consequences for science and research in the US, the repercussions reach far beyond the Gulf of Mexico”

blog.tib.eu/2025/05/14/protect

TIB-Blog · Protecting Science: TIB builds Dark Archive for arXiv - TIB-Blog

Today on the #arXiv:

Farnham et al. 2025, "High-Speed Boulders and the Debris Field in DART Ejecta" - arxiv.org/abs/2506.16694

#DARTMission science with predictions for the #HeraMission.

And demonstrating how things can go sideways in a collision.

arXiv.org · High-Speed Boulders and the Debris Field in DART Ejecta
On 26 September 2022 the Double Asteroid Redirection Test (DART) spacecraft collided with Dimorphos, the moon of the near-Earth asteroid 65803 Didymos, in a full-scale demonstration of a kinetic impactor concept. The companion LICIACube spacecraft documented the aftermath, capturing images of the expansion and evolution of the ejecta from 29 to 243 s after the impact. We present results from our analyses of these observations, including an improved reduction of the data and new absolute calibration, an updated LICIACube trajectory, and a detailed description of the events and phenomena that were recorded throughout the flyby. One notable aspect of the ejecta was the existence of clusters of boulders, up to 3.6 m in radius, that were ejected at speeds up to 52 m/s. Our analysis of the spatial distribution of 104 of these boulders suggests that they are likely the remnants of larger boulders shattered by the DART spacecraft in the first stages of the impact. The amount of momentum contained in these boulders is more than 3 times that of the DART spacecraft, and it is directed primarily to the south, almost perpendicular to the DART trajectory. Recoil of Dimorphos from the ejection of these boulders has the potential to change its orbital plane by up to a degree and to impart a non-principal axis component to its rotation state. Damping timescales for these phenomena are such that the Hera spacecraft, arriving at the system in 2026, should be able to measure these effects.

My first paper in the new group was published on #arXiv today!
We demonstrate high-fidelity operations on a #QuantumComputing platform that could potentially solve very complex problems involving electrons, by using atoms with the same properties (fermions).
The platform is built with single atoms in a lattice made of light. The operations are realised by letting two neighboring atoms interact. With our microscope, we can see the result of every single operation.

arxiv.org/abs/2506.14711

JavelinGuard: Low-Cost Transformer Architectures for LLM Security

arxiv.org/abs/2506.07330

arXiv.org · JavelinGuard: Low-Cost Transformer Architectures for LLM Security
We present JavelinGuard, a suite of low-cost, high-performance model architectures designed for detecting malicious intent in Large Language Model (LLM) interactions, optimized specifically for production deployment. Recent advances in transformer architectures, including compact BERT (Devlin et al. 2019) variants (e.g., ModernBERT (Warner et al. 2024)), allow us to build highly accurate classifiers with as few as approximately 400M parameters that achieve rapid inference speeds even on standard CPU hardware. We systematically explore five progressively sophisticated transformer-based architectures: Sharanga (baseline transformer classifier), Mahendra (enhanced attention-weighted pooling with deeper heads), Vaishnava and Ashwina (hybrid neural ensemble architectures), and Raudra (an advanced multi-task framework with specialized loss functions). Our models are rigorously benchmarked across nine diverse adversarial datasets, including popular sets like the NotInject series, BIPIA, Garak, ImprovedLLM, ToxicChat, WildGuard, and our newly introduced JavelinBench, specifically crafted to test generalization on challenging borderline and hard-negative cases. Additionally, we compare our architectures against leading open-source guardrail models as well as large decoder-only LLMs such as gpt-4o, demonstrating superior cost-performance trade-offs in terms of accuracy, and latency. Our findings reveal that while Raudra's multi-task design offers the most robust performance overall, each architecture presents unique trade-offs in speed, interpretability, and resource requirements, guiding practitioners in selecting the optimal balance of complexity and efficiency for real-world LLM security applications.

Where can you find open-access scientific articles? 🤔

#googleacadémico (Google Scholar; set the open-access option)
#doaj
#scielo
#pubmed Central (for biomedicine and health sciences).
#arxiv (exact sciences and computing).

💡Tip: Many researchers upload their articles to ResearchGate or Academia.edu, although these are not fully open platforms.

From tokens to thoughts: How LLMs and humans trade compression for meaning

arxiv.org/abs/2505.17117

arXiv.org · From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning
Humans organize knowledge into compact categories through semantic compression by mapping diverse instances to abstract representations while preserving meaning (e.g., robin and blue jay are both birds; most birds can fly). These concepts reflect a trade-off between expressive fidelity and representational simplicity. Large Language Models (LLMs) demonstrate remarkable linguistic abilities, yet whether their internal representations strike a human-like trade-off between compression and semantic fidelity is unclear. We introduce a novel information-theoretic framework, drawing from Rate-Distortion Theory and the Information Bottleneck principle, to quantitatively compare these strategies. Analyzing token embeddings from a diverse suite of LLMs against seminal human categorization benchmarks, we uncover key divergences. While LLMs form broad conceptual categories that align with human judgment, they struggle to capture the fine-grained semantic distinctions crucial for human understanding. More fundamentally, LLMs demonstrate a strong bias towards aggressive statistical compression, whereas human conceptual systems appear to prioritize adaptive nuance and contextual richness, even if this results in lower compressional efficiency by our measures. These findings illuminate critical differences between current AI and human cognitive architectures, guiding pathways toward LLMs with more human-aligned conceptual representations.

Beyond the Black Box: Interpretability of LLMs in Finance

arxiv.org/abs/2505.24650

arXiv.org · Beyond the Black Box: Interpretability of LLMs in Finance
Large Language Models (LLMs) exhibit remarkable capabilities across a spectrum of tasks in financial services, including report generation, chatbots, sentiment analysis, regulatory compliance, investment advisory, financial knowledge retrieval, and summarization. However, their intrinsic complexity and lack of transparency pose significant challenges, especially in the highly regulated financial sector, where interpretability, fairness, and accountability are critical. As far as we are aware, this paper presents the first application in the finance domain of understanding and utilizing the inner workings of LLMs through mechanistic interpretability, addressing the pressing need for transparency and control in AI systems. Mechanistic interpretability is the most intuitive and transparent way to understand LLM behavior by reverse-engineering their internal workings. By dissecting the activations and circuits within these models, it provides insights into how specific features or components influence predictions - making it possible not only to observe but also to modify model behavior. In this paper, we explore the theoretical aspects of mechanistic interpretability and demonstrate its practical relevance through a range of financial use cases and experiments, including applications in trading strategies, sentiment analysis, bias, and hallucination detection. While not yet widely adopted, mechanistic interpretability is expected to become increasingly vital as adoption of LLMs increases. Advanced interpretability tools can ensure AI systems remain ethical, transparent, and aligned with evolving financial regulations. In this paper, we have put special emphasis on how these techniques can help unlock interpretability requirements for regulatory and compliance purposes - addressing both current needs and anticipating future expectations from financial regulators globally.

LLMs replacing human participants harmfully misportray, flatten identity groups

arxiv.org/abs/2402.01908

arXiv.org · Large language models that replace human participants can harmfully misportray and flatten identity groups
Large language models (LLMs) are increasing in capability and popularity, propelling their application in new domains -- including as replacements for human participants in computational social science, user testing, annotation tasks, and more. In many settings, researchers seek to distribute their surveys to a sample of participants that are representative of the underlying human population of interest. This means in order to be a suitable replacement, LLMs will need to be able to capture the influence of positionality (i.e., relevance of social identities like gender and race). However, we show that there are two inherent limitations in the way current LLMs are trained that prevent this. We argue analytically for why LLMs are likely to both misportray and flatten the representations of demographic groups, then empirically show this on 4 LLMs through a series of human studies with 3200 participants across 16 demographic identities. We also discuss a third limitation about how identity prompts can essentialize identities. Throughout, we connect each limitation to a pernicious history of epistemic injustice against the value of lived experiences that explains why replacement is harmful for marginalized demographic groups. Overall, we urge caution in use cases where LLMs are intended to replace human participants whose identities are relevant to the task at hand. At the same time, in cases where the benefits of LLM replacement are determined to outweigh the harms (e.g., the goal is to supplement rather than fully replace, engaging human participants may cause them harm), we provide inference-time techniques that we empirically demonstrate do reduce, but do not remove, these harms.

Superhuman performance of an LLM on the reasoning tasks of a physician

arxiv.org/abs/2412.10849

arXiv.org · Superhuman performance of a large language model on the reasoning tasks of a physician
A seminal paper published by Ledley and Lusted in 1959 introduced complex clinical diagnostic reasoning cases as the gold standard for the evaluation of expert medical computing systems, a standard that has held ever since. Here, we report the results of a physician evaluation of a large language model (LLM) on challenging clinical cases against a baseline of hundreds of physicians. We conduct five experiments to measure clinical reasoning across differential diagnosis generation, display of diagnostic reasoning, triage differential diagnosis, probabilistic reasoning, and management reasoning, all adjudicated by physician experts with validated psychometrics. We then report a real-world study comparing human expert and AI second opinions in randomly-selected patients in the emergency room of a major tertiary academic medical center in Boston, MA. We compared LLMs and board-certified physicians at three predefined diagnostic touchpoints: triage in the emergency room, initial evaluation by a physician, and admission to the hospital or intensive care unit. In all experiments--both vignettes and emergency room second opinions--the LLM displayed superhuman diagnostic and reasoning abilities, as well as continued improvement from prior generations of AI clinical decision support. Our study suggests that LLMs have achieved superhuman performance on general medical diagnostic and management reasoning, fulfilling the vision put forth by Ledley and Lusted, and motivating the urgent need for prospective trials.