In a surprise move, NVIDIA is bringing CUDA to RISC-V CPUs
Announced at the RISC-V Summit China, this lets RISC-V processors run the CUDA driver and host-side application logic, with NVIDIA GPUs handling the compute tasks
Enables AI systems that pair an open CPU with a proprietary GPU, a big deal for edge, HPC, and China's chipmakers
A potential shift in global AI infrastructure
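For context, the split described above is the standard CUDA division of labor: the host CPU (after this port, a RISC-V core) runs the driver calls and launch logic, while the kernels execute on the GPU. A minimal, illustrative sketch of that split; nothing in the source itself is RISC-V-specific, it would simply be compiled for a RISC-V host:

```cuda
// saxpy.cu: illustrative only. The host-side calls below are what would
// run on the RISC-V CPU under the announced port; the kernel runs on the
// NVIDIA GPU in either case.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // GPU-side compute
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));        // CPU-side driver calls
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);  // CPU launches, GPU runs
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);                     // expect 4.0
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```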
By Anton Shilov for @TomsHardware: #NVIDIA #CUDA is coming to #RISCV. Is this a signal of broader ecosystem support?
#NVIDIA Bringing #CUDA To #RISCV
NVIDIA's drivers and CUDA software stack are predominantly supported on x86_64 and AArch64 systems, though in the past they were also supported on IBM POWER. This week at the RISC-V Summit China event, NVIDIA's Frans Sijstermans announced that CUDA will be coming to RISC-V.
#AMD, for their part, can already build the upstream #opensource #AMDKFD kernel compute driver on RISC-V, and the #ROCm user-space components can be built on RISC-V as well.
https://www.phoronix.com/news/NVIDIA-CUDA-Coming-To-RISC-V
Apple AI framework MLX: future support for Nvidia's CUDA
Although Nvidia GPUs no longer ship in Macs, Apple's MLX is set to gain a CUDA backend, so MLX code will soon run on Nvidia hardware as well. This makes interesting ports possible.
#GPUHammer is the first attack to show #Rowhammer bit flips on #GPU memories, specifically on the GDDR6 memory of an #NVIDIA A6000 GPU. Our attacks induce bit flips across all tested DRAM banks, despite in-DRAM defenses like TRR, using only user-level #CUDA #code. These bit flips allow a malicious GPU user to tamper with another user's data on the GPU in shared, time-sliced environments. In a proof-of-concept, we use these bit flips to tamper with a victim's DNN models, degrading model accuracy from 80% to 0.1% with a single bit flip. Enabling Error Correction Codes (ECC) can mitigate this risk, but ECC can introduce up to a 10% slowdown for #ML #inference workloads on an #A6000 GPU.
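The enabling observation in the paper is that ordinary user-level CUDA code can drive the rapid, repeated row activations Rowhammer requires. Below is a heavily simplified sketch of such an access loop; the buffer offsets and iteration count are placeholders, and a real attack additionally needs the GDDR6 bank/row mapping and TRR-evasion patterns the paper reverse-engineers:

```cuda
// hammer.cu: heavily simplified sketch of a GPUHammer-style access loop.
// Offsets and iteration count are placeholders; inducing real bit flips
// requires the GDDR6 row mapping and TRR-evading patterns from the paper,
// which are deliberately omitted here.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void hammer(volatile unsigned int *rowA,
                       volatile unsigned int *rowB,
                       unsigned long long iters) {
    unsigned int sink = 0;
    for (unsigned long long i = 0; i < iters; ++i) {
        sink += *rowA;    // repeatedly activate aggressor row A
        sink += *rowB;    // repeatedly activate aggressor row B
    }
    if (sink == 0xdeadbeefu)          // dead-code guard; volatile already forces the loads
        printf("%u\n", sink);
}

int main() {
    unsigned int *buf;
    const size_t size = 256u << 20;   // 256 MiB working buffer
    cudaMalloc(&buf, size);
    // Element offset chosen only for illustration; it does NOT map to an
    // adjacent DRAM row without the paper's bank/row reverse engineering.
    hammer<<<1, 1>>>(buf, buf + (1u << 20), 1ull << 24);
    cudaDeviceSynchronize();
    cudaFree(buf);
    return 0;
}
```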
Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems
Demystifying NCCL: An In-depth Analysis of GPU Communication Protocols and Algorithms
New #ZLUDA 5 Preview Released For #CUDA On Non-NVIDIA #GPUs
For now this ability to run unmodified CUDA apps on non-#NVIDIA GPUs is focused on #AMD GPUs of the #Radeon RX 5000 series and newer, i.e. Radeon GPUs supported by #ROCm. Besides CUDA code samples, GeekBench has been one of the early targets for testing.
https://www.phoronix.com/news/ZLUDA-5-preview.43
Accelerated discovery and design of Fe-Co-Zr magnets with tunable magnetic anisotropy through machine learning and parallel computing
#CUDA #Physics #MaterialsScience #CondensedMatter #MachineLearning #ML #Package
ParEval-Repo: A Benchmark Suite for Evaluating LLMs with Repository-level HPC Translation Tasks
#ZLUDA Making Progress In 2025 On Bringing #CUDA To Non-NVIDIA #GPUs
The ZLUDA #opensource effort started half a decade ago as a drop-in CUDA implementation for #Intel GPUs. For several years it was then funded by #AMD as a CUDA implementation for #Radeon GPUs atop #ROCm; the code was open-sourced but later had to be taken down, and since last year the project has been pushing along a new path. The current take on ZLUDA is a multi-vendor CUDA implementation for non-NVIDIA GPUs targeting #AI workloads and more.
https://www.phoronix.com/news/ZLUDA-Q2-2025-Update
Show HN: I built a tensor library from scratch in C++/CUDA
Link: https://github.com/nirw4nna/dsc
Discussion: https://news.ycombinator.com/item?id=44310678
HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration
Ask HN: How to learn CUDA to professional level
Discussion: https://news.ycombinator.com/item?id=44216123
Java developers are no longer limited by CPU cores!
This #InfoQ article explores how to bring GPU-level acceleration to enterprise Java using CUDA, with a practical JNI-based integration pattern, real-world use case, and performance benchmarks.
If you're tackling high-throughput challenges, see how to make Java truly parallel! A rough sketch of the JNI bridge pattern follows after the link below.
Read now: https://bit.ly/4kRGmD7
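As a rough sketch of what such a JNI bridge can look like (the class name `VecAdd` and the vector-add workload are illustrative assumptions, not taken from the article): the Java side declares a native method and loads the library, and the native side, compiled with nvcc, copies the arrays to the GPU, launches a kernel, and copies the result back:

```cuda
// vecadd_jni.cu: hypothetical sketch of the JNI bridge pattern; class and
// method names are illustrative, not taken from the article.
//
// Java side (default package), declared elsewhere:
//   public class VecAdd {
//       static { System.loadLibrary("vecaddjni"); }
//       public static native void add(float[] a, float[] b, float[] out);
//   }
#include <jni.h>
#include <cuda_runtime.h>

__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

extern "C" JNIEXPORT void JNICALL
Java_VecAdd_add(JNIEnv *env, jclass,
                jfloatArray ja, jfloatArray jb, jfloatArray jout) {
    jsize n = env->GetArrayLength(ja);
    size_t bytes = n * sizeof(float);

    // Pin/copy the Java arrays on the host side.
    jfloat *a = env->GetFloatArrayElements(ja, nullptr);
    jfloat *b = env->GetFloatArrayElements(jb, nullptr);
    jfloat *out = env->GetFloatArrayElements(jout, nullptr);

    // The CUDA part is unchanged, Java or not: allocate, copy, launch, copy back.
    float *da, *db, *dc;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b, bytes, cudaMemcpyHostToDevice);
    vecAdd<<<(n + 255) / 256, 256>>>(da, db, dc, n);
    cudaMemcpy(out, dc, bytes, cudaMemcpyDeviceToHost);
    cudaFree(da); cudaFree(db); cudaFree(dc);

    // Release: JNI_ABORT = discard (inputs), 0 = copy result back to Java.
    env->ReleaseFloatArrayElements(ja, a, JNI_ABORT);
    env->ReleaseFloatArrayElements(jb, b, JNI_ABORT);
    env->ReleaseFloatArrayElements(jout, out, 0);
}
```

A pattern like this only pays off when the kernel does enough work per call to amortize the JNI crossing and the two host/device copies.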
That's it, I'm going against AMD for recommending computers for #AI.
I don't even know how to start running something on their NPU via Linux, or how to check whether it's running at all. Windows fares better, but `llama.cpp` doesn't work there.
So, if you want to run AI on your computer: RTX, Mac, or don't bother at all.
Performance of Confidential Computing GPUs