AMD ROCm technical blog covering AI, HPC, GPU software, vLLM, kernels, and performance optimization on AMD accelerators.
AMD ROCm Blogs · hardware · 2026-06-01
Score 13
Reinforcement learning (RL) is rapidly becoming a foundational technology for Large Language Models (LLMs)—powering key abilities such as reasoning and agentic behaviors. As RL workloads grow more complex and computationally intensive, the...
High signal Matched: performance, gpu, agentic
AMD ROCm Blogs · hardware · 2026-06-01
Score 11
This blog, like the previous articles in the profiling guide series (Part 1, Part 2, and Part 3), is designed to help you systematically analyze and improve the performance of your Fortran OpenMP offload applications running on AMD GPUs. T...
High signal Matched: performance
AMD ROCm Blogs · hardware · 2026-05-29
Score 29
Speculative speculative decoding (SSD) [1] is a recently proposed speculative decoding (SD) algorithm that further accelerates large language model (LLM) inference beyond conventional SD. In standard SD, a small draft model proposes severa...
High signal Matched: inference, decoding, speculative decoding, draft model, verification, cost, mi300x, model
AMD ROCm Blogs · hardware · 2026-05-29
Score 13
Quantum computing offers a fundamentally different approach to computational problems by leveraging quantum mechanical properties such as superposition and entanglement. Unlike a classical bit, which is always 0 or 1, a qubit can exist in...
High signal Matched: benchmark, cost, gpu
AMD ROCm Blogs · hardware · 2026-05-27
Score 17
Our previous two posts in this GEMM optimization series covered Matrix Core instructions and 8-wave ping-pong FP8 GEMM design. Here we discuss another algorithm design introduced by HipKittens - 4-wave interleave, which further improves th...
High signal Matched: gemm, performance, fp8
AMD ROCm Blogs · hardware · 2026-05-25
Score 20
Local large language model (LLM) inference has rapidly evolved, but a persistent limitation remains: model size is constrained by available GPU memory. Discrete GPUs typically offer 8–24 GB of dedicated VRAM, which can limit the size of mo...
High signal Matched: inference, multi-gpu, gpu, model, checkpoint, cloud, quantization, evaluate
AMD ROCm Blogs · hardware · 2026-05-22
Score 30
Triton Inference Server is an open-source platform designed to streamline AI inferencing. It supports the deployment, scaling, and inference of trained models from multiple frameworks, including ONNX Runtime, TensorFlow, PyTorch, and other...
High signal Matched: inference, inferencing, serving, triton, benchmark, model, cloud, open-source
AMD ROCm Blogs · hardware · 2026-05-22
Score 18
On a single MI355, our most-optimized FP16 GEMM kernel runs at 99% MFMA efficiency — the matrix engine sits idle for a handful of cycles per loop. Getting there took ten versions, a regression along the way, and a profiler open for the who...
High signal Matched: kernel, gemm, performance
AMD ROCm Blogs · hardware · 2026-05-20
Score 14
AMD released ROCm Core 7.13, the AMD GPU Driver 31.30, and AMD GPU Virtualization 9.0. With these releases, ROCm software expands hardware support across enterprise datacenters. The platform introduces AMD’s latest Instinct accelerators, e...
High signal Matched: performance, gpu, rocm, open-source
AMD ROCm Blogs · hardware · 2026-05-20
Score 12
Large Language Models (LLMs) typically contain billions — or even tens of billions — of parameters. During inference, tensor parallelism is commonly employed to distribute the workload across multiple GPUs. This approach demands frequent,...
High signal Matched: inference, latency, introducing, quantization