NVIDIA Technical Blog · hardware · 2026-05-26
Score 21
Developers can now use NVIDIA CUDA Tile programming within large existing C++ GPU codebases to develop highly optimized GPU kernels using tile-based...
High signal Matched: cuda, performance, gpu
NVIDIA Technical Blog · hardware · 2026-05-26
Score 21
NVIDIA CUDA 13.3 brings new capabilities and performance optimizations to developers across the CUDA ecosystem. The launch of NVIDIA CUDA Tile programming in...
High signal Matched: cuda, performance, gpu, launch
PyTorch Foundation · open-source · 2026-05-19
Score 8
TLDR: PyTorch 2.11 makes it possible to install CUDA-enabled PyTorch wheels on aarch64 Linux directly from PyPI, eliminating the need for custom package indexes and workarounds that previously complicated deployment...
High signal Matched: cuda
PyTorch Foundation · open-source · 2026-05-14
Score 12
We are excited to announce the release of PyTorch® 2.12 (release notes)! The PyTorch 2.12 release features the following changes: Batched linalg.eigh on CUDA is up to 100x faster due...
High signal Matched: cuda, release
NVIDIA Technical Blog · hardware · 2026-04-30
Score 20
NVIDIA CUDA Tile (cuTile) is a tile-based programming model that enables developers to write GPU kernels in terms of tile-level operations—loads, stores, and...
High signal Matched: kernel, cuda, gpu, model, agents
Nota AI · korea · 2026-04-22
Score 54
Jaehoon Lee, Technical Content Manager, Nota AI. Series Notice: NetsPresso® Technical Blog, Part 2. In Part 1, we walked through a scenario of deploying Llama 3.2 1B on an edge device to illustrate the NetsPresso® workflow. The f...
High signal Matched: inference, kernel, cuda, matmul, benchmark, performance, latency, cost, npu, model, weights, paper, research, evaluation, furiosa, training, quantization, int8, int4, awq, gptq, sdk, open-source
NVIDIA Technical Blog · hardware · 2026-04-14
Score 18
When you’re writing CUDA applications, one of the most important things you need to focus on to write great code is data transfer performance. This applies to...
High signal Matched: cuda, performance, gpu
NVIDIA Technical Blog · hardware · 2026-04-01
Score 12
Note: CUDA Tile Programming in BASIC is an April Fools’ joke, but it's also real and actually works, demonstrating the flexibility of CUDA. CUDA 13.1...
High signal Matched: cuda
Nota AI · korea · 2026-03-23
Score 42
Jaehoon Lee, Technical Content Manager, Nota AI. GTC has evolved far beyond a technology conference, drawing attention from global economies and financial markets alike. This year, CEO Jensen Huang took the stage in his tradema...
High signal Matched: inference, prefill, generation, throughput, cuda, kv cache, performance, latency, cost, gpu, npu, launch, model, research, cloud, training, long-context, context window, agent, agents, agentic, open-source
Hugging Face · open-source · 2026-01-28
Score 10
No feed summary available yet.
High signal Matched: cuda
Modular · inference-infra · 2026-01-14
Score 18
How to Beat Unsloth's CUDA Kernel Using Mojo—With Zero GPU Experience
High signal Matched: kernel, cuda, gpu
Nota AI · korea · 2025-12-19
Score 74
Seungmin Yang, EdgeFM Lead, Nota AI. Summary: With the introduction of NVFP4, a new 4-bit floating point data type in NVIDIA’s Blackwell GPU architecture, LLM inference achieves markedly improved efficiency. Blackwell’s NVFP4...
High signal Matched: inference, serving, decoding, prefill, generation, token generation, throughput, kernel, gemm, cutlass, distributed, benchmark, performance, latency, ttft, tpot, tokens/sec, cost, gpu, blackwell, launch, model, weights, fp8, research, training, post-training, quantization, quantized, awq, benchmarks, mmlu, retrieval
vLLM Project · open-source · 2025-12-03
Score 16
Several months ago, we published a blog post about CUDA Core Dump: An Effective Tool to Debug Memory Access Issues and Beyond, introducing a powerful technique for debugging illegal memory access...
High signal Matched: cuda, gpu, introducing
Hugging Face · open-source · 2025-08-18
Score 14
No feed summary available yet.
High signal Matched: cuda, gpu
Modular · inference-infra · 2025-03-25
Score 14
MAX 25.2: Unleash the power of your H200's–without CUDA!
High signal Matched: cuda, h200
Modular · inference-infra · 2025-03-05
Score 10
What about OpenCL and CUDA C++ alternatives? (Democratizing AI Compute, Part 5)
High signal Matched: cuda
Modular · inference-infra · 2025-02-20
Score 10
CUDA is the incumbent, but is it any good? (Democratizing AI Compute, Part 4)
High signal Matched: cuda
Modular · inference-infra · 2025-02-12
Score 10
How did CUDA succeed? (Democratizing AI Compute, Part 3)
High signal Matched: cuda
Modular · inference-infra · 2025-02-05
Score 10
What exactly is “CUDA”? (Democratizing AI Compute, Part 2)
High signal Matched: cuda