distributed - MLSys Blogs

TL;DR: This case study shows how LinkedIn re-architected its distributed linear programming solver, DuaLip, as a GPU-accelerated PyTorch implementation to handle extreme-scale optimization problems such as those in web applications. This...

distributed hardware

High signal Matched: distributed, gpu

NVIDIA Technical Blog · hardware · 2026-06-01

Run Local AI Agents with Faster Models and Multi-Node Clustering on NVIDIA DGX Spark

Score 13

The rise of autonomous, long-running AI agents has introduced a new class of compute demand, namely tasks that maintain large context windows, spawn concurrent...

distributed agents

High signal Matched: multi-node, agents

Lambda · cloud · 2026-06-01

Unbox one of NVIDIA's first co-packaged optics switches with us. See why we bet on CPO early.

Score 15

When we design large GPU clusters, the network is no longer a background system. It's part of the compute envelope. At the 800G and NVIDIA GB300 NVL72 scale, the back-end fabric accounts for 86% of networking power in a three-layer cluster...

inference serving distributed benchmark hardware model-release rag agents

High signal Matched: generation, token generation, throughput, infiniband, gpu, model, retrieval, agentic

AMD ROCm Blogs · hardware · 2026-05-25

AI Inference on AMD Ryzen™ AI Max Processor

Score 20

Local large language model (LLM) inference has rapidly evolved, but a persistent limitation remains: model size is constrained by available GPU memory. Discrete GPUs typically offer 8–24 GB of dedicated VRAM, which can limit the size of mo...

inference distributed hardware model-release cloud quantization evals

High signal Matched: inference, multi-gpu, gpu, model, checkpoint, cloud, quantization, evaluate

NVIDIA Technical Blog · hardware · 2026-05-07

Real-Time Performance Monitoring and Faster Debugging with NCCL Inspector and Prometheus

Score 20

Distributed deep learning depends on fast, reliable GPU-to-GPU communication using the NVIDIA Collective Communication Library (NCCL). When training slows down,...

distributed benchmark hardware training

High signal Matched: distributed, nccl, performance, gpu, training

vLLM Project · open-source · 2026-05-06

Serving Agentic Workloads at Scale with vLLM x Mooncake

Score 18

TL;DR: Agentic workloads generate massive shared prefixes that are often recomputed across turns. By integrating Mooncake's distributed KV cache store into vLLM, we achieve 3.8x higher throughput,...

inference serving distributed kv-cache benchmark agents

High signal Matched: serving, throughput, distributed, kv cache, agentic

SkyPilot · open-source · 2026-05-01

Cache Me If You Can: Tuning Object Stores for AI

Score 8

We ran hundreds of benchmarks to tune storage systems for distributed training so you don’t have to.

distributed training evals

High signal Matched: distributed, training, distributed training, benchmarks

Nota AI · korea · 2026-04-08

[Overview: NetsPresso®] A Platform That Handles Everything from Model Optimization to Target Deployment

Score 36

Jaehoon Lee, Technical Content Manager, Nota AI. AI Model Optimization: Why Models Won't Run on Hardware. The Chip Is Ready, but the Model Won't Deploy. If you have ever tried deploying an AI model onto your own chip, the following...

inference distributed kv-cache speculative-decoding benchmark hardware model-release research quantization evals

High signal Matched: inference, multi-gpu, kv cache, verification, performance, latency, gpu, model, research, evaluation, quantization, quantized, awq, gptq, evaluate

NVIDIA Technical Blog · hardware · 2026-03-16

How NVIDIA Dynamo 1.0 Powers Multi-Node Inference at Production Scale

Score 16

Reasoning models are growing rapidly in size and are increasingly being integrated into agentic AI workflows that interact with other models and external tools....

inference distributed agents

High signal Matched: inference, multi-node, agentic

SkyPilot · open-source · 2026-01-13

Scaling SAM3 Video Segmentation on Multiple Kubernetes clusters and Clouds with SkyPilot

Score 8

Run Meta's SAM3 on large video archives distributed across AWS and Kubernetes clusters with SkyPilot Pools.

distributed

High signal Matched: distributed

Together AI · inference-infra · 2026-01-12

Inside multi-node training: How to scale model training across GPU clusters

Score 22

Learn how foundation models are trained at scale using multi-node GPU clusters, including distributed training techniques, infrastructure requirements, and practical steps to scale training efficiently.

distributed hardware model-release training

High signal Matched: distributed, multi-node, gpu, model, training, distributed training

Nota AI · korea · 2025-12-19

NVIDIA Blackwell; The Impact of NVFP4 For LLM Inference

Score 74

Seungmin Yang, EdgeFM Lead, Nota AI. Summary: With the introduction of NVFP4, a new 4-bit floating point data type in NVIDIA’s Blackwell GPU architecture, LLM inference achieves markedly improved efficiency. Blackwell’s NVFP4...

inference serving kernel cuda distributed benchmark hardware model-release research training quantization evals rag

High signal Matched: inference, serving, decoding, prefill, generation, token generation, throughput, kernel, gemm, cutlass, distributed, benchmark, performance, latency, ttft, tpot, tokens/sec, cost, gpu, blackwell, launch, model, weights, fp8, research, training, post-training, quantization, quantized, awq, benchmarks, mmlu, retrieval

vLLM Project · open-source · 2025-11-22

Streamlined multi-node serving with Ray symmetric-run

Score 18

Ray now has a new command: ray symmetric-run. This command makes it possible to launch the same entrypoint command on every node in a Ray cluster, simplifying the workflow to spawn vLLM servers...

inference serving distributed model-release

High signal Matched: serving, multi-node, launch

Modal · inference-infra · 2025-11-04

One-second voice-to-voice latency with Modal, Pipecat, and open models

Score 12

How we built a real-time voice bot on Modal's distributed serverless platform.

distributed benchmark

High signal Matched: distributed, latency

llm-d · open-source · 2025-09-24

KV-Cache Wins You Can See: From Prefix Caching in vLLM to Distributed Scheduling with llm-d

Score 18

See how llm-d's precise KV-cache aware scheduling delivers 57x faster responses and 2x throughput in production distributed LLM inference benchmarks.

inference serving distributed benchmark evals

High signal Matched: inference, throughput, distributed, benchmarks

SkyPilot · open-source · 2025-09-11

From 1 hour to 10 minutes: How I sped up my distributed LLM training without changing the code or GPUs

Score 10

distributed training

High signal Matched: distributed, training

Hugging Face · open-source · 2025-08-08

Accelerate ND-Parallel: A guide to Efficient Multi-GPU Training

Score 14

No feed summary available yet.

distributed hardware training

High signal Matched: multi-gpu, gpu, training

SkyPilot · open-source · 2025-07-16

The Evolution of AI Job Orchestration. Part 2: The AI-Native Control Plane & Orchestration that Finally Works for ML

Score 16

This is Part 2 of our series on the evolution of AI Job Orchestration. In Part 1, we explored how Neoclouds are democratizing GPU access but leaving the “last mile” unsolved. Now we’ll discover how AI-native orchestration...

distributed benchmark hardware cloud

High signal Matched: infiniband, performance, cost, gpu, cloud

Modal · inference-infra · 2025-07-11

Product updates: Multi-node training clusters, B200 and H200s, and Client 1.0 release

Score 18

Welcome to another round of Modal Product Updates! Here's what's new this month.

distributed hardware model-release training

High signal Matched: multi-node, b200, release, training

llm-d · open-source · 2025-05-20

Announcing the llm-d community!

Score 20

Introducing llm-d: Kubernetes-native distributed LLM inference with KV-cache routing, disaggregated serving, and SOTA performance per dollar. Built on vLLM.

inference serving distributed benchmark model-release frontier-model

High signal Matched: inference, serving, distributed, performance, introducing, sota

llm-d · open-source · 2025-05-20

llm-d Press Release

Score 20

Red Hat launches llm-d: Open source distributed AI inference platform backed by NVIDIA, Google Cloud, IBM. Scale generative AI with intelligent routing on Kubernetes.

inference distributed model-release cloud open-source

High signal Matched: inference, distributed, release, cloud, open source

SkyPilot · open-source · 2025-03-20

Large-Scale AI Batch Inference: 9x Faster Embedding Generation

Score 16

How to accelerate distributed embedding generation? Use the "forgotten" regions.

inference distributed

High signal Matched: inference, generation, distributed

AIBrix · open-source · 2025-03-10

DeepSeek-R1 671B multi-host Deployment in AIBrix

Score 20

This blog post walks through deploying DeepSeek-R1 using AIBrix. DeepSeek-R1 demonstrates remarkable proficiency in reasoning tasks through a step-by-step training process. It features 671B total parameters with 37B active parameters, and 128k...

inference distributed benchmark model-release training long-context

High signal Matched: inference, distributed, benchmark, model, weights, training, context length

AIBrix · open-source · 2025-02-19

AIBrix v0.2.0 Release: Distributed KV Cache, Orchestration and Heterogeneous GPU Support

Score 42

We’re excited to announce the v0.2.0 release of AIBrix! Building on feedback from v0.1.0 production adoption and user interest, this release introduces several new features to enhance performance and usability. Extend the vLLM Prefix...

inference serving distributed kv-cache benchmark hardware model-release agents

High signal Matched: inference, serving, prefill, throughput, distributed, multi-node, kv cache, prefix cache, performance, cost, gpu, accelerator, release, agent

Hugging Face · open-source · 2022-10-21