MLSys Radar

llm-d

Kubernetes-native distributed LLM inference project built around vLLM, intelligent scheduling, KV-cache-aware routing, disaggregated serving, and accelerator portability.

Country: Unknown
Category: open-source
Blog: https://llm-d.ai/
Feed: https://llm-d.ai/blog/rss.xml
Feed discovery status: known

llm-d · open-source · 2026-03-13

Predicted-Latency Based Scheduling for LLMs

Score 18

A lightweight ML model trained online from live traffic replaces manually tuned heuristic weights with direct latency predictions, achieving 43% improvement in P50 end-to-end latency and 70% improvement in TTFT on a production-realistic wo...

benchmark model-release

High signal · Matched: latency, ttft, model, weights

llm-d · open-source · 2025-06-25

llm-d Community Update - June 2025

Score 10

Help shape llm-d's future: Take our 5-minute community survey, subscribe to our YouTube channel, and access exclusive resources for LLM serving innovation.

inference serving

High signal · Matched: serving

llm-d · open-source · 2025-06-03

llm-d Week 1 Project News Round-Up

Score 12

llm-d hits 1000 GitHub stars! Week 1-2 round-up covers KVTransfer Protocol, InferenceModel API updates, and community resources for LLM inference developers.

inference api

High signal · Matched: inference, api

llm-d · open-source · 2025-05-20

llm-d Press Release

Score 20

Red Hat launches llm-d: Open source distributed AI inference platform backed by NVIDIA, Google Cloud, IBM. Scale generative AI with intelligent routing on Kubernetes.

inference distributed model-release cloud open-source

High signal · Matched: inference, distributed, release, cloud, open source