Open-source KV cache community blog focused on LLM serving, KV-cache tiering, long-context inference, and cache-aware performance optimization.
LMCache · open-source · 2026-06-03
Score 17
TL;DR: A key contributor to the LMCache community just secured a major investment. This will greatly accelerate our mission of building the best KV cache library for every developer. Come join us in building the future AI-native data layer...
High signal Matched: kv cache, lmcache
LMCache · open-source · 2026-05-27
Score 11
A collaboration story about LMCache multiprocess mode + MooncakeStore: from 0 to 1, from functional to optimized. 1. Before We Begin: Recently, the LMCache community and the Mooncake community carried out a series of valuable open-source c...
High signal Matched: lmcache, adapter, open-source, open source
LMCache · open-source · 2026-05-21
Score 10
A new system stack is quietly taking shape around LLM serving. What makes it interesting is not just how quickly it is evolving, but how familiar the shape of that evolution looks if you’ve spent time studying large-scale systems like the...
High signal Matched: serving, lmcache, api
LMCache · open-source · 2026-05-13
Score 20
A practitioner’s guide to KV-cache tiering on ROCm: what works, what doesn’t, and the regime where it actually matters. Key Summary: We benchmarked multi-turn agentic workloads using 739 anonymized Claude Code conversation trac...
High signal Matched: lmcache, moe, mi300x, rocm, fp8, agentic
LMCache · open-source · 2026-05-05
Score 12
DeepSeek V4 is an open-weight model that gives you state-of-the-art intelligence while potentially giving you a much cheaper token price than its predecessor, DeepSeek V3.2. But how does DeepSeek V4 do that? Pre-requisite: attention...
High signal Matched: kv cache, lmcache, model
LMCache · open-source · 2026-04-29
Score 14
For years, we have referred to one of the most critical components of modern LLM inference as a “KV cache.” That name made sense once. Today, it is increasingly misleading. What began as a small, ephemeral optimization inside a...
High signal Matched: inference, kv cache, lmcache
LMCache · open-source · 2026-04-23
Score 30
Overview: Large language model (LLM) inference performance depends heavily on how efficiently the system manages the key-value (KV) cache, the stored attention states that allow the model to avoid recomputing previous tokens. As context length...
High signal Matched: inference, kv cache, lmcache, performance, latency, gpu, model, sagemaker
LMCache · open-source · 2026-04-18
Score 12
GTC wrapped up a month ago. Our open-source KV cache management library, LMCache, was shown in Jensen Huang’s keynote and spotlighted by NVIDIA SVP Kevin Deierling, and I was invited to speak at the first-ever industry KV cache tutorial...
High signal Matched: kv cache, lmcache, open-source
LMCache · open-source · 2026-04-16
Score 16
TL;DR: TurboQuant lets you fit 4x more context on your GPU without blowing up GPU memory or degrading the model’s intelligence. It does so by quantizing the memory of large language models, also known as the KV cache, an important bottleneck ment...
High signal Matched: inference, kv cache, lmcache, gpu
LMCache · open-source · 2026-04-04
Score 34
Modern LLM serving workloads are defined by strict latency requirements, high concurrency, and rapidly growing context lengths. Applications such as multi-turn chat, AI agents, and retrieval-augmented generation continuously build on prior...
High signal Matched: inference, serving, decoding, generation, throughput, lmcache, moe, performance, latency, ttft, retrieval-augmented generation, retrieval, agents