LMCache

Open-source KV cache community blog focused on LLM serving, KV-cache tiering, long-context inference, and cache-aware performance optimization.

Country: Unknown
Category: open-source
Blog: https://blog.lmcache.ai/en/
Feed: https://blog.lmcache.ai/en/feed/
Feed discovery status: known

LMCache · open-source · 2026-06-03

A New Chapter for LMCache and the KV Cache Community

Score 17

TL;DR: A key contributor to the LMCache community just secured a major investment. This will greatly accelerate our mission of building the best KV cache library for every developer. Come join us in building the future AI-native data layer...

kv-cache

High signal · Matched: kv cache, lmcache

LMCache · open-source · 2026-05-27

When Open Source Meets Open Source: A Joint Effort Between LMCache and Mooncake

Score 11

A collaboration story about LMCache multiprocess mode + MooncakeStore: from 0 to 1, from functional to optimized. 1. Before We Begin. Recently, the LMCache community and the Mooncake community carried out a series of valuable open-source c...

kv-cache fine-tuning open-source

High signal · Matched: lmcache, adapter, open-source, open source

LMCache · open-source · 2026-05-21

OpenAI API Is the New IPv4

Score 10

A new system stack is quietly taking shape around LLM serving. What makes it interesting is not just how quickly it is evolving, but how familiar the shape of that evolution looks if you’ve spent time studying large-scale systems like the...

inference serving kv-cache api

High signal · Matched: serving, lmcache, api

LMCache · open-source · 2026-05-13

Benchmarking LMCache for Multi-Turn Agentic Workloads on AMD MI300X

Score 20

A practitioner’s guide to KV-cache tiering on ROCm: what works, what doesn’t, and the regime where it actually matters. Key Summary: We benchmarked multi-turn agentic workloads using 739 anonymized Claude Code conversation trac...

kv-cache moe hardware model-release quantization agents

High signal · Matched: lmcache, moe, mi300x, rocm, fp8, agentic

LMCache · open-source · 2026-05-05

DeepSeek V4 explained, and why it matters to your wallet

Score 12

DeepSeek V4 is an open-weight model that delivers state-of-the-art intelligence while potentially costing much less per token than its predecessor, DeepSeek V3.2. But how does DeepSeek V4 do that? Prerequisite: attention...

kv-cache model-release

High signal · Matched: kv cache, lmcache, model

LMCache · open-source · 2026-04-29

Stop Calling It KV Cache: It’s Something Much Bigger

Score 14

For years, we have referred to one of the most critical components of modern LLM inference as a “KV cache.” That name made sense once. Today, it is increasingly misleading. What began as a small, ephemeral optimization inside a...

inference kv-cache

High signal · Matched: inference, kv cache, lmcache

LMCache · open-source · 2026-04-23

LMCache on Amazon SageMaker HyperPod: Accelerating LLM Inference with Managed Tiered KV Cache

Score 30

Overview: Large language model (LLM) inference performance depends heavily on how efficiently the system manages the key-value (KV) cache, the stored attention states that allow the model to avoid recomputing previous tokens. As context length...

inference kv-cache benchmark hardware model-release cloud

High signal · Matched: inference, kv cache, lmcache, performance, latency, gpu, model, sagemaker
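
The overview notes that KV cache pressure grows with context length. The standard sizing formula makes that concrete. The sketch below assumes ordinary multi-head or grouped-query attention with 2-byte (BF16/FP16) storage; the model dimensions are an illustrative assumption, not figures from the post.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   num_tokens: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to store keys and values for `num_tokens` tokens.

    Assumes standard multi-head or grouped-query attention; models with
    compressed attention caches (e.g. latent attention) need a different formula.
    """
    # Factor of 2: every layer stores one key vector and one value vector per KV head.
    return 2 * num_layers * num_kv_heads * head_dim * num_tokens * bytes_per_elem


# Hypothetical 8B-class GQA configuration with BF16 storage (illustrative only).
per_token = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, num_tokens=1)
full_context = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128,
                              num_tokens=128 * 1024)
print(per_token)               # 131072 bytes (128 KiB) per token
print(full_context / 1024**3)  # 16.0 GiB for one 128K-token sequence
```

At that scale a handful of long sessions can exhaust GPU memory, which is what makes tiering KV cache into CPU memory or storage attractive.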

LMCache · open-source · 2026-04-18

LMCache: A Journey

Score 12

GTC wrapped up a month ago. Our open-source KV cache management library, LMCache, was shown in Jensen Huang’s keynote and spotlighted by NVIDIA SVP Kevin Deierling, and I was invited to speak at the first-ever industry KV cache tutorial...

kv-cache open-source

High signal · Matched: kv cache, lmcache, open-source

LMCache · open-source · 2026-04-16

What is TurboQuant and why it matters for LLM inference, in layman’s terms

Score 16

TL;DR: TurboQuant lets you fit 4x more context on your GPU without blowing up GPU memory or degrading the model’s intelligence. It does so by quantizing the memory of large language models, also known as the KV cache, an important bottleneck ment...

inference kv-cache hardware

High signal · Matched: inference, kv cache, lmcache, gpu
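
The 4x figure lines up with simple bit arithmetic if one assumes a 16-bit baseline and roughly 4-bit storage. The sketch below shows that arithmetic plus a toy absmax quantizer. It is plain scalar quantization, not TurboQuant’s actual algorithm. It ignores the overhead of storing scales, and the model shape and memory budget are hypothetical.

```python
def tokens_that_fit(budget_bytes: int, elems_per_token: int, bits_per_elem: int) -> int:
    """Tokens of KV cache that fit in a fixed memory budget (ignores scale/metadata overhead)."""
    bytes_per_token = elems_per_token * bits_per_elem // 8
    return budget_bytes // bytes_per_token


def quantize_int4(values: list[float]) -> tuple[list[int], float]:
    """Toy symmetric absmax quantizer: maps floats to signed integers in [-7, 7]."""
    scale = max(abs(v) for v in values) / 7 or 1.0  # avoid a zero scale for all-zero input
    return [max(-7, min(7, round(v / scale))) for v in values], scale


def dequantize(codes: list[int], scale: float) -> list[float]:
    """Recover approximate floats; per-element error is about scale / 2 at most."""
    return [c * scale for c in codes]


# Hypothetical 8B-class GQA shape: 2 (K and V) * 32 layers * 8 KV heads * 128 dims.
elems = 2 * 32 * 8 * 128
budget = 16 * 1024**3  # assumed 16 GiB set aside for KV cache
print(tokens_that_fit(budget, elems, 16))  # 131072 tokens at 16 bits
print(tokens_that_fit(budget, elems, 4))   # 524288 tokens at 4 bits, i.e. 4x

codes, scale = quantize_int4([0.5, -1.4, 0.0, 1.4])
print(codes, dequantize(codes, scale))
```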

LMCache · open-source · 2026-04-04

LMCache’s New Architecture Boosts MoE Inference Performance by 10×

Score 34

Modern LLM serving workloads are defined by strict latency requirements, high concurrency, and rapidly growing context lengths. Applications such as multi-turn chat, AI agents, and retrieval-augmented generation continuously build on prior...

inference serving kv-cache moe benchmark rag agents

High signal · Matched: inference, serving, decoding, generation, throughput, lmcache, moe, performance, latency, ttft, retrieval-augmented generation, retrieval, agents