TensorRT-LLM · open-source · 2026-06-03
Speculative Decoding
No feed summary available yet.
High signal Matched: decoding, speculative decoding
TensorRT-LLM · open-source · 2026-06-03
No feed summary available yet.
High signal Matched: decoding, speculative decoding
Nebius · cloud · 2026-06-03
No feed summary available yet.
High signal Matched: decoding, speculative decoding, training
LightSeek Foundation · research · 2026-06-03
No feed summary available yet.
High signal Matched: inference, decoding, speculative decoding, model, training
LightSeek Foundation · research · 2026-06-03
No feed summary available yet.
High signal Matched: decoding, speculative decoding, eagle, training
AMD ROCm Blogs · hardware · 2026-05-29
Speculative speculative decoding (SSD) [1] is a recently proposed speculative decoding (SD) algorithm that further accelerates large language model (LLM) inference beyond conventional SD. In standard SD, a small draft model proposes severa...
High signal Matched: inference, decoding, speculative decoding, draft model, verification, cost, mi300x, model
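As a rough illustration of the standard draft-then-verify loop that the summary above begins to describe, here is a minimal sketch under greedy decoding. It is not the SSD algorithm from the AMD post, and it is not any library's API: `speculative_step`, `generate`, `toy_target`, and `toy_draft` are hypothetical names, and the "models" are toy deterministic functions. Production systems verify all drafted tokens in one batched forward pass of the target model and, when sampling, use a probabilistic accept/reject rule instead of the exact-match check shown here.

```python
from typing import Callable, List

# Assumption: a "model" is any function mapping a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def speculative_step(target: Model, draft: Model, prefix: List[int], k: int) -> List[int]:
    """Run one draft-then-verify round and return the newly accepted tokens."""
    # 1) The draft model proposes k tokens autoregressively.
    proposed: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # 2) The target model checks the proposals in order.
    #    (Done sequentially here for clarity; real systems batch this step.)
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposed:
        expected = target(ctx)
        if tok != expected:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3) Every proposal matched, so the target contributes one extra token.
    accepted.append(target(ctx))
    return accepted


def generate(target: Model, draft: Model, prompt: List[int],
             max_new_tokens: int, k: int = 4) -> List[int]:
    """Repeat speculative rounds until max_new_tokens have been produced."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        out.extend(speculative_step(target, draft, out, k))
    return out[:len(prompt) + max_new_tokens]


def greedy_decode(target: Model, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Plain one-token-at-a-time greedy decoding, used as the reference."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        out.append(target(out))
    return out


def toy_target(ctx: List[int]) -> int:
    # Hypothetical stand-in for the large model.
    return (sum(ctx) * 31 + len(ctx)) % 50


def toy_draft(ctx: List[int]) -> int:
    # Hypothetical stand-in for the small model: agrees with the target
    # except when the prefix length is a multiple of 3.
    guess = toy_target(ctx)
    return (guess + 1) % 50 if len(ctx) % 3 == 0 else guess
```

Because every accepted token is either confirmed or supplied by the target, the output in this greedy setting matches plain target-only decoding; the draft model affects only how many target checks each round needs.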
vLLM Project · open-source · 2026-05-28
The v0.5.0 release brings significant architectural improvements to speculative decoding model training, introducing DFlash algorithm support, fully unified online training capabilities, and a...
High signal Matched: decoding, speculative decoding, release, introducing, model, training
vLLM Project · open-source · 2026-05-26
The EAGLE series — including EAGLE 1, EAGLE 2, and EAGLE 3 — has become one of the most widely adopted and practically deployed families of speculative decoding algorithms across both research and...
High signal Matched: decoding, speculative decoding, eagle, research
Nota AI · korea · 2026-05-11
Jaehoon Lee, Technical Content Manager, Nota AI. NetsPresso® now embraces AI agents. An easy-to-use interface sits on top of the validated pipeline that handles everything from model compression to device deployment. When a user...
High signal Matched: inference, endpoint, kernel, verification, moe, benchmark, latency, cost, gpu, release, model, evaluation, quantization, quantized, int4, evaluate, benchmarks, swe-bench, mmlu, agent, agents, api
BAIR · research · 2026-05-08
No feed summary available yet.
High signal Matched: inference, decoding, prefill, generation, serve, throughput, kv cache, verification, performance, latency, cost, model, paper, research, evaluation, training, pretraining, sft, benchmarks, long context, context window, agentic, reasoning model
Together AI · inference-infra · 2026-04-24
Rollout is the silent bottleneck in RL post-training. DAS fixes it with adaptive speculative decoding — up to 50% faster, zero degradation in reward quality.
High signal Matched: decoding, speculative decoding, training, post-training
Nota AI · korea · 2026-04-08
Jaehoon Lee, Technical Content Manager, Nota AI. AI Model Optimization: Why Models Won't Run on Hardware. The Chip Is Ready, but the Model Won't Deploy. If you have ever tried deploying an AI model onto your own chip, the following...
High signal Matched: inference, multi-gpu, kv cache, verification, performance, latency, gpu, model, research, evaluation, quantization, quantized, awq, gptq, evaluate
Together AI · inference-infra · 2026-03-31
1.25x over a well-trained static speculator. Aurora is an open-source RL framework that turns speculative decoding from a one-time offline setup into a self-improving system that learns from every request it serves.
High signal Matched: decoding, speculative decoding, open-source
vLLM Project · open-source · 2026-03-13
EAGLE is the state-of-the-art method for speculative decoding in large language model (LLM) inference, but its autoregressive drafting creates a hidden bottleneck: the more tokens that you...
High signal Matched: inference, decoding, speculative decoding, eagle, model
Nota AI · korea · 2026-02-26
Jewon Lee | Wooksu Shin | Seungmin Yang | Ki-Ung Song | Donguk Lim | Jaeyeon Kim | Tae-Ho Kim | Bo-Kyeong Kim (EdgeFM Team, Nota AI). ✔️ Resources for more information: GitHub, ArXiv, Project Page, Demo. ✔️ Accepted at ICLR 2026. &...
High signal Matched: inference, generation, verification, benchmark, performance, latency, cost, model, arxiv, evaluation, training, post-training, benchmarks
vLLM Project · open-source · 2025-12-13
Speculative decoding serves as an optimization to improve inference performance; however, training a unique draft model for each LLM can be difficult and time-consuming, while production-ready...
High signal Matched: inference, decoding, speculative decoding, draft model, performance, model, training
Together AI · inference-infra · 2025-12-03
AutoJudge accelerates LLM inference by identifying which token mismatches actually matter. Using self-supervised learning to train a lightweight classifier, it accepts up to 40 draft tokens per cycle—delivering 1.5–2× speedups over standar...
High signal Matched: inference, decoding, speculative decoding, introducing
llm-d · open-source · 2025-12-02
llm-d v0.4 delivers 50% lower latency for MoE models via speculative decoding, expands TPU and XPU support, and adds prefix cache offloading for faster TTFT.
High signal Matched: decoding, prefix cache, speculative decoding, moe, performance, latency, ttft, tpu, sota
Together AI · inference-infra · 2025-12-01
Together AI achieves up to 2x faster inference for top open-source models like Qwen, DeepSeek, and Kimi through GPU optimization, advanced speculative decoding, and FP4 quantization—ranking #1 in speed benchmarks on NVIDIA Blackwell archit...
High signal Matched: inference, decoding, speculative decoding, gpu, blackwell, quantization, benchmarks, open-source
Together AI · inference-infra · 2025-08-21
Build AI agents for complex, long-running engineering tasks. Learn key patterns from a case study: accelerating LLM inference with speculative decoding.
High signal Matched: inference, decoding, speculative decoding, agents
Together AI · inference-infra · 2025-05-12
No feed summary available yet.
High signal Matched: decoding, speculative decoding
SqueezeBits · korea · 2025-02-17
A brief review of the research paper from our team, published at ICML 2024.
High signal Matched: verification, paper, research
SqueezeBits · korea · 2024-12-09
This article provides a comparative analysis of speculative decoding.
High signal Matched: decoding, speculative decoding
Hugging Face · open-source · 2024-11-20
No feed summary available yet.
High signal Matched: decoding, generation, speculative decoding
Hugging Face · open-source · 2024-05-01
No feed summary available yet.
High signal Matched: inference, decoding, speculative decoding
Hugging Face · open-source · 2024-01-30
No feed summary available yet.
High signal Matched: decoding, speculative decoding
Hugging Face · open-source · 2023-12-20
No feed summary available yet.
High signal Matched: inference, decoding, speculative decoding