frontier-model - MLSys Blogs

inference serving kv-cache speculative-decoding benchmark model-release research training fine-tuning evals long-context agents frontier-model

High signal Matched: inference, decoding, prefill, generation, serve, throughput, kv cache, verification, performance, latency, cost, model, paper, research, evaluation, training, pretraining, sft, benchmarks, long context, context window, agentic, reasoning model

Nota AI · korea · 2026-03-31

The Real Reason TurboQuant Shook the Market: AI Optimization Has Gone Mainstream

Score 46

By Jaehoon Lee, Technical Content Manager, Nota AI. In March, a single official announcement from Google Research shook trillions of won in market capitalization across U.S. infrastructure and semiconductor stocks. The catalyst:...

inference serving kv-cache benchmark hardware model-release research training fine-tuning quantization agents frontier-model

High signal Matched: inference, serving, generation, throughput, kv cache, benchmark, performance, cost, b200, blackwell, introducing, model, fp8, research, training, fine-tuning, quantization, quantized, agent, agentic, frontier model

Together AI · inference-infra · 2025-12-15

Announcing native availability of NVIDIA Nemotron 3 Nano, NVIDIA’s latest reasoning model

Score 14

Nemotron 3 Nano, NVIDIA’s newest reasoning model, is now available on Together AI, the AI Native Cloud

model-release cloud frontier-model

High signal Matched: model, cloud, reasoning model

llm-d · open-source · 2025-12-02

llm-d 0.4: Achieve SOTA Performance Across Accelerators

Score 30

llm-d v0.4 delivers 50% lower latency for MoE models via speculative decoding, expands TPU and XPU support, and adds prefix cache offloading for faster TTFT.

inference kv-cache speculative-decoding moe benchmark hardware frontier-model

High signal Matched: decoding, prefix cache, speculative decoding, moe, performance, latency, ttft, tpu, sota

Hugging Face · open-source · 2025-11-25

Building Deep Research: How we Achieved State of the Art

Score 10

No feed summary available yet.

research frontier-model

High signal Matched: research, state of the art

Modular · inference-infra · 2025-09-19

Matrix Multiplication on Blackwell: Part 4 - Breaking SOTA

Score 10

hardware frontier-model

High signal Matched: blackwell, sota

Modular · inference-infra · 2025-09-12

Matrix Multiplication on Blackwell: Part 3 - The Optimizations Behind 85% of SOTA Performance

Score 14

benchmark hardware frontier-model

High signal Matched: performance, blackwell, sota

llm-d · open-source · 2025-05-20

Announcing the llm-d community!

Score 20

Introducing llm-d: Kubernetes-native distributed LLM inference with KV-cache routing, disaggregated serving, and SOTA performance per dollar. Built on vLLM.

inference serving distributed benchmark model-release frontier-model

High signal Matched: inference, serving, distributed, performance, introducing, sota

Together AI · inference-infra · 2025-05-20

Introducing Together Code Sandbox & Together Code Interpreter: SOTA code execution for AI

Score 12

No feed summary available yet.

model-release frontier-model

High signal Matched: introducing, sota

BAIR · research · 2025-04-11

Defending against Prompt Injection with Structured Queries (StruQ) and Preference Optimization (SecAlign)

Score 10

Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated applications. However, as LLMs have improved, so have the attacks against them. Prompt injection is listed by OWASP as the #1 threat to LLM-integrated ap...

benchmark model-release research training fine-tuning evals rag api frontier-model

High signal Matched: cost, model, evaluation, training, dpo, fine-tuning, retrieval, api, sota

Modular · inference-infra · 2024-12-17

MAX GPU: State of the Art Throughput on a New GenAI platform

Score 14

serving benchmark hardware frontier-model

High signal Matched: throughput, gpu, state of the art

Modular · inference-infra · 2024-09-13

MAX 24.5 - With SOTA CPU Performance for Llama 3.1

Score 10

benchmark frontier-model

High signal Matched: performance, sota

Hugging Face · open-source · 2023-12-11

Welcome Mixtral - a SOTA Mixture of Experts on Hugging Face

Score 14

No feed summary available yet.

moe frontier-model

High signal Matched: mixture of experts, mixtral, sota

Hugging Face · open-source · 2022-03-02

BERT 101 - State Of The Art NLP Model Explained

Score 10

No feed summary available yet.

model-release frontier-model

High signal Matched: model, state of the art

AI2 · research · 2026-04-30

AstaBench update: New results, plus adoption from industry

Score 6

AstaBench’s latest update adds new frontier-model results, including GPT-5.5, and highlights growing adoption from groups including the UK AISI, General Reasoning, Elicit, SciSpace, Distyl AI, and EvoScientist.

model-release frontier-model

Watchlist Matched: model, frontier-model

Together AI · inference-infra · 2026-02-25