OpenAI · model-lab · 2026-06-03
Our most capable and efficient frontier model for professional work · Release · Mar 5, 2026 · 16 min read
No feed summary available yet.
High signal Matched: model, frontier model
OpenAI · model-lab · 2026-06-03
No feed summary available yet.
High signal Matched: model, frontier model
Fireworks AI · inference-infra · 2026-06-03
No feed summary available yet.
High signal Matched: model, training, frontier model
BAIR · research · 2026-05-08
No feed summary available yet.
High signal Matched: inference, decoding, prefill, generation, serve, throughput, kv cache, verification, performance, latency, cost, model, paper, research, evaluation, training, pretraining, sft, benchmarks, long context, context window, agentic, reasoning model
Nota AI · korea · 2026-03-31
Jaehoon Lee, Technical Content Manager, Nota AI. In March, a single official announcement from Google Research rocked trillions of won in the market capitalization of U.S. infrastructure and semiconductor stocks. The catalyst:...
High signal Matched: inference, serving, generation, throughput, kv cache, benchmark, performance, cost, b200, blackwell, introducing, model, fp8, research, training, fine-tuning, quantization, quantized, agent, agentic, frontier model
Together AI · inference-infra · 2025-12-15
Nemotron 3 Nano, NVIDIA’s newest reasoning model, is now available on Together AI, the AI Native Cloud
High signal Matched: model, cloud, reasoning model
llm-d · open-source · 2025-12-02
llm-d v0.4 delivers 50% lower latency for MoE models via speculative decoding, expands TPU and XPU support, and adds prefix cache offloading for faster TTFT.
High signal Matched: decoding, prefix cache, speculative decoding, moe, performance, latency, ttft, tpu, sota
Hugging Face · open-source · 2025-11-25
No feed summary available yet.
High signal Matched: research, state of the art
Modular · inference-infra · 2025-09-19
Matrix Multiplication on Blackwell: Part 4 - Breaking SOTA
High signal Matched: blackwell, sota
Modular · inference-infra · 2025-09-12
Matrix Multiplication on Blackwell: Part 3 - The Optimizations Behind 85% of SOTA Performance
High signal Matched: performance, blackwell, sota
llm-d · open-source · 2025-05-20
Introducing llm-d: Kubernetes-native distributed LLM inference with KV-cache routing, disaggregated serving, and SOTA performance per dollar. Built on vLLM.
High signal Matched: inference, serving, distributed, performance, introducing, sota
Together AI · inference-infra · 2025-05-20
No feed summary available yet.
High signal Matched: introducing, sota
BAIR · research · 2025-04-11
Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated applications. However, as LLMs have improved, so have the attacks against them. Prompt injection is listed by OWASP as the #1 threat to LLM-integrated ap...
High signal Matched: cost, model, evaluation, training, dpo, fine-tuning, retrieval, api, sota
Modular · inference-infra · 2024-12-17
MAX GPU: State of the Art Throughput on a New GenAI platform
High signal Matched: throughput, gpu, state of the art
Modular · inference-infra · 2024-09-13
MAX 24.5 - With SOTA CPU Performance for Llama 3.1
High signal Matched: performance, sota
Hugging Face · open-source · 2023-12-11
No feed summary available yet.
High signal Matched: mixture of experts, mixtral, sota
Hugging Face · open-source · 2022-03-02
No feed summary available yet.
High signal Matched: model, state of the art
AI2 · research · 2026-04-30
AstaBench’s latest update adds new frontier-model results, including GPT-5.5, and highlights growing adoption from groups including the UK AISI, General Reasoning, Elicit, SciSpace, Distyl AI, and EvoScientist.
Watchlist Matched: model, frontier-model
Together AI · inference-infra · 2026-02-25
No feed summary available yet.
Watchlist Matched: training, agents, sota
Modal · inference-infra · 2026-02-11
GLM-5.1 establishes a new SotA for open models. Try it free today.
Watchlist Matched: sota
Hugging Face · open-source · 2025-10-02
No feed summary available yet.
Watchlist Matched: sota
Hugging Face · open-source · 2025-07-16
No feed summary available yet.
Watchlist Matched: sota
Hugging Face · open-source · 2024-11-25
No feed summary available yet.
Watchlist Matched: state of the art