MLSys Radar

agents

AWS Machine Learning Blog · cloud · 2026-06-02

Extending MCP support for Amazon Bedrock AgentCore Gateway

Score 11

While deploying Model Context Protocol (MCP) servers in production, enterprises need fine-grained access control across servers, observability into which teams use which tools, security guarantees against data exfiltration, and centralized...

model-release cloud agents

Open

High signal Matched: model, bedrock, mcp

Lambda · cloud · 2026-06-01

Unbox one of NVIDIA's first co-packaged optics switches with us. See why we bet on CPO early.

Score 15

When we design large GPU clusters, the network is no longer a background system. It's part of the compute envelope. At the 800G and NVIDIA GB300 NVL72 scale, the back-end fabric accounts for 86% of networking power in a three-layer cluster...

inference serving distributed benchmark hardware model-release rag agents

Open

High signal Matched: generation, token generation, throughput, infiniband, gpu, model, retrieval, agentic

AWS Machine Learning Blog · cloud · 2026-05-29

Evaluating Deep Agents using LangSmith on AWS

Score 9

This post combines learnings from LangChain’s work on evaluating deep agents and Anthropic’s guide to demystifying evals for AI agents into a practical guide. In this post, you will learn how to: 1) apply five evaluation patterns for deep...

research cloud evals agents

Open

High signal Matched: evaluation, bedrock, evals, evaluating, agent, agents

AWS Machine Learning Blog · cloud · 2026-05-29

Build a test suite that grows with your agent with dataset management in Amazon Bedrock AgentCore

Score 13

Datasets in AgentCore is in public preview. Agent evaluation is most powerful when you combine fast-moving online signals with stable offline baselines. To understand whether your agent is truly improving over time, you need a fixed benchm...

benchmark research cloud evals agents

Open

High signal Matched: benchmark, evaluation, bedrock, agent

Nota AI · korea · 2026-05-11

[NetsPresso® x AI Agents] Easier to Use, Even More Powerful

Score 52

  Jaehoon Lee Technical Content Manager, Nota AI   NetsPresso® now embraces AI agents. An easy-to-use interface sits on top of the validated pipeline that handles everything from model compression to device deployment.When a user...

inference serving kernel speculative-decoding moe benchmark hardware model-release research quantization evals agents api

Open

High signal Matched: inference, endpoint, kernel, verification, moe, benchmark, latency, cost, gpu, release, model, evaluation, quantization, quantized, int4, evaluate, benchmarks, swe-bench, mmlu, agent, agents, api

BAIR · research · 2026-05-08

Adaptive Parallel Reasoning: The Next Paradigm in Efficient Inference Scaling

Score 28

.apr-fig { text-align: center; margin: 1.35em 0; line-height: 1.4; } .apr-fig--wide img { display: inline-block; width: 100%; max-width: 100%; height: auto; vertical-align: middle; } .apr-fig--wide-0-8 { max-width: 80%; margin-left: auto;...

inference serving kv-cache speculative-decoding benchmark model-release research training fine-tuning evals long-context agents frontier-model

Open

High signal Matched: inference, decoding, prefill, generation, serve, throughput, kv cache, verification, performance, latency, cost, model, paper, research, evaluation, training, pretraining, sft, benchmarks, long context, context window, agentic, reasoning model

Nota AI · korea · 2026-04-29

[NVIDIA Nemotron Hackathon] Grand Prize Among 20 Teams: Behind Two Sleepless Days

Score 32

  Hancheol Park, Ph. D.AI Research Engineer, NetsPresso Tech, Nota AI Geonmin Kim, Ph. D.AI Research Engineer, NetsPresso Tech, Nota AI Geonho LeeEdge AI Engineer Intern, NetsPresso Tech, Nota AI Jaehoon Lee Technical Content Manager,...

inference moe benchmark model-release research korea training fine-tuning quantization evals agents

Open

High signal Matched: generation, moe, performance, model, weights, paper, research, evaluation, korea, korean, seoul, naver, training, fine-tuning, quantization, agent, agents, agentic

Together AI · inference-infra · 2026-04-29

DeepSeek-V4 Pro now available on Together AI

Score 10

DeepSeek-V4 Pro is now available on Together AI with 512K context, controllable reasoning modes, and cached-input pricing for long-context reasoning workloads like code agents, document intelligence, and research synthesis.

research long-context agents

Open

High signal Matched: research, long-context, agents

LMCache · open-source · 2026-04-04

LMCache’s New Architecture Boosts MoE Inference Performance by 10×

Score 34

Modern LLM serving workloads are defined by strict latency requirements, high concurrency, and rapidly growing context lengths. Applications such as multi-turn chat, AI agents, and retrieval-augmented generation continuously build on prior...

inference serving kv-cache moe benchmark rag agents

Open

High signal Matched: inference, serving, decoding, generation, throughput, lmcache, moe, performance, latency, ttft, retrieval-augmented generation, retrieval, agents

Nota AI · korea · 2026-03-31

The Real Reason TurboQuant Shook the Market: AI Optimization Has Gone Mainstream

Score 46

  Jaehoon Lee Technical Content Manager, Nota AI   In March, a single official announcement from Google Research rocked trillions of won in the market capitalization of U.S. infrastructure and semiconductor stocks. The catalyst:...

inference serving kv-cache benchmark hardware model-release research training fine-tuning quantization agents frontier-model

Open

High signal Matched: inference, serving, generation, throughput, kv cache, benchmark, performance, cost, b200, blackwell, introducing, model, fp8, research, training, fine-tuning, quantization, quantized, agent, agentic, frontier model

Nota AI · korea · 2026-03-23

[GTC 2026 Recap] The Trillion-Dollar Inference Race Begins: How Nota AI Fills the Gap

Score 42

  Jaehoon Lee Technical Content Manager, Nota AI   GTC has evolved far beyond a technology conference, drawing attention from global economies and financial markets alike. This year, CEO Jensen Huang took the stage in his tradema...

inference serving kernel cuda kv-cache benchmark hardware model-release research cloud training long-context agents open-source

Open

High signal Matched: inference, prefill, generation, throughput, cuda, kv cache, performance, latency, cost, gpu, npu, launch, model, research, cloud, training, long-context, context window, agent, agents, agentic, open-source

Together AI · inference-infra · 2026-03-12

Build real-time voice agents on Together AI

Score 10

Build real-time voice agents on Together AI with co-located STT, LLM, and TTS infrastructure, native Deepgram and Cartesia support, and end-to-end latency under 500ms.

benchmark agents

Open

High signal Matched: latency, agents

SkyPilot · open-source · 2026-02-27

Don't Run OpenClaw on Your Main Machine

Score 8

OpenClaw gives an AI agent full access to your system. Here's why you should run it on an isolated cloud VM, and how to set that up.

cloud agents

Open

High signal Matched: cloud, agent

Together AI · inference-infra · 2026-01-26

DSGym: A holistic framework for evaluating and training data science agents

Score 18

Introducing DSGym—a holisti evaluation and training framework for LLM-based data science agents. Features 90+ bioinformatics tasks, 92 Kaggle competitions, and synthetic trajectory generation. Our 4B model achieves state-of-the-art perform...

inference benchmark model-release research training evals agents open-source

Open

High signal Matched: generation, performance, introducing, model, evaluation, training, evaluating, agents, open-source

Together AI · inference-infra · 2026-01-13

Learn how Cursor partnered with Together AI to deliver real-time, low-latency inference at scale

Score 24

Together AI teamed with Cursor to build the real-time inference stack that keeps in-editor agents fast and reliable. They productionized NVIDIA Blackwell (B200/GB200), tuning ARM hosts, kernels, and FP4/TensorRT quantization for low latenc...

inference benchmark hardware model-release quantization agents

Open

High signal Matched: inference, latency, b200, gb200, blackwell, model, quantization, agents

SkyPilot · open-source · 2025-08-12

Self-host open-source LLM agent sandbox on your own cloud

Score 10

Your AI writes code. Now what? If you’re building AI agents in 2025, you probably wondered that as well. Your LLM generates some Python code that analyzes data, manipulates files, or calls APIs. But where does it run? Most people eit...

cloud agents open-source

Open

High signal Matched: cloud, agent, agents, open-source

BAIR · research · 2025-07-01

Whole-Body Conditioned Egocentric Video Prediction

Score 10

.modal { display: none; position: fixed; z-index: 9999; padding-top: 50px; left: 0; top: 0; width: 100%; height: 100%; overflow: auto; background-color: rgba(0,0,0,0.9); } .modal-content { margin: auto; display: block; max-width: 90%; max-...

inference benchmark model-release research training evals agents

Open

High signal Matched: inference, generation, performance, model, paper, arxiv, evaluation, training, evaluate, agent, agents

AIBrix · open-source · 2025-02-21

Introducing AIBrix: Cost-Effective and Scalable Control Plane for vLLM

Score 26

Open-source large language models (LLMs) like LLaMA, Deepseek, Qwen and Mistral etc have surged in popularity, offering enterprises greater flexibility, cost savings, and control over their AI deployments. These models have empowered organ...

inference benchmark model-release agents open-source

Open

High signal Matched: inference, generation, latency, cost, introducing, model, agents, open-source

AIBrix · open-source · 2025-02-19

AIBrix v0.2.0 Release: Distributed KV Cache, Orchestration and Heterogeneous GPU Support

Score 42

We’re excited to announce the v0.2.0 release of AIBrix! Building on feedback from v0.1.0 production adoption and user interest, this release introduces several new features to enhance performance and usability. Extend the vLLM Prefix...

inference serving distributed kv-cache benchmark hardware model-release agents

Open

High signal Matched: inference, serving, prefill, throughput, distributed, multi-node, kv cache, prefix cache, performance, cost, gpu, accelerator, release, agent

FriendliAI · inference-infra · 2026-06-03

Agents

Score 6

No feed summary available yet.

agents

Open

Watchlist Matched: agents

Microsoft Research · big-tech · 2026-05-29

Data Formulator 0.7: AI-powered data analytics for enterprise data

Score 5

Data Formulator introduces AI-powered analytics for enterprise data workflows. Data teams can easily bring enterprise data into an AI-ready workspace where users can explore, analyze, and visualize data with AI agents to turn raw data into...

research agents

Open

Watchlist Matched: research, agents

NVIDIA Technical Blog · hardware · 2026-05-20

Mastering Agentic Techniques: AI Agent Customization

Score 3

Autonomous AI agents are taking on all types of work for businesses: routing logistics fleets, triaging support tickets, generating code, and orchestrating...

agents

Open

Watchlist Matched: agent, agents, agentic

Cloudflare Blog · cloud · 2026-05-19

Announcing Claude Managed Agents on Cloudflare

Score 0

Cloudflare has integrated with Anthropic's Claude Managed Agents to provide a fast, isolated execution environment for autonomous code delivery. This means builders can scale agent workflows globally while strictly controlling access to pr...

agents

Open

Watchlist Matched: agent, agents

Cloudflare Blog · cloud · 2026-04-30

Agents can now create Cloudflare accounts, buy domains, and deploy

Score 0

Starting today, agents can now be Cloudflare customers. They can create a Cloudflare account, start a paid subscription, register a domain, and get back an API token to deploy code right away. Humans can be in the loop to grant permission,...

agents api

Open

Watchlist Matched: agents, api

Lambda · cloud · 2026-04-30

Creating highly efficient agents: 450M tool-calling tokens distilled for post-training from top open-source models

Score 4

Harnesses If you've used Claude Code or Codex, you've used a harness. A harness is the infrastructure layer that wraps an AI coding agent and decides how it operates, what it can touch, and how you measure whether it worked. It's how most...

hardware training agents open-source

Open

Watchlist Matched: gpu, training, post-training, agent, agents, open-source

AI2 · research · 2026-04-13

Evaluating agents for scientific discovery

Score 0

Two benchmarks developed at Ai2 – ScienceWorld and DiscoveryWorld – reveal that even incredibly strong AI science agents struggle with problems human scientists solve routinely.

evals agents

Open

Watchlist Matched: evaluating, benchmarks, agents

AI2 · research · 2026-03-23

Highlights from Ai2 at NVIDIA GTC 2026

Score 0

A recap of Ai2's week at NVIDIA GTC 2026, covering panels on open models, live demos of Olmo Hybrid and Asta AutoDiscovery, and conversations on coding agents, hybrid architectures, and robotics.

agents

Open

Watchlist Matched: agents

BAIR · research · 2025-03-25

Scaling Up Reinforcement Learning for Traffic Smoothing: A 100-AV Highway Deployment

Score 6

Training Diffusion Models with Reinforcement Learning We deployed 100 reinforcement learning (RL)-controlled cars into rush-hour highway traffic to smooth congestion and reduce fuel consumption for everyone. Our goal is to tackle "stop-and...

serving kernel benchmark model-release research training agents

Open

Watchlist Matched: throughput, kernel, performance, model, paper, training, agent, agents

Hugging Face · open-source · 2024-08-12

Tool Use, Unified

Score 1

No feed summary available yet.

agents

Open

Watchlist Matched: tool use