MLSys Radar

api

FuriosaAI · hardware · 2026-06-03

Furiosa SDK

Score 15

No feed summary available yet.

korea api

High signal Matched: furiosa, sdk

AWS Machine Learning Blog · cloud · 2026-06-03

Object detection with Amazon Nova 2 Lite

Score 9

In this post, we'll walk through implementing object detection with Amazon Nova 2 Lite. You'll learn how to deploy an object detection application using Amazon Bedrock, AWS Lambda, and Amazon API Gateway. You'll also learn how to craft eff...

cloud api

High signal Matched: bedrock, api

AWS Machine Learning Blog · cloud · 2026-05-29

Streamline external access to Amazon SageMaker MLflow using a REST API proxy

Score 11

In this post, we demonstrate how to build a secure Flask-based MLflow proxy service that provides HTTPS access to Amazon SageMaker MLflow without requiring the MLflow SDK. This solution is for organizations undergoing cloud transformation...

cloud api

High signal Matched: cloud, sagemaker, api, sdk

LMCache · open-source · 2026-05-21

OpenAI API Is the New IPv4

Score 10

A new system stack is quietly taking shape around LLM serving. What makes it interesting is not just how quickly it is evolving, but how familiar the shape of that evolution looks if you’ve spent time studying large-scale systems like the...

inference serving kv-cache api

High signal Matched: serving, lmcache, api

Nota AI · korea · 2026-05-11

[NetsPresso® x AI Agents] Easier to Use, Even More Powerful

Score 52

Jaehoon Lee, Technical Content Manager, Nota AI. NetsPresso® now embraces AI agents. An easy-to-use interface sits on top of the validated pipeline that handles everything from model compression to device deployment. When a user...

inference serving kernel speculative-decoding moe benchmark hardware model-release research quantization evals agents api

High signal Matched: inference, endpoint, kernel, verification, moe, benchmark, latency, cost, gpu, release, model, evaluation, quantization, quantized, int4, evaluate, benchmarks, swe-bench, mmlu, agent, agents, api

Together AI · inference-infra · 2026-05-11

Serving DeepSeek-V4: why million-token context is an inference systems problem

Score 22

DeepSeek-V4 makes million-token context a serving-systems problem. Together AI explores the inference work behind V4 on NVIDIA HGX B200, including compressed KV layouts, prefix caching, kernel maturity, and endpoint profiles for long-conte...

inference serving kernel hardware long-context api

High signal Matched: inference, serving, endpoint, kernel, b200, long-context

Nota AI · korea · 2026-04-22

[Deep Dive: NetsPresso®] From Quantization to Graph Optimization: A Step-by-Step Model Deployment Pipeline

Score 54

Jaehoon Lee, Technical Content Manager, Nota AI. Series Notice: NetsPresso® Technical Blog, Part 2. In Part 1, we walked through a scenario of deploying Llama 3.2 1B on an edge device to illustrate the NetsPresso® workflow. The f...

inference kernel cuda benchmark hardware model-release research korea training quantization evals api open-source

High signal Matched: inference, kernel, cuda, matmul, benchmark, performance, latency, cost, npu, model, weights, paper, research, evaluation, furiosa, training, quantization, int8, int4, awq, gptq, sdk, open-source

vLLM Project · open-source · 2026-03-24

Model Runner V2: A Modular and Faster Core for vLLM

Score 12

We are excited to announce Model Runner V2 (MRV2), a ground-up re-implementation of the vLLM model runner. MRV2 delivers a cleaner, more modular, and more efficient execution core—with no API...

model-release api

High signal Matched: model, api

AIBrix · open-source · 2026-03-03

AIBrix v0.6.0 Release: Envoy Sidecar, Mixed LLM Workloads Routing, Routing Profiles, LoRA Delivery & New APIs

Score 28

🚀 AIBrix v0.6.0 Release. Today we’re excited to announce AIBrix v0.6.0, a release that expands how you deploy and route inference traffic. Key highlights include: Envoy Sidecar Support – Run Envoy alongside the gateway-plugin without...

inference model-release fine-tuning rag api

High signal Matched: inference, prefill, release, model, lora, rerank, api, openai-compatible

vLLM Project · open-source · 2026-01-31

Streaming Requests & Realtime API in vLLM

Score 12

Large language model inference has traditionally operated on a simple premise: the user submits a complete prompt (request), the model processes it, and returns a response (either streaming or at...

inference model-release api

High signal Matched: inference, model, api

AIBrix · open-source · 2025-11-10

AIBrix v0.5.0 Release: Batch API, KVCache v1 Connector, and Enhanced P/D orchestration

Score 22

🚀 AIBrix v0.5.0 Release. Today, we’re excited to announce AIBrix v0.5.0, a release that pushes AIBrix closer to a batteries-included control plane for modern LLM workloads. This release introduces an OpenAI-compatible Batch API for hi...

inference benchmark model-release research evals api

High signal Matched: prefill, latency, release, evaluation, api, openai-compatible

llm-d · open-source · 2025-06-03

llm-d Week 1 Project News Round-Up

Score 12

llm-d hits 1000 GitHub stars! Week 1-2 round-up covers KVTransfer Protocol, InferenceModel API updates, and community resources for LLM inference developers.

inference api

High signal Matched: inference, api

BAIR · research · 2025-04-11

Defending against Prompt Injection with Structured Queries (StruQ) and Preference Optimization (SecAlign)

Score 10

Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated applications. However, as LLMs have improved, so have the attacks against them. Prompt injection attack is listed as the #1 threat by OWASP to LLM-integrated ap...

benchmark model-release research training fine-tuning evals rag api frontier-model

High signal Matched: cost, model, evaluation, training, dpo, fine-tuning, retrieval, api, sota

Replicate · inference-infra · 2024-06-12

Run Stable Diffusion 3 with an API

Score 8

Stable Diffusion 3 is the latest text-to-image model from Stability, with improved image quality, typography, prompt understanding, and resource efficiency. Learn how to run it in the cloud with one line of code.

model-release cloud api

High signal Matched: model, cloud, api

Replicate · inference-infra · 2023-07-27

Run Llama 2 with an API

Score 8

Llama 2 is the first open source language model of the same caliber as OpenAI’s models. Learn how to run it in the cloud with one line of code.

model-release cloud api open-source

High signal Matched: model, cloud, api, open source

Vast.ai · cloud · 2026-06-03

Python SDK

Score 0

No feed summary available yet.

api

Watchlist Matched: sdk

Vast.ai · cloud · 2026-06-03

API

Score 0

No feed summary available yet.

api

Watchlist Matched: api

Mistral AI · model-lab · 2026-06-03

API pricing

Score 5

No feed summary available yet.

api

Watchlist Matched: api

xAI · model-lab · 2026-06-03

API Console

Score 5

No feed summary available yet.

api

Watchlist Matched: api

Groq · hardware · 2026-06-03

Free API key

Score 4

No feed summary available yet.

api

Watchlist Matched: api

Cloudflare Blog · cloud · 2026-04-30

Agents can now create Cloudflare accounts, buy domains, and deploy

Score 0

Starting today, agents can now be Cloudflare customers. They can create a Cloudflare account, start a paid subscription, register a domain, and get back an API token to deploy code right away. Humans can be in the loop to grant permission,...

agents api

Watchlist Matched: agents, api

Replicate · inference-infra · 2025-05-06

Run MiniMax Speech-02 models with an API

Score 0

MiniMax's Speech-02 models give you high-quality text-to-speech with voice cloning, emotional expression, and multilingual support.

api

Watchlist Matched: api

Replicate · inference-infra · 2024-09-09

Fine-tune FLUX.1 with an API

Score 0

Create and run your own fine-tuned Flux models programmatically using Replicate's HTTP API.

api

Watchlist Matched: api

Replicate · inference-infra · 2024-08-15

Fine-tune FLUX.1 with your own images

Score 6

We've added fine-tuning (LoRA) support to FLUX.1 image generation models. You can train FLUX.1 on your own images with one line of code using Replicate's API.

inference fine-tuning api

Watchlist Matched: generation, fine-tuning, lora, api

Replicate · inference-infra · 2024-08-01

Run FLUX with an API

Score 6

FLUX.1 is a new text-to-image model from Black Forest Labs, the creators of Stable Diffusion, that exceeds the capabilities of previous open-source models.

model-release api open-source

Watchlist Matched: model, api, open-source

Replicate · inference-infra · 2023-11-23

How to run Yi chat models with an API

Score 6

The Yi series models are large language models trained from scratch by developers at 01.AI. Learn how to run them in the cloud with one line of code.

cloud api

Watchlist Matched: cloud, api

Replicate · inference-infra · 2023-08-14

Streaming output for language models

Score 0

Our API now supports server-sent event streams for language models. Learn how to use them to make your apps more responsive.

api

Watchlist Matched: api

Replicate · inference-infra · 2023-08-08

Fine-tune SDXL with your own images

Score 0

We’ve added fine-tuning (Dreambooth, Textual Inversion and LoRA) support to SDXL 1.0. You can train SDXL on your own images with one line of code using the Replicate API.

fine-tuning api

Watchlist Matched: fine-tuning, lora, api

Replicate · inference-infra · 2023-07-26

Run SDXL with an API

Score 0

How to run Stable Diffusion XL 1.0 using the Replicate API

api

Watchlist Matched: api

Replicate · inference-infra · 2022-08-29

Run Stable Diffusion with an API

Score 0

How to use Replicate to integrate Stable Diffusion into hacks, apps, and projects

api

Watchlist Matched: api