api

High signal Matched: bedrock, api

AWS Machine Learning Blog · cloud · 2026-05-29

Streamline external access to Amazon SageMaker MLflow using a REST API proxy

Score 11

In this post, we demonstrate how to build a secure Flask-based MLflow proxy service that provides HTTPS access to Amazon SageMaker MLflow without requiring the MLflow SDK. This solution is for organizations undergoing cloud transformation...

cloud api

High signal Matched: cloud, sagemaker, api, sdk

vLLM Project · open-source · 2026-05-28

From Text to Multimodal Routing: Hardening Vision Signals in vLLM Semantic Router

Score 19

Most routing systems start with a prompt and choose a model endpoint. vLLM Semantic Router (VSR) makes a different bet: before a request reaches the serving model, the system should extract...

inference serving moe model-release api

High signal Matched: serving, endpoint, router, model

Lambda · cloud · 2026-05-21

Lambda Bare Metal Instances: full hardware control with API-driven operations

Score 8

The unit of AI compute has shifted from single hosts to rack-scale systems that integrate NVIDIA GPUs, CPUs, scale-up networking fabrics, and liquid cooling, such as the NVIDIA GB300 NVL72 and NVIDIA Vera Rubin NVL72. Teams at the frontier...

inference serving benchmark cloud training api

inference serving kv-cache api

High signal Matched: serving, performance, cloud, training, api

LMCache · open-source · 2026-05-21

OpenAI API Is the New IPv4

Score 10

A new system stack is quietly taking shape around LLM serving. What makes it interesting is not just how quickly it is evolving, but how familiar the shape of that evolution looks if you’ve spent time studying large-scale systems like the...

inference serving benchmark model-release research api

High signal Matched: serving, lmcache, api

Together AI · inference-infra · 2026-05-15

Together AI and Pearl Research Labs Team Up to Reduce the Cost of AI Inference

Score 24

Together AI partners with Pearl Research Labs to launch a discounted Pearl-powered inference endpoint for Gemma-4-31B-it-pearl, using Proof of Useful Work to turn AI workloads into crypto emissions.

inference serving kernel speculative-decoding moe benchmark hardware model-release research quantization evals agents api

High signal Matched: inference, endpoint, cost, launch, research

Nota AI · korea · 2026-05-11

[NetsPresso® x AI Agents] Easier to Use, Even More Powerful

Score 52

  Jaehoon Lee Technical Content Manager, Nota AI   NetsPresso® now embraces AI agents. An easy-to-use interface sits on top of the validated pipeline that handles everything from model compression to device deployment.When a user...

inference serving kernel hardware long-context api

High signal Matched: inference, endpoint, kernel, verification, moe, benchmark, latency, cost, gpu, release, model, evaluation, quantization, quantized, int4, evaluate, benchmarks, swe-bench, mmlu, agent, agents, api

Together AI · inference-infra · 2026-05-11

Serving DeepSeek-V4: why million-token context is an inference systems problem

Score 22

DeepSeek-V4 makes million-token context a serving-systems problem. Together AI explores the inference work behind V4 on NVIDIA HGX B200, including compressed KV layouts, prefix caching, kernel maturity, and endpoint profiles for long-conte...

High signal Matched: inference, serving, endpoint, kernel, b200, long-context

Nota AI · korea · 2026-04-22

[Deep Dive: NetsPresso®] From Quantization to Graph Optimization: A Step-by-Step Model Deployment Pipeline

Score 54

  Jaehoon Lee Technical Content Manager, Nota AI   Series Notice: NetsPresso® Technical Blog, Part 2In Part 1, we walked through a scenario of deploying Llama 3.2 1B on an edge device to illustrate the NetsPresso® workflow. The f...

inference kernel cuda benchmark hardware model-release research korea training quantization evals api open-source

High signal Matched: inference, kernel, cuda, matmul, benchmark, performance, latency, cost, npu, model, weights, paper, research, evaluation, furiosa, training, quantization, int8, int4, awq, gptq, sdk, open-source

vLLM Project · open-source · 2026-03-24

Model Runner V2: A Modular and Faster Core for vLLM

Score 12

We are excited to announce Model Runner V2 (MRV2), a ground-up re-implementation of the vLLM model runner. MRV2 delivers a cleaner, more modular, and more efficient execution core—with no API...

High signal Matched: model, api

Modal · inference-infra · 2026-03-04

Product Updates: Directory Snapshots, GLM-5, billing updates and more

Score 8

A roundup of everything we shipped in February: Directory Snapshots for Sandboxes, a free GLM-5 endpoint, new billing API, and more.

serving api

inference model-release fine-tuning rag api

High signal Matched: endpoint, api

AIBrix · open-source · 2026-03-03

AIBrix v0.6.0 Release: Envoy Sidecar, Mixed LLM Workloads Routing, Routing Profiles, LoRA Delivery & New APIs

Score 28

🚀 AIBrix v0.6.0 Release Today we’re excited to announce AIBrix v0.6.0, a release that expands how you deploy and route inference traffic. Key highlights include: Envoy Sidecar Support – Run Envoy alongside the gateway-plugin without...

inference model-release api

High signal Matched: inference, prefill, release, model, lora, rerank, api, openai-compatible

vLLM Project · open-source · 2026-01-31

Streaming Requests & Realtime API in vLLM

Score 12

Large language model inference has traditionally operated on a simple premise: the user submits a complete prompt (request), the model processes it, and returns a response (either streaming or at...

High signal Matched: inference, model, api

Hugging Face · open-source · 2025-11-20

Introducing AnyLanguageModel: One API for Local and Remote LLMs on Apple Platforms

Score 10

No feed summary available yet.

inference benchmark model-release research evals api

High signal Matched: introducing, api

AIBrix · open-source · 2025-11-10

AIBrix v0.5.0 Release: Batch API, KVCache v1 Connector, and Enhanced P/D orchestration

Score 22

🚀 AIBrix v0.5.0 Release Today, we’re excited to announce AIBrix v0.5.0, a release that pushes AIBrix closer to a batteries-included control plane for modern LLM workloads. This release introduces an OpenAI-compatible Batch API for hi...

inference model-release api

High signal Matched: prefill, latency, release, evaluation, api, openai-compatible

Together AI · inference-infra · 2025-10-21

Expanding Together AI Model Library into multimedia generation with 40+ new image and video models

Score 16

Together AI adds 40+ image & video models, including Sora 2 and Veo 3, to build end-to-end multimodal apps with unified OpenAI-compatible APIs and transparent pricing.

High signal Matched: generation, model, openai-compatible

SqueezeBits · korea · 2025-10-02

Yetter, the GenAI API service: AI Optimization, Out of the Box

Score 14

Meet 'Yetter': the generative AI API service built for speed, efficiency, and scalability. Powered by our optimization inference engine, it delivers reliable image, video, and future LLM services at a fraction of the cost.

inference benchmark api

High signal Matched: inference, cost, api

Replicate · inference-infra · 2025-09-17

Introducing our new search API

Score 8

Find the best models and collections with a single API call.

inference benchmark model-release api

High signal Matched: introducing, api

Together AI · inference-infra · 2025-09-15

Improved Batch Inference API: Enhanced UI, Expanded Model Support, and 3000× Rate Limit Increase

Score 18

Our new Batch Inference API makes large-scale AI workloads simpler, faster, and cheaper. With a streamlined UI, universal model support, and 3000× higher rate limits—now up to 30B tokens—you can process massive datasets at half the cost of...

High signal Matched: inference, cost, model, api

SqueezeBits · korea · 2025-07-01

Bringing NPUs into Production: Our Journey with Intel Gaudi

Score 8

SqueezeBits has partnered with Intel to make Gaudi NPUs more usable in practice. We optimized LLMs and diffusion models for Gaudi-2 and created yetter, a generative AI API service.

benchmark model-release api

High signal Matched: api

Together AI · inference-infra · 2025-06-11

Introducing the Together AI Batch API: Process Thousands of LLM Requests at 50% Lower Cost

Score 16

No feed summary available yet.

High signal Matched: cost, introducing, api

llm-d · open-source · 2025-06-03

llm-d Week 1 Project News Round-Up

Score 12

llm-d hits 1000 GitHub stars! Week 1-2 round-up covers KVTransfer Protocol, InferenceModel API updates, and community resources for LLM inference developers.

benchmark model-release research training fine-tuning evals rag api frontier-model

High signal Matched: inference, api

BAIR · research · 2025-04-11

Defending against Prompt Injection with Structured Queries (StruQ) and Preference Optimization (SecAlign)

Score 10

Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated applications. However, as LLMs have improved, so have the attacks against them. Prompt injection attack is listed as the #1 threat by OWASP to LLM-integrated ap...

inference model-release cloud api open-source

High signal Matched: cost, model, evaluation, training, dpo, fine-tuning, retrieval, api, sota

Replicate · inference-infra · 2025-03-05

Wan2.1: generate videos with an API

Score 10

Wan2.1 is the most capable open-source video generation model, producing coherent and high-quality outputs. Learn how to run it in the cloud with a single line of code.

High signal Matched: generation, model, cloud, api, open-source

Replicate · inference-infra · 2024-10-22

Ideogram v2 is an outstanding new inpainting model

Score 8

We've partnered with Ideogram to bring their inpainting model to Replicate's API.

High signal Matched: model, api

Replicate · inference-infra · 2024-07-23

Run Meta Llama 3.1 405B with an API

Score 8

Llama 3.1 405B: is the most powerful open-source language model from Meta. Learn how to run it in the cloud with one line of code.

inference model-release api

High signal Matched: model, cloud, api, open-source

Replicate · inference-infra · 2024-06-14

Push a custom version of Stable Diffusion 3

Score 8

Create your own custom version of Stability's latest image generation model and run it on Replicate via the web or API.

High signal Matched: generation, model, api

Replicate · inference-infra · 2024-06-12

Run Stable Diffusion 3 with an API

Score 8

Stable Diffusion 3 is the latest text-to-image model from Stability, with improved image quality, typography, prompt understanding, and resource efficiency. Learn how to run it in the cloud with one line of code.

model-release cloud api

High signal Matched: model, cloud, api

Replicate · inference-infra · 2024-04-23

Run Snowflake Arctic with an API

Score 8

Arctic is a new open-source language model from Snowflake. Learn how to run it in the cloud with one line of code.

High signal Matched: model, cloud, api, open-source

Replicate · inference-infra · 2024-04-18

Run Meta Llama 3 with an API

Score 8

Llama 3 is the latest language model from Meta. Learn how to run it in the cloud with one line of code.

model-release cloud api

inference cloud api open-source

High signal Matched: model, cloud, api

Replicate · inference-infra · 2024-01-30

Run Code Llama 70B with an API

Score 8

Code Llama 70B is one of the powerful open-source code generation models. Learn how to run it in the cloud with one line of code.

benchmark model-release api open-source

High signal Matched: generation, cloud, api, open-source

Replicate · inference-infra · 2023-11-10

Using open-source models for faster and cheaper text embeddings

Score 10

An interactive example showing how to embed text using a state-of-the-art embedding model that beats OpenAI's embeddings API on price and performance.

High signal Matched: performance, model, api, open-source

Replicate · inference-infra · 2023-10-06

How to run Mistral 7B with an API

Score 8

Mistral 7B is an open-source large language model. Learn what it's good at and how to run it in the cloud with one line of code.

High signal Matched: model, cloud, api, open-source

Hugging Face · open-source · 2023-10-02

Deploying the AI Comic Factory using the Inference API

Score 10

No feed summary available yet.

High signal Matched: inference, api

Replicate · inference-infra · 2023-07-27

Run Llama 2 with an API

Score 8

Llama 2 is the first open source language model of the same caliber as OpenAI’s models. Learn how to run it in the cloud with one line of code.

High signal Matched: model, cloud, api, open source

Replicate · inference-infra · 2022-11-21

Train and deploy a DreamBooth model on Replicate

Score 10

With just a handful of images and a single API call, you can train a model, publish it to Replicate, and run predictions on it in the cloud.

model-release cloud api

High signal Matched: model, cloud, api

Hugging Face · open-source · 2021-06-03

Few-shot learning in practice: GPT-Neo and the 🤗 Accelerated Inference API

Score 10

No feed summary available yet.

High signal Matched: inference, api

Hugging Face · open-source · 2021-01-18

How we sped up transformer inference 100x for 🤗 API customers

Score 10

No feed summary available yet.

High signal Matched: inference, api

Vast.ai · cloud · 2026-06-03

Python SDK

Score 0

No feed summary available yet.

Watchlist Matched: sdk

Vast.ai · cloud · 2026-06-03

Score 0

No feed summary available yet.

Watchlist Matched: api

FriendliAI · inference-infra · 2026-06-03

Gemma-4-31B-it API on FriendliAI: #1 Output Speed & Response Time Gemma

Score 6

No feed summary available yet.

Watchlist Matched: api

Fireworks AI · inference-infra · 2026-06-03

The Best 8 LLM API Providers in 2026

Score 6

No feed summary available yet.

Watchlist Matched: api

Moonshot AI Kimi · model-lab · 2026-06-03

OpenClaw and OpenClaw API: The Complete Guide

Score 5

No feed summary available yet.

Watchlist Matched: api

Mistral AI · model-lab · 2026-06-03

API pricing

Score 5

No feed summary available yet.

Watchlist Matched: api

xAI · model-lab · 2026-06-03

API Console

Score 5

No feed summary available yet.

Watchlist Matched: api

xAI · model-lab · 2026-06-03

May 29, 2026Grok Build 0.1 on API

Score 5

No feed summary available yet.

Watchlist Matched: api

Groq · hardware · 2026-06-03

Free API key

Score 4

No feed summary available yet.

Watchlist Matched: api

Cloudflare Blog · cloud · 2026-05-22

Announcing Claude Compliance API support with Cloudflare CASB

Score 0

Cloudflare now integrates with the Claude Compliance API, so that security teams can monitor Claude Enterprise activity directly in the Cloudflare Dashboard.

Watchlist Matched: api

Cloudflare Blog · cloud · 2026-04-30

Agents can now create Cloudflare accounts, buy domains, and deploy

Score 0

Starting today, agents can now be Cloudflare customers. They can create a Cloudflare account, start a paid subscription, register a domain, and get back an API token to deploy code right away. Humans can be in the loop to grant permission,...

agents api

Watchlist Matched: agents, api

Modal · inference-infra · 2026-04-15

Building with Modal and the OpenAI Agents SDK

Score 1

Modal is an official sandbox provider for the OpenAI Agents SDK.

agents api

Watchlist Matched: agents, sdk

Modal · inference-infra · 2026-04-07

Product Updates: RTX Pro 6000 Blackwell, Command K, Sandbox FS API and more

Score 2

Product updates, community highlights, and upcoming events.

hardware api

Watchlist Matched: blackwell, api

Together AI · inference-infra · 2025-12-12

Announcing Together Python SDK v2.0

Score 3

No feed summary available yet.

Watchlist Matched: sdk

Together AI · inference-infra · 2025-05-20

Together Code Interpreter: execute LLM-generated code seamlessly with a simple API call

Score 3

No feed summary available yet.

Watchlist Matched: api

Replicate · inference-infra · 2025-05-06

Run MiniMax Speech-02 models with an API

Score 0

MiniMax's Speech-02 models give you high-quality text-to-speech with voice cloning, emotional expression, and multilingual support.

Watchlist Matched: api

Replicate · inference-infra · 2024-10-22

Stable Diffusion 3.5 is here

Score 6

Stability AI's latest text-to-image model is now available on Replicate and you can run it with an API.

Watchlist Matched: model, api

Replicate · inference-infra · 2024-09-09

Fine-tune FLUX.1 with an API

Score 0

Create and run your own fine-tuned Flux models programmatically using Replicate's HTTP API.

inference fine-tuning api

Watchlist Matched: api

Replicate · inference-infra · 2024-08-15

Fine-tune FLUX.1 with your own images

Score 6

We've added fine-tuning (LoRA) support to FLUX.1 image generation models. You can train FLUX.1 on your own images with one line of code using Replicate's API.

model-release api open-source

Watchlist Matched: generation, fine-tuning, lora, api

Replicate · inference-infra · 2024-08-01

Run FLUX with an API

Score 6

FLUX.1 is a new text-to-image model from Black Forest Labs, the creators of Stable Diffusion, that exceeds the capabilities of previous open-source models.

Watchlist Matched: model, api, open-source

Replicate · inference-infra · 2024-07-26

Replicate Intelligence #8

Score 6

A top-tier open-ish language model, new safety classifiers, model search API

Watchlist Matched: model, api

SkyPilot · open-source · 2024-06-04

SkyPilot 0.6: Managed Jobs API, SkyServe on Kubernetes, Spot + On-demand mixing, Paperspace support

Score 1

Announcing SkyPilot 0.6.

Watchlist Matched: api

Hugging Face · open-source · 2024-04-04

Text2SQL using Hugging Face Dataset Viewer API and Motherduck DuckDB-NSQL-7B

Score 1

No feed summary available yet.

Watchlist Matched: api

Hugging Face · open-source · 2024-02-08

From OpenAI to Open LLMs with Messages API on Hugging Face

Score 1

No feed summary available yet.

fine-tuning api open-source

Watchlist Matched: api

Replicate · inference-infra · 2023-12-06

Clone your voice using open-source models

Score 0

We’ve added fine-tuning for realistic voice cloning (RVC). You can train RVC on your own dataset from a YouTube video with a few lines of code using Replicate's API.

Watchlist Matched: fine-tuning, api, open-source

Replicate · inference-infra · 2023-11-23

How to run Yi chat models with an API

Score 6

The Yi series models are large language models trained from scratch by developers at 01.AI. Learn how to run them in the cloud with one line of code.

cloud api

Watchlist Matched: cloud, api

Replicate · inference-infra · 2023-08-14

Streaming output for language models

Score 0

Our API now supports server-sent event streams for language models. Learn how to use them to make your apps more responsive.

Watchlist Matched: api

Replicate · inference-infra · 2023-08-08

Fine-tune SDXL with your own images

Score 0

We’ve added fine-tuning (Dreambooth, Textual Inversion and LoRA) support to SDXL 1.0. You can train SDXL on your own images with one line of code using the Replicate API.

fine-tuning api