FuriosaAI · hardware · 2026-06-03
Furiosa SDK 2026.2: Boosting RNGD throughput and accelerating deployments
No feed summary available yet.
High signal Matched: throughput, furiosa, sdk
FuriosaAI · hardware · 2026-06-03
No feed summary available yet.
High signal Matched: furiosa, sdk
AWS Machine Learning Blog · cloud · 2026-06-03
In this post, we'll walk through implementing object detection with Amazon Nova 2 Lite. You'll learn how to deploy an object detection application using Amazon Bedrock, AWS Lambda, and Amazon API Gateway. You'll also learn how to craft eff...
High signal Matched: bedrock, api
AWS Machine Learning Blog · cloud · 2026-05-29
In this post, we demonstrate how to build a secure Flask-based MLflow proxy service that provides HTTPS access to Amazon SageMaker MLflow without requiring the MLflow SDK. This solution is for organizations undergoing cloud transformation...
High signal Matched: cloud, sagemaker, api, sdk
vLLM Project · open-source · 2026-05-28
Most routing systems start with a prompt and choose a model endpoint. vLLM Semantic Router (VSR) makes a different bet: before a request reaches the serving model, the system should extract...
High signal Matched: serving, endpoint, router, model
Lambda · cloud · 2026-05-21
The unit of AI compute has shifted from single hosts to rack-scale systems that integrate NVIDIA GPUs, CPUs, scale-up networking fabrics, and liquid cooling, such as the NVIDIA GB300 NVL72 and NVIDIA Vera Rubin NVL72. Teams at the frontier...
High signal Matched: serving, performance, cloud, training, api
LMCache · open-source · 2026-05-21
A new system stack is quietly taking shape around LLM serving. What makes it interesting is not just how quickly it is evolving, but how familiar the shape of that evolution looks if you’ve spent time studying large-scale systems like the...
High signal Matched: serving, lmcache, api
Together AI · inference-infra · 2026-05-15
Together AI partners with Pearl Research Labs to launch a discounted Pearl-powered inference endpoint for Gemma-4-31B-it-pearl, using Proof of Useful Work to turn AI workloads into crypto emissions.
High signal Matched: inference, endpoint, cost, launch, research
Nota AI · korea · 2026-05-11
Jaehoon Lee, Technical Content Manager, Nota AI. NetsPresso® now embraces AI agents. An easy-to-use interface sits on top of the validated pipeline that handles everything from model compression to device deployment. When a user...
High signal Matched: inference, endpoint, kernel, verification, moe, benchmark, latency, cost, gpu, release, model, evaluation, quantization, quantized, int4, evaluate, benchmarks, swe-bench, mmlu, agent, agents, api
Together AI · inference-infra · 2026-05-11
DeepSeek-V4 makes million-token context a serving-systems problem. Together AI explores the inference work behind V4 on NVIDIA HGX B200, including compressed KV layouts, prefix caching, kernel maturity, and endpoint profiles for long-conte...
High signal Matched: inference, serving, endpoint, kernel, b200, long-context
Nota AI · korea · 2026-04-22
Jaehoon Lee, Technical Content Manager, Nota AI. Series Notice: NetsPresso® Technical Blog, Part 2. In Part 1, we walked through a scenario of deploying Llama 3.2 1B on an edge device to illustrate the NetsPresso® workflow. The f...
High signal Matched: inference, kernel, cuda, matmul, benchmark, performance, latency, cost, npu, model, weights, paper, research, evaluation, furiosa, training, quantization, int8, int4, awq, gptq, sdk, open-source
vLLM Project · open-source · 2026-03-24
We are excited to announce Model Runner V2 (MRV2), a ground-up re-implementation of the vLLM model runner. MRV2 delivers a cleaner, more modular, and more efficient execution core—with no API...
High signal Matched: model, api
Modal · inference-infra · 2026-03-04
A roundup of everything we shipped in February: Directory Snapshots for Sandboxes, a free GLM-5 endpoint, new billing API, and more.
High signal Matched: endpoint, api
AIBrix · open-source · 2026-03-03
🚀 AIBrix v0.6.0 Release. Today we’re excited to announce AIBrix v0.6.0, a release that expands how you deploy and route inference traffic. Key highlights include: Envoy Sidecar Support – Run Envoy alongside the gateway-plugin without...
High signal Matched: inference, prefill, release, model, lora, rerank, api, openai-compatible
vLLM Project · open-source · 2026-01-31
Large language model inference has traditionally operated on a simple premise: the user submits a complete prompt (request), the model processes it, and returns a response (either streaming or at...
High signal Matched: inference, model, api
Hugging Face · open-source · 2025-11-20
No feed summary available yet.
High signal Matched: introducing, api
AIBrix · open-source · 2025-11-10
🚀 AIBrix v0.5.0 Release. Today, we’re excited to announce AIBrix v0.5.0, a release that pushes AIBrix closer to a batteries-included control plane for modern LLM workloads. This release introduces an OpenAI-compatible Batch API for hi...
High signal Matched: prefill, latency, release, evaluation, api, openai-compatible
Together AI · inference-infra · 2025-10-21
Together AI adds 40+ image & video models, including Sora 2 and Veo 3, to build end-to-end multimodal apps with unified OpenAI-compatible APIs and transparent pricing.
High signal Matched: generation, model, openai-compatible
SqueezeBits · korea · 2025-10-02
Meet 'Yetter': the generative AI API service built for speed, efficiency, and scalability. Powered by our optimization inference engine, it delivers reliable image, video, and future LLM services at a fraction of the cost.
High signal Matched: inference, cost, api
Replicate · inference-infra · 2025-09-17
Find the best models and collections with a single API call.
High signal Matched: introducing, api
Together AI · inference-infra · 2025-09-15
Our new Batch Inference API makes large-scale AI workloads simpler, faster, and cheaper. With a streamlined UI, universal model support, and 3000× higher rate limits—now up to 30B tokens—you can process massive datasets at half the cost of...
High signal Matched: inference, cost, model, api
SqueezeBits · korea · 2025-07-01
SqueezeBits has partnered with Intel to make Gaudi NPUs more usable in practice. We optimized LLMs and diffusion models for Gaudi-2 and created yetter, a generative AI API service.
High signal Matched: api
Together AI · inference-infra · 2025-06-11
No feed summary available yet.
High signal Matched: cost, introducing, api
llm-d · open-source · 2025-06-03
llm-d hits 1000 GitHub stars! Week 1-2 round-up covers KVTransfer Protocol, InferenceModel API updates, and community resources for LLM inference developers.
High signal Matched: inference, api
BAIR · research · 2025-04-11
Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated applications. However, as LLMs have improved, so have the attacks against them. Prompt injection is listed by OWASP as the #1 threat to LLM-integrated ap...
High signal Matched: cost, model, evaluation, training, dpo, fine-tuning, retrieval, api, sota
Replicate · inference-infra · 2025-03-05
Wan2.1 is the most capable open-source video generation model, producing coherent and high-quality outputs. Learn how to run it in the cloud with a single line of code.
High signal Matched: generation, model, cloud, api, open-source
Replicate · inference-infra · 2024-10-22
We've partnered with Ideogram to bring their inpainting model to Replicate's API.
High signal Matched: model, api
Replicate · inference-infra · 2024-07-23
Llama 3.1 405B is the most powerful open-source language model from Meta. Learn how to run it in the cloud with one line of code.
High signal Matched: model, cloud, api, open-source
Replicate · inference-infra · 2024-06-14
Create your own custom version of Stability's latest image generation model and run it on Replicate via the web or API.
High signal Matched: generation, model, api
Replicate · inference-infra · 2024-06-12
Stable Diffusion 3 is the latest text-to-image model from Stability, with improved image quality, typography, prompt understanding, and resource efficiency. Learn how to run it in the cloud with one line of code.
High signal Matched: model, cloud, api
Replicate · inference-infra · 2024-04-23
Arctic is a new open-source language model from Snowflake. Learn how to run it in the cloud with one line of code.
High signal Matched: model, cloud, api, open-source
Replicate · inference-infra · 2024-04-18
Llama 3 is the latest language model from Meta. Learn how to run it in the cloud with one line of code.
High signal Matched: model, cloud, api
Replicate · inference-infra · 2024-01-30
Code Llama 70B is one of the most powerful open-source code generation models. Learn how to run it in the cloud with one line of code.
High signal Matched: generation, cloud, api, open-source
Replicate · inference-infra · 2023-11-10
An interactive example showing how to embed text using a state-of-the-art embedding model that beats OpenAI's embeddings API on price and performance.
High signal Matched: performance, model, api, open-source
Replicate · inference-infra · 2023-10-06
Mistral 7B is an open-source large language model. Learn what it's good at and how to run it in the cloud with one line of code.
High signal Matched: model, cloud, api, open-source
Hugging Face · open-source · 2023-10-02
No feed summary available yet.
High signal Matched: inference, api
Replicate · inference-infra · 2023-07-27
Llama 2 is the first open source language model of the same caliber as OpenAI’s models. Learn how to run it in the cloud with one line of code.
High signal Matched: model, cloud, api, open source
Replicate · inference-infra · 2022-11-21
With just a handful of images and a single API call, you can train a model, publish it to Replicate, and run predictions on it in the cloud.
High signal Matched: model, cloud, api
Hugging Face · open-source · 2021-06-03
No feed summary available yet.
High signal Matched: inference, api
Hugging Face · open-source · 2021-01-18
No feed summary available yet.
High signal Matched: inference, api
Vast.ai · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: sdk
Vast.ai · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: api
FriendliAI · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: api
Fireworks AI · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: api
Moonshot AI Kimi · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: api
Mistral AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: api
xAI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: api
Groq · hardware · 2026-06-03
No feed summary available yet.
Watchlist Matched: api
Cloudflare Blog · cloud · 2026-05-22
Cloudflare now integrates with the Claude Compliance API, so that security teams can monitor Claude Enterprise activity directly in the Cloudflare Dashboard.
Watchlist Matched: api
Cloudflare Blog · cloud · 2026-04-30
Starting today, agents can now be Cloudflare customers. They can create a Cloudflare account, start a paid subscription, register a domain, and get back an API token to deploy code right away. Humans can be in the loop to grant permission,...
Watchlist Matched: agents, api
Modal · inference-infra · 2026-04-15
Modal is an official sandbox provider for the OpenAI Agents SDK.
Watchlist Matched: agents, sdk
Modal · inference-infra · 2026-04-07
Product updates, community highlights, and upcoming events.
Watchlist Matched: blackwell, api
Together AI · inference-infra · 2025-12-12
No feed summary available yet.
Watchlist Matched: sdk
Together AI · inference-infra · 2025-05-20
No feed summary available yet.
Watchlist Matched: api
Replicate · inference-infra · 2025-05-06
MiniMax's Speech-02 models give you high-quality text-to-speech with voice cloning, emotional expression, and multilingual support.
Watchlist Matched: api
Replicate · inference-infra · 2024-10-22
Stability AI's latest text-to-image model is now available on Replicate and you can run it with an API.
Watchlist Matched: model, api
Replicate · inference-infra · 2024-09-09
Create and run your own fine-tuned Flux models programmatically using Replicate's HTTP API.
Watchlist Matched: api
Replicate · inference-infra · 2024-08-15
We've added fine-tuning (LoRA) support to FLUX.1 image generation models. You can train FLUX.1 on your own images with one line of code using Replicate's API.
Watchlist Matched: generation, fine-tuning, lora, api
Replicate · inference-infra · 2024-08-01
FLUX.1 is a new text-to-image model from Black Forest Labs, the creators of Stable Diffusion, that exceeds the capabilities of previous open-source models.
Watchlist Matched: model, api, open-source
Replicate · inference-infra · 2024-07-26
A top-tier open-ish language model, new safety classifiers, model search API
Watchlist Matched: model, api
SkyPilot · open-source · 2024-06-04
Announcing SkyPilot 0.6.
Watchlist Matched: api
Hugging Face · open-source · 2024-04-04
No feed summary available yet.
Watchlist Matched: api
Hugging Face · open-source · 2024-02-08
No feed summary available yet.
Watchlist Matched: api
Replicate · inference-infra · 2023-12-06
We’ve added fine-tuning for realistic voice cloning (RVC). You can train RVC on your own dataset from a YouTube video with a few lines of code using Replicate's API.
Watchlist Matched: fine-tuning, api, open-source
Replicate · inference-infra · 2023-11-23
The Yi series models are large language models trained from scratch by developers at 01.AI. Learn how to run them in the cloud with one line of code.
Watchlist Matched: cloud, api
Replicate · inference-infra · 2023-08-14
Our API now supports server-sent event streams for language models. Learn how to use them to make your apps more responsive.
Watchlist Matched: api
Replicate · inference-infra · 2023-08-08
We’ve added fine-tuning (Dreambooth, Textual Inversion and LoRA) support to SDXL 1.0. You can train SDXL on your own images with one line of code using the Replicate API.
Watchlist Matched: fine-tuning, lora, api
Replicate · inference-infra · 2023-07-26
How to run Stable Diffusion XL 1.0 using the Replicate API
Watchlist Matched: api
Hugging Face · open-source · 2023-05-01
No feed summary available yet.
Watchlist Matched: api
Replicate · inference-infra · 2022-08-29
How to use Replicate to integrate Stable Diffusion into hacks, apps, and projects
Watchlist Matched: api
Replicate · inference-infra · 2022-07-18
The basics of using the API to create your own images from text.
Watchlist Matched: api