MLSys Radar

Searchable long-term record

Archive

Every normalized post remains available for filtering and historical lookup.

Vast.ai · cloud · 2026-06-03

GPU Cloud

Score 13

No feed summary available yet.

hardware cloud

Open

High signal Matched: gpu, cloud

CoreWeave · cloud · 2026-06-03

GPU compute

Score 11

No feed summary available yet.

hardware

Open

High signal Matched: gpu

Nebius · cloud · 2026-06-03

AI Cloud

Score 9

No feed summary available yet.

cloud

Open

High signal Matched: cloud

Crusoe · cloud · 2026-06-03

Cloud Overview

Score 9

No feed summary available yet.

cloud

Open

High signal Matched: cloud

Crusoe · cloud · 2026-06-03

NVIDIA GB200

Score 9

No feed summary available yet.

hardware

Open

High signal Matched: gb200

Crusoe · cloud · 2026-06-03

NVIDIA B200

Score 9

No feed summary available yet.

hardware

Open

High signal Matched: b200

Crusoe · cloud · 2026-06-03

NVIDIA H200

Score 9

No feed summary available yet.

hardware

Open

High signal Matched: h200

Crusoe · cloud · 2026-06-03

NVIDIA H100

Score 9

No feed summary available yet.

hardware

Open

High signal Matched: h100

Crusoe · cloud · 2026-06-03

AMD MI300X

Score 9

No feed summary available yet.

hardware

Open

High signal Matched: mi300x

Crusoe · cloud · 2026-06-03

Cloud Partners

Score 9

No feed summary available yet.

cloud

Open

High signal Matched: cloud

FriendliAI · inference-infra · 2026-06-03

Model APIs

Score 15

No feed summary available yet.

model-release

Open

High signal Matched: model

FuriosaAI · hardware · 2026-06-03

Furiosa SDK

Score 15

No feed summary available yet.

korea api

Open

High signal Matched: furiosa, sdk

Anthropic · model-lab · 2026-06-03

Research

Score 14

No feed summary available yet.

research

Open

High signal Matched: research

NAVER D2 · korea · 2026-06-03

naver D2

Score 11

No feed summary available yet.

korea

Open

High signal Matched: naver

NAVER D2 · korea · 2026-06-03

NAVER Developers

Score 11

No feed summary available yet.

korea

Open

High signal Matched: naver

NAVER D2 · korea · 2026-06-03

NAVER Corp.

Score 11

No feed summary available yet.

korea

Open

High signal Matched: naver

Kakao Tech · korea · 2026-06-03

kakao

Score 11

No feed summary available yet.

korea

Open

High signal Matched: kakao

LG AI Research · korea · 2026-06-03

LG AI Research

Score 11

No feed summary available yet.

research

Open

High signal Matched: research

LMCache · open-source · 2026-06-03

A New Chapter for LMCache and the KV Cache Community

Score 17

TL;DR: A key contributor to the LMCache community just secured a major investment. This will greatly accelerate our mission of building the best KV cache library for every developer. Come join us in building the future AI-native data layer...

kv-cache

Open

High signal Matched: kv cache, lmcache

AWS Machine Learning Blog · cloud · 2026-06-03

The art and science of hyperparameter optimization on Amazon Nova Forge

Score 11

Fine-tuning for domain-specific tasks means improving performance in one area without degrading the model’s general capabilities, and getting that balance right is harder than it looks. This post walks through how to navigate that balance,...

benchmark model-release training fine-tuning

Open

High signal Matched: performance, model, training, checkpointing, fine-tuning

AWS Machine Learning Blog · cloud · 2026-06-03

Object detection with Amazon Nova 2 Lite

Score 9

In this post, we'll walk through implementing object detection with Amazon Nova 2 Lite. You'll learn how to deploy an object detection application using Amazon Bedrock, AWS Lambda, and Amazon API Gateway. You'll also learn how to craft eff...

cloud api

Open

High signal Matched: bedrock, api

Lambda · cloud · 2026-06-03

Introducing workspaces for Lambda Cloud

Score 17

Lambda workspaces help teams organize cloud resources, control access, and separate dev, staging, and production in shared GPU environments. A junior researcher kills a production training run. A contractor sees weights they shouldn't. If...

hardware model-release cloud training

Open

High signal Matched: gpu, introducing, weights, cloud, training

AWS Machine Learning Blog · cloud · 2026-06-02

Extending MCP support for Amazon Bedrock AgentCore Gateway

Score 11

While deploying Model Context Protocol (MCP) servers in production, enterprises need fine-grained access control across servers, observability into which teams use which tools, security guarantees against data exfiltration, and centralized...

model-release cloud agents

Open

High signal Matched: model, bedrock, mcp

Lambda · cloud · 2026-06-01

Unbox one of NVIDIA's first co-packaged optics switches with us. See why we bet on CPO early.

Score 15

When we design large GPU clusters, the network is no longer a background system. It's part of the compute envelope. At the 800G and NVIDIA GB300 NVL72 scale, the back-end fabric accounts for 86% of networking power in a three-layer cluster...

inference serving distributed benchmark hardware model-release rag agents

Open

High signal Matched: generation, token generation, throughput, infiniband, gpu, model, retrieval, agentic

Nota AI · korea · 2026-05-29

Full-Stack Optimization for Low-Light Video on Jetson Orin NX: From 400 ms to 28 ms

Score 23

Jaehoon Lee, Technical Content Manager, Nota AI. When enterprises adopt AI, the most common bottleneck is not model development. It is the deployment stage: getting a finished model to run reliably on the actual target device. T...

inference serving benchmark hardware model-release research quantization evals

Open

High signal Matched: inference, throughput, benchmark, performance, latency, cost, gpu, model, evaluation, quantization, int8, benchmarks, leaderboard

AWS Machine Learning Blog · cloud · 2026-05-29

Build a custom portal with embedded Amazon SageMaker AI MLflow Apps

Score 11

In this post, you learn how to build a custom portal with embedded SageMaker AI MLflow Apps UI. You walk through the architecture pattern behind a React front end paired with a Flask reverse proxy that handles AWS Signature Version 4 (SigV...

cloud

Open

High signal Matched: cloud, sagemaker

AWS Machine Learning Blog · cloud · 2026-05-29

Streamline external access to Amazon SageMaker MLflow using a REST API proxy

Score 11

In this post, we demonstrate how to build a secure Flask-based MLflow proxy service that provides HTTPS access to Amazon SageMaker MLflow without requiring the MLflow SDK. This solution is for organizations undergoing cloud transformation...

cloud api

Open

High signal Matched: cloud, sagemaker, api, sdk

AWS Machine Learning Blog · cloud · 2026-05-29

Evaluating Deep Agents using LangSmith on AWS

Score 9

This post combines learnings from LangChain’s work on evaluating deep agents and Anthropic’s guide to demystifying evals for AI agents into a practical guide. In this post, you will learn how to: 1) apply five evaluation patterns for deep...

research cloud evals agents

Open

High signal Matched: evaluation, bedrock, evals, evaluating, agent, agents

AWS Machine Learning Blog · cloud · 2026-05-29

Build a test suite that grows with your agent with dataset management in Amazon Bedrock AgentCore

Score 13

Datasets in AgentCore is in public preview. Agent evaluation is most powerful when you combine fast-moving online signals with stable offline baselines. To understand whether your agent is truly improving over time, you need a fixed benchm...

benchmark research cloud evals agents

Open

High signal Matched: benchmark, evaluation, bedrock, agent

AMD ROCm Blogs · hardware · 2026-05-29

Enabling Speculative Speculative Decoding on MI300X

Score 29

Speculative speculative decoding (SSD) [1] is a recently proposed speculative decoding (SD) algorithm that further accelerates large language model (LLM) inference beyond conventional SD. In standard SD, a small draft model proposes severa...

inference speculative-decoding benchmark hardware model-release

Open

High signal Matched: inference, decoding, speculative decoding, draft model, verification, cost, mi300x, model

vLLM Project · open-source · 2026-05-28

Native RL APIs in vLLM

Score 11

As post-training workloads continue to scale, we've seen widespread adoption of vLLM as the inference engine of choice. However, two issues repeatedly arise:

inference training

Open

High signal Matched: inference, training, post-training

AMD ROCm Blogs · hardware · 2026-05-27

Deep Dive Into 4-Wave Interleave FP8 GEMM

Score 17

Our previous two posts in this GEMM optimization series covered Matrix Core instructions and 8-wave ping-pong FP8 GEMM design. Here we discuss another algorithm design introduced by HipKittens - 4-wave interleave, which further improves th...

kernel benchmark model-release quantization

Open

High signal Matched: gemm, performance, fp8

AMD ROCm Blogs · hardware · 2026-05-25

AI Inference on AMD Ryzen™ AI Max Processor

Score 20

Local large language model (LLM) inference has rapidly evolved, but a persistent limitation remains: model size is constrained by available GPU memory. Discrete GPUs typically offer 8–24 GB of dedicated VRAM, which can limit the size of mo...

inference distributed hardware model-release cloud quantization evals

Open

High signal Matched: inference, multi-gpu, gpu, model, checkpoint, cloud, quantization, evaluate

Lambda · cloud · 2026-05-22

DeepSeek V4: the most expected open-source model ever released, and the quietest landing

Score 18

After 15 months of incremental updates, leaks, and rumored leaks, DeepSeek released version 4. It arrived without the fanfare R1 and R1-preview commanded in early 2025. That quiet reception is the most interesting thing about the release....

inference serving benchmark model-release open-source

Open

High signal Matched: inference, serving, performance, cost, release, model, open-source

SkyPilot · open-source · 2026-05-22

RL Doesn't Work on Slurm

Score 8

Online reinforcement learning for LLMs breaks Slurm's batch scheduling model. We'll discuss why, and what can be done about it.

model-release

Open

High signal Matched: model

AMD ROCm Blogs · hardware · 2026-05-22

From Build to Benchmark: ONNX Model Serving with Triton Inference Server on AMD GPUs

Score 30

Triton Inference Server is an open-source platform designed to streamline AI inferencing. It supports the deployment, scaling, and inference of trained models from multiple frameworks, including ONNX Runtime, TensorFlow, PyTorch, and other...

inference serving kernel triton benchmark model-release cloud open-source

Open

High signal Matched: inference, inferencing, serving, triton, benchmark, model, cloud, open-source

LMCache · open-source · 2026-05-21

OpenAI API Is the New IPv4

Score 10

A new system stack is quietly taking shape around LLM serving. What makes it interesting is not just how quickly it is evolving, but how familiar the shape of that evolution looks if you’ve spent time studying large-scale systems like the...

inference serving kv-cache api

Open

High signal Matched: serving, lmcache, api

Lambda · cloud · 2026-05-20

Lambda’s NVIDIA HGX B200 on STAC-AI™ LANG6

Score 18

What the numbers mean for financial services. Executive summary: Lambda is the first to publish an audited STAC-AI™ LANG6 result on NVIDIA HGX B200, with independently verified performance data that Financial Services Industry (FSI) infrastr...

inference benchmark hardware model-release evals

Open

High signal Matched: inference, generation, performance, gpu, h200, b200, model, evaluating

AMD ROCm Blogs · hardware · 2026-05-20

ROCm 7.13: Expanding Hardware, Tools, and Reach

Score 14

AMD released ROCm Core 7.13, the AMD GPU Driver 31.30, and AMD GPU Virtualization 9.0. With these releases, ROCm software expands hardware support across enterprise datacenters. The platform introduces AMD’s latest Instinct accelerators, e...

benchmark hardware open-source

Open

High signal Matched: performance, gpu, rocm, open-source

PyTorch Foundation · open-source · 2026-05-14

PyTorch 2.12 Release Blog

Score 12

We are excited to announce the release of PyTorch® 2.12 (release notes)! The PyTorch 2.12 release features the following changes: Batched linalg.eigh on CUDA is up to 100x faster due...

kernel cuda model-release

Open

High signal Matched: cuda, release

vLLM Project · open-source · 2026-05-14

Elastic Expert Parallelism in vLLM

Score 16

Expert parallelism (EP) is a key technique for serving Mixture-of-Experts (MoE) models at high throughput. WideEP deployments (where EP spans many workers) maximize KV cache capacity, enabling...

inference serving kv-cache moe benchmark

Open

High signal Matched: serving, throughput, kv cache, moe

Nota AI · korea · 2026-05-11

[NetsPresso® x AI Agents] Easier to Use, Even More Powerful

Score 52

Jaehoon Lee, Technical Content Manager, Nota AI. NetsPresso® now embraces AI agents. An easy-to-use interface sits on top of the validated pipeline that handles everything from model compression to device deployment. When a user...

inference serving kernel speculative-decoding moe benchmark hardware model-release research quantization evals agents api

Open

High signal Matched: inference, endpoint, kernel, verification, moe, benchmark, latency, cost, gpu, release, model, evaluation, quantization, quantized, int4, evaluate, benchmarks, swe-bench, mmlu, agent, agents, api

Together AI · inference-infra · 2026-05-11

Serving DeepSeek-V4: why million-token context is an inference systems problem

Score 22

DeepSeek-V4 makes million-token context a serving-systems problem. Together AI explores the inference work behind V4 on NVIDIA HGX B200, including compressed KV layouts, prefix caching, kernel maturity, and endpoint profiles for long-conte...

inference serving kernel hardware long-context api

Open

High signal Matched: inference, serving, endpoint, kernel, b200, long-context

BAIR · research · 2026-05-08

Adaptive Parallel Reasoning: The Next Paradigm in Efficient Inference Scaling

Score 28


inference serving kv-cache speculative-decoding benchmark model-release research training fine-tuning evals long-context agents frontier-model

Open

High signal Matched: inference, decoding, prefill, generation, serve, throughput, kv cache, verification, performance, latency, cost, model, paper, research, evaluation, training, pretraining, sft, benchmarks, long context, context window, agentic, reasoning model

Together AI · inference-infra · 2026-05-08

Deploy and inference any model from HuggingFace

Score 20

Learn how to deploy any Hugging Face model in one session using Goose and Together's Dedicated Container Inference. Skip the setup complexity — one prompt gets your model running in a production-grade GPU environment on release day.

inference hardware model-release

Open

High signal Matched: inference, gpu, release, model

LMCache · open-source · 2026-05-05

Deepseek V4 explained, and why it matters to your wallet

Score 12

DeepSeek V4 is an open-weight model that offers state-of-the-art intelligence at a potentially much lower token price than its predecessor, DeepSeek V3.2. But how does DeepSeek V4 do that? Prerequisite: attention...

kv-cache model-release

Open

High signal Matched: kv cache, lmcache, model

Nota AI · korea · 2026-04-29

[NVIDIA Nemotron Hackathon] Grand Prize Among 20 Teams: Behind Two Sleepless Days

Score 32

Hancheol Park, Ph.D., AI Research Engineer, NetsPresso Tech, Nota AI; Geonmin Kim, Ph.D., AI Research Engineer, NetsPresso Tech, Nota AI; Geonho Lee, Edge AI Engineer Intern, NetsPresso Tech, Nota AI; Jaehoon Lee, Technical Content Manager,...

inference moe benchmark model-release research korea training fine-tuning quantization evals agents

Open

High signal Matched: generation, moe, performance, model, weights, paper, research, evaluation, korea, korean, seoul, naver, training, fine-tuning, quantization, agent, agents, agentic

Together AI · inference-infra · 2026-04-29

DeepSeek-V4 Pro now available on Together AI

Score 10

DeepSeek-V4 Pro is now available on Together AI with 512K context, controllable reasoning modes, and cached-input pricing for long-context reasoning workloads like code agents, document intelligence, and research synthesis.

research long-context agents

Open

High signal Matched: research, long-context, agents

LMCache · open-source · 2026-04-29

Stop Calling It KV Cache: It’s Something Much Bigger

Score 14

For years, we have referred to one of the most critical components of modern LLM inference as a “KV cache.” That name made sense once. Today, it is increasingly misleading. What began as a small, ephemeral optimization inside a...

inference kv-cache

Open

High signal Matched: inference, kv cache, lmcache

LMCache · open-source · 2026-04-23

LMCache on Amazon SageMaker HyperPod: Accelerating LLM Inference with Managed Tiered KV Cache

Score 30

Overview: Large language model (LLM) inference performance depends heavily on how efficiently the system manages the key-value (KV) cache, the stored attention states that allow the model to avoid recomputing previous tokens. As context length...

inference kv-cache benchmark hardware model-release cloud

Open

High signal Matched: inference, kv cache, lmcache, performance, latency, gpu, model, sagemaker

Nota AI · korea · 2026-04-22

[Deep Dive: NetsPresso®] From Quantization to Graph Optimization: A Step-by-Step Model Deployment Pipeline

Score 54

Jaehoon Lee, Technical Content Manager, Nota AI. Series Notice: NetsPresso® Technical Blog, Part 2. In Part 1, we walked through a scenario of deploying Llama 3.2 1B on an edge device to illustrate the NetsPresso® workflow. The f...

inference kernel cuda benchmark hardware model-release research korea training quantization evals api open-source

Open

High signal Matched: inference, kernel, cuda, matmul, benchmark, performance, latency, cost, npu, model, weights, paper, research, evaluation, furiosa, training, quantization, int8, int4, awq, gptq, sdk, open-source

LMCache · open-source · 2026-04-18

LMCache: A Journey

Score 12

GTC wrapped up a month ago. Our open-source KV cache management library, LMCache, was shown in Jensen Huang’s keynote and spotlighted by NVIDIA SVP Kevin Deierling, and I was invited to speak at the first-ever industry KV cache tutorial...

kv-cache open-source

Open

High signal Matched: kv cache, lmcache, open-source

SqueezeBits · korea · 2026-04-14

Recap: 2nd vLLM Korea Meetup 2026

Score 12

Check out highlights from the 2nd vLLM Korea Meetup: open-source use cases and real-world production examples that showcase vLLM's technical maturity.

korea open-source

Open

High signal Matched: korea, open-source

vLLM Project · open-source · 2026-04-14

vLLM Korea Meetup 2026 Wrap-Up

Score 16

Hosted by the vLLM KR Community, with support from Rebellions, SqueezeBits, Red Hat APAC, and PyTorch Korea, the vLLM Korea Meetup 2026 was held in Seoul on April 2nd.

korea

Open

High signal Matched: korea, seoul, rebellions

Rebellions · hardware · 2026-04-13

2026 vLLM Korea Meetup

Score 14

The vLLM Korea Meetup 2026, hosted by the vLLM KR Community together with Rebellions, SqueezeBits, Red Hat APAC, and PyTorch Korea, was held in Seoul on April 2...

korea

Open

High signal Matched: korea, rebellions

Nota AI · korea · 2026-04-08

[Overview: NetsPresso®] A Platform That Handles Everything from Model Optimization to Target Deployment

Score 36

Jaehoon Lee, Technical Content Manager, Nota AI. AI Model Optimization: Why Models Won't Run on Hardware. The Chip Is Ready, but the Model Won't Deploy. If you have ever tried deploying an AI model onto your own chip, the following...

inference distributed kv-cache speculative-decoding benchmark hardware model-release research quantization evals

Open

High signal Matched: inference, multi-gpu, kv cache, verification, performance, latency, gpu, model, research, evaluation, quantization, quantized, awq, gptq, evaluate

Together AI · inference-infra · 2026-04-07

What is an AI Native Cloud?

Score 12

AI-native companies need infrastructure built for models, not legacy workloads. Learn what defines an AI Native Cloud and why it matters for the next platform shift.

cloud

Open

High signal Matched: cloud

LMCache · open-source · 2026-04-04

LMCache’s New Architecture Boosts MoE Inference Performance by 10×

Score 34

Modern LLM serving workloads are defined by strict latency requirements, high concurrency, and rapidly growing context lengths. Applications such as multi-turn chat, AI agents, and retrieval-augmented generation continuously build on prior...

inference serving kv-cache moe benchmark rag agents

Open

High signal Matched: inference, serving, decoding, generation, throughput, lmcache, moe, performance, latency, ttft, retrieval-augmented generation, retrieval, agents

Rebellions · hardware · 2026-04-02

NPU Server-Based Physical AI: A Water Purification Robot Solution for the United Arab Emirates (UAE)

Score 14

Summary. Challenge: In the Middle East, where the oil and gas industry is well developed, the wastewater and oil that are unavoidably generated during crude oil production must be treated. In particular, reservoirs and...

hardware korea

Open

High signal Matched: npu, rebellions

NVIDIA Technical Blog · hardware · 2026-04-01

CUDA Tile Programming Now Available for BASIC!

Score 12

Note: CUDA Tile Programming in BASIC is an April Fools’ joke, but it's also real and actually works, demonstrating the flexibility of CUDA. CUDA 13.1...

kernel cuda

Open

High signal Matched: cuda

Together AI · inference-infra · 2026-04-01

Inside the Together AI kernels team

Score 16

The team behind FlashAttention and ThunderKittens — how Together AI's kernel researchers close the gap between GPU hardware and production AI.

kernel hardware

Open

High signal Matched: kernel, flashattention, gpu

Nota AI · korea · 2026-03-31

The Real Reason TurboQuant Shook the Market: AI Optimization Has Gone Mainstream

Score 46

Jaehoon Lee, Technical Content Manager, Nota AI. In March, a single official announcement from Google Research rocked trillions of won in the market capitalization of U.S. infrastructure and semiconductor stocks. The catalyst:...

inference serving kv-cache benchmark hardware model-release research training fine-tuning quantization agents frontier-model

Open

High signal Matched: inference, serving, generation, throughput, kv cache, benchmark, performance, cost, b200, blackwell, introducing, model, fp8, research, training, fine-tuning, quantization, quantized, agent, agentic, frontier model

Together AI · inference-infra · 2026-03-31

Aurora

Score 12

1.25x over a well-trained static speculator. Aurora is an open-source RL framework that turns speculative decoding from a one-time offline setup into a self-improving system that learns from every request it serves.

inference speculative-decoding open-source

Open

High signal Matched: decoding, speculative decoding, open-source

vLLM Project · open-source · 2026-03-24

Model Runner V2: A Modular and Faster Core for vLLM

Score 12

We are excited to announce Model Runner V2 (MRV2), a ground-up re-implementation of the vLLM model runner. MRV2 delivers a cleaner, more modular, and more efficient execution core—with no API...

model-release api

Open

High signal Matched: model, api

Nota AI · korea · 2026-03-23

[GTC 2026 Recap] The Trillion-Dollar Inference Race Begins: How Nota AI Fills the Gap

Score 42

Jaehoon Lee, Technical Content Manager, Nota AI. GTC has evolved far beyond a technology conference, drawing attention from global economies and financial markets alike. This year, CEO Jensen Huang took the stage in his tradema...

inference serving kernel cuda kv-cache benchmark hardware model-release research cloud training long-context agents open-source

Open

High signal Matched: inference, prefill, generation, throughput, cuda, kv cache, performance, latency, cost, gpu, npu, launch, model, research, cloud, training, long-context, context window, agent, agents, agentic, open-source

Nota AI · korea · 2026-03-20

GenAI Everywhere: The Future of Edge AI Optimization with the New NetsPresso®

Score 26

NP Product Team, Nota AI. The role of Edge AI is rapidly expanding. Offline voice assistants now carry on conversations in our daily lives, vehicles infer routes in real time, and smartphones generate images without a network c...

inference kv-cache moe benchmark model-release research korea quantization

Open

High signal Matched: inference, kv cache, moe, benchmark, performance, latency, cost, model, research, seoul, quantization

Together AI · inference-infra · 2026-03-17

Mamba-3

Score 10

Meet Mamba-3: the SSM built for inference. Faster than Transformers at decode, stronger than Mamba-2, and open-source from day one.

inference open-source

Open

High signal Matched: inference, open-source

Nota AI · korea · 2026-03-13

NotaMoEQuantization: An MoE-Specific Quantization Method for Solar-Open-100B

Score 62

Hancheol Park, Ph.D., AI Research Engineer, Nota AI; Tairen Piao, AI Research Engineer, Nota AI; Tae-Ho Kim, CTO & Co-Founder, Nota AI. ✔️ Resource: The official quantized model of Solar-Open-100B, which passed the first round of Sout...

inference serving moe benchmark hardware model-release research korea training quantization evals long-context open-source

Open

High signal Matched: inference, serving, prefill, generation, throughput, moe, router, benchmark, performance, latency, ttft, tpot, blackwell, release, model, weights, open model, research, evaluation, korea, korean, upstage, training, post-training, quantization, quantized, int4, evaluate, benchmarks, mmlu, long-context

BAIR · research · 2026-03-13

Identifying Interactions at Scale for LLMs

Score 18

Understanding the behavior of complex machine learning systems, particularly Large Language Models (LLMs), is a critical challenge in modern artificial intelligence. Interpretability research aims to make the decision-making process mo...

inference serving benchmark model-release research training evals long-context rag

Open

High signal Matched: inference, serving, decoding, performance, cost, model, research, training, evaluate, mmlu, long-context, rag

llm-d · open-source · 2026-03-13

Predicted-Latency Based Scheduling for LLMs

Score 18

A lightweight ML model trained online from live traffic replaces manually tuned heuristic weights with direct latency predictions, achieving 43% improvement in P50 end-to-end latency and 70% improvement in TTFT on a production-realistic wo...

benchmark model-release

Open

High signal Matched: latency, ttft, model, weights

Together AI · inference-infra · 2026-03-12

Build real-time voice agents on Together AI

Score 10

Build real-time voice agents on Together AI with co-located STT, LLM, and TTS infrastructure, native Deepgram and Cartesia support, and end-to-end latency under 500ms.

benchmark agents

Open

High signal Matched: latency, agents

Together AI · inference-infra · 2026-03-04

Cache-aware prefill–decode disaggregation (CPD) for up to 40% faster long-context LLM serving

Score 20

Serving long prompts doesn't have to mean slow responses. Learn how Together AI's CPD architecture separates warm and cold inference workloads to deliver 40% higher throughput and dramatically lower time-to-first-token for long-context LLM...

inference serving benchmark long-context

Open

High signal Matched: inference, serving, prefill, throughput, long-context

vLLM Project · open-source · 2026-03-04

vLLM Triton Attention Backend Deep Dive

Score 14

This article is adapted from a Red Hat hosted vLLM Office Hours session with Burkhard Ringlein from IBM Research, featuring a deep technical walkthrough of the vLLM Triton attention backend....

kernel triton research

Open

High signal Matched: triton, research

AIBrix · open-source · 2026-03-03

AIBrix v0.6.0 Release: Envoy Sidecar, Mixed LLM Workloads Routing, Routing Profiles, LoRA Delivery & New APIs

Score 28

Today we’re excited to announce AIBrix v0.6.0, a release that expands how you deploy and route inference traffic. Key highlights include: Envoy Sidecar Support – Run Envoy alongside the gateway-plugin without...

inference model-release fine-tuning rag api

Open

High signal Matched: inference, prefill, release, model, lora, rerank, api, openai-compatible

Together AI · inference-infra · 2026-03-02

Introducing Together AI’s new look

Score 14

We've refreshed our visual identity — designed with Pentagram to express how Together AI connects open-source innovation, systems research, and builders to unlock new possibilities.

model-release research open-source

Open

High signal Matched: introducing, research, open-source

SkyPilot · open-source · 2026-02-27

Don't Run OpenClaw on Your Main Machine

Score 8

OpenClaw gives an AI agent full access to your system. Here's why you should run it on an isolated cloud VM, and how to set that up.

cloud agents

Open

High signal Matched: cloud, agent

Nota AI · korea · 2026-02-26

ERGO: Efficient High-Resolution Visual Understanding for Vision-Language Models

Score 24

Jewon Lee | Wooksu Shin | Seungmin Yang | Ki-Ung Song | Donguk Lim | Jaeyeon Kim | Tae-Ho Kim | Bo-Kyeong Kim, EdgeFM Team, Nota AI. ✔️ Resources for more information: GitHub, arXiv, Project Page, Demo. ✔️ Accepted at ICLR 2026.

inference speculative-decoding benchmark model-release research training evals

Open

High signal Matched: inference, generation, verification, benchmark, performance, latency, cost, model, arxiv, evaluation, training, post-training, benchmarks

Replicate · inference-infra · 2026-02-18

Recraft V4: image generation with design taste

Score 8

Recraft V4 generates art-directed images — and actual editable SVGs — with strong composition, accurate text rendering, and what the Recraft team calls "design taste." Four models are available on Replicate now.

inference

Open

High signal Matched: generation

Together AI · inference-infra · 2026-02-06

What do LLMs think when you don't tell them what to think about?

Score 10

What do language models generate when you don't tell them what to generate? New research reveals that LLM families have distinct 'knowledge priors'—GPT models default to code and math, Llama favors narratives, DeepSeek generates religious...

research

Open

High signal Matched: research

Together AI · inference-infra · 2026-02-02

Fine-tuning open LLM judges to outperform GPT-5.2

Score 14

Fine-tuned open-source LLM judges can outperform GPT-5.2 at evaluating model outputs. Using Direct Preference Optimization on just 5,400 preference pairs, we trained GPT-OSS 120B to beat GPT-5.2 on human preference alignment—at 15x lower c...

inference benchmark model-release fine-tuning evals open-source

Open

High signal Matched: inference, cost, model, fine-tuning, evaluating, open-source, oss

vLLM Project · open-source · 2026-01-31

Streaming Requests & Realtime API in vLLM

Score 12

Large language model inference has traditionally operated on a simple premise: the user submits a complete prompt (request), the model processes it, and returns a response (either streaming or at...

inference model-release api

Open

High signal Matched: inference, model, api

Together AI · inference-infra · 2026-01-26

DSGym: A holistic framework for evaluating and training data science agents

Score 18

Introducing DSGym, a holistic evaluation and training framework for LLM-based data science agents. Features 90+ bioinformatics tasks, 92 Kaggle competitions, and synthetic trajectory generation. Our 4B model achieves state-of-the-art perform...

inference benchmark model-release research training evals agents open-source

Open

High signal Matched: generation, performance, introducing, model, evaluation, training, evaluating, agents, open-source

Together AI · inference-infra · 2026-01-13

Learn how Cursor partnered with Together AI to deliver real-time, low-latency inference at scale

Score 24

Together AI teamed with Cursor to build the real-time inference stack that keeps in-editor agents fast and reliable. They productionized NVIDIA Blackwell (B200/GB200), tuning ARM hosts, kernels, and FP4/TensorRT quantization for low latenc...

inference benchmark hardware model-release quantization agents

Open

High signal Matched: inference, latency, b200, gb200, blackwell, model, quantization, agents

BAIR · research · 2026-01-10

Information-Driven Design of Imaging Systems

Score 12

An encoder (optical system) maps objects to noiseless images, which noise corrupts into measurements. Our information estimator uses only these noisy measurements and a noise model to quantify how well measurements distinguish objects. Man...

benchmark model-release research training evals

Open

High signal Matched: performance, model, paper, evaluation, training, evaluate

Rebellions · hardware · 2025-12-29

LLM/RAG-Based AI Chatbot for Recommending Item Classification Codes for the Mongolian Customs Service

Score 10

Summary Challenge: The customs service processes a vast volume of import and export declarations every year, and must accurately classify each item under the appropriate HS code (Harmonized System Code), a task that...

korea rag

Open

High signal Matched: rebellions, rag

SqueezeBits · korea · 2025-12-24

Introducing Rebellions ATOM™-MAX

Score 24

Introducing ATOM™-Max, Rebellions’ next-generation NPU designed for high-performance AI inference. Learn how its runtime, profiling tools, and PyTorch-native integrations enable developers to run and serve models efficiently without sacrif...

inference serving benchmark hardware model-release korea

Open

High signal Matched: inference, generation, serve, performance, npu, introducing, rebellions

Nota AI · korea · 2025-12-19

NVIDIA Blackwell; The Impact of NVFP4 For LLM Inference

Score 74

Seungmin Yang, EdgeFM Lead, Nota AI. Summary: With the introduction of NVFP4, a new 4-bit floating point data type in NVIDIA’s Blackwell GPU architecture, LLM inference achieves markedly improved efficiency. Blackwell’s NVFP4...

inference serving kernel cuda distributed benchmark hardware model-release research training quantization evals rag

Open

High signal Matched: inference, serving, decoding, prefill, generation, token generation, throughput, kernel, gemm, cutlass, distributed, benchmark, performance, latency, ttft, tpot, tokens/sec, cost, gpu, blackwell, launch, model, weights, fp8, research, training, post-training, quantization, quantized, awq, benchmarks, mmlu, retrieval

Together AI · inference-infra · 2025-12-17

Research POV: Yes, AGI Can Happen – A Computational Perspective

Score 14

Dan Fu, our VP of Kernels, has published a new post challenging the idea that AI is hitting a hardware wall. He argues that we are vastly underutilizing current chips and that better software-hardware co-design will unlock the next order o...

benchmark research

Open

High signal Matched: performance, research

vLLM Project · open-source · 2025-12-13

Diving into speculative decoding training support for vLLM with Speculators v0.3.0

Score 24

Speculative decoding serves as an optimization to improve inference performance; however, training a unique draft model for each LLM can be difficult and time-consuming, while production-ready...

inference speculative-decoding benchmark model-release training

Open

High signal Matched: inference, decoding, speculative decoding, draft model, performance, model, training

Together AI · inference-infra · 2025-12-03

Introducing AutoJudge: Streamlined inference acceleration via automated dataset curation

Score 20

AutoJudge accelerates LLM inference by identifying which token mismatches actually matter. Using self-supervised learning to train a lightweight classifier, it accepts up to 40 draft tokens per cycle—delivering 1.5–2× speedups over standar...

inference speculative-decoding model-release

Open

High signal Matched: inference, decoding, speculative decoding, introducing

Together AI · inference-infra · 2025-12-01

Together AI delivers fastest inference for the top open-source models

Score 20

Together AI achieves up to 2x faster inference for top open-source models like Qwen, DeepSeek, and Kimi through GPU optimization, advanced speculative decoding, and FP4 quantization—ranking #1 in speed benchmarks on NVIDIA Blackwell archit...

inference speculative-decoding hardware quantization evals open-source

Open

High signal Matched: inference, decoding, speculative decoding, gpu, blackwell, quantization, benchmarks, open-source

AIBrix · open-source · 2025-11-26

PrisKV: A Colocated Tiered KVCache Store for LLM Serving

Score 22

In recent years, large language models (LLMs) such as GPT, DeepSeek, Doubao and Qwen have advanced rapidly and are reshaping a wide range of industries. As the Scaling Law continues to be validated and pushed to its limits, LLM capabilitie...

inference serving benchmark

Open

High signal Matched: inference, serving, generation, throughput, performance, latency, cost

Rebellions · hardware · 2025-11-20

NPU-Powered, AI-Based Diagnostic Assistance Service for Veterinary Imaging

Score 14

Summary Challenge: With the recent growth in pet ownership, demand for X-ray imaging diagnosis is expanding rapidly. However, veterinarians in Korea who specialize in radiology number only a few hundred...

hardware korea

Open

High signal Matched: npu, rebellions

AIBrix · open-source · 2025-11-10

AIBrix v0.5.0 Release: Batch API, KVCache v1 Connector, and Enhanced P/D orchestration

Score 22

🚀 AIBrix v0.5.0 Release Today, we’re excited to announce AIBrix v0.5.0, a release that pushes AIBrix closer to a batteries-included control plane for modern LLM workloads. This release introduces an OpenAI-compatible Batch API for hi...

inference benchmark model-release research evals api

Open

High signal Matched: prefill, latency, release, evaluation, api, openai-compatible

Rebellions · hardware · 2025-11-07

vLLM Hands-on Workshop WrapUp

Score 14

New possibilities for LLM inference, experienced firsthand on Rebellions NPUs. Following the vLLM Korea Meetup in August, on October 29 Rebellions and SqueezeBits hosted a vLLM...

hardware korea

Open

High signal Matched: npu, korea, rebellions

Rebellions · hardware · 2025-10-20

Toward Sustainable AI Scaling: Innovations in Data Center Compute and Power Delivery

Score 10

Summary Challenge: Hyperscale AI facilities already consume as much power as a small city. Demand at a single site reaches 100 to 200 MW, on par with a small nuclear reactor. AI...

korea

Open

High signal Matched: rebellions

Rebellions · hardware · 2025-09-17

The First vLLM Meetup in Korea

Score 14

The first vLLM Community Meetup Korea, hosted by Rebellions and Red Hat and co-organized by PyTorch Korea and SqueezeBits, was held in Seoul on August 19, 2025....

korea

Open

High signal Matched: korea, rebellions

SqueezeBits · korea · 2025-09-16

Guided Decoding Performance on vLLM and SGLang

Score 16

The guide to LLM guided decoding! This deep-dive benchmark compares XGrammar and LLGuidance on vLLM and SGLang to help you find the optimal setup for generating structured output based on your use case.

inference benchmark

Open

High signal Matched: decoding, benchmark, performance

BAIR · research · 2025-09-01

What exactly does word2vec learn?

Score 14

What exactly does word2vec learn, and how? Answering this question amounts to understanding representation learning in a minimal yet interesting language modeling task. Despite the fact that word2vec is a well-known precursor to modern lan...

benchmark model-release research training

Open

High signal Matched: benchmark, performance, model, weights, paper, training

Rebellions · hardware · 2025-08-21

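Achieving Prevention-Focused Site Safety Management for Construction & Plant Projects with AI

Score 14

A case study of Kolon Benit, which developed "AI Vision Intelligence", a differentiated AI-based safety monitoring system that overcomes the constraints of existing systems with multimodal AI combining vision and language models and a hybrid infrastructure combining GPUs and NPUs.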

hardware korea

Open

High signal Matched: gpu, npu, rebellions

Rebellions · hardware · 2025-08-21

Applying LLM-Based AI to Security Threat Detection and Response in the SOC

Score 10

Summary Challenge: The modern Security Operation Center (SOC) faces a trilemma in which three challenges must be solved at once. New types of attacks...

korea

Open

High signal Matched: rebellions

Rebellions · hardware · 2025-08-21

Generating Real-World Training Data: Physical AI Realized with Generative AI

Score 10

How can robot training data for Physical AI be generated and put to use? For Physical AI to be deployed and for AI to interact with real environments, models must be very precisely...

korea

Open

High signal Matched: rebellions

SkyPilot · open-source · 2025-08-12

Self-host open-source LLM agent sandbox on your own cloud

Score 10

Your AI writes code. Now what? If you’re building AI agents in 2025, you probably wondered that as well. Your LLM generates some Python code that analyzes data, manipulates files, or calls APIs. But where does it run? Most people eit...

cloud agents open-source

Open

High signal Matched: cloud, agent, agents, open-source

AIBrix · open-source · 2025-08-05

AIBrix v0.4.0 Release: P/D Disaggregation and Expert Parallelism Support, KVCache v1 Connector, KV Event Synchronization & Multi‑Engine Support

Score 20

AIBrix is a composable, cloud‑native LLM inference infrastructure designed to deliver high performance and low cost at scale. We now present a major update in a new release - v0.4.0. This release tackles key bottlenecks in orchestration an...

inference serving benchmark hardware model-release cloud

Open

High signal Matched: inference, prefill, generation, token generation, throughput, performance, cost, gpu, release, cloud

SqueezeBits · korea · 2025-07-21

GraLoRA: Boosting Fine-Tuning Accuracy Without Extra Cost

Score 20

LoRA excels at efficient fine-tuning but suffers at higher ranks due to gradient entanglement. We introduce GraLoRA, which addresses these issues through finer-grained, block-wise updates, significantly enhancing performance and expressivi...

benchmark fine-tuning

Open

High signal Matched: performance, cost, fine-tuning, lora

SkyPilot · open-source · 2025-07-16

The Evolution of AI Job Orchestration. Part 2: The AI-Native Control Plane & Orchestration that Finally Works for ML

Score 16

This is Part 2 of our series on the evolution of AI Job Orchestration. In Part 1, we explored how Neoclouds are democratizing GPU access but leaving the “last mile” unsolved. Now we’ll discover how AI-native orchestration...

distributed benchmark hardware cloud

Open

High signal Matched: infiniband, performance, cost, gpu, cloud

Nota AI · korea · 2025-07-10

Video Self-Distillation for Single-Image Encoders: Learning Temporal Priors from Unlabeled Video

Score 20

Marcel Simon, Ph.D., ML Researcher, Nota AI GmbH; Tae-Ho Kim, CTO & Co-Founder, Nota AI; Seul-Ki Yeom, Ph.D., Research Lead, Nota AI GmbH. Summary: Proposes a simple next-frame prediction task using unlabeled video to enhance sing...

inference benchmark model-release research training fine-tuning evals

Open

High signal Matched: inference, performance, model, paper, research, training, fine-tuning, benchmarks

Replicate · inference-infra · 2025-07-07

Compare AI video models

Score 8

It's hard keeping up with every new video model. In this post we'll help you pick the best one for your needs.

model-release

Open

High signal Matched: model

BAIR · research · 2025-07-01

Whole-Body Conditioned Egocentric Video Prediction

Score 10

inference benchmark model-release research training evals agents

Open

High signal Matched: inference, generation, performance, model, paper, arxiv, evaluation, training, evaluate, agent, agents

llm-d · open-source · 2025-06-25

llm-d Community Update - June 2025

Score 10

Help shape llm-d's future: Take our 5-minute community survey, subscribe to our YouTube channel, and access exclusive resources for LLM serving innovation.

inference serving

Open

High signal Matched: serving

Modal · inference-infra · 2025-06-09

Introducing: Modal 1.0

Score 10

We've released v1.0 of the Modal client, marking a new milestone of maturity and stability for our platform.

model-release

Open

High signal Matched: introducing

llm-d · open-source · 2025-06-03

llm-d Week 1 Project News Round-Up

Score 12

llm-d hits 1000 GitHub stars! Week 1-2 round-up covers KVTransfer Protocol, InferenceModel API updates, and community resources for LLM inference developers.

inference api

Open

High signal Matched: inference, api

AIBrix · open-source · 2025-05-22

AIBrix v0.3.0 Release: KVCache Offloading, Prefix Cache, Fairness Routing, and Benchmarking Tools

Score 24

AIBrix is a composable, cloud-native AI infrastructure toolkit designed to power scalable and cost-effective large language model (LLM) inference. As production demands for memory-efficient and latency-aware LLM services continue to grow,...

inference kv-cache benchmark model-release cloud

Open

High signal Matched: inference, prefix cache, latency, cost, release, model, cloud

llm-d · open-source · 2025-05-20

llm-d Press Release

Score 20

Red Hat launches llm-d: Open source distributed AI inference platform backed by NVIDIA, Google Cloud, IBM. Scale generative AI with intelligent routing on Kubernetes.

inference distributed model-release cloud open-source

Open

High signal Matched: inference, distributed, release, cloud, open source

Nota AI · korea · 2025-05-08

SplitQuant: Layer Splitting for Low-Bit Neural Network Quantization for Edge AI Devices

Score 20

Jaewoo Song, Software Engineer, Nota AI. Summary: This study proposes an AI model preprocessing method for improved quantization accuracies on edge AI devices which do not support advanced quantization methods due to their limitat...

benchmark model-release research quantization

Open

High signal Matched: performance, model, weights, research, quantization, int8, int4

Nota AI · korea · 2025-05-07

Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features

Score 28

Jewon Lee, Ki-Ung Song, Seungmin Yang, Donguk Lim, Jaeyeon Kim, Wooksu Shin, Bo-Kyeong Kim, Tae-Ho Kim (EdgeFM Team, Nota AI); Yong Jae Lee, Ph.D., Associate Professor, UW-Madison. Summary: Our method, Trimmed-Llama, reduces t...

inference kv-cache benchmark model-release research training evals open-source

Open

High signal Matched: inference, generation, kv cache, benchmark, performance, latency, model, weights, research, training, benchmarks, open-source

Modal · inference-infra · 2025-04-18

How sync. uses Modal to lipsync 100 hours of video a day

Score 8

sync. is a research lab training foundational models to understand and manipulate humans in video. After outgrowing Google Colab, they partnered with Modal for efficient deployment, allowing rapid iteration and scaling to process over 100...

research training

Open

High signal Matched: research, training

BAIR · research · 2025-04-11

Defending against Prompt Injection with Structured Queries (StruQ) and Preference Optimization (SecAlign)

Score 10

Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated applications. However, as LLMs have improved, so have the attacks against them. Prompt injection attack is listed as the #1 threat by OWASP to LLM-integrated ap...

benchmark model-release research training fine-tuning evals rag api frontier-model

Open

High signal Matched: cost, model, evaluation, training, dpo, fine-tuning, retrieval, api, sota

BAIR · research · 2025-04-08

Repurposing Protein Folding Models for Generation with Latent Diffusion

Score 20

PLAID is a multimodal generative model that simultaneously generates protein 1D sequence and 3D structure, by learning the latent space of protein folding models. The awarding of the 2024 Nobel Prize to AlphaFold2 marks an important moment...

inference benchmark model-release research training rag

Open

High signal Matched: inference, generation, cost, model, weights, research, training, retrieval

Nota AI · korea · 2025-04-08

UniForm: A Reuse Attention Mechanism for Efficient Transformers on Resource-Constrained Edge Devices

Score 24

Seul-Ki Yeom, Ph.D., Research Lead, Nota AI GmbH; Tae-Ho Kim, CTO & Co-Founder, Nota AI. Summary: Delivers real-time AI performance on edge devices such as smartphones, IoT devices, and embedded systems. Introduces a novel "Reus...

inference kernel benchmark model-release research evals

Open

High signal Matched: inference, kernel, benchmark, performance, cost, introducing, model, paper, research, benchmarks

SqueezeBits · korea · 2025-03-26

TensorRT-LLM Goes Open Source!

Score 12

With TensorRT-LLM now open source, we can finally take a deep dive into the secret sauce behind its impressive performance.

benchmark open-source

Open

High signal Matched: performance, open source

AIBrix · open-source · 2025-03-10

DeepSeek-R1 671B multi-host Deployment in AIBrix

Score 20

This blog post walks through deploying DeepSeek-R1 using AIBrix. DeepSeek-R1 demonstrates remarkable proficiency in reasoning tasks through a step-by-step training process. It features 671B total parameters with 37B active parameters, and 128k...

inference distributed benchmark model-release training long-context

Open

High signal Matched: inference, distributed, benchmark, model, weights, training, context length

SkyPilot · open-source · 2025-03-05

Abusing SQLite to Handle Concurrency

Score 8

SkyPilot uses the venerable SQLite for state management. SQLite can handle millions of QPS, and terabytes of data. However, our efforts to scale our Managed Jobs feature ran up against the one downfall of SQLite: many concurrent writers. S...

benchmark

Open

High signal Matched: qps

SkyPilot · open-source · 2025-02-26

Using DeepSeek R1 for RAG: Do's and Don'ts

Score 10

DeepSeek R1 showed great reasoning capability when it was first released. In this blog post, we detail what we learned from using DeepSeek R1 to build a Retrieval-Augmented Generation (RAG) system tailored for legal documents. We choose l...

inference research rag

Open

High signal Matched: generation, research, rag, retrieval-augmented generation, retrieval

AIBrix · open-source · 2025-02-21

Introducing AIBrix: Cost-Effective and Scalable Control Plane for vLLM

Score 26

Open-source large language models (LLMs) like LLaMA, DeepSeek, Qwen, and Mistral have surged in popularity, offering enterprises greater flexibility, cost savings, and control over their AI deployments. These models have empowered organ...

inference benchmark model-release agents open-source

Open

High signal Matched: inference, generation, latency, cost, introducing, model, agents, open-source

AIBrix · open-source · 2025-02-19

AIBrix v0.2.0 Release: Distributed KV Cache, Orchestration and Heterogeneous GPU Support

Score 42

We’re excited to announce the v0.2.0 release of AIBrix! Building on feedback from v0.1.0 production adoption and user interest, this release introduces several new features to enhance performance and usability. Extend the vLLM Prefix...

inference serving distributed kv-cache benchmark hardware model-release agents

Open

High signal Matched: inference, serving, prefill, throughput, distributed, multi-node, kv cache, prefix cache, performance, cost, gpu, accelerator, release, agent

SqueezeBits · korea · 2024-11-21

[Intel Gaudi] #1. Introduction

Score 12

In this blog series, we thoroughly evaluate Intel's AI accelerator, the Gaudi series, focusing on its performance, features, and usability.

benchmark hardware evals

Open

High signal Matched: performance, accelerator, evaluate

Replicate · inference-infra · 2024-11-15

NVIDIA L40S GPUs are here

Score 8

NVIDIA L40S GPUs are here, with better performance and lower cost.

benchmark

Open

High signal Matched: performance, cost

AIBrix · open-source · 2024-11-13

Introducing AIBrix v0.1.0: Building the Future of Scalable, Cost-Effective AI Infrastructure for Large Models

Score 32

In recent years, large language models (LLMs) have revolutionized AI applications, powering solutions in areas like chatbots, automated content generation, and advanced recommendation engines. Services like OpenAI’s have gained significant...

inference kv-cache benchmark hardware model-release cloud open-source

Open

High signal Matched: decoding, prefill, generation, kv cache, performance, cost, gpu, release, introducing, cloud, open-source

Replicate · inference-infra · 2024-10-03

FLUX1.1 [pro] is here

Score 10

Black Forest Labs continue to push boundaries with their latest release of the FLUX.1 image generation model.

inference model-release

Open

High signal Matched: generation, release, model

SqueezeBits · korea · 2024-10-01

[vLLM vs TensorRT-LLM] #1. An Overall Evaluation

Score 22

This article provides a comparative analysis of vLLM and TensorRT-LLM frameworks for serving LLMs, evaluating their performance based on key metrics like throughput, TTFT, and TPOT to offer insights for practitioners in optimizing LLM depl...

inference serving benchmark research evals

Open

High signal Matched: serving, throughput, performance, ttft, tpot, evaluation, evaluating

Modal · inference-infra · 2024-08-06

GPU prices are falling...

Score 10

...and we're passing the savings to you. 15-30% price cuts on GPUs and CPUs.

hardware

Open

High signal Matched: gpu

Nota AI · korea · 2024-08-02

Deploying an Efficient Vision-Language Model on Mobile Devices

Score 38

Jaeyeon Kim, Research Engineer, Nota AI; Geonmin Kim, Research Engineer, Nota AI; Hancheol Park, Team Lead of NetsPresso Application, Nota AI. Introduction: Recent large language models (LLMs) have demonstrated unprecedented performance...

inference benchmark model-release research cloud training fine-tuning evals open-source

Open

High signal Matched: decoding, benchmark, performance, latency, tokens/sec, model, arxiv, research, technical report, evaluation, cloud, training, lora, benchmarks, leaderboard, open-source

Replicate · inference-infra · 2024-06-12

H100s are coming to Replicate

Score 8

We'll soon support NVIDIA's H100 GPUs for predictions and training. Let us know if you want early access.

hardware training

Open

High signal Matched: h100, training

Replicate · inference-infra · 2024-06-12

Run Stable Diffusion 3 with an API

Score 8

Stable Diffusion 3 is the latest text-to-image model from Stability, with improved image quality, typography, prompt understanding, and resource efficiency. Learn how to run it in the cloud with one line of code.

model-release cloud api

Open

High signal Matched: model, cloud, api

Modal · inference-infra · 2024-02-27

Introducing: WebSockets on Modal

Score 10

Modal now supports WebSocket connections, enabling real-time, bidirectional data transfer between client and server.

model-release

Open

High signal Matched: introducing

Replicate · inference-infra · 2023-07-27

Run Llama 2 with an API

Score 8

Llama 2 is the first open source language model of the same caliber as OpenAI’s models. Learn how to run it in the cloud with one line of code.

model-release cloud api open-source

Open

High signal Matched: model, cloud, api, open source

Hugging Face · open-source · 2022-12-20

Model Cards

Score 10

No feed summary available yet.

model-release

Open

High signal Matched: model

PyTorch Foundation · open-source · 2026-06-04

Using Muon Optimizer with DeepSpeed

Score 4

TL;DR DeepSpeed now supports Muon Optimizer! Muon Optimizer has gained great momentum with significant adoption from frontier AI Labs. One of those AI Labs is Moonshot AI, which has adopted...

Open

Watchlist Matched: none

NVIDIA Dynamo · open-source · 2026-06-03

Glossary

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

NVIDIA Dynamo · open-source · 2026-06-03

Examples

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

NVIDIA Dynamo · open-source · 2026-06-03

Quickstart

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

NVIDIA Dynamo · open-source · 2026-06-03

Introduction

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

NVIDIA Dynamo · open-source · 2026-06-03

Local Installation

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

NVIDIA Dynamo · open-source · 2026-06-03

Building from Source

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

NVIDIA Dynamo · open-source · 2026-06-03

Kubernetes Deployment

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

NVIDIA Dynamo · open-source · 2026-06-03

Contribution Guide

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

NVIDIA Dynamo · open-source · 2026-06-03

Support Matrix

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

NVIDIA Dynamo · open-source · 2026-06-03

Feature Matrix

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

NVIDIA Dynamo · open-source · 2026-06-03

Digest

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

Mooncake · open-source · 2026-06-03

Build Guide

Score 4

No feed summary available yet.

Open

Watchlist Matched: none

Mooncake · open-source · 2026-06-03

Quick Start

Score 4

No feed summary available yet.

Open

Watchlist Matched: none

Mooncake · open-source · 2026-06-03

Observability

Score 4

No feed summary available yet.

Open

Watchlist Matched: none

Perplexity Research · model-lab · 2026-06-03

All Careers

Score 3

No feed summary available yet.

Open

Watchlist Matched: none

Perplexity Research · model-lab · 2026-06-03

Perplexity Enterprise

Score 3

No feed summary available yet.

Open

Watchlist Matched: none

Perplexity Research · model-lab · 2026-06-03

Brand Guidelines

Score 3

No feed summary available yet.

Open

Watchlist Matched: none

Perplexity Research · model-lab · 2026-06-03

Inquiries

Score 3

No feed summary available yet.

Open

Watchlist Matched: none

Perplexity Research · model-lab · 2026-06-03

Supply Store

Score 3

No feed summary available yet.

Open

Watchlist Matched: none

Perplexity Research · model-lab · 2026-06-03

Security

Score 3

No feed summary available yet.

Open

Watchlist Matched: none

xLLM · open-source · 2026-06-03

Docs

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

xLLM · open-source · 2026-06-03

Get Started

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

KubeAI · open-source · 2026-06-03

Skip to content

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

xLLM · open-source · 2026-06-03

Community

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

xLLM · open-source · 2026-06-03

Models

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

KubeAI · open-source · 2026-06-03

GitHub

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Moreh · korea · 2026-06-03

Moreh

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Moreh · korea · 2026-06-03

Website

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Moreh · korea · 2026-06-03

Next Overview

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

KubeAI · open-source · 2026-06-03

Home

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

KubeAI · open-source · 2026-06-03

Install on AKS

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

KubeAI · open-source · 2026-06-03

Install on EKS

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

KubeAI · open-source · 2026-06-03

Install on GKE

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

KubeAI · open-source · 2026-06-03

Configure autoscaling

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

KubeAI · open-source · 2026-06-03

Install models

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

xLLM · open-source · 2026-06-03

Team

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

xLLM · open-source · 2026-06-03

Publications

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

xLLM · open-source · 2026-06-03

GitHub Issues

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

xLLM · open-source · 2026-06-03

Contributors

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

xLLM · open-source · 2026-06-03

Docker

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

DigitalOcean AI/ML · cloud · 2026-06-03

Get Support

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

DigitalOcean AI/ML · cloud · 2026-06-03

DigitalOcean

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

DigitalOcean AI/ML · cloud · 2026-06-03

Browse all products

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

DigitalOcean AI/ML · cloud · 2026-06-03

See all solutions

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

DigitalOcean AI/ML · cloud · 2026-06-03

Blog homepage

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

DigitalOcean AI/ML · cloud · 2026-06-03

Partners

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

Gcore · cloud · 2026-06-03

Start onboarding

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

Gcore · cloud · 2026-06-03

Under attack?

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

Gcore · cloud · 2026-06-03

Sign up for free

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

Perplexity Research · model-lab · 2026-06-03

We're Hiring

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

TensorRT-LLM · open-source · 2026-06-03

Skip to main content

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

Prime Intellect · inference-infra · 2026-06-03

Login

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

Prime Intellect · inference-infra · 2026-06-03

LAB01

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

Prime Intellect · inference-infra · 2026-06-03

COMPUTE02

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

Prime Intellect · inference-infra · 2026-06-03

RESEARCH03

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

Prime Intellect · inference-infra · 2026-06-03

Careers24

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

Prime Intellect · inference-infra · 2026-06-03

Book a call

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

Prime Intellect · inference-infra · 2026-06-03

Start training

Score 6

No feed summary available yet.

training

Open

Watchlist Matched: training

Prime Intellect · inference-infra · 2026-06-03

Latest

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

Prime Intellect · inference-infra · 2026-06-03

Announcements

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

Prime Intellect · inference-infra · 2026-06-03

Partnerships

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

TensorRT-LLM · open-source · 2026-06-03

TensorRT LLM

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

TensorRT-LLM · open-source · 2026-06-03

Overview

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

TensorRT-LLM · open-source · 2026-06-03

Quick Start Guide

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

TensorRT-LLM · open-source · 2026-06-03

Installation

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

TensorRT-LLM · open-source · 2026-06-03

Installation Guide

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

TensorRT-LLM · open-source · 2026-06-03

Build from Source

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

TensorRT-LLM · open-source · 2026-06-03

Container Images

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

TensorRT-LLM · open-source · 2026-06-03

Supported Hardware

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

TensorRT-LLM · open-source · 2026-06-03

LLM Examples

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

TensorRT-LLM · open-source · 2026-06-03

Generate text

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

TensorRT-LLM · open-source · 2026-06-03

Sparse Attention

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

OpenAI · model-lab · 2026-06-03

Developers

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

OpenAI · model-lab · 2026-06-03

Learn about safety

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

BentoML · inference-infra · 2026-06-03

Sign Up

Score 4

No feed summary available yet.

Open

Watchlist Matched: none

Cerebrium · inference-infra · 2026-06-03

Discord

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Cerebrium · inference-infra · 2026-06-03

LinkedIn

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Cerebrium · inference-infra · 2026-06-03

Sign up

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

CoreWeave · cloud · 2026-06-03

Main menu

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

CoreWeave · cloud · 2026-06-03

View the platform

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

CoreWeave · cloud · 2026-06-03

Why CoreWeave?

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

CoreWeave · cloud · 2026-06-03

CoreWeave ARENA

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

CoreWeave · cloud · 2026-06-03

NVIDIA Hopper

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

CoreWeave · cloud · 2026-06-03

NVIDIA Ada Lovelace

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

CoreWeave · cloud · 2026-06-03

CPU compute

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

CoreWeave · cloud · 2026-06-03

Bare metal servers

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

CoreWeave · cloud · 2026-06-03

Networking

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

CoreWeave · cloud · 2026-06-03

AI Object storage

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Cerebrium · inference-infra · 2026-06-03

Cerebrium

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Cerebrium · inference-infra · 2026-06-03

Large Language Models

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Cerebrium · inference-infra · 2026-06-03

Voice

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Cerebrium · inference-infra · 2026-06-03

Image & Video

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Cerebrium · inference-infra · 2026-06-03

More examples

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Cerebrium · inference-infra · 2026-06-03

Book a demo

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Cerebrium · inference-infra · 2026-06-03

Twitter

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Nebius · cloud · 2026-06-03

Privacy Policy

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

Nebius · cloud · 2026-06-03

Self-service

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

Nebius · cloud · 2026-06-03

Token Factory

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

Crusoe · cloud · 2026-06-03

Command Center

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

Crusoe · cloud · 2026-06-03

Crusoe Edge Zones

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

Crusoe · cloud · 2026-06-03

AMD MI355X

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

Crusoe · cloud · 2026-06-03

Data Prep

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

Crusoe · cloud · 2026-06-03

Data Centers

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

Vast.ai · cloud · 2026-06-03

Python SDK

Score 0

No feed summary available yet.

api

Open

Watchlist Matched: sdk

Vast.ai · cloud · 2026-06-03

API

Score 0

No feed summary available yet.

api

Open

Watchlist Matched: api

Vast.ai · cloud · 2026-06-03

Clusters

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

Vast.ai · cloud · 2026-06-03

Serverless

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

Vast.ai · cloud · 2026-06-03

Hosting

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

Vast.ai · cloud · 2026-06-03

Financing

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

Vast.ai · cloud · 2026-06-03

Hardware

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

Vast.ai · cloud · 2026-06-03

Earnings Calculator

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

Vast.ai · cloud · 2026-06-03

Use Cases

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

Vast.ai · cloud · 2026-06-03

AI/ML Frameworks

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

LMSYS · open-source · 2026-06-03

About

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

LMSYS · open-source · 2026-06-03

Blog

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

FriendliAI · inference-infra · 2026-06-03

Read full article

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

FriendliAI · inference-infra · 2026-06-03

Talk to an engineer

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

FriendliAI · inference-infra · 2026-06-03

Dedicated Endpoints

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

FriendliAI · inference-infra · 2026-06-03

Container

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

FriendliAI · inference-infra · 2026-06-03

Why FriendliAI

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

FriendliAI · inference-infra · 2026-06-03

Agents

Score 6

No feed summary available yet.

agents

Open

Watchlist Matched: agents

FriendliAI · inference-infra · 2026-06-03

Chatbots

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

FuriosaAI · hardware · 2026-06-03

Hugging Face Hub

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

FuriosaAI · hardware · 2026-06-03

Customer Support

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

FuriosaAI · hardware · 2026-06-03

Forums

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

FuriosaAI · hardware · 2026-06-03

Newsroom

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

FuriosaAI · hardware · 2026-06-03

Careers

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

LMSYS · open-source · 2026-06-03

Contact

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

Fireworks AI · inference-infra · 2026-06-03

Training

Score 6

No feed summary available yet.

training

Open

Watchlist Matched: training

Fireworks AI · inference-infra · 2026-06-03

Log In

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

Baseten · inference-infra · 2026-06-03

Learn more here

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

Baseten · inference-infra · 2026-06-03

Customers

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

Baseten · inference-infra · 2026-06-03

All

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

Baseten · inference-infra · 2026-06-03

AI engineering

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

Baseten · inference-infra · 2026-06-03

Infrastructure

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

Baseten · inference-infra · 2026-06-03

AI models

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

Baseten · inference-infra · 2026-06-03

Product

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

Baseten · inference-infra · 2026-06-03

Foundations

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

LMSYS · open-source · 2026-06-03

Projects

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

LMSYS · open-source · 2026-06-03

Donations

Score 6

No feed summary available yet.

Open

Watchlist Matched: none

Moonshot AI Kimi · model-lab · 2026-06-03

Console

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

MiniMax · model-lab · 2026-06-03

MiniMax M3

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

MiniMax · model-lab · 2026-06-03

MiniMax M2.7

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

MiniMax · model-lab · 2026-06-03

MiniMax M2.5

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

MiniMax · model-lab · 2026-06-03

MiniMax Speech 2.8

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

MiniMax · model-lab · 2026-06-03

MiniMax Music 2.6

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

MiniMax · model-lab · 2026-06-03

MiniMax Code

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

MiniMax · model-lab · 2026-06-03

Video Hailuo

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

MiniMax · model-lab · 2026-06-03

Audio

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

MiniMax · model-lab · 2026-06-03

Talkie

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

MiniMax · model-lab · 2026-06-03

News

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

MiniMax · model-lab · 2026-06-03

Investor Relations

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

MiniMax · model-lab · 2026-06-03

Contact Us

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

Z.AI · model-lab · 2026-06-03

Business License

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

Z.AI · model-lab · 2026-06-03

License

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

Mistral AI · model-lab · 2026-06-03

Contact sales

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

Mistral AI · model-lab · 2026-06-03

Start building

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

Mistral AI · model-lab · 2026-06-03

Plans

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

Mistral AI · model-lab · 2026-06-03

API pricing

Score 5

No feed summary available yet.

api

Open

Watchlist Matched: api

Mistral AI · model-lab · 2026-06-03

For enterprises

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

Mistral AI · model-lab · 2026-06-03

Delivery methodology

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

Mistral AI · model-lab · 2026-06-03

Financial services

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

Mistral AI · model-lab · 2026-06-03

Manufacturing

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

Mistral AI · model-lab · 2026-06-03

Use case overview

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

Mistral AI · model-lab · 2026-06-03

Coding

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

Mistral AI · model-lab · 2026-06-03

Document intelligence

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

Mistral AI · model-lab · 2026-06-03

Speech

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

Anthropic · model-lab · 2026-06-03

Skip to footer

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

Anthropic · model-lab · 2026-06-03

Economic Futures

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

Anthropic · model-lab · 2026-06-03

Try Claude

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

Anthropic · model-lab · 2026-06-03

Developer docs

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

xAI · model-lab · 2026-06-03

Products

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

xAI · model-lab · 2026-06-03

Solutions

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

xAI · model-lab · 2026-06-03

Developer

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

xAI · model-lab · 2026-06-03

Company

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

xAI · model-lab · 2026-06-03

Pricing

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

xAI · model-lab · 2026-06-03

Try for free

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

xAI · model-lab · 2026-06-03

Business

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

xAI · model-lab · 2026-06-03

Government

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

xAI · model-lab · 2026-06-03

iOS

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

xAI · model-lab · 2026-06-03

Android

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

xAI · model-lab · 2026-06-03

Grok on X

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

xAI · model-lab · 2026-06-03

API Console

Score 5

No feed summary available yet.

api

Open

Watchlist Matched: api

xAI · model-lab · 2026-06-03

Documentation

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

xAI · model-lab · 2026-06-03

CLI

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

xAI · model-lab · 2026-06-03

Read More

Score 5

No feed summary available yet.

Open

Watchlist Matched: none

Groq · hardware · 2026-06-03

Groq

Score 4

No feed summary available yet.

Open

Watchlist Matched: none

Groq · hardware · 2026-06-03

GroqCloud

Score 4

No feed summary available yet.

Open

Watchlist Matched: none

Groq · hardware · 2026-06-03

LPU Architecture

Score 4

No feed summary available yet.

Open

Watchlist Matched: none

Groq · hardware · 2026-06-03

See Pricing

Score 4

No feed summary available yet.

Open

Watchlist Matched: none

Groq · hardware · 2026-06-03

Customer Stories

Score 4

No feed summary available yet.

Open

Watchlist Matched: none

Groq · hardware · 2026-06-03

Changelog

Score 4

No feed summary available yet.

Open

Watchlist Matched: none

Groq · hardware · 2026-06-03

Whitepapers

Score 4

No feed summary available yet.

Open

Watchlist Matched: none

Groq · hardware · 2026-06-03

Subscribe

Score 4

No feed summary available yet.

Open

Watchlist Matched: none

Groq · hardware · 2026-06-03

Free API key

Score 4

No feed summary available yet.

api

Open

Watchlist Matched: api

Groq · hardware · 2026-06-03

Enterprises

Score 4

No feed summary available yet.

Open

Watchlist Matched: none

Anyscale · inference-infra · 2026-06-03

Anyscale

Score 4

No feed summary available yet.

Open

Watchlist Matched: none

Anyscale · inference-infra · 2026-06-03

About Us

Score 4

No feed summary available yet.

Open

Watchlist Matched: none

Anyscale · inference-infra · 2026-06-03

Resources

Score 4

No feed summary available yet.

Open

Watchlist Matched: none

Anyscale · inference-infra · 2026-06-03

Ray Training

Score 4

No feed summary available yet.

training

Open

Watchlist Matched: training

Anyscale · inference-infra · 2026-06-03

Ray Docs

Score 4

No feed summary available yet.

Open

Watchlist Matched: none

Anyscale · inference-infra · 2026-06-03

Anyscale Docs

Score 4

No feed summary available yet.

Open

Watchlist Matched: none

Anyscale · inference-infra · 2026-06-03

Anyscale Platform

Score 4

No feed summary available yet.

Open

Watchlist Matched: none

Anyscale · inference-infra · 2026-06-03

Anyscale Support

Score 4

No feed summary available yet.

Open

Watchlist Matched: none

Cerebras · hardware · 2026-06-03

Newsletter signup

Score 4

No feed summary available yet.

Open

Watchlist Matched: none

Cerebras · hardware · 2026-06-03

Website Terms of Use

Score 4

No feed summary available yet.

Open

Watchlist Matched: none

Cerebras · hardware · 2026-06-03

Privacy Policy

Score 4

No feed summary available yet.

Open

Watchlist Matched: none

Cerebras · hardware · 2026-06-03

Cookie Policy

Score 4

No feed summary available yet.

Open

Watchlist Matched: none

Cerebras · hardware · 2026-06-03

Service Status

Score 4

No feed summary available yet.

Open

Watchlist Matched: none

Cerebras · hardware · 2026-06-03

Trust Center

Score 4

No feed summary available yet.

Open

Watchlist Matched: none

Cerebras · hardware · 2026-06-03

In the News

Score 4

No feed summary available yet.

Open

Watchlist Matched: none

Cerebras · hardware · 2026-06-03

Press kit

Score 4

No feed summary available yet.

Open

Watchlist Matched: none

Cerebras · hardware · 2026-06-03

Customer Spotlight

Score 4

No feed summary available yet.

Open

Watchlist Matched: none

Cohere · model-lab · 2026-06-03

Learn more

Score 3

No feed summary available yet.

Open

Watchlist Matched: none

Cohere · model-lab · 2026-06-03

Customization

Score 3

No feed summary available yet.

Open

Watchlist Matched: none

Cohere · model-lab · 2026-06-03

Models Overview

Score 3

No feed summary available yet.

Open

Watchlist Matched: none

Cohere · model-lab · 2026-06-03

Technology

Score 3

No feed summary available yet.

Open

Watchlist Matched: none

Cohere · model-lab · 2026-06-03

Energy and Utilities

Score 3

No feed summary available yet.

Open

Watchlist Matched: none

Cohere · model-lab · 2026-06-03

Public Sector

Score 3

No feed summary available yet.

Open

Watchlist Matched: none

Liquid AI · model-lab · 2026-06-03

Pricing & Licensing

Score 3

No feed summary available yet.

Open

Watchlist Matched: none

Liquid AI · model-lab · 2026-06-03

Automotive

Score 3

No feed summary available yet.

Open

Watchlist Matched: none

Liquid AI · model-lab · 2026-06-03

Consumer Electronics

Score 3

No feed summary available yet.

Open

Watchlist Matched: none

Liquid AI · model-lab · 2026-06-03

Ecommerce

Score 3

No feed summary available yet.

Open

Watchlist Matched: none

Liquid AI · model-lab · 2026-06-03

Finance

Score 3

No feed summary available yet.

Open

Watchlist Matched: none

Liquid AI · model-lab · 2026-06-03

Industrial & Robotics

Score 3

No feed summary available yet.

Open

Watchlist Matched: none

Liquid AI · model-lab · 2026-06-03

Case Studies

Score 3

No feed summary available yet.

Open

Watchlist Matched: none

Liquid AI · model-lab · 2026-06-03

Demos

Score 3

No feed summary available yet.

Open

Watchlist Matched: none

Liquid AI · model-lab · 2026-06-03

Developer Community

Score 3

No feed summary available yet.

Open

Watchlist Matched: none

Liquid AI · model-lab · 2026-06-03

Hackathons

Score 3

No feed summary available yet.

Open

Watchlist Matched: none

Liquid AI · model-lab · 2026-06-03

Our Partners

Score 3

No feed summary available yet.

Open

Watchlist Matched: none

NAVER D2 · korea · 2026-06-03

Menu

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

NAVER D2 · korea · 2026-06-03

D2 News

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

NAVER D2 · korea · 2026-06-03

About D2

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

NAVER D2 · korea · 2026-06-03

DEVIEW

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

NAVER D2 · korea · 2026-06-03

OpenSource

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

NAVER D2 · korea · 2026-06-03

D2 STARTUP FACTORY

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Kakao Tech · korea · 2026-06-03

Skip to main content

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Kakao Tech · korea · 2026-06-03

Skip to menu

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Kakao Tech · korea · 2026-06-03

More →

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Kakao Tech · korea · 2026-06-03

tech

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Kakao Tech · korea · 2026-06-03

recruitment

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Kakao Tech · korea · 2026-06-03

career

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Kakao Tech · korea · 2026-06-03

ai

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Kakao Tech · korea · 2026-06-03

conference

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Kakao Tech · korea · 2026-06-03

new-krew

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Kakao Tech · korea · 2026-06-03

ifkakao

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Kakao Tech · korea · 2026-06-03

meetup

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Kakao Tech · korea · 2026-06-03

developer relations

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Kakao Tech · korea · 2026-06-03

AI Native

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Kakao Tech · korea · 2026-06-03

frontend

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Kakao Tech · korea · 2026-06-03

front-end

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Upstage · korea · 2026-06-03

Insurance

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Upstage · korea · 2026-06-03

Healthcare

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Upstage · korea · 2026-06-03

Apps

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

LG AI Research · korea · 2026-06-03

MISSION

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

LG AI Research · korea · 2026-06-03

LEADERSHIP

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

LG AI Research · korea · 2026-06-03

ETHICS PRINCIPLES

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

LG AI Research · korea · 2026-06-03

LOCATION

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

LG AI Research · korea · 2026-06-03

SOLUTION

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

LG AI Research · korea · 2026-06-03

EXAONE Showroom

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

LG AI Research · korea · 2026-06-03

SUPERINTELLIGENCE

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

LG AI Research · korea · 2026-06-03

EXAONE

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

LG AI Research · korea · 2026-06-03

LANGUAGE

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

LG AI Research · korea · 2026-06-03

PHYSICAL INTELLIGENCE

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

LG AI Research · korea · 2026-06-03

BIO INTELLIGENCE

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

LG AI Research · korea · 2026-06-03

DATA INTELLIGENCE

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

LG AI Research · korea · 2026-06-03

MATERIALS INTELLIGENCE

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

LG AI Research · korea · 2026-06-03

PUBLICATION

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

LG AI Research · korea · 2026-06-03

RECRUIT

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Stanford CRFM · research · 2026-06-03

People

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Stanford CRFM · research · 2026-06-03

Report

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Stanford CRFM · research · 2026-06-03

HELM

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Stanford CRFM · research · 2026-06-03

Levanter

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Stanford CRFM · research · 2026-06-03

FMTI

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Stanford CRFM · research · 2026-06-03

Openness

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Stanford CRFM · research · 2026-06-03

Ecosystem Graphs

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Stanford CRFM · research · 2026-06-03

Policy

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Stanford CRFM · research · 2026-06-03

HELM Arabic Enterprise

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Stanford CRFM · research · 2026-06-03

HELM Arabic

Score 2

No feed summary available yet.

Open

Watchlist Matched: none

Anyscale · inference-infra · 2026-06-03

Events

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

GMI Cloud · cloud · 2026-06-03

GPUs

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

GMI Cloud · cloud · 2026-06-03

Sign In

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

GMI Cloud · cloud · 2026-06-03

X

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

GMI Cloud · cloud · 2026-06-03

YouTube

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

GMI Cloud · cloud · 2026-06-03

MaaS

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

GMI Cloud · cloud · 2026-06-03

Studio

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

LightSeek Foundation · research · 2026-06-03

LightSeek Foundation

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

Databricks AI · big-tech · 2026-06-03

For App Developers

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

Databricks AI · big-tech · 2026-06-03

For Executives

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

Databricks AI · big-tech · 2026-06-03

For Startups

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

Databricks AI · big-tech · 2026-06-03

Lakehouse Architecture

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

Cloudflare Blog · cloud · 2026-06-02

How we reduced core unit boot time from hours to minutes

Score 0

We investigated why firmware updates were causing our core servers to take four hours to reboot. By diving into UEFI data structures and iPXE automation, we eliminated unnecessary timeouts and cut boot times back down to minutes.

Open

Watchlist Matched: none

Microsoft Research · big-tech · 2026-05-29

Data Formulator 0.7: AI-powered data analytics for enterprise data

Score 5

Data Formulator introduces AI-powered analytics for enterprise data workflows. Data teams can easily bring enterprise data into an AI-ready workspace where users can explore, analyze, and visualize data with AI agents to turn raw data into...

research agents

Open

Watchlist Matched: research, agents

SqueezeBits · korea · 2026-05-28

2026 Efficient AI Offline Meetup

Score 2

Wrap up 8 weeks of online study sessions and see how SqueezeBits works to sustain and grow the AI compression community!

Open

Watchlist Matched: none

Cloudflare Blog · cloud · 2026-05-28

Iran's Internet is partially restored, Cloudflare Radar data shows

Score 0

Cloudflare Radar data confirms early indications of a partial Internet restoration in Iran, nearly three months after the shutdown began. Traffic spikes and DNS queries have risen, but network activity is currently just 40% of pre-shutdown...

Open

Watchlist Matched: none

Microsoft Research · big-tech · 2026-05-28

Extending Human Intelligence Through AI

Score 5

Understanding AI as an extension of human intelligence, not a replacement for it, offers a more grounded path for building trustworthy AI systems.

research

Open

Watchlist Matched: research

PyTorch Foundation · open-source · 2026-05-21

PyTorch Docathon 2026 Results in 150+ Merged Pull Requests

Score 1

Thank you to everyone who participated in the PyTorch Docathon 2026! Once again, the community showed up with incredible energy and dedication to make PyTorch documentation better for developers everywhere....

Open

Watchlist Matched: none

AI2 · research · 2026-05-21

Building accessibility tools on a truly open foundation

Score 0

PointCheck, an independent project, uses Molmo, MolmoWeb, and Olmo 3 to test web accessibility the way a keyboard user would—by navigating real pages and inspecting what's actually on screen.

Open

Watchlist Matched: none

NVIDIA Technical Blog · hardware · 2026-05-20

Mastering Agentic Techniques: AI Agent Customization

Score 3

Autonomous AI agents are taking on all types of work for businesses: routing logistics fleets, triaging support tickets, generating code, and orchestrating...

agents

Open

Watchlist Matched: agent, agents, agentic

Cloudflare Blog · cloud · 2026-05-19

Announcing Claude Managed Agents on Cloudflare

Score 0

Cloudflare has integrated with Anthropic's Claude Managed Agents to provide a fast, isolated execution environment for autonomous code delivery. This means builders can scale agent workflows globally while strictly controlling access to pr...

agents

Open

Watchlist Matched: agent, agents

AI2 · research · 2026-05-19

OlmoEarth v1.1: A more efficient family of models

Score 6

OlmoEarth v1.1 is a more efficient family of remote-sensing models that cuts compute costs by up to 3x while maintaining similar performance, making large-scale satellite mapping faster and cheaper to run.

benchmark

Open

Watchlist Matched: performance

Cloudflare Blog · cloud · 2026-05-18

Project Glasswing: what Mythos showed us

Score 0

In recent weeks, we pointed Mythos and other security-focused LLMs at live code across critical parts of our infrastructure. We share what we observed, the models’ strengths and weaknesses, and what the work around them needs to look like...

Open

Watchlist Matched: none

Cloudflare Blog · cloud · 2026-05-08

Building for the future

Score 0

This afternoon, we sent the following email to our global team. One of our core values at Cloudflare is transparency, and we believe it's important that you hear this directly from us because it’s a major moment at Cloudflare.

Open

Watchlist Matched: none

Cloudflare Blog · cloud · 2026-05-07

How Cloudflare responded to the “Copy Fail” Linux vulnerability

Score 4

When a critical Linux kernel privilege escalation was publicly disclosed, Cloudflare's security and engineering teams detected, investigated, and mitigated the threat across our global fleet, confirming zero customer impact and no maliciou...

kernel

Open

Watchlist Matched: kernel

Cloudflare Blog · cloud · 2026-04-30

Agents can now create Cloudflare accounts, buy domains, and deploy

Score 0

Starting today, agents can now be Cloudflare customers. They can create a Cloudflare account, start a paid subscription, register a domain, and get back an API token to deploy code right away. Humans can be in the loop to grant permission,...

agents api

Open

Watchlist Matched: agents, api

Lambda · cloud · 2026-04-30

Creating highly efficient agents: 450M tool-calling tokens distilled for post-training from top open-source models

Score 4

Harnesses: If you've used Claude Code or Codex, you've used a harness. A harness is the infrastructure layer that wraps an AI coding agent and decides how it operates, what it can touch, and how you measure whether it worked. It's how most...

hardware training agents open-source

Open

Watchlist Matched: gpu, training, post-training, agent, agents, open-source

Together AI · inference-infra · 2026-04-30

Announcing Together AI and Adaption Partnership

Score 3

Together AI and Adaption partner to bring Together Fine-Tuning natively into Adaptive Data, helping teams optimize datasets, run fine-tuning, evaluate results, and deploy stronger open models.

fine-tuning evals

Open

Watchlist Matched: fine-tuning, evaluate

AI2 · research · 2026-04-29

Molmo learns to point and act

Score 0

MolmoPoint and MolmoWeb extend the Molmo family from visual understanding to visual action, giving researchers open tools for models that can point, navigate, and interact with the world they see.

Open

Watchlist Matched: none

Cloudflare Blog · cloud · 2026-04-22

Making Rust Workers reliable: panic and abort recovery in wasm‑bindgen

Score 0

Panics in Rust Workers were historically fatal, poisoning the entire instance. By collaborating upstream on the wasm‑bindgen project, Rust Workers now support resilient critical error recovery, including panic unwinding using WebAssembly E...

Open

Watchlist Matched: none

Cloudflare Blog · cloud · 2026-04-21

Moving past bots vs. humans

Score 0

As AI assistants and privacy proxies challenge the capabilities of traditional bot detection, the Web needs new models for accountability. We believe that control should remain with the client, and that an open ecosystem of anonymous crede...

Open

Watchlist Matched: none

AI2 · research · 2026-04-13

Evaluating agents for scientific discovery

Score 0

Two benchmarks developed at Ai2 – ScienceWorld and DiscoveryWorld – reveal that even incredibly strong AI science agents struggle with problems human scientists solve routinely.

evals agents

Open

Watchlist Matched: evaluating, benchmarks, agents

Modal · inference-infra · 2026-04-10

Butter is joining Modal

Score 1

Butter, a San Francisco-based AI sandbox technology, is joining Modal.

Open

Watchlist Matched: none

Hugging Face · open-source · 2026-04-01

Falcon Perception

Score 1

No feed summary available yet.

Open

Watchlist Matched: none

vLLM Project · open-source · 2026-03-30

Extracting hidden states from vLLM

Score 3

PR #33736 (included in vllm>=v0.18.0) introduced a new hidden states extraction system to vLLM. This blog post explores the motivation, design, usage, and future direction of this feature, and its...

Open

Watchlist Matched: none

SqueezeBits · korea · 2026-03-27

Our Experience Running a Booth at GTC 2026

Score 0

Sharing insights from GTC 2026, the largest AI industry conference for developers! If you’ve ever wondered what it’s like for an AI startup to run a booth at such a massive event, you won’t want to miss this!

Open

Watchlist Matched: none

Hugging Face · open-source · 2026-03-27

Liberate your OpenClaw

Score 1

No feed summary available yet.

Open

Watchlist Matched: none

AI2 · research · 2026-03-23

Highlights from Ai2 at NVIDIA GTC 2026

Score 0

A recap of Ai2's week at NVIDIA GTC 2026, covering panels on open models, live demos of Olmo Hybrid and Asta AutoDiscovery, and conversations on coding agents, hybrid architectures, and robotics.

agents

Open

Watchlist Matched: agents

LY Corporation Tech Blog · korea · 2026-03-18

Unification of Group Chat on the LINE App

Score 0

This article was originally published on the pre-merger blog (first published on February 24, 2022) ...

Open

Watchlist Matched: none

SkyPilot · open-source · 2026-03-03

SkyPilot Job Groups: Run RL on Heterogenous Hardware

Score 1

SkyPilot Job Groups let you define heterogeneous RL workloads in a single YAML. Run your PPO trainer on beefy H100s, rollout servers on cheap T4s, and replay buffers on high-memory CPUs, all as one managed job.

Open

Watchlist Matched: none

Replicate · inference-infra · 2026-02-24

How to prompt Seedream 5.0

Score 6

Seedream 5.0 brings multi-step reasoning, example-based editing, and deep domain knowledge to image generation. Here's what you should know.

inference

Open

Watchlist Matched: generation

SkyPilot · open-source · 2026-02-10

Migrating from Slurm to Kubernetes

Score 1

Moving from Slurm to Kubernetes doesn't have to mean losing the workflow you know. Here's how SkyPilot brings Slurm-like simplicity to K8s.

Open

Watchlist Matched: none

LY Corporation Tech Blog · korea · 2026-02-06

Creating the cloud of the future

Score 6

Hello, I’m Young Hee Park from the Cloud Service CBU, where I’m responsible for the private cloud th...

cloud

Open

Watchlist Matched: cloud

Modal · inference-infra · 2025-12-28

Keeping 20,000 GPUs healthy

Score 1

How we do active and passive monitoring on hyperscalers and neoclouds.

Open

Watchlist Matched: none

vLLM Project · open-source · 2025-12-27

Announcing vllm.ai Website and Some Community Updates

Score 3

For a long time, vllm.ai simply redirected to the vLLM GitHub page. Thanks to our community, we now have a brand-new vllm.ai website, drawing inspiration from the PyTorch website.

Open

Watchlist Matched: none

Together AI · inference-infra · 2025-12-18

Rime voice models now available on Together AI

Score 3

Two enterprise-grade Rime TTS models now available on Together AI. Co-locate with LLM and STT on dedicated infrastructure. Proven at billions of calls.

Open

Watchlist Matched: none

Modular · inference-infra · 2025-12-05

The path to Mojo 1.0

Score 1

The path to Mojo 1.0

Open

Watchlist Matched: none

Replicate · inference-infra · 2025-11-25

Run FLUX.2 on Replicate

Score 6

FLUX.2 brings professional-grade image generation and editing with unprecedented detail, multi-reference support, and enterprise efficiency.

inference

Open

Watchlist Matched: generation

Replicate · inference-infra · 2025-11-20

How to prompt Nano Banana Pro

Score 6

Nano Banana Pro brings powerful new capabilities in image generation and editing. Here are the main prompt tricks you should know.

inference

Open

Watchlist Matched: generation

BAIR · research · 2025-11-01

RL without TD learning

Score 4

In this post, I’ll introduce a reinforcement learning (RL) algorithm based on an “alternative” paradigm: divide and conquer. Unlike traditional methods, this algorithm is not based on temporal difference (TD) learning (which has scalabilit...

benchmark model-release research training

Open

Watchlist Matched: benchmark, performance, model, paper, training

LY Corporation Tech Blog · korea · 2025-10-20

End to End Testing on PRs

Score 4

At LY Corporation we're constantly working to improve our pre-release test process and reduce the ri...

model-release

Open

Watchlist Matched: release

Hugging Face · open-source · 2025-10-17

AI for Food Allergies

Score 1

No feed summary available yet.

Open

Watchlist Matched: none

Replicate · inference-infra · 2025-10-16

How to prompt Veo 3.1

Score 6

Google's Veo 3.1 brings powerful new video generation capabilities including reference images, first/last frame control, and enhanced image-to-video. Here's everything you need to know.

inference

Open

Watchlist Matched: generation

Modal · inference-infra · 2025-09-29

Announcing our $87M Series B

Score 0

We’re excited to announce that we have raised more than $80M in a Series B round, led by Lux Capital. Our post-money valuation is $1.1B.

Open

Watchlist Matched: none

SkyPilot · open-source · 2025-09-23

Scaling Vector Search to 1M Documents for $0.85

Score 1

How to build production vector search with RedisVL and SkyPilot: 1M documents indexed for $0.85, sub-100ms queries, no Kubernetes required.

Open

Watchlist Matched: none

Hugging Face · open-source · 2025-08-14

Kimina-Prover-RL

Score 1

No feed summary available yet.

Open

Watchlist Matched: none

LY Corporation Tech Blog · korea · 2025-08-14

Hack the planet! A recap of Hack Day 2025

Score 0

Hello. I'm Jeonghoon Kim from the Redis team at LINE Plus. From July 2nd to 4th, I participated in t...

Open

Watchlist Matched: none

Modal · inference-infra · 2025-07-24

What is an AI code sandbox?

Score 1

AI code sandboxes are seeing an explosion of adoption as the volume of LLM-generated code in the world grows.

Open

Watchlist Matched: none

Replicate · inference-infra · 2025-07-21

Generate consistent characters

Score 0

We compare the best image models for generating consistent characters from a single reference image.

Open

Watchlist Matched: none

Replicate · inference-infra · 2025-07-17

Bria is now on Replicate

Score 6

We've partnered with Bria to bring a suite of commercial-grade image generation and editing models to Replicate. Built entirely on licensed data, Bria’s tools are designed for enterprises and developers building safely with visual AI.

inference

Open

Watchlist Matched: generation

Modal · inference-infra · 2025-07-10

Jamsocket is joining Modal

Score 1

Jamsocket, a backend platform for building sync engines, is joining Modal.

Open

Watchlist Matched: none

Modal · inference-infra · 2025-07-07

How Modal powered 250,000 Lovable app creations in a weekend

Score 0

During a single weekend event, Lovable users built 250,000 new applications, all running in isolated development environments. Lovable used Modal to generate 1 million code sandboxes—with 20,000 running concurrently at peak—over just 48 ho...

Open

Watchlist Matched: none

Replicate · inference-infra · 2025-07-01

The FLUX.1 Kontext hackathon

Score 0

We hosted a hackathon with BFL for FLUX.1 Kontext. Here were the winners.

Open

Watchlist Matched: none

Together AI · inference-infra · 2025-06-09

The Frontier is Open

Score 3

No feed summary available yet.

Open

Watchlist Matched: none

Modal · inference-infra · 2025-05-28

Twirl is joining Modal

Score 1

Twirl, a Stockholm-based data orchestration platform, is joining Modal.

Open

Watchlist Matched: none

Replicate · inference-infra · 2025-05-07

Ideogram 3.0 on Replicate

Score 0

Ideogram 3.0 is packed with powerful design, style transfer, and realism capabilities.

Open

Watchlist Matched: none

Replicate · inference-infra · 2025-05-06

Run MiniMax Speech-02 models with an API

Score 0

MiniMax's Speech-02 models give you high-quality text-to-speech with voice cloning, emotional expression, and multilingual support.

api

Open

Watchlist Matched: api

Modal · inference-infra · 2025-04-30

Modal SDKs for JavaScript and Go (alpha)

Score 1

Today we're releasing lightweight client libraries for JavaScript and Go, making it easier to start sandboxes and call serverless functions — no Python required.

Open

Watchlist Matched: none

Hugging Face · open-source · 2025-04-26

PipelineRL

Score 1

No feed summary available yet.

Open

Watchlist Matched: none

Replicate · inference-infra · 2025-04-16

Easel AI is now on Replicate

Score 0

Advanced face swap and AI avatars from Easel AI are now on Replicate.

Open

Watchlist Matched: none

Modal · inference-infra · 2025-04-15

Our first brand campaign

Score 1

Behind the scenes of updating our visual identity and launching our first-ever out-of-home campaign in San Francisco.

Open

Watchlist Matched: none

Replicate · inference-infra · 2025-04-01

Stylized video with Wan2.1

Score 0

One of the most fun ways to use Wan2.1 is video style transfer. Learn how here.

Open

Watchlist Matched: none

Hugging Face · open-source · 2025-03-27

Open R1: Update #4

Score 1

No feed summary available yet.

Open

Watchlist Matched: none

BAIR · research · 2025-03-25

Scaling Up Reinforcement Learning for Traffic Smoothing: A 100-AV Highway Deployment

Score 6

We deployed 100 reinforcement learning (RL)-controlled cars into rush-hour highway traffic to smooth congestion and reduce fuel consumption for everyone. Our goal is to tackle "stop-and...

serving kernel benchmark model-release research training agents

Open

Watchlist Matched: throughput, kernel, performance, model, paper, training, agent, agents

Hugging Face · open-source · 2025-03-18

Xet is on the Hub

Score 1

No feed summary available yet.

Open

Watchlist Matched: none

Hugging Face · open-source · 2025-03-12

Open R1: Update #3

Score 1

No feed summary available yet.

Open

Watchlist Matched: none

SqueezeBits · korea · 2025-03-10

When Should I Use Fits on Chips?

Score 1

This article describes when to use Fits on Chips toolkit with specific use cases.

Open

Watchlist Matched: none

Replicate · inference-infra · 2025-03-05

Wan2.1 parameter sweep

Score 6

We've been playing with Alibaba's Wan2.1 text-to-video model lately. What happens when you tweak those mysterious parameters? Let's find out.

model-release

Open

Watchlist Matched: model

Hugging Face · open-source · 2025-02-11

Open R1: Update #2

Score 1

No feed summary available yet.

Open

Watchlist Matched: none

SqueezeBits · korea · 2025-02-06

The Rise and Fall of ONNX (feat. PyTorch 2.0)

Score 1

This article explores the rise and fall of ONNX, from its early success as a unifying standard for AI frameworks to its gradual shift into a niche tool in the era of PyTorch 2.0.

Open

Watchlist Matched: none

Hugging Face · open-source · 2025-02-02

Open-R1: Update #1

Score 1

No feed summary available yet.

Open

Watchlist Matched: none

Replicate · inference-infra · 2024-11-26

FLUX fine-tunes are now fast

Score 0

We've made running fine-tunes on Replicate much faster, and the optimizations are open-source.

open-source

Open

Watchlist Matched: open-source

Modal · inference-infra · 2024-11-07

Tidbyt is joining Modal

Score 1

Tidbyt, a NYC-based hardware manufacturer, is joining Modal.

Open

Watchlist Matched: none

Hugging Face · open-source · 2024-11-05

Hugging Face + PyCharm

Score 1

No feed summary available yet.

Open

Watchlist Matched: none

Replicate · inference-infra · 2024-10-10

FLUX is fast and it's open source

Score 0

FLUX is now much faster on Replicate, and we’ve made our optimizations open-source so you can see exactly how they work and build upon them.

open-source

Open

Watchlist Matched: open-source, open source

Hugging Face · open-source · 2024-10-09

Welcome, Gradio 5

Score 1

No feed summary available yet.

Open

Watchlist Matched: none

Hugging Face · open-source · 2024-09-13

Accelerate 1.0.0

Score 1

No feed summary available yet.

Open

Watchlist Matched: none

Replicate · inference-infra · 2024-09-09

Fine-tune FLUX.1 with an API

Score 0

Create and run your own fine-tuned Flux models programmatically using Replicate's HTTP API.

api

Open

Watchlist Matched: api

Modal · inference-infra · 2024-09-04

Modal supports HIPAA compliance

Score 1

You can now enter BAAs with Modal to run HIPAA-compliant workloads.

Open

Watchlist Matched: none

Replicate · inference-infra · 2024-08-23

Replicate Intelligence #12

Score 0

Flux LoRAs, Hot Zuck, and Replicate on Lex Fridman

Open

Watchlist Matched: none

Replicate · inference-infra · 2024-08-16

Replicate Intelligence #11

Score 0

Fine tune FLUX.1, generative video games, a vision for the metaverse

Open

Watchlist Matched: none

Replicate · inference-infra · 2024-08-15

Fine-tune FLUX.1 with your own images

Score 6

We've added fine-tuning (LoRA) support to FLUX.1 image generation models. You can train FLUX.1 on your own images with one line of code using Replicate's API.

inference fine-tuning api

Open

Watchlist Matched: generation, fine-tuning, lora, api

Hugging Face · open-source · 2024-08-13

Introduction to ggml

Score 1

No feed summary available yet.

Open

Watchlist Matched: none

Hugging Face · open-source · 2024-08-12

Tool Use, Unified

Score 1

No feed summary available yet.

agents

Open

Watchlist Matched: tool use

Replicate · inference-infra · 2024-08-09

Replicate Intelligence #10

Score 0

Flux developments, Minecraft bot, Streamlit cookbook with Zeke

Open

Watchlist Matched: none

Replicate · inference-infra · 2024-08-02

FLUX.1: First Impressions

Score 0

We explore FLUX.1's unique strengths and aesthetics to see what we can generate.

Open

Watchlist Matched: none

Replicate · inference-infra · 2024-08-01

Run FLUX with an API

Score 6

FLUX.1 is a new text-to-image model from Black Forest Labs, the creators of Stable Diffusion, that exceeds the capabilities of previous open-source models.

model-release api open-source

Open

Watchlist Matched: model, api, open-source

SkyPilot · open-source · 2024-07-23

Finetune Llama 3.1 on Your Infra

Score 1

Operational guide to finetune Llama 3.1, with everything packaged in a simple SkyPilot YAML.

Open

Watchlist Matched: none

Modal · inference-infra · 2024-07-03

Competitive prompt engineering

Score 1

Learn how Basis partnered with Modal to bring the spirit of competitive programming to prompt engineering.

Open

Watchlist Matched: none

Replicate · inference-infra · 2024-06-14

Replicate Intelligence #4

Score 0

Find concepts in GPT models, real-time speech to text in the browser, H100s are coming

Open

Watchlist Matched: none

Replicate · inference-infra · 2024-05-31

Replicate Intelligence #2

Score 6

Faster image generation, AI-powered world simulator, insights on AI dataset complexity

inference

Open

Watchlist Matched: generation

Replicate · inference-infra · 2024-05-24

Replicate Intelligence #1

Score 0

DIY Llama 3 implementation, open-source smart glasses, steering language models with dictionary learning

open-source

Open

Watchlist Matched: open-source

Modal · inference-infra · 2023-12-20

How to fine-tune an LLM on Modal

Score 1

An operational guide to fine-tuning an LLM on any dataset in minutes (ft. CodeLlama, Llama 2, Mistral, and more)

fine-tuning

Open

Watchlist Matched: fine-tuning

Hugging Face · open-source · 2023-12-18

2023, year of open LLMs

Score 1

No feed summary available yet.

Open

Watchlist Matched: none

Replicate · inference-infra · 2023-11-23

How to run Yi chat models with an API

Score 6

The Yi series models are large language models trained from scratch by developers at 01.AI. Learn how to run them in the cloud with one line of code.

cloud api

Open

Watchlist Matched: cloud, api

Modal · inference-infra · 2023-10-10

Modal is now generally available

Score 1

Modal officially launches today with no waitlist. And we also raised a Series A!

Open

Watchlist Matched: none

Replicate · inference-infra · 2023-08-22

Painting with words: a history of text-to-image AI

Score 6

With the recent release of Stable Diffusion XL fine-tuning on Replicate, and today being the 1-year anniversary of Stable Diffusion, now feels like the perfect opportunity to take a step back and reflect on how text-to-image AI has improve...

model-release fine-tuning

Open

Watchlist Matched: release, fine-tuning

Replicate · inference-infra · 2023-08-16

We're cutting our prices in half

Score 0

The price of public models is being cut in half, and soon we'll start charging new users for setup and idle time on private models.

Open

Watchlist Matched: none

Replicate · inference-infra · 2023-08-14

Streaming output for language models

Score 0

Our API now supports server-sent event streams for language models. Learn how to use them to make your apps more responsive.

api

Open

Watchlist Matched: api

Replicate · inference-infra · 2023-08-08

Fine-tune SDXL with your own images

Score 0

We’ve added fine-tuning (Dreambooth, Textual Inversion and LoRA) support to SDXL 1.0. You can train SDXL on your own images with one line of code using the Replicate API.

fine-tuning api

Open

Watchlist Matched: fine-tuning, lora, api

Replicate · inference-infra · 2023-07-26

Run SDXL with an API

Score 0

How to run Stable Diffusion XL 1.0 using the Replicate API

api

Open

Watchlist Matched: api

Hugging Face · open-source · 2023-07-17

Building an AI WebTV

Score 1

No feed summary available yet.

Open

Watchlist Matched: none

Hugging Face · open-source · 2023-06-22

Panel on Hugging Face

Score 1

No feed summary available yet.

Open

Watchlist Matched: none

Modal · inference-infra · 2023-06-15

Modal is SOC2 compliant

Score 1

Modal is excited to announce that it has successfully completed a System and Organization Controls (SOC) 2 Type 1 audit.

Open

Watchlist Matched: none

Replicate · inference-infra · 2023-05-18

Status page

Score 0

We've added a status page to provide real-time updates on the health of Replicate.

Open

Watchlist Matched: none

Hugging Face · open-source · 2023-03-23

Jupyter X Hugging Face

Score 1

No feed summary available yet.

Open

Watchlist Matched: none

Replicate · inference-infra · 2023-03-18

Week 3 of LLaMA 🦙

Score 0

A roundup of recent developments from the llamaverse.

Open

Watchlist Matched: none

Replicate · inference-infra · 2023-02-21

Machine learning needs better tools

Score 0

Lots of people want to build things with machine learning, but they don't have the expertise to use it.

Open

Watchlist Matched: none

Hugging Face · open-source · 2022-11-30

VQ-Diffusion

Score 1

No feed summary available yet.

Open

Watchlist Matched: none

Hugging Face · open-source · 2022-11-29

We are hiring interns!

Score 0

No feed summary available yet.

Open

Watchlist Matched: none

Replicate · inference-infra · 2022-08-29

Run Stable Diffusion with an API

Score 0

How to use Replicate to integrate Stable Diffusion into hacks, apps, and projects

api

Open

Watchlist Matched: api

Replicate · inference-infra · 2022-08-11

Join us at Uncanny Spaces

Score 0

We're bringing people together to explore what's being created with machine learning.

Open

Watchlist Matched: none

Replicate · inference-infra · 2022-08-05

Automating image collection

Score 0

Using CLIP and LAION5B to collect thousands of captioned images.

Open

Watchlist Matched: none

Replicate · inference-infra · 2022-05-27

Constraining CLIPDraw

Score 0

An introduction to differentiable programming and the process of refining generative art models.

Open

Watchlist Matched: none

Hugging Face · open-source · 2022-05-16

Gradio 3.0 is Out!

Score 1

No feed summary available yet.

Open

Watchlist Matched: none

Replicate · inference-infra · 2022-05-16

Hello, world!

Score 0

We're a small team of engineers and machine learning enthusiasts working to make machine learning more accessible.

Open

Watchlist Matched: none

Hugging Face · open-source · 2022-04-05

~Don't~ Repeat Yourself

Score 1

No feed summary available yet.

Open

Watchlist Matched: none

Hugging Face · open-source · 2021-09-24

Summer at Hugging Face

Score 1

No feed summary available yet.

Open

Watchlist Matched: none