model-release

hardware model-release cloud training

High signal Matched: performance, model, training, checkpointing, fine-tuning

Lambda · cloud · 2026-06-03

Introducing workspaces for Lambda Cloud

Score 17

Lambda workspaces help teams organize cloud resources, control access, and separate dev, staging, and production in shared GPU environments. A junior researcher kills a production training run. A contractor sees weights they shouldn't. If...

model-release cloud agents

High signal Matched: gpu, introducing, weights, cloud, training

AWS Machine Learning Blog · cloud · 2026-06-02

Extending MCP support for Amazon Bedrock AgentCore Gateway

Score 11

While deploying Model Context Protocol (MCP) servers in production, enterprises need fine-grained access control across servers, observability into which teams use which tools, security guarantees against data exfiltration, and centralized...

inference hardware model-release

High signal Matched: model, bedrock, mcp

AWS Machine Learning Blog · cloud · 2026-06-02

Accelerate LLM model loading and increase context windows with GPUDirect on Amazon FSx for Lustre and TurboQuant

Score 15

If you’re iterating on deploying large language models (LLMs) on AWS GPU instances, you’ve probably noticed the larger the model to be loaded into GPU High Bandwidth Memory (HBM), the longer the painful wait until the GPUs are ready for in...

High signal Matched: inference, gpu, model

Hugging Face · open-source · 2026-06-02

Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains

Score 15

No feed summary available yet.

High signal Matched: introducing, model

vLLM Project · open-source · 2026-06-02

Session-Aware Agentic Routing: Continuity-Aware Model Selection for Long-Horizon LLM Agents

Score 15

Long-horizon LLM agents create a routing problem that single-turn prompt routers were not designed to solve. A router still needs to know which model is best for the current request, but it also...

moe model-release agents

inference serving distributed benchmark hardware model-release rag agents

High signal Matched: router, model, agents, agentic

Lambda · cloud · 2026-06-01

Unbox one of NVIDIA's first co-packaged optics switches with us. See why we bet on CPO early.

Score 15

When we design large GPU clusters, the network is no longer a background system. It's part of the compute envelope. At the 800G and NVIDIA GB300 NVL72 scale, the back-end fabric accounts for 86% of networking power in a three-layer cluster...

High signal Matched: generation, token generation, throughput, infiniband, gpu, model, retrieval, agentic

Hugging Face · open-source · 2026-06-01

Welcome NVIDIA Cosmos 3: The First Open Omni-model for Physical AI Reasoning and Action

Score 11

No feed summary available yet.

inference serving model-release research evals

High signal Matched: model

vLLM Project · open-source · 2026-06-01

vLLM on the DGX Spark: Architecture, Configuration, and Local Evaluation

Score 17

A technical deep dive on running vLLM on NVIDIA DGX Spark and GB10 systems, covering sm_121 architecture, unified memory behavior, NVFP4 model serving, Nemotron-3-Super configuration, Docker deployment, Prometheus metrics, and local evalua...

High signal Matched: serving, model, evaluation

NVIDIA Technical Blog · hardware · 2026-05-29

DynoSim: Simulating the Pareto Frontier

Score 15

Modern LLM serving is hard to tune because each deployment is a stack of interacting choices: model backend, tensor-parallel shape, prefill/decode split, worker...

High signal Matched: serving, prefill, model

NVIDIA Technical Blog · hardware · 2026-05-29

How to Automate AI Model Documentation with the NVIDIA MCG Toolkit

Score 13

As AI models grow in complexity and regulatory scrutiny intensifies under frameworks including California’s AB-2013 and the EU AI Act, software teams...

inference serving benchmark hardware model-release research quantization evals

High signal Matched: model

Nota AI · korea · 2026-05-29

Full-Stack Optimization for Low-Light Video on Jetson Orin NX: From 400 ms to 28 ms

Score 23

Jaehoon Lee, Technical Content Manager, Nota AI. When enterprises adopt AI, the most common bottleneck is not model development. It is the deployment stage: getting a finished model to run reliably on the actual target device. T...

model-release cloud training

High signal Matched: inference, throughput, benchmark, performance, latency, cost, gpu, model, evaluation, quantization, int8, benchmarks, leaderboard

AWS Machine Learning Blog · cloud · 2026-05-29

Training Azerbaijani language models on Amazon SageMaker AI

Score 13

Azercell Telecom LLC, Azerbaijan's leading telecommunications provider, wanted to build an Azerbaijani large language model (LLM) on Amazon SageMaker AI for telecom use cases and a customer-facing chatbot. The challenge: adapting foundatio...

inference model-release cloud agents

High signal Matched: model, sagemaker, training

AWS Machine Learning Blog · cloud · 2026-05-29

Claude Opus 4.8 is now available on AWS

Score 11

This post covers Opus 4.8's improvements and practical guidance for AI engineers integrating the model into agentic systems and production inference workloads on Amazon Bedrock.

inference speculative-decoding benchmark hardware model-release

High signal Matched: inference, model, bedrock, agentic

AMD ROCm Blogs · hardware · 2026-05-29

Enabling Speculative Speculative Decoding on MI300X

Score 29

Speculative speculative decoding (SSD) [1] is a recently proposed speculative decoding (SD) algorithm that further accelerates large language model (LLM) inference beyond conventional SD. In standard SD, a small draft model proposes severa...

kernel hardware model-release

High signal Matched: inference, decoding, speculative decoding, draft model, verification, cost, mi300x, model

PyTorch Foundation · open-source · 2026-05-28

Why Is PyTorch Compile So Fast: Kernel Fusion

Score 15

When you use PyTorch’s compiler, your model runs faster, up to 10x faster. But what’s actually happening? Without compilation, the GPU runs a kernel, a function on the GPU, for...

inference benchmark hardware model-release agents

High signal Matched: kernel, gpu, model

PyTorch Foundation · open-source · 2026-05-28

Up to 580tps! New Speed Record of Qwen3.5-397B-A17B on GPU for Agentic Workloads with TokenSpeed

Score 17

TL;DR: The TokenSpeed inference engine achieved a record-breaking 580 tps running the Qwen3.5-397B-A17B model on GPUs. This extreme performance for agentic workloads is driven by systematic elimination of memory copies,...

inference speculative-decoding model-release training

High signal Matched: inference, performance, gpu, model, agentic

vLLM Project · open-source · 2026-05-28

Speculators v0.5.0: DFlash Support and Online Training

Score 19

The v0.5.0 release brings significant architectural improvements to speculative decoding model training, introducing DFlash algorithm support, fully unified online training capabilities, and a...

High signal Matched: decoding, speculative decoding, release, introducing, model, training

vLLM Project · open-source · 2026-05-28

From Text to Multimodal Routing: Hardening Vision Signals in vLLM Semantic Router

Score 19

Most routing systems start with a prompt and choose a model endpoint. vLLM Semantic Router (VSR) makes a different bet: before a request reaches the serving model, the system should extract...

inference serving moe model-release api

kernel benchmark model-release quantization

High signal Matched: serving, endpoint, router, model

AMD ROCm Blogs · hardware · 2026-05-27

Deep Dive Into 4-Wave Interleave FP8 GEMM

Score 17

Our previous two posts in this GEMM optimization series covered Matrix Core instructions and 8-wave ping-pong FP8 GEMM design. Here we discuss another algorithm design introduced by HipKittens - 4-wave interleave, which further improves th...

High signal Matched: gemm, performance, fp8

Modal · inference-infra · 2026-05-27

Role-Based Access Control for humans and agents

Score 9

Introducing Role-Based Access Control for humans and agents, now available for all users on Teams and Enterprise plans.

kernel triton hardware model-release

High signal Matched: introducing, agents

PyTorch Foundation · open-source · 2026-05-26

TLX Block Attention: A Warp-Specialized Blackwell Kernel for Fixed-Block Sparse Self-Attention

Score 18

Code available at: https://github.com/facebookresearch/ads_model_kernel_library In this post, we present the design of TLX Block Attention — a Triton kernel targeting NVIDIA Blackwell GPUs that exploits compile-time knowledge of a block-di...

kernel cuda benchmark hardware model-release

High signal Matched: kernel, triton, blackwell, model

NVIDIA Technical Blog · hardware · 2026-05-26

NVIDIA CUDA 13.3 Enhances GPU Development with Tile Programming in C++, Compiler Autotuning, and Python Updates

Score 21

NVIDIA CUDA 13.3 brings new capabilities and performance optimizations to developers across the CUDA ecosystem. The launch of NVIDIA CUDA Tile programming in...

inference distributed hardware model-release cloud quantization evals

High signal Matched: cuda, performance, gpu, launch

AMD ROCm Blogs · hardware · 2026-05-25

AI Inference on AMD Ryzen™ AI Max Processor

Score 20

Local large language model (LLM) inference has rapidly evolved, but a persistent limitation remains: model size is constrained by available GPU memory. Discrete GPUs typically offer 8–24 GB of dedicated VRAM, which can limit the size of mo...

inference serving benchmark model-release open-source

High signal Matched: inference, multi-gpu, gpu, model, checkpoint, cloud, quantization, evaluate

Lambda · cloud · 2026-05-22

DeepSeek V4: the most expected open-source model ever released, and the quietest landing

Score 18

After 15 months of incremental updates, leaks, and rumored leaks, DeepSeek released version 4. It arrived without the fanfare R1 and R1-preview commanded in early 2025. That quiet reception is the most interesting thing about the release....

High signal Matched: inference, serving, performance, cost, release, model, open-source

SkyPilot · open-source · 2026-05-22

RL Doesn't Work on Slurm

Score 8

Online reinforcement learning for LLMs breaks Slurm's batch scheduling model. We'll discuss why, and what can be done about it.

inference serving kernel triton benchmark model-release cloud open-source

High signal Matched: model

AMD ROCm Blogs · hardware · 2026-05-22

From Build to Benchmark: ONNX Model Serving with Triton Inference Server on AMD GPUs

Score 30

Triton Inference Server is an open-source platform designed to streamline AI inferencing. It supports the deployment, scaling, and inference of trained models from multiple frameworks, including ONNX Runtime, TensorFlow, PyTorch, and other...

inference benchmark hardware model-release evals

High signal Matched: inference, inferencing, serving, triton, benchmark, model, cloud, open-source

Lambda · cloud · 2026-05-20

Lambda’s NVIDIA HGX B200 on STAC-AI™ LANG6

Score 18

What the numbers mean for financial services. Executive summary: Lambda is the first to publish an audited STAC-AI™ LANG6 result on NVIDIA HGX B200, with independently verified performance data that Financial Services Industry (FSI) infrastr...

inference benchmark model-release quantization

High signal Matched: inference, generation, performance, gpu, h200, b200, model, evaluating

AMD ROCm Blogs · hardware · 2026-05-20

QuickReduce FP4 Quantization and Benchmarking on MI355

Score 12

Large Language Models (LLMs) typically contain billions — or even tens of billions — of parameters. During inference, tensor parallelism is commonly employed to distribute the workload across multiple GPUs. This approach demands frequent,...

High signal Matched: inference, latency, introducing, quantization

NVIDIA Technical Blog · hardware · 2026-05-19

NVIDIA-Verified Agent Skills Provide Capability Governance for AI Agents

Score 10

Autonomous AI agents are becoming more capable. Open models, Model Context Protocol (MCP)-connected tools, and portable skills are also making agents easier to...

benchmark model-release research evals agents

High signal Matched: model, agent, agents, mcp

NVIDIA Technical Blog · hardware · 2026-05-19

Mastering Agentic Techniques: AI Agent Evaluation

Score 16

Evaluating an AI model and evaluating an AI agent are related—but they answer fundamentally different questions. A model benchmark tests the capability of a...

High signal Matched: benchmark, model, evaluation, evaluating, agent, agentic

Hugging Face · open-source · 2026-05-19

Introducing the Ettin Reranker Family

Score 10

No feed summary available yet.

inference hardware model-release

High signal Matched: introducing

PyTorch Foundation · open-source · 2026-05-19

Running PyTorch Models on Apple Silicon GPUs with the ExecuTorch MLX Delegate

Score 14

TL;DR: Introducing the ExecuTorch MLX Delegate. The new MLX delegate enables optimized, GPU-accelerated inference for PyTorch models on Apple Silicon Macs, using Apple’s MLX framework. The delegate seamlessly integrates with...

High signal Matched: inference, gpu, introducing

Modal · inference-infra · 2026-05-19

Introducing Claude Managed Agents with Modal Sandboxes

Score 10

No feed summary available yet.

inference serving benchmark model-release research api

High signal Matched: introducing, agents

Together AI · inference-infra · 2026-05-15

Together AI and Pearl Research Labs Team Up to Reduce the Cost of AI Inference

Score 24

Together AI partners with Pearl Research Labs to launch a discounted Pearl-powered inference endpoint for Gemma-4-31B-it-pearl, using Proof of Useful Work to turn AI workloads into crypto emissions.

inference model-release agents

High signal Matched: inference, endpoint, cost, launch, research

NVIDIA Technical Blog · hardware · 2026-05-14

How the NVIDIA Vera Rubin Platform is Solving Agentic AI’s Scale-Up Problem

Score 12

Agentic inference has fundamentally changed the runtime dynamics of inference workloads by introducing non-deterministic trajectories—actions, observations,...

kernel cuda model-release

High signal Matched: inference, introducing, agentic

PyTorch Foundation · open-source · 2026-05-14

PyTorch 2.12 Release Blog

Score 12

We are excited to announce the release of PyTorch® 2.12 (release notes)! The PyTorch 2.12 release features the following changes: Batched linalg.eigh on CUDA is up to 100x faster due...

benchmark model-release research

High signal Matched: cuda, release

Microsoft Research · big-tech · 2026-05-14

GridSFM: A new, small foundation model for the electric grid

Score 12

Introducing GridSFM, a small foundation model that can predict AC optimal power flow in milliseconds, boosting efficiency and unlocking cost savings. Learn how GridSFM gives grid operators direct visibility into congestion, stability, and...

High signal Matched: cost, introducing, model, research

vLLM Project · open-source · 2026-05-14

Announcing VeRL-Omni: Easy, Fast, and Stable RL Training for Diffusion and Omni-Modality Models

Score 10

We are excited to announce the pre-release of VeRL-Omni, a general reinforcement learning (RL) post-training framework focused on multimodal generative models, built on top of verl and vllm-omni.

kv-cache moe hardware model-release quantization agents

High signal Matched: release, training, post-training

LMCache · open-source · 2026-05-13

Benchmarking LMCache for Multi-Turn Agentic Workloads on AMD MI300X

Score 20

A practitioner’s guide to KV-cache tiering on ROCm: what works, what doesn’t, and the regime where it actually matters. Key Summary: We benchmarked multi-turn agentic workloads using 739 anonymized Claude Code conversation trac...

benchmark model-release evals

High signal Matched: lmcache, moe, mi300x, rocm, fp8, agentic

AI2 · research · 2026-05-13

Introducing AIMIP: The AI weather and climate model intercomparison project

Score 14

AIMIP is a new open benchmark and dataset for evaluating AI climate models, showing they can match or beat conventional models on some historical climate metrics while still struggling to generalize reliably to long-term warming trends and...

High signal Matched: benchmark, introducing, model, evaluating

Microsoft Research · big-tech · 2026-05-12

Advancing AI for materials with MatterSim: experimental synthesis, faster simulation, and multi-task models

Score 8

MatterSim is expanding what AI can do for materials science, from faster large-scale simulations to MatterSim-MT, a new multi-task model for simulating properties beyond potential energy surfaces alone.

model-release research

inference serving model-release fine-tuning

High signal Matched: model, research

NVIDIA Technical Blog · hardware · 2026-05-12

How to Eliminate Pipeline Friction in AI Model Serving

Score 16

The path from a trained AI model to production should be smooth, but rarely is. Many teams invest weeks fine-tuning models, only to discover that exporting to a...

High signal Matched: serving, model, fine-tuning

Modular · inference-infra · 2026-05-12

Inkwell: Why Your Inference Platform Matters As Much As Your Model

Score 14

High signal Matched: inference, model

Together AI · inference-infra · 2026-05-12

Introducing voice finder — a new tool to quickly find the right voice for your app from over 600+ voices

Score 12

Voice finder helps developers search, match, filter, and audition 600+ voices across Together AI TTS models using natural-language prompts or uploaded audio samples.

inference model-release training

High signal Matched: introducing

Hugging Face · open-source · 2026-05-12

Building Blocks for Foundation Model Training and Inference on AWS

Score 14

No feed summary available yet.

High signal Matched: inference, model, training

NVIDIA Technical Blog · hardware · 2026-05-11

Introducing NVIDIA Fleet Intelligence for Real-Time GPU Fleet Visibility and Optimization

Score 16

The compute capability of large GPU fleets presents unprecedented opportunities to innovate and provide value to customers in record time. Yet these...

inference serving kernel speculative-decoding moe benchmark hardware model-release research quantization evals agents api

High signal Matched: gpu, introducing

Nota AI · korea · 2026-05-11

[NetsPresso® x AI Agents] Easier to Use, Even More Powerful

Score 52

Jaehoon Lee, Technical Content Manager, Nota AI. NetsPresso® now embraces AI agents. An easy-to-use interface sits on top of the validated pipeline that handles everything from model compression to device deployment. When a user...

inference serving kv-cache speculative-decoding benchmark model-release research training fine-tuning evals long-context agents frontier-model

High signal Matched: inference, endpoint, kernel, verification, moe, benchmark, latency, cost, gpu, release, model, evaluation, quantization, quantized, int4, evaluate, benchmarks, swe-bench, mmlu, agent, agents, api

BAIR · research · 2026-05-08

Adaptive Parallel Reasoning: The Next Paradigm in Efficient Inference Scaling

Score 28

inference model-release agents

High signal Matched: inference, decoding, prefill, generation, serve, throughput, kv cache, verification, performance, latency, cost, model, paper, research, evaluation, training, pretraining, sft, benchmarks, long context, context window, agentic, reasoning model

NVIDIA Technical Blog · hardware · 2026-05-08

Improving Bash Generation in Small Language Models with Grammar-Constrained Decoding

Score 20

Bash is one of the most flexible and powerful interfaces exposed to AI agents. In the right system, a model that emits grep, curl, tar, or a shell pipeline is...

inference hardware model-release

High signal Matched: decoding, generation, model, agents

Together AI · inference-infra · 2026-05-08

Deploy and inference any model from HuggingFace

Score 20

Learn how to deploy any Hugging Face model in one session using Goose and Together's Dedicated Container Inference. Skip the setup complexity — one prompt gets your model running in a production-grade GPU environment on release day.

moe benchmark model-release training

High signal Matched: inference, gpu, release, model

AI2 · research · 2026-05-08

EMO: Pretraining mixture of experts for emergent modularity

Score 12

EMO is a new mixture-of-experts model trained so modular expert groups emerge from data, enabling users to select small task-specific expert subsets while preserving near full-model performance.

inference benchmark model-release training quantization

High signal Matched: mixture of experts, performance, model, pretraining

NVIDIA Technical Blog · hardware · 2026-05-07

Model Quantization: Post-Training Quantization Using NVIDIA Model Optimizer

Score 16

Model quantization is an effective method to reduce VRAM usage and improve inference performance on consumer devices such as NVIDIA GeForce RTX GPUs. By...

High signal Matched: inference, performance, model, training, post-training, quantization

LMCache · open-source · 2026-05-05

Deepseek V4 explained, and why it matters to your wallet

Score 12

DeepSeek V4 is an open-weight model that gives you state-of-the-art intelligence while potentially offering a much cheaper token price than its predecessor, DeepSeek V3.2. But how does DeepSeek V4 do that? Prerequisite: attention...

kv-cache model-release

serving benchmark model-release

High signal Matched: kv cache, lmcache, model

Cloudflare Blog · cloud · 2026-05-01

Introducing Dynamic Workflows: durable execution that follows the tenant

Score 10

Dynamic Workflows is a library that lets you route durable execution to tenant-provided code on the fly. Built on Dynamic Workers, it enables platforms to serve millions of unique workflows at near-zero idle cost.

kernel cuda hardware model-release agents

High signal Matched: serve, cost, introducing

NVIDIA Technical Blog · hardware · 2026-04-30

Automating GPU Kernel Translation with AI Agents: cuTile Python to cuTile.jl

Score 20

NVIDIA CUDA Tile (cuTile) is a tile-based programming model that enables developers to write GPU kernels in terms of tile-level operations—loads, stores, and...

High signal Matched: kernel, cuda, gpu, model, agents

Nota AI · korea · 2026-04-29

[NVIDIA Nemotron Hackathon] Grand Prize Among 20 Teams: Behind Two Sleepless Days

Score 32

Hancheol Park, Ph.D., AI Research Engineer, NetsPresso Tech, Nota AI; Geonmin Kim, Ph.D., AI Research Engineer, NetsPresso Tech, Nota AI; Geonho Lee, Edge AI Engineer Intern, NetsPresso Tech, Nota AI; Jaehoon Lee, Technical Content Manager,...

inference moe benchmark model-release research korea training fine-tuning quantization evals agents

model-release long-context agents

High signal Matched: generation, moe, performance, model, weights, paper, research, evaluation, korea, korean, seoul, naver, training, fine-tuning, quantization, agent, agents, agentic

Hugging Face · open-source · 2026-04-29

Introducing NVIDIA Nemotron 3 Nano Omni: Long-Context Multimodal Intelligence for Documents, Audio and Video Agents

Score 10

No feed summary available yet.

model-release agents open-source

High signal Matched: introducing, long-context, agents

NVIDIA Technical Blog · hardware · 2026-04-28

NVIDIA Nemotron 3 Nano Omni Powers Multimodal Agent Reasoning in a Single Efficient Open Model

Score 16

Agentic systems often reason across screens, documents, audio, video, and text within a single perception‑to‑action loop. However, they still rely on...

model-release agents open-source

High signal Matched: model, open model, agent, agentic

Together AI · inference-infra · 2026-04-28

Together AI Brings NVIDIA Nemotron 3 Nano Omni to Developers on Day 0

Score 12

NVIDIA Nemotron 3 Nano Omni is now on Together AI: a single open model that reasons across video, images, audio, and text, built for agentic workloads at scale.

High signal Matched: model, open model, agentic

vLLM Project · open-source · 2026-04-28

Run Highly Efficient Multimodal Agentic AI with NVIDIA Nemotron 3 Nano Omni Using vLLM

Score 10

We are excited to support the newly released NVIDIA Nemotron 3 Nano Omni model on vLLM.

High signal Matched: model, agentic

Sakana AI · model-lab · 2026-04-24

Sakana Fugu: A Multi-Agent Orchestration System as a Foundation Model

Score 9

No feed summary available yet.

inference kv-cache benchmark hardware model-release cloud

High signal Matched: model, agent

LMCache · open-source · 2026-04-23

LMCache on Amazon SageMaker HyperPod: Accelerating LLM Inference with Managed Tiered KV Cache

Score 30

Overview: Large language model (LLM) inference performance depends heavily on how efficiently the system manages the key-value (KV) cache, the stored attention states that allow the model to avoid recomputing previous tokens. As context length...

High signal Matched: inference, kv cache, lmcache, performance, latency, gpu, model, sagemaker

AI2 · research · 2026-04-23

Introducing OlmoEarth embeddings: Custom embedding exports from OlmoEarth Studio for downstream analysis

Score 8

OlmoEarth Studio now lets users export custom Earth-observation embeddings from our OlmoEarth foundation models and use them for tasks like similarity search, few-shot mapping, change detection, and unsupervised exploration.

High signal Matched: introducing

Nota AI · korea · 2026-04-22

[Deep Dive: NetsPresso®] From Quantization to Graph Optimization: A Step-by-Step Model Deployment Pipeline

Score 54

Jaehoon Lee, Technical Content Manager, Nota AI. Series Notice: NetsPresso® Technical Blog, Part 2. In Part 1, we walked through a scenario of deploying Llama 3.2 1B on an edge device to illustrate the NetsPresso® workflow. The f...

inference kernel cuda benchmark hardware model-release research korea training quantization evals api open-source

inference serving kv-cache hardware model-release quantization long-context

High signal Matched: inference, kernel, cuda, matmul, benchmark, performance, latency, cost, npu, model, weights, paper, research, evaluation, furiosa, training, quantization, int8, int4, awq, gptq, sdk, open-source

vLLM Project · open-source · 2026-04-22

The State of FP8 KV-Cache and Attention Quantization in vLLM

Score 18

Long-context LLM serving is increasingly memory-bound: for standard full-attention decoders, the KV cache often dominates GPU memory at 128k+ contexts, and each decode step must read a large...

hardware model-release cloud

High signal Matched: serving, kv cache, gpu, fp8, quantization, long-context

SkyPilot · open-source · 2026-04-22

GPU Compass: Navigate the GPU Frontier Across 20+ Clouds & 2K+ Offerings

Score 18

Introducing GPU Compass: One dashboard to browse, compare pricing, and launch across every GPU cloud.

inference serving benchmark model-release training quantization

High signal Matched: gpu, introducing, launch, cloud

NVIDIA Technical Blog · hardware · 2026-04-20

Run High-Throughput Reinforcement Learning Training with End-to-End FP8 Precision

Score 18

As LLMs transition from simple text generation to complex reasoning, reinforcement learning (RL) plays a central role. Algorithms like Group Relative Policy...

benchmark model-release research training evals

High signal Matched: generation, throughput, fp8, training

BAIR · research · 2026-04-20

Gradient-based Planning for World Models at Longer Horizons

Score 16

High signal Matched: performance, model, paper, arxiv, evaluation, training

Together AI · inference-infra · 2026-04-15

Parcae: Doing more with fewer parameters using stable looped models

Score 14

Parcae is a stable looped language model that matches the quality of a Transformer twice its size — a 770M model reaching 1.3B-level performance. We introduce the first scaling laws for looping and show that increasing recurrence, not just...

High signal Matched: performance, model

NVIDIA Technical Blog · hardware · 2026-04-14

NVIDIA Ising Introduces AI-Powered Workflows to Build Fault-Tolerant Quantum Systems

Score 10

NVIDIA Ising is the world's first family of open AI models for building quantum processors, launching with two model domains: Ising Calibration and Ising...

High signal Matched: model

NVIDIA Technical Blog · hardware · 2026-04-12

MiniMax M2.7 Advances Scalable Agentic Workflows on NVIDIA Platforms for Complex AI Applications

Score 12

The release of MiniMax M2.7 adds enhancements to the popular MiniMax M2.5 model, built for agentic harnesses,...

model-release cloud training agents

High signal Matched: release, model, agentic

SkyPilot · open-source · 2026-04-10

SkyPilot Agent Skill: Let Agents Manage Your GPUs

Score 10

With the SkyPilot Agent Skill, your AI coding agent can launch clusters, run training jobs and manage cloud resources across any infrastructure using natural language.

High signal Matched: launch, cloud, training, agent, agents

NVIDIA Technical Blog · hardware · 2026-04-09

Cut Checkpoint Costs with About 30 Lines of Python and NVIDIA nvCOMP

Score 16

Training LLMs requires periodic checkpoints. These full snapshots of model weights, optimizer states, and gradients are saved to storage so training can resume...

High signal Matched: model, weights, checkpoint, training

Google Research · big-tech · 2026-04-09

Improving the academic workflow: Introducing two AI agents for better figures and peer review

Score 8

Generative AI

inference distributed kv-cache speculative-decoding benchmark hardware model-release research quantization evals

High signal Matched: introducing, agents

Nota AI · korea · 2026-04-08

[Overview: NetsPresso®] A Platform That Handles Everything from Model Optimization to Target Deployment

Score 36

Jaehoon Lee, Technical Content Manager, Nota AI. AI Model Optimization: Why Models Won't Run on Hardware. The Chip Is Ready, but the Model Won't Deploy. If you have ever tried deploying an AI model onto your own chip, the following...

High signal Matched: inference, multi-gpu, kv cache, verification, performance, latency, gpu, model, research, evaluation, quantization, quantized, awq, gptq, evaluate

AI2 · research · 2026-04-07

Introducing WildDet3D: Open-world 3D detection from a single image

Score 12

WildDet3D is an open model that predicts 3D bounding boxes from a single image. It generalizes across cameras and object categories, and folds in depth signals when available—alongside a new dataset of verified 3D annotations.

High signal Matched: introducing, model, open model

Together AI · inference-infra · 2026-04-03

Wan 2.7 video model suite now available on Together AI

Score 14

A four-model video suite for generation, continuation, reference-driven workflows, and editing, rolling out on Together AI starting with text-to-video.

inference model-release cloud

High signal Matched: generation, model

LY Corporation Tech Blog · korea · 2026-04-02

Cloud infrastructure transformation at LY Corporation: introducing the architecture of Flava, the next-generation platform integrating two massive cl...

Score 14

Hello. I’m Inoue, and I work on private cloud infrastructure at LY Corporation. What powers LY Corpor...

serving benchmark hardware model-release

High signal Matched: generation, introducing, cloud

NVIDIA Technical Blog · hardware · 2026-04-02

Accelerating Vision AI Pipelines with Batch Mode VC-6 and NVIDIA Nsight

Score 14

In vision AI systems, model throughput continues to improve. The surrounding pipeline stages must keep pace, including decode, preprocessing, and GPU...

High signal Matched: throughput, gpu, model

NVIDIA Technical Blog · hardware · 2026-04-02

Bringing AI Closer to the Edge and On-Device with Gemma 4

Score 10

The Gemmaverse expands with the launch of the latest Gemma 4 multimodal and multilingual models, designed to scale across the full spectrum of deployments, from...

inference model-release agents

High signal Matched: launch

Together AI · inference-infra · 2026-04-02

Deepgram speech-to-text and voice models now available natively on Together AI

Score 14

Production STT and TTS from Deepgram, available on Together AI Dedicated Model Inference for real-time voice agents.

High signal Matched: inference, model, agents

Modular · inference-infra · 2026-04-02

Day Zero Launch: Fastest Performance for Gemma 4 on NVIDIA and AMD

Score 14

Day Zero Launch: Fastest Performance for Gemma 4 on NVIDIA and AMD

High signal Matched: performance, launch

vLLM Project · open-source · 2026-04-02

Announcing Gemma 4 on vLLM: Byte for byte, the most capable open models

Score 16

With the debut of Gemma 4, vLLM introduces immediate support for Google's most sophisticated open model lineup, spanning multiple hardware backends, with first-ever Day 0 support on Google TPUs,...

inference serving kv-cache benchmark hardware model-release research training fine-tuning quantization agents frontier-model

High signal Matched: model, open model

Nota AI · korea · 2026-03-31

The Real Reason TurboQuant Shook the Market: AI Optimization Has Gone Mainstream

Score 46

Jaehoon Lee, Technical Content Manager, Nota AI. In March, a single official announcement from Google Research rocked trillions of won in the market capitalization of U.S. infrastructure and semiconductor stocks. The catalyst:...

serving benchmark hardware model-release

High signal Matched: inference, serving, generation, throughput, kv cache, benchmark, performance, cost, b200, blackwell, introducing, model, fp8, research, training, fine-tuning, quantization, quantized, agent, agentic, frontier model

NVIDIA Technical Blog · hardware · 2026-03-25

Maximize AI Infrastructure Throughput by Consolidating Underutilized GPU Workloads

Score 18

In production Kubernetes environments, the difference between model requirements and GPU size creates inefficiencies. Lightweight automatic speech recognition...

High signal Matched: throughput, gpu, model

NVIDIA Technical Blog · hardware · 2026-03-25

Designing Protein Binders Using the Generative Model Proteina-Complexa

Score 12

Developing new protein-based therapies and catalysts involves the challenging task of designing protein binders, or proteins that bind to a target protein or...

High signal Matched: model

vLLM Project · open-source · 2026-03-24

Model Runner V2: A Modular and Faster Core for vLLM

Score 12

We are excited to announce Model Runner V2 (MRV2), a ground-up re-implementation of the vLLM model runner. MRV2 delivers a cleaner, more modular, and more efficient execution core—with no API...

High signal Matched: model, api

Nota AI · korea · 2026-03-23

[GTC 2026 Recap] The Trillion-Dollar Inference Race Begins: How Nota AI Fills the Gap

Score 42

Jaehoon Lee, Technical Content Manager, Nota AI. GTC has evolved far beyond a technology conference, drawing attention from global economies and financial markets alike. This year, CEO Jensen Huang took the stage in his tradema...

inference serving kernel cuda kv-cache benchmark hardware model-release research cloud training long-context agents open-source

High signal Matched: inference, prefill, generation, throughput, cuda, kv cache, performance, latency, cost, gpu, npu, launch, model, research, cloud, training, long-context, context window, agent, agents, agentic, open-source

NVIDIA Technical Blog · hardware · 2026-03-23

Deploying Disaggregated LLM Inference Workloads on Kubernetes

Score 18

As large language model (LLM) inference workloads grow in complexity, a single monolithic serving process starts to hit its limits. Prefill and decode stages...

High signal Matched: inference, serving, prefill, model

Hugging Face · open-source · 2026-03-21

Build a Domain-Specific Embedding Model in Under a Day

Score 10

No feed summary available yet.

inference kv-cache moe benchmark model-release research korea quantization

High signal Matched: model

Nota AI · korea · 2026-03-20

GenAI Everywhere: The Future of Edge AI Optimization with the New NetsPresso®

Score 26

NP Product Team, Nota AI. The role of Edge AI is rapidly expanding. Offline voice assistants now carry on conversations in our daily lives, vehicles infer routes in real time, and smartphones generate images without a network c...

serving benchmark model-release training fine-tuning

High signal Matched: inference, kv cache, moe, benchmark, performance, latency, cost, model, research, seoul, quantization

Together AI · inference-infra · 2026-03-18

Together AI expands fine-tuning service with tool calling, reasoning, and vision support

Score 14

Together AI expands fine-tuning with native support for tool calling, reasoning, and vision-language models, plus 100B+ model training, up to 6× higher throughput, and job cost and ETA estimates.

High signal Matched: throughput, cost, model, training, fine-tuning

AI2 · research · 2026-03-18

MolmoPoint: Better pointing architecture for vision-language models

Score 8

MolmoPoint is a new vision-language model architecture that replaces text-based coordinate outputs with a more natural, token-based pointing mechanism that directly selects regions from visual features.

High signal Matched: model

NVIDIA Technical Blog · hardware · 2026-03-16

Introducing NVIDIA BlueField-4-Powered CMX Context Memory Storage Platform for the Next Frontier of AI

Score 12

AI‑native organizations increasingly face scaling challenges as agentic AI workflows drive context windows to millions of tokens and models scale toward...

inference serving moe benchmark hardware model-release research korea training quantization evals long-context open-source

High signal Matched: introducing, agentic

Nota AI · korea · 2026-03-13

NotaMoEQuantization: An MoE-Specific Quantization Method for Solar-Open-100B

Score 62

Hancheol Park, Ph.D., AI Research Engineer, Nota AI; Tairen Piao, AI Research Engineer, Nota AI; Tae-Ho Kim, CTO & Co-Founder, Nota AI. ✔️ Resource: The official quantized model of Solar-Open-100B, which passed the first round of Sout...

inference serving benchmark model-release research training evals long-context rag

High signal Matched: inference, serving, prefill, generation, throughput, moe, router, benchmark, performance, latency, ttft, tpot, blackwell, release, model, weights, open model, research, evaluation, korea, korean, upstage, training, post-training, quantization, quantized, int4, evaluate, benchmarks, mmlu, long-context

BAIR · research · 2026-03-13

Identifying Interactions at Scale for LLMs

Score 18

Understanding the behavior of complex machine learning systems, particularly Large Language Models (LLMs), is a critical challenge in modern artificial intelligence. Interpretability research aims to make the decision-making process mo...

High signal Matched: inference, serving, decoding, performance, cost, model, research, training, evaluate, mmlu, long-context, rag

llm-d · open-source · 2026-03-13

Predicted-Latency Based Scheduling for LLMs

Score 18

A lightweight ML model trained online from live traffic replaces manually tuned heuristic weights with direct latency predictions, achieving 43% improvement in P50 end-to-end latency and 70% improvement in TTFT on a production-realistic wo...

inference speculative-decoding model-release

High signal Matched: latency, ttft, model, weights

vLLM Project · open-source · 2026-03-13

P-EAGLE: Faster LLM inference with Parallel Speculative Decoding in vLLM

Score 26

EAGLE is the state-of-the-art method for speculative decoding in large language model (LLM) inference, but its autoregressive drafting creates a hidden bottleneck: the more tokens that you...

High signal Matched: inference, decoding, speculative decoding, eagle, model

Google Research · big-tech · 2026-03-12

Introducing Groundsource: Turning news reports into data with Gemini

Score 8

Climate & Sustainability

High signal Matched: introducing

vLLM Project · open-source · 2026-03-11

Run Highly Efficient and Accurate Multi-Agent AI with NVIDIA Nemotron 3 Super Using vLLM

Score 10

We are excited to support the newly released NVIDIA Nemotron 3 Super model on vLLM.

High signal Matched: model, agent

SkyPilot · open-source · 2026-03-11

SkyPilot Recipes: Templatize your AI Workflows

Score 8

SkyPilot Recipes let you store SkyPilot YAMLs in a shared, team-accessible registry. Launch workloads directly from the CLI without local files.

High signal Matched: launch

Hugging Face · open-source · 2026-03-10

Introducing Storage Buckets on the Hugging Face Hub

Score 10

No feed summary available yet.

High signal Matched: introducing

vLLM Project · open-source · 2026-03-10

vLLM Semantic Router v0.2 Athena: ClawOS, Model Refresh, and the System Brain

Score 18

Since v0.1 Iris, vLLM Semantic Router has made a large jump. In one release cycle, the project rebuilt its model stack, expanded routing into safety, semantic caching, memory, retrieval, and...

moe model-release rag

High signal Matched: router, release, model, retrieval

Hugging Face · open-source · 2026-03-05

Introducing Modular Diffusers - Composable Building Blocks for Diffusion Pipelines

Score 10

No feed summary available yet.

High signal Matched: introducing

AI2 · research · 2026-03-05

Introducing Olmo Hybrid: Combining transformers and linear RNNs for superior scaling

Score 10

Olmo Hybrid is a fully open 7B language model that combines transformer attention with linear RNN layers to achieve greater expressivity and significantly improved data and compute efficiency compared to pure transformer models.

High signal Matched: introducing, model

Hugging Face · open-source · 2026-03-04

PRX Part 3 — Training a Text-to-Image Model in 24h!

Score 10

No feed summary available yet.

inference model-release fine-tuning rag api

High signal Matched: model, training

AIBrix · open-source · 2026-03-03

AIBrix v0.6.0 Release: Envoy Sidecar, Mixed LLM Workloads Routing, Routing Profiles, LoRA Delivery & New APIs

Score 28

🚀 AIBrix v0.6.0 Release. Today, we’re excited to announce AIBrix v0.6.0, a release that expands how you deploy and route inference traffic. Key highlights include: Envoy Sidecar Support – Run Envoy alongside the gateway-plugin without...

model-release research open-source

High signal Matched: inference, prefill, release, model, lora, rerank, api, openai-compatible

Together AI · inference-infra · 2026-03-02

Introducing Together AI’s new look

Score 14

We've refreshed our visual identity — designed with Pentagram to express how Together AI connects open-source innovation, systems research, and builders to unlock new possibilities.

inference speculative-decoding benchmark model-release research training evals

High signal Matched: introducing, research, open-source

Nota AI · korea · 2026-02-26

ERGO: Efficient High-Resolution Visual Understanding for Vision-Language Models

Score 24

serving moe hardware model-release cloud

High signal Matched: inference, generation, verification, benchmark, performance, latency, cost, model, arxiv, evaluation, training, post-training, benchmarks

vLLM Project · open-source · 2026-02-26

Efficiently serve dozens of fine-tuned models with vLLM on Amazon SageMaker AI and Amazon Bedrock

Score 30

Organizations and individuals running multiple custom AI models, especially recent Mixture of Experts (MoE) model families, can face the challenge of paying for idle GPU capacity when the...

High signal Matched: serve, moe, mixture of experts, gpu, model, sagemaker, bedrock

Modal · inference-infra · 2026-02-24

Directory Snapshots: Resumable project state for Sandboxes

Score 8

Introducing Directory Snapshots, a programmatic way to snapshot a specific directory within a running Sandbox and mount it into another Sandbox later, independently of the base image and the rest of the filesystem.

High signal Matched: introducing

Together AI · inference-infra · 2026-02-12

Introducing Dedicated Container Inference: Delivering 2.6x faster inference for custom AI models

Score 16

Together AI launches production-grade orchestration for custom AI models with 1.4x–2.6x faster inference.

High signal Matched: inference, introducing

Hugging Face · open-source · 2026-02-06

Introducing SyGra Studio

Score 10

No feed summary available yet.

High signal Matched: introducing

Hugging Face · open-source · 2026-02-04

H Company's new Holo2 model takes the lead in UI Localization

Score 10

No feed summary available yet.

inference benchmark model-release fine-tuning evals open-source

High signal Matched: model

Together AI · inference-infra · 2026-02-02

Fine-tuning open LLM judges to outperform GPT-5.2

Score 14

Fine-tuned open-source LLM judges can outperform GPT-5.2 at evaluating model outputs. Using Direct Preference Optimization on just 5,400 preference pairs, we trained GPT-OSS 120B to beat GPT-5.2 on human preference alignment—at 15x lower c...

benchmark hardware model-release open-source

High signal Matched: inference, cost, model, fine-tuning, evaluating, open-source, oss

vLLM Project · open-source · 2026-02-01

GPT-OSS Performance Optimizations on NVIDIA Blackwell: Pushing the Pareto Frontier

Score 18

TL;DR: In collaboration with the open-source community, vLLM + NVIDIA has achieved significant performance milestones on the gpt-oss-120b model running on NVIDIA's Blackwell GPUs. Through deep...

inference model-release api

High signal Matched: performance, blackwell, model, open-source, oss

vLLM Project · open-source · 2026-01-31

Streaming Requests & Realtime API in vLLM

Score 12

Large language model inference has traditionally operated on a simple premise: the user submits a complete prompt (request), the model processes it, and returns a response (either streaming or at...

High signal Matched: inference, model, api

Hugging Face · open-source · 2026-01-29

Introducing Daggr: Chain apps programmatically, inspect visually

Score 10

No feed summary available yet.

inference benchmark model-release research training evals agents open-source

High signal Matched: introducing

Together AI · inference-infra · 2026-01-26

DSGym: A holistic framework for evaluating and training data science agents

Score 18

Introducing DSGym, a holistic evaluation and training framework for LLM-based data science agents. Features 90+ bioinformatics tasks, 92 Kaggle competitions, and synthetic trajectory generation. Our 4B model achieves state-of-the-art perform...

High signal Matched: generation, performance, introducing, model, evaluation, training, evaluating, agents, open-source

Google Research · big-tech · 2026-01-24

Introducing GIST: The next stage in smart sampling

Score 8

Algorithms & Theory

High signal Matched: introducing

Hugging Face · open-source · 2026-01-20

Introducing Waypoint-1: Real-time interactive video diffusion from Overworld

Score 10

No feed summary available yet.

inference benchmark hardware model-release quantization agents

High signal Matched: introducing

Together AI · inference-infra · 2026-01-13

Learn how Cursor partnered with Together AI to deliver real-time, low-latency inference at scale

Score 24

Together AI teamed with Cursor to build the real-time inference stack that keeps in-editor agents fast and reliable. They productionized NVIDIA Blackwell (B200/GB200), tuning ARM hosts, kernels, and FP4/TensorRT quantization for low latenc...

distributed hardware model-release training

High signal Matched: inference, latency, b200, gb200, blackwell, model, quantization, agents

Together AI · inference-infra · 2026-01-12

Inside multi-node training: How to scale model training across GPU clusters

Score 22

Learn how foundation models are trained at scale using multi-node GPU clusters, including distributed training techniques, infrastructure requirements, and practical steps to scale training efficiently.

benchmark model-release research training evals

High signal Matched: distributed, multi-node, gpu, model, training, distributed training

BAIR · research · 2026-01-10

Information-Driven Design of Imaging Systems

Score 12

An encoder (optical system) maps objects to noiseless images, which noise corrupts into measurements. Our information estimator uses only these noisy measurements and a noise model to quantify how well measurements distinguish objects. Man...

benchmark model-release evals open-source

High signal Matched: performance, model, paper, evaluation, training, evaluate

Together AI · inference-infra · 2026-01-08

How to choose the right open model for production

Score 20

Learn how to choose the right open-source model for production by evaluating model quality, benchmarking performance, and deploying open models that balance cost, speed, and accuracy.

inference serving model-release fine-tuning

High signal Matched: performance, cost, model, open model, evaluating, open-source

SqueezeBits · korea · 2026-01-07

Intel® Gaudi® Hands-on Workshop | A Recap of the Gaudi Workshop with SqueezeBits x Lablup

Score 12

A recap of the Intel® Gaudi® hands-on workshop co-hosted by SqueezeBits and Lablup. AI model compression, fine-tuning, and vLLM serving on Gaudi® hardware with Backend.AI.

High signal Matched: serving, model, fine-tuning

Hugging Face · open-source · 2026-01-05

Introducing Falcon-H1-Arabic: Pushing the Boundaries of Arabic Language AI with Hybrid Architecture

Score 10

No feed summary available yet.

High signal Matched: introducing

vLLM Project · open-source · 2026-01-05

vLLM Semantic Router v0.1 Iris: The First Major Release

Score 16

vLLM Semantic Router is the System Level Intelligence for Mixture-of-Models (MoM), bringing Collective Intelligence into LLM systems. It lives between users and models, capturing signals from...

moe model-release

High signal Matched: router, release

vLLM Project · open-source · 2026-01-02

Introducing vLLM Playground: A Modern Web Interface for Managing and Interacting with vLLM Servers

Score 12

As a passionate vLLM community member who wants to see vLLM thrive and reach even more developers, I'm excited to announce vLLM Playground – a modern, feature-rich web interface for managing and...

inference serving benchmark hardware model-release korea

High signal Matched: introducing

SqueezeBits · korea · 2025-12-24

Introducing rebellions ATOM™-MAX

Score 24

Introducing ATOM™-Max, rebellions’ next-generation NPU designed for high-performance AI inference. Learn how its runtime, profiling tools, and PyTorch-native integrations enable developers to run and serve models efficiently without sacrif...

High signal Matched: inference, generation, serve, performance, npu, introducing, rebellions

SkyPilot · open-source · 2025-12-19

Launch AI Jobs faster with SkyPilot Templates

Score 10

SkyPilot now includes predefined templates to launch clusters with popular frameworks and patterns. Deploy fully configured environments without writing long YAMLs.

High signal Matched: launch

Nota AI · korea · 2025-12-19

NVIDIA Blackwell; The Impact of NVFP4 For LLM Inference

Score 74

Seungmin Yang, EdgeFM Lead, Nota AI. Summary: With the introduction of NVFP4, a new 4-bit floating-point data type in NVIDIA’s Blackwell GPU architecture, LLM inference achieves markedly improved efficiency. Blackwell’s NVFP4...

inference serving kernel cuda distributed benchmark hardware model-release research training quantization evals rag

model-release cloud frontier-model

High signal Matched: inference, serving, decoding, prefill, generation, token generation, throughput, kernel, gemm, cutlass, distributed, benchmark, performance, latency, ttft, tpot, tokens/sec, cost, gpu, blackwell, launch, model, weights, fp8, research, training, post-training, quantization, quantized, awq, benchmarks, mmlu, retrieval

Together AI · inference-infra · 2025-12-15

Announcing native availability of NVIDIA Nemotron 3 Nano, NVIDIA’s latest reasoning model

Score 14

Nemotron 3 Nano, NVIDIA’s newest reasoning model, is now available on Together AI, the AI Native Cloud

High signal Matched: model, cloud, reasoning model

vLLM Project · open-source · 2025-12-15

Encoder Disaggregation for Scalable Multimodal Model Serving

Score 18

Modern Large Multimodal Models (LMMs) introduce a unique serving-time bottleneck: before any text generation can begin, all images must be processed by a visual encoder (e.g., ViT). This encoder...

model-release quantization agents

High signal Matched: serving, generation, model

vLLM Project · open-source · 2025-12-15

Run Highly Efficient and Accurate AI Agents with NVIDIA Nemotron 3 Nano on vLLM

Score 10

Jan 28th Update: NVIDIA just released their Nemotron 3 Nano model in NVFP4 precision. This model is supported by vLLM out of the box and it uses a new method called Quantization-Aware Distillation...

inference serving moe benchmark model-release

High signal Matched: model, quantization, agents

vLLM Project · open-source · 2025-12-13

vLLM Router: A High-Performance and Prefill/Decode Aware Load Balancer for Large-scale Serving

Score 26

Efficiently managing request distribution across a fleet of model replicas is a critical requirement for large-scale, production vLLM deployments. Standard load balancers often fall short as they...

inference speculative-decoding benchmark model-release training

High signal Matched: serving, prefill, router, performance, model

vLLM Project · open-source · 2025-12-13

Diving into speculative decoding training support for vLLM with Speculators v0.3.0

Score 24

Speculative decoding serves as an optimization to improve inference performance; however, training a unique draft model for each LLM can be difficult and time-consuming, while production-ready...

High signal Matched: inference, decoding, speculative decoding, draft model, performance, model, training

Hugging Face · open-source · 2025-12-12

New in llama.cpp: Model Management

Score 10

No feed summary available yet.

High signal Matched: model

Hugging Face · open-source · 2025-12-05

Introducing swift-huggingface: The Complete Swift Client for Hugging Face

Score 10

No feed summary available yet.

inference speculative-decoding model-release

High signal Matched: introducing

Together AI · inference-infra · 2025-12-03

Introducing AutoJudge: Streamlined inference acceleration via automated dataset curation

Score 20

AutoJudge accelerates LLM inference by identifying which token mismatches actually matter. Using self-supervised learning to train a lightweight classifier, it accepts up to 40 draft tokens per cycle—delivering 1.5–2× speedups over standar...

kernel cuda hardware model-release

High signal Matched: inference, decoding, speculative decoding, introducing

vLLM Project · open-source · 2025-12-03

Tracing Hanging and Complicated GPU Kernels Down To The Source Code

Score 16

Several months ago, we published a blog post about CUDA Core Dump: An Effective Tool to Debug Memory Access Issues and Beyond, introducing a powerful technique for debugging illegal memory access...

High signal Matched: cuda, gpu, introducing

Hugging Face · open-source · 2025-12-01

Transformers v5: Simple model definitions powering the AI ecosystem

Score 10

No feed summary available yet.

High signal Matched: model

vLLM Project · open-source · 2025-11-30

Announcing vLLM-Omni: Easy, Fast, and Cheap Omni-Modality Model Serving

Score 20

We are excited to announce the official release of vLLM-Omni, a major extension of the vLLM ecosystem designed to support the next generation of AI: omni-modality models.

High signal Matched: serving, generation, release, model

Google Research · big-tech · 2025-11-22

Reducing EV range anxiety: How a simple AI model predicts port availability

Score 8

Algorithms & Theory

inference serving distributed model-release

High signal Matched: model

vLLM Project · open-source · 2025-11-22

Streamlined multi-node serving with Ray symmetric-run

Score 18

Ray now has a new command: ray symmetric-run. This command makes it possible to launch the same entrypoint command on every node in a Ray cluster, simplifying the workflow to spawn vLLM servers...

High signal Matched: serving, multi-node, launch

Hugging Face · open-source · 2025-11-20

Introducing AnyLanguageModel: One API for Local and Remote LLMs on Apple Platforms

Score 10

No feed summary available yet.

benchmark hardware model-release

High signal Matched: introducing, api

Modal · inference-infra · 2025-11-19

How Reducto improved enterprise-scale document processing latency by 3x

Score 14

Learn how Reducto used GPU memory snapshotting and flexible autoscaling to build fast multi-model pipelines.

inference benchmark model-release research evals api

High signal Matched: latency, gpu, model

AIBrix · open-source · 2025-11-10

AIBrix v0.5.0 Release: Batch API, KVCache v1 Connector, and Enhanced P/D orchestration

Score 22

🚀 AIBrix v0.5.0 Release. Today, we’re excited to announce AIBrix v0.5.0, a release that pushes AIBrix closer to a batteries-included control plane for modern LLM workloads. This release introduces an OpenAI-compatible Batch API for hi...

High signal Matched: prefill, latency, release, evaluation, api, openai-compatible

Google Research · big-tech · 2025-11-08

Introducing Nested Learning: A new ML paradigm for continual learning

Score 8

Algorithms & Theory

High signal Matched: introducing

Modular · inference-infra · 2025-11-07

"TTS 1 Max" (powered by Modular Platform) Ranked #1 Speech Model on Artificial Analysis

Score 10

"TTS 1 Max" (powered by Modular Platform) Ranked #1 Speech Model on Artificial Analysis

inference benchmark model-release

High signal Matched: model

SqueezeBits · korea · 2025-10-31

Winning both speed and quality: How Yetter deals with diffusion models

Score 16

Explore how the Yetter Inference Engine overcomes the limitations of step caching and model distillation for diffusion models. We analyze latency, diversity, quality, and negative-prompt handling to reveal what truly matters for scalable,...

High signal Matched: inference, generation, latency, model

Hugging Face · open-source · 2025-10-23

Building the Open Agent Ecosystem Together: Introducing OpenEnv

Score 10

No feed summary available yet.

inference model-release api

High signal Matched: introducing, agent

Together AI · inference-infra · 2025-10-21

Expanding Together AI Model Library into multimedia generation with 40+ new image and video models

Score 16

Together AI adds 40+ image & video models, including Sora 2 and Veo 3, to build end-to-end multimodal apps with unified OpenAI-compatible APIs and transparent pricing.

High signal Matched: generation, model, openai-compatible

Google Research · big-tech · 2025-10-02

Introducing interactive on-device segmentation in Snapseed

Score 8

Human-Computer Interaction and Visualization

model-release research evals rag

High signal Matched: introducing

Hugging Face · open-source · 2025-10-01

Introducing RTEB: A New Standard for Retrieval Evaluation

Score 14

No feed summary available yet.

High signal Matched: introducing, evaluation, retrieval

Replicate · inference-infra · 2025-09-23

Which image editing model should I use?

Score 8

Here is the ultimate comparison post on all the latest image editing models.

High signal Matched: model

Replicate · inference-infra · 2025-09-17

Introducing our new search API

Score 8

Find the best models and collections with a single API call.

inference benchmark model-release api

High signal Matched: introducing, api

Together AI · inference-infra · 2025-09-15

Improved Batch Inference API: Enhanced UI, Expanded Model Support, and 3000× Rate Limit Increase

Score 18

Our new Batch Inference API makes large-scale AI workloads simpler, faster, and cheaper. With a streamlined UI, universal model support, and 3000× higher rate limits—now up to 30B tokens—you can process massive datasets at half the cost of...

High signal Matched: inference, cost, model, api

Hugging Face · open-source · 2025-09-12

Introducing the Palmyra-mini family: Powerful, lightweight, and ready to reason!

Score 10

No feed summary available yet.

High signal Matched: introducing

Modal · inference-infra · 2025-09-09

Introducing Notebooks

Score 12

A collaborative environment for high-performance interactive computing on GPUs.

High signal Matched: performance, introducing

Hugging Face · open-source · 2025-09-04

Welcome EmbeddingGemma, Google's new efficient embedding model

Score 10

No feed summary available yet.

benchmark model-release research training

High signal Matched: model

BAIR · research · 2025-09-01

What exactly does word2vec learn?

Score 14

What exactly does word2vec learn, and how? Answering this question amounts to understanding representation learning in a minimal yet interesting language modeling task. Despite the fact that word2vec is a well-known precursor to modern lan...

High signal Matched: benchmark, performance, model, weights, paper, training

Together AI · inference-infra · 2025-08-27

DeepSeek-V3.1: Hybrid Thinking Model Now Available on Together AI

Score 16

Access DeepSeek-V3.1 on Together AI: MIT-licensed hybrid model with thinking/non-thinking modes, 66% SWE-bench Verified, serverless deployment, 99.9% SLA.

moe model-release evals

High signal Matched: deepseek-v3, model, swe-bench

SqueezeBits · korea · 2025-08-20

[Efficient AI Study] AI Model Compression Community Study and Meetup

Score 12

Efficient AI Study & Meetup recap: SqueezeBits' community study on AI model compression, featuring paper reviews, participant interviews, and networking from the offline meetup.

model-release research

model-release fine-tuning open-source

High signal Matched: model, paper

Together AI · inference-infra · 2025-08-15

Fine-Tuning Small Open-Source LLMs to Outperform Large Closed-Source Models by 60% on Specialized Tasks

Score 12

Parsed fine-tuned a 27B open-source model to beat Claude Sonnet 4 by 60% on a real-world healthcare task—while running 10–100x cheaper.

High signal Matched: model, fine-tuning, open-source

Hugging Face · open-source · 2025-08-08

Introducing AI Sheets: a tool to work with datasets using open AI models!

Score 10

No feed summary available yet.

High signal Matched: introducing

Hugging Face · open-source · 2025-08-07

Vision Language Model Alignment in TRL ⚡️

Score 10

No feed summary available yet.

inference serving benchmark hardware model-release cloud

High signal Matched: model

AIBrix · open-source · 2025-08-05

AIBrix v0.4.0 Release: P/D Disaggregation and Expert Parallelism Support, KVCache v1 Connector, KV Event Synchronization & Multi‑Engine Support

Score 20

AIBrix is a composable, cloud‑native LLM inference infrastructure designed to deliver high performance and low cost at scale. We now present a major update in a new release - v0.4.0. This release tackles key bottlenecks in orchestration an...

High signal Matched: inference, prefill, generation, token generation, throughput, performance, cost, gpu, release, cloud

Modular · inference-infra · 2025-08-05

Modular Platform 25.5: Introducing Large Scale Batch Inference

Score 14

Modular Platform 25.5: Introducing Large Scale Batch Inference

High signal Matched: inference, introducing

Together AI · inference-infra · 2025-08-05

Announcing the Availability of OpenAI's Open Models on Together AI

Score 12

Access OpenAI’s gpt-oss-120B on Together AI: Apache-2.0 open-weight model with serverless & dedicated endpoints, $0.50/1M in, $1.50/1M out, 99.9% SLA.

High signal Matched: model, oss

Hugging Face · open-source · 2025-08-05

Welcome GPT OSS, the new open-source model family from OpenAI!

Score 10

No feed summary available yet.

High signal Matched: model, open-source, oss

SqueezeBits · korea · 2025-08-04

Vocabulary Trimming: An Easy and Effective Method for SLM Acceleration

Score 10

Trimming large multilingual vocabularies in Small Language Models (SLMs) is a simple, low-risk way to improve efficiency. It significantly accelerates model inference while keeping accuracy almost unchanged.

High signal Matched: inference, model

SkyPilot · open-source · 2025-07-30

Slurm vs K8s for AI Infra: Academic HPC vs Cloud-Native Reality - the non-ideal solutions

Score 12

There are a lot of discussions happening in AI infrastructure right now. On one side, we have researchers who trained on Slurm in grad school, comfortable with sbatch train_model.sh and the predictability of academic HPC clusters. On the o...

High signal Matched: model, cloud

Hugging Face · open-source · 2025-07-29

Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face

Score 10

No feed summary available yet.

benchmark model-release open-source

High signal Matched: introducing

Together AI · inference-infra · 2025-07-28

Together Evaluations: Benchmark Models for Your Tasks

Score 16

Together Evaluations is a flexible framework for benchmarking LLMs using strong open-source models as judges. Skip manual labeling and rigid metrics—get fast, customizable insights into model quality for your specific tasks.

model-release evals agents

High signal Matched: benchmark, model, open-source

Together AI · inference-infra · 2025-07-25

Qwen3-Coder: The Most Capable Agentic Coding Model Now Available on Together AI

Score 12

Unlock agentic coding with Qwen3-Coder on Together AI: 256K context, SWE-bench rivaling Claude Sonnet 4, zero-setup instant deployment.

High signal Matched: model, swe-bench, agentic

SkyPilot · open-source · 2025-07-24

SkyPilot 0.10: Enterprise-Ready AI Infrastructure with SSO, Dashboard, Workspaces, and More

Score 8

Announcing SkyPilot 0.10 - the largest release yet with enterprise-grade features.

High signal Matched: release

Hugging Face · open-source · 2025-07-23

TimeScope: How Long Can Your Video Large Multimodal Model Go?

Score 10

No feed summary available yet.

High signal Matched: model

Modal · inference-infra · 2025-07-16

Dollars per token considered harmful

Score 8

Engineers of language model applications should think about requests, not tokens.

benchmark model-release agents open-source

High signal Matched: model

Together AI · inference-infra · 2025-07-14

Kimi K2: Leading Open-Source Model Now Available on Together AI

Score 16

Run Kimi K2 (1T params) on Together AI—frontier open model for agentic reasoning and coding, serverless deployment, 99.9% SLA, lower cost and instant scaling.

distributed hardware model-release training

High signal Matched: cost, model, open model, agentic, open-source

Modal · inference-infra · 2025-07-11

Product updates: Multi-node training clusters, B200 and H200s, and Client 1.0 release

Score 18

Welcome to another round of Modal Product Updates! Here's what's new this month.

inference benchmark model-release research training fine-tuning evals

High signal Matched: multi-node, b200, release, training

Nota AI · korea · 2025-07-10

Video Self-Distillation for Single-Image Encoders: Learning Temporal Priors from Unlabeled Video

Score 20

Marcel Simon, Ph.D., ML Researcher, Nota AI GmbH; Tae-Ho Kim, CTO & Co-Founder, Nota AI; Seul-Ki Yeom, Ph.D., Research Lead, Nota AI GmbH. Summary: Proposes a simple next-frame prediction task using unlabeled video to enhance sing...

High signal Matched: inference, performance, model, paper, research, training, fine-tuning, benchmarks

Replicate · inference-infra · 2025-07-07

Compare AI video models

Score 8

It's hard keeping up with every new video model. In this post we'll help you pick the best one for your needs.

inference benchmark model-release research training evals agents

High signal Matched: model

BAIR · research · 2025-07-01

Whole-Body Conditioned Egocentric Video Prediction

Score 10

No feed summary available yet.

benchmark model-release api

High signal Matched: inference, generation, performance, model, paper, arxiv, evaluation, training, evaluate, agent, agents

Together AI · inference-infra · 2025-06-11

Introducing the Together AI Batch API: Process Thousands of LLM Requests at 50% Lower Cost

Score 16

No feed summary available yet.

High signal Matched: cost, introducing, api

Hugging Face · open-source · 2025-06-11

Introducing Training Cluster as a Service - a new collaboration with NVIDIA

Score 10

No feed summary available yet.

High signal Matched: introducing, training

SqueezeBits · korea · 2025-06-10

[Japan IT Week Spring 2025] What We Saw on the Global AI Frontline in Tokyo

Score 8

SqueezeBits at Japan IT Week Spring 2025 in Tokyo: AI model compression demos, OwLite and Fits on Chips introductions, Japan market entry experiences, and team stories from the frontline.

High signal Matched: model

Modular · inference-infra · 2025-06-10

Introducing Mammoth: Enterprise-Scale GenAI Deployments Made Simple

Score 10

High signal Matched: introducing

Modal · inference-infra · 2025-06-09

Introducing: Modal 1.0

Score 10

We've released v1.0 of the Modal client, marking a new milestone of maturity and stability for our platform.

High signal Matched: introducing

Together AI · inference-infra · 2025-06-05

Model-Preserving Adaptive Rounding with YAQA

Score 12

No feed summary available yet.

High signal Matched: model

Hugging Face · open-source · 2025-06-03

SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data

Score 10

No feed summary available yet.

High signal Matched: model

Modal · inference-infra · 2025-05-30

Introducing: B200s and H200s on Modal

Score 18

We’re excited to be making Nvidia B200 and H200 GPUs available on Modal starting today!

High signal Matched: h200, b200, introducing

Modal · inference-infra · 2025-05-22

Introducing Modal Batch: Process 1 million jobs with 1 line of code

Score 10

Modal Batch is a new interface backed by a new durable queue system built specifically to make job processing easy, scalable, and fault-tolerant.

High signal Matched: introducing

Replicate · inference-infra · 2025-05-22

Generate incredible images with Google's Imagen 4

Score 8

Google's flagship image generation model, Imagen 4, is now available for you to try on Replicate. Create images with fine detail, versatile styles, and improved typography.

inference kv-cache benchmark model-release cloud

High signal Matched: generation, model

AIBrix · open-source · 2025-05-22

AIBrix v0.3.0 Release: KVCache Offloading, Prefix Cache, Fairness Routing, and Benchmarking Tools

Score 24

AIBrix is a composable, cloud-native AI infrastructure toolkit designed to power scalable and cost-effective large language model (LLM) inference. As production demands for memory-efficient and latency-aware LLM services continue to grow,...

inference serving distributed benchmark model-release frontier-model

High signal Matched: inference, prefix cache, latency, cost, release, model, cloud

llm-d · open-source · 2025-05-20

Announcing the llm-d community!

Score 20

Introducing llm-d: Kubernetes-native distributed LLM inference with KV-cache routing, disaggregated serving, and SOTA performance per dollar. Built on vLLM.

model-release quantization

High signal Matched: inference, serving, distributed, performance, introducing, sota

SqueezeBits · korea · 2025-05-20

How to Quantize Transformer-based model for TensorRT Deployment

Score 12

This article describes experimental results from quantizing the Vision Transformer model and its variants with OwLite.

inference distributed model-release cloud open-source

High signal Matched: model, quantized

llm-d · open-source · 2025-05-20

llm-d Press Release

Score 20

Red Hat launches llm-d: Open source distributed AI inference platform backed by NVIDIA, Google Cloud, IBM. Scale generative AI with intelligent routing on Kubernetes.

model-release frontier-model

High signal Matched: inference, distributed, release, cloud, open source

Together AI · inference-infra · 2025-05-20

Introducing Together Code Sandbox & Together Code Interpreter: SOTA code execution for AI

Score 12

No feed summary available yet.

High signal Matched: introducing, sota

Hugging Face · open-source · 2025-05-15

The Transformers Library: standardizing model definitions

Score 10

No feed summary available yet.

High signal Matched: model

Hugging Face · open-source · 2025-05-14

Improving Hugging Face Model Access for Kaggle Users

Score 10

No feed summary available yet.

benchmark model-release research quantization

High signal Matched: model

Nota AI · korea · 2025-05-08

SplitQuant: Layer Splitting for Low-Bit Neural Network Quantization for Edge AI Devices

Score 20

Jaewoo Song, Software Engineer, Nota AI. Summary: This study proposes an AI model preprocessing method for improved quantization accuracies on edge AI devices which do not support advanced quantization methods due to their limitat...

inference kv-cache benchmark model-release research training evals open-source

High signal Matched: performance, model, weights, research, quantization, int8, int4

Nota AI · korea · 2025-05-07

Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features

Score 28

model-release quantization

High signal Matched: inference, generation, kv cache, benchmark, performance, latency, model, weights, research, training, benchmarks, open-source

Hugging Face · open-source · 2025-04-29

Introducing AutoRound: Intel’s Advanced Quantization for LLMs and VLMs

Score 10

No feed summary available yet.

model-release evals long-context

High signal Matched: introducing, quantization

Hugging Face · open-source · 2025-04-16

Introducing HELMET: Holistically Evaluating Long-context Language Models

Score 10

No feed summary available yet.

benchmark model-release research training fine-tuning evals rag api frontier-model

High signal Matched: introducing, evaluating, long-context

BAIR · research · 2025-04-11

Defending against Prompt Injection with Structured Queries (StruQ) and Preference Optimization (SecAlign)

Score 10

Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated applications. However, as LLMs have improved, so have the attacks against them. Prompt injection attack is listed as the #1 threat by OWASP to LLM-integrated ap...

benchmark model-release quantization

High signal Matched: cost, model, evaluation, training, dpo, fine-tuning, retrieval, api, sota

SqueezeBits · korea · 2025-04-11

OwLite: No More Compromising on AI Performance After Quantization

Score 16

Discover how OwLite simplifies AI model optimization with seamless integration and secure architecture.

inference benchmark model-release research training rag

High signal Matched: performance, model, quantization

BAIR · research · 2025-04-08

Repurposing Protein Folding Models for Generation with Latent Diffusion

Score 20

PLAID is a multimodal generative model that simultaneously generates protein 1D sequence and 3D structure, by learning the latent space of protein folding models. The awarding of the 2024 Nobel Prize to AlphaFold2 marks an important moment...

inference kernel benchmark model-release research evals

High signal Matched: inference, generation, cost, model, weights, research, training, retrieval

Nota AI · korea · 2025-04-08

UniForm: A Reuse Attention Mechanism for Efficient Transformers on Resource-Constrained Edge Devices

Score 24

Seul-Ki Yeom, Ph.D., Research Lead, Nota AI GmbH; Tae-Ho Kim, CTO & Co-Founder, Nota AI. Summary: Delivers real-time AI performance on edge devices such as smartphones, IoT devices, and embedded systems. Introduces a novel "Reus...

benchmark model-release cloud training

High signal Matched: inference, kernel, benchmark, performance, cost, introducing, model, paper, research, benchmarks

SkyPilot · open-source · 2025-04-08

High-Performance Model Checkpointing on the Cloud

Score 18

Techniques to speed up checkpointing by 9.6x and how to easily achieve them in SkyPilot

High signal Matched: performance, model, cloud, checkpointing

Hugging Face · open-source · 2025-04-08

Arabic Leaderboards: Introducing Arabic Instruction Following, Updating AraGen, and More

Score 10

No feed summary available yet.

High signal Matched: introducing

Hugging Face · open-source · 2025-03-24

Introducing Gradio's new Dataframe!

Score 10

No feed summary available yet.

High signal Matched: introducing

SkyPilot · open-source · 2025-03-11

Introducing SkyPilot Client-Server Architecture

Score 10

Transforming SkyPilot into a scalable, multi-user platform.

inference distributed benchmark model-release training long-context

High signal Matched: introducing

AIBrix · open-source · 2025-03-10

DeepSeek-R1 671B multi-host Deployment in AIBrix

Score 20

This blog post shows how to deploy DeepSeek-R1 using AIBrix. DeepSeek-R1 demonstrates remarkable proficiency in reasoning tasks through a step-by-step training process. It features 671B total parameters with 37B active parameters, and 128k...

inference model-release cloud api open-source

High signal Matched: inference, distributed, benchmark, model, weights, training, context length

Replicate · inference-infra · 2025-03-05

Wan2.1: generate videos with an API

Score 10

Wan2.1 is the most capable open-source video generation model, producing coherent and high-quality outputs. Learn how to run it in the cloud with a single line of code.

High signal Matched: generation, model, cloud, api, open-source

Hugging Face · open-source · 2025-02-27

HuggingFace, IISc partner to supercharge model building on India's diverse languages

Score 10

No feed summary available yet.

inference benchmark model-release research training fine-tuning

High signal Matched: model

Nota AI · korea · 2025-02-25

A Study on Detecting LLM-Generated Multilingual Content

Score 18

Hancheol Park, Ph.D., AI Research Engineer, Nota AI; Geonmin Kim, Ph.D., AI Research Engineer, Nota AI; Jaeyeon Kim, AI Research Engineer, Nota AI. Summary: In this study, we propose a method for determining whether given multilingual...

inference benchmark model-release agents open-source

High signal Matched: generation, performance, model, paper, research, training, fine-tuning

AIBrix · open-source · 2025-02-21

Introducing AIBrix: Cost-Effective and Scalable Control Plane for vLLM

Score 26

Open-source large language models (LLMs) such as LLaMA, DeepSeek, Qwen, and Mistral have surged in popularity, offering enterprises greater flexibility, cost savings, and control over their AI deployments. These models have empowered organ...

inference serving distributed kv-cache benchmark hardware model-release agents

High signal Matched: inference, generation, latency, cost, introducing, model, agents, open-source

AIBrix · open-source · 2025-02-19

AIBrix v0.2.0 Release: Distributed KV Cache, Orchestration and Heterogeneous GPU Support

Score 42

We’re excited to announce the v0.2.0 release of AIBrix! Building on feedback from v0.1.0 production adoption and user interest, this release introduces several new features to enhance performance and usability. Extend the vLLM Prefix...

High signal Matched: inference, serving, prefill, throughput, distributed, multi-node, kv cache, prefix cache, performance, cost, gpu, accelerator, release, agent

Hugging Face · open-source · 2025-02-18

Introducing Three New Serverless Inference Providers: Hyperbolic, Nebius AI Studio, and Novita 🔥

Score 14

No feed summary available yet.

High signal Matched: inference, introducing

Modular · inference-infra · 2025-02-18

MAX 25.1 - Introducing MAX Builds

Score 10

High signal Matched: introducing

Modal · inference-infra · 2025-01-28

Memory snapshots: Checkpoint/restore for sub-second startup

Score 10

Serializing container state to disk for aggressive cold start optimization.

High signal Matched: checkpoint

Hugging Face · open-source · 2025-01-23

SmolVLM Grows Smaller – Introducing the 256M & 500M Models!

Score 10

No feed summary available yet.

High signal Matched: introducing

Hugging Face · open-source · 2025-01-22

Hugging Face and FriendliAI partner to supercharge model deployment on the Hub

Score 10

No feed summary available yet.

High signal Matched: model

Hugging Face · open-source · 2025-01-16

Introducing multi-backends (TRT-LLM, vLLM) support for Text Generation Inference

Score 18

No feed summary available yet.

High signal Matched: inference, generation, introducing

Hugging Face · open-source · 2025-01-16

Timm ❤️ Transformers: Use any timm model with transformers

Score 10

No feed summary available yet.

benchmark hardware model-release quantization evals

High signal Matched: model

SqueezeBits · korea · 2025-01-13

[Intel Gaudi] #4. FP8 Quantization

Score 20

In this blog series, we thoroughly evaluate Intel's AI accelerator, the Gaudi series, focusing on its performance, features, and usability.

High signal Matched: performance, accelerator, fp8, quantization, evaluate

Hugging Face · open-source · 2024-12-31

Introducing smolagents: simple agents that write actions in code.

Score 10

No feed summary available yet.

High signal Matched: introducing, agents

Hugging Face · open-source · 2024-12-23

Controlling Language Model Generation with NVIDIA's LogitsProcessorZoo

Score 14

No feed summary available yet.

High signal Matched: generation, model

Modal · inference-infra · 2024-12-19

Introducing: L40S GPUs on Modal

Score 10

NVIDIA L40S GPUs available on Modal now!

High signal Matched: introducing

Hugging Face · open-source · 2024-12-19

Finally, a Replacement for BERT: Introducing ModernBERT

Score 10

No feed summary available yet.

High signal Matched: introducing

Hugging Face · open-source · 2024-12-18

Bamba: Inference-Efficient Hybrid Mamba2 Model

Score 14

No feed summary available yet.

High signal Matched: inference, model

Modular · inference-infra · 2024-12-17

Introducing MAX 24.6: A GPU Native Generative AI Platform

Score 14

High signal Matched: gpu, introducing

Hugging Face · open-source · 2024-12-17

Benchmarking Language Model Performance on 5th Gen Xeon at GCP

Score 14

No feed summary available yet.

High signal Matched: performance, model

Hugging Face · open-source · 2024-12-16

Introducing the Synthetic Data Generator - Build Datasets with Natural Language

Score 10

No feed summary available yet.

High signal Matched: introducing

Hugging Face · open-source · 2024-11-26

SmolVLM - small yet mighty Vision Language Model

Score 10

No feed summary available yet.

High signal Matched: model

Modal · inference-infra · 2024-11-24

Press release: Modal signs strategic collaboration agreement with AWS to deliver accelerated generative AI solutions

Score 12

Announcing Modal's newest cloud partnership.

High signal Matched: release, cloud

Hugging Face · open-source · 2024-11-20

Introducing the Open Leaderboard for Japanese LLMs!

Score 10

No feed summary available yet.

inference kv-cache benchmark hardware model-release cloud open-source

High signal Matched: introducing, leaderboard

AIBrix · open-source · 2024-11-13

Introducing AIBrix v0.1.0: Building the Future of Scalable, Cost-Effective AI Infrastructure for Large Models

Score 32

In recent years, large language models (LLMs) have revolutionized AI applications, powering solutions in areas like chatbots, automated content generation, and advanced recommendation engines. Services like OpenAI’s have gained significant...

High signal Matched: decoding, prefill, generation, kv cache, performance, cost, gpu, release, introducing, cloud, open-source

Hugging Face · open-source · 2024-10-29

Universal Assisted Generation: Faster Decoding with Any Assistant Model

Score 18

No feed summary available yet.

High signal Matched: decoding, generation, model

Hugging Face · open-source · 2024-10-23

Introducing SynthID Text

Score 10

No feed summary available yet.

High signal Matched: introducing

Hugging Face · open-source · 2024-10-23

Introducing HUGS - Scale your AI with Open Models

Score 10

No feed summary available yet.

High signal Matched: introducing

Hugging Face · open-source · 2024-10-22

Hugging Face Teams Up with Protect AI: Enhancing Model Security for the ML Community

Score 10

No feed summary available yet.

High signal Matched: model

Replicate · inference-infra · 2024-10-22

Ideogram v2 is an outstanding new inpainting model

Score 8

We've partnered with Ideogram to bring their inpainting model to Replicate's API.

High signal Matched: model, api

Hugging Face · open-source · 2024-10-10

Introducing the AMD 5th Gen EPYC™ CPU

Score 10

No feed summary available yet.

High signal Matched: introducing

Hugging Face · open-source · 2024-10-04

Introducing the Open FinLLM Leaderboard

Score 10

No feed summary available yet.

High signal Matched: introducing, leaderboard

Replicate · inference-infra · 2024-10-03

FLUX1.1 [pro] is here

Score 10

Black Forest Labs continues to push boundaries with its latest release of the FLUX.1 image generation model.

High signal Matched: generation, release, model

Hugging Face · open-source · 2024-09-17

Introducing the SQL Console on Datasets

Score 10

No feed summary available yet.

inference serving benchmark model-release

High signal Matched: introducing

Modal · inference-infra · 2024-09-16

Boost your throughput with dynamic batching

Score 14

Learn how we used our new dynamic batching feature to improve throughput and reduce inference costs for the Whisper model with a single line of code!

High signal Matched: inference, throughput, model

Hugging Face · open-source · 2024-09-16

Introducing Community Tools on HuggingChat

Score 10

No feed summary available yet.

High signal Matched: introducing

SkyPilot · open-source · 2024-09-16

Can Multimodal LLMs Truly "See" Images? A Deep Dive with ASCII Art

Score 8

With last week’s Pixtral release, multimodal large language models (LLMs) like OpenAI’s GPT-4o, Google’s Gemini Pro, and Pixtral are making significant strides. These models are not only able to generate text from images...

High signal Matched: release

Hugging Face · open-source · 2024-08-12

Welcome Falcon Mamba: The first strong attention-free 7B model

Score 10

No feed summary available yet.

High signal Matched: model

Hugging Face · open-source · 2024-08-06

Introducing TextImage Augmentation for Document Images

Score 10

No feed summary available yet.

inference benchmark model-release research cloud training fine-tuning evals open-source

High signal Matched: introducing

Nota AI · korea · 2024-08-02

Deploying an Efficient Vision-Language Model on Mobile Devices

Score 38

Jaeyeon Kim, Research Engineer, Nota AI; Geonmin Kim, Research Engineer, Nota AI; Hancheol Park, Team Lead of NetsPresso Application, Nota AI. Introduction: Recent large language models (LLMs) have demonstrated unprecedented performance...

High signal Matched: decoding, benchmark, performance, latency, tokens/sec, model, arxiv, research, technical report, evaluation, cloud, training, lora, benchmarks, leaderboard, open-source

Replicate · inference-infra · 2024-07-23

Run Meta Llama 3.1 405B with an API

Score 8

Llama 3.1 405B is the most powerful open-source language model from Meta. Learn how to run it in the cloud with one line of code.

High signal Matched: model, cloud, api, open-source

Modular · inference-infra · 2024-07-09

Bring your own PyTorch model

Score 10

High signal Matched: model

Hugging Face · open-source · 2024-07-03

Accelerating Protein Language Model ProtST on Intel Gaudi 2

Score 10

No feed summary available yet.

High signal Matched: model

SqueezeBits · korea · 2024-06-26

How much can we save through compression?

Score 10

Estimating the cost savings from model compression.

High signal Matched: cost, model

Hugging Face · open-source · 2024-06-25

XLSCOUT Unveils ParaEmbed 2.0: a Powerful Embedding Model Tailored for Patents and IP with Expert Support from Hugging Face

Score 10

No feed summary available yet.

inference model-release api

High signal Matched: model

Replicate · inference-infra · 2024-06-14

Push a custom version of Stable Diffusion 3

Score 8

Create your own custom version of Stability's latest image generation model and run it on Replicate via the web or API.

benchmark model-release research evals

High signal Matched: generation, model, api

Nota AI · korea · 2024-06-13

Cluster Self-Refinement for Enhanced Online Multi-Camera People Tracking

Score 8

Jeongho Kim, Research Engineer, Nota AI. Summary: Online multi-camera system for efficient individual tracking. Accurate ID management with Cluster Self-Refinement (CSR). Improved performance with enhanced pose estimation. Intro...

High signal Matched: performance, model, paper, research, evaluation, leaderboard

Replicate · inference-infra · 2024-06-12

Run Stable Diffusion 3 with an API

Score 8

Stable Diffusion 3 is the latest text-to-image model from Stability, with improved image quality, typography, prompt understanding, and resource efficiency. Learn how to run it in the cloud with one line of code.

model-release cloud api

High signal Matched: model, cloud, api

Hugging Face · open-source · 2024-06-07

Introducing the Hugging Face Embedding Container for Amazon SageMaker

Score 14

No feed summary available yet.

model-release quantization

High signal Matched: introducing, sagemaker

Modular · inference-infra · 2024-06-07

MAX 24.4 - Introducing quantization APIs and MAX on macOS

Score 10

High signal Matched: introducing, quantization

Hugging Face · open-source · 2024-06-05

Introducing NPC-Playground, a 3D playground to interact with LLM-powered NPCs

Score 10

No feed summary available yet.

High signal Matched: introducing

Modular · inference-infra · 2024-05-29

What ownership is really about: a mental model approach

Score 10

High signal Matched: model

Hugging Face · open-source · 2024-05-24

Falcon 2: An 11B parameter pretrained language model and VLM, trained on over 5000B tokens and 11 languages

Score 10

No feed summary available yet.

model-release fine-tuning

High signal Matched: model

Modal · inference-infra · 2024-05-21

Create an infinite icon library by fine-tuning Stable Diffusion

Score 8

How we fine-tuned a Stable Diffusion model on the Heroicons library to generate all the icons we could dream of.

High signal Matched: model, fine-tuning

Hugging Face · open-source · 2024-05-21

Introducing Spaces Dev Mode for a seamless developer experience

Score 10

No feed summary available yet.

High signal Matched: introducing

Hugging Face · open-source · 2024-05-14

PaliGemma – Google's Cutting-Edge Open Vision Language Model

Score 10

No feed summary available yet.

High signal Matched: model

Hugging Face · open-source · 2024-05-14

Introducing the Open Arabic LLM Leaderboard

Score 10

No feed summary available yet.

High signal Matched: introducing, leaderboard

Hugging Face · open-source · 2024-05-13

License to Call: Introducing Transformers Agents 2.0

Score 10

No feed summary available yet.

High signal Matched: introducing, agents

Modal · inference-infra · 2024-05-13

Introducing: Region selection

Score 12

You can now specify which cloud region you would like to run your Functions in.

High signal Matched: introducing, cloud

Hugging Face · open-source · 2024-05-05

Introducing the Open Leaderboard for Hebrew LLMs!

Score 10

No feed summary available yet.

High signal Matched: introducing, leaderboard

Modular · inference-infra · 2024-05-02

MAX 24.3 - Introducing MAX Engine Extensibility

Score 10

High signal Matched: introducing

SqueezeBits · korea · 2024-04-24

Accuracy Degradation in AI Compression: Myth or Truth?

Score 8

Clarifying the misunderstandings in AI model compression

High signal Matched: model

Hugging Face · open-source · 2024-04-23

Introducing the Open Chain of Thought Leaderboard

Score 10

No feed summary available yet.

High signal Matched: introducing, leaderboard

Replicate · inference-infra · 2024-04-23

Run Snowflake Arctic with an API

Score 8

Arctic is a new open-source language model from Snowflake. Learn how to run it in the cloud with one line of code.

High signal Matched: model, cloud, api, open-source

SqueezeBits · korea · 2024-04-19

Things to check if your business utilizes AI

Score 8

Do I need to COMPRESS my AI model? The short answer is “YES”, and here’s why.

High signal Matched: model

Replicate · inference-infra · 2024-04-18

Run Meta Llama 3 with an API

Score 8

Llama 3 is the latest language model from Meta. Learn how to run it in the cloud with one line of code.

model-release cloud api

model-release research evals

High signal Matched: model, cloud, api

Hugging Face · open-source · 2024-04-16

Introducing the LiveCodeBench Leaderboard - Holistic and Contamination-Free Evaluation of Code LLMs

Score 14

No feed summary available yet.

High signal Matched: introducing, evaluation, leaderboard

SqueezeBits · korea · 2024-04-15

AI Compression for Acceleration: 4 Key Methods.

Score 8

AI model compression for acceleration is essential. The question is HOW? Here are 4 key methodologies.

High signal Matched: model

Hugging Face · open-source · 2024-04-15

Introducing Idefics2: A Powerful 8B Vision-Language Model for the community

Score 14

No feed summary available yet.

High signal Matched: introducing, model

Hugging Face · open-source · 2024-04-10

Making thousands of open LLMs bloom in the Vertex AI Model Garden

Score 10

No feed summary available yet.

High signal Matched: model

Hugging Face · open-source · 2024-04-09

CodeGemma - an official Google release for code LLMs

Score 10

No feed summary available yet.

High signal Matched: release

Hugging Face · open-source · 2024-03-21

Introducing the Chatbot Guardrails Arena

Score 10

No feed summary available yet.

High signal Matched: introducing

Hugging Face · open-source · 2024-03-20

GaLore: Advancing Large Model Training on Consumer-grade Hardware

Score 10

No feed summary available yet.

High signal Matched: model, training

Hugging Face · open-source · 2024-03-05

Introducing ConTextual: How well can your Multimodal model jointly reason over text and image in text-rich scenes?

Score 14

No feed summary available yet.

High signal Matched: introducing, model

Modal · inference-infra · 2024-02-27

Introducing: WebSockets on Modal

Score 10

Modal now supports WebSocket connections, enabling real-time, bidirectional data transfer between client and server.

High signal Matched: introducing

Hugging Face · open-source · 2024-02-23

Introducing the Red-Teaming Resistance Leaderboard

Score 10

No feed summary available yet.

High signal Matched: introducing, leaderboard

Modal · inference-infra · 2024-02-21

How Suno shaved 4 months off their launch timeline with Modal

Score 12

Find out how Suno uses Modal to scale inference and batch pre-processing to thousands of GPUs.

inference serving benchmark model-release cloud

High signal Matched: inference, launch

SkyPilot · open-source · 2024-02-20

Introducing SkyServe: 50% Cheaper AI Serving on Any Cloud with High Availability

Score 20

SkyServe: A simple, cost-efficient, multi-region/cloud library for serving GenAI models.

model-release research korea evals

High signal Matched: serving, cost, introducing, cloud

Hugging Face · open-source · 2024-02-20

Introducing the Open Ko-LLM Leaderboard: Leading the Korean LLM Evaluation Ecosystem

Score 18

No feed summary available yet.

High signal Matched: introducing, evaluation, korean, leaderboard

Modal · inference-infra · 2024-02-06

Introducing: H100s on Modal

Score 14

We’re excited to be making Nvidia H100 GPUs available on Modal starting today!

High signal Matched: h100, introducing

Hugging Face · open-source · 2024-01-31

Introducing the Enterprise Scenarios Leaderboard: a Leaderboard for Real World Use Cases

Score 10

No feed summary available yet.

inference serving moe benchmark hardware model-release

High signal Matched: introducing, leaderboard

SkyPilot · open-source · 2023-12-21

Scaling Mixtral LLM Serving with High GPU Availability and Cost Efficiency

Score 24

A tutorial for serving Mixtral 8x7B model with SkyPilot and SkyServe.

benchmark model-release api open-source

High signal Matched: serving, mixtral, cost, gpu, model

Replicate · inference-infra · 2023-11-10

Using open-source models for faster and cheaper text embeddings

Score 10

An interactive example showing how to embed text using a state-of-the-art embedding model that beats OpenAI's embeddings API on price and performance.

High signal Matched: performance, model, api, open-source

Hugging Face · open-source · 2023-11-07

Introducing Prodigy-HF: a direct integration with Hugging Face

Score 10

No feed summary available yet.

High signal Matched: introducing

Hugging Face · open-source · 2023-11-03

Introducing Storage Regions on the HF Hub

Score 10

No feed summary available yet.

High signal Matched: introducing

Replicate · inference-infra · 2023-10-25

Generate images in one second on your Mac using a latent consistency model

Score 8

How to run a latent consistency model on your M1 or M2 Mac

inference model-release rag

High signal Matched: model

Replicate · inference-infra · 2023-10-17

How to use retrieval augmented generation with ChromaDB and Mistral

Score 10

In this post we'll explore the basics of retrieval augmented generation by creating an example app that uses bge-large-en for embeddings, ChromaDB for vector store, and mistral-7b-instruct for language model generation.

High signal Matched: generation, model, retrieval augmented generation, retrieval

Modal · inference-infra · 2023-10-10

Press release: Modal Labs announces Series A financing round

Score 14

Modal Labs Announces Series A Financing Round, Securing $16 Million Investment to Launch Cloud-Based Infrastructure Platform, Build Towards End-to-End Enterprise Data Stack

High signal Matched: release, launch, cloud

Replicate · inference-infra · 2023-10-06

How to run Mistral 7B with an API

Score 8

Mistral 7B is an open-source large language model. Learn what it's good at and how to run it in the cloud with one line of code.

High signal Matched: model, cloud, api, open-source

Hugging Face · open-source · 2023-09-13

Introducing Würstchen: Fast Diffusion for Image Generation

Score 14

No feed summary available yet.

High signal Matched: generation, introducing

Hugging Face · open-source · 2023-08-22

Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model

Score 14

No feed summary available yet.

High signal Matched: introducing, model

Hugging Face · open-source · 2023-08-22

Introducing SafeCoder

Score 10

No feed summary available yet.

High signal Matched: introducing

Hugging Face · open-source · 2023-08-01

Open-sourcing Knowledge Distillation Code and Weights of SD-Small and SD-Tiny

Score 10

No feed summary available yet.

High signal Matched: weights

Replicate · inference-infra · 2023-07-27

Run Llama 2 with an API

Score 8

Llama 2 is the first open source language model of the same caliber as OpenAI’s models. Learn how to run it in the cloud with one line of code.

High signal Matched: model, cloud, api, open source

Hugging Face · open-source · 2023-07-24

Introducing Agents.js: Give tools to your LLMs using JavaScript

Score 10

No feed summary available yet.

High signal Matched: introducing, agents

Replicate · inference-infra · 2023-07-19

What happened with Llama 2 in the last 24 hours? 🦙

Score 8

A roundup of recent developments from the llamaverse following the second major release of Meta's open-source large language model.

inference model-release cloud

High signal Matched: release, model, open-source

Hugging Face · open-source · 2023-05-31

Introducing the Hugging Face LLM Inference Container for Amazon SageMaker

Score 18

No feed summary available yet.

High signal Matched: inference, introducing, sagemaker

Hugging Face · open-source · 2023-05-31

Introducing BERTopic Integration with the Hugging Face Hub

Score 10

No feed summary available yet.

High signal Matched: introducing

Replicate · inference-infra · 2023-05-26

Make any large language model a better poet

Score 8

Prompt engineering and training are often the first solutions we reach for to improve language model behavior, but they're not the only way.

High signal Matched: model, training

Hugging Face · open-source · 2023-05-24

Hugging Face Collaborates with Microsoft to launch Hugging Face Model Catalog on Azure

Score 14

No feed summary available yet.

High signal Matched: launch, model

Hugging Face · open-source · 2023-05-15

Introducing RWKV - An RNN with the advantages of a transformer

Score 10

No feed summary available yet.

High signal Matched: introducing

Hugging Face · open-source · 2023-04-27

Training a language model with 🤗 Transformers using TensorFlow and TPUs

Score 10

No feed summary available yet.

High signal Matched: model, training

Hugging Face · open-source · 2023-04-24

Introducing HuggingFace blog for Chinese speakers: Fostering Collaboration with the Chinese AI community

Score 10

No feed summary available yet.

High signal Matched: introducing

Replicate · inference-infra · 2023-04-21

Language model roundup, April 2023

Score 8

A roundup of recent developments from the world of open-source language models.

model-release fine-tuning

High signal Matched: model, open-source

Replicate · inference-infra · 2023-03-23

How to use Alpaca-LoRA to fine-tune a model like ChatGPT

Score 8

No feed summary available yet.

High signal Matched: model, lora

Hugging Face · open-source · 2023-02-07

Introducing ⚔️ AI vs. AI ⚔️ a deep reinforcement learning multi-agents competition system

Score 10

No feed summary available yet.

model-release cloud fine-tuning

High signal Matched: introducing, agents

Replicate · inference-infra · 2023-02-07

Introducing LoRA: A faster way to fine-tune Stable Diffusion

Score 10

It's like DreamBooth, but much faster. And you can run it in the cloud on Replicate.

High signal Matched: introducing, cloud, lora

Hugging Face · open-source · 2022-12-20

Model Cards

Score 10

No feed summary available yet.

High signal Matched: model

Replicate · inference-infra · 2022-11-21

Train and deploy a DreamBooth model on Replicate

Score 10

With just a handful of images and a single API call, you can train a model, publish it to Replicate, and run predictions on it in the cloud.

model-release cloud api

benchmark model-release cloud

High signal Matched: model, cloud, api

SkyPilot · open-source · 2022-11-16

SkyPilot: ML and Data Science on any cloud with massive cost savings

Score 16

Introducing SkyPilot.

High signal Matched: cost, introducing, cloud

Hugging Face · open-source · 2022-11-08

Introducing our new pricing

Score 10

No feed summary available yet.

High signal Matched: introducing

Hugging Face · open-source · 2022-10-24

Evaluating Language Model Bias with 🤗 Evaluate

Score 10

No feed summary available yet.

High signal Matched: model, evaluate, evaluating

Hugging Face · open-source · 2022-10-07

Introducing DOI: the Digital Object Identifier to Datasets and Models

Score 10

No feed summary available yet.

High signal Matched: introducing

Hugging Face · open-source · 2022-09-07

How to train a Language Model with Megatron-LM

Score 10

No feed summary available yet.

High signal Matched: model

Hugging Face · open-source · 2022-08-12

Introducing Skops

Score 10

No feed summary available yet.

High signal Matched: introducing

Hugging Face · open-source · 2022-08-03

Introducing the Private Hub: A New Way to Build With Machine Learning

Score 10

No feed summary available yet.

High signal Matched: introducing

Hugging Face · open-source · 2022-07-28

Introducing new audio and vision documentation in 🤗 Datasets

Score 10

No feed summary available yet.

High signal Matched: introducing

Hugging Face · open-source · 2022-07-16

How to train your model dynamically using adversarial data

Score 10

No feed summary available yet.

High signal Matched: model

Hugging Face · open-source · 2022-07-12

Introducing The World's Largest Open Multilingual Language Model: BLOOM

Score 14

No feed summary available yet.

High signal Matched: introducing, model

Replicate · inference-infra · 2022-07-05

A new template for model READMEs

Score 8

Inspired by model cards, we've created templates for documenting models on Replicate.

High signal Matched: model

Hugging Face · open-source · 2022-06-28

Accelerate Large Model Training using DeepSpeed

Score 10

No feed summary available yet.

High signal Matched: model, training

Hugging Face · open-source · 2022-06-07

The Annotated Diffusion Model

Score 10

No feed summary available yet.

High signal Matched: model

Hugging Face · open-source · 2022-05-26

Graphcore and Hugging Face Launch New Lineup of IPU-Ready Transformers

Score 10

No feed summary available yet.

High signal Matched: launch

Hugging Face · open-source · 2022-05-25

Introducing Pull Requests and Discussions 🥳

Score 10

No feed summary available yet.

High signal Matched: introducing

Hugging Face · open-source · 2022-05-02

Accelerate Large Model Training using PyTorch Fully Sharded Data Parallel

Score 10

No feed summary available yet.

High signal Matched: model, training

Hugging Face · open-source · 2022-04-25

Introducing Hugging Face for Education 🤗

Score 10

No feed summary available yet.

High signal Matched: introducing

Hugging Face · open-source · 2022-04-12

Habana Labs and Hugging Face Partner to Accelerate Transformer Model Training

Score 10

No feed summary available yet.

High signal Matched: model, training

Hugging Face · open-source · 2022-03-28

Introducing Decision Transformers on Hugging Face 🤗

Score 10

No feed summary available yet.

High signal Matched: introducing

Hugging Face · open-source · 2022-03-17

Fine-Tune a Semantic Segmentation Model with a Custom Dataset

Score 10

No feed summary available yet.

model-release frontier-model

High signal Matched: model

Hugging Face · open-source · 2022-03-02

BERT 101 - State Of The Art NLP Model Explained

Score 10

No feed summary available yet.

High signal Matched: model, state of the art

Hugging Face · open-source · 2021-12-15

Perceiver IO: a scalable, fully-attentional model that works on any modality

Score 10

No feed summary available yet.

High signal Matched: model

Hugging Face · open-source · 2021-12-02

Introducing Snowball Fight ☃️, our first ML-Agents environment

Score 10

No feed summary available yet.

High signal Matched: introducing, agents

Hugging Face · open-source · 2021-11-29

Introducing the Data Measurements Tool: an Interactive Tool for Looking at Datasets

Score 10

No feed summary available yet.

High signal Matched: introducing

Hugging Face · open-source · 2021-11-04

Scaling up BERT-like model Inference on modern CPU - Part 2

Score 14

No feed summary available yet.

High signal Matched: inference, model

Hugging Face · open-source · 2021-10-25

Train a Sentence Embedding Model with 1B Training Pairs

Score 10

No feed summary available yet.

High signal Matched: model, training

Hugging Face · open-source · 2021-09-14

Introducing Optimum: The Optimization Toolkit for Transformers at Scale

Score 10

No feed summary available yet.

High signal Matched: introducing

Hugging Face · open-source · 2021-04-16

Introducing 🤗 Accelerate

Score 10

No feed summary available yet.

High signal Matched: introducing

Hugging Face · open-source · 2020-11-09

Leveraging Pre-trained Language Model Checkpoints for Encoder-Decoder Models

Score 10

No feed summary available yet.

High signal Matched: model

Hugging Face · open-source · 2020-02-14

How to train a new language model from scratch using Transformers and Tokenizers

Score 10

No feed summary available yet.

High signal Matched: model

Microsoft Research · big-tech · 2026-05-09

Building realistic electric transmission grid dataset at scale: a pipeline from open dataset

Score 6

Microsoft Research is excited to release an open dataset of approximate transmission topology of the U.S. power grid derived from publicly available data. The ability to study transmission-level power grid behavior is essential for modern...

model-release research

Watchlist Matched: release, research

AI2 · research · 2026-05-05

MolmoAct 2: An open foundation for robots that work in the real world

Score 6

MolmoAct 2 is a fully open robotics foundation model that brings faster, stronger 3D action reasoning to real-world robot tasks, alongside a major new bimanual manipulation dataset for researchers to study, reproduce, and build on.

benchmark model-release training

Watchlist Matched: model

Lambda · cloud · 2026-05-04

Most AI teams treat compute as a commodity. It's not.

Score 6

Consider two teams provisioning 8,192 GPUs for a large training run. Same model, same dataset, same budget. Team A lands on a facility purpose-built for AI with sufficient power density, carefully engineered liquid cooling, a high-performa...

model-release frontier-model

Watchlist Matched: performance, model, training

AI2 · research · 2026-04-30

AstaBench update: New results, plus adoption from industry

Score 6

AstaBench’s latest update adds new frontier-model results, including GPT-5.5, and highlights growing adoption from groups including the UK AISI, General Reasoning, Elicit, SciSpace, Distyl AI, and EvoScientist.

Watchlist Matched: model, frontier-model

AI2 · research · 2026-04-20

Train separately, merge together: Modular post-training with mixture-of-experts

Score 6

BAR is a recipe for post-training language models one capability at a time—train domain experts independently, merge them into a single mixture-of-experts model, and upgrade any expert without impacting the others.

Watchlist Matched: model, training, post-training

Replicate · inference-infra · 2026-04-15

How to make remarkable videos with Seedance 2.0

Score 6

If you have never tried a video model before, now is the time.

model-release training agents

Watchlist Matched: model

AI2 · research · 2026-03-24

MolmoWeb: An open agent for automating web tasks

Score 6

Introducing MolmoWeb, an open visual web agent that navigates and completes tasks in a browser using screenshots alone, along with MolmoWebMix, the largest public dataset for training web agents.

model-release training fine-tuning

Watchlist Matched: introducing, training, agent, agents

AI2 · research · 2026-03-11

MolmoBot: Training robot manipulation entirely in simulation

Score 6

MolmoBot is an open robotic manipulation model suite trained entirely in simulation—demonstrating zero-shot transfer to real-world robots without any real-world data collection or fine-tuning.

Watchlist Matched: model, training, fine-tuning

AI2 · research · 2026-03-11

Ai2 introduces open, simulation-first stack for physical AI, achieving zero-shot transfer to real robots

Score 6

Introducing MolmoBot and MolmoSpaces, an open foundation for training real-world robots to advance science.

Watchlist Matched: introducing, training

AI2 · research · 2026-02-13

Olmix: A framework for data mixing throughout LM development

Score 6

Olmix is a framework for language model data mixing that provides empirically grounded defaults and efficient reuse techniques.

Watchlist Matched: model

Replicate · inference-infra · 2025-11-26

Run Isaac 0.1 on Replicate

Score 6

Isaac 0.1 is a lightweight, grounded vision-language model built for real-world perception.

benchmark model-release research training

Watchlist Matched: model

BAIR · research · 2025-11-01

RL without TD learning

Score 4

In this post, I’ll introduce a reinforcement learning (RL) algorithm based on an “alternative” paradigm: divide and conquer. Unlike traditional methods, this algorithm is not based on temporal difference (TD) learning (which has scalabilit...

Watchlist Matched: benchmark, performance, model, paper, training

LY Corporation Tech Blog · korea · 2025-10-20

End to End Testing on PRs

Score 4

At LY Corporation we're constantly working to improve our pre-release test process and reduce the ri...

Watchlist Matched: release

Replicate · inference-infra · 2025-07-31

Open source video is back

Score 6

Wan 2.2 is our fastest, cheapest video model.

Watchlist Matched: model, open source

Replicate · inference-infra · 2025-06-06

Get the most from Google Veo 3

Score 6

We're sharing our experiments and tips on Google's new Veo 3 model.

Watchlist Matched: model

Replicate · inference-infra · 2025-05-29

Use FLUX.1 Kontext to edit images with words

Score 6

This is how to get the most from Black Forest Labs' new image editing model.

serving kernel benchmark model-release research training agents

Watchlist Matched: model

BAIR · research · 2025-03-25

Scaling Up Reinforcement Learning for Traffic Smoothing: A 100-AV Highway Deployment

Score 6

We deployed 100 reinforcement learning (RL)-controlled cars into rush-hour highway traffic to smooth congestion and reduce fuel consumption for everyone. Our goal is to tackle "stop-and...

Watchlist Matched: throughput, kernel, performance, model, paper, training, agent, agents

Replicate · inference-infra · 2025-03-05

Wan2.1 parameter sweep

Score 6

We've been playing with Alibaba's Wan2.1 text-to-video model lately. What happens when you tweak those mysterious parameters? Let's find out.

Watchlist Matched: model

Replicate · inference-infra · 2024-10-22

Stable Diffusion 3.5 is here

Score 6

Stability AI's latest text-to-image model is now available on Replicate and you can run it with an API.

Watchlist Matched: model, api

Replicate · inference-infra · 2024-08-30

Fine-tune FLUX.1 to create images of yourself

Score 6

Create your own fine-tuned Flux model to generate new images of yourself.

Watchlist Matched: model

Replicate · inference-infra · 2024-08-02

Replicate Intelligence #9

Score 6

Open source frontier image model, cut objects from videos, new Python web framework from Jeremy Howard

model-release api open-source

Watchlist Matched: model, open source

Replicate · inference-infra · 2024-08-01

Run FLUX with an API

Score 6

FLUX.1 is a new text-to-image model from Black Forest Labs, the creators of Stable Diffusion, that exceeds the capabilities of previous open-source models.

Watchlist Matched: model, api, open-source

Replicate · inference-infra · 2024-07-26

Replicate Intelligence #8

Score 6

A top-tier open-ish language model, new safety classifiers, model search API

Watchlist Matched: model, api

Replicate · inference-infra · 2024-06-28

Replicate Intelligence #6

Score 6

Google's Gemma2 models, language model leaderboard, tips for Stable Diffusion 3

Watchlist Matched: model, leaderboard

Replicate · inference-infra · 2024-06-21

Replicate Intelligence #5

Score 6

Really good coding model, AI search breakthroughs, Discord support bot

Watchlist Matched: model

SqueezeBits · korea · 2024-05-27

Experiencing AI Model Compression Firsthand: Our IT Exhibition Story

Score 2

SqueezeBits' IT exhibition recap: from AI model compression demos to hands-on OwLite experiences, booth visitor reactions, and more. Read our on-the-ground event story!

Watchlist Matched: model

Replicate · inference-infra · 2023-11-08

Generate music from chord progressions and text prompts with MusicGen-Chord

Score 6

We’ve added chord conditioning to Meta’s MusicGen model, so you can create automatic backing tracks in any style using text prompts and chord progressions.

model-release fine-tuning

Watchlist Matched: model

Replicate · inference-infra · 2023-08-22

Painting with words: a history of text-to-image AI

Score 6

With the recent release of Stable Diffusion XL fine-tuning on Replicate, and today being the 1-year anniversary of Stable Diffusion, now feels like the perfect opportunity to take a step back and reflect on how text-to-image AI has improve...