VESSL AI · korea · 2026-06-03
Introducing the Dashboard: Monitor Your GPU Workloads at a Glance
No feed summary available yet.
High signal Matched: gpu, introducing
VESSL AI · korea · 2026-06-03
No feed summary available yet.
High signal Matched: gpu, introducing
VESSL AI · korea · 2026-06-03
No feed summary available yet.
High signal Matched: launch, cloud
NVIDIA Dynamo · open-source · 2026-06-03
No feed summary available yet.
High signal Matched: release
Perplexity Research · model-lab · 2026-06-03
No feed summary available yet.
High signal Matched: model
VESSL AI · korea · 2026-06-03
No feed summary available yet.
High signal Matched: introducing
KubeAI · open-source · 2026-06-03
No feed summary available yet.
High signal Matched: model
BentoML · inference-infra · 2026-06-03
No feed summary available yet.
High signal Matched: inference, serve, performance, model
OpenAI · model-lab · 2026-06-03
No feed summary available yet.
High signal Matched: model, frontier model
OpenAI · model-lab · 2026-06-03
No feed summary available yet.
High signal Matched: model
OpenAI · model-lab · 2026-06-03
No feed summary available yet.
High signal Matched: model
Crusoe · cloud · 2026-06-03
No feed summary available yet.
High signal Matched: inference, model
Crusoe · cloud · 2026-06-03
No feed summary available yet.
High signal Matched: model, training
LightSeek Foundation · research · 2026-06-03
No feed summary available yet.
High signal Matched: inference, decoding, speculative decoding, model, training
Baseten · inference-infra · 2026-06-03
No feed summary available yet.
High signal Matched: performance, model
FriendliAI · inference-infra · 2026-06-03
No feed summary available yet.
High signal Matched: model, agentic
FriendliAI · inference-infra · 2026-06-03
No feed summary available yet.
High signal Matched: model
Fireworks AI · inference-infra · 2026-06-03
No feed summary available yet.
High signal Matched: model, training, frontier model
Fireworks AI · inference-infra · 2026-06-03
No feed summary available yet.
High signal Matched: inference, model, open model
Stanford CRFM · research · 2026-06-03
No feed summary available yet.
High signal Matched: model, evaluation
Moonshot AI Kimi · model-lab · 2026-06-03
No feed summary available yet.
High signal Matched: release
Mistral AI · model-lab · 2026-06-03
No feed summary available yet.
High signal Matched: model
Anthropic · model-lab · 2026-06-03
No feed summary available yet.
High signal Matched: introducing, tool use
Cohere · model-lab · 2026-06-03
No feed summary available yet.
High signal Matched: model
Cohere · model-lab · 2026-06-03
No feed summary available yet.
High signal Matched: model
Upstage · korea · 2026-06-03
No feed summary available yet.
High signal Matched: model
GMI Cloud · cloud · 2026-06-03
No feed summary available yet.
High signal Matched: model
AWS Machine Learning Blog · cloud · 2026-06-03
Fine-tuning for domain-specific tasks means improving performance in one area without degrading the model’s general capabilities, and getting that balance right is harder than it looks. This post walks through how to navigate that balance,...
High signal Matched: performance, model, training, checkpointing, fine-tuning
Lambda · cloud · 2026-06-03
Lambda workspaces help teams organize cloud resources, control access, and separate dev, staging, and production in shared GPU environments. A junior researcher kills a production training run. A contractor sees weights they shouldn't. If...
High signal Matched: gpu, introducing, weights, cloud, training
AWS Machine Learning Blog · cloud · 2026-06-02
While deploying Model Context Protocol (MCP) servers in production, enterprises need fine-grained access control across servers, observability into which teams use which tools, security guarantees against data exfiltration, and centralized...
High signal Matched: model, bedrock, mcp
AWS Machine Learning Blog · cloud · 2026-06-02
If you’re iterating on deploying large language models (LLMs) on AWS GPU instances, you’ve probably noticed the larger the model to be loaded into GPU High Bandwidth Memory (HBM), the longer the painful wait until the GPUs are ready for in...
High signal Matched: inference, gpu, model
Hugging Face · open-source · 2026-06-02
No feed summary available yet.
High signal Matched: introducing, model
vLLM Project · open-source · 2026-06-02
Long-horizon LLM agents create a routing problem that single-turn prompt routers were not designed to solve. A router still needs to know which model is best for the current request, but it also...
High signal Matched: router, model, agents, agentic
Lambda · cloud · 2026-06-01
When we design large GPU clusters, the network is no longer a background system. It's part of the compute envelope. At the 800G and NVIDIA GB300 NVL72 scale, the back-end fabric accounts for 86% of networking power in a three-layer cluster...
High signal Matched: generation, token generation, throughput, infiniband, gpu, model, retrieval, agentic
Hugging Face · open-source · 2026-06-01
No feed summary available yet.
High signal Matched: model
vLLM Project · open-source · 2026-06-01
A technical deep dive on running vLLM on NVIDIA DGX Spark and GB10 systems, covering sm_121 architecture, unified memory behavior, NVFP4 model serving, Nemotron-3-Super configuration, Docker deployment, Prometheus metrics, and local evalua...
High signal Matched: serving, model, evaluation
NVIDIA Technical Blog · hardware · 2026-05-29
Modern LLM serving is hard to tune because each deployment is a stack of interacting choices: model backend, tensor-parallel shape, prefill/decode split, worker...
High signal Matched: serving, prefill, model
NVIDIA Technical Blog · hardware · 2026-05-29
As AI models grow in complexity and regulatory scrutiny intensifies under frameworks including California’s AB-2013 and the EU AI Act, software teams...
High signal Matched: model
Nota AI · korea · 2026-05-29
Jaehoon Lee, Technical Content Manager, Nota AI. When enterprises adopt AI, the most common bottleneck is not model development. It is the deployment stage: getting a finished model to run reliably on the actual target device. T...
High signal Matched: inference, throughput, benchmark, performance, latency, cost, gpu, model, evaluation, quantization, int8, benchmarks, leaderboard
AWS Machine Learning Blog · cloud · 2026-05-29
Azercell Telecom LLC, Azerbaijan's leading telecommunications provider, wanted to build an Azerbaijani large language model (LLM) on Amazon SageMaker AI for telecom use cases and a customer-facing chatbot. The challenge: adapting foundatio...
High signal Matched: model, sagemaker, training
AWS Machine Learning Blog · cloud · 2026-05-29
This post covers Opus 4.8's improvements and practical guidance for AI engineers integrating the model into agentic systems and production inference workloads on Amazon Bedrock.
High signal Matched: inference, model, bedrock, agentic
AMD ROCm Blogs · hardware · 2026-05-29
Speculative speculative decoding (SSD) [1] is a recently proposed speculative decoding (SD) algorithm that further accelerates large language model (LLM) inference beyond conventional SD. In standard SD, a small draft model proposes severa...
High signal Matched: inference, decoding, speculative decoding, draft model, verification, cost, mi300x, model
PyTorch Foundation · open-source · 2026-05-28
When you use PyTorch’s compiler, your model runs faster, up to 10x faster. But what’s actually happening? Without compilation, the GPU runs a kernel, a function on the GPU, for...
High signal Matched: kernel, gpu, model
PyTorch Foundation · open-source · 2026-05-28
TL;DR: The TokenSpeed inference engine achieved a record-breaking 580 tps running the Qwen3.5-397B-A17B model on GPUs. This extreme performance for agentic workloads is driven by systematic elimination of memory copies,...
High signal Matched: inference, performance, gpu, model, agentic
vLLM Project · open-source · 2026-05-28
The v0.5.0 release brings significant architectural improvements to speculative decoding model training, introducing DFlash algorithm support, fully unified online training capabilities, and a...
High signal Matched: decoding, speculative decoding, release, introducing, model, training
vLLM Project · open-source · 2026-05-28
Most routing systems start with a prompt and choose a model endpoint. vLLM Semantic Router (VSR) makes a different bet: before a request reaches the serving model, the system should extract...
High signal Matched: serving, endpoint, router, model
AMD ROCm Blogs · hardware · 2026-05-27
Our previous two posts in this GEMM optimization series covered Matrix Core instructions and 8-wave ping-pong FP8 GEMM design. Here we discuss another algorithm design introduced by HipKittens - 4-wave interleave, which further improves th...
High signal Matched: gemm, performance, fp8
Modal · inference-infra · 2026-05-27
Introducing Role-Based Access Control for humans and agents, now available for all users on Teams and Enterprise plans.
High signal Matched: introducing, agents
PyTorch Foundation · open-source · 2026-05-26
Code available at: https://github.com/facebookresearch/ads_model_kernel_library In this post, we present the design of TLX Block Attention — a Triton kernel targeting NVIDIA Blackwell GPUs that exploits compile-time knowledge of a block-di...
High signal Matched: kernel, triton, blackwell, model
NVIDIA Technical Blog · hardware · 2026-05-26
NVIDIA CUDA 13.3 brings new capabilities and performance optimizations to developers across the CUDA ecosystem. The launch of NVIDIA CUDA Tile programming in...
High signal Matched: cuda, performance, gpu, launch
AMD ROCm Blogs · hardware · 2026-05-25
Local large language model (LLM) inference has rapidly evolved, but a persistent limitation remains: model size is constrained by available GPU memory. Discrete GPUs typically offer 8–24 GB of dedicated VRAM, which can limit the size of mo...
High signal Matched: inference, multi-gpu, gpu, model, checkpoint, cloud, quantization, evaluate
Lambda · cloud · 2026-05-22
After 15 months of incremental updates, leaks, and rumored leaks, DeepSeek released version 4. It arrived without the fanfare R1 and R1-preview commanded in early 2025. That quiet reception is the most interesting thing about the release....
High signal Matched: inference, serving, performance, cost, release, model, open-source
SkyPilot · open-source · 2026-05-22
Online reinforcement learning for LLMs breaks Slurm's batch scheduling model. We'll discuss why, and what can be done about it.
High signal Matched: model
AMD ROCm Blogs · hardware · 2026-05-22
Triton Inference Server is an open-source platform designed to streamline AI inferencing. It supports the deployment, scaling, and inference of trained models from multiple frameworks, including ONNX Runtime, TensorFlow, PyTorch, and other...
High signal Matched: inference, inferencing, serving, triton, benchmark, model, cloud, open-source
Lambda · cloud · 2026-05-20
What the numbers mean for financial services. Executive summary: Lambda is the first to publish an audited STAC-AI™ LANG6 result on NVIDIA HGX B200, with independently verified performance data that Financial Services Industry (FSI) infrastr...
High signal Matched: inference, generation, performance, gpu, h200, b200, model, evaluating
AMD ROCm Blogs · hardware · 2026-05-20
Large Language Models (LLMs) typically contain billions — or even tens of billions — of parameters. During inference, tensor parallelism is commonly employed to distribute the workload across multiple GPUs. This approach demands frequent,...
High signal Matched: inference, latency, introducing, quantization
NVIDIA Technical Blog · hardware · 2026-05-19
Autonomous AI agents are becoming more capable. Open models, Model Context Protocol (MCP)-connected tools, and portable skills are also making agents easier to...
High signal Matched: model, agent, agents, mcp
NVIDIA Technical Blog · hardware · 2026-05-19
Evaluating an AI model and evaluating an AI agent are related—but they answer fundamentally different questions. A model benchmark tests the capability of a...
High signal Matched: benchmark, model, evaluation, evaluating, agent, agentic
Hugging Face · open-source · 2026-05-19
No feed summary available yet.
High signal Matched: introducing
PyTorch Foundation · open-source · 2026-05-19
TL;DR: Introducing the ExecuTorch MLX Delegate. The new MLX delegate enables optimized, GPU-accelerated inference for PyTorch models on Apple Silicon Macs, using Apple’s MLX framework. The delegate seamlessly integrates with...
High signal Matched: inference, gpu, introducing
Modal · inference-infra · 2026-05-19
No feed summary available yet.
High signal Matched: introducing, agents
Together AI · inference-infra · 2026-05-15
Together AI partners with Pearl Research Labs to launch a discounted Pearl-powered inference endpoint for Gemma-4-31B-it-pearl, using Proof of Useful Work to turn AI workloads into crypto emissions.
High signal Matched: inference, endpoint, cost, launch, research
NVIDIA Technical Blog · hardware · 2026-05-14
Agentic inference has fundamentally changed the runtime dynamics of inference workloads by introducing non-deterministic trajectories—actions, observations,...
High signal Matched: inference, introducing, agentic
PyTorch Foundation · open-source · 2026-05-14
We are excited to announce the release of PyTorch® 2.12 (release notes)! The PyTorch 2.12 release features the following changes: Batched linalg.eigh on CUDA is up to 100x faster due...
High signal Matched: cuda, release
Microsoft Research · big-tech · 2026-05-14
Introducing GridSFM, a small foundation model that can predict AC optimal power flow in milliseconds, boosting efficiency and unlocking cost savings. Learn how GridSFM gives grid operators direct visibility into congestion, stability, and...
High signal Matched: cost, introducing, model, research
vLLM Project · open-source · 2026-05-14
We are excited to announce the pre-release of VeRL-Omni, a general reinforcement learning (RL) post-training framework focused on multimodal generative models, built on top of verl and vllm-omni.
High signal Matched: release, training, post-training
LMCache · open-source · 2026-05-13
A practitioner’s guide to KV-cache tiering on ROCm: what works, what doesn’t, and the regime where it actually matters. Key Summary: We benchmarked multi-turn agentic workloads using 739 anonymized Claude Code conversation trac...
High signal Matched: lmcache, moe, mi300x, rocm, fp8, agentic
AI2 · research · 2026-05-13
AIMIP is a new open benchmark and dataset for evaluating AI climate models, showing they can match or beat conventional models on some historical climate metrics while still struggling to generalize reliably to long-term warming trends and...
High signal Matched: benchmark, introducing, model, evaluating
Microsoft Research · big-tech · 2026-05-12
MatterSim is expanding what AI can do for materials science, from faster large-scale simulations to MatterSim-MT, a new multi-task model for simulating properties beyond potential energy surfaces alone.
High signal Matched: model, research
NVIDIA Technical Blog · hardware · 2026-05-12
The path from a trained AI model to production should be smooth, but rarely is. Many teams invest weeks fine-tuning models, only to discover that exporting to a...
High signal Matched: serving, model, fine-tuning
Modular · inference-infra · 2026-05-12
Inkwell: Why Your Inference Platform Matters As Much As Your Model
High signal Matched: inference, model
Together AI · inference-infra · 2026-05-12
Voice finder helps developers search, match, filter, and audition 600+ voices across Together AI TTS models using natural-language prompts or uploaded audio samples.
High signal Matched: introducing
Hugging Face · open-source · 2026-05-12
No feed summary available yet.
High signal Matched: inference, model, training
NVIDIA Technical Blog · hardware · 2026-05-11
The compute capability of large GPU fleets presents unprecedented opportunities to innovate and provide value to customers in record time. Yet these...
High signal Matched: gpu, introducing
Nota AI · korea · 2026-05-11
Jaehoon Lee, Technical Content Manager, Nota AI. NetsPresso® now embraces AI agents. An easy-to-use interface sits on top of the validated pipeline that handles everything from model compression to device deployment. When a user...
High signal Matched: inference, endpoint, kernel, verification, moe, benchmark, latency, cost, gpu, release, model, evaluation, quantization, quantized, int4, evaluate, benchmarks, swe-bench, mmlu, agent, agents, api
BAIR · research · 2026-05-08
High signal Matched: inference, decoding, prefill, generation, serve, throughput, kv cache, verification, performance, latency, cost, model, paper, research, evaluation, training, pretraining, sft, benchmarks, long context, context window, agentic, reasoning model
NVIDIA Technical Blog · hardware · 2026-05-08
Bash is one of the most flexible and powerful interfaces exposed to AI agents. In the right system, a model that emits grep, curl, tar, or a shell pipeline is...
High signal Matched: decoding, generation, model, agents
Together AI · inference-infra · 2026-05-08
Learn how to deploy any Hugging Face model in one session using Goose and Together's Dedicated Container Inference. Skip the setup complexity — one prompt gets your model running in a production-grade GPU environment on release day.
High signal Matched: inference, gpu, release, model
AI2 · research · 2026-05-08
EMO is a new mixture-of-experts model trained so modular expert groups emerge from data, enabling users to select small task-specific expert subsets while preserving near full-model performance.
High signal Matched: mixture of experts, performance, model, pretraining
NVIDIA Technical Blog · hardware · 2026-05-07
Model quantization is an effective method to reduce VRAM usage and improve inference performance on consumer devices such as NVIDIA GeForce RTX GPUs. By...
High signal Matched: inference, performance, model, training, post-training, quantization
LMCache · open-source · 2026-05-05
DeepSeek V4 is an open-weight model that offers state-of-the-art intelligence at a potentially much lower token price than its predecessor, DeepSeek V3.2. But how does DeepSeek V4 do that? Prerequisite: attention...
High signal Matched: kv cache, lmcache, model
Cloudflare Blog · cloud · 2026-05-01
Dynamic Workflows is a library that lets you route durable execution to tenant-provided code on the fly. Built on Dynamic Workers, it enables platforms to serve millions of unique workflows at near-zero idle cost.
High signal Matched: serve, cost, introducing
NVIDIA Technical Blog · hardware · 2026-04-30
NVIDIA CUDA Tile (cuTile) is a tile-based programming model that enables developers to write GPU kernels in terms of tile-level operations—loads, stores, and...
High signal Matched: kernel, cuda, gpu, model, agents
Nota AI · korea · 2026-04-29
Hancheol Park, Ph.D., AI Research Engineer, NetsPresso Tech, Nota AI; Geonmin Kim, Ph.D., AI Research Engineer, NetsPresso Tech, Nota AI; Geonho Lee, Edge AI Engineer Intern, NetsPresso Tech, Nota AI; Jaehoon Lee, Technical Content Manager,...
High signal Matched: generation, moe, performance, model, weights, paper, research, evaluation, korea, korean, seoul, naver, training, fine-tuning, quantization, agent, agents, agentic
Hugging Face · open-source · 2026-04-29
No feed summary available yet.
High signal Matched: introducing, long-context, agents
NVIDIA Technical Blog · hardware · 2026-04-28
Agentic systems often reason across screens, documents, audio, video, and text within a single perception‑to‑action loop. However, they still rely on...
High signal Matched: model, open model, agent, agentic
Together AI · inference-infra · 2026-04-28
NVIDIA Nemotron 3 Nano Omni is now on Together AI: a single open model that reasons across video, images, audio, and text, built for agentic workloads at scale.
High signal Matched: model, open model, agentic
vLLM Project · open-source · 2026-04-28
We are excited to support the newly released NVIDIA Nemotron 3 Nano Omni model on vLLM.
High signal Matched: model, agentic
Sakana AI · model-lab · 2026-04-24
No feed summary available yet.
High signal Matched: model, agent
LMCache · open-source · 2026-04-23
Overview: Large language model (LLM) inference performance depends heavily on how efficiently the system manages the key-value (KV) cache, the stored attention states that allow the model to avoid recomputing previous tokens. As context length...
High signal Matched: inference, kv cache, lmcache, performance, latency, gpu, model, sagemaker
AI2 · research · 2026-04-23
OlmoEarth Studio now lets users export custom Earth-observation embeddings from our OlmoEarth foundation models and use them for tasks like similarity search, few-shot mapping, change detection, and unsupervised exploration.
High signal Matched: introducing
Nota AI · korea · 2026-04-22
Jaehoon Lee, Technical Content Manager, Nota AI. Series Notice: NetsPresso® Technical Blog, Part 2. In Part 1, we walked through a scenario of deploying Llama 3.2 1B on an edge device to illustrate the NetsPresso® workflow. The f...
High signal Matched: inference, kernel, cuda, matmul, benchmark, performance, latency, cost, npu, model, weights, paper, research, evaluation, furiosa, training, quantization, int8, int4, awq, gptq, sdk, open-source
vLLM Project · open-source · 2026-04-22
Long-context LLM serving is increasingly memory-bound: for standard full-attention decoders, the KV cache often dominates GPU memory at 128k+ contexts, and each decode step must read a large...
High signal Matched: serving, kv cache, gpu, fp8, quantization, long-context
SkyPilot · open-source · 2026-04-22
Introducing GPU Compass: One dashboard to browse, compare pricing, and launch across every GPU cloud.
High signal Matched: gpu, introducing, launch, cloud
NVIDIA Technical Blog · hardware · 2026-04-20
As LLMs transition from simple text generation to complex reasoning, reinforcement learning (RL) plays a central role. Algorithms like Group Relative Policy...
High signal Matched: generation, throughput, fp8, training
BAIR · research · 2026-04-20
High signal Matched: performance, model, paper, arxiv, evaluation, training
Together AI · inference-infra · 2026-04-15
Parcae is a stable looped language model that matches the quality of a Transformer twice its size — a 770M model reaching 1.3B-level performance. We introduce the first scaling laws for looping and show that increasing recurrence, not just...
High signal Matched: performance, model
NVIDIA Technical Blog · hardware · 2026-04-14
NVIDIA Ising is the world's first family of open AI models for building quantum processors, launching with two model domains: Ising Calibration and Ising...
High signal Matched: model
NVIDIA Technical Blog · hardware · 2026-04-12
The release of MiniMax M2.7 adds enhancements to the popular MiniMax M2.5 model, built for agentic harnesses,...
High signal Matched: release, model, agentic
SkyPilot · open-source · 2026-04-10
With the SkyPilot Agent Skill, your AI coding agent can launch clusters, run training jobs and manage cloud resources across any infrastructure using natural language.
High signal Matched: launch, cloud, training, agent, agents
NVIDIA Technical Blog · hardware · 2026-04-09
Training LLMs requires periodic checkpoints. These full snapshots of model weights, optimizer states, and gradients are saved to storage so training can resume...
High signal Matched: model, weights, checkpoint, training
Google Research · big-tech · 2026-04-09
Generative AI
High signal Matched: introducing, agents
Nota AI · korea · 2026-04-08
Jaehoon Lee, Technical Content Manager, Nota AI. AI Model Optimization: Why Models Won't Run on Hardware. The Chip Is Ready, but the Model Won't Deploy. If you have ever tried deploying an AI model onto your own chip, the following...
High signal Matched: inference, multi-gpu, kv cache, verification, performance, latency, gpu, model, research, evaluation, quantization, quantized, awq, gptq, evaluate
AI2 · research · 2026-04-07
WildDet3D is an open model that predicts 3D bounding boxes from a single image. It generalizes across cameras and object categories, and folds in depth signals when available—alongside a new dataset of verified 3D annotations.
High signal Matched: introducing, model, open model
Together AI · inference-infra · 2026-04-03
A four-model video suite for generation, continuation, reference-driven workflows, and editing, rolling out on Together AI starting with text-to-video.
High signal Matched: generation, model
LY Corporation Tech Blog · korea · 2026-04-02
Hello. I’m Inoue, and I work on private cloud infrastructure at LY Corporation. What powers LY Corpor...
High signal Matched: generation, introducing, cloud
NVIDIA Technical Blog · hardware · 2026-04-02
In vision AI systems, model throughput continues to improve. The surrounding pipeline stages must keep pace, including decode, preprocessing, and GPU...
High signal Matched: throughput, gpu, model
NVIDIA Technical Blog · hardware · 2026-04-02
The Gemmaverse expands with the launch of the latest Gemma 4 multimodal and multilingual models, designed to scale across the full spectrum of deployments, from...
High signal Matched: launch
Together AI · inference-infra · 2026-04-02
Production STT and TTS from Deepgram, available on Together AI Dedicated Model Inference for real-time voice agents.
High signal Matched: inference, model, agents
Modular · inference-infra · 2026-04-02
Day Zero Launch: Fastest Performance for Gemma 4 on NVIDIA and AMD
High signal Matched: performance, launch
vLLM Project · open-source · 2026-04-02
With the debut of Gemma 4, vLLM introduces immediate support for Google's most sophisticated open model lineup, spanning multiple hardware backends, with first-ever Day 0 support on Google TPUs,...
High signal Matched: model, open model
Nota AI · korea · 2026-03-31
Jaehoon Lee, Technical Content Manager, Nota AI. In March, a single official announcement from Google Research rocked trillions of won in the market capitalization of U.S. infrastructure and semiconductor stocks. The catalyst:...
High signal Matched: inference, serving, generation, throughput, kv cache, benchmark, performance, cost, b200, blackwell, introducing, model, fp8, research, training, fine-tuning, quantization, quantized, agent, agentic, frontier model
NVIDIA Technical Blog · hardware · 2026-03-25
In production Kubernetes environments, the difference between model requirements and GPU size creates inefficiencies. Lightweight automatic speech recognition...
High signal Matched: throughput, gpu, model
NVIDIA Technical Blog · hardware · 2026-03-25
Developing new protein-based therapies and catalysts involves the challenging task of designing protein binders, or proteins that bind to a target protein or...
High signal Matched: model
vLLM Project · open-source · 2026-03-24
We are excited to announce Model Runner V2 (MRV2), a ground-up re-implementation of the vLLM model runner. MRV2 delivers a cleaner, more modular, and more efficient execution core—with no API...
High signal Matched: model, api
Nota AI · korea · 2026-03-23
Jaehoon Lee, Technical Content Manager, Nota AI. GTC has evolved far beyond a technology conference, drawing attention from global economies and financial markets alike. This year, CEO Jensen Huang took the stage in his tradema...
High signal Matched: inference, prefill, generation, throughput, cuda, kv cache, performance, latency, cost, gpu, npu, launch, model, research, cloud, training, long-context, context window, agent, agents, agentic, open-source
NVIDIA Technical Blog · hardware · 2026-03-23
As large language model (LLM) inference workloads grow in complexity, a single monolithic serving process starts to hit its limits. Prefill and decode stages...
High signal Matched: inference, serving, prefill, model
Hugging Face · open-source · 2026-03-21
No feed summary available yet.
High signal Matched: model
Nota AI · korea · 2026-03-20
NP Product Team, Nota AI. The role of Edge AI is rapidly expanding. Offline voice assistants now carry on conversations in our daily lives, vehicles infer routes in real time, and smartphones generate images without a network c...
High signal Matched: inference, kv cache, moe, benchmark, performance, latency, cost, model, research, seoul, quantization
Together AI · inference-infra · 2026-03-18
Together AI expands fine-tuning with native support for tool call, reasoning, and vision-language models, plus 100B+ model training, up to 6× higher throughput, and job cost and ETA estimates.
High signal Matched: throughput, cost, model, training, fine-tuning
AI2 · research · 2026-03-18
MolmoPoint is a new vision-language model architecture that replaces text-based coordinate outputs with a more natural, token-based pointing mechanism that directly selects regions from visual features.
High signal Matched: model
NVIDIA Technical Blog · hardware · 2026-03-16
AI‑native organizations increasingly face scaling challenges as agentic AI workflows drive context windows to millions of tokens and models scale toward...
High signal Matched: introducing, agentic
Nota AI · korea · 2026-03-13
Hancheol Park, Ph.D., AI Research Engineer, Nota AI; Tairen Piao, AI Research Engineer, Nota AI; Tae-Ho Kim, CTO & Co-Founder, Nota AI. ✔️ Resource: The official quantized model of Solar-Open-100B, which passed the first round of Sout...
High signal Matched: inference, serving, prefill, generation, throughput, moe, router, benchmark, performance, latency, ttft, tpot, blackwell, release, model, weights, open model, research, evaluation, korea, korean, upstage, training, post-training, quantization, quantized, int4, evaluate, benchmarks, mmlu, long-context
BAIR · research · 2026-03-13
Understanding the behavior of complex machine learning systems, particularly Large Language Models (LLMs), is a critical challenge in modern artificial intelligence. Interpretability research aims to make the decision-making process mo...
High signal Matched: inference, serving, decoding, performance, cost, model, research, training, evaluate, mmlu, long-context, rag
llm-d · open-source · 2026-03-13
A lightweight ML model trained online from live traffic replaces manually tuned heuristic weights with direct latency predictions, achieving 43% improvement in P50 end-to-end latency and 70% improvement in TTFT on a production-realistic wo...
High signal Matched: latency, ttft, model, weights
vLLM Project · open-source · 2026-03-13
EAGLE is the state-of-the-art method for speculative decoding in large language model (LLM) inference, but its autoregressive drafting creates a hidden bottleneck: the more tokens that you...
High signal Matched: inference, decoding, speculative decoding, eagle, model
Google Research · big-tech · 2026-03-12
Climate & Sustainability
High signal Matched: introducing
vLLM Project · open-source · 2026-03-11
We are excited to support the newly released NVIDIA Nemotron 3 Super model on vLLM.
High signal Matched: model, agent
SkyPilot · open-source · 2026-03-11
SkyPilot Recipes let you store SkyPilot YAMLs in a shared, team-accessible registry. Launch workloads directly from the CLI without local files.
High signal Matched: launch
Hugging Face · open-source · 2026-03-10
No feed summary available yet.
High signal Matched: introducing
vLLM Project · open-source · 2026-03-10
Since v0.1 Iris, vLLM Semantic Router has made a large jump. In one release cycle, the project rebuilt its model stack, expanded routing into safety, semantic caching, memory, retrieval, and...
High signal Matched: router, release, model, retrieval
Hugging Face · open-source · 2026-03-05
No feed summary available yet.
High signal Matched: introducing
AI2 · research · 2026-03-05
Olmo Hybrid is a fully open 7B language model that combines transformer attention with linear RNN layers to achieve greater expressivity and significantly improved data and compute efficiency compared to pure transformer models.
High signal Matched: introducing, model
Hugging Face · open-source · 2026-03-04
No feed summary available yet.
High signal Matched: model, training
AIBrix · open-source · 2026-03-03
🚀 AIBrix v0.6.0 Release: Today we’re excited to announce AIBrix v0.6.0, a release that expands how you deploy and route inference traffic. Key highlights include: Envoy Sidecar Support – Run Envoy alongside the gateway-plugin without...
High signal Matched: inference, prefill, release, model, lora, rerank, api, openai-compatible
Together AI · inference-infra · 2026-03-02
We've refreshed our visual identity — designed with Pentagram to express how Together AI connects open-source innovation, systems research, and builders to unlock new possibilities.
High signal Matched: introducing, research, open-source
Nota AI · korea · 2026-02-26
Jewon Lee | Wooksu Shin | Seungmin Yang | Ki-Ung Song | Donguk Lim | Jaeyeon Kim | Tae-Ho Kim | Bo-Kyeong Kim, EdgeFM Team, Nota AI. ✔️ Resources for more information: GitHub, ArXiv, Project Page, Demo. ✔️ Accepted at ICLR 2026. &...
High signal Matched: inference, generation, verification, benchmark, performance, latency, cost, model, arxiv, evaluation, training, post-training, benchmarks
vLLM Project · open-source · 2026-02-26
Organizations and individuals running multiple custom AI models, especially recent Mixture of Experts (MoE) model families, can face the challenge of paying for idle GPU capacity when the...
High signal Matched: serve, moe, mixture of experts, gpu, model, sagemaker, bedrock
Modal · inference-infra · 2026-02-24
Introducing Directory Snapshots, a programmatic way to snapshot a specific directory within a running Sandbox and mount it into another Sandbox later, independently of the base image and the rest of the filesystem.
High signal Matched: introducing
Together AI · inference-infra · 2026-02-12
Together AI launches production-grade orchestration for custom AI models with 1.4x–2.6x faster inference.
High signal Matched: inference, introducing
Hugging Face · open-source · 2026-02-06
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2026-02-04
No feed summary available yet.
High signal Matched: model
Together AI · inference-infra · 2026-02-02
Fine-tuned open-source LLM judges can outperform GPT-5.2 at evaluating model outputs. Using Direct Preference Optimization on just 5,400 preference pairs, we trained GPT-OSS 120B to beat GPT-5.2 on human preference alignment—at 15x lower c...
High signal Matched: inference, cost, model, fine-tuning, evaluating, open-source, oss
vLLM Project · open-source · 2026-02-01
TL;DR: In collaboration with the open-source community, vLLM + NVIDIA has achieved significant performance milestones on the gpt-oss-120b model running on NVIDIA's Blackwell GPUs. Through deep...
High signal Matched: performance, blackwell, model, open-source, oss
vLLM Project · open-source · 2026-01-31
Large language model inference has traditionally operated on a simple premise: the user submits a complete prompt (request), the model processes it, and returns a response (either streaming or at...
High signal Matched: inference, model, api
Hugging Face · open-source · 2026-01-29
No feed summary available yet.
High signal Matched: introducing
Together AI · inference-infra · 2026-01-26
Introducing DSGym, a holistic evaluation and training framework for LLM-based data science agents. Features 90+ bioinformatics tasks, 92 Kaggle competitions, and synthetic trajectory generation. Our 4B model achieves state-of-the-art perform...
High signal Matched: generation, performance, introducing, model, evaluation, training, evaluating, agents, open-source
Google Research · big-tech · 2026-01-24
Algorithms & Theory
High signal Matched: introducing
Hugging Face · open-source · 2026-01-20
No feed summary available yet.
High signal Matched: introducing
Together AI · inference-infra · 2026-01-13
Together AI teamed with Cursor to build the real-time inference stack that keeps in-editor agents fast and reliable. They productionized NVIDIA Blackwell (B200/GB200), tuning ARM hosts, kernels, and FP4/TensorRT quantization for low latenc...
High signal Matched: inference, latency, b200, gb200, blackwell, model, quantization, agents
Together AI · inference-infra · 2026-01-12
Learn how foundation models are trained at scale using multi-node GPU clusters, including distributed training techniques, infrastructure requirements, and practical steps to scale training efficiently.
High signal Matched: distributed, multi-node, gpu, model, training, distributed training
BAIR · research · 2026-01-10
An encoder (optical system) maps objects to noiseless images, which noise corrupts into measurements. Our information estimator uses only these noisy measurements and a noise model to quantify how well measurements distinguish objects. Man...
High signal Matched: performance, model, paper, evaluation, training, evaluate
Together AI · inference-infra · 2026-01-08
Learn how to choose the right open-source model for production by evaluating model quality, benchmarking performance, and deploying open models that balance cost, speed, and accuracy.
High signal Matched: performance, cost, model, open model, evaluating, open-source
SqueezeBits · korea · 2026-01-07
A recap of the Intel® Gaudi® hands-on workshop co-hosted by SqueezeBits and Lablup. AI model compression, fine-tuning, and vLLM serving on Gaudi® hardware with Backend.AI.
High signal Matched: serving, model, fine-tuning
Hugging Face · open-source · 2026-01-05
No feed summary available yet.
High signal Matched: introducing
vLLM Project · open-source · 2026-01-05
vLLM Semantic Router is the System Level Intelligence for Mixture-of-Models (MoM), bringing Collective Intelligence into LLM systems. It lives between users and models, capturing signals from...
High signal Matched: router, release
vLLM Project · open-source · 2026-01-02
As a passionate vLLM community member who wants to see vLLM thrive and reach even more developers, I'm excited to announce vLLM Playground – a modern, feature-rich web interface for managing and...
High signal Matched: introducing
SqueezeBits · korea · 2025-12-24
Introducing ATOM™-Max, Rebellions’ next-generation NPU designed for high-performance AI inference. Learn how its runtime, profiling tools, and PyTorch-native integrations enable developers to run and serve models efficiently without sacrif...
High signal Matched: inference, generation, serve, performance, npu, introducing, rebellions
SkyPilot · open-source · 2025-12-19
SkyPilot now includes predefined templates to launch clusters with popular frameworks and patterns. Deploy fully configured environments without writing long YAMLs.
High signal Matched: launch
Nota AI · korea · 2025-12-19
Seungmin Yang, EdgeFM Lead, Nota AI. Summary: With the introduction of NVFP4, a new 4-bit floating point data type in NVIDIA’s Blackwell GPU architecture, LLM inference achieves markedly improved efficiency. Blackwell’s NVFP4...
High signal Matched: inference, serving, decoding, prefill, generation, token generation, throughput, kernel, gemm, cutlass, distributed, benchmark, performance, latency, ttft, tpot, tokens/sec, cost, gpu, blackwell, launch, model, weights, fp8, research, training, post-training, quantization, quantized, awq, benchmarks, mmlu, retrieval
Together AI · inference-infra · 2025-12-15
Nemotron 3 Nano, NVIDIA’s newest reasoning model, is now available on Together AI, the AI Native Cloud
High signal Matched: model, cloud, reasoning model
vLLM Project · open-source · 2025-12-15
Modern Large Multimodal Models (LMMs) introduce a unique serving-time bottleneck: before any text generation can begin, all images must be processed by a visual encoder (e.g., ViT). This encoder...
High signal Matched: serving, generation, model
vLLM Project · open-source · 2025-12-15
Jan 28th Update: NVIDIA just released their Nemotron 3 Nano model in NVFP4 precision. This model is supported by vLLM out of the box and it uses a new method called Quantization-Aware Distillation...
High signal Matched: model, quantization, agents
vLLM Project · open-source · 2025-12-13
Efficiently managing request distribution across a fleet of model replicas is a critical requirement for large-scale, production vLLM deployments. Standard load balancers often fall short as they...
High signal Matched: serving, prefill, router, performance, model
vLLM Project · open-source · 2025-12-13
Speculative decoding serves as an optimization to improve inference performance; however, training a unique draft model for each LLM can be difficult and time-consuming, while production-ready...
High signal Matched: inference, decoding, speculative decoding, draft model, performance, model, training
Hugging Face · open-source · 2025-12-12
No feed summary available yet.
High signal Matched: model
Hugging Face · open-source · 2025-12-05
No feed summary available yet.
High signal Matched: introducing
Together AI · inference-infra · 2025-12-03
AutoJudge accelerates LLM inference by identifying which token mismatches actually matter. Using self-supervised learning to train a lightweight classifier, it accepts up to 40 draft tokens per cycle—delivering 1.5–2× speedups over standar...
High signal Matched: inference, decoding, speculative decoding, introducing
vLLM Project · open-source · 2025-12-03
Several months ago, we published a blog post about CUDA Core Dump: An Effective Tool to Debug Memory Access Issues and Beyond, introducing a powerful technique for debugging illegal memory access...
High signal Matched: cuda, gpu, introducing
Hugging Face · open-source · 2025-12-01
No feed summary available yet.
High signal Matched: model
vLLM Project · open-source · 2025-11-30
We are excited to announce the official release of vLLM-Omni, a major extension of the vLLM ecosystem designed to support the next generation of AI: omni-modality models.
High signal Matched: serving, generation, release, model
Google Research · big-tech · 2025-11-22
Algorithms & Theory
High signal Matched: model
vLLM Project · open-source · 2025-11-22
Ray now has a new command: ray symmetric-run. This command makes it possible to launch the same entrypoint command on every node in a Ray cluster, simplifying the workflow to spawn vLLM servers...
High signal Matched: serving, multi-node, launch
Hugging Face · open-source · 2025-11-20
No feed summary available yet.
High signal Matched: introducing, api
Modal · inference-infra · 2025-11-19
Learn how Reducto used GPU memory snapshotting and flexible autoscaling to build fast multi-model pipelines.
High signal Matched: latency, gpu, model
AIBrix · open-source · 2025-11-10
🚀 AIBrix v0.5.0 Release: Today, we’re excited to announce AIBrix v0.5.0, a release that pushes AIBrix closer to a batteries-included control plane for modern LLM workloads. This release introduces an OpenAI-compatible Batch API for hi...
High signal Matched: prefill, latency, release, evaluation, api, openai-compatible
Google Research · big-tech · 2025-11-08
Algorithms & Theory
High signal Matched: introducing
Modular · inference-infra · 2025-11-07
"TTS 1 Max" (powered by Modular Platform) Ranked #1 Speech Model on Artificial Analysis
High signal Matched: model
SqueezeBits · korea · 2025-10-31
Explore how the Yetter Inference Engine overcomes the limitations of step caching and model distillation for diffusion models. We analyze latency, diversity, quality, and negative-prompt handling to reveal what truly matters for scalable,...
High signal Matched: inference, generation, latency, model
Hugging Face · open-source · 2025-10-23
No feed summary available yet.
High signal Matched: introducing, agent
Together AI · inference-infra · 2025-10-21
Together AI adds 40+ image & video models, including Sora 2 and Veo 3, to build end-to-end multimodal apps with unified OpenAI-compatible APIs and transparent pricing.
High signal Matched: generation, model, openai-compatible
Google Research · big-tech · 2025-10-02
Human-Computer Interaction and Visualization
High signal Matched: introducing
Hugging Face · open-source · 2025-10-01
No feed summary available yet.
High signal Matched: introducing, evaluation, retrieval
Replicate · inference-infra · 2025-09-23
Here is the ultimate comparison post on all the latest image editing models.
High signal Matched: model
Replicate · inference-infra · 2025-09-17
Find the best models and collections with a single API call.
High signal Matched: introducing, api
Together AI · inference-infra · 2025-09-15
Our new Batch Inference API makes large-scale AI workloads simpler, faster, and cheaper. With a streamlined UI, universal model support, and 3000× higher rate limits—now up to 30B tokens—you can process massive datasets at half the cost of...
High signal Matched: inference, cost, model, api
Hugging Face · open-source · 2025-09-12
No feed summary available yet.
High signal Matched: introducing
Modal · inference-infra · 2025-09-09
A collaborative environment for high-performance interactive computing on GPUs.
High signal Matched: performance, introducing
Hugging Face · open-source · 2025-09-04
No feed summary available yet.
High signal Matched: model
BAIR · research · 2025-09-01
What exactly does word2vec learn, and how? Answering this question amounts to understanding representation learning in a minimal yet interesting language modeling task. Despite the fact that word2vec is a well-known precursor to modern lan...
High signal Matched: benchmark, performance, model, weights, paper, training
Together AI · inference-infra · 2025-08-27
Access DeepSeek-V3.1 on Together AI: MIT-licensed hybrid model with thinking/non-thinking modes, 66% SWE-bench Verified, serverless deployment, 99.9% SLA.
High signal Matched: deepseek-v3, model, swe-bench
SqueezeBits · korea · 2025-08-20
Efficient AI Study & Meetup recap: SqueezeBits' community study on AI model compression, featuring paper reviews, participant interviews, and networking from the offline meetup.
High signal Matched: model, paper
Together AI · inference-infra · 2025-08-15
Parsed fine-tuned a 27B open-source model to beat Claude Sonnet 4 by 60% on a real-world healthcare task—while running 10–100x cheaper.
High signal Matched: model, fine-tuning, open-source
Hugging Face · open-source · 2025-08-08
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2025-08-07
No feed summary available yet.
High signal Matched: model
AIBrix · open-source · 2025-08-05
AIBrix is a composable, cloud‑native LLM inference infrastructure designed to deliver high performance and low cost at scale. We now present a major update in a new release - v0.4.0. This release tackles key bottlenecks in orchestration an...
High signal Matched: inference, prefill, generation, token generation, throughput, performance, cost, gpu, release, cloud
Modular · inference-infra · 2025-08-05
Modular Platform 25.5: Introducing Large Scale Batch Inference
High signal Matched: inference, introducing
Together AI · inference-infra · 2025-08-05
Access OpenAI’s gpt-oss-120B on Together AI: Apache-2.0 open-weight model with serverless & dedicated endpoints, $0.50/1M in, $1.50/1M out, 99.9% SLA.
High signal Matched: model, oss
Hugging Face · open-source · 2025-08-05
No feed summary available yet.
High signal Matched: model, open-source, oss
SqueezeBits · korea · 2025-08-04
Trimming large multilingual vocabularies in Small Language Models (SLMs) is a simple, low-risk way to boost efficiency. It significantly accelerates model inference while keeping accuracy almost unchanged.
High signal Matched: inference, model
SkyPilot · open-source · 2025-07-30
There are a lot of discussions happening in AI infrastructure right now. On one side, we have researchers who trained on Slurm in grad school, comfortable with sbatch train_model.sh and the predictability of academic HPC clusters. On the o...
High signal Matched: model, cloud
Hugging Face · open-source · 2025-07-29
No feed summary available yet.
High signal Matched: introducing
Together AI · inference-infra · 2025-07-28
Together Evaluations is a flexible framework for benchmarking LLMs using strong open-source models as judges. Skip manual labeling and rigid metrics—get fast, customizable insights into model quality for your specific tasks.
High signal Matched: benchmark, model, open-source
Together AI · inference-infra · 2025-07-25
Unlock agentic coding with Qwen3-Coder on Together AI: 256K context, SWE-bench rivaling Claude Sonnet 4, zero-setup instant deployment.
High signal Matched: model, swe-bench, agentic
SkyPilot · open-source · 2025-07-24
Announcing SkyPilot 0.10 - the largest release yet with enterprise-grade features.
High signal Matched: release
Hugging Face · open-source · 2025-07-23
No feed summary available yet.
High signal Matched: model
Modal · inference-infra · 2025-07-16
Engineers of language model applications should think about requests, not tokens.
High signal Matched: model
Together AI · inference-infra · 2025-07-14
Run Kimi K2 (1T params) on Together AI—frontier open model for agentic reasoning and coding, serverless deployment, 99.9% SLA, lower cost and instant scaling.
High signal Matched: cost, model, open model, agentic, open-source
Modal · inference-infra · 2025-07-11
Welcome to another round of Modal Product Updates! Here's what's new this month.
High signal Matched: multi-node, b200, release, training
Nota AI · korea · 2025-07-10
Marcel Simon, Ph.D., ML Researcher, Nota AI GmbH; Tae-Ho Kim, CTO & Co-Founder, Nota AI; Seul-Ki Yeom, Ph.D., Research Lead, Nota AI GmbH. Summary: Proposes a simple next-frame prediction task using unlabeled video to enhance sing...
High signal Matched: inference, performance, model, paper, research, training, fine-tuning, benchmarks
Replicate · inference-infra · 2025-07-07
It's hard keeping up with every new video model. In this post we'll help you pick the best one for your needs.
High signal Matched: model
BAIR · research · 2025-07-01
No feed summary available yet.
High signal Matched: inference, generation, performance, model, paper, arxiv, evaluation, training, evaluate, agent, agents
Together AI · inference-infra · 2025-06-11
No feed summary available yet.
High signal Matched: cost, introducing, api
Hugging Face · open-source · 2025-06-11
No feed summary available yet.
High signal Matched: introducing, training
SqueezeBits · korea · 2025-06-10
SqueezeBits at Japan IT Week Spring 2025 in Tokyo: AI model compression demos, OwLite and Fits on Chips introductions, Japan market entry experiences, and team stories from the frontline.
High signal Matched: model
Modular · inference-infra · 2025-06-10
Introducing Mammoth: Enterprise-Scale GenAI Deployments Made Simple
High signal Matched: introducing
Modal · inference-infra · 2025-06-09
We've released v1.0 of the Modal client, marking a new milestone of maturity and stability for our platform.
High signal Matched: introducing
Together AI · inference-infra · 2025-06-05
No feed summary available yet.
High signal Matched: model
Hugging Face · open-source · 2025-06-03
No feed summary available yet.
High signal Matched: model
Modal · inference-infra · 2025-05-30
We’re excited to be making Nvidia B200 and H200 GPUs available on Modal starting today!
High signal Matched: h200, b200, introducing
Modal · inference-infra · 2025-05-22
Modal Batch is a new interface backed by a new durable queue system built specifically to make job processing easy, scalable, and fault-tolerant.
High signal Matched: introducing
Replicate · inference-infra · 2025-05-22
Google's flagship image generation model, Imagen 4, is now available for you to try on Replicate. Create images with fine detail, versatile styles, and improved typography.
High signal Matched: generation, model
AIBrix · open-source · 2025-05-22
AIBrix is a composable, cloud-native AI infrastructure toolkit designed to power scalable and cost-effective large language model (LLM) inference. As production demands for memory-efficient and latency-aware LLM services continue to grow,...
High signal Matched: inference, prefix cache, latency, cost, release, model, cloud
llm-d · open-source · 2025-05-20
Introducing llm-d: Kubernetes-native distributed LLM inference with KV-cache routing, disaggregated serving, and SOTA performance per dollar. Built on vLLM.
High signal Matched: inference, serving, distributed, performance, introducing, sota
SqueezeBits · korea · 2025-05-20
This article describes experimental results from quantizing the Vision Transformer model and its variants with OwLite.
High signal Matched: model, quantized
llm-d · open-source · 2025-05-20
Red Hat launches llm-d: Open source distributed AI inference platform backed by NVIDIA, Google Cloud, IBM. Scale generative AI with intelligent routing on Kubernetes.
High signal Matched: inference, distributed, release, cloud, open source
Together AI · inference-infra · 2025-05-20
No feed summary available yet.
High signal Matched: introducing, sota
Hugging Face · open-source · 2025-05-15
No feed summary available yet.
High signal Matched: model
Hugging Face · open-source · 2025-05-14
No feed summary available yet.
High signal Matched: model
Nota AI · korea · 2025-05-08
Jaewoo Song, Software Engineer, Nota AI. Summary: This study proposes an AI model preprocessing method that improves quantization accuracy on edge AI devices that do not support advanced quantization methods due to their limitat...
High signal Matched: performance, model, weights, research, quantization, int8, int4
Nota AI · korea · 2025-05-07
Jewon Lee | Ki-Ung Song | Seungmin Yang | Donguk Lim | Jaeyeon Kim | Wooksu Shin | Bo-Kyeong Kim | Tae-Ho Kim, EdgeFM Team, Nota AI; Yong Jae Lee, Ph.D., Associate Professor, UW-Madison. Summary: Our method, Trimmed-Llama, reduces t...
High signal Matched: inference, generation, kv cache, benchmark, performance, latency, model, weights, research, training, benchmarks, open-source
Hugging Face · open-source · 2025-04-29
No feed summary available yet.
High signal Matched: introducing, quantization
Hugging Face · open-source · 2025-04-16
No feed summary available yet.
High signal Matched: introducing, evaluating, long-context
BAIR · research · 2025-04-11
Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated applications. However, as LLMs have improved, so have the attacks against them. Prompt injection attack is listed as the #1 threat by OWASP to LLM-integrated ap...
High signal Matched: cost, model, evaluation, training, dpo, fine-tuning, retrieval, api, sota
SqueezeBits · korea · 2025-04-11
Discover how OwLite simplifies AI model optimization with seamless integration and secure architecture.
High signal Matched: performance, model, quantization
BAIR · research · 2025-04-08
PLAID is a multimodal generative model that simultaneously generates protein 1D sequence and 3D structure, by learning the latent space of protein folding models. The awarding of the 2024 Nobel Prize to AlphaFold2 marks an important moment...
High signal Matched: inference, generation, cost, model, weights, research, training, retrieval
Nota AI · korea · 2025-04-08
Seul-Ki Yeom, Ph.D., Research Lead, Nota AI GmbH; Tae-Ho Kim, CTO & Co-Founder, Nota AI. Summary: Delivers real-time AI performance on edge devices such as smartphones, IoT devices, and embedded systems. Introduces a novel "Reus...
High signal Matched: inference, kernel, benchmark, performance, cost, introducing, model, paper, research, benchmarks
SkyPilot · open-source · 2025-04-08
Techniques to speed up checkpointing by 9.6x and how to easily achieve them in SkyPilot
High signal Matched: performance, model, cloud, checkpointing
Hugging Face · open-source · 2025-04-08
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2025-03-24
No feed summary available yet.
High signal Matched: introducing
SkyPilot · open-source · 2025-03-11
Transforming SkyPilot into a scalable, multi-user platform.
High signal Matched: introducing
AIBrix · open-source · 2025-03-10
This blog post shows how to deploy DeepSeek-R1 using AIBrix. DeepSeek-R1 demonstrates remarkable proficiency in reasoning tasks through a step-by-step training process. It features 671B total parameters with 37B active parameters, and 128k...
High signal Matched: inference, distributed, benchmark, model, weights, training, context length
Replicate · inference-infra · 2025-03-05
Wan2.1 is the most capable open-source video generation model, producing coherent and high-quality outputs. Learn how to run it in the cloud with a single line of code.
High signal Matched: generation, model, cloud, api, open-source
Hugging Face · open-source · 2025-02-27
No feed summary available yet.
High signal Matched: model
Nota AI · korea · 2025-02-25
Hancheol Park, Ph.D., AI Research Engineer, Nota AI; Geonmin Kim, Ph.D., AI Research Engineer, Nota AI; Jaeyeon Kim, AI Research Engineer, Nota AI. Summary: In this study, we propose a method for determining whether given multilingual...
High signal Matched: generation, performance, model, paper, research, training, fine-tuning
AIBrix · open-source · 2025-02-21
Open-source large language models (LLMs) such as LLaMA, DeepSeek, Qwen, and Mistral have surged in popularity, offering enterprises greater flexibility, cost savings, and control over their AI deployments. These models have empowered organ...
High signal Matched: inference, generation, latency, cost, introducing, model, agents, open-source
AIBrix · open-source · 2025-02-19
We’re excited to announce the v0.2.0 release of AIBrix! Building on feedback from v0.1.0 production adoption and user interest, this release introduces several new features to enhance performance and usability. Extend the vLLM Prefix...
High signal Matched: inference, serving, prefill, throughput, distributed, multi-node, kv cache, prefix cache, performance, cost, gpu, accelerator, release, agent
Hugging Face · open-source · 2025-02-18
No feed summary available yet.
High signal Matched: inference, introducing
Modular · inference-infra · 2025-02-18
MAX 25.1 - Introducing MAX Builds
High signal Matched: introducing
Modal · inference-infra · 2025-01-28
Serializing container state to disk for aggressive cold start optimization.
High signal Matched: checkpoint
Hugging Face · open-source · 2025-01-23
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2025-01-22
No feed summary available yet.
High signal Matched: model
Hugging Face · open-source · 2025-01-16
No feed summary available yet.
High signal Matched: inference, generation, introducing
Hugging Face · open-source · 2025-01-16
No feed summary available yet.
High signal Matched: model
SqueezeBits · korea · 2025-01-13
In this blog series, we thoroughly evaluate Intel's AI accelerator, the Gaudi series, focusing on its performance, features, and usability.
High signal Matched: performance, accelerator, fp8, quantization, evaluate
Hugging Face · open-source · 2024-12-31
No feed summary available yet.
High signal Matched: introducing, agents
Hugging Face · open-source · 2024-12-23
No feed summary available yet.
High signal Matched: generation, model
Modal · inference-infra · 2024-12-19
NVIDIA L40S GPUs available on Modal now!
High signal Matched: introducing
Hugging Face · open-source · 2024-12-19
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2024-12-18
No feed summary available yet.
High signal Matched: inference, model
Modular · inference-infra · 2024-12-17
Introducing MAX 24.6: A GPU Native Generative AI Platform
High signal Matched: gpu, introducing
Hugging Face · open-source · 2024-12-17
No feed summary available yet.
High signal Matched: performance, model
Hugging Face · open-source · 2024-12-16
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2024-11-26
No feed summary available yet.
High signal Matched: model
Modal · inference-infra · 2024-11-24
Announcing Modal's newest cloud partnership.
High signal Matched: release, cloud
Hugging Face · open-source · 2024-11-20
No feed summary available yet.
High signal Matched: introducing, leaderboard
AIBrix · open-source · 2024-11-13
In recent years, large language models (LLMs) have revolutionized AI applications, powering solutions in areas like chatbots, automated content generation, and advanced recommendation engines. Services like OpenAI’s have gained significant...
High signal Matched: decoding, prefill, generation, kv cache, performance, cost, gpu, release, introducing, cloud, open-source
Hugging Face · open-source · 2024-10-29
No feed summary available yet.
High signal Matched: decoding, generation, model
Hugging Face · open-source · 2024-10-23
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2024-10-23
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2024-10-22
No feed summary available yet.
High signal Matched: model
Replicate · inference-infra · 2024-10-22
We've partnered with Ideogram to bring their inpainting model to Replicate's API.
High signal Matched: model, api
Hugging Face · open-source · 2024-10-10
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2024-10-04
No feed summary available yet.
High signal Matched: introducing, leaderboard
Replicate · inference-infra · 2024-10-03
Black Forest Labs continue to push boundaries with their latest release of FLUX.1 image generation model.
High signal Matched: generation, release, model
Hugging Face · open-source · 2024-09-17
No feed summary available yet.
High signal Matched: introducing
Modal · inference-infra · 2024-09-16
Learn how we used our new dynamic batching feature to improve throughput and reduce inference costs for the Whisper model with a single line of code!
High signal Matched: inference, throughput, model
Hugging Face · open-source · 2024-09-16
No feed summary available yet.
High signal Matched: introducing
SkyPilot · open-source · 2024-09-16
With last week’s Pixtral release, multimodal large language models (LLMs) like OpenAI’s GPT-4o, Google’s Gemini Pro, and Pixtral are making significant strides. These models are not only able to generate text from images...
High signal Matched: release
Hugging Face · open-source · 2024-08-12
No feed summary available yet.
High signal Matched: model
Hugging Face · open-source · 2024-08-06
No feed summary available yet.
High signal Matched: introducing
Nota AI · korea · 2024-08-02
Jaeyeon Kim, Research Engineer, Nota AI; Geonmin Kim, Research Engineer, Nota AI; Hancheol Park, Team Lead of NetsPresso Application, Nota AI. Introduction: Recent large language models (LLMs) have demonstrated unprecedented performance...
High signal Matched: decoding, benchmark, performance, latency, tokens/sec, model, arxiv, research, technical report, evaluation, cloud, training, lora, benchmarks, leaderboard, open-source
Replicate · inference-infra · 2024-07-23
Llama 3.1 405B is the most powerful open-source language model from Meta. Learn how to run it in the cloud with one line of code.
High signal Matched: model, cloud, api, open-source
Modular · inference-infra · 2024-07-09
Bring your own PyTorch model
High signal Matched: model
Hugging Face · open-source · 2024-07-03
No feed summary available yet.
High signal Matched: model
SqueezeBits · korea · 2024-06-26
Estimating the cost savings from model compression.
High signal Matched: cost, model
Hugging Face · open-source · 2024-06-25
No feed summary available yet.
High signal Matched: model
Replicate · inference-infra · 2024-06-14
Create your own custom version of Stability's latest image generation model and run it on Replicate via the web or API.
High signal Matched: generation, model, api
Nota AI · korea · 2024-06-13
Jeongho Kim, Research Engineer, Nota AI. Summary: Online multi-camera system for efficient individual tracking; accurate ID management with Cluster Self-Refinement (CSR); improved performance with enhanced pose estimation. Intro...
High signal Matched: performance, model, paper, research, evaluation, leaderboard
Replicate · inference-infra · 2024-06-12
Stable Diffusion 3 is the latest text-to-image model from Stability, with improved image quality, typography, prompt understanding, and resource efficiency. Learn how to run it in the cloud with one line of code.
High signal Matched: model, cloud, api
Hugging Face · open-source · 2024-06-07
No feed summary available yet.
High signal Matched: introducing, sagemaker
Modular · inference-infra · 2024-06-07
MAX 24.4 - Introducing quantization APIs and MAX on macOS
High signal Matched: introducing, quantization
Hugging Face · open-source · 2024-06-05
No feed summary available yet.
High signal Matched: introducing
Modular · inference-infra · 2024-05-29
What ownership is really about: a mental model approach
High signal Matched: model
Hugging Face · open-source · 2024-05-24
No feed summary available yet.
High signal Matched: model
Modal · inference-infra · 2024-05-21
How we fine-tuned a Stable Diffusion model on the Heroicons library to generate all the icons we could dream of.
High signal Matched: model, fine-tuning
Hugging Face · open-source · 2024-05-21
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2024-05-14
No feed summary available yet.
High signal Matched: model
Hugging Face · open-source · 2024-05-14
No feed summary available yet.
High signal Matched: introducing, leaderboard
Hugging Face · open-source · 2024-05-13
No feed summary available yet.
High signal Matched: introducing, agents
Modal · inference-infra · 2024-05-13
You can now specify which cloud region you would like to run your Functions in.
High signal Matched: introducing, cloud
Hugging Face · open-source · 2024-05-05
No feed summary available yet.
High signal Matched: introducing, leaderboard
Modular · inference-infra · 2024-05-02
MAX 24.3 - Introducing MAX Engine Extensibility
High signal Matched: introducing
SqueezeBits · korea · 2024-04-24
Clarifying the misunderstandings in AI model compression
High signal Matched: model
Hugging Face · open-source · 2024-04-23
No feed summary available yet.
High signal Matched: introducing, leaderboard
Replicate · inference-infra · 2024-04-23
Arctic is a new open-source language model from Snowflake. Learn how to run it in the cloud with one line of code.
High signal Matched: model, cloud, api, open-source
SqueezeBits · korea · 2024-04-19
Do I need to COMPRESS my AI model? : the short answer is “YES” — and here’s why.
High signal Matched: model
Replicate · inference-infra · 2024-04-18
Llama 3 is the latest language model from Meta. Learn how to run it in the cloud with one line of code.
High signal Matched: model, cloud, api
Hugging Face · open-source · 2024-04-16
No feed summary available yet.
High signal Matched: introducing, evaluation, leaderboard
SqueezeBits · korea · 2024-04-15
AI model compression for acceleration is essential. The question is HOW? Here are 4 key methodologies.
High signal Matched: model
Hugging Face · open-source · 2024-04-15
No feed summary available yet.
High signal Matched: introducing, model
Hugging Face · open-source · 2024-04-10
No feed summary available yet.
High signal Matched: model
Hugging Face · open-source · 2024-04-09
No feed summary available yet.
High signal Matched: release
Hugging Face · open-source · 2024-03-21
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2024-03-20
No feed summary available yet.
High signal Matched: model, training
Hugging Face · open-source · 2024-03-05
No feed summary available yet.
High signal Matched: introducing, model
Modal · inference-infra · 2024-02-27
Modal now supports WebSocket connections, enabling real-time, bidirectional data transfer between client and server.
High signal Matched: introducing
Hugging Face · open-source · 2024-02-23
No feed summary available yet.
High signal Matched: introducing, leaderboard
Modal · inference-infra · 2024-02-21
Find out how Suno uses Modal to scale inference and batch pre-processing to thousands of GPUs.
High signal Matched: inference, launch
SkyPilot · open-source · 2024-02-20
SkyServe: A simple, cost-efficient, multi-region/cloud library for serving GenAI models.
High signal Matched: serving, cost, introducing, cloud
Hugging Face · open-source · 2024-02-20
No feed summary available yet.
High signal Matched: introducing, evaluation, korean, leaderboard
Modal · inference-infra · 2024-02-06
We’re excited to be making Nvidia H100 GPUs available on Modal starting today!
High signal Matched: h100, introducing
Hugging Face · open-source · 2024-01-31
No feed summary available yet.
High signal Matched: introducing, leaderboard
SkyPilot · open-source · 2023-12-21
A tutorial for serving Mixtral 8x7B model with SkyPilot and SkyServe.
High signal Matched: serving, mixtral, cost, gpu, model
Replicate · inference-infra · 2023-11-10
An interactive example showing how to embed text using a state-of-the-art embedding model that beats OpenAI's embeddings API on price and performance.
High signal Matched: performance, model, api, open-source
Hugging Face · open-source · 2023-11-07
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2023-11-03
No feed summary available yet.
High signal Matched: introducing
Replicate · inference-infra · 2023-10-25
How to run a latent consistency model on your M1 or M2 Mac
High signal Matched: model
Replicate · inference-infra · 2023-10-17
In this post we'll explore the basics of retrieval augmented generation by creating an example app that uses bge-large-en for embeddings, ChromaDB for vector store, and mistral-7b-instruct for language model generation.
High signal Matched: generation, model, retrieval augmented generation, retrieval
Modal · inference-infra · 2023-10-10
Modal Labs Announces Series A Financing Round, Securing $16 Million Investment to Launch Cloud-Based Infrastructure Platform, Build Towards End-to-End Enterprise Data Stack
High signal Matched: release, launch, cloud
Replicate · inference-infra · 2023-10-06
Mistral 7B is an open-source large language model. Learn what it's good at and how to run it in the cloud with one line of code.
High signal Matched: model, cloud, api, open-source
Hugging Face · open-source · 2023-09-13
No feed summary available yet.
High signal Matched: generation, introducing
Hugging Face · open-source · 2023-08-22
No feed summary available yet.
High signal Matched: introducing, model
Hugging Face · open-source · 2023-08-22
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2023-08-01
No feed summary available yet.
High signal Matched: weights
Replicate · inference-infra · 2023-07-27
Llama 2 is the first open source language model of the same caliber as OpenAI’s models. Learn how to run it in the cloud with one line of code.
High signal Matched: model, cloud, api, open source
Hugging Face · open-source · 2023-07-24
No feed summary available yet.
High signal Matched: introducing, agents
Replicate · inference-infra · 2023-07-19
A roundup of recent developments from the llamaverse following the second major release of Meta's open-source large language model.
High signal Matched: release, model, open-source
Hugging Face · open-source · 2023-05-31
No feed summary available yet.
High signal Matched: inference, introducing, sagemaker
Hugging Face · open-source · 2023-05-31
No feed summary available yet.
High signal Matched: introducing
Replicate · inference-infra · 2023-05-26
Prompt engineering and training are often the first solutions we reach for to improve language model behavior, but they're not the only way.
High signal Matched: model, training
Hugging Face · open-source · 2023-05-24
No feed summary available yet.
High signal Matched: launch, model
Hugging Face · open-source · 2023-05-15
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2023-04-27
No feed summary available yet.
High signal Matched: model, training
Hugging Face · open-source · 2023-04-24
No feed summary available yet.
High signal Matched: introducing
Replicate · inference-infra · 2023-04-21
A roundup of recent developments from the world of open-source language models.
High signal Matched: model, open-source
Replicate · inference-infra · 2023-03-23
No feed summary available yet.
High signal Matched: model, lora
Hugging Face · open-source · 2023-02-07
No feed summary available yet.
High signal Matched: introducing, agents
Replicate · inference-infra · 2023-02-07
It's like DreamBooth, but much faster. And you can run it in the cloud on Replicate.
High signal Matched: introducing, cloud, lora
Hugging Face · open-source · 2022-12-20
No feed summary available yet.
High signal Matched: model
Replicate · inference-infra · 2022-11-21
With just a handful of images and a single API call, you can train a model, publish it to Replicate, and run predictions on it in the cloud.
High signal Matched: model, cloud, api
SkyPilot · open-source · 2022-11-16
Introducing SkyPilot.
High signal Matched: cost, introducing, cloud
Hugging Face · open-source · 2022-11-08
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2022-10-24
No feed summary available yet.
High signal Matched: model, evaluate, evaluating
Hugging Face · open-source · 2022-10-07
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2022-09-07
No feed summary available yet.
High signal Matched: model
Hugging Face · open-source · 2022-08-12
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2022-08-03
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2022-07-28
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2022-07-16
No feed summary available yet.
High signal Matched: model
Hugging Face · open-source · 2022-07-12
No feed summary available yet.
High signal Matched: introducing, model
Replicate · inference-infra · 2022-07-05
Inspired by model cards, we've created templates for documenting models on Replicate.
High signal Matched: model
Hugging Face · open-source · 2022-06-28
No feed summary available yet.
High signal Matched: model, training
Hugging Face · open-source · 2022-06-07
No feed summary available yet.
High signal Matched: model
Hugging Face · open-source · 2022-05-26
No feed summary available yet.
High signal Matched: launch
Hugging Face · open-source · 2022-05-25
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2022-05-02
No feed summary available yet.
High signal Matched: model, training
Hugging Face · open-source · 2022-04-25
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2022-04-12
No feed summary available yet.
High signal Matched: model, training
Hugging Face · open-source · 2022-03-28
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2022-03-17
No feed summary available yet.
High signal Matched: model
Hugging Face · open-source · 2022-03-02
No feed summary available yet.
High signal Matched: model, state of the art
Hugging Face · open-source · 2021-12-15
No feed summary available yet.
High signal Matched: model
Hugging Face · open-source · 2021-12-02
No feed summary available yet.
High signal Matched: introducing, agents
Hugging Face · open-source · 2021-11-29
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2021-11-04
No feed summary available yet.
High signal Matched: inference, model
Hugging Face · open-source · 2021-10-25
No feed summary available yet.
High signal Matched: model, training
Hugging Face · open-source · 2021-09-14
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2021-04-16
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2020-11-09
No feed summary available yet.
High signal Matched: model
Hugging Face · open-source · 2020-02-14
No feed summary available yet.
High signal Matched: model
Microsoft Research · big-tech · 2026-05-09
Microsoft Research is excited to release an open dataset of approximate transmission topology of the U.S. power grid derived from publicly available data. The ability to study transmission-level power grid behavior is essential for modern...
Watchlist Matched: release, research
AI2 · research · 2026-05-05
MolmoAct 2 is a fully open robotics foundation model that brings faster, stronger 3D action reasoning to real-world robot tasks, alongside a major new bimanual manipulation dataset for researchers to study, reproduce, and build on.
Watchlist Matched: model
Lambda · cloud · 2026-05-04
Consider two teams provisioning 8,192 GPUs for a large training run. Same model, same dataset, same budget. Team A lands on a facility purpose-built for AI with sufficient power density, carefully engineered liquid cooling, a high-performa...
Watchlist Matched: performance, model, training
AI2 · research · 2026-04-30
AstaBench’s latest update adds new frontier-model results, including GPT-5.5, and highlights growing adoption from groups including the UK AISI, General Reasoning, Elicit, SciSpace, Distyl AI, and EvoScientist.
Watchlist Matched: model, frontier-model
AI2 · research · 2026-04-20
BAR is a recipe for post-training language models one capability at a time—train domain experts independently, merge them into a single mixture-of-experts model, and upgrade any expert without impacting the others.
Watchlist Matched: model, training, post-training
Replicate · inference-infra · 2026-04-15
If you have never tried a video model before, now is the time.
Watchlist Matched: model
AI2 · research · 2026-03-24
Introducing MolmoWeb, an open visual web agent that navigates and completes tasks in a browser using screenshots alone, along with MolmoWebMix, the largest public dataset for training web agents.
Watchlist Matched: introducing, training, agent, agents
AI2 · research · 2026-03-11
MolmoBot is an open robotic manipulation model suite trained entirely in simulation—demonstrating zero-shot transfer to real-world robots without any real-world data collection or fine-tuning.
Watchlist Matched: model, training, fine-tuning
AI2 · research · 2026-03-11
Introducing MolmoBot and MolmoSpaces, an open foundation for training real-world robots to advance science.
Watchlist Matched: introducing, training
AI2 · research · 2026-02-13
Olmix is a framework for language model data mixing that provides empirically grounded defaults and efficient reuse techniques.
Watchlist Matched: model
Replicate · inference-infra · 2025-11-26
Isaac 0.1 is a lightweight, grounded vision-language model built for real-world perception.
Watchlist Matched: model
BAIR · research · 2025-11-01
In this post, I’ll introduce a reinforcement learning (RL) algorithm based on an “alternative” paradigm: divide and conquer. Unlike traditional methods, this algorithm is not based on temporal difference (TD) learning (which has scalabilit...
Watchlist Matched: benchmark, performance, model, paper, training
LY Corporation Tech Blog · korea · 2025-10-20
At LY Corporation we're constantly working to improve our pre-release test process and reduce the ri...
Watchlist Matched: release
Replicate · inference-infra · 2025-07-31
Wan 2.2 is our fastest, cheapest video model.
Watchlist Matched: model, open source
Replicate · inference-infra · 2025-06-06
We're sharing our experiments and tips on Google's new Veo 3 model.
Watchlist Matched: model
Replicate · inference-infra · 2025-05-29
This is how to get the most from Black Forest Labs' new image editing model.
Watchlist Matched: model
BAIR · research · 2025-03-25
We deployed 100 reinforcement learning (RL)-controlled cars into rush-hour highway traffic to smooth congestion and reduce fuel consumption for everyone. Our goal is to tackle "stop-and...
Watchlist Matched: throughput, kernel, performance, model, paper, training, agent, agents
Replicate · inference-infra · 2025-03-05
We've been playing with Alibaba's WAN2.1 text-to-video model lately. What happens when you tweak those mysterious parameters? Let's find out.
Watchlist Matched: model
Replicate · inference-infra · 2024-10-22
Stability AI's latest text-to-image model is now available on Replicate and you can run it with an API.
Watchlist Matched: model, api
Replicate · inference-infra · 2024-08-30
Create your own fine-tuned Flux model to generate new images of yourself.
Watchlist Matched: model
Replicate · inference-infra · 2024-08-02
Open source frontier image model, cut objects from videos, new Python web framework from Jeremy Howard
Watchlist Matched: model, open source
Replicate · inference-infra · 2024-08-01
FLUX.1 is a new text-to-image model from Black Forest Labs, the creators of Stable Diffusion, that exceeds the capabilities of previous open-source models.
Watchlist Matched: model, api, open-source
Replicate · inference-infra · 2024-07-26
A top-tier open-ish language model, new safety classifiers, model search API
Watchlist Matched: model, api
Replicate · inference-infra · 2024-06-28
Google's Gemma2 models, language model leaderboard, tips for Stable Diffusion 3
Watchlist Matched: model, leaderboard
Replicate · inference-infra · 2024-06-21
Really good coding model, AI search breakthroughs, Discord support bot
Watchlist Matched: model
SqueezeBits · korea · 2024-05-27
SqueezeBits' IT exhibition recap: from AI model compression demos to hands-on OwLite experiences, booth visitor reactions, and more. Read our on-the-ground event story!
Watchlist Matched: model
Replicate · inference-infra · 2023-11-08
We’ve added chord conditioning to Meta’s MusicGen model, so you can create automatic backing tracks in any style using text prompts and chord progressions.
Watchlist Matched: model
Replicate · inference-infra · 2023-08-22
With the recent release of Stable Diffusion XL fine-tuning on Replicate, and today being the 1-year anniversary of Stable Diffusion, now feels like the perfect opportunity to take a step back and reflect on how text-to-image AI has improve...
Watchlist Matched: release, fine-tuning
Replicate · inference-infra · 2022-08-25
A tutorial for building a chat bot that replies to prompts with the output of a text-to-image model.
Watchlist Matched: model
Hugging Face · open-source · 2021-10-26
No feed summary available yet.
Watchlist Matched: launch