Moreh · korea · 2026-06-03
Optimizing Long-Context Prefill on Multiple (Older-Generation) GPU Nodes
No feed summary available yet.
High signal Matched: prefill, generation, gpu, long-context
Moreh · korea · 2026-06-03
No feed summary available yet.
High signal Matched: performance, mi300x, evaluation
Moreh · korea · 2026-06-03
No feed summary available yet.
High signal Matched: performance, mi300x, evaluation
NVIDIA Dynamo · open-source · 2026-06-03
No feed summary available yet.
High signal Matched: inference, agentic
Mooncake · open-source · 2026-06-03
No feed summary available yet.
High signal Matched: performance, benchmarks
Mooncake · open-source · 2026-06-03
No feed summary available yet.
High signal Matched: benchmark, performance
Mooncake · open-source · 2026-06-03
No feed summary available yet.
High signal Matched: hicache, benchmark
Gcore · cloud · 2026-06-03
No feed summary available yet.
High signal Matched: gpu, cloud, training
VESSL AI · korea · 2026-06-03
No feed summary available yet.
High signal Matched: gpu, agent
VESSL AI · korea · 2026-06-03
No feed summary available yet.
High signal Matched: gpu, cloud
VESSL AI · korea · 2026-06-03
No feed summary available yet.
High signal Matched: inference, gpu
VESSL AI · korea · 2026-06-03
No feed summary available yet.
High signal Matched: gb200, cloud
VESSL AI · korea · 2026-06-03
No feed summary available yet.
High signal Matched: gpu, cloud
VESSL AI · korea · 2026-06-03
No feed summary available yet.
High signal Matched: gpu, cloud
VESSL AI · korea · 2026-06-03
No feed summary available yet.
High signal Matched: gpu, introducing
VESSL AI · korea · 2026-06-03
No feed summary available yet.
High signal Matched: launch, cloud
Moreh · korea · 2026-06-03
No feed summary available yet.
High signal Matched: inference, distributed
Moreh · korea · 2026-06-03
No feed summary available yet.
High signal Matched: inference, mi300x
NVIDIA Dynamo · open-source · 2026-06-03
No feed summary available yet.
High signal Matched: release
NVIDIA Dynamo · open-source · 2026-06-03
No feed summary available yet.
High signal Matched: agentic
NVIDIA Dynamo · open-source · 2026-06-03
No feed summary available yet.
High signal Matched: serving
NVIDIA Dynamo · open-source · 2026-06-03
No feed summary available yet.
High signal Matched: kv cache
NVIDIA Dynamo · open-source · 2026-06-03
No feed summary available yet.
High signal Matched: kv cache
Mooncake · open-source · 2026-06-03
No feed summary available yet.
High signal Matched: serving
Mooncake · open-source · 2026-06-03
No feed summary available yet.
High signal Matched: hicache
Mooncake · open-source · 2026-06-03
No feed summary available yet.
High signal Matched: lmcache
Mooncake · open-source · 2026-06-03
No feed summary available yet.
High signal Matched: serving
Mooncake · open-source · 2026-06-03
No feed summary available yet.
High signal Matched: performance
Mooncake · open-source · 2026-06-03
No feed summary available yet.
High signal Matched: benchmark
Mooncake · open-source · 2026-06-03
No feed summary available yet.
High signal Matched: performance
Mooncake · open-source · 2026-06-03
No feed summary available yet.
High signal Matched: performance
Gcore · cloud · 2026-06-03
No feed summary available yet.
High signal Matched: inference, training
Perplexity Research · model-lab · 2026-06-03
No feed summary available yet.
High signal Matched: generation
Perplexity Research · model-lab · 2026-06-03
No feed summary available yet.
High signal Matched: performance
Perplexity Research · model-lab · 2026-06-03
No feed summary available yet.
High signal Matched: model
Perplexity Research · model-lab · 2026-06-03
No feed summary available yet.
High signal Matched: blackwell
Perplexity Research · model-lab · 2026-06-03
No feed summary available yet.
High signal Matched: inference
Perplexity Research · model-lab · 2026-06-03
No feed summary available yet.
High signal Matched: research
VESSL AI · korea · 2026-06-03
No feed summary available yet.
High signal Matched: cloud
VESSL AI · korea · 2026-06-03
No feed summary available yet.
High signal Matched: cloud
VESSL AI · korea · 2026-06-03
No feed summary available yet.
High signal Matched: gpu
VESSL AI · korea · 2026-06-03
No feed summary available yet.
High signal Matched: gpu
VESSL AI · korea · 2026-06-03
No feed summary available yet.
High signal Matched: cloud
VESSL AI · korea · 2026-06-03
No feed summary available yet.
High signal Matched: cloud
VESSL AI · korea · 2026-06-03
No feed summary available yet.
High signal Matched: cloud
VESSL AI · korea · 2026-06-03
No feed summary available yet.
High signal Matched: introducing
KubeAI · open-source · 2026-06-03
No feed summary available yet.
High signal Matched: model
KubeAI · open-source · 2026-06-03
No feed summary available yet.
High signal Matched: generation
Gcore · cloud · 2026-06-03
No feed summary available yet.
High signal Matched: inference
Gcore · cloud · 2026-06-03
No feed summary available yet.
High signal Matched: cloud
Gcore · cloud · 2026-06-03
No feed summary available yet.
High signal Matched: performance
Gcore · cloud · 2026-06-03
No feed summary available yet.
High signal Matched: cloud
BentoML · inference-infra · 2026-06-03
No feed summary available yet.
High signal Matched: inference, serve, performance, model
TensorRT-LLM · open-source · 2026-06-03
No feed summary available yet.
High signal Matched: generation, distributed
TensorRT-LLM · open-source · 2026-06-03
No feed summary available yet.
High signal Matched: decoding, speculative decoding
BentoML · inference-infra · 2026-06-03
No feed summary available yet.
High signal Matched: inference, serve
Prime Intellect · inference-infra · 2026-06-03
No feed summary available yet.
High signal Matched: research
Runpod · cloud · 2026-06-03
No feed summary available yet.
High signal Matched: multi-node, gpu
TensorRT-LLM · open-source · 2026-06-03
No feed summary available yet.
High signal Matched: decoding
OpenAI · model-lab · 2026-06-03
No feed summary available yet.
High signal Matched: research
OpenAI · model-lab · 2026-06-03
No feed summary available yet.
High signal Matched: model, frontier model
OpenAI · model-lab · 2026-06-03
No feed summary available yet.
High signal Matched: cost
OpenAI · model-lab · 2026-06-03
No feed summary available yet.
High signal Matched: model
OpenAI · model-lab · 2026-06-03
No feed summary available yet.
High signal Matched: model
BentoML · inference-infra · 2026-06-03
No feed summary available yet.
High signal Matched: inference
BentoML · inference-infra · 2026-06-03
No feed summary available yet.
High signal Matched: performance
Nebius · cloud · 2026-06-03
No feed summary available yet.
High signal Matched: performance, cloud, training
Nebius · cloud · 2026-06-03
No feed summary available yet.
High signal Matched: decoding, speculative decoding, training
Crusoe · cloud · 2026-06-03
No feed summary available yet.
High signal Matched: inference, model
Vast.ai · cloud · 2026-06-03
No feed summary available yet.
High signal Matched: gpu, cloud
Runpod · cloud · 2026-06-03
No feed summary available yet.
High signal Matched: latency
Runpod · cloud · 2026-06-03
No feed summary available yet.
High signal Matched: cloud
CoreWeave · cloud · 2026-06-03
No feed summary available yet.
High signal Matched: gpu
CoreWeave · cloud · 2026-06-03
No feed summary available yet.
High signal Matched: blackwell
Nebius · cloud · 2026-06-03
No feed summary available yet.
High signal Matched: cloud
Nebius · cloud · 2026-06-03
No feed summary available yet.
High signal Matched: cloud
Nebius · cloud · 2026-06-03
No feed summary available yet.
High signal Matched: cloud
Nebius · cloud · 2026-06-03
No feed summary available yet.
High signal Matched: kernel
Nebius · cloud · 2026-06-03
No feed summary available yet.
High signal Matched: cloud, agent
Nebius · cloud · 2026-06-03
No feed summary available yet.
High signal Matched: cloud
Crusoe · cloud · 2026-06-03
No feed summary available yet.
High signal Matched: cloud
Crusoe · cloud · 2026-06-03
No feed summary available yet.
High signal Matched: inference
Crusoe · cloud · 2026-06-03
No feed summary available yet.
High signal Matched: gb200
Crusoe · cloud · 2026-06-03
No feed summary available yet.
High signal Matched: b200
Crusoe · cloud · 2026-06-03
No feed summary available yet.
High signal Matched: h200
Crusoe · cloud · 2026-06-03
No feed summary available yet.
High signal Matched: h100
Crusoe · cloud · 2026-06-03
No feed summary available yet.
High signal Matched: mi300x
Crusoe · cloud · 2026-06-03
No feed summary available yet.
High signal Matched: model, training
Crusoe · cloud · 2026-06-03
No feed summary available yet.
High signal Matched: cloud
FriendliAI · inference-infra · 2026-06-03
No feed summary available yet.
High signal Matched: inference, kv cache, gpu
LightSeek Foundation · research · 2026-06-03
No feed summary available yet.
High signal Matched: inference, decoding, speculative decoding, model, training
FriendliAI · inference-infra · 2026-06-03
No feed summary available yet.
High signal Matched: inference, performance
FuriosaAI · hardware · 2026-06-03
No feed summary available yet.
High signal Matched: inference, generation, agentic
FuriosaAI · hardware · 2026-06-03
No feed summary available yet.
High signal Matched: throughput, furiosa, sdk
Fireworks AI · inference-infra · 2026-06-03
No feed summary available yet.
High signal Matched: inference, moe
Baseten · inference-infra · 2026-06-03
No feed summary available yet.
High signal Matched: performance, model
LightSeek Foundation · research · 2026-06-03
No feed summary available yet.
High signal Matched: decoding, speculative decoding, eagle, training
LightSeek Foundation · research · 2026-06-03
No feed summary available yet.
High signal Matched: inference, kernel, performance, agentic
FriendliAI · inference-infra · 2026-06-03
No feed summary available yet.
High signal Matched: inference
FriendliAI · inference-infra · 2026-06-03
No feed summary available yet.
High signal Matched: cost
FriendliAI · inference-infra · 2026-06-03
No feed summary available yet.
High signal Matched: model, agentic
FriendliAI · inference-infra · 2026-06-03
No feed summary available yet.
High signal Matched: model
FuriosaAI · hardware · 2026-06-03
No feed summary available yet.
High signal Matched: furiosa, sdk
Fireworks AI · inference-infra · 2026-06-03
No feed summary available yet.
High signal Matched: model, training, frontier model
Fireworks AI · inference-infra · 2026-06-03
No feed summary available yet.
High signal Matched: inference, model, open model
Baseten · inference-infra · 2026-06-03
No feed summary available yet.
High signal Matched: generation
Stanford CRFM · research · 2026-06-03
No feed summary available yet.
High signal Matched: model, evaluation
Moonshot AI Kimi · model-lab · 2026-06-03
No feed summary available yet.
High signal Matched: release
Mistral AI · model-lab · 2026-06-03
No feed summary available yet.
High signal Matched: inference, training
Mistral AI · model-lab · 2026-06-03
No feed summary available yet.
High signal Matched: model
Anthropic · model-lab · 2026-06-03
No feed summary available yet.
High signal Matched: research
Anthropic · model-lab · 2026-06-03
No feed summary available yet.
High signal Matched: introducing, tool use
Cerebras · hardware · 2026-06-03
No feed summary available yet.
High signal Matched: inference
Cohere · model-lab · 2026-06-03
No feed summary available yet.
High signal Matched: performance, agentic
Cohere · model-lab · 2026-06-03
No feed summary available yet.
High signal Matched: model
Cohere · model-lab · 2026-06-03
No feed summary available yet.
High signal Matched: research
Cohere · model-lab · 2026-06-03
No feed summary available yet.
High signal Matched: model
NAVER D2 · korea · 2026-06-03
No feed summary available yet.
High signal Matched: naver
NAVER D2 · korea · 2026-06-03
No feed summary available yet.
High signal Matched: naver
NAVER D2 · korea · 2026-06-03
No feed summary available yet.
High signal Matched: naver
Kakao Tech · korea · 2026-06-03
No feed summary available yet.
High signal Matched: kakao
Kakao Tech · korea · 2026-06-03
No feed summary available yet.
High signal Matched: kakao
Upstage · korea · 2026-06-03
No feed summary available yet.
High signal Matched: upstage, agent
Upstage · korea · 2026-06-03
No feed summary available yet.
High signal Matched: model
LG AI Research · korea · 2026-06-03
No feed summary available yet.
High signal Matched: research
GMI Cloud · cloud · 2026-06-03
No feed summary available yet.
High signal Matched: model
Databricks AI · big-tech · 2026-06-03
No feed summary available yet.
High signal Matched: research
Databricks AI · big-tech · 2026-06-03
No feed summary available yet.
High signal Matched: cloud
Databricks AI · big-tech · 2026-06-03
No feed summary available yet.
High signal Matched: performance
LMCache · open-source · 2026-06-03
TL;DR: A key contributor to the LMCache community just secured a major investment. This will greatly accelerate our mission of building the best KV cache library for every developer. Come join us in building the future AI-native data layer...
High signal Matched: kv cache, lmcache
AWS Machine Learning Blog · cloud · 2026-06-03
Fine-tuning for domain-specific tasks means improving performance in one area without degrading the model’s general capabilities, and getting that balance right is harder than it looks. This post walks through how to navigate that balance,...
High signal Matched: performance, model, training, checkpointing, fine-tuning
AWS Machine Learning Blog · cloud · 2026-06-03
In this post, we'll walk through implementing object detection with Amazon Nova 2 Lite. You'll learn how to deploy an object detection application using Amazon Bedrock, AWS Lambda, and Amazon API Gateway. You'll also learn how to craft eff...
High signal Matched: bedrock, api
Lambda · cloud · 2026-06-03
Lambda workspaces help teams organize cloud resources, control access, and separate dev, staging, and production in shared GPU environments. A junior researcher kills a production training run. A contractor sees weights they shouldn't. If...
High signal Matched: gpu, introducing, weights, cloud, training
AWS Machine Learning Blog · cloud · 2026-06-03
This post walks through how Baz built their Spec Review agent using Amazon Bedrock and Amazon Bedrock AgentCore. We'll cover the architecture decisions, implementation details, and the business outcomes they achieved by leveraging these AW...
High signal Matched: bedrock, agent
NVIDIA Technical Blog · hardware · 2026-06-02
AI agents are a powerful tool for synthesizing data to accelerate research, summarize information, and help teams make decisions faster. But combining internal...
High signal Matched: research, agent, agents
Together AI · inference-infra · 2026-06-02
How Together served MiniMax-M3 efficiently with KV-block-major sparse attention, paged MSA decode, optimized index scoring, and a Rust-based multimodal gateway.
High signal Matched: inference, serving
AWS Machine Learning Blog · cloud · 2026-06-02
Today, we’re excited to announce the ability to reference a secret in AWS Secrets Manager for AgentCore Identity, so you can reference your own preconfigured secret from Secrets Manager and retain full control over how it is managed. With...
High signal Matched: bedrock
AWS Machine Learning Blog · cloud · 2026-06-02
In this post, we walk through how to use Amazon Quick Research to integrate biomedical data sources for rare cancer research. The walkthrough uses pediatric sarcoma as the research domain and draws on publicly available datasets from PubMe...
High signal Matched: research
AWS Machine Learning Blog · cloud · 2026-06-02
GPT-5.5, GPT-5.4, and Codex are now generally available on Amazon Bedrock. Deploy them in production applications and agents today, on Bedrock’s high performance inference engine.
High signal Matched: inference, performance, bedrock, agents
AWS Machine Learning Blog · cloud · 2026-06-02
While deploying Model Context Protocol (MCP) servers in production, enterprises need fine-grained access control across servers, observability into which teams use which tools, security guarantees against data exfiltration, and centralized...
High signal Matched: model, bedrock, mcp
AWS Machine Learning Blog · cloud · 2026-06-02
In this post, we use a lakehouse data agent to demonstrate how you can use Policy for deterministic access control and Lambda interceptors for dynamic validation. We then show how to combine Lambda interceptors and Policy to implement a ge...
High signal Matched: bedrock, agent, agents
AWS Machine Learning Blog · cloud · 2026-06-02
In this post, we address several key risks that surface when designing an agentic payment system, and how to address them with the capabilities of AgentCore payments.
High signal Matched: bedrock, agentic
AWS Machine Learning Blog · cloud · 2026-06-02
When you build agentic AI solutions, you face unique operational challenges. Agents make unpredictable decisions, costs spiral unexpectedly, and debugging non-deterministic failures seems impossible. Agentic AI applications don't just exec...
High signal Matched: bedrock, agents, agentic
AWS Machine Learning Blog · cloud · 2026-06-02
If you’re iterating on deploying large language models (LLMs) on AWS GPU instances, you’ve probably noticed the larger the model to be loaded into GPU High Bandwidth Memory (HBM), the longer the painful wait until the GPUs are ready for in...
High signal Matched: inference, gpu, model
Hugging Face · open-source · 2026-06-02
No feed summary available yet.
High signal Matched: introducing, model
vLLM Project · open-source · 2026-06-02
Long-horizon LLM agents create a routing problem that single-turn prompt routers were not designed to solve. A router still needs to know which model is best for the current request, but it also...
High signal Matched: router, model, agents, agentic
vLLM Project · open-source · 2026-06-02
We are excited to announce that AutoRound — Intel's state-of-the-art post-training quantization (PTQ) algorithm — is now fully integrated into vLLM-Omni, enabling a streamlined quantize-once,...
High signal Matched: inference, training, post-training, quantization
PyTorch Foundation · open-source · 2026-06-01
TL;DR: This case study demonstrates how LinkedIn re-architected its distributed linear programming solver, DuaLip, by developing a GPU-accelerated PyTorch version to handle extreme-scale optimization challenges like web applications. This...
High signal Matched: distributed, gpu
NVIDIA Technical Blog · hardware · 2026-06-01
The rise of autonomous, long-running AI agents has introduced a new class of compute demand, namely tasks that maintain large context windows, spawn concurrent...
High signal Matched: multi-node, agents
Lambda · cloud · 2026-06-01
When we design large GPU clusters, the network is no longer a background system. It's part of the compute envelope. At the 800G and NVIDIA GB300 NVL72 scale, the back-end fabric accounts for 86% of networking power in a three-layer cluster...
High signal Matched: generation, token generation, throughput, infiniband, gpu, model, retrieval, agentic
Hugging Face · open-source · 2026-06-01
No feed summary available yet.
High signal Matched: model
NVIDIA Technical Blog · hardware · 2026-06-01
Each wave of AI has created a new scaling law. Pretraining scaled intelligence through larger datasets, more parameters, and massively parallel GPU systems....
High signal Matched: gpu, pretraining, agentic
AMD ROCm Blogs · hardware · 2026-06-01
Reinforcement learning (RL) is rapidly becoming a foundational technology for Large Language Models (LLMs)—powering key abilities such as reasoning and agentic behaviors. As RL workloads grow more complex and computationally intensive, the...
High signal Matched: performance, gpu, agentic
AMD ROCm Blogs · hardware · 2026-06-01
This blog, like the previous articles in the profiling guide series (Part 1, Part 2, and Part 3), is designed to help you systematically analyze and improve the performance of your Fortran OpenMP offload applications running on AMD GPUs. T...
High signal Matched: performance
vLLM Project · open-source · 2026-06-01
A technical deep dive on running vLLM on NVIDIA DGX Spark and GB10 systems, covering sm_121 architecture, unified memory behavior, NVFP4 model serving, Nemotron-3-Super configuration, Docker deployment, Prometheus metrics, and local evalua...
High signal Matched: serving, model, evaluation
AWS Machine Learning Blog · cloud · 2026-05-30
This post demonstrates a comprehensive observability solution using Amazon Managed Grafana dashboards that provides a holistic view of both quality and quantity for LLMs served on Amazon SageMaker AI endpoints with inference components.
High signal Matched: inference, gpu, sagemaker
NVIDIA Technical Blog · hardware · 2026-05-29
Modern LLM serving is hard to tune because each deployment is a stack of interacting choices: model backend, tensor-parallel shape, prefill/decode split, worker...
High signal Matched: serving, prefill, model
NVIDIA Technical Blog · hardware · 2026-05-29
As AI models grow in complexity and regulatory scrutiny intensifies under frameworks including California’s AB-2013 and the EU AI Act, software teams...
High signal Matched: model
Nota AI · korea · 2026-05-29
Jaehoon Lee, Technical Content Manager, Nota AI. When enterprises adopt AI, the most common bottleneck is not model development. It is the deployment stage: getting a finished model to run reliably on the actual target device. T...
High signal Matched: inference, throughput, benchmark, performance, latency, cost, gpu, model, evaluation, quantization, int8, benchmarks, leaderboard
Together AI · inference-infra · 2026-05-29
Together AI built the fastest speech-to-text stack on Artificial Analysis by treating ASR as a full-path systems problem, not just a GPU inference problem.
High signal Matched: inference, gpu
AWS Machine Learning Blog · cloud · 2026-05-29
Azercell Telecom LLC, Azerbaijan's leading telecommunications provider, wanted to build an Azerbaijani large language model (LLM) on Amazon SageMaker AI for telecom use cases and a customer-facing chatbot. The challenge: adapting foundatio...
High signal Matched: model, sagemaker, training
Google Research · big-tech · 2026-05-29
General Science
High signal Matched: research
AWS Machine Learning Blog · cloud · 2026-05-29
In this post, you learn how to build a custom portal with embedded SageMaker AI MLflow Apps UI. You walk through the architecture pattern behind a React front end paired with a Flask reverse proxy that handles AWS Signature Version 4 (SigV...
High signal Matched: cloud, sagemaker
AWS Machine Learning Blog · cloud · 2026-05-29
In this post, we demonstrate how to build a secure Flask-based MLflow proxy service that provides HTTPS access to Amazon SageMaker MLflow without requiring the MLflow SDK. This solution is for organizations undergoing cloud transformation...
High signal Matched: cloud, sagemaker, api, sdk
AWS Machine Learning Blog · cloud · 2026-05-29
This post combines learnings from LangChain’s work on evaluating deep agents and Anthropic’s guide to demystifying evals for AI agents into a practical guide. In this post, you will learn how to: 1) apply five evaluation patterns for deep...
High signal Matched: evaluation, bedrock, evals, evaluating, agent, agents
AWS Machine Learning Blog · cloud · 2026-05-29
Datasets in AgentCore is in public preview. Agent evaluation is most powerful when you combine fast-moving online signals with stable offline baselines. To understand whether your agent is truly improving over time, you need a fixed benchm...
High signal Matched: benchmark, evaluation, bedrock, agent
AWS Machine Learning Blog · cloud · 2026-05-29
This post covers Opus 4.8's improvements and practical guidance for AI engineers integrating the model into agentic systems and production inference workloads on Amazon Bedrock.
High signal Matched: inference, model, bedrock, agentic
NVIDIA Technical Blog · hardware · 2026-05-29
AI applications are moving beyond text generation to multimodal systems that can perceive, search, and reason across images, documents, video, and...
High signal Matched: generation
AMD ROCm Blogs · hardware · 2026-05-29
Speculative speculative decoding (SSD) [1] is a recently proposed speculative decoding (SD) algorithm that further accelerates large language model (LLM) inference beyond conventional SD. In standard SD, a small draft model proposes severa...
High signal Matched: inference, decoding, speculative decoding, draft model, verification, cost, mi300x, model
AMD ROCm Blogs · hardware · 2026-05-29
Quantum computing offers a fundamentally different approach to computational problems by leveraging quantum mechanical properties such as superposition and entanglement. Unlike a classical bit, which is always 0 or 1, a qubit can exist in...
High signal Matched: benchmark, cost, gpu
PyTorch Foundation · open-source · 2026-05-28
When you use PyTorch’s compiler, your model runs faster, up to 10x faster. But what’s actually happening? Without compilation, the GPU runs a kernel, a function on the GPU, for...
High signal Matched: kernel, gpu, model
PyTorch Foundation · open-source · 2026-05-28
TL;DR: The TokenSpeed inference engine achieved a record-breaking 580 tps running the Qwen3.5-397B-A17B model on GPUs. This extreme performance for agentic workloads is driven by systematic elimination of memory copies,...
High signal Matched: inference, performance, gpu, model, agentic
vLLM Project · open-source · 2026-05-28
The v0.5.0 release brings significant architectural improvements to speculative decoding model training, introducing DFlash algorithm support, fully unified online training capabilities, and a...
High signal Matched: decoding, speculative decoding, release, introducing, model, training
vLLM Project · open-source · 2026-05-28
Most routing systems start with a prompt and choose a model endpoint. vLLM Semantic Router (VSR) makes a different bet: before a request reaches the serving model, the system should extract...
High signal Matched: serving, endpoint, router, model
vLLM Project · open-source · 2026-05-28
As organizations increasingly adopt AI-powered development tools, the need for high-performance agentic models that deliver both accuracy and operational efficiency has become critical. Laguna...
High signal Matched: inference, performance, agentic
vLLM Project · open-source · 2026-05-28
As post-training workloads continue to scale, we've seen widespread adoption of vLLM as the inference engine of choice. However, two issues repeatedly arise:
High signal Matched: inference, training, post-training
NVIDIA Technical Blog · hardware · 2026-05-27
The cold-start problem: In production inference deployments, demand fluctuates over time, requiring inference replicas to scale elastically. However,...
High signal Matched: inference
NVIDIA Technical Blog · hardware · 2026-05-27
Large language models (LLMs) are revolutionizing the financial trading landscape by enabling sophisticated analysis of vast amounts of unstructured data to...
High signal Matched: inference, blackwell
NVIDIA Technical Blog · hardware · 2026-05-27
NVIDIA RTX provides game developers with direct paths to AI-driven characters, frame generation, and ray-traced rendering. This post walks through a meaningful...
High signal Matched: generation
PyTorch Foundation · open-source · 2026-05-27
The PyTorch Foundation, a community-driven hub for open source AI under the Linux Foundation, is announcing today that Alibaba Cloud has joined as a Platinum member. Alibaba Cloud is a...
High signal Matched: cloud, open source
LMCache · open-source · 2026-05-27
A collaboration story about LMCache multiprocess mode + MooncakeStore — From 0 to 1, from functional to optimized. 1. Before We Begin Recently, the LMCache community and the Mooncake community carried out a series of valuable open-source c...
High signal Matched: lmcache, adapter, open-source, open source
AMD ROCm Blogs · hardware · 2026-05-27
Our previous two posts in this GEMM optimization series covered Matrix Core instructions and 8-wave ping-pong FP8 GEMM design. Here we discuss another algorithm design introduced by HipKittens - 4-wave interleave, which further improves th...
High signal Matched: gemm, performance, fp8
Modal · inference-infra · 2026-05-27
Introducing Role-Based Access Control for humans and agents, now available for all users on Teams and Enterprise plans.
High signal Matched: introducing, agents
PyTorch Foundation · open-source · 2026-05-26
Code available at: https://github.com/facebookresearch/ads_model_kernel_library In this post, we present the design of TLX Block Attention — a Triton kernel targeting NVIDIA Blackwell GPUs that exploits compile-time knowledge of a block-di...
High signal Matched: kernel, triton, blackwell, model
NVIDIA Technical Blog · hardware · 2026-05-26
NVIDIA CompileIQ tackles one of the hardest problems in performance engineering: finding the compiler options that unlock the best performance for a specific...
High signal Matched: kernel, performance
NVIDIA Technical Blog · hardware · 2026-05-26
Developers can now use NVIDIA CUDA Tile programming within large existing C++ GPU codebases to develop highly optimized GPU kernels using tile-based...
High signal Matched: cuda, performance, gpu
NVIDIA Technical Blog · hardware · 2026-05-26
NVIDIA CUDA 13.3 brings new capabilities and performance optimizations to developers across the CUDA ecosystem. The launch of NVIDIA CUDA Tile programming in...
High signal Matched: cuda, performance, gpu, launch
NVIDIA Technical Blog · hardware · 2026-05-26
Precision medicine depends on two fundamental capabilities: understanding disease at the genomic level and identifying treatments at the molecular level. ...
High signal Matched: blackwell
vLLM Project · open-source · 2026-05-26
The EAGLE series — including EAGLE 1, EAGLE 2, and EAGLE 3 — has become one of the most widely adopted and practically deployed families of speculative decoding algorithms across both research and...
High signal Matched: decoding, speculative decoding, eagle, research
AMD ROCm Blogs · hardware · 2026-05-25
Local large language model (LLM) inference has rapidly evolved, but a persistent limitation remains: model size is constrained by available GPU memory. Discrete GPUs typically offer 8–24 GB of dedicated VRAM, which can limit the size of mo...
High signal Matched: inference, multi-gpu, gpu, model, checkpoint, cloud, quantization, evaluate
Lambda · cloud · 2026-05-22
After 15 months of incremental updates, leaks, and rumored leaks, DeepSeek released version 4. It arrived without the fanfare R1 and R1-preview commanded in early 2025. That quiet reception is the most interesting thing about the release....
High signal Matched: inference, serving, performance, cost, release, model, open-source
SkyPilot · open-source · 2026-05-22
Online reinforcement learning for LLMs breaks Slurm's batch scheduling model. We'll discuss why, and what can be done about it.
High signal Matched: model
AMD ROCm Blogs · hardware · 2026-05-22
Triton Inference Server is an open-source platform designed to streamline AI inferencing. It supports the deployment, scaling, and inference of trained models from multiple frameworks, including ONNX Runtime, TensorFlow, PyTorch, and other...
High signal Matched: inference, inferencing, serving, triton, benchmark, model, cloud, open-source
AMD ROCm Blogs · hardware · 2026-05-22
On a single MI355, our most-optimized FP16 GEMM kernel runs at 99% MFMA efficiency — the matrix engine sits idle for a handful of cycles per loop. Getting there took ten versions, a regression along the way, and a profiler open for the who...
High signal Matched: kernel, gemm, performance
Lambda · cloud · 2026-05-21
The unit of AI compute has shifted from single hosts to rack-scale systems that integrate NVIDIA GPUs, CPUs, scale-up networking fabrics, and liquid cooling, such as the NVIDIA GB300 NVL72 and NVIDIA Vera Rubin NVL72. Teams at the frontier...
High signal Matched: serving, performance, cloud, training, api
NVIDIA Technical Blog · hardware · 2026-05-21
Maximizing the value of AI infrastructure demands deep visibility into GPU utilization. Yet many platform teams running AI workloads on Kubernetes operate with...
High signal Matched: gpu
NVIDIA Technical Blog · hardware · 2026-05-21
As AI models grow in scale and complexity, realizing the full performance of modern accelerated infrastructure depends as much on how workloads are placed as on...
High signal Matched: performance, gb200
NVIDIA Technical Blog · hardware · 2026-05-21
Telcos around the world are building sovereign AI factories based on the NVIDIA Cloud Partner (NCP) reference architecture, giving governments, enterprises, and...
High signal Matched: cloud
Modular · inference-infra · 2026-05-21
Why LLM Inference Needs a New Kind of Router - Part 2
High signal Matched: inference, router
LMCache · open-source · 2026-05-21
A new system stack is quietly taking shape around LLM serving. What makes it interesting is not just how quickly it is evolving, but how familiar the shape of that evolution looks if you’ve spent time studying large-scale systems like the...
High signal Matched: serving, lmcache, api
Modal · inference-infra · 2026-05-21
We've raised $355M at a $4.65B valuation to continue building the production cloud for AI.
High signal Matched: cloud
NVIDIA Technical Blog · hardware · 2026-05-20
Agent harnesses like Claude Code, Codex, and LangChain Deep Agents are excellent orchestrators. They manage sessions, chain tools, execute code, and respond to...
High signal Matched: research, agent, agents
Lambda · cloud · 2026-05-20
What the numbers mean for financial services. Executive summary: Lambda is the first to publish an audited STAC-AI™ LANG6 result on NVIDIA HGX B200, with independently verified performance data that Financial Services Industry (FSI) infrastr...
High signal Matched: inference, generation, performance, gpu, h200, b200, model, evaluating
Google Research · big-tech · 2026-05-20
General Science
High signal Matched: research
AMD ROCm Blogs · hardware · 2026-05-20
AMD released ROCm Core 7.13, the AMD GPU Driver 31.30, and AMD GPU Virtualization 9.0. With these releases, ROCm software expands hardware support across enterprise datacenters. The platform introduces AMD’s latest Instinct accelerators, e...
High signal Matched: performance, gpu, rocm, open-source
AMD ROCm Blogs · hardware · 2026-05-20
Large Language Models (LLMs) typically contain billions — or even tens of billions — of parameters. During inference, tensor parallelism is commonly employed to distribute the workload across multiple GPUs. This approach demands frequent,...
High signal Matched: inference, latency, introducing, quantization
NVIDIA Technical Blog · hardware · 2026-05-19
Autonomous AI agents are becoming more capable. Open models, Model Context Protocol (MCP)-connected tools, and portable skills are also making agents easier to...
High signal Matched: model, agent, agents, mcp
NVIDIA Technical Blog · hardware · 2026-05-19
Evaluating an AI model and evaluating an AI agent are related—but they answer fundamentally different questions. A model benchmark tests the capability of a...
High signal Matched: benchmark, model, evaluation, evaluating, agent, agentic
Together AI · inference-infra · 2026-05-19
Real-world inference benchmarks for coding agents: 31% more TPS than TensorRT-LLM, 2× better TTFT at saturation, and 76% lower cost than Claude Opus 4.6.
High signal Matched: inference, ttft, cost, benchmarks, agents
Hugging Face · open-source · 2026-05-19
No feed summary available yet.
High signal Matched: introducing
PyTorch Foundation · open-source · 2026-05-19
TLDR: PyTorch 2.11 makes it possible to install CUDA-enabled PyTorch wheels on aarch64 Linux directly from PyPI, eliminating the need for custom package indexes and workarounds that previously complicated deployment...
High signal Matched: cuda
PyTorch Foundation · open-source · 2026-05-19
TL;DR: Introducing the ExecuTorch MLX Delegate The new MLX delegate enables optimized, GPU-accelerated inference for PyTorch models on Apple Silicon Macs, using Apple’s MLX framework. The delegate seamlessly integrates with...
High signal Matched: inference, gpu, introducing
Modal · inference-infra · 2026-05-19
No feed summary available yet.
High signal Matched: introducing, agents
Modular · inference-infra · 2026-05-18
Hippocratic AI partners with Modular to power flexible, high-quality inference for real-time patient conversations
High signal Matched: inference
vLLM Project · open-source · 2026-05-18
TL;DR: In collaboration with Novita AI, PegaFlow integrates with vLLM as an external KV cache service for LLM inference, implemented as a standalone Rust process and connected through the external...
High signal Matched: inference, kv cache
Microsoft Research · big-tech · 2026-05-16
Our recent paper, “LLMs Corrupt Your Documents When You Delegate”, has generated discussion about the reliability of AI systems in delegated workflows. We appreciate the interest in this work and want to clarify several important points ab...
High signal Matched: paper, research, evaluation
Together AI · inference-infra · 2026-05-15
Together AI partners with Pearl Research Labs to launch a discounted Pearl-powered inference endpoint for Gemma-4-31B-it-pearl, using Proof of Useful Work to turn AI workloads into crypto emissions.
High signal Matched: inference, endpoint, cost, launch, research
NVIDIA Technical Blog · hardware · 2026-05-14
Agentic inference has fundamentally changed the runtime dynamics of inference workloads by introducing non-deterministic trajectories—actions, observations,...
High signal Matched: inference, introducing, agentic
PyTorch Foundation · open-source · 2026-05-14
We are excited to announce the release of PyTorch® 2.12 (release notes)! The PyTorch 2.12 release features the following changes: Batched linalg.eigh on CUDA is up to 100x faster due...
High signal Matched: cuda, release
Microsoft Research · big-tech · 2026-05-14
mimalloc is an open-source, modern, scalable memory allocator that is a drop-in replacement for malloc and free. It is relatively small (~12K lines), with clear internal data structures, and is easy to build and integrate into other projec...
High signal Matched: performance, research, open-source
Microsoft Research · big-tech · 2026-05-14
Introducing GridSFM, a small foundation model that can predict AC optimal power flow in milliseconds, boosting efficiency and unlocking cost savings. Learn how GridSFM gives grid operators direct visibility into congestion, stability, and...
High signal Matched: cost, introducing, model, research
vLLM Project · open-source · 2026-05-14
Expert parallelism (EP) is a key technique for serving Mixture-of-Experts (MoE) models at high throughput. WideEP deployments (where EP spans many workers) maximize KV cache capacity, enabling...
High signal Matched: serving, throughput, kv cache, moe
vLLM Project · open-source · 2026-05-14
We are excited to announce the pre-release of VeRL-Omni, a general reinforcement learning (RL) post-training framework focused on multimodal generative models, built on top of verl and vllm-omni.
High signal Matched: release, training, post-training
LMCache · open-source · 2026-05-13
A practitioner’s guide to KV-cache tiering on ROCm — what works, what doesn’t, and the regime where it actually matters. Key Summary We benchmarked multi-turn agentic workloads using 739 anonymized Claude Code conversation trac...
High signal Matched: lmcache, moe, mi300x, rocm, fp8, agentic
AI2 · research · 2026-05-13
AIMIP is a new open benchmark and dataset for evaluating AI climate models, showing they can match or beat conventional models on some historical climate metrics while still struggling to generalize reliably to long-term warming trends and...
High signal Matched: benchmark, introducing, model, evaluating
Microsoft Research · big-tech · 2026-05-12
MatterSim is expanding what AI can do for materials science—from faster large-scale simulations to MatterSim-MT, a new multi-task model for simulating properties beyond potential energy surfaces alone. The post Advancing AI for materials w...
High signal Matched: model, research
Cloudflare Blog · cloud · 2026-05-12
We investigated a bug where CUBIC's congestion window became pinned at its minimum floor, causing performance to plummet. The fix involved correctly measuring idle periods to distinguish RTT wait times from actual application idleness.
High signal Matched: kernel, performance
NVIDIA Technical Blog · hardware · 2026-05-12
The path from a trained AI model to production should be smooth, but rarely is. Many teams invest weeks fine-tuning models, only to discover that exporting to a...
High signal Matched: serving, model, fine-tuning
Modular · inference-infra · 2026-05-12
Inkwell: Why Your Inference Platform Matters As Much As Your Model
High signal Matched: inference, model
Together AI · inference-infra · 2026-05-12
Voice finder helps developers search, match, filter, and audition 600+ voices across Together AI TTS models using natural-language prompts or uploaded audio samples.
High signal Matched: introducing
Hugging Face · open-source · 2026-05-12
No feed summary available yet.
High signal Matched: inference, model, training
NVIDIA Technical Blog · hardware · 2026-05-11
The compute capability of large GPU fleets presents unprecedented opportunities to innovate and provide value to customers in record time. Yet these...
High signal Matched: gpu, introducing
Nota AI · korea · 2026-05-11
Jaehoon Lee, Technical Content Manager, Nota AI. NetsPresso® now embraces AI agents. An easy-to-use interface sits on top of the validated pipeline that handles everything from model compression to device deployment. When a user...
High signal Matched: inference, endpoint, kernel, verification, moe, benchmark, latency, cost, gpu, release, model, evaluation, quantization, quantized, int4, evaluate, benchmarks, swe-bench, mmlu, agent, agents, api
Together AI · inference-infra · 2026-05-11
DeepSeek-V4 makes million-token context a serving-systems problem. Together AI explores the inference work behind V4 on NVIDIA HGX B200, including compressed KV layouts, prefix caching, kernel maturity, and endpoint profiles for long-conte...
High signal Matched: inference, serving, endpoint, kernel, b200, long-context
vLLM Project · open-source · 2026-05-11
TurboQuant, a method for KV-cache quantization, recently gained significant traction in the community due to the large advertised savings in GPU memory from very low bit-width quantization of a...
High signal Matched: performance, gpu, quantization
BAIR · research · 2026-05-08
No feed summary available yet.
High signal Matched: inference, decoding, prefill, generation, serve, throughput, kv cache, verification, performance, latency, cost, model, paper, research, evaluation, training, pretraining, sft, benchmarks, long context, context window, agentic, reasoning model
NVIDIA Technical Blog · hardware · 2026-05-08
Bash is one of the most flexible and powerful interfaces exposed to AI agents. In the right system, a model that emits grep, curl, tar, or a shell pipeline is...
High signal Matched: decoding, generation, model, agents
Together AI · inference-infra · 2026-05-08
Learn how to deploy any Hugging Face model in one session using Goose and Together's Dedicated Container Inference. Skip the setup complexity — one prompt gets your model running in a production-grade GPU environment on release day.
High signal Matched: inference, gpu, release, model
Modular · inference-infra · 2026-05-08
Why LLM Inference Needs a New Kind of Router - Part 1
High signal Matched: inference, router
AI2 · research · 2026-05-08
EMO is a new mixture-of-experts model trained so modular expert groups emerge from data, enabling users to select small task-specific expert subsets while preserving near full-model performance.
High signal Matched: mixture of experts, performance, model, pretraining
NVIDIA Technical Blog · hardware · 2026-05-07
NVIDIA GB200 NVL72 introduces a fundamentally new way to build GPU clusters by extending NVIDIA NVLink coherence across an entire rack. This design enables...
High signal Matched: gpu, gb200
NVIDIA Technical Blog · hardware · 2026-05-07
Model quantization is an effective method to reduce VRAM usage and improve inference performance on consumer devices such as NVIDIA GeForce RTX GPUs. By...
High signal Matched: inference, performance, model, training, post-training, quantization
NVIDIA Technical Blog · hardware · 2026-05-07
Distributed deep learning depends on fast, reliable GPU-to-GPU communication using the NVIDIA Collective Communication Library (NCCL). When training slows down,...
High signal Matched: distributed, nccl, performance, gpu, training
vLLM Project · open-source · 2026-05-06
TL;DR: Agentic workloads generate massive shared prefixes that are often recomputed across turns. By integrating Mooncake's distributed KV cache store into vLLM, we achieve 3.8x higher throughput,...
High signal Matched: serving, throughput, distributed, kv cache, agentic
NVIDIA Technical Blog · hardware · 2026-05-05
The automotive cockpit is undergoing a fundamental shift from rule-based interfaces to agentic, multimodal AI systems capable of reasoning, planning, and...
High signal Matched: cloud, agents, agentic
LMCache · open-source · 2026-05-05
DeepSeek V4 is an open-weight model that offers state-of-the-art intelligence at a potentially much lower token price than its predecessor, DeepSeek V3.2. But how does DeepSeek V4 do that? Prerequisite: attention...
High signal Matched: kv cache, lmcache, model
Together AI · inference-infra · 2026-05-04
As AI moves from research to production, the challenge for AI-native teams shifts from building models to running them — efficiently, reliably, and at scale.
High signal Matched: inference, research
Modal · inference-infra · 2026-05-04
If we've said it once, we've said it once per millisecond: never block the GPU.
High signal Matched: inference, performance, gpu
Cloudflare Blog · cloud · 2026-05-01
Dynamic Workflows is a library that lets you route durable execution to tenant-provided code on the fly. Built on Dynamic Workers, it enables platforms to serve millions of unique workflows at near-zero idle cost.
High signal Matched: serve, cost, introducing
SkyPilot · open-source · 2026-05-01
We ran hundreds of benchmarks to tune storage systems for distributed training so you don’t have to.
High signal Matched: distributed, training, distributed training, benchmarks
NVIDIA Technical Blog · hardware · 2026-04-30
Neural network techniques are increasingly used in computer graphics to boost image quality, improve performance, and streamline content creation. Approaches...
High signal Matched: inference, performance
NVIDIA Technical Blog · hardware · 2026-04-30
Today, game developers can begin integrating NVIDIA DLSS 4.5 with Dynamic Multi Frame Generation, Multi Frame Generation 6X, and the second-generation...
High signal Matched: generation
NVIDIA Technical Blog · hardware · 2026-04-30
NVIDIA CUDA Tile (cuTile) is a tile-based programming model that enables developers to write GPU kernels in terms of tile-level operations—loads, stores, and...
High signal Matched: kernel, cuda, gpu, model, agents
Google Research · big-tech · 2026-04-30
Data Mining & Modeling
High signal Matched: research
Nota AI · korea · 2026-04-29
Hancheol Park, Ph.D., AI Research Engineer, NetsPresso Tech, Nota AI; Geonmin Kim, Ph.D., AI Research Engineer, NetsPresso Tech, Nota AI; Geonho Lee, Edge AI Engineer Intern, NetsPresso Tech, Nota AI; Jaehoon Lee, Technical Content Manager,...
High signal Matched: generation, moe, performance, model, weights, paper, research, evaluation, korea, korean, seoul, naver, training, fine-tuning, quantization, agent, agents, agentic
Together AI · inference-infra · 2026-04-29
DeepSeek-V4 Pro is now available on Together AI with 512K context, controllable reasoning modes, and cached-input pricing for long-context reasoning workloads like code agents, document intelligence, and research synthesis.
High signal Matched: research, long-context, agents
Hugging Face · open-source · 2026-04-29
No feed summary available yet.
High signal Matched: inference
LMCache · open-source · 2026-04-29
For years, we have referred to one of the most critical components of modern LLM inference as a “KV cache.” That name made sense once. Today, it is increasingly misleading. What began as a small, ephemeral optimization inside a...
High signal Matched: inference, kv cache, lmcache
Hugging Face · open-source · 2026-04-29
No feed summary available yet.
High signal Matched: introducing, long-context, agents
Modal · inference-infra · 2026-04-29
Learn how AE Studio used evolutionary algorithms on Modal to efficiently improve Lean proof generation.
High signal Matched: generation
NVIDIA Technical Blog · hardware · 2026-04-28
For decades, computational biology has operated under a reductionist compromise. To fit complex biological systems into the limited memory of a single GPU,...
High signal Matched: gpu
NVIDIA Technical Blog · hardware · 2026-04-28
Agentic systems often reason across screens, documents, audio, video, and text within a single perception‑to‑action loop. However, they still rely on...
High signal Matched: model, open model, agent, agentic
Together AI · inference-infra · 2026-04-28
NVIDIA Nemotron 3 Nano Omni is now on Together AI: a single open model that reasons across video, images, audio, and text, built for agentic workloads at scale.
High signal Matched: model, open model, agentic
vLLM Project · open-source · 2026-04-28
We are excited to support the newly released NVIDIA Nemotron 3 Nano Omni model on vLLM.
High signal Matched: model, agentic
NVIDIA Technical Blog · hardware · 2026-04-24
DeepSeek just launched its fourth generation of flagship models with DeepSeek-V4-Pro and DeepSeek-V4-Flash, both targeted at enabling highly efficient...
High signal Matched: generation, gpu, blackwell
NVIDIA Technical Blog · hardware · 2026-04-24
Federated learning (FL) is no longer a research curiosity—it’s a practical response to a hard constraint: the most valuable data is often the least movable....
High signal Matched: research
Together AI · inference-infra · 2026-04-24
Rollout is the silent bottleneck in RL post-training. DAS fixes it with adaptive speculative decoding — up to 50% faster, zero degradation in reward quality.
High signal Matched: decoding, speculative decoding, training, post-training
Sakana AI · model-lab · 2026-04-24
No feed summary available yet.
High signal Matched: model, agent
LMCache · open-source · 2026-04-23
Overview Large language model (LLM) inference performance depends heavily on how efficiently the system manages key-value (KV) cache — the stored attention states that allow the model to avoid recomputing previous tokens. As context length...
High signal Matched: inference, kv cache, lmcache, performance, latency, gpu, model, sagemaker
AI2 · research · 2026-04-23
OlmoEarth Studio now lets users export custom Earth-observation embeddings from our OlmoEarth foundation models and use them for tasks like similarity search, few-shot mapping, change detection, and unsupervised exploration.
High signal Matched: introducing
NVIDIA Technical Blog · hardware · 2026-04-22
AI integration is redefining mainstream enterprise applications, from productivity software like Microsoft Office to more complex design and engineering tools....
High signal Matched: blackwell
Nota AI · korea · 2026-04-22
Jaehoon Lee, Technical Content Manager, Nota AI. Series Notice: NetsPresso® Technical Blog, Part 2. In Part 1, we walked through a scenario of deploying Llama 3.2 1B on an edge device to illustrate the NetsPresso® workflow. The f...
High signal Matched: inference, kernel, cuda, matmul, benchmark, performance, latency, cost, npu, model, weights, paper, research, evaluation, furiosa, training, quantization, int8, int4, awq, gptq, sdk, open-source
vLLM Project · open-source · 2026-04-22
Long-context LLM serving is increasingly memory-bound: for standard full-attention decoders, the KV cache often dominates GPU memory at 128k+ contexts, and each decode step must read a large...
High signal Matched: serving, kv cache, gpu, fp8, quantization, long-context
SkyPilot · open-source · 2026-04-22
Introducing GPU Compass: One dashboard to browse, compare pricing, and launch across every GPU cloud.
High signal Matched: gpu, introducing, launch, cloud
llm-d · open-source · 2026-04-21
How migrating from a simple vLLM deployment to a robust MLOps platform utilizing KServe, llm-d's intelligent routing, and vLLM solved significant scaling and operational challenges in LLM deployment through deep customization and prefix-ca...
High signal Matched: inference, gpu
Together AI · inference-infra · 2026-04-21
Learn how AI-native companies design multi-tenant GPU clusters that pool capacity without sacrificing team isolation — and how Together AI makes it work in practice.
High signal Matched: gpu
vLLM Project · open-source · 2026-04-21
Hybrid architectures that interleave Mamba-style SSM layers with standard full-attention (FA) layers — such as NVIDIA Nemotron-H — are gaining traction as a way to combine the linear-time...
High signal Matched: serving
NVIDIA Technical Blog · hardware · 2026-04-20
As LLMs transition from simple text generation to complex reasoning, reinforcement learning (RL) plays a central role. Algorithms like Group Relative Policy...
High signal Matched: generation, throughput, fp8, training
BAIR · research · 2026-04-20
No feed summary available yet.
High signal Matched: performance, model, paper, arxiv, evaluation, training
NVIDIA Technical Blog · hardware · 2026-04-20
AI tools are significantly accelerating software development and changing how developers work with code. These tools serve as real-time copilots, automating...
High signal Matched: serve, agents, agentic
LMCache · open-source · 2026-04-18
GTC wrapped up a month ago. Our open-source KV cache management library, LMCache, was shown in Jensen Huang’s keynote and spotlighted by NVIDIA SVP Kevin Deierling, and I was invited to speak at the first-ever industry KV cache tutorial...
High signal Matched: kv cache, lmcache, open-source
NVIDIA Technical Blog · hardware · 2026-04-17
Coding agents are starting to write production code at scale. Stripe’s agents generate 1,300+ PRs per week. Ramp attributes 30% of merged PRs to agents....
High signal Matched: inference, agents, agentic
LMCache · open-source · 2026-04-16
TL;DR: TurboQuant allows you to put 4x more context in your GPU without blowing up GPU memory or dropping AI’s intelligence. It does so by quantizing the memory of large language models, also known as KV cache, an important bottleneck ment...
High signal Matched: inference, kv cache, lmcache, gpu
Together AI · inference-infra · 2026-04-15
Parcae is a stable looped language model that matches the quality of a Transformer twice its size — a 770M model reaching 1.3B-level performance. We introduce the first scaling laws for looping and show that increasing recurrence, not just...
High signal Matched: performance, model
SqueezeBits · korea · 2026-04-14
Check out highlights from the 2nd vLLM Korea Meetup: open-source use cases and real-world production examples that showcase vLLM's technical maturity!
High signal Matched: korea, open-source
NVIDIA Technical Blog · hardware · 2026-04-14
When you’re writing CUDA applications, one of the most important things you need to focus on to write great code is data transfer performance. This applies to...
High signal Matched: cuda, performance, gpu
NVIDIA Technical Blog · hardware · 2026-04-14
NVIDIA Ising is the world's first family of open AI models for building quantum processors, launching with two model domains: Ising Calibration and Ising...
High signal Matched: model
vLLM Project · open-source · 2026-04-14
Hosted by the vLLM KR Community, with support from Rebellions, SqueezeBits, Red Hat APAC, and PyTorch Korea, the vLLM Korea Meetup 2026 was held in Seoul on April 2nd.
High signal Matched: korea, seoul, rebellions
Modal · inference-infra · 2026-04-14
Autoresearch automates AI research. Modal automates AI infrastructure.
High signal Matched: research, agents
Rebellions · hardware · 2026-04-13
Hosted by the vLLM KR Community, together with Rebellions, SqueezeBits, Red Hat APAC, and PyTorch Korea, the vLLM Korea Meetup 2026 was held in Seoul on April 2.... The post 2026 vLLM Korea Meetup appeared first on Rebellions.
High signal Matched: korea, rebellions
Modular · inference-infra · 2026-04-13
TileTensor Part 1 - Safer, More Efficient GPU Kernels
High signal Matched: gpu
NVIDIA Technical Blog · hardware · 2026-04-12
The release of MiniMax M2.7 adds enhancements to the popular MiniMax M2.5 model, built for agentic harnesses,...
High signal Matched: release, model, agentic
SkyPilot · open-source · 2026-04-10
With the SkyPilot Agent Skill, your AI coding agent can launch clusters, run training jobs and manage cloud resources across any infrastructure using natural language.
High signal Matched: launch, cloud, training, agent, agents
NVIDIA Technical Blog · hardware · 2026-04-09
Slurm is an open source cluster management and job scheduling system for Linux. It manages job scheduling for over 65% of TOP500 systems. Most organizations...
High signal Matched: gpu, open source
NVIDIA Technical Blog · hardware · 2026-04-09
Training LLMs requires periodic checkpoints. These full snapshots of model weights, optimizer states, and gradients are saved to storage so training can resume...
High signal Matched: model, weights, checkpoint, training
Google Research · big-tech · 2026-04-09
Generative AI
High signal Matched: introducing, agents
SkyPilot · open-source · 2026-04-09
Coding agents working from code alone generate shallow hypotheses. Adding a research phase — arxiv papers, competing forks, other backends — produced 5 kernel fusions that made llama.cpp CPU inference 15% faster.
High signal Matched: inference, kernel, arxiv, research, agent, agents
Nota AI · korea · 2026-04-08
Jaehoon Lee, Technical Content Manager, Nota AI. AI Model Optimization: Why Models Won't Run on Hardware. The Chip Is Ready, but the Model Won't Deploy. If you have ever tried deploying an AI model onto your own chip, the following...
High signal Matched: inference, multi-gpu, kv cache, verification, performance, latency, gpu, model, research, evaluation, quantization, quantized, awq, gptq, evaluate
Modal · inference-infra · 2026-04-08
How Physical Intelligence runs remote, real-time, robotic inference on Modal.
High signal Matched: inference
NVIDIA Technical Blog · hardware · 2026-04-07
The NVIDIA GB200 NVL72 and NVIDIA GB300 NVL72 systems, featuring NVIDIA Blackwell architecture, are rack-scale supercomputers. They’re designed with 18...
High signal Matched: gb200, blackwell
Together AI · inference-infra · 2026-04-07
AI-native companies need infrastructure built for models, not legacy workloads. Learn what defines an AI Native Cloud and why it matters for the next platform shift.
High signal Matched: cloud
vLLM Project · open-source · 2026-04-07
TL;DR: Prefill and decode fight over the same GPUs, causing ITL spikes under load. We show how to disaggregate them on a single 8-GPU MI300X node using AMD's MORI-IO connector — achieving 2.5x...
High signal Matched: inference, prefill, itl, gpu, mi300x
AI2 · research · 2026-04-07
WildDet3D is an open model that predicts 3D bounding boxes from a single image. It generalizes across cameras and object categories, and folds in depth signals when available—alongside a new dataset of verified 3D annotations.
High signal Matched: introducing, model, open model
LMCache · open-source · 2026-04-04
Modern LLM serving workloads are defined by strict latency requirements, high concurrency, and rapidly growing context lengths. Applications such as multi-turn chat, AI agents, and retrieval-augmented generation continuously build on prior...
High signal Matched: inference, serving, decoding, generation, throughput, lmcache, moe, performance, latency, ttft, retrieval-augmented generation, retrieval, agents
Together AI · inference-infra · 2026-04-03
A four-model video suite for generation, continuation, reference-driven workflows, and editing, rolling out on Together AI starting with text-to-video.
High signal Matched: generation, model
Together AI · inference-infra · 2026-04-03
New research shows LLMs can optimize database query execution plans—achieving up to 4.78x speedups by correcting the cardinality estimation errors that statistical heuristics miss.
High signal Matched: research
LY Corporation Tech Blog · korea · 2026-04-02
Hello. I’m Inoue, and I work on private cloud infrastructure at LY Corporation. What powers LY Corpor...
High signal Matched: generation, introducing, cloud
NVIDIA Technical Blog · hardware · 2026-04-02
In vision AI systems, model throughput continues to improve. The surrounding pipeline stages must keep pace, including decode, preprocessing, and GPU...
High signal Matched: throughput, gpu, model
NVIDIA Technical Blog · hardware · 2026-04-02
The Gemmaverse expands with the launch of the latest Gemma 4 multimodal and multilingual models, designed to scale across the full spectrum of deployments, from...
High signal Matched: launch
NVIDIA Technical Blog · hardware · 2026-04-02
In algorithmic trading, reducing response times to market events is crucial. To keep pace with high-speed electronic markets, latency-sensitive firms often use...
High signal Matched: inference, latency
Together AI · inference-infra · 2026-04-02
Production STT and TTS from Deepgram, available on Together AI Dedicated Model Inference for real-time voice agents.
High signal Matched: inference, model, agents
Modular · inference-infra · 2026-04-02
Day Zero Launch: Fastest Performance for Gemma 4 on NVIDIA and AMD
High signal Matched: performance, launch
Rebellions · hardware · 2026-04-02
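Summary Challenge In the Middle East, where the oil and gas industry is well developed, the wastewater and oil that are unavoidably produced during crude oil production must be treated. In particular, reservoirs and... The post Physical AI on NPU Servers: A Water Purification Robot Solution for the United Arab Emirates (UAE) appeared first on Rebellions.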
High signal Matched: npu, rebellions
vLLM Project · open-source · 2026-04-02
With the debut of Gemma 4, vLLM introduces immediate support for Google's most sophisticated open model lineup, spanning multiple hardware backends, with first-ever Day 0 support on Google TPUs,...
High signal Matched: model, open model
NVIDIA Technical Blog · hardware · 2026-04-01
Note: CUDA Tile Programming in BASIC is an April Fools’ joke, but it's also real and actually works, demonstrating the flexibility of CUDA. CUDA 13.1...
High signal Matched: cuda
NVIDIA Technical Blog · hardware · 2026-04-01
Co-designed hardware, software, and models are key to delivering the highest AI factory throughput and lowest token cost. Measuring this goes far beyond peak...
High signal Matched: throughput, cost
NVIDIA Technical Blog · hardware · 2026-04-01
In today’s AI factory environment, performance is not theoretical. It is economic, competitive, and existential. A 1% drop in usable GPU time can mean...
High signal Matched: performance, gpu
Together AI · inference-infra · 2026-04-01
The team behind FlashAttention and ThunderKittens — how Together AI's kernel researchers close the gap between GPU hardware and production AI.
High signal Matched: kernel, flashattention, gpu
NVIDIA Technical Blog · hardware · 2026-03-31
Spatial computing is moving from visualization to active collaboration, adding increasingly more GPU demands on XR hardware to render photorealistic,...
High signal Matched: gpu
Nota AI · korea · 2026-03-31
Jaehoon Lee, Technical Content Manager, Nota AI. In March, a single official announcement from Google Research rocked trillions of won in the market capitalization of U.S. infrastructure and semiconductor stocks. The catalyst:...
High signal Matched: inference, serving, generation, throughput, kv cache, benchmark, performance, cost, b200, blackwell, introducing, model, fp8, research, training, fine-tuning, quantization, quantized, agent, agentic, frontier model
Together AI · inference-infra · 2026-03-31
1.25x over a well-trained static speculator. Aurora is an open-source RL framework that turns speculative decoding from a one-time offline setup into a self-improving system that learns from every request it serves.
High signal Matched: decoding, speculative decoding, open-source
Modular · inference-infra · 2026-03-30
Software Pipelining for GPU Kernels: Part 1 - The Pipeline Problem
High signal Matched: gpu
Together AI · inference-infra · 2026-03-26
As context windows grow, LLM performance degrades in unexpected ways. We show how a "Divide & Conquer" framework — breaking long documents into parallel chunks with a planner, workers, and manager — lets smaller models like Llama-3-70B and...
High signal Matched: performance, long context
Modal · inference-infra · 2026-03-26
Modal is proud to power real-time inference for Runway Characters.
High signal Matched: inference
NVIDIA Technical Blog · hardware · 2026-03-25
In production Kubernetes environments, the difference between model requirements and GPU size creates inefficiencies. Lightweight automatic speech recognition...
High signal Matched: throughput, gpu, model
NVIDIA Technical Blog · hardware · 2026-03-25
Developing new protein-based therapies and catalysts involves the challenging task of designing protein binders, or proteins that bind to a target protein or...
High signal Matched: model
NVIDIA Technical Blog · hardware · 2026-03-25
In the AI era, power is the ultimate constraint, and every AI factory operates within a hard limit. This makes performance per watt—the rate at which power is...
High signal Matched: performance
Modal · inference-infra · 2026-03-25
How Modal helped the ML team at Doppel parallelize experimentation and scale inference.
High signal Matched: inference
vLLM Project · open-source · 2026-03-24
We are excited to announce Model Runner V2 (MRV2), a ground-up re-implementation of the vLLM model runner. MRV2 delivers a cleaner, more modular, and more efficient execution core—with no API...
High signal Matched: model, api
NVIDIA Technical Blog · hardware · 2026-03-23
Industrial and medical systems are rapidly increasing the use of high-performance AI to improve worker productivity, human-machine interaction, and downtime...
High signal Matched: performance
Nota AI · korea · 2026-03-23
Jaehoon Lee, Technical Content Manager, Nota AI. GTC has evolved far beyond a technology conference, drawing attention from global economies and financial markets alike. This year, CEO Jensen Huang took the stage in his tradema...
High signal Matched: inference, prefill, generation, throughput, cuda, kv cache, performance, latency, cost, gpu, npu, launch, model, research, cloud, training, long-context, context window, agent, agents, agentic, open-source
NVIDIA Technical Blog · hardware · 2026-03-23
AI is moving from experimentation to production. However, most data enterprises need exists outside the public cloud. This includes sensitive information like...
High signal Matched: cloud
NVIDIA Technical Blog · hardware · 2026-03-23
As large language model (LLM) inference workloads grow in complexity, a single monolithic serving process starts to hit its limits. Prefill and decode stages...
High signal Matched: inference, serving, prefill, model
Hugging Face · open-source · 2026-03-21
No feed summary available yet.
High signal Matched: model
Nota AI · korea · 2026-03-20
NP Product Team, Nota AI. The role of Edge AI is rapidly expanding. Offline voice assistants now carry on conversations in our daily lives, vehicles infer routes in real time, and smartphones generate images without a network c...
High signal Matched: inference, kv cache, moe, benchmark, performance, latency, cost, model, research, seoul, quantization
Modular · inference-infra · 2026-03-19
Modular 26.2: State-of-the-Art Image Generation and Upgraded AI Coding with Mojo
High signal Matched: generation
SkyPilot · open-source · 2026-03-19
Karpathy's autoresearch runs one experiment at a time. We gave it access to our GPU infra and let it run experiments in parallel.
High signal Matched: gpu, agent
Together AI · inference-infra · 2026-03-18
Together AI expands fine-tuning with native support for tool call, reasoning, and vision-language models, plus 100B+ model training, up to 6× higher throughput, and job cost and ETA estimates.
High signal Matched: throughput, cost, model, training, fine-tuning
Google Research · big-tech · 2026-03-18
Health & Bioscience
High signal Matched: research
AI2 · research · 2026-03-18
MolmoPoint is a new vision-language model architecture that replaces text-based coordinate outputs with a more natural, token-based pointing mechanism that directly selects regions from visual features.
High signal Matched: model
Hugging Face · open-source · 2026-03-17
No feed summary available yet.
High signal Matched: throughput, agent, computer use
Together AI · inference-infra · 2026-03-17
Meet Mamba-3: the SSM built for inference. Faster than Transformers at decode, stronger than Mamba-2, and open-source from day one.
High signal Matched: inference, open-source
Google Research · big-tech · 2026-03-17
Education Innovation
High signal Matched: research
NVIDIA Technical Blog · hardware · 2026-03-16
Reasoning models are growing rapidly in size and are increasingly being integrated into agentic AI workflows that interact with other models and external tools....
High signal Matched: inference, multi-node, agentic
NVIDIA Technical Blog · hardware · 2026-03-16
AI‑native organizations increasingly face scaling challenges as agentic AI workflows drive context windows to millions of tokens and models scale toward...
High signal Matched: introducing, agentic
NVIDIA Technical Blog · hardware · 2026-03-16
AI is evolving, and reasoning models are increasing token demand, placing new requirements on every layer of AI infrastructure. More than ever, compute must...
High signal Matched: performance
NVIDIA Technical Blog · hardware · 2026-03-16
NVIDIA Groq 3 LPX is a new rack-scale inference accelerator for the NVIDIA Vera Rubin platform, designed for the low-latency and large-context demands of...
High signal Matched: inference, latency, accelerator
Modular · inference-infra · 2026-03-16
Modular at NVIDIA GTC 2026: MAX on Blackwell, Mojo Kernel Porting, and DeepSeek V3 on B200
High signal Matched: kernel, b200, blackwell
Together AI · inference-infra · 2026-03-16
Together AI arrives at NVIDIA GTC 2026 with new launches in inference, agents, voice AI, and open models — plus technical sessions from its research and engineering leaders.
High signal Matched: inference, research, agents
Nota AI · korea · 2026-03-13
Hancheol Park, Ph.D., AI Research Engineer, Nota AI; Tairen Piao, AI Research Engineer, Nota AI; Tae-Ho Kim, CTO & Co-Founder, Nota AI. ✔️ Resource: The official quantized model of Solar-Open-100B, which passed the first round of Sout...
High signal Matched: inference, serving, prefill, generation, throughput, moe, router, benchmark, performance, latency, ttft, tpot, blackwell, release, model, weights, open model, research, evaluation, korea, korean, upstage, training, post-training, quantization, quantized, int4, evaluate, benchmarks, mmlu, long-context
BAIR · research · 2026-03-13
Understanding the behavior of complex machine learning systems, particularly Large Language Models (LLMs), is a critical challenge in modern artificial intelligence. Interpretability research aims to make the decision-making process mo...
High signal Matched: inference, serving, decoding, performance, cost, model, research, training, evaluate, mmlu, long-context, rag
llm-d · open-source · 2026-03-13
A lightweight ML model trained online from live traffic replaces manually tuned heuristic weights with direct latency predictions, achieving 43% improvement in P50 end-to-end latency and 70% improvement in TTFT on a production-realistic wo...
High signal Matched: latency, ttft, model, weights
NVIDIA Technical Blog · hardware · 2026-03-13
The next generation of AI-driven robots like humanoids and autonomous vehicles depends on high-fidelity, physics-aware training data. Without diverse and...
High signal Matched: generation, training
vLLM Project · open-source · 2026-03-13
EAGLE is the state-of-the-art method for speculative decoding in large language model (LLM) inference, but its autoregressive drafting creates a hidden bottleneck: the more tokens that you...
High signal Matched: inference, decoding, speculative decoding, eagle, model
Google Research · big-tech · 2026-03-12
Climate & Sustainability
High signal Matched: introducing
Together AI · inference-infra · 2026-03-12
Build real-time voice agents on Together AI with co-located STT, LLM, and TTS infrastructure, native Deepgram and Cartesia support, and end-to-end latency under 500ms.
High signal Matched: latency, agents
SqueezeBits · korea · 2026-03-11
Explore why Physical AI deployment needs synthetic data at scale with Squeezebits' research and discover how to overcome inference bottlenecks to accelerate Roboost Agent.
High signal Matched: inference, research, agent
Together AI · inference-infra · 2026-03-11
NVIDIA Nemotron 3 Super is now available on Together AI Dedicated Inference, delivering efficient multi-agent reasoning, a 1M-token context window, and production-grade deployment on managed infrastructure.
High signal Matched: inference, context window, agent
vLLM Project · open-source · 2026-03-11
We are excited to support the newly released NVIDIA Nemotron 3 Super model on vLLM.
High signal Matched: model, agent
SkyPilot · open-source · 2026-03-11
SkyPilot Recipes let you store SkyPilot YAMLs in a shared, team-accessible registry. Launch workloads directly from the CLI without local files.
High signal Matched: launch
Together AI · inference-infra · 2026-03-10
Together GPU Clusters now include built-in autoscaling, RBAC, full-stack observability, and self-healing node repair—giving teams production-ready GPU infrastructure that scales efficiently, stays resilient, and supports shared enterprise...
High signal Matched: gpu
Hugging Face · open-source · 2026-03-10
No feed summary available yet.
High signal Matched: introducing
vLLM Project · open-source · 2026-03-10
Since v0.1 Iris, vLLM Semantic Router has made a large jump. In one release cycle, the project rebuilt its model stack, expanded routing into safety, semantic caching, memory, retrieval, and...
High signal Matched: router, release, model, retrieval
Modular · inference-infra · 2026-03-06
Modverse #53: Community Builds, Research Milestones, and a Growing Ecosystem
High signal Matched: research
Together AI · inference-infra · 2026-03-05
As GPU throughput outpaces memory bandwidth, kernels must evolve. We introduce FlashAttention-4, featuring new pipelining for maximum overlap, 2-CTA MMA modes to reduce shared memory traffic, and a hardware-software hybrid approach to soft...
High signal Matched: throughput, kernel, flashattention, gpu
Together AI · inference-infra · 2026-03-05
At AI Native Conf, Together AI announced breakthroughs across kernels, RL, and inference optimization — including FlashAttention-4, ThunderAgent, and together.compile. Research that ships to production. That's the AI Native Cloud.
High signal Matched: inference, flashattention, research, cloud
Modular · inference-infra · 2026-03-05
Structured Mojo Kernels Part 1 - Peak Performance, Half the Code
High signal Matched: performance
Hugging Face · open-source · 2026-03-05
No feed summary available yet.
High signal Matched: introducing
AI2 · research · 2026-03-05
Olmo Hybrid is a fully open 7B language model that combines transformer attention with linear RNN layers to achieve greater expressivity and significantly improved data and compute efficiency compared to pure transformer models.
High signal Matched: introducing, model
Together AI · inference-infra · 2026-03-04
Serving long prompts doesn't have to mean slow responses. Learn how Together AI's CPD architecture separates warm and cold inference workloads to deliver 40% higher throughput and dramatically lower time-to-first-token for long-context LLM...
High signal Matched: inference, serving, prefill, throughput, long-context
Hugging Face · open-source · 2026-03-04
No feed summary available yet.
High signal Matched: model, training
vLLM Project · open-source · 2026-03-04
This article is adapted from a Red Hat hosted vLLM Office Hours session with Burkhard Ringlein from IBM Research, featuring a deep technical walkthrough of the vLLM Triton attention backend....
High signal Matched: triton, research
Modal · inference-infra · 2026-03-04
A roundup of everything we shipped in February: Directory Snapshots for Sandboxes, a free GLM-5 endpoint, new billing API, and more.
High signal Matched: endpoint, api
AIBrix · open-source · 2026-03-03
🚀 AIBrix v0.6.0 Release Today we’re excited to announce AIBrix v0.6.0, a release that expands how you deploy and route inference traffic. Key highlights include: Envoy Sidecar Support – Run Envoy alongside the gateway-plugin without...
High signal Matched: inference, prefill, release, model, lora, rerank, api, openai-compatible
Together AI · inference-infra · 2026-03-02
We've refreshed our visual identity — designed with Pentagram to express how Together AI connects open-source innovation, systems research, and builders to unlock new possibilities.
High signal Matched: introducing, research, open-source
SkyPilot · open-source · 2026-02-27
OpenClaw gives an AI agent full access to your system. Here's why you should run it on an isolated cloud VM, and how to set that up.
High signal Matched: cloud, agent
vLLM Project · open-source · 2026-02-27
For a long time, enabling AMD support meant "porting"; i.e. just making code run. That era is over.
High signal Matched: inference, performance, rocm
AI2 · research · 2026-02-27
The Asta Interaction Dataset (AID) contains real researcher queries revealing how scientists actually use AI-powered research tools, and where their habits diverge from what tool builders expect.
High signal Matched: research
Nota AI · korea · 2026-02-26
Jewon Lee | Wooksu Shin | Seungmin Yang | Ki-Ung Song | Donguk Lim | Jaeyeon Kim | Tae-Ho Kim | Bo-Kyeong Kim, EdgeFM Team, Nota AI. ✔️ Resources for more information: GitHub, arXiv, Project Page, Demo. ✔️ Accepted at ICLR 2026.
High signal Matched: inference, generation, verification, benchmark, performance, latency, cost, model, arxiv, evaluation, training, post-training, benchmarks
Hugging Face · open-source · 2026-02-26
No feed summary available yet.
High signal Matched: mixture of experts
vLLM Project · open-source · 2026-02-26
Organizations and individuals running multiple custom AI models, especially recent Mixture of Experts (MoE) model families, can face the challenge of paying for idle GPU capacity when the...
High signal Matched: serve, moe, mixture of experts, gpu, model, sagemaker, bedrock
SqueezeBits · korea · 2026-02-25
Scaling Physical AI requires reliable synthetic data. Learn how RoBoost Agent integrates NVIDIA Cosmos to transform world models into trustworthy data engines for robotics and autonomous driving.
High signal Matched: agent
Modal · inference-infra · 2026-02-25
Learn why researchers at Scaling Intelligence, Hazy Research, and other top labs are choosing Modal.
High signal Matched: research
Modal · inference-infra · 2026-02-24
Introducing Directory Snapshots, a programmatic way to snapshot a specific directory within a running Sandbox and mount it into another Sandbox later, independently of the base image and the rest of the filesystem.
High signal Matched: introducing
Together AI · inference-infra · 2026-02-23
State-of-the-art speech models like Whisper and Deepgram score near-human on benchmarks — then fail 39% of the time on street names. New research from Together AI exposes the gap and a fix.
High signal Matched: research, benchmarks
SkyPilot · open-source · 2026-02-21
SkyPilot Admin Policies let you enforce cost controls, security rules, and compliance requirements automatically — without slowing down your engineering team.
High signal Matched: cost, gpu
Together AI · inference-infra · 2026-02-19
Standard diffusion language models can't use KV caching and need too many refinement steps to be practical. CDLM fixes both with a post-training recipe that enables exact block-wise KV caching and trajectory-consistent step reduction — del...
High signal Matched: inference, latency, training, post-training
Replicate · inference-infra · 2026-02-18
Recraft V4 generates art-directed images — and actual editable SVGs — with strong composition, accurate text rendering, and what the Recraft team calls "design taste." Four models are available on Replicate now.
High signal Matched: generation
vLLM Project · open-source · 2026-02-13
DeepSeek-V3.2 (NVFP4 + TP2) has been run successfully on GB300 (SM103 - Blackwell Ultra). Leveraging FP4 quantization, it achieves a single-GPU throughput of 7360 TGS (tokens / GPU /...
High signal Matched: throughput, deepseek-v3, performance, gpu, blackwell, quantization
Together AI · inference-infra · 2026-02-12
Together AI launches production-grade orchestration for custom AI models with 1.4x–2.6x faster inference.
High signal Matched: inference, introducing
Google Research · big-tech · 2026-02-11
Algorithms & Theory
High signal Matched: throughput
llm-d · open-source · 2026-02-10
llm-d's new filesystem backend offloads KV cache to shared storage, enabling cross-replica reuse and up to 16.8x faster TTFT — scaling inference throughput without GPU or CPU memory limits.
High signal Matched: inference, throughput, kv cache, ttft, gpu
Together AI · inference-infra · 2026-02-06
What do language models generate when you don't tell them what to generate? New research reveals that LLM families have distinct 'knowledge priors'—GPT models default to code and math, Llama favors narratives, DeepSeek generates religious...
High signal Matched: research
Hugging Face · open-source · 2026-02-06
No feed summary available yet.
High signal Matched: introducing
llm-d · open-source · 2026-02-04
llm-d v0.5 introduces hierarchical KV-cache offloading, LoRA-aware scheduling, UCCL networking, and scale-to-zero autoscaling for sustained inference performance at scale.
High signal Matched: inference, performance, lora
Hugging Face · open-source · 2026-02-04
No feed summary available yet.
High signal Matched: model
vLLM Project · open-source · 2026-02-03
Building on our previous work achieving 2.2k tok/s/H200 decode throughput with wide-EP, the vLLM team has continued performance optimization efforts targeting NVIDIA's GB200 platform. This blog...
High signal Matched: serving, throughput, performance, h200, gb200, blackwell
Together AI · inference-infra · 2026-02-02
Fine-tuned open-source LLM judges can outperform GPT-5.2 at evaluating model outputs. Using Direct Preference Optimization on just 5,400 preference pairs, we trained GPT-OSS 120B to beat GPT-5.2 on human preference alignment—at 15x lower c...
High signal Matched: inference, cost, model, fine-tuning, evaluating, open-source, oss
Together AI · inference-infra · 2026-02-02
Together Evaluations now supports OpenAI, Anthropic, and Google models for cross-provider benchmarking. Compare open-source, fine-tuned, and proprietary models side-by-side to make data-driven decisions on quality, cost, and performance—al...
High signal Matched: performance, cost, open-source, open source
vLLM Project · open-source · 2026-02-01
TL;DR: In collaboration with the open-source community, vLLM + NVIDIA has achieved significant performance milestones on the gpt-oss-120b model running on NVIDIA's Blackwell GPUs. Through deep...
High signal Matched: performance, blackwell, model, open-source, oss
vLLM Project · open-source · 2026-01-31
Large language model inference has traditionally operated on a simple premise: the user submits a complete prompt (request), the model processes it, and returns a response (either streaming or at...
High signal Matched: inference, model, api
Hugging Face · open-source · 2026-01-29
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2026-01-28
No feed summary available yet.
High signal Matched: cuda
Hugging Face · open-source · 2026-01-27
No feed summary available yet.
High signal Matched: evaluation
Together AI · inference-infra · 2026-01-26
Introducing DSGym, a holistic evaluation and training framework for LLM-based data science agents. Features 90+ bioinformatics tasks, 92 Kaggle competitions, and synthetic trajectory generation. Our 4B model achieves state-of-the-art perform...
High signal Matched: generation, performance, introducing, model, evaluation, training, evaluating, agents, open-source
Google Research · big-tech · 2026-01-24
Algorithms & Theory
High signal Matched: introducing
Together AI · inference-infra · 2026-01-22
Learn how to reduce inference latency without massive cost using proven inference optimization tactics — improving throughput, GPU utilization, and cost efficiency while balancing throughput vs. latency tradeoffs.
High signal Matched: inference, throughput, latency, cost, gpu
Hugging Face · open-source · 2026-01-20
No feed summary available yet.
High signal Matched: introducing
Modular · inference-infra · 2026-01-14
How to Beat Unsloth's CUDA Kernel Using Mojo—With Zero GPU Experience
High signal Matched: kernel, cuda, gpu
Google Research · big-tech · 2026-01-14
Generative AI
High signal Matched: generation
SkyPilot · open-source · 2026-01-13
Run Meta's SAM3 on large video archives distributed across AWS and Kubernetes clusters with SkyPilot Pools.
High signal Matched: distributed
Together AI · inference-infra · 2026-01-13
Together AI teamed with Cursor to build the real-time inference stack that keeps in-editor agents fast and reliable. They productionized NVIDIA Blackwell (B200/GB200), tuning ARM hosts, kernels, and FP4/TensorRT quantization for low latenc...
High signal Matched: inference, latency, b200, gb200, blackwell, model, quantization, agents
Together AI · inference-infra · 2026-01-12
Learn how foundation models are trained at scale using multi-node GPU clusters, including distributed training techniques, infrastructure requirements, and practical steps to scale training efficiently.
High signal Matched: distributed, multi-node, gpu, model, training, distributed training
BAIR · research · 2026-01-10
An encoder (optical system) maps objects to noiseless images, which noise corrupts into measurements. Our information estimator uses only these noisy measurements and a noise model to quantify how well measurements distinguish objects. Man...
High signal Matched: performance, model, paper, evaluation, training, evaluate
Together AI · inference-infra · 2026-01-08
Learn how to choose the right open-source model for production by evaluating model quality, benchmarking performance, and deploying open models that balance cost, speed, and accuracy.
High signal Matched: performance, cost, model, open model, evaluating, open-source
vLLM Project · open-source · 2026-01-08
In this post, we will describe the new KV cache offloading feature that was introduced in vLLM 0.11.0. We will focus on offloading to CPU memory (DRAM) and its benefits to improving overall...
High signal Matched: inference, throughput, kv cache
SqueezeBits · korea · 2026-01-07
A recap of the Intel® Gaudi® hands-on workshop co-hosted by SqueezeBits and Lablup. AI model compression, fine-tuning, and vLLM serving on Gaudi® hardware with Backend.AI.
High signal Matched: serving, model, fine-tuning
Hugging Face · open-source · 2026-01-05
No feed summary available yet.
High signal Matched: introducing
vLLM Project · open-source · 2026-01-05
vLLM Semantic Router is the System Level Intelligence for Mixture-of-Models (MoM), bringing Collective Intelligence into LLM systems. It lives between users and models, capturing signals from...
High signal Matched: router, release
vLLM Project · open-source · 2026-01-02
As a passionate vLLM community member who wants to see vLLM thrive and reach even more developers, I'm excited to announce vLLM Playground – a modern, feature-rich web interface for managing and...
High signal Matched: introducing
Rebellions · hardware · 2025-12-29
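Summary Challenge: The customs administration processes a vast volume of import and export declarations every year, and the task of accurately classifying the appropriate HS code (Harmonized System Code) for each item... The post LLM/RAG-Based AI Chatbot for Recommending Item Classification Codes for the Mongolian Customs Administration appeared first on Rebellions.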
High signal Matched: rebellions, rag
SqueezeBits · korea · 2025-12-24
Introducing ATOM™-Max, Rebellions' next-generation NPU designed for high-performance AI inference. Learn how its runtime, profiling tools, and PyTorch-native integrations enable developers to run and serve models efficiently without sacrif...
High signal Matched: inference, generation, serve, performance, npu, introducing, rebellions
Together AI · inference-infra · 2025-12-23
MiniMax Speech 2.6 Turbo: State-of-the-art multilingual TTS with human-level emotional awareness, sub-250ms latency, and 40+ languages—now on Together AI.
High signal Matched: latency
SkyPilot · open-source · 2025-12-19
SkyPilot now includes predefined templates to launch clusters with popular frameworks and patterns. Deploy fully configured environments without writing long YAMLs.
High signal Matched: launch
Nota AI · korea · 2025-12-19
Seungmin Yang, EdgeFM Lead, Nota AI. Summary: With the introduction of NVFP4, a new 4-bit floating point data type in NVIDIA’s Blackwell GPU architecture, LLM inference achieves markedly improved efficiency. Blackwell’s NVFP4...
High signal Matched: inference, serving, decoding, prefill, generation, token generation, throughput, kernel, gemm, cutlass, distributed, benchmark, performance, latency, ttft, tpot, tokens/sec, cost, gpu, blackwell, launch, model, weights, fp8, research, training, post-training, quantization, quantized, awq, benchmarks, mmlu, retrieval
Google Research · big-tech · 2025-12-19
Year in Review
High signal Matched: research
vLLM Project · open-source · 2025-12-19
We are thrilled to announce a major performance update for vLLM-Omni.
High signal Matched: performance
Hugging Face · open-source · 2025-12-17
No feed summary available yet.
High signal Matched: evaluation
Together AI · inference-infra · 2025-12-17
Dan Fu, our VP of Kernels, has published a new post challenging the idea that AI is hitting a hardware wall. He argues that we are vastly underutilizing current chips and that better software-hardware co-design will unlock the next order o...
High signal Matched: performance, research
vLLM Project · open-source · 2025-12-17
In v0.11.0, the last code from vLLM V0 engine was removed, marking the complete migration to the improved V1 engine architecture. This achievement would not have been possible without vLLM’s...
High signal Matched: serving, h200
Google Research · big-tech · 2025-12-16
Algorithms & Theory
High signal Matched: paper
vLLM Project · open-source · 2025-12-16
Over the past several months, AMD and the vLLM SR Team have been collaborating to bring vLLM Semantic Router (VSR) to AMD GPUs—not just as a performance optimization, but as a fundamental shift in...
High signal Matched: router, performance
Together AI · inference-infra · 2025-12-15
Nemotron 3 Nano, NVIDIA’s newest reasoning model, is now available on Together AI, the AI Native Cloud
High signal Matched: model, cloud, reasoning model
vLLM Project · open-source · 2025-12-15
Modern Large Multimodal Models (LMMs) introduce a unique serving-time bottleneck: before any text generation can begin, all images must be processed by a visual encoder (e.g., ViT). This encoder...
High signal Matched: serving, generation, model
vLLM Project · open-source · 2025-12-15
Jan 28th Update: NVIDIA just released their Nemotron 3 Nano model in NVFP4 precision. This model is supported by vLLM out of the box and it uses a new method called Quantization-Aware Distillation...
High signal Matched: model, quantization, agents
vLLM Project · open-source · 2025-12-13
Efficiently managing request distribution across a fleet of model replicas is a critical requirement for large-scale, production vLLM deployments. Standard load balancers often fall short as they...
High signal Matched: serving, prefill, router, performance, model
vLLM Project · open-source · 2025-12-13
Speculative decoding serves as an optimization to improve inference performance; however, training a unique draft model for each LLM can be difficult and time-consuming, while production-ready...
High signal Matched: inference, decoding, speculative decoding, draft model, performance, model, training
Hugging Face · open-source · 2025-12-12
No feed summary available yet.
High signal Matched: model
SkyPilot · open-source · 2025-12-11
Announcing SkyPilot 0.11 with Pools for batch inference, faster managed jobs, and enterprise-scale improvements.
High signal Matched: inference, cloud
SqueezeBits · korea · 2025-12-10
Rebellions and SqueezeBits Co-Host a vLLM Hands-on Workshop: Workshop Highlights, PyTorch Best Practices, Performance Optimization, and Developer First-Hand Tips!
High signal Matched: performance, rebellions
vLLM Project · open-source · 2025-12-09
Achieve faster, more efficient LLM serving without sacrificing accuracy!
High signal Matched: serving, quantization
Hugging Face · open-source · 2025-12-05
No feed summary available yet.
High signal Matched: introducing
Google Research · big-tech · 2025-12-04
Machine Intelligence
High signal Matched: benchmark
Together AI · inference-infra · 2025-12-03
AutoJudge accelerates LLM inference by identifying which token mismatches actually matter. Using self-supervised learning to train a lightweight classifier, it accepts up to 40 draft tokens per cycle—delivering 1.5–2× speedups over standar...
High signal Matched: inference, decoding, speculative decoding, introducing
Together AI · inference-infra · 2025-12-03
Build, train, and deploy advanced AI agents with integrated reinforcement learning on the Together platform.
High signal Matched: cloud, agents
Together AI · inference-infra · 2025-12-03
No feed summary available yet.
High signal Matched: cloud
vLLM Project · open-source · 2025-12-03
Several months ago, we published a blog post about CUDA Core Dump: An Effective Tool to Debug Memory Access Issues and Beyond, introducing a powerful technique for debugging illegal memory access...
High signal Matched: cuda, gpu, introducing
SkyPilot · open-source · 2025-12-02
Scale document OCR batch inference for RAG on multiple clouds and Kubernetes clusters using SkyPilot Pool.
High signal Matched: inference, rag
Modal · inference-infra · 2025-12-02
We've partnered with Mistral to bring you Day 0 support for Mistral 3, with GPU-snapshot-optimized performance.
High signal Matched: performance, gpu
llm-d · open-source · 2025-12-02
llm-d v0.4 delivers 50% lower latency for MoE models via speculative decoding, expands TPU and XPU support, and adds prefix cache offloading for faster TTFT.
High signal Matched: decoding, prefix cache, speculative decoding, moe, performance, latency, ttft, tpu, sota
Together AI · inference-infra · 2025-12-01
Together AI achieves up to 2x faster inference for top open-source models like Qwen, DeepSeek, and Kimi through GPU optimization, advanced speculative decoding, and FP4 quantization—ranking #1 in speed benchmarks on NVIDIA Blackwell archit...
High signal Matched: inference, decoding, speculative decoding, gpu, blackwell, quantization, benchmarks, open-source
Hugging Face · open-source · 2025-12-01
No feed summary available yet.
High signal Matched: model
vLLM Project · open-source · 2025-11-30
We are excited to announce the official release of vLLM-Omni, a major extension of the vLLM ecosystem designed to support the next generation of AI: omni-modality models.
High signal Matched: serving, generation, release, model
AIBrix · open-source · 2025-11-26
In recent years, large language models (LLMs) such as GPT, DeepSeek, Doubao and Qwen have advanced rapidly and are reshaping a wide range of industries. As the Scaling Law continues to be validated and pushed to its limits, LLM capabilitie...
High signal Matched: inference, serving, generation, throughput, performance, latency, cost
Together AI · inference-infra · 2025-11-25
Production-grade image generation with multi-reference consistency, exact brand colors, and reliable text rendering. FLUX.2 from Black Forest Labs, now on Together AI's platform.
High signal Matched: generation
Hugging Face · open-source · 2025-11-25
No feed summary available yet.
High signal Matched: research, state of the art
Hugging Face · open-source · 2025-11-25
No feed summary available yet.
High signal Matched: inference
Google Research · big-tech · 2025-11-22
Algorithms & Theory
High signal Matched: model
vLLM Project · open-source · 2025-11-22
Ray now has a new command: ray symmetric-run. This command makes it possible to launch the same entrypoint command on every node in a Ray cluster, simplifying the workflow to spawn vLLM servers...
High signal Matched: serving, multi-node, launch
Rebellions · hardware · 2025-11-20
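Summary Challenge: With the recent growth in pet ownership, demand for X-ray diagnostic imaging is expanding rapidly. However, veterinarians in Korea who specialize in radiology number several hundred... The post NPU-Powered AI-Based Veterinary Imaging Diagnosis Assistance Service appeared first on Rebellions.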
High signal Matched: npu, rebellions
Modular · inference-infra · 2025-11-20
Modular 25.7: Faster Inference, Safer GPU Programming, and a More Unified Developer Experience
High signal Matched: inference, gpu
Hugging Face · open-source · 2025-11-20
No feed summary available yet.
High signal Matched: introducing, api
Modal · inference-infra · 2025-11-19
Learn how Reducto used GPU memory snapshotting and flexible autoscaling to build fast multi-model pipelines.
High signal Matched: latency, gpu, model
Modal · inference-infra · 2025-11-18
Never block the GPU.
High signal Matched: inference, gpu
Hugging Face · open-source · 2025-11-17
No feed summary available yet.
High signal Matched: rocm
Modal · inference-infra · 2025-11-13
How Decagon and Modal made real-time voice AI possible, combining fine-tuned small models with a re-engineered inference runtime for sub-second latency.
High signal Matched: inference, latency
Hugging Face · open-source · 2025-11-13
No feed summary available yet.
High signal Matched: cloud
AIBrix · open-source · 2025-11-10
🚀 AIBrix v0.5.0 Release Today, we’re excited to announce AIBrix v0.5.0, a release that pushes AIBrix closer to a batteries-included control plane for modern LLM workloads. This release introduces an OpenAI-compatible Batch API for hi...
High signal Matched: prefill, latency, release, evaluation, api, openai-compatible
Google Research · big-tech · 2025-11-08
Algorithms & Theory
High signal Matched: introducing
Rebellions · hardware · 2025-11-07
New possibilities for LLM inference, experienced firsthand on Rebellions NPUs. Following the vLLM Korea Meetup last August, on October 29, organized by Rebellions and SqueezeBits, the vLLM... The post vLLM Hands-on Workshop WrapUp appeared first on Rebellions.
High signal Matched: npu, korea, rebellions
Modular · inference-infra · 2025-11-07
"TTS 1 Max" (powered by Modular Platform) Ranked #1 Speech Model on Artificial Analysis
High signal Matched: model
Modal · inference-infra · 2025-11-04
How we built a real-time voice bot on Modal's distributed serverless platform.
High signal Matched: distributed, latency
Together AI · inference-infra · 2025-11-04
Together AI launches the fastest voice AI stack: streaming Whisper STT, serverless open-source TTS (Orpheus & Kokoro), and Voxtral transcription. Sub-second latency for production voice agents.
High signal Matched: inference, latency, agents, open-source
Together AI · inference-infra · 2025-11-04
Understanding how to evaluate and benchmark Large Language Models (LLMs). Test, compare, and understand LLMs.
High signal Matched: benchmark, evaluate
SqueezeBits · korea · 2025-10-31
Explore how the Yetter Inference Engine overcomes the limitations of step caching and model distillation for diffusion models. We analyze latency, diversity, quality, and negative-prompt handling to reveal what truly matters for scalable,...
High signal Matched: inference, generation, latency, model
Google Research · big-tech · 2025-10-31
Climate & Sustainability
High signal Matched: research
Modal · inference-infra · 2025-10-29
We've collaborated with Datalab, the creators of Marker and Surya, to make it faster than ever to deploy document intelligence workflows.
High signal Matched: throughput
SqueezeBits · korea · 2025-10-28
Explore how Intel’s new Gaudi-3 compares to Gaudi-2, NVIDIA A100, and H100. We analyze real-world GEMM efficiency, attention performance, and LLM serving results to uncover what truly matters for AI inference and training workloads.
High signal Matched: inference, serving, gemm, performance, h100, training
Hugging Face · open-source · 2025-10-23
No feed summary available yet.
High signal Matched: introducing, agent
Together AI · inference-infra · 2025-10-22
ReasonIF finds frontier LRMs fail to follow reasoning instructions >75% of the time; introduces a benchmark across languages, formatting, and length.
High signal Matched: benchmark
SkyPilot · open-source · 2025-10-21
AWS Batch works well for traditional enterprise batch processing (see their case studies 1 and 2). But AI workloads have different requirements - they’re more interactive, need flexible GPU access, and benefit from simpler iteration...
High signal Matched: inference, gpu
Together AI · inference-infra · 2025-10-21
Together AI adds 40+ image & video models, including Sora 2 and Veo 3, to build end-to-end multimodal apps with unified OpenAI-compatible APIs and transparent pricing.
High signal Matched: generation, model, openai-compatible
Google Research · big-tech · 2025-10-21
Generative AI
High signal Matched: generation
Rebellions · hardware · 2025-10-20
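Summary Challenge: Hyperscale AI facilities already consume as much power as a small city. Demand at a single site reaches 100–200 MW, on par with a small nuclear reactor. AI... The post Toward Sustainable AI Scaling: Innovating Data Center Compute and Power Delivery appeared first on Rebellions.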
High signal Matched: rebellions
Google Research · big-tech · 2025-10-18
Algorithms & Theory
High signal Matched: cloud
Modular · inference-infra · 2025-10-17
Achieving State-of-the-Art Performance on AMD MI355 — in Just 14 Days
High signal Matched: performance
Hugging Face · open-source · 2025-10-16
No feed summary available yet.
High signal Matched: cloud, oss
Google Research · big-tech · 2025-10-15
Generative AI
High signal Matched: npu
Together AI · inference-infra · 2025-10-15
We've launched the Together AI Startup Accelerator: Up to $50K credits, expert engineering hours, GTM support, community and VC access for AI-native apps in build–scale tiers.
High signal Matched: accelerator
Together AI · inference-infra · 2025-10-10
LLM inference that gets faster as you use it. Our runtime-learning accelerator adapts continuously to your workload, delivering 500 TPS on DeepSeek-V3.1, a 4x speedup over baseline performance without manual tuning.
High signal Matched: inference, deepseek-v3, performance, accelerator
llm-d · open-source · 2025-10-10
llm-d v0.3 adds Google TPU and Intel XPU support, wide expert parallelism at 2.2k tokens/sec per GPU, predicted latency scheduling, and Inference Gateway GA.
High signal Matched: inference, latency, tokens/sec, gpu, tpu
Google Research · big-tech · 2025-10-03
Generative AI
High signal Matched: generation
SqueezeBits · korea · 2025-10-02
Meet 'Yetter': the generative AI API service built for speed, efficiency, and scalability. Powered by our optimized inference engine, it delivers reliable image, video, and future LLM services at a fraction of the cost.
High signal Matched: inference, cost, api
Google Research · big-tech · 2025-10-02
Human-Computer Interaction and Visualization
High signal Matched: introducing
Hugging Face · open-source · 2025-10-01
No feed summary available yet.
High signal Matched: introducing, evaluation, retrieval
Google Research · big-tech · 2025-10-01
Algorithms & Theory
High signal Matched: research
Google Research · big-tech · 2025-09-25
Generative AI
High signal Matched: research, agent
llm-d · open-source · 2025-09-24
See how llm-d's precise KV-cache aware scheduling delivers 57x faster responses and 2x throughput in production distributed LLM inference benchmarks.
High signal Matched: inference, throughput, distributed, benchmarks
Replicate · inference-infra · 2025-09-23
Here is the ultimate comparison post on all the latest image editing models.
High signal Matched: model
Modular · inference-infra · 2025-09-19
Matrix Multiplication on Blackwell: Part 4 - Breaking SOTA
High signal Matched: blackwell, sota
Hugging Face · open-source · 2025-09-19
No feed summary available yet.
High signal Matched: inference
Rebellions · hardware · 2025-09-17
The first vLLM Community Meetup Korea, hosted by Rebellions and Red Hat and co-organized by PyTorch Korea and SqueezeBits, was held in Seoul on August 19, 2025.... The post The First vLLM Meetup in Korea appeared first on Rebellions.
High signal Matched: korea, rebellions
Hugging Face · open-source · 2025-09-17
No feed summary available yet.
High signal Matched: inference
Replicate · inference-infra · 2025-09-17
Find the best models and collections with a single API call.
High signal Matched: introducing, api
SqueezeBits · korea · 2025-09-16
The guide to LLM guided decoding! This deep-dive benchmark compares XGrammar and LLGuidance on vLLM and SGLang to help you find the optimal setup for generating structured output based on your use case.
High signal Matched: decoding, benchmark, performance
Modal · inference-infra · 2025-09-16
Exploring the internals of our new product, a modern Jupyter notebook built for fast startup and real-time collaboration.
High signal Matched: gpu, cloud
Together AI · inference-infra · 2025-09-15
Our new Batch Inference API makes large-scale AI workloads simpler, faster, and cheaper. With a streamlined UI, universal model support, and 3000× higher rate limits—now up to 30B tokens—you can process massive datasets at half the cost of...
High signal Matched: inference, cost, model, api
SkyPilot · open-source · 2025-09-12
SkyPilot now supports detailed GPU metrics across multiple Kubernetes clusters in the dashboard for better observability.
High signal Matched: gpu
Modular · inference-infra · 2025-09-12
Matrix Multiplication on Blackwell: Part 3 - The Optimizations Behind 85% of SOTA Performance
High signal Matched: performance, blackwell, sota
Google Research · big-tech · 2025-09-12
Generative AI
High signal Matched: inference
Hugging Face · open-source · 2025-09-12
No feed summary available yet.
High signal Matched: introducing
SkyPilot · open-source · 2025-09-11
High signal Matched: distributed, training
Google Research · big-tech · 2025-09-10
General Science
High signal Matched: research
Modal · inference-infra · 2025-09-09
A collaborative environment for high-performance interactive computing on GPUs.
High signal Matched: performance, introducing
Together AI · inference-infra · 2025-09-09
Together AI launches Instant Clusters: self-service GPU clusters with NVIDIA H100/B200, ready in minutes for training or inference at any scale.
High signal Matched: inference, gpu, h100, b200, training
Replicate · inference-infra · 2025-09-08
Cache your compiled models for faster boot and inference times
High signal Matched: inference
Modular · inference-infra · 2025-09-05
Matrix Multiplication on Blackwell: Part 2 - Using Hardware Features to Optimize Matmul
High signal Matched: matmul, blackwell
SkyPilot · open-source · 2025-09-04
How we transformed our fragmented multi-cloud AI infrastructure into a unified system with SkyPilot, achieving 10x faster development cycles.
High signal Matched: cloud
Hugging Face · open-source · 2025-09-04
No feed summary available yet.
High signal Matched: model
llm-d · open-source · 2025-09-03
Learn how llm-d's intelligent inference scheduling uses prefix-aware, load-balanced routing to maximize LLM throughput and minimize latency on Kubernetes.
High signal Matched: inference, throughput, latency
BAIR · research · 2025-09-01
What exactly does word2vec learn, and how? Answering this question amounts to understanding representation learning in a minimal yet interesting language modeling task. Despite the fact that word2vec is a well-known precursor to modern lan...
High signal Matched: benchmark, performance, model, weights, paper, training
Modal · inference-infra · 2025-08-28
Zencastr scaled up to 1,500 concurrent GPUs on Modal to process hundreds of years of podcast audio in just a few days. Today they run transcription, speaker detection, and audio enrichment for millions of podcast episodes on Modal, giving...
High signal Matched: cost
Modular · inference-infra · 2025-08-28
Matrix Multiplication on Blackwell: Part 1 - Introduction
High signal Matched: blackwell
Together AI · inference-infra · 2025-08-27
Access DeepSeek-V3.1 on Together AI: MIT-licensed hybrid model with thinking/non-thinking modes, 66% SWE-bench Verified, serverless deployment, 99.9% SLA.
High signal Matched: deepseek-v3, model, swe-bench
SqueezeBits · korea · 2025-08-26
In this article, we show how to run LLMs efficiently on Apple Silicon using a disaggregated inference technique.
High signal Matched: inference, prefill, gpu, npu
Rebellions · hardware · 2025-08-21
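A case study of Kolon Benit, which developed 'AI Vision Intelligence', a differentiated AI-based safety monitoring system that overcomes the constraints of existing systems through multimodal AI combining vision and language models and a hybrid infrastructure combining GPUs and NPUs. The post Realizing Prevention-Centered Site Safety Management for Construction & Plant Projects with AI appeared first on Rebellions.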
High signal Matched: gpu, npu, rebellions
Rebellions · hardware · 2025-08-21
Summary Challenge: The modern Security Operation Center (SOC) faces a trilemma in which it must solve three challenges at once. New types of attacks... The post Applying LLM-Based AI to Security Threat Detection and Response in the SOC appeared first on Rebellions.
High signal Matched: rebellions
Rebellions · hardware · 2025-08-21
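How can robot training data for Physical AI be generated and used? For Physical AI to be adopted and for AI to interact with real environments, the model must very precisely... The post Generating Real-World Training Data: Physical AI Realized with Generative AI appeared first on Rebellions.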
High signal Matched: rebellions
Together AI · inference-infra · 2025-08-21
Build AI agents for complex, long-running engineering tasks. Learn key patterns from a case study: accelerating LLM inference with speculative decoding.
High signal Matched: inference, decoding, speculative decoding, agents
SkyPilot · open-source · 2025-08-21
Avataar's enterprise AI content platform cut costs 11x and unlocked GPU capacity by migrating from inflexible SLURM deployment to SkyPilot's multi-cloud infrastructure.
High signal Matched: gpu, cloud
SqueezeBits · korea · 2025-08-20
Efficient AI Study & Meetup recap: SqueezeBits' community study on AI model compression, featuring paper reviews, participant interviews, and networking from the offline meetup.
High signal Matched: model, paper
Together AI · inference-infra · 2025-08-19
Customize OpenAI’s gpt-oss-20B/120B with Together AI’s fine-tuning: train, optimize, and instantly deploy domain experts with enterprise reliability and cost efficiency.
High signal Matched: cost, fine-tuning, oss
Hugging Face · open-source · 2025-08-18
No feed summary available yet.
High signal Matched: cuda, gpu
Hugging Face · open-source · 2025-08-18
No feed summary available yet.
High signal Matched: research, mcp
Together AI · inference-infra · 2025-08-15
Parsed fine-tuned a 27B open-source model to beat Claude Sonnet 4 by 60% on a real-world healthcare task—while running 10–100x cheaper.
High signal Matched: model, fine-tuning, open-source
SkyPilot · open-source · 2025-08-12
Your AI writes code. Now what? If you’re building AI agents in 2025, you probably wondered that as well. Your LLM generates some Python code that analyzes data, manipulates files, or calls APIs. But where does it run? Most people eit...
High signal Matched: cloud, agent, agents, open-source
Modal · inference-infra · 2025-08-11
Welcome to another round of Modal Product Updates! Here's what's new this month.
High signal Matched: gpu
Hugging Face · open-source · 2025-08-08
No feed summary available yet.
High signal Matched: multi-gpu, gpu, training
Hugging Face · open-source · 2025-08-08
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2025-08-07
No feed summary available yet.
High signal Matched: model
Google Research · big-tech · 2025-08-07
General Science
High signal Matched: research
AIBrix · open-source · 2025-08-05
AIBrix is a composable, cloud‑native LLM inference infrastructure designed to deliver high performance and low cost at scale. We now present a major update in a new release - v0.4.0. This release tackles key bottlenecks in orchestration an...
High signal Matched: inference, prefill, generation, token generation, throughput, performance, cost, gpu, release, cloud
Modular · inference-infra · 2025-08-05
Modular Platform 25.5: Introducing Large Scale Batch Inference
High signal Matched: inference, introducing
Together AI · inference-infra · 2025-08-05
Access OpenAI’s gpt-oss-120B on Together AI: Apache-2.0 open-weight model with serverless & dedicated endpoints, $0.50/1M in, $1.50/1M out, 99.9% SLA.
High signal Matched: model, oss
Hugging Face · open-source · 2025-08-05
No feed summary available yet.
High signal Matched: model, open-source, oss
SqueezeBits · korea · 2025-08-04
Trimming large multilingual vocabularies in Small Language Models (SLMs) is a simple, low-risk way to push efficiency to its limit. It significantly accelerates model inference while keeping accuracy almost unchanged.
High signal Matched: inference, model
Hugging Face · open-source · 2025-08-01
No feed summary available yet.
High signal Matched: benchmark
Modular · inference-infra · 2025-07-31
SF Compute and Modular Partner to Revolutionize AI Inference Economics
High signal Matched: inference
SkyPilot · open-source · 2025-07-30
There are a lot of discussions happening in AI infrastructure right now. On one side, we have researchers who trained on Slurm in grad school, comfortable with sbatch train_model.sh and the predictability of academic HPC clusters. On the o...
High signal Matched: model, cloud
Modal · inference-infra · 2025-07-30
Using GPU snapshots to enable sub-second container startup times.
High signal Matched: gpu
Hugging Face · open-source · 2025-07-29
No feed summary available yet.
High signal Matched: introducing
llm-d · open-source · 2025-07-29
llm-d v0.2 introduces well-lit paths for Kubernetes LLM deployment: intelligent scheduling, P/D disaggregation, and MoE support with vLLM optimizations.
High signal Matched: moe
Together AI · inference-infra · 2025-07-28
Together Evaluations is a flexible framework for benchmarking LLMs using strong open-source models as judges. Skip manual labeling and rigid metrics—get fast, customizable insights into model quality for your specific tasks.
High signal Matched: benchmark, model, open-source
Together AI · inference-infra · 2025-07-25
Unlock agentic coding with Qwen3-Coder on Together AI: 256K context, SWE-bench rivaling Claude Sonnet 4, zero-setup instant deployment.
High signal Matched: model, swe-bench, agentic
SkyPilot · open-source · 2025-07-24
Announcing SkyPilot 0.10 - the largest release yet with enterprise-grade features.
High signal Matched: release
Hugging Face · open-source · 2025-07-23
No feed summary available yet.
High signal Matched: model
Hugging Face · open-source · 2025-07-23
No feed summary available yet.
High signal Matched: inference, lora
SqueezeBits · korea · 2025-07-21
LoRA excels at efficient fine-tuning but suffers at higher ranks due to gradient entanglement. We introduce GraLoRA, which addresses these issues through finer-grained, block-wise updates, significantly enhancing performance and expressivi...
High signal Matched: performance, cost, fine-tuning, lora
Together AI · inference-infra · 2025-07-17
Together AI inference is now among the world’s fastest, most capable platforms for running open-source reasoning models like DeepSeek-R1 at scale, thanks to our new inference engine designed for NVIDIA HGX B200.
High signal Matched: inference, b200, blackwell, open-source
SkyPilot · open-source · 2025-07-16
This is Part 2 of our series on the evolution of AI Job Orchestration. In Part 1, we explored how Neoclouds are democratizing GPU access but leaving the “last mile” unsolved. Now we’ll discover how AI-native orchestration...
High signal Matched: infiniband, performance, cost, gpu, cloud
Modal · inference-infra · 2025-07-16
Engineers of language model applications should think about requests, not tokens.
High signal Matched: model
Together AI · inference-infra · 2025-07-14
Run Kimi K2 (1T params) on Together AI—frontier open model for agentic reasoning and coding, serverless deployment, 99.9% SLA, lower cost and instant scaling.
High signal Matched: cost, model, open model, agentic, open-source
Modal · inference-infra · 2025-07-11
Welcome to another round of Modal Product Updates! Here's what's new this month.
High signal Matched: multi-node, b200, release, training
Nota AI · korea · 2025-07-10
Marcel Simon, Ph.D., ML Researcher, Nota AI GmbH; Tae-Ho Kim, CTO & Co-Founder, Nota AI; Seul-Ki Yeom, Ph.D., Research Lead, Nota AI GmbH. Summary: Proposes a simple next-frame prediction task using unlabeled video to enhance sing...
High signal Matched: inference, performance, model, paper, research, training, fine-tuning, benchmarks
Together AI · inference-infra · 2025-07-10
No feed summary available yet.
High signal Matched: performance
Hugging Face · open-source · 2025-07-10
No feed summary available yet.
High signal Matched: inference
SkyPilot · open-source · 2025-07-08
If you’re an infrastructure or MLOps engineer at a large company, you know the drill. The ML team comes to you with requirements that change weekly. They need GPUs yesterday, but the budget was set six months ago. They want to use th...
High signal Matched: cost, gpu
Replicate · inference-infra · 2025-07-07
It's hard keeping up with every new video model. In this post we'll help you pick the best one for your needs.
High signal Matched: model
Hugging Face · open-source · 2025-07-04
No feed summary available yet.
High signal Matched: evaluation, training
SqueezeBits · korea · 2025-07-03
At SqueezeBits we have been empowering developers to efficiently deploy complex AI models while minimizing performance trade-offs with OwLite toolkit. With OwLite v2.5, we're excited to announce official support for Qualcomm Neural Network...
High signal Matched: performance
SkyPilot · open-source · 2025-07-02
Configure high-performance networking on different cloud providers and managed infrastructure with SkyPilot's unified network tier abstraction.
High signal Matched: performance, cloud
Modal · inference-infra · 2025-07-02
There's only one playbook for improving generative applications. Read about it here.
High signal Matched: inference, evals
BAIR · research · 2025-07-01
High signal Matched: inference, generation, performance, model, paper, arxiv, evaluation, training, evaluate, agent, agents
SqueezeBits · korea · 2025-07-01
SqueezeBits has partnered with Intel to make Gaudi NPUs more usable in practice. We optimized LLMs and diffusion models for Gaudi-2 and created yetter, a generative AI API service.
High signal Matched: api
llm-d · open-source · 2025-06-25
Help shape llm-d's future: Take our 5-minute community survey, subscribe to our YouTube channel, and access exclusive resources for LLM serving innovation.
High signal Matched: serving
Modal · inference-infra · 2025-06-18
Price, performance, and control: pick three.
High signal Matched: performance
Hugging Face · open-source · 2025-06-16
No feed summary available yet.
High signal Matched: inference
Hugging Face · open-source · 2025-06-12
No feed summary available yet.
High signal Matched: performance
Hugging Face · open-source · 2025-06-12
No feed summary available yet.
High signal Matched: kernel
Hugging Face · open-source · 2025-06-12
No feed summary available yet.
High signal Matched: inference
Together AI · inference-infra · 2025-06-11
No feed summary available yet.
High signal Matched: cost, introducing, api
Hugging Face · open-source · 2025-06-11
No feed summary available yet.
High signal Matched: introducing, training
SqueezeBits · korea · 2025-06-10
SqueezeBits at Japan IT Week Spring 2025 in Tokyo: AI model compression demos, OwLite and Fits on Chips introductions, Japan market entry experiences, and team stories from the frontline.
High signal Matched: model
Modular · inference-infra · 2025-06-10
Introducing Mammoth: Enterprise-Scale GenAI Deployments Made Simple
High signal Matched: introducing
Modular · inference-infra · 2025-06-10
Modular + AMD: Unleashing AI performance on AMD GPUs
High signal Matched: performance
Modal · inference-infra · 2025-06-09
We've released v1.0 of the Modal client, marking a new milestone of maturity and stability for our platform.
High signal Matched: introducing
Hugging Face · open-source · 2025-06-06
No feed summary available yet.
High signal Matched: evaluation, agents
Together AI · inference-infra · 2025-06-05
No feed summary available yet.
High signal Matched: model
Hugging Face · open-source · 2025-06-04
No feed summary available yet.
High signal Matched: kv cache
Hugging Face · open-source · 2025-06-04
No feed summary available yet.
High signal Matched: generation
llm-d · open-source · 2025-06-03
llm-d hits 1000 GitHub stars! Week 1-2 round-up covers KVTransfer Protocol, InferenceModel API updates, and community resources for LLM inference developers.
High signal Matched: inference, api
Hugging Face · open-source · 2025-06-03
No feed summary available yet.
High signal Matched: model
Hugging Face · open-source · 2025-06-03
No feed summary available yet.
High signal Matched: gpu
Modal · inference-infra · 2025-05-30
We’re excited to be making Nvidia B200 and H200 GPUs available on Modal starting today!
High signal Matched: h200, b200, introducing
Modular · inference-infra · 2025-05-29
Modverse #48: Modular Platform 25.3, MAX AI Kernels, and the Modular GPU Kernel Hackathon
High signal Matched: kernel, gpu
Modal · inference-infra · 2025-05-22
Modal Batch is a new interface backed by a new durable queue system built specifically to make job processing easy, scalable, and fault-tolerant.
High signal Matched: introducing
Replicate · inference-infra · 2025-05-22
Google's flagship image generation model, Imagen 4, is now available for you to try on Replicate. Create images with fine detail, versatile styles, and improved typography.
High signal Matched: generation, model
AIBrix · open-source · 2025-05-22
AIBrix is a composable, cloud-native AI infrastructure toolkit designed to power scalable and cost-effective large language model (LLM) inference. As production demands for memory-efficient and latency-aware LLM services continue to grow,...
High signal Matched: inference, prefix cache, latency, cost, release, model, cloud
Hugging Face · open-source · 2025-05-21
No feed summary available yet.
High signal Matched: performance
llm-d · open-source · 2025-05-20
Introducing llm-d: Kubernetes-native distributed LLM inference with KV-cache routing, disaggregated serving, and SOTA performance per dollar. Built on vLLM.
High signal Matched: inference, serving, distributed, performance, introducing, sota
SqueezeBits · korea · 2025-05-20
This article describes experimental results for the quantized Vision Transformer model and its variants with OwLite.
High signal Matched: model, quantized
llm-d · open-source · 2025-05-20
Red Hat launches llm-d: Open source distributed AI inference platform backed by NVIDIA, Google Cloud, IBM. Scale generative AI with intelligent routing on Kubernetes.
High signal Matched: inference, distributed, release, cloud, open source
Modular · inference-infra · 2025-05-20
Modular GPU Kernel Hackathon Highlights: Innovation, Community, & Mojo🔥
High signal Matched: kernel, gpu
Together AI · inference-infra · 2025-05-20
No feed summary available yet.
High signal Matched: introducing, sota
Replicate · inference-infra · 2025-05-16
NVIDIA H100 GPUs are here, with better performance and lower cost.
High signal Matched: performance, cost, h100
Hugging Face · open-source · 2025-05-15
No feed summary available yet.
High signal Matched: model
Hugging Face · open-source · 2025-05-14
No feed summary available yet.
High signal Matched: model
Hugging Face · open-source · 2025-05-13
No feed summary available yet.
High signal Matched: inference
Together AI · inference-infra · 2025-05-12
No feed summary available yet.
High signal Matched: decoding, speculative decoding
Nota AI · korea · 2025-05-08
Jaewoo Song, Software Engineer, Nota AI. Summary: This study proposes an AI model preprocessing method for improved quantization accuracies on edge AI devices which do not support advanced quantization methods due to their limitat...
High signal Matched: performance, model, weights, research, quantization, int8, int4
Nota AI · korea · 2025-05-07
Jewon Lee | Ki-Ung Song | Seungmin Yang | Donguk Lim | Jaeyeon Kim | Wooksu Shin | Bo-Kyeong Kim | Tae-Ho Kim, EdgeFM Team, Nota AI; Yong Jae Lee, Ph.D., Associate Professor, UW-Madison. Summary: Our method, Trimmed-Llama, reduces t...
High signal Matched: inference, generation, kv cache, benchmark, performance, latency, model, weights, research, training, benchmarks, open-source
Modal · inference-infra · 2025-05-07
How we use an eighty-year-old algorithm to find arbitrages in the cloud market.
High signal Matched: cloud
SqueezeBits · korea · 2025-05-07
This article describes the experimental results of quantized YOLO models with OwLite.
High signal Matched: quantized
Together AI · inference-infra · 2025-05-05
No feed summary available yet.
High signal Matched: inference
Hugging Face · open-source · 2025-04-29
No feed summary available yet.
High signal Matched: introducing, quantization
Modal · inference-infra · 2025-04-24
Modal + Daily + Pipecat is the best-in-class infra stack for real-time inference pipelines.
High signal Matched: inference
Together AI · inference-infra · 2025-04-24
No feed summary available yet.
High signal Matched: blackwell
Modal · inference-infra · 2025-04-18
sync. is a research lab training foundational models to understand and manipulate humans in video. After outgrowing Google Colab, they partnered with Modal for efficient deployment, allowing rapid iteration and scaling to process over 100...
High signal Matched: research, training
Modular · inference-infra · 2025-04-17
Modverse #47: MAX 25.2 and an evening of GPU programming at Modular HQ
High signal Matched: gpu
Hugging Face · open-source · 2025-04-16
No feed summary available yet.
High signal Matched: prefill, performance
Hugging Face · open-source · 2025-04-16
No feed summary available yet.
High signal Matched: inference
Hugging Face · open-source · 2025-04-16
No feed summary available yet.
High signal Matched: introducing, evaluating, long-context
BAIR · research · 2025-04-11
Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated applications. However, as LLMs have improved, so have the attacks against them. Prompt injection attack is listed as the #1 threat by OWASP to LLM-integrated ap...
High signal Matched: cost, model, evaluation, training, dpo, fine-tuning, retrieval, api, sota
SqueezeBits · korea · 2025-04-11
Discover how OwLite simplifies AI model optimization with seamless integration and secure architecture.
High signal Matched: performance, model, quantization
BAIR · research · 2025-04-08
PLAID is a multimodal generative model that simultaneously generates protein 1D sequence and 3D structure, by learning the latent space of protein folding models. The awarding of the 2024 Nobel Prize to AlphaFold2 marks an important moment...
High signal Matched: inference, generation, cost, model, weights, research, training, retrieval
Nota AI · korea · 2025-04-08
Seul-Ki Yeom, Ph.D., Research Lead, Nota AI GmbH; Tae-Ho Kim, CTO & Co-Founder, Nota AI. Summary: Delivers real-time AI performance on edge devices such as smartphones, IoT devices, and embedded systems. Introduces a novel "Reus...
High signal Matched: inference, kernel, benchmark, performance, cost, introducing, model, paper, research, benchmarks
SkyPilot · open-source · 2025-04-08
Techniques to speed up checkpointing by 9.6x and how to easily achieve them in SkyPilot
High signal Matched: performance, model, cloud, checkpointing
Hugging Face · open-source · 2025-04-08
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2025-04-02
No feed summary available yet.
High signal Matched: performance
SqueezeBits · korea · 2025-04-02
This article discusses inference efficiency when running the FLUX.1 models on Intel Gaudi-2 hardware.
High signal Matched: inference
Hugging Face · open-source · 2025-03-28
No feed summary available yet.
High signal Matched: inference
Modular · inference-infra · 2025-03-26
What about Triton and Python eDSLs? (Democratizing AI Compute, Part 7)
High signal Matched: triton
SqueezeBits · korea · 2025-03-26
With TensorRT-LLM now open source, we can finally take a deep dive into the secret sauce behind its impressive performance.
High signal Matched: performance, open source
Modular · inference-infra · 2025-03-25
MAX 25.2: Unleash the power of your H200's–without CUDA!
High signal Matched: cuda, h200
Hugging Face · open-source · 2025-03-24
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2025-03-21
No feed summary available yet.
High signal Matched: inference
SkyPilot · open-source · 2025-03-20
How to accelerate distributed embedding generation? Use the "forgotten" regions.
High signal Matched: inference, generation, distributed
SkyPilot · open-source · 2025-03-11
Transforming SkyPilot into a scalable, multi-user platform.
High signal Matched: introducing
AIBrix · open-source · 2025-03-10
This blog post shows how to deploy DeepSeek R1 using AIBrix. DeepSeek-R1 demonstrates remarkable proficiency in reasoning tasks through a step-by-step training process. It features 671B total parameters with 37B active parameters, and 128k...
High signal Matched: inference, distributed, benchmark, model, weights, training, context length
Hugging Face · open-source · 2025-03-07
No feed summary available yet.
High signal Matched: inference
Modular · inference-infra · 2025-03-05
What about OpenCL and CUDA C++ alternatives? (Democratizing AI Compute, Part 5)
High signal Matched: cuda
Replicate · inference-infra · 2025-03-05
Wan2.1 is the most capable open-source video generation model, producing coherent and high-quality outputs. Learn how to run it in the cloud with a single line of code.
High signal Matched: generation, model, cloud, api, open-source
SkyPilot · open-source · 2025-03-05
SkyPilot uses the venerable SQLite for state management. SQLite can handle millions of QPS, and terabytes of data. However, our efforts to scale our Managed Jobs feature ran up against the one downfall of SQLite: many concurrent writers. S...
High signal Matched: qps
Hugging Face · open-source · 2025-02-27
No feed summary available yet.
High signal Matched: model
SqueezeBits · korea · 2025-02-27
This article introduces Fits on Chips, an LLMOps toolkit for performance evaluation.
High signal Matched: performance, evaluation
SkyPilot · open-source · 2025-02-26
DeepSeek R1 showed great reasoning capability when it was first released. In this blog post, we detail our learnings in using DeepSeek R1 to build a Retrieval-Augmented Generation (RAG) system, tailored for legal documents. We choose l...
High signal Matched: generation, research, rag, retrieval-augmented generation, retrieval
Nota AI · korea · 2025-02-25
Hancheol Park, Ph.D., AI Research Engineer, Nota AI; Geonmin Kim, Ph.D., AI Research Engineer, Nota AI; Jaeyeon Kim, AI Research Engineer, Nota AI. Summary: In this study, we propose a method for determining whether given multilingual...
High signal Matched: generation, performance, model, paper, research, training, fine-tuning
Modal · inference-infra · 2025-02-24
A guide to maximizing the utilization of GPUs, from cloud allocations to FLOP/s.
High signal Matched: gpu, cloud
Hugging Face · open-source · 2025-02-24
No feed summary available yet.
High signal Matched: inference, decoding
Modal · inference-infra · 2025-02-21
GPU documentation for the people, now by the people.
High signal Matched: gpu
AIBrix · open-source · 2025-02-21
Open-source large language models (LLMs) like LLaMA, DeepSeek, Qwen, and Mistral have surged in popularity, offering enterprises greater flexibility, cost savings, and control over their AI deployments. These models have empowered organ...
High signal Matched: inference, generation, latency, cost, introducing, model, agents, open-source
Modular · inference-infra · 2025-02-20
CUDA is the incumbent, but is it any good? (Democratizing AI Compute, Part 4)
High signal Matched: cuda
AIBrix · open-source · 2025-02-19
We’re excited to announce the v0.2.0 release of AIBrix! Building on feedback from v0.1.0 production adoption and user interest, this release introduces several new features to enhance performance and usability. Extend the vLLM Prefix...
High signal Matched: inference, serving, prefill, throughput, distributed, multi-node, kv cache, prefix cache, performance, cost, gpu, accelerator, release, agent
Hugging Face · open-source · 2025-02-18
No feed summary available yet.
High signal Matched: inference, introducing
Modular · inference-infra · 2025-02-18
MAX 25.1 - Introducing MAX Builds
High signal Matched: introducing
SqueezeBits · korea · 2025-02-17
A brief review of the research paper from our team, published at ICML 2024.
High signal Matched: verification, paper, research
Modular · inference-infra · 2025-02-12
How did CUDA succeed? (Democratizing AI Compute, Part 3)
High signal Matched: cuda
Hugging Face · open-source · 2025-02-12
No feed summary available yet.
High signal Matched: generation
Nota AI · korea · 2025-02-10
Hancheol Park, Ph.D., AI Research Engineer, Nota AI; Geonmin Kim, Ph.D., AI Research Engineer, Nota AI. Summary: In this study, we present a method for detecting ambiguous samples in natural language understanding (NLU) tasks using...
High signal Matched: performance, paper, research, evaluation, training, evaluate
SqueezeBits · korea · 2025-02-10
This article is about an open-source library for direct conversion of PyTorch models to TensorRT-LLM.
High signal Matched: open-source
Modular · inference-infra · 2025-02-06
Paged Attention & Prefix Caching Now Available in MAX Serve
High signal Matched: serve, paged attention
Modular · inference-infra · 2025-02-05
What exactly is “CUDA”? (Democratizing AI Compute, Part 2)
High signal Matched: cuda
Hugging Face · open-source · 2025-02-04
No feed summary available yet.
High signal Matched: benchmark, agent
Modular · inference-infra · 2025-01-30
Agentic Building Blocks: Creating AI Agents with MAX Serve and OpenAI Function Calling
High signal Matched: serve, agents, agentic, function calling
Modal · inference-infra · 2025-01-28
Serializing container state to disk for aggressive cold start optimization.
High signal Matched: checkpoint
Hugging Face · open-source · 2025-01-28
No feed summary available yet.
High signal Matched: inference
Hugging Face · open-source · 2025-01-27
No feed summary available yet.
High signal Matched: generation
Hugging Face · open-source · 2025-01-23
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2025-01-22
No feed summary available yet.
High signal Matched: model
SqueezeBits · korea · 2025-01-20
This article provides a comparative analysis of serving vision-language models on vLLM and TensorRT-LLM.
High signal Matched: serving
Hugging Face · open-source · 2025-01-16
No feed summary available yet.
High signal Matched: inference, generation, introducing
Hugging Face · open-source · 2025-01-16
No feed summary available yet.
High signal Matched: model
SqueezeBits · korea · 2025-01-13
In this blog series, we thoroughly evaluate Intel's AI accelerator, the Gaudi series, focusing on its performance, features, and usability.
High signal Matched: performance, accelerator, fp8, quantization, evaluate
Hugging Face · open-source · 2025-01-09
No feed summary available yet.
High signal Matched: performance, leaderboard
SqueezeBits · korea · 2025-01-06
In this blog series, we thoroughly evaluate Intel's AI accelerator, the Gaudi series, focusing on its performance, features, and usability.
High signal Matched: performance, accelerator, evaluation, evaluate
Hugging Face · open-source · 2024-12-31
No feed summary available yet.
High signal Matched: introducing, agents
Hugging Face · open-source · 2024-12-24
No feed summary available yet.
High signal Matched: gpu
Hugging Face · open-source · 2024-12-23
No feed summary available yet.
High signal Matched: generation, model
Modal · inference-infra · 2024-12-19
NVIDIA L40S GPUs available on Modal now!
High signal Matched: introducing
Hugging Face · open-source · 2024-12-19
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2024-12-18
No feed summary available yet.
High signal Matched: inference, model
Modular · inference-infra · 2024-12-17
Introducing MAX 24.6: A GPU Native Generative AI Platform
High signal Matched: gpu, introducing
Modular · inference-infra · 2024-12-17
MAX GPU: State of the Art Throughput on a New GenAI platform
High signal Matched: throughput, gpu, state of the art
Hugging Face · open-source · 2024-12-17
No feed summary available yet.
High signal Matched: performance, model
Modular · inference-infra · 2024-12-17
Build a Continuous Chat Interface with Llama 3 and MAX Serve
High signal Matched: serve
Hugging Face · open-source · 2024-12-16
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2024-12-10
No feed summary available yet.
High signal Matched: research, open source
SqueezeBits · korea · 2024-12-09
This article provides a comparative analysis of speculative decoding.
High signal Matched: decoding, speculative decoding
Hugging Face · open-source · 2024-12-09
No feed summary available yet.
High signal Matched: bedrock
Hugging Face · open-source · 2024-12-09
No feed summary available yet.
High signal Matched: generation
SqueezeBits · korea · 2024-12-05
This article provides a comparative analysis of multi-LoRA serving capabilities of vLLM and TensorRT-LLM frameworks.
High signal Matched: serving, lora
Hugging Face · open-source · 2024-12-04
No feed summary available yet.
High signal Matched: benchmark, evaluation, leaderboard
Hugging Face · open-source · 2024-12-03
No feed summary available yet.
High signal Matched: performance
SqueezeBits · korea · 2024-12-03
In this blog series, we thoroughly evaluate Intel's AI accelerator, the Gaudi series, focusing on its performance, features, and usability.
High signal Matched: performance, accelerator, evaluation, evaluate
Hugging Face · open-source · 2024-11-26
No feed summary available yet.
High signal Matched: model
Modal · inference-infra · 2024-11-24
Announcing Modal's newest cloud partnership.
High signal Matched: release, cloud
SqueezeBits · korea · 2024-11-21
In this blog series, we thoroughly evaluate Intel's AI accelerator, the Gaudi series, focusing on its performance, features, and usability.
High signal Matched: performance, accelerator, evaluate
Hugging Face · open-source · 2024-11-20
No feed summary available yet.
High signal Matched: decoding, generation, speculative decoding
Hugging Face · open-source · 2024-11-20
No feed summary available yet.
High signal Matched: introducing, leaderboard
SqueezeBits · korea · 2024-11-18
This article provides a comparative analysis of the effects of KV cache quantization on vLLM and TensorRT-LLM frameworks.
High signal Matched: kv cache, quantization
Replicate · inference-infra · 2024-11-15
NVIDIA L40S GPUs are here, with better performance and lower cost.
High signal Matched: performance, cost
AIBrix · open-source · 2024-11-13
In recent years, large language models (LLMs) have revolutionized AI applications, powering solutions in areas like chatbots, automated content generation, and advanced recommendation engines. Services like OpenAI’s have gained significant...
High signal Matched: decoding, prefill, generation, kv cache, performance, cost, gpu, release, introducing, cloud, open-source
SqueezeBits · korea · 2024-11-11
This article provides a comparative analysis of the effects of weight-activation quantization on vLLM and TensorRT-LLM frameworks.
High signal Matched: quantization
Hugging Face · open-source · 2024-11-04
No feed summary available yet.
High signal Matched: evaluation, fine-tuning
SkyPilot · open-source · 2024-11-01
For AI teams: How do you efficiently spend $1M+ cloud credits across 3+ clouds?
High signal Matched: cloud
SqueezeBits · korea · 2024-11-01
This article provides a comparative analysis of the effects of weight-only quantization on vLLM and TensorRT-LLM frameworks.
High signal Matched: quantization
SqueezeBits · korea · 2024-10-30
This article provides a comparative analysis of vLLM and TensorRT-LLM frameworks, focusing on performance with fixed and dynamic datasets.
High signal Matched: performance
Hugging Face · open-source · 2024-10-29
No feed summary available yet.
High signal Matched: decoding, generation, model
Modal · inference-infra · 2024-10-25
Why Modal is obsessed with serverless AI infrastructure
High signal Matched: gpu
Hugging Face · open-source · 2024-10-23
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2024-10-23
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2024-10-22
No feed summary available yet.
High signal Matched: model
Hugging Face · open-source · 2024-10-22
No feed summary available yet.
High signal Matched: generation
Replicate · inference-infra · 2024-10-22
We've partnered with Ideogram to bring their inpainting model to Replicate's API.
High signal Matched: model, api
SqueezeBits · korea · 2024-10-18
This article provides a comparative analysis of vLLM and TensorRT-LLM frameworks with various sampling methods.
High signal Matched: performance
SqueezeBits · korea · 2024-10-11
This article provides a comparative analysis of vLLM and TensorRT-LLM frameworks, focusing on batching configurations and thoroughly examining the effects of maximum batch size and maximum number of tokens.
High signal Matched: serving
Hugging Face · open-source · 2024-10-10
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2024-10-08
No feed summary available yet.
High signal Matched: generation
Hugging Face · open-source · 2024-10-04
No feed summary available yet.
High signal Matched: introducing, leaderboard
Replicate · inference-infra · 2024-10-03
Black Forest Labs continue to push boundaries with their latest release of FLUX.1 image generation model.
High signal Matched: generation, release, model
SqueezeBits · korea · 2024-10-01
This article provides a comparative analysis of vLLM and TensorRT-LLM frameworks for serving LLMs, evaluating their performance based on key metrics like throughput, TTFT, and TPOT to offer insights for practitioners in optimizing LLM depl...
High signal Matched: serving, throughput, performance, ttft, tpot, evaluation, evaluating
Hugging Face · open-source · 2024-09-17
No feed summary available yet.
High signal Matched: introducing
Modal · inference-infra · 2024-09-16
Learn how we used our new dynamic batching feature to improve throughput and reduce inference costs for the Whisper model with a single line of code!
High signal Matched: inference, throughput, model
Hugging Face · open-source · 2024-09-16
No feed summary available yet.
High signal Matched: introducing
SkyPilot · open-source · 2024-09-16
With last week’s Pixtral release, multimodal large language models (LLMs) like OpenAI’s GPT-4o, Google’s Gemini Pro, and Pixtral are making significant strides. These models are not only able to generate text from images...
High signal Matched: release
Modular · inference-infra · 2024-09-13
MAX 24.5 - With SOTA CPU Performance for Llama 3.1
High signal Matched: performance, sota
Modal · inference-infra · 2024-09-10
A step-by-step guide to building a scalable analytics stack using Modal, dlt, and dbt for efficient data loading, transformation, and deployment.
High signal Matched: cost
Hugging Face · open-source · 2024-08-19
No feed summary available yet.
High signal Matched: cloud
Hugging Face · open-source · 2024-08-12
No feed summary available yet.
High signal Matched: model
Modal · inference-infra · 2024-08-06
...and we're passing the savings to you. 15-30% price cuts on GPUs and CPUs.
High signal Matched: gpu
Hugging Face · open-source · 2024-08-06
No feed summary available yet.
High signal Matched: introducing
Modal · inference-infra · 2024-08-05
Scale up smaller open models with search and evaluation to match frontier capabilities.
High signal Matched: evaluation
Nota AI · korea · 2024-08-02
Jaeyeon Kim, Research Engineer, Nota AI; Geonmin Kim, Research Engineer, Nota AI; Hancheol Park, Team Lead of NetsPresso Application, Nota AI. Introduction: Recent large language models (LLMs) have demonstrated unprecedented performance...
High signal Matched: decoding, benchmark, performance, latency, tokens/sec, model, arxiv, research, technical report, evaluation, cloud, training, lora, benchmarks, leaderboard, open-source
Hugging Face · open-source · 2024-07-29
No feed summary available yet.
High signal Matched: inference
Hugging Face · open-source · 2024-07-25
No feed summary available yet.
High signal Matched: evaluation, fine-tuning
Replicate · inference-infra · 2024-07-23
Llama 3.1 405B is the most powerful open-source language model from Meta. Learn how to run it in the cloud with one line of code.
High signal Matched: model, cloud, api, open-source
Hugging Face · open-source · 2024-07-18
No feed summary available yet.
High signal Matched: serve, lora
SkyPilot · open-source · 2024-07-11
Develop, Train and Serve AI on Kubernetes with SkyPilot.
High signal Matched: serve
Modal · inference-infra · 2024-07-09
Welcome to another round of Modal Product Updates! Here's what's new this month.
High signal Matched: latency
Modular · inference-infra · 2024-07-09
Bring your own PyTorch model
High signal Matched: model
Hugging Face · open-source · 2024-07-09
No feed summary available yet.
High signal Matched: cloud
Hugging Face · open-source · 2024-07-03
No feed summary available yet.
High signal Matched: model
Hugging Face · open-source · 2024-07-01
No feed summary available yet.
High signal Matched: benchmark, agent
SqueezeBits · korea · 2024-06-26
Estimating the cost savings from model compression.
High signal Matched: cost, model
Hugging Face · open-source · 2024-06-25
No feed summary available yet.
High signal Matched: model
Modal · inference-infra · 2024-06-20
Isolate your tasks with Modal containers while using Airflow for orchestration.
High signal Matched: gpu
Hugging Face · open-source · 2024-06-18
No feed summary available yet.
High signal Matched: generation
Replicate · inference-infra · 2024-06-14
Create your own custom version of Stability's latest image generation model and run it on Replicate via the web or API.
High signal Matched: generation, model, api
Nota AI · korea · 2024-06-13
Jeongho Kim, Research Engineer, Nota AI. Summary: Online multi-camera system for efficient individual tracking. Accurate ID management with Cluster Self-Refinement (CSR). Improved performance with enhanced pose estimation. Intro...
High signal Matched: performance, model, paper, research, evaluation, leaderboard
Replicate · inference-infra · 2024-06-12
We'll soon support NVIDIA's H100 GPUs for predictions and training. Let us know if you want early access.
High signal Matched: h100, training
Replicate · inference-infra · 2024-06-12
Stable Diffusion 3 is the latest text-to-image model from Stability, with improved image quality, typography, prompt understanding, and resource efficiency. Learn how to run it in the cloud with one line of code.
High signal Matched: model, cloud, api
Hugging Face · open-source · 2024-06-07
No feed summary available yet.
High signal Matched: introducing, sagemaker
Modular · inference-infra · 2024-06-07
MAX 24.4 - Introducing quantization APIs and MAX on macOS
High signal Matched: introducing, quantization
Hugging Face · open-source · 2024-06-05
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2024-06-04
No feed summary available yet.
High signal Matched: generation
Hugging Face · open-source · 2024-05-29
No feed summary available yet.
High signal Matched: inference, generation
Modular · inference-infra · 2024-05-29
What ownership is really about: a mental model approach
High signal Matched: model
Hugging Face · open-source · 2024-05-24
No feed summary available yet.
High signal Matched: model
Hugging Face · open-source · 2024-05-24
No feed summary available yet.
High signal Matched: evaluation
Modal · inference-infra · 2024-05-21
How we fine-tuned a Stable Diffusion model on the Heroicons library to generate all the icons we could dream of.
High signal Matched: model, fine-tuning
Hugging Face · open-source · 2024-05-21
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2024-05-21
No feed summary available yet.
High signal Matched: gpu
Hugging Face · open-source · 2024-05-21
No feed summary available yet.
High signal Matched: cloud
Modal · inference-infra · 2024-05-20
Learn how Substack sped up their developer iteration cycles by moving ML training and deployment to Modal from AWS SageMaker.
High signal Matched: sagemaker, training
Hugging Face · open-source · 2024-05-16
No feed summary available yet.
High signal Matched: generation, quantization
Hugging Face · open-source · 2024-05-14
No feed summary available yet.
High signal Matched: model
Hugging Face · open-source · 2024-05-14
No feed summary available yet.
High signal Matched: introducing, leaderboard
Hugging Face · open-source · 2024-05-13
No feed summary available yet.
High signal Matched: introducing, agents
Modal · inference-infra · 2024-05-13
You can now specify which cloud region you would like to run your Functions in.
High signal Matched: introducing, cloud
Hugging Face · open-source · 2024-05-09
No feed summary available yet.
High signal Matched: cost, rag
Modal · inference-infra · 2024-05-07
Welcome to another round of Modal Product Updates! Here's what's new this month.
High signal Matched: cloud
Hugging Face · open-source · 2024-05-05
No feed summary available yet.
High signal Matched: introducing, leaderboard
Hugging Face · open-source · 2024-05-03
No feed summary available yet.
High signal Matched: performance, leaderboard
Modular · inference-infra · 2024-05-02
MAX 24.3 - Introducing MAX Engine Extensibility
High signal Matched: introducing
Hugging Face · open-source · 2024-05-01
No feed summary available yet.
High signal Matched: inference, decoding, speculative decoding
Hugging Face · open-source · 2024-04-29
No feed summary available yet.
High signal Matched: generation
SqueezeBits · korea · 2024-04-24
Clarifying the misunderstandings in AI model compression
High signal Matched: model
SqueezeBits · korea · 2024-04-23
The Blackwell GPU from GTC 2024 was astonishing. Analysis of the Nvidia GPU evolution & what it means for GPU users.
High signal Matched: gpu, blackwell
Hugging Face · open-source · 2024-04-23
No feed summary available yet.
High signal Matched: introducing, leaderboard
Replicate · inference-infra · 2024-04-23
Arctic is a new open-source language model from Snowflake. Learn how to run it in the cloud with one line of code.
High signal Matched: model, cloud, api, open-source
SqueezeBits · korea · 2024-04-19
Do I need to COMPRESS my AI model? : the short answer is “YES” — and here’s why.
High signal Matched: model
Replicate · inference-infra · 2024-04-18
Llama 3 is the latest language model from Meta. Learn how to run it in the cloud with one line of code.
High signal Matched: model, cloud, api
Hugging Face · open-source · 2024-04-16
No feed summary available yet.
High signal Matched: introducing, evaluation, leaderboard
SqueezeBits · korea · 2024-04-15
AI model compression for acceleration is essential. The question is HOW? Here are 4 key methodologies.
High signal Matched: model
Hugging Face · open-source · 2024-04-15
No feed summary available yet.
High signal Matched: introducing, model
Modular · inference-infra · 2024-04-10
Row-major vs. Column-major Matrices: A Performance Analysis in Mojo and NumPy
High signal Matched: performance
Hugging Face · open-source · 2024-04-10
No feed summary available yet.
High signal Matched: model
Hugging Face · open-source · 2024-04-09
No feed summary available yet.
High signal Matched: release
Hugging Face · open-source · 2024-04-04
No feed summary available yet.
High signal Matched: research
Hugging Face · open-source · 2024-04-03
No feed summary available yet.
High signal Matched: inference
Hugging Face · open-source · 2024-04-02
No feed summary available yet.
High signal Matched: inference, gpu
Hugging Face · open-source · 2024-03-21
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2024-03-20
No feed summary available yet.
High signal Matched: model, training
Hugging Face · open-source · 2024-03-18
No feed summary available yet.
High signal Matched: h100, cloud
Hugging Face · open-source · 2024-03-05
No feed summary available yet.
High signal Matched: introducing, model
Hugging Face · open-source · 2024-02-29
No feed summary available yet.
High signal Matched: generation, accelerator
Modal · inference-infra · 2024-02-27
Modal now supports WebSocket connections, enabling real-time, bidirectional data transfer between client and server.
High signal Matched: introducing
Hugging Face · open-source · 2024-02-23
No feed summary available yet.
High signal Matched: introducing, leaderboard
Modal · inference-infra · 2024-02-21
Find out how Suno uses Modal to scale inference and batch pre-processing to thousands of GPUs.
High signal Matched: inference, launch
SkyPilot · open-source · 2024-02-20
SkyServe: A simple, cost-efficient, multi-region/cloud library for serving GenAI models.
High signal Matched: serving, cost, introducing, cloud
Hugging Face · open-source · 2024-02-20
No feed summary available yet.
High signal Matched: introducing, evaluation, korean, leaderboard
Modal · inference-infra · 2024-02-06
We’re excited to be making Nvidia H100 GPUs available on Modal starting today!
High signal Matched: h100, introducing
Hugging Face · open-source · 2024-02-01
No feed summary available yet.
High signal Matched: inference, generation
Hugging Face · open-source · 2024-01-31
No feed summary available yet.
High signal Matched: introducing, leaderboard
Hugging Face · open-source · 2024-01-30
No feed summary available yet.
High signal Matched: decoding, speculative decoding
Replicate · inference-infra · 2024-01-30
Code Llama 70B is a powerful open-source code generation model. Learn how to run it in the cloud with one line of code.
High signal Matched: generation, cloud, api, open-source
Hugging Face · open-source · 2024-01-15
No feed summary available yet.
High signal Matched: inference
Hugging Face · open-source · 2024-01-04
No feed summary available yet.
High signal Matched: generation
SkyPilot · open-source · 2023-12-21
A tutorial for serving Mixtral 8x7B model with SkyPilot and SkyServe.
High signal Matched: serving, mixtral, cost, gpu, model
Hugging Face · open-source · 2023-12-20
No feed summary available yet.
High signal Matched: inference, decoding, speculative decoding
Hugging Face · open-source · 2023-12-11
No feed summary available yet.
High signal Matched: mixture of experts, mixtral, sota
Hugging Face · open-source · 2023-12-11
No feed summary available yet.
High signal Matched: mixture of experts
Hugging Face · open-source · 2023-12-05
No feed summary available yet.
High signal Matched: gpu
Hugging Face · open-source · 2023-12-05
No feed summary available yet.
High signal Matched: inference
Hugging Face · open-source · 2023-12-05
No feed summary available yet.
High signal Matched: inference, lora
Replicate · inference-infra · 2023-11-10
An interactive example showing how to embed text using a state-of-the-art embedding model that beats OpenAI's embeddings API on price and performance.
High signal Matched: performance, model, api, open-source
Hugging Face · open-source · 2023-11-07
No feed summary available yet.
High signal Matched: generation
Hugging Face · open-source · 2023-11-07
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2023-11-07
No feed summary available yet.
High signal Matched: performance, lora
Hugging Face · open-source · 2023-11-03
No feed summary available yet.
High signal Matched: introducing
Replicate · inference-infra · 2023-10-25
How to run a latent consistency model on your M1 or M2 Mac
High signal Matched: model
Hugging Face · open-source · 2023-10-24
No feed summary available yet.
High signal Matched: inference
Replicate · inference-infra · 2023-10-17
In this post we'll explore the basics of retrieval augmented generation by creating an example app that uses bge-large-en for embeddings, ChromaDB for vector store, and mistral-7b-instruct for language model generation.
High signal Matched: generation, model, retrieval augmented generation, retrieval
Modal · inference-infra · 2023-10-10
Modal Labs Announces Series A Financing Round, Securing $16 Million Investment to Launch Cloud-Based Infrastructure Platform, Build Towards End-to-End Enterprise Data Stack
High signal Matched: release, launch, cloud
Replicate · inference-infra · 2023-10-06
Mistral 7B is an open-source large language model. Learn what it's good at and how to run it in the cloud with one line of code.
High signal Matched: model, cloud, api, open-source
Hugging Face · open-source · 2023-10-03
No feed summary available yet.
High signal Matched: inference, tpu, cloud
Hugging Face · open-source · 2023-10-03
No feed summary available yet.
High signal Matched: performance
Hugging Face · open-source · 2023-10-02
No feed summary available yet.
High signal Matched: inference, api
SkyPilot · open-source · 2023-09-27
Covariant runs AI on the cloud using SkyPilot, delivering models 4x faster cost-effectively.
High signal Matched: cost, cloud
Hugging Face · open-source · 2023-09-26
No feed summary available yet.
High signal Matched: benchmark, sagemaker
Hugging Face · open-source · 2023-09-22
No feed summary available yet.
High signal Matched: inference
Hugging Face · open-source · 2023-09-13
No feed summary available yet.
High signal Matched: generation, introducing
Hugging Face · open-source · 2023-09-08
No feed summary available yet.
High signal Matched: generation
Hugging Face · open-source · 2023-09-01
No feed summary available yet.
High signal Matched: latency, sagemaker
Hugging Face · open-source · 2023-08-22
No feed summary available yet.
High signal Matched: introducing, model
Hugging Face · open-source · 2023-08-22
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2023-08-04
No feed summary available yet.
High signal Matched: inference
SkyPilot · open-source · 2023-08-02
An operational guide on finetuning Llama 2, ready for commercial use.
High signal Matched: cloud, finetuning
Hugging Face · open-source · 2023-08-01
No feed summary available yet.
High signal Matched: generation
Hugging Face · open-source · 2023-08-01
No feed summary available yet.
High signal Matched: weights
Replicate · inference-infra · 2023-07-27
Llama 2 is the first open source language model of the same caliber as OpenAI’s models. Learn how to run it in the cloud with one line of code.
High signal Matched: model, cloud, api, open source
Hugging Face · open-source · 2023-07-24
No feed summary available yet.
High signal Matched: introducing, agents
Replicate · inference-infra · 2023-07-19
A roundup of recent developments from the llamaverse following the second major release of Meta's open-source large language model.
High signal Matched: release, model, open-source
Hugging Face · open-source · 2023-07-17
No feed summary available yet.
High signal Matched: generation, open-source
Hugging Face · open-source · 2023-07-04
No feed summary available yet.
High signal Matched: inference
SkyPilot · open-source · 2023-06-29
SkyPilot makes the deployment and development of vLLM easy and fast on clouds.
High signal Matched: serving, cloud
Hugging Face · open-source · 2023-06-13
No feed summary available yet.
High signal Matched: gpu
Hugging Face · open-source · 2023-05-31
No feed summary available yet.
High signal Matched: inference, introducing, sagemaker
Hugging Face · open-source · 2023-05-31
No feed summary available yet.
High signal Matched: introducing
SkyPilot · open-source · 2023-05-30
Announcing SkyPilot 0.3: LLM support, new clouds, and enhanced production readiness.
High signal Matched: gpu
Replicate · inference-infra · 2023-05-26
Prompt engineering and training are often the first solutions we reach for to improve language model behavior, but they're not the only way.
High signal Matched: model, training
Hugging Face · open-source · 2023-05-24
No feed summary available yet.
High signal Matched: launch, model
Hugging Face · open-source · 2023-05-23
No feed summary available yet.
High signal Matched: generation
Hugging Face · open-source · 2023-05-15
No feed summary available yet.
High signal Matched: gpu, rocm
Hugging Face · open-source · 2023-05-15
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2023-05-11
No feed summary available yet.
High signal Matched: generation, latency
SkyPilot · open-source · 2023-05-02
Experience report from Salk Institute on how biologists use SkyPilot to conduct research on the cloud.
High signal Matched: research, cloud
Hugging Face · open-source · 2023-04-27
No feed summary available yet.
High signal Matched: model, training
Hugging Face · open-source · 2023-04-24
No feed summary available yet.
High signal Matched: introducing
Replicate · inference-infra · 2023-04-21
A roundup of recent developments from the world of open-source language models.
High signal Matched: model, open-source
Hugging Face · open-source · 2023-03-28
No feed summary available yet.
High signal Matched: inference, accelerator
Hugging Face · open-source · 2023-03-28
No feed summary available yet.
High signal Matched: inference
Replicate · inference-infra · 2023-03-23
No feed summary available yet.
High signal Matched: model, lora
SkyPilot · open-source · 2023-03-20
Want to host your own LLM Chatbot on any cloud of your choosing?
High signal Matched: cloud
Hugging Face · open-source · 2023-03-09
No feed summary available yet.
High signal Matched: gpu, rlhf, fine-tuning
Hugging Face · open-source · 2023-03-06
No feed summary available yet.
High signal Matched: kakao
Hugging Face · open-source · 2023-02-15
No feed summary available yet.
High signal Matched: generation
Hugging Face · open-source · 2023-02-15
No feed summary available yet.
High signal Matched: inference
Hugging Face · open-source · 2023-02-07
No feed summary available yet.
High signal Matched: introducing, agents
Replicate · inference-infra · 2023-02-07
It's like DreamBooth, but much faster. And you can run it in the cloud on Replicate.
High signal Matched: introducing, cloud, lora
Hugging Face · open-source · 2023-01-26
No feed summary available yet.
High signal Matched: generation
Hugging Face · open-source · 2023-01-20
No feed summary available yet.
High signal Matched: generation
Hugging Face · open-source · 2022-12-20
No feed summary available yet.
High signal Matched: model
Hugging Face · open-source · 2022-12-14
No feed summary available yet.
High signal Matched: inference, training
Hugging Face · open-source · 2022-11-21
No feed summary available yet.
High signal Matched: inference
Replicate · inference-infra · 2022-11-21
With just a handful of images and a single API call, you can train a model, publish it to Replicate, and run predictions on it in the cloud.
High signal Matched: model, cloud, api
Hugging Face · open-source · 2022-11-17
No feed summary available yet.
High signal Matched: arxiv
SkyPilot · open-source · 2022-11-16
Introducing SkyPilot.
High signal Matched: cost, introducing, cloud
Hugging Face · open-source · 2022-11-08
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2022-10-24
No feed summary available yet.
High signal Matched: model, evaluate, evaluating
Hugging Face · open-source · 2022-10-21
No feed summary available yet.
High signal Matched: distributed, training, distributed training
Hugging Face · open-source · 2022-10-19
No feed summary available yet.
High signal Matched: benchmark
Hugging Face · open-source · 2022-10-14
No feed summary available yet.
High signal Matched: inference
Hugging Face · open-source · 2022-10-12
No feed summary available yet.
High signal Matched: inference
Hugging Face · open-source · 2022-10-07
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2022-09-16
No feed summary available yet.
High signal Matched: inference
Hugging Face · open-source · 2022-09-07
No feed summary available yet.
High signal Matched: model
Replicate · inference-infra · 2022-08-31
How to run Stable Diffusion locally so you can hack on it
High signal Matched: gpu
Hugging Face · open-source · 2022-08-12
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2022-08-11
No feed summary available yet.
High signal Matched: serving
Hugging Face · open-source · 2022-08-03
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2022-08-01
No feed summary available yet.
High signal Matched: research
Hugging Face · open-source · 2022-07-28
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2022-07-27
No feed summary available yet.
High signal Matched: generation
Hugging Face · open-source · 2022-07-25
No feed summary available yet.
High signal Matched: serving
Hugging Face · open-source · 2022-07-16
No feed summary available yet.
High signal Matched: model
Hugging Face · open-source · 2022-07-12
No feed summary available yet.
High signal Matched: introducing, model
Replicate · inference-infra · 2022-07-05
Inspired by model cards, we've created templates for documenting models on Replicate.
High signal Matched: model
Hugging Face · open-source · 2022-06-28
No feed summary available yet.
High signal Matched: model, training
Hugging Face · open-source · 2022-06-28
No feed summary available yet.
High signal Matched: evaluation
Hugging Face · open-source · 2022-06-07
No feed summary available yet.
High signal Matched: model
Hugging Face · open-source · 2022-05-26
No feed summary available yet.
High signal Matched: launch
Hugging Face · open-source · 2022-05-25
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2022-05-19
No feed summary available yet.
High signal Matched: research
Hugging Face · open-source · 2022-05-10
No feed summary available yet.
High signal Matched: inference
Hugging Face · open-source · 2022-05-02
No feed summary available yet.
High signal Matched: model, training
Hugging Face · open-source · 2022-04-25
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2022-04-12
No feed summary available yet.
High signal Matched: model, training
Hugging Face · open-source · 2022-03-28
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2022-03-22
No feed summary available yet.
High signal Matched: research
Hugging Face · open-source · 2022-03-17
No feed summary available yet.
High signal Matched: model
Hugging Face · open-source · 2022-03-16
No feed summary available yet.
High signal Matched: inference
Hugging Face · open-source · 2022-03-11
No feed summary available yet.
High signal Matched: generation
Hugging Face · open-source · 2022-03-02
No feed summary available yet.
High signal Matched: model, state of the art
Hugging Face · open-source · 2022-01-13
No feed summary available yet.
High signal Matched: latency
Hugging Face · open-source · 2022-01-11
No feed summary available yet.
High signal Matched: inference, sagemaker
Hugging Face · open-source · 2021-12-15
No feed summary available yet.
High signal Matched: model
Hugging Face · open-source · 2021-12-02
No feed summary available yet.
High signal Matched: introducing, agents
Hugging Face · open-source · 2021-11-29
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2021-11-19
No feed summary available yet.
High signal Matched: distributed, fine-tuning
Hugging Face · open-source · 2021-11-04
No feed summary available yet.
High signal Matched: inference, model
Hugging Face · open-source · 2021-10-25
No feed summary available yet.
High signal Matched: model, training
Hugging Face · open-source · 2021-09-14
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2021-07-08
No feed summary available yet.
High signal Matched: sagemaker
Hugging Face · open-source · 2021-06-03
No feed summary available yet.
High signal Matched: inference, api
Hugging Face · open-source · 2021-04-20
No feed summary available yet.
High signal Matched: inference
Hugging Face · open-source · 2021-04-16
No feed summary available yet.
High signal Matched: introducing
Hugging Face · open-source · 2021-04-08
No feed summary available yet.
High signal Matched: distributed, sagemaker, training, distributed training
Hugging Face · open-source · 2021-03-23
No feed summary available yet.
High signal Matched: sagemaker
Hugging Face · open-source · 2021-03-18
No feed summary available yet.
High signal Matched: cloud
Hugging Face · open-source · 2021-02-10
No feed summary available yet.
High signal Matched: generation, retrieval augmented generation, retrieval
Hugging Face · open-source · 2021-01-18
No feed summary available yet.
High signal Matched: inference, api
Hugging Face · open-source · 2020-11-09
No feed summary available yet.
High signal Matched: model
Hugging Face · open-source · 2020-03-01
No feed summary available yet.
High signal Matched: decoding, generation
Hugging Face · open-source · 2020-02-14
No feed summary available yet.
High signal Matched: model
PyTorch Foundation · open-source · 2026-06-04
TL;DR: DeepSpeed now supports the Muon optimizer! Muon has gained great momentum, with significant adoption by frontier AI labs. One of those labs is Moonshot AI, which has adopted...
Watchlist Matched: none
Hugging Face · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
NVIDIA Dynamo · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
NVIDIA Dynamo · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
NVIDIA Dynamo · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
NVIDIA Dynamo · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
NVIDIA Dynamo · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
NVIDIA Dynamo · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
NVIDIA Dynamo · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
NVIDIA Dynamo · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
NVIDIA Dynamo · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
NVIDIA Dynamo · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
NVIDIA Dynamo · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
NVIDIA Dynamo · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
NVIDIA Dynamo · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
NVIDIA Dynamo · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Mooncake · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Mooncake · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Mooncake · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Mooncake · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Mooncake · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Mooncake · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Mooncake · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Perplexity Research · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Perplexity Research · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Perplexity Research · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Perplexity Research · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Perplexity Research · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Perplexity Research · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Perplexity Research · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Perplexity Research · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Perplexity Research · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
xLLM · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
xLLM · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
KubeAI · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
xLLM · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
xLLM · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
KubeAI · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
VESSL AI · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
VESSL AI · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Moreh · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Moreh · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Moreh · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Moreh · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
KubeAI · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
KubeAI · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
KubeAI · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
KubeAI · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
KubeAI · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
KubeAI · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
KubeAI · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
KubeAI · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
KubeAI · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
KubeAI · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
KubeAI · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
KubeAI · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
KubeAI · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
KubeAI · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
KubeAI · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
xLLM · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
xLLM · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
xLLM · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
xLLM · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
xLLM · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
xLLM · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
DigitalOcean AI/ML · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
DigitalOcean AI/ML · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
DigitalOcean AI/ML · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
DigitalOcean AI/ML · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
DigitalOcean AI/ML · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
DigitalOcean AI/ML · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Gcore · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Gcore · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Gcore · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Gcore · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Gcore · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Gcore · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Gcore · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Gcore · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Gcore · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Gcore · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Gcore · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Gcore · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Gcore · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Perplexity Research · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
TensorRT-LLM · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Prime Intellect · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Prime Intellect · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Prime Intellect · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Prime Intellect · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Prime Intellect · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Prime Intellect · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Prime Intellect · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: training
Prime Intellect · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Prime Intellect · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Prime Intellect · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Prime Intellect · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Prime Intellect · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Prime Intellect · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: agent
Prime Intellect · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: agentic
Prime Intellect · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: training, agents
Prime Intellect · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
TensorRT-LLM · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
TensorRT-LLM · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
TensorRT-LLM · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
TensorRT-LLM · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
TensorRT-LLM · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
TensorRT-LLM · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
TensorRT-LLM · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
TensorRT-LLM · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
TensorRT-LLM · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
TensorRT-LLM · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
TensorRT-LLM · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
TensorRT-LLM · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
TensorRT-LLM · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
TensorRT-LLM · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: lora
TensorRT-LLM · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
OpenAI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
OpenAI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
OpenAI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
OpenAI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
OpenAI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
OpenAI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
OpenAI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
OpenAI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
OpenAI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
BentoML · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
CoreWeave · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: performance
Cerebrium · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Cerebrium · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Runpod · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Runpod · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Runpod · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Runpod · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: open-source
Runpod · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: agents
Runpod · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Runpod · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Runpod · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Runpod · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Runpod · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Runpod · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Cerebrium · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
CoreWeave · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
CoreWeave · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
CoreWeave · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
CoreWeave · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
CoreWeave · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
CoreWeave · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
CoreWeave · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
CoreWeave · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
CoreWeave · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
CoreWeave · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
CoreWeave · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
CoreWeave · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Cerebrium · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Cerebrium · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Cerebrium · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Cerebrium · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Cerebrium · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Cerebrium · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Cerebrium · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Runpod · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Nebius · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Nebius · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Nebius · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: agentic
Nebius · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Nebius · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Nebius · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: agents
Nebius · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Nebius · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Nebius · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Nebius · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Crusoe · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Crusoe · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Crusoe · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Crusoe · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Crusoe · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Vast.ai · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: sdk
Vast.ai · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: api
Vast.ai · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Vast.ai · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Vast.ai · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Vast.ai · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Vast.ai · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Vast.ai · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Vast.ai · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Vast.ai · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
LMSYS · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
LMSYS · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
FriendliAI · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
FriendliAI · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: agentic
FriendliAI · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
FriendliAI · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: api
FriendliAI · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
FriendliAI · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
FriendliAI · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
FriendliAI · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
FriendliAI · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
FriendliAI · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
FriendliAI · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: agents
FriendliAI · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
FuriosaAI · hardware · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
FuriosaAI · hardware · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
FuriosaAI · hardware · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
FuriosaAI · hardware · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
FuriosaAI · hardware · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
FuriosaAI · hardware · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
FuriosaAI · hardware · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
FuriosaAI · hardware · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
LMSYS · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
FuriosaAI · hardware · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
FuriosaAI · hardware · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Fireworks AI · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Fireworks AI · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: training
Fireworks AI · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Fireworks AI · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: training
Fireworks AI · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Fireworks AI · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Fireworks AI · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Fireworks AI · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Fireworks AI · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Fireworks AI · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: fine-tuning
Fireworks AI · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: api
Baseten · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Baseten · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Baseten · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Baseten · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Baseten · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Baseten · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Baseten · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Baseten · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Baseten · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Baseten · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
LMSYS · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
LMSYS · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Moonshot AI Kimi · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: api
Moonshot AI Kimi · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Moonshot AI Kimi · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Moonshot AI Kimi · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: agents
Moonshot AI Kimi · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Moonshot AI Kimi · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Moonshot AI Kimi · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Moonshot AI Kimi · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: agentic
Moonshot AI Kimi · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
MiniMax · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
MiniMax · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
MiniMax · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
MiniMax · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
MiniMax · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
MiniMax · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
MiniMax · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
MiniMax · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
MiniMax · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
MiniMax · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
MiniMax · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
MiniMax · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
MiniMax · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Z.AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Z.AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Z.AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Z.AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Mistral AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Mistral AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Mistral AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: agents
Mistral AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: evaluate
Mistral AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: agent
Mistral AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: agents
Mistral AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Mistral AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: api
Mistral AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Mistral AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Mistral AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Mistral AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Mistral AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Mistral AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Mistral AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Mistral AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Mistral AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Anthropic · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Anthropic · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Anthropic · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Anthropic · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Anthropic · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: agents
Anthropic · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Anthropic · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: agents
Anthropic · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Anthropic · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Anthropic · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: eval
Anthropic · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: agentic
Anthropic · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Anthropic · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Anthropic · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: evals
Anthropic · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
xAI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
xAI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
xAI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
xAI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
xAI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
xAI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
xAI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
xAI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
xAI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
xAI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
xAI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
xAI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: api
xAI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
xAI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
xAI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
xAI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: api
Groq · hardware · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Groq · hardware · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Groq · hardware · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Groq · hardware · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Groq · hardware · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Groq · hardware · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Groq · hardware · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Groq · hardware · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Groq · hardware · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Groq · hardware · 2026-06-03
No feed summary available yet.
Watchlist Matched: api
Groq · hardware · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Anyscale · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Anyscale · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Anyscale · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Anyscale · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Anyscale · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: training
Anyscale · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Anyscale · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Anyscale · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Anyscale · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Cerebras · hardware · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Cerebras · hardware · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Cerebras · hardware · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Cerebras · hardware · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Cerebras · hardware · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Cerebras · hardware · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Cerebras · hardware · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Cerebras · hardware · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Cerebras · hardware · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Cerebras · hardware · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Cohere · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Cohere · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Cohere · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Cohere · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: retrieval
Cohere · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Cohere · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Cohere · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Cohere · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Cohere · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Cohere · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Liquid AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Liquid AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Liquid AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Liquid AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Liquid AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Liquid AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Liquid AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Liquid AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Liquid AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Liquid AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Liquid AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Liquid AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Liquid AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Liquid AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Liquid AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Liquid AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Liquid AI · model-lab · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
NAVER D2 · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
NAVER D2 · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
NAVER D2 · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
NAVER D2 · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
NAVER D2 · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
NAVER D2 · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Kakao Tech · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Kakao Tech · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Kakao Tech · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Kakao Tech · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Kakao Tech · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Kakao Tech · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Kakao Tech · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Kakao Tech · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Kakao Tech · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Kakao Tech · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Kakao Tech · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Kakao Tech · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Kakao Tech · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Kakao Tech · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Kakao Tech · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Upstage · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Upstage · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Upstage · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Upstage · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Upstage · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Upstage · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Upstage · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
LG AI Research · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
LG AI Research · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
LG AI Research · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
LG AI Research · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
LG AI Research · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
LG AI Research · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
LG AI Research · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
LG AI Research · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
LG AI Research · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
LG AI Research · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
LG AI Research · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
LG AI Research · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
LG AI Research · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
LG AI Research · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
LG AI Research · korea · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Stanford CRFM · research · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Stanford CRFM · research · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Stanford CRFM · research · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Stanford CRFM · research · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Stanford CRFM · research · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Stanford CRFM · research · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Stanford CRFM · research · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Stanford CRFM · research · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Stanford CRFM · research · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Stanford CRFM · research · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Stanford CRFM · research · 2026-06-03
No feed summary available yet.
Watchlist Matched: long context
Stanford CRFM · research · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Stanford CRFM · research · 2026-06-03
No feed summary available yet.
Watchlist Matched: agent
Stanford CRFM · research · 2026-06-03
No feed summary available yet.
Watchlist Matched: evaluating
Stanford CRFM · research · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Stanford CRFM · research · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Stanford CRFM · research · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Stanford CRFM · research · 2026-06-03
No feed summary available yet.
Watchlist Matched: leaderboard
Fireworks AI · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: training
Fireworks AI · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Anyscale · inference-infra · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
GMI Cloud · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
GMI Cloud · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
GMI Cloud · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
GMI Cloud · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
GMI Cloud · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
GMI Cloud · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
LightSeek Foundation · research · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Databricks AI · big-tech · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Databricks AI · big-tech · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Databricks AI · big-tech · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Databricks AI · big-tech · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Databricks AI · big-tech · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Databricks AI · big-tech · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Databricks AI · big-tech · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Databricks AI · big-tech · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Databricks AI · big-tech · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Databricks AI · big-tech · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Databricks AI · big-tech · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Databricks AI · big-tech · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Databricks AI · big-tech · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Databricks AI · big-tech · 2026-06-03
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2026-06-03
No feed summary available yet.
Watchlist Matched: mcp
Hugging Face · open-source · 2026-06-02
No feed summary available yet.
Watchlist Matched: agents, computer use
NVIDIA Technical Blog · hardware · 2026-06-02
AI agents are changing how you interact with your PC. Creators, developers, and AI enthusiasts are already using these agents extensively to assist with...
Watchlist Matched: agents
AWS Machine Learning Blog · cloud · 2026-06-02
This post demonstrates how to implement Open Authorization (OAuth) Code flow as an inbound authorization mechanism for MCP servers hosted on Amazon Bedrock AgentCore Gateway. By the end of this guide, you will have a production-ready setup...
Watchlist Matched: bedrock, mcp
NVIDIA Technical Blog · hardware · 2026-06-02
As AI agents move from the digital world to the physical environment, they can readily use NVIDIA Jetson to accelerate real-world deployment with optimized...
Watchlist Matched: agents, agentic
Cloudflare Blog · cloud · 2026-06-02
We investigated why firmware updates were causing our core servers to take four hours to reboot. By diving into UEFI data structures and iPXE automation, we eliminated unnecessary timeouts and cut boot times back down to minutes.
Watchlist Matched: none
AWS Machine Learning Blog · cloud · 2026-06-02
In this post, we walk through a practical implementation using KDB-X MCP server integration with Amazon Quick, demonstrating how traders and analysts can ask questions using conversational language and receive actionable insights from data...
Watchlist Matched: performance, mcp
Hugging Face · open-source · 2026-06-01
No feed summary available yet.
Watchlist Matched: agent
NVIDIA Technical Blog · hardware · 2026-06-01
Developing autonomous vehicle (AV) policies requires bridging an important gap between training and deployment. Vision-language-action (VLA) models that can...
Watchlist Matched: training
NVIDIA Technical Blog · hardware · 2026-06-01
Physical AI systems must understand the real world before they can act within it. Robots, autonomous vehicles, and smart spaces need to understand what's...
Watchlist Matched: none
NVIDIA Technical Blog · hardware · 2026-06-01
The AI era is driving a new class of infrastructure: AI factories that transform data into intelligence for autonomous AI agents operating at unprecedented...
Watchlist Matched: agents, agentic
NVIDIA Technical Blog · hardware · 2026-06-01
AI is now essential infrastructure, powered by AI factories that generate intelligence in the form of tokens. As demand grows, these factories must scale...
Watchlist Matched: none
Modal · inference-infra · 2026-06-01
What we've seen helping teams run Reinforcement Learning at scale on Modal. Plus an open-source library to skip the scaffolding.
Watchlist Matched: open-source
Sakana AI · model-lab · 2026-06-01
No feed summary available yet.
Watchlist Matched: none
Modular · inference-infra · 2026-05-29
Three trends from MLSys 2026
Watchlist Matched: none
Hugging Face · open-source · 2026-05-29
No feed summary available yet.
Watchlist Matched: none
Microsoft Research · big-tech · 2026-05-29
Data Formulator introduces AI-powered analytics for enterprise data workflows. Data teams can easily bring enterprise data into an AI-ready workspace where users can explore, analyze, and visualize data with AI agents to turn raw data into...
Watchlist Matched: research, agents
Sakana AI · model-lab · 2026-05-29
No feed summary available yet.
Watchlist Matched: none
Cloudflare Blog · cloud · 2026-05-28
Here’s how we built Town Lake, Cloudflare's unified analytics platform, alongside Skipper, an internal AI agent running on top of it.
Watchlist Matched: agent
SqueezeBits · korea · 2026-05-28
A wrap-up of 8 weeks of online study sessions, and a look at how SqueezeBits works to sustain and grow the AI compression community.
Watchlist Matched: none
Cloudflare Blog · cloud · 2026-05-28
Cloudflare Radar data confirms early indications of a partial Internet restoration in Iran, nearly three months after the shutdown began. Traffic spikes and DNS queries have risen, but network activity is currently just 40% of pre-shutdown...
Watchlist Matched: none
Google Research · big-tech · 2026-05-28
Security, Privacy and Abuse Prevention
Watchlist Matched: none
Microsoft Research · big-tech · 2026-05-28
Understanding AI as an extension of human intelligence—not a replacement for it—offers a more grounded path for building trustworthy AI systems. The post Extending Human Intelligence Through AI appeared first on Microsoft Research.
Watchlist Matched: research
Sakana AI · model-lab · 2026-05-28
No feed summary available yet.
Watchlist Matched: training
Hugging Face · open-source · 2026-05-27
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2026-05-27
No feed summary available yet.
Watchlist Matched: none
LY Corporation Tech Blog · korea · 2026-05-26
Hi, I'm Jeongwoo, a security platform engineer at LY Corporation, developing and operating Athenz. In ...
Watchlist Matched: agent
Hugging Face · open-source · 2026-05-25
No feed summary available yet.
Watchlist Matched: agent
PyTorch Foundation · open-source · 2026-05-23
A little over a year ago, the PyTorch Foundation launched the Ambassador Program, an initiative that recognizes and supports independent, trusted voices in the PyTorch community who are passionate about...
Watchlist Matched: none
NVIDIA Technical Blog · hardware · 2026-05-22
High‑quality 3D medical imaging data is the foundation of modern radiology AI, but access to it is often constrained by data scarcity, privacy restrictions,...
Watchlist Matched: none
Microsoft Research · big-tech · 2026-05-22
MagenticLite is an agentic system for small models that works across the browser and local file system in a single workflow. It combines specialized models and orchestration to support efficient agentic performance on everyday tasks.
Watchlist Matched: performance, research, agentic
Cloudflare Blog · cloud · 2026-05-22
Cloudflare now integrates with the Claude Compliance API, so that security teams can monitor Claude Enterprise activity directly in the Cloudflare Dashboard.
Watchlist Matched: api
Microsoft Research · big-tech · 2026-05-21
Vega turns a full credential into a single proof, sharing only what is needed and nothing more, with performance that works in real apps. The post Vega: Zero-knowledge proofs for digital identity in the age of AI appeared first on Microsof...
Watchlist Matched: performance, research
NVIDIA Technical Blog · hardware · 2026-05-21
In quantitative finance, researchers build algorithms to trade assets, derivatives, and other financial instruments. A key part of that work is finding signals:...
Watchlist Matched: agent
PyTorch Foundation · open-source · 2026-05-21
Thank you to everyone who participated in the PyTorch Docathon 2026! Once again, the community showed up with incredible energy and dedication to make PyTorch documentation better for developers everywhere....
Watchlist Matched: none
AI2 · research · 2026-05-21
PointCheck, an independent project, uses Molmo, MolmoWeb, and Olmo 3 to test web accessibility the way a keyboard user would—by navigating real pages and inspecting what's actually on screen.
Watchlist Matched: none
Lambda · cloud · 2026-05-20
HRT turns to Lambda as on-premise infrastructure reaches its ceiling
Watchlist Matched: research
NVIDIA Technical Blog · hardware · 2026-05-20
Autonomous AI agents are taking on all types of work for businesses: routing logistics fleets, triaging support tickets, generating code, and orchestrating...
Watchlist Matched: agent, agents, agentic
NC AI · korea · 2026-05-20
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2026-05-20
No feed summary available yet.
Watchlist Matched: none
Modal · inference-infra · 2026-05-20
How Applied Compute trains custom agents with Reinforcement Learning for enterprises like DoorDash, Cognition, and Mercor on Modal.
Watchlist Matched: agents
Cloudflare Blog · cloud · 2026-05-19
Cloudflare has integrated with Anthropic's Claude Managed Agents to provide a fast, isolated execution environment for autonomous code delivery. This means builders can scale agent workflows globally while strictly controlling access to pr...
Watchlist Matched: agent, agents
Modular · inference-infra · 2026-05-19
How I built a pure Mojo app (and 10 libraries) with AI agents
Watchlist Matched: agents
Hugging Face · open-source · 2026-05-19
No feed summary available yet.
Watchlist Matched: none
AI2 · research · 2026-05-19
OlmoEarth v1.1 is a more efficient family of remote-sensing models that cuts compute costs by up to 3x while maintaining similar performance, making large-scale satellite mapping faster and cheaper to run.
Watchlist Matched: performance
Cloudflare Blog · cloud · 2026-05-18
In recent weeks, we pointed Mythos and other security-focused LLMs at live code across critical parts of our infrastructure. We share what we observed, the models’ strengths and weaknesses, and what the work around them needs to look like...
Watchlist Matched: none
Hugging Face · open-source · 2026-05-15
No feed summary available yet.
Watchlist Matched: retrieval
Cloudflare Blog · cloud · 2026-05-14
When a partitioning change to our petabyte-scale ClickHouse cluster caused critical billing jobs to stall, standard metrics showed no obvious errors. This post explores how we identified severe lock contention in ClickHouse's query planner...
Watchlist Matched: none
Together AI · inference-infra · 2026-05-14
Violin is an open-source AI video translation tool that combines speech recognition, LLM translation, and text-to-speech to make video content accessible across languages.
Watchlist Matched: open-source
Hugging Face · open-source · 2026-05-14
No feed summary available yet.
Watchlist Matched: none
Cloudflare Blog · cloud · 2026-05-13
We’ve enabled higher usage limits, faster performance, better reliability, and increased shipping velocity for our Browser Run product by rebuilding on top of Cloudflare’s Containers. Here’s how.
Watchlist Matched: performance
NVIDIA Technical Blog · hardware · 2026-05-13
In today’s data-driven world, organizations increasingly rely on video to capture critical information, yet extracting meaningful, real-time insights from...
Watchlist Matched: agents
NVIDIA Technical Blog · hardware · 2026-05-13
A massive-scale X-ray free-electron laser (XFEL) enables tracking structural and electron dynamics in novel systems, including fusion materials, semiconductors,...
Watchlist Matched: none
Modular · inference-infra · 2026-05-13
Translating to Mojo via AI Agents
Watchlist Matched: agents
Modal · inference-infra · 2026-05-12
A deep dive on Modal's deep tech for fast boots.
Watchlist Matched: none
Microsoft Research · big-tech · 2026-05-12
Using SocialReasoning Bench, we observed a stable pattern across models: agents execute competently, but fail to consistently improve the user’s position, even with explicit instructions to optimize for user interest.
Watchlist Matched: research, agents
vLLM Project · open-source · 2026-05-11
How vLLM built the leading deployments of DeepSeek V3.2, MiniMax-M2.5, and Qwen 3.5 397B.
Watchlist Matched: leaderboard
AI2 · research · 2026-05-11
Artificial Analysis uses Ai2’s open IFBench eval because it captures a stubborn, real-world capability many benchmarks miss: whether models can reliably follow complex, multi-part user instructions.
Watchlist Matched: eval, benchmarks
Sakana AI · model-lab · 2026-05-11
No feed summary available yet.
Watchlist Matched: none
Microsoft Research · big-tech · 2026-05-09
Microsoft Research is excited to release an open dataset of approximate transmission topology of the U.S. power grid derived from publicly available data. The ability to study transmission-level power grid behavior is essential for modern...
Watchlist Matched: release, research
Sakana AI · model-lab · 2026-05-09
No feed summary available yet.
Watchlist Matched: none
NVIDIA Technical Blog · hardware · 2026-05-08
An agentic exchange must preserve a structured interaction: assistant turns interleave reasoning with one or more tool calls, and subsequent user turns return...
Watchlist Matched: agentic
Cloudflare Blog · cloud · 2026-05-08
This afternoon, we sent the following email to our global team. One of our core values at Cloudflare is transparency, and we believe it's important that you hear this directly from us because it’s a major moment at Cloudflare.
Watchlist Matched: none
Lambda · cloud · 2026-05-07
Upsized financing builds on August 2025 credit facility, supporting continued expansion of Lambda's AI factory footprint
Watchlist Matched: none
Cloudflare Blog · cloud · 2026-05-07
When a critical Linux kernel privilege escalation was publicly disclosed, Cloudflare's security and engineering teams detected, investigated, and mitigated the threat across our global fleet, confirming zero customer impact and no maliciou...
Watchlist Matched: kernel
Modular · inference-infra · 2026-05-07
Modular 26.3: Mojo 1.0 Beta, MAX Video Gen, and more
Watchlist Matched: none
Hugging Face · open-source · 2026-05-07
No feed summary available yet.
Watchlist Matched: none
Cloudflare Blog · cloud · 2026-05-07
On May 5, 2026, DENIC published broken DNSSEC signatures for the .de TLD, making millions of domains unreachable. Here's what 1.1.1.1 saw, how serve stale cushioned the impact, and how we restored resolution.
Watchlist Matched: serve
AI2 · research · 2026-05-07
Ai2 is bringing NSF OMAI compute online to power a fully open AI research ecosystem, turning national infrastructure investment into reusable models, data, methods, and tools that can accelerate scientific discovery.
Watchlist Matched: research
Hugging Face · open-source · 2026-05-06
No feed summary available yet.
Watchlist Matched: leaderboard
Lambda · cloud · 2026-05-06
Co-founder Stephen Balaban to lead technology vision full-time as CTO; global infrastructure operator Michel Combes named CEO; former AT&T CEO John Donovan appointed Chairman of the Board
Watchlist Matched: none
NVIDIA Technical Blog · hardware · 2026-05-05
Generative AI’s explosive first chapter was defined by humans sending requests and models responding. The agentic chapter is different. Agents don't...
Watchlist Matched: agents, agentic
AI2 · research · 2026-05-05
MolmoAct 2 is a fully open robotics foundation model that brings faster, stronger 3D action reasoning to real-world robot tasks, alongside a major new bimanual manipulation dataset for researchers to study, reproduce, and build on.
Watchlist Matched: model
NVIDIA Technical Blog · hardware · 2026-05-04
Modern supply chains operate under the constant pressures of fluctuating demand, volatile costs, constrained capacity, and interdependent decision-making....
Watchlist Matched: agent
Lambda · cloud · 2026-05-04
Consider two teams provisioning 8,192 GPUs for a large training run. Same model, same dataset, same budget. Team A lands on a facility purpose-built for AI with sufficient power density, carefully engineered liquid cooling, a high-performa...
Watchlist Matched: performance, model, training
Modular · inference-infra · 2026-05-04
Modverse #54: AMD AI DevDay, New Modular Offices, and a Community That Keeps Shipping
Watchlist Matched: none
Cloudflare Blog · cloud · 2026-05-02
We have completed a massive engineering effort to make our infrastructure more resilient. Through new tools like Snapstone and the Engineering Codex, we've implemented safer configuration changes and automated best practices to prevent fut...
Watchlist Matched: none
Google Research · big-tech · 2026-05-02
Data Mining & Modeling
Watchlist Matched: none
AI2 · research · 2026-05-01
Interim CEO Peter Clark shares his thoughts on this moment for Ai2, our commitment to open science, and where the institute is headed next.
Watchlist Matched: none
Cloudflare Blog · cloud · 2026-04-30
Cloudflare IPsec now has generally available support for post-quantum encryption via hybrid ML-KEM. We’ve confirmed interoperability with Cisco and Fortinet.
Watchlist Matched: none
Cloudflare Blog · cloud · 2026-04-30
Starting today, agents can now be Cloudflare customers. They can create a Cloudflare account, start a paid subscription, register a domain, and get back an API token to deploy code right away. Humans can be in the loop to grant permission,...
Watchlist Matched: agents, api
Lambda · cloud · 2026-04-30
Harnesses If you've used Claude Code or Codex, you've used a harness. A harness is the infrastructure layer that wraps an AI coding agent and decides how it operates, what it can touch, and how you measure whether it worked. It's how most...
Watchlist Matched: gpu, training, post-training, agent, agents, open-source
NVIDIA Technical Blog · hardware · 2026-04-30
Creative and visualization teams today produce more assets, in more formats, with leaner teams. Generative AI can accelerate that work – compressing tasks...
Watchlist Matched: none
Together AI · inference-infra · 2026-04-30
No feed summary available yet.
Watchlist Matched: none
Together AI · inference-infra · 2026-04-30
Together AI and Adaption partner to bring Together Fine-Tuning natively into Adaptive Data, helping teams optimize datasets, run fine-tuning, evaluate results, and deploy stronger open models.
Watchlist Matched: fine-tuning, evaluate
Hugging Face · open-source · 2026-04-30
No feed summary available yet.
Watchlist Matched: none
AI2 · research · 2026-04-30
AstaBench’s latest update adds new frontier-model results, including GPT-5.5, and highlights growing adoption from groups including the UK AISI, General Reasoning, Elicit, SciSpace, Distyl AI, and EvoScientist.
Watchlist Matched: model, frontier-model
Sakana AI · model-lab · 2026-04-30
No feed summary available yet.
Watchlist Matched: none
NVIDIA Technical Blog · hardware · 2026-04-29
The next wave of enterprise productivity is being built on AI factories. As organizations deploy agentic AI systems capable of reasoning, automation, and...
Watchlist Matched: agentic
AI2 · research · 2026-04-29
MolmoPoint and MolmoWeb extend the Molmo family from visual understanding to visual action, giving researchers open tools for models that can point, navigate, and interact with the world they see.
Watchlist Matched: none
Sakana AI · model-lab · 2026-04-29
No feed summary available yet.
Watchlist Matched: none
Cloudflare Blog · cloud · 2026-04-28
The first quarter of 2026 saw a surge in Internet disruptions, from nationwide shutdowns in Uganda and Iran to unprecedented drone strikes on cloud infrastructure. We explore the data behind these events using Cloudflare Radar.
Watchlist Matched: cloud
NVIDIA Technical Blog · hardware · 2026-04-28
The subsurface industry is at a critical point in its digital evolution. For decades, unlocking reservoir potential has relied on experts performing essential...
Watchlist Matched: agentic
LY Corporation Tech Blog · korea · 2026-04-28
Hello. I'm Ki-cheol Cheon, a site reliability engineer (SRE) on the Service Reliability team. Our SR...
Watchlist Matched: none
Hugging Face · open-source · 2026-04-27
No feed summary available yet.
Watchlist Matched: none
Sakana AI · model-lab · 2026-04-27
No feed summary available yet.
Watchlist Matched: agents
Sakana AI · model-lab · 2026-04-26
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2026-04-24
No feed summary available yet.
Watchlist Matched: agents
vLLM Project · open-source · 2026-04-24
A first-principles walkthrough of DeepSeek V4's long-context attention, and how we implemented it in vLLM.
Watchlist Matched: long-context
NVIDIA Technical Blog · hardware · 2026-04-23
In March 2026, three LLM agents generated over 600,000 lines of code, ran 850 experiments, and helped secure a first-place finish in a Kaggle playground...
Watchlist Matched: agents
Hugging Face · open-source · 2026-04-23
No feed summary available yet.
Watchlist Matched: none
Google Research · big-tech · 2026-04-23
Generative AI
Watchlist Matched: none
AI2 · research · 2026-04-23
OlmPool is a controlled suite of 26 models showing how small architecture choices can compound to make long-context extension much harder, even when training data and extension recipes are held constant.
Watchlist Matched: training, long context, long-context
NVIDIA Technical Blog · hardware · 2026-04-22
In a previous post, we introduced the Universal Sparse Tensor (UST), enabling developers to decouple a tensor’s sparsity from its memory layout for greater...
Watchlist Matched: none
Cloudflare Blog · cloud · 2026-04-22
Panics in Rust Workers were historically fatal, poisoning the entire instance. By collaborating upstream on the wasm‑bindgen project, Rust Workers now support resilient critical error recovery, including panic unwinding using WebAssembly E...
Watchlist Matched: none
NVIDIA Technical Blog · hardware · 2026-04-22
Higher-order optimization algorithms such as Shampoo have been effectively applied in neural network training for at least a decade. These methods have achieved...
Watchlist Matched: training
Google Research · big-tech · 2026-04-22
Generative AI
Watchlist Matched: agents
AI2 · research · 2026-04-22
For the past 10 years, Ai2 has built open, real-time tools that help people protect wildlife, oceans, and ecosystems around the world.
Watchlist Matched: none
Cloudflare Blog · cloud · 2026-04-21
As AI assistants and privacy proxies challenge the capabilities of traditional bot detection, the Web needs new models for accountability. We believe that control should remain with the client, and that an open ecosystem of anonymous crede...
Watchlist Matched: none
Hugging Face · open-source · 2026-04-21
No feed summary available yet.
Watchlist Matched: leaderboard
Hugging Face · open-source · 2026-04-21
No feed summary available yet.
Watchlist Matched: none
NVIDIA Technical Blog · hardware · 2026-04-20
The boom in open source generative AI models is pushing beyond data centers into machines operating in the physical world. Developers are eager to deploy these...
Watchlist Matched: open source
Cloudflare Blog · cloud · 2026-04-20
Agents Week 2026 is a wrap. Let’s take a look at everything we announced, from compute and security to the agent toolbox, platform tools, and the emerging agentic web. Everything we shipped for the agentic cloud.
Watchlist Matched: cloud, agent, agents, agentic
AI2 · research · 2026-04-20
BAR is a recipe for post-training language models one capability at a time—train domain experts independently, merge them into a single mixture-of-experts model, and upgrade any expert without impacting the others.
Watchlist Matched: model, training, post-training
NVIDIA Technical Blog · hardware · 2026-04-17
Agents are evolving from question-and-answer systems into long-running autonomous assistants that read files, call APIs, and drive multi-step workflows....
Watchlist Matched: agent, agents
LY Corporation Tech Blog · korea · 2026-04-17
As of 2026, the AI paradigm is steadily shifting from mere chat interfaces to action-centric executi...
Watchlist Matched: agent
NVIDIA Technical Blog · hardware · 2026-04-17
The development of socially acceptable nuclear reactors requires that they are safe, clean, efficient, economical, and sustainable. Meeting these requirements...
Watchlist Matched: none
Google Research · big-tech · 2026-04-16
Generative AI
Watchlist Matched: none
Google Research · big-tech · 2026-04-16
General Science
Watchlist Matched: none
LY Corporation Tech Blog · korea · 2026-04-16
Hello, I'm Jeongwoo, a security platform engineer at LY Corporation. I am responsible for developing ...
Watchlist Matched: none
NVIDIA Technical Blog · hardware · 2026-04-16
Developing real-time vision AI applications presents a significant challenge for developers, often demanding intricate data pipelines, countless lines of code,...
Watchlist Matched: agents
Modular · inference-infra · 2026-04-16
How Frontier Coding Agents Built a Video Diffusion Pipeline on MAX
Watchlist Matched: agents
Hugging Face · open-source · 2026-04-16
No feed summary available yet.
Watchlist Matched: agents
Hugging Face · open-source · 2026-04-16
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2026-04-16
No feed summary available yet.
Watchlist Matched: training, finetuning
Hugging Face · open-source · 2026-04-15
No feed summary available yet.
Watchlist Matched: agents, tool use
Hugging Face · open-source · 2026-04-15
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2026-04-15
If you have never tried a video model before, now is the time.
Watchlist Matched: model
Modal · inference-infra · 2026-04-15
Modal is an official sandbox provider for the OpenAI Agents SDK.
Watchlist Matched: agents, sdk
NVIDIA Technical Blog · hardware · 2026-04-14
For decades, computational chemistry has faced a tug-of-war between accuracy and speed. Ab initio methods like density functional theory (DFT) provide high...
Watchlist Matched: none
Google Research · big-tech · 2026-04-14
Education Innovation
Watchlist Matched: none
Together AI · inference-infra · 2026-04-13
EinsteinArena is a platform where AI agents collaborate and compete on open math problems. AI agents on EinsteinArena have already set 11 new state-of-the-art results on open math problems — including pushing the kissing number lower bound...
Watchlist Matched: agents
AI2 · research · 2026-04-13
Two benchmarks developed at Ai2 – ScienceWorld and DiscoveryWorld – reveal that even incredibly strong AI science agents struggle with problems human scientists solve routinely.
Watchlist Matched: evaluating, benchmarks, agents
NC AI · korea · 2026-04-10
No feed summary available yet.
Watchlist Matched: none
LY Corporation Tech Blog · korea · 2026-04-10
Hello. I'm Dahee Eo, a site reliability engineer (SRE). Our team is responsible for Media Platform S...
Watchlist Matched: none
NC AI · korea · 2026-04-10
No feed summary available yet.
Watchlist Matched: none
Modular · inference-infra · 2026-04-10
Modular Opens Edinburgh & San Francisco Offices
Watchlist Matched: none
Modal · inference-infra · 2026-04-10
Butter, a San Francisco-based AI sandbox technology, is joining Modal.
Watchlist Matched: none
Google Research · big-tech · 2026-04-09
Generative AI
Watchlist Matched: none
NVIDIA Technical Blog · hardware · 2026-04-09
Proteins rarely function in isolation as individual monomers. Most biological processes are governed by proteins interacting with other proteins, forming...
Watchlist Matched: none
Hugging Face · open-source · 2026-04-09
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2026-04-09
No feed summary available yet.
Watchlist Matched: none
NC AI · korea · 2026-04-08
No feed summary available yet.
Watchlist Matched: none
NVIDIA Technical Blog · hardware · 2026-04-08
Physical AI—AI systems that perceive, reason, and act in physically grounded simulated environments—is changing how teams design and validate robots and...
Watchlist Matched: none
Hugging Face · open-source · 2026-04-08
No feed summary available yet.
Watchlist Matched: none
Modal · inference-infra · 2026-04-07
Product updates, community highlights, and upcoming events.
Watchlist Matched: blackwell, api
LY Corporation Tech Blog · korea · 2026-04-06
I work in the LINE Official Account (OA) team as an Android developer. One of our jobs is to maintai...
Watchlist Matched: none
Google Research · big-tech · 2026-04-03
Generative AI
Watchlist Matched: evaluating
Modular · inference-infra · 2026-04-03
Structured Mojo Kernels Part 4 - Portability and the Road Ahead
Watchlist Matched: none
Hugging Face · open-source · 2026-04-02
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2026-04-01
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2026-04-01
No feed summary available yet.
Watchlist Matched: none
Google Research · big-tech · 2026-04-01
Algorithms & Theory
Watchlist Matched: benchmarks
Hugging Face · open-source · 2026-04-01
No feed summary available yet.
Watchlist Matched: none
NVIDIA Technical Blog · hardware · 2026-03-31
Delivering high-fidelity VR and AR experiences to enterprise users has typically required native application development, custom device management, and complex...
Watchlist Matched: none
Hugging Face · open-source · 2026-03-31
No feed summary available yet.
Watchlist Matched: training
Google Research · big-tech · 2026-03-31
Algorithms & Theory
Watchlist Matched: none
Modular · inference-infra · 2026-03-31
Modverse #54: From GTC to Edinburgh, a Community Building Momentum
Watchlist Matched: none
Hugging Face · open-source · 2026-03-31
No feed summary available yet.
Watchlist Matched: training, post-training
vLLM Project · open-source · 2026-03-30
PR #33736 (included in vllm>=v0.18.0) introduced a new hidden states extraction system to vLLM. This blog post explores the motivation, design, usage, and future direction of this feature, and its...
Watchlist Matched: none
SqueezeBits · korea · 2026-03-27
Sharing insights from GTC 2026, the largest AI industry conference for developers! If you’ve ever wondered what it’s like for an AI startup to run a booth at such a massive event, you won’t want to miss this!
Watchlist Matched: none
Hugging Face · open-source · 2026-03-27
No feed summary available yet.
Watchlist Matched: none
Modular · inference-infra · 2026-03-26
Structured Mojo Kernels Part 3 - Composition in Practice
Watchlist Matched: none
Google Research · big-tech · 2026-03-25
Human-Computer Interaction and Visualization
Watchlist Matched: none
NVIDIA Technical Blog · hardware · 2026-03-25
In the current state of automotive radar, machine learning engineers can't work with camera-equivalent raw RGB images. Instead, they work with the output of...
Watchlist Matched: none
Google Research · big-tech · 2026-03-25
Algorithms & Theory
Watchlist Matched: none
Google Research · big-tech · 2026-03-25
Algorithms & Theory
Watchlist Matched: none
NVIDIA Technical Blog · hardware · 2026-03-24
Agentic AI is an ecosystem where specialized models work together to handle planning, reasoning, retrieval, and safety guardrailing. As these systems scale,...
Watchlist Matched: rag, retrieval, agents, agentic
Hugging Face · open-source · 2026-03-24
No feed summary available yet.
Watchlist Matched: evaluating, agents
AI2 · research · 2026-03-24
Introducing MolmoWeb, an open visual web agent that navigates and completes tasks in a browser using screenshots alone, along with MolmoWebMix, the largest public dataset for training web agents.
Watchlist Matched: introducing, training, agent, agents
AI2 · research · 2026-03-23
A recap of Ai2's week at NVIDIA GTC 2026, covering panels on open models, live demos of Olmo Hybrid and Asta AutoDiscovery, and conversations on coding agents, hybrid architectures, and robotics.
Watchlist Matched: agents
NVIDIA Technical Blog · hardware · 2026-03-18
While consumer AI offers powerful capabilities, workplace tools often suffer from disjointed data and limited context. Built with LangChain, the NVIDIA AI-Q...
Watchlist Matched: agents
LY Corporation Tech Blog · korea · 2026-03-18
This article was originally published on the pre-merger blog (first published on February 24, 2022) ...
Watchlist Matched: none
Google Research · big-tech · 2026-03-18
Health & Bioscience
Watchlist Matched: none
Hugging Face · open-source · 2026-03-18
No feed summary available yet.
Watchlist Matched: open source
NVIDIA Technical Blog · hardware · 2026-03-17
AI-native services are exposing a new bottleneck in AI infrastructure: As millions of users, agents, and devices demand access to intelligence, the challenge is...
Watchlist Matched: agents
NVIDIA Technical Blog · hardware · 2026-03-16
Healthcare faces a structural demand–capacity crisis: a projected global shortfall of ~10 million clinicians by 2030, billions of diagnostic exams annually...
Watchlist Matched: none
NVIDIA Technical Blog · hardware · 2026-03-16
Autonomous AI agents are driving the next wave of AI innovation. These agents must often manage long-running tasks that use multiple communication channels and...
Watchlist Matched: agents
NVIDIA Technical Blog · hardware · 2026-03-16
Building AI factories is complex and requires efficient integration across compute, networking, security, and storage systems. To achieve rapid Time to AI and...
Watchlist Matched: none
NVIDIA Technical Blog · hardware · 2026-03-16
AI has evolved from assistants following your directions to agents that act independently. Called claws, these agents can take a goal, figure out how to achieve...
Watchlist Matched: agents
NVIDIA Technical Blog · hardware · 2026-03-16
Artificial intelligence is token-driven. Every prompt, reasoning step, and agent interaction generates tokens. Over the past year, token consumption has grown...
Watchlist Matched: agent
NVIDIA Technical Blog · hardware · 2026-03-16
Physics forms the foundation of robotic simulation, enabling realistic modeling of motion and interaction. For tasks like locomotion and manipulation,...
Watchlist Matched: none
LY Corporation Tech Blog · korea · 2026-03-13
Hello, I'm Munetoshi Ishikawa, a mobile client developer for the LINE messaging app. This article is ...
Watchlist Matched: none
Google Research · big-tech · 2026-03-12
Climate & Sustainability
Watchlist Matched: none
Google Research · big-tech · 2026-03-12
Generative AI
Watchlist Matched: none
LY Corporation Tech Blog · korea · 2026-03-11
In November 2025, mobile engineers from our Tokyo and Ho Chi Minh City (HCMC) Development Centers ca...
Watchlist Matched: distributed
Modular · inference-infra · 2026-03-11
Structured Mojo Kernels Part 2 - The Three Pillars
Watchlist Matched: none
AI2 · research · 2026-03-11
MolmoBot is an open robotic manipulation model suite trained entirely in simulation—demonstrating zero-shot transfer to real-world robots without any real-world data collection or fine-tuning.
Watchlist Matched: model, training, fine-tuning
AI2 · research · 2026-03-11
Introducing MolmoBot and MolmoSpaces, an open foundation for training real-world robots to advance science.
Watchlist Matched: introducing, training
Hugging Face · open-source · 2026-03-10
No feed summary available yet.
Watchlist Matched: open-source
Hugging Face · open-source · 2026-03-09
No feed summary available yet.
Watchlist Matched: training
Hugging Face · open-source · 2026-03-09
No feed summary available yet.
Watchlist Matched: none
Google Research · big-tech · 2026-03-07
Natural Language Processing
Watchlist Matched: none
Google Research · big-tech · 2026-03-07
Climate & Sustainability
Watchlist Matched: none
LY Corporation Tech Blog · korea · 2026-03-06
The original article was published on April 24, 2025. Hello, I'm Munetoshi Ishikawa, a mobile client ...
Watchlist Matched: none
Hugging Face · open-source · 2026-03-05
No feed summary available yet.
Watchlist Matched: none
Google Research · big-tech · 2026-03-05
Generative AI
Watchlist Matched: none
SkyPilot · open-source · 2026-03-03
SkyPilot Job Groups let you define heterogeneous RL workloads in a single YAML. Run your PPO trainer on beefy H100s, rollout servers on cheap T4s, and replay buffers on high-memory CPUs, all as one managed job.
Watchlist Matched: none
NC AI · korea · 2026-02-27
No feed summary available yet.
Watchlist Matched: none
Together AI · inference-infra · 2026-02-25
No feed summary available yet.
Watchlist Matched: training, agents, sota
AI2 · research · 2026-02-25
PreScience is a new benchmark that evaluates whether AI can forecast how science unfolds end-to-end, from team formation through eventual impact.
Watchlist Matched: benchmark
Replicate · inference-infra · 2026-02-24
Seedream 5.0 brings multi-step reasoning, example-based editing, and deep domain knowledge to image generation. Here's what you should know.
Watchlist Matched: generation
LY Corporation Tech Blog · korea · 2026-02-20
The original article was published on April 17, 2025. Hello, I'm Munetoshi Ishikawa, a mobile client ...
Watchlist Matched: none
Hugging Face · open-source · 2026-02-20
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2026-02-20
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2026-02-19
No feed summary available yet.
Watchlist Matched: agents
Modal · inference-infra · 2026-02-19
No feed summary available yet.
Watchlist Matched: agent
Modular · inference-infra · 2026-02-18
The Claude C Compiler: What It Reveals About the Future of Software
Watchlist Matched: none
Hugging Face · open-source · 2026-02-18
No feed summary available yet.
Watchlist Matched: none
Google Research · big-tech · 2026-02-18
Machine Perception
Watchlist Matched: none
LY Corporation Tech Blog · korea · 2026-02-13
The original article was published on April 10, 2025. Hello, I'm Munetoshi Ishikawa, a mobile client ...
Watchlist Matched: none
Hugging Face · open-source · 2026-02-13
No feed summary available yet.
Watchlist Matched: none
AI2 · research · 2026-02-13
Olmix is a framework for language model data mixing that provides empirically grounded defaults and efficient reuse techniques.
Watchlist Matched: model
Hugging Face · open-source · 2026-02-12
No feed summary available yet.
Watchlist Matched: evaluating, agents
Google Research · big-tech · 2026-02-11
Human-Computer Interaction and Visualization
Watchlist Matched: none
Modal · inference-infra · 2026-02-11
GLM-5.1 establishes a new SotA for open models. Try it free today.
Watchlist Matched: sota
SkyPilot · open-source · 2026-02-10
Moving from Slurm to Kubernetes doesn't have to mean losing the workflow you know. Here's how SkyPilot brings Slurm-like simplicity to K8s.
Watchlist Matched: none
Modular · inference-infra · 2026-02-10
BentoML Joins Modular
Watchlist Matched: none
Google Research · big-tech · 2026-02-10
Climate & Sustainability
Watchlist Matched: none
NC AI · korea · 2026-02-09
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2026-02-09
No feed summary available yet.
Watchlist Matched: none
LY Corporation Tech Blog · korea · 2026-02-06
Hello, I’m Young Hee Park from the Cloud Service CBU, where I’m responsible for the private cloud th...
Watchlist Matched: cloud
LY Corporation Tech Blog · korea · 2026-02-06
The original article was published on April 3, 2025. Hello, I'm Yūdai Takanashi, a mobile client deve...
Watchlist Matched: none
Google Research · big-tech · 2026-02-05
Education Innovation
Watchlist Matched: none
Modular · inference-infra · 2026-02-05
The Five Eras of KVCache
Watchlist Matched: none
Google Research · big-tech · 2026-02-05
Algorithms & Theory
Watchlist Matched: none
LY Corporation Tech Blog · korea · 2026-02-04
Hello. My name is Gi Jun Oh, and I am responsible for the development and operation of the in-house ...
Watchlist Matched: none
Together AI · inference-infra · 2026-02-04
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2026-02-04
No feed summary available yet.
Watchlist Matched: evals
Google Research · big-tech · 2026-02-04
Generative AI
Watchlist Matched: none
Hugging Face · open-source · 2026-02-04
No feed summary available yet.
Watchlist Matched: open-source
Hugging Face · open-source · 2026-02-03
No feed summary available yet.
Watchlist Matched: training
Together AI · inference-infra · 2026-02-03
Hiring Alon Gavrielov further deepens Together AI’s commitment to building AI factories that deliver the most reliable, efficient, and scalable infrastructure for AI-native teams.
Watchlist Matched: none
LY Corporation Tech Blog · korea · 2026-01-30
The original article was published on March 27, 2025. Hello, I'm Masakuni Ōishi, an engineer working ...
Watchlist Matched: none
NC AI · korea · 2026-01-29
No feed summary available yet.
Watchlist Matched: none
Modular · inference-infra · 2026-01-29
Modular 26.1: A Big Step Towards More Programmable and Portable AI Infrastructure
Watchlist Matched: none
Google Research · big-tech · 2026-01-28
Generative AI
Watchlist Matched: agent
Google Research · big-tech · 2026-01-28
Generative AI
Watchlist Matched: none
Hugging Face · open-source · 2026-01-28
No feed summary available yet.
Watchlist Matched: open-source
Modal · inference-infra · 2026-01-28
All the latest news across product, content, community, and events this month.
Watchlist Matched: none
Hugging Face · open-source · 2026-01-27
No feed summary available yet.
Watchlist Matched: training, agentic, oss
LY Corporation Tech Blog · korea · 2026-01-23
The original article was published on March 19, 2025. Hello, I'm Yūdai Takanashi, a mobile client dev...
Watchlist Matched: none
Google Research · big-tech · 2026-01-23
Generative AI
Watchlist Matched: none
vLLM Project · open-source · 2026-01-23
We are working on building the System Level Intelligence for Mixture-of-Models (MoM), bringing Collective Intelligence into LLM systems.
Watchlist Matched: none
SkyPilot · open-source · 2026-01-22
Mount Kubernetes PVCs to your clusters for 10-100x faster data access with persistent storage that survives across job lifecycles.
Watchlist Matched: none
Hugging Face · open-source · 2026-01-21
No feed summary available yet.
Watchlist Matched: benchmarks, agent
Hugging Face · open-source · 2026-01-21
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2026-01-20
No feed summary available yet.
Watchlist Matched: none
LY Corporation Tech Blog · korea · 2026-01-16
The original article was published on March 13, 2025. Hello, I'm Munetoshi Ishikawa, a mobile client ...
Watchlist Matched: none
Google Research · big-tech · 2026-01-16
Health & Bioscience
Watchlist Matched: none
Hugging Face · open-source · 2026-01-15
No feed summary available yet.
Watchlist Matched: none
Modal · inference-infra · 2026-01-15
Learn how Chai Discovery moves seamlessly from ML experimentation to production antibody pipelines with Modal.
Watchlist Matched: none
LY Corporation Tech Blog · korea · 2026-01-14
Introduction: what are guardrails? Various mechanisms for making AI more safe to use are commonly ref...
Watchlist Matched: cost
Google Research · big-tech · 2026-01-14
Algorithms & Theory
Watchlist Matched: none
Google Research · big-tech · 2026-01-14
Quantum
Watchlist Matched: none
Google Research · big-tech · 2026-01-13
Climate & Sustainability
Watchlist Matched: none
LY Corporation Tech Blog · korea · 2026-01-09
The original article was published on March 6, 2025. Hello, I'm Munetoshi Ishikawa, a mobile client d...
Watchlist Matched: none
Hugging Face · open-source · 2026-01-06
No feed summary available yet.
Watchlist Matched: none
LY Corporation Tech Blog · korea · 2026-01-05
This post is a follow-up to Creating a domain-specific NL-to-SQL MCP server, which introduced our MC...
Watchlist Matched: agent, mcp
Hugging Face · open-source · 2026-01-05
No feed summary available yet.
Watchlist Matched: agents
LY Corporation Tech Blog · korea · 2026-01-02
The original article was published on February 27, 2025. Hello, I'm Munetoshi Ishikawa, a mobile clie...
Watchlist Matched: none
Modal · inference-infra · 2025-12-28
How we do active and passive monitoring on hyperscalers and neoclouds.
Watchlist Matched: none
vLLM Project · open-source · 2025-12-27
For a long time, vllm.ai simply redirected to the vLLM GitHub page. Thanks to our community, we now have a brand-new vllm.ai website, drawing inspiration from the PyTorch website.
Watchlist Matched: none
LY Corporation Tech Blog · korea · 2025-12-26
The original article was published on February 20, 2025. Hello, I'm Munetoshi Ishikawa, a mobile clie...
Watchlist Matched: none
Hugging Face · open-source · 2025-12-23
No feed summary available yet.
Watchlist Matched: none
LY Corporation Tech Blog · korea · 2025-12-19
The original article was published on February 13, 2025. Hello, I'm Munetoshi Ishikawa, a mobile clie...
Watchlist Matched: none
Modular · inference-infra · 2025-12-19
🔥 Modular 2025 Year in Review
Watchlist Matched: none
Together AI · inference-infra · 2025-12-18
Two enterprise-grade Rime TTS models now available on Together AI. Co-locate with LLM and STT on dedicated infrastructure. Proven at billions of calls.
Watchlist Matched: none
Hugging Face · open-source · 2025-12-18
No feed summary available yet.
Watchlist Matched: none
SkyPilot · open-source · 2025-12-17
Train a tool-calling agent with VeRL and use SkyPilot to scale it up with independent RL trainer and env rollout
Watchlist Matched: agent
Hugging Face · open-source · 2025-12-16
No feed summary available yet.
Watchlist Matched: agents
vLLM Project · open-source · 2025-12-14
Your LLM just called a tool, received accurate data, and still got the answer wrong. Welcome to the world of extrinsic hallucination—where models confidently ignore the ground truth sitting right...
Watchlist Matched: none
Google Research · big-tech · 2025-12-12
Conferences & Events
Watchlist Matched: none
LY Corporation Tech Blog · korea · 2025-12-12
The original article was published on February 6, 2025. Hello, I'm Munetoshi Ishikawa, a mobile clien...
Watchlist Matched: none
Together AI · inference-infra · 2025-12-12
No feed summary available yet.
Watchlist Matched: sdk
Hugging Face · open-source · 2025-12-11
No feed summary available yet.
Watchlist Matched: none
Google Research · big-tech · 2025-12-11
Generative AI
Watchlist Matched: none
LY Corporation Tech Blog · korea · 2025-12-05
The original article was published on January 30, 2025. Hello, I'm Munetoshi Ishikawa, a mobile clien...
Watchlist Matched: none
Modular · inference-infra · 2025-12-05
The path to Mojo 1.0
Watchlist Matched: none
Google Research · big-tech · 2025-12-05
Generative AI
Watchlist Matched: none
Hugging Face · open-source · 2025-12-04
No feed summary available yet.
Watchlist Matched: agent
Hugging Face · open-source · 2025-12-04
No feed summary available yet.
Watchlist Matched: open source
Modular · inference-infra · 2025-12-03
Modverse #52: Advancing AI Together — Community Projects & Platform Milestones
Watchlist Matched: none
NC AI · korea · 2025-12-02
No feed summary available yet.
Watchlist Matched: none
NC AI · korea · 2025-12-01
No feed summary available yet.
Watchlist Matched: none
LY Corporation Tech Blog · korea · 2025-11-28
Introduction: Enterprise data analysis faces a fundamental challenge: the gap between business questio...
Watchlist Matched: mcp
LY Corporation Tech Blog · korea · 2025-11-28
The original article was published on January 23, 2025. Hello, I'm Munetoshi Ishikawa, a mobile clien...
Watchlist Matched: none
Replicate · inference-infra · 2025-11-26
Isaac 0.1 is a lightweight, grounded vision-language model built for real-world perception.
Watchlist Matched: model
Replicate · inference-infra · 2025-11-25
FLUX.2 brings professional-grade image generation and editing with unprecedented detail, multi-reference support, and enterprise efficiency.
Watchlist Matched: generation
Hugging Face · open-source · 2025-11-25
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-11-25
No feed summary available yet.
Watchlist Matched: none
LY Corporation Tech Blog · korea · 2025-11-21
The original article was published on January 16, 2025. Hello, I'm Munetoshi Ishikawa, a mobile clien...
Watchlist Matched: none
Hugging Face · open-source · 2025-11-21
No feed summary available yet.
Watchlist Matched: fine-tuning
Hugging Face · open-source · 2025-11-21
No feed summary available yet.
Watchlist Matched: leaderboard
Modal · inference-infra · 2025-11-20
Turns out, good devex for agents looks a lot like good devex for humans.
Watchlist Matched: agents
Replicate · inference-infra · 2025-11-20
Nano Banana Pro brings powerful new capabilities in image generation and editing. Here are the main prompt tricks you should know.
Watchlist Matched: generation
Google Research · big-tech · 2025-11-19
Algorithms & Theory
Watchlist Matched: none
Hugging Face · open-source · 2025-11-19
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2025-11-19
Generate game assets, sprites, tiles, and pixel art with Retro Diffusion's suite of carefully crafted models.
Watchlist Matched: none
Google Research · big-tech · 2025-11-19
Generative AI
Watchlist Matched: none
NC AI · korea · 2025-11-17
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2025-11-17
No feed summary available yet.
Watchlist Matched: none
LY Corporation Tech Blog · korea · 2025-11-14
The original article was published on January 9, 2025. Hello, I'm Munetoshi Ishikawa, a mobile client...
Watchlist Matched: none
Hugging Face · open-source · 2025-11-14
No feed summary available yet.
Watchlist Matched: none
Google Research · big-tech · 2025-11-14
Climate & Sustainability
Watchlist Matched: none
Google Research · big-tech · 2025-11-13
Algorithms & Theory
Watchlist Matched: none
Google Research · big-tech · 2025-11-13
Algorithms & Theory
Watchlist Matched: none
LY Corporation Tech Blog · korea · 2025-11-07
The original article was published on November 28, 2024. Hello, I'm Munetoshi Ishikawa, a mobile clie...
Watchlist Matched: none
Google Research · big-tech · 2025-11-07
Data Mining & Modeling
Watchlist Matched: agent
Modular · inference-infra · 2025-11-06
PyTorch and LLVM in 2025 — Keeping up With AI Innovation
Watchlist Matched: none
Google Research · big-tech · 2025-11-06
Climate & Sustainability
Watchlist Matched: none
Google Research · big-tech · 2025-11-05
General Science
Watchlist Matched: none
BAIR · research · 2025-11-01
In this post, I’ll introduce a reinforcement learning (RL) algorithm based on an “alternative” paradigm: divide and conquer. Unlike traditional methods, this algorithm is not based on temporal difference (TD) learning (which has scalabilit...
Watchlist Matched: benchmark, performance, model, paper, training
Modal · inference-infra · 2025-10-31
Welcome to another round of Modal Product Updates! Here's what's new this month.
Watchlist Matched: none
LY Corporation Tech Blog · korea · 2025-10-31
The original article was published on November 21, 2024. Hello, I'm Munetoshi Ishikawa, a mobile clie...
Watchlist Matched: none
Google Research · big-tech · 2025-10-30
Generative AI
Watchlist Matched: none
Hugging Face · open-source · 2025-10-30
No feed summary available yet.
Watchlist Matched: agent
Google Research · big-tech · 2025-10-30
Generative AI
Watchlist Matched: none
Hugging Face · open-source · 2025-10-29
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-10-29
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-10-29
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-10-28
No feed summary available yet.
Watchlist Matched: none
Together AI · inference-infra · 2025-10-28
Test AI agents in the real world with Collinear TraitMix and Together Evals: dynamic persona simulations, multi-turn dialogs, and LLM-as-judge scoring.
Watchlist Matched: evals, agent, agents
Hugging Face · open-source · 2025-10-28
No feed summary available yet.
Watchlist Matched: none
Google Research · big-tech · 2025-10-28
Generative AI
Watchlist Matched: none
Hugging Face · open-source · 2025-10-27
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-10-27
No feed summary available yet.
Watchlist Matched: none
LY Corporation Tech Blog · korea · 2025-10-24
The original article was published on November 14, 2024. Hello, I'm Munetoshi Ishikawa, a mobile clie...
Watchlist Matched: none
Hugging Face · open-source · 2025-10-24
No feed summary available yet.
Watchlist Matched: oss
Google Research · big-tech · 2025-10-23
Climate & Sustainability
Watchlist Matched: none
Google Research · big-tech · 2025-10-23
Quantum
Watchlist Matched: none
Hugging Face · open-source · 2025-10-22
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-10-22
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-10-21
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-10-21
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2025-10-21
Turn whole documents into markdown or grab line-level polygons with two new models from Datalab.
Watchlist Matched: none
Google Research · big-tech · 2025-10-20
General Science
Watchlist Matched: none
LY Corporation Tech Blog · korea · 2025-10-20
At LY Corporation we're constantly working to improve our pre-release test process and reduce the ri...
Watchlist Matched: release
LY Corporation Tech Blog · korea · 2025-10-17
The original article was published on November 7, 2024. Hello, I'm Munetoshi Ishikawa, a mobile clien...
Watchlist Matched: none
Hugging Face · open-source · 2025-10-17
No feed summary available yet.
Watchlist Matched: none
Google Research · big-tech · 2025-10-17
General Science
Watchlist Matched: none
LY Corporation Tech Blog · korea · 2025-10-16
The original article was published on October 31, 2024. Hello, I'm Munetoshi Ishikawa, a mobile clien...
Watchlist Matched: none
Replicate · inference-infra · 2025-10-16
Google's Veo 3.1 brings powerful new video generation capabilities including reference images, first/last frame control, and enhanced image-to-video. Here's everything you need to know.
Watchlist Matched: generation
Hugging Face · open-source · 2025-10-15
No feed summary available yet.
Watchlist Matched: none
SkyPilot · open-source · 2025-10-14
Want to train an AI agent with RL that can solve math problems or write code? This tutorial walks you through building your own math and coding agents with step-by-step examples with plenty of screenshots to help you along the way. We use...
Watchlist Matched: training, post-training, agent, agents
Hugging Face · open-source · 2025-10-14
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-10-11
No feed summary available yet.
Watchlist Matched: none
Google Research · big-tech · 2025-10-09
Generative AI
Watchlist Matched: none
Google Research · big-tech · 2025-10-08
Machine Intelligence
Watchlist Matched: retrieval
Hugging Face · open-source · 2025-10-07
No feed summary available yet.
Watchlist Matched: none
LY Corporation Tech Blog · korea · 2025-10-02
Hello! I’m Yoshidumi from developer relations (DevRel), and I oversaw Tech Week 2025. Tech Week 2025,...
Watchlist Matched: none
Hugging Face · open-source · 2025-10-02
No feed summary available yet.
Watchlist Matched: sota
Replicate · inference-infra · 2025-10-02
No feed summary available yet.
Watchlist Matched: none
LY Corporation Tech Blog · korea · 2025-10-01
LY Corporation recently hosted its tech conference, "Tech-Verse 2025", focusing on artificial intell...
Watchlist Matched: none
Google Research · big-tech · 2025-09-30
Generative AI
Watchlist Matched: agent
Modal · inference-infra · 2025-09-29
We’re excited to announce that we have raised more than $80M in a Series B round, led by Lux Capital. Our post-money valuation is $1.1B.
Watchlist Matched: none
Hugging Face · open-source · 2025-09-29
No feed summary available yet.
Watchlist Matched: agent
Hugging Face · open-source · 2025-09-29
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-09-26
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-09-26
No feed summary available yet.
Watchlist Matched: none
Modal · inference-infra · 2025-09-26
Asynchrony, fast approximate exponents, and 10x more efficient softmax.
Watchlist Matched: none
Google Research · big-tech · 2025-09-25
Generative AI
Watchlist Matched: none
Modular · inference-infra · 2025-09-24
Modular Raises $250M to scale AI's Unified Compute Layer
Watchlist Matched: none
Google Research · big-tech · 2025-09-24
Generative AI
Watchlist Matched: none
SkyPilot · open-source · 2025-09-23
How to build production vector search with RedisVL and SkyPilot: 1M documents indexed for $0.85, sub-100ms queries, no Kubernetes required.
Watchlist Matched: none
Hugging Face · open-source · 2025-09-23
No feed summary available yet.
Watchlist Matched: training, post-training, agents, computer use
Hugging Face · open-source · 2025-09-22
No feed summary available yet.
Watchlist Matched: none
Modal · inference-infra · 2025-09-22
We're excited to welcome Justin Dignelli to Modal. As VP of Sales, he will be leading our GTM efforts.
Watchlist Matched: none
Modular · inference-infra · 2025-09-22
Modular 25.6: Unifying the latest GPUs from NVIDIA, AMD, and Apple
Watchlist Matched: none
Hugging Face · open-source · 2025-09-22
No feed summary available yet.
Watchlist Matched: agents
Modal · inference-infra · 2025-09-22
Modal Sandboxes: impeccable vibes meet incredible scale.
Watchlist Matched: none
Google Research · big-tech · 2025-09-20
Machine Intelligence
Watchlist Matched: none
Modal · inference-infra · 2025-09-19
Welcome to another round of Modal Product Updates! Here's what's new this month.
Watchlist Matched: none
LY Corporation Tech Blog · korea · 2025-09-19
The original article was published on October 24, 2024. Hello, I'm Munetoshi Ishikawa, a mobile clien...
Watchlist Matched: performance
Modular · inference-infra · 2025-09-19
Modverse #51: Modular x Inworld x Oracle, Modular Meetup Recap and Community Projects
Watchlist Matched: none
Google Research · big-tech · 2025-09-19
Human-Computer Interaction and Visualization
Watchlist Matched: agent, agents
Hugging Face · open-source · 2025-09-18
No feed summary available yet.
Watchlist Matched: none
Google Research · big-tech · 2025-09-18
Algorithms & Theory
Watchlist Matched: none
Google Research · big-tech · 2025-09-17
Education Innovation
Watchlist Matched: none
Hugging Face · open-source · 2025-09-16
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-09-15
No feed summary available yet.
Watchlist Matched: none
Google Research · big-tech · 2025-09-12
Generative AI
Watchlist Matched: none
LY Corporation Tech Blog · korea · 2025-09-12
The original article was published on October 17, 2024. Hello, I'm Munetoshi Ishikawa, a mobile clien...
Watchlist Matched: none
Google Research · big-tech · 2025-09-12
Health & Bioscience
Watchlist Matched: none
Hugging Face · open-source · 2025-09-11
No feed summary available yet.
Watchlist Matched: oss
Hugging Face · open-source · 2025-09-11
No feed summary available yet.
Watchlist Matched: none
Together AI · inference-infra · 2025-09-10
Together AI expands Fine-Tuning Platform: train 100B+ models, extend context lengths, integrate with Hugging Face Hub, and access new DPO options.
Watchlist Matched: dpo, fine-tuning
Together AI · inference-infra · 2025-09-10
Hiring Mahadev Konar further deepens Together AI’s commitment to deliver the most reliable and scalable GPU infrastructure.
Watchlist Matched: gpu
Hugging Face · open-source · 2025-09-10
No feed summary available yet.
Watchlist Matched: training, agents
LY Corporation Tech Blog · korea · 2025-09-09
Hello, I'm Heewoong Park, a machine learning (ML) engineer at the AI Services Lab team. Our team dev...
Watchlist Matched: none
Hugging Face · open-source · 2025-09-09
No feed summary available yet.
Watchlist Matched: none
LY Corporation Tech Blog · korea · 2025-09-05
The original article was published on October 10, 2024. Hello, I'm Munetoshi Ishikawa, a mobile clien...
Watchlist Matched: none
Hugging Face · open-source · 2025-09-03
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-09-02
No feed summary available yet.
Watchlist Matched: none
LY Corporation Tech Blog · korea · 2025-08-29
The original article was published on October 3, 2024. Hello, I'm Munetoshi Ishikawa, a mobile client...
Watchlist Matched: none
Google Research · big-tech · 2025-08-28
Education Innovation
Watchlist Matched: none
Google Research · big-tech · 2025-08-26
Generative AI
Watchlist Matched: evaluating
LY Corporation Tech Blog · korea · 2025-08-22
The original article was published on September 26, 2024. Hello, I'm Munetoshi Ishikawa, a mobile cli...
Watchlist Matched: none
Google Research · big-tech · 2025-08-22
Generative AI
Watchlist Matched: none
Modular · inference-infra · 2025-08-21
Modverse #50: Modular Platform 25.5, Community Meetups, and Mojo's Debut in the Stack Overflow Developer Survey
Watchlist Matched: none
Hugging Face · open-source · 2025-08-21
No feed summary available yet.
Watchlist Matched: none
Google Research · big-tech · 2025-08-21
Algorithms & Theory
Watchlist Matched: none
LY Corporation Tech Blog · korea · 2025-08-20
Hello. I'm Sumin Shin, a developer working on services related to LLM agents at LINE AI LAB, LINE Pl...
Watchlist Matched: agents
Hugging Face · open-source · 2025-08-19
No feed summary available yet.
Watchlist Matched: none
LY Corporation Tech Blog · korea · 2025-08-15
The original article was published on September 12, 2024. Hello, I'm Munetoshi Ishikawa, a mobile cli...
Watchlist Matched: none
Google Research · big-tech · 2025-08-15
Generative AI
Watchlist Matched: none
Hugging Face · open-source · 2025-08-14
No feed summary available yet.
Watchlist Matched: none
LY Corporation Tech Blog · korea · 2025-08-14
Hello. I'm Jeonghoon Kim from the Redis team at LINE Plus. From July 2nd to 4th, I participated in t...
Watchlist Matched: none
Hugging Face · open-source · 2025-08-13
No feed summary available yet.
Watchlist Matched: none
Google Research · big-tech · 2025-08-13
Generative AI
Watchlist Matched: none
Hugging Face · open-source · 2025-08-12
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-08-12
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-08-12
No feed summary available yet.
Watchlist Matched: none
Together AI · inference-infra · 2025-08-11
No feed summary available yet.
Watchlist Matched: oss
Replicate · inference-infra · 2025-08-10
Use our MCP to discover, compare, and run models from apps like Claude, Cursor, and VS Code.
Watchlist Matched: mcp
LY Corporation Tech Blog · korea · 2025-08-08
The original article was published on September 5, 2024. Hello, I'm Munetoshi Ishikawa, a mobile clie...
Watchlist Matched: none
Google Research · big-tech · 2025-08-07
Human-Computer Interaction and Visualization
Watchlist Matched: training
Google Research · big-tech · 2025-08-07
Generative AI
Watchlist Matched: none
Hugging Face · open-source · 2025-08-05
No feed summary available yet.
Watchlist Matched: open-source
Google Research · big-tech · 2025-08-01
Machine Intelligence
Watchlist Matched: agent
LY Corporation Tech Blog · korea · 2025-08-01
The original article was published on August 29, 2024. Hello, I'm Munetoshi Ishikawa, a mobile client...
Watchlist Matched: none
Replicate · inference-infra · 2025-08-01
You'll be surprised what you can do with AI video now.
Watchlist Matched: none
Replicate · inference-infra · 2025-07-31
Wan 2.2 is our fastest, cheapest video model.
Watchlist Matched: model, open source
Hugging Face · open-source · 2025-07-31
No feed summary available yet.
Watchlist Matched: mcp
Google Research · big-tech · 2025-07-30
Generative AI
Watchlist Matched: none
Together AI · inference-infra · 2025-07-29
No feed summary available yet.
Watchlist Matched: none
Google Research · big-tech · 2025-07-29
Generative AI
Watchlist Matched: none
Hugging Face · open-source · 2025-07-25
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-07-25
No feed summary available yet.
Watchlist Matched: none
Google Research · big-tech · 2025-07-24
Generative AI
Watchlist Matched: none
Modal · inference-infra · 2025-07-24
AI code sandboxes are seeing an explosion of adoption as the volume of LLM-generated code in the world grows.
Watchlist Matched: none
Modal · inference-infra · 2025-07-23
How we transcribed one week of audio in one minute for under one dollar.
Watchlist Matched: none
Hugging Face · open-source · 2025-07-22
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2025-07-21
We compare the best image models for generating consistent characters from a single reference image.
Watchlist Matched: none
Hugging Face · open-source · 2025-07-18
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2025-07-17
We've partnered with Bria to bring a suite of commercial-grade image generation and editing models to Replicate. Built entirely on licensed data, Bria’s tools are designed for enterprises and developers building safely with visual AI.
Watchlist Matched: generation
Hugging Face · open-source · 2025-07-17
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-07-17
No feed summary available yet.
Watchlist Matched: mcp
Hugging Face · open-source · 2025-07-17
No feed summary available yet.
Watchlist Matched: evaluating, agents
Modular · inference-infra · 2025-07-16
AI Agents for AWS Marketplace
Watchlist Matched: agents
Hugging Face · open-source · 2025-07-16
No feed summary available yet.
Watchlist Matched: sota
Replicate · inference-infra · 2025-07-16
A deep-dive into the Taylor Seer optimization technique
Watchlist Matched: none
Hugging Face · open-source · 2025-07-15
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-07-10
No feed summary available yet.
Watchlist Matched: none
Modal · inference-infra · 2025-07-10
Jamsocket, a backend platform for building sync engines, is joining Modal.
Watchlist Matched: none
Hugging Face · open-source · 2025-07-10
No feed summary available yet.
Watchlist Matched: agent
Hugging Face · open-source · 2025-07-10
No feed summary available yet.
Watchlist Matched: mcp
Modular · inference-infra · 2025-07-09
Modverse #49: Modular Platform 25.4, Modular 🤝 AMD, and Modular Hack Weekend
Watchlist Matched: none
Hugging Face · open-source · 2025-07-09
No feed summary available yet.
Watchlist Matched: open-source
Hugging Face · open-source · 2025-07-09
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-07-09
No feed summary available yet.
Watchlist Matched: mcp
Together AI · inference-infra · 2025-07-08
Build and deploy AI with peace of mind—Together AI is now SOC 2 Type 2 certified, proving our encryption, access controls, and 24/7 monitoring meet the highest security standards.
Watchlist Matched: none
Hugging Face · open-source · 2025-07-08
No feed summary available yet.
Watchlist Matched: long-context
Hugging Face · open-source · 2025-07-08
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-07-08
No feed summary available yet.
Watchlist Matched: none
Modal · inference-infra · 2025-07-07
During a single weekend event, Lovable users built 250,000 new applications, all running in isolated development environments. Lovable used Modal to generate 1 million code sandboxes—with 20,000 running concurrently at peak—over just 48 ho...
Watchlist Matched: none
Modular · inference-infra · 2025-07-03
Inside Modular Hack Weekend: Top Projects and Community Highlights
Watchlist Matched: none
Together AI · inference-infra · 2025-07-02
No feed summary available yet.
Watchlist Matched: training, agent
Hugging Face · open-source · 2025-07-01
No feed summary available yet.
Watchlist Matched: training, finetuning
Replicate · inference-infra · 2025-07-01
We hosted a hackathon with BFL for FLUX.1 Kontext. Here were the winners.
Watchlist Matched: none
Modal · inference-infra · 2025-06-30
Quora is building Poe, a platform where anyone can deploy a public AI chatbot. Quora uses Modal Sandboxes at scale to safely run LLM-generated code in the context of user chats.
Watchlist Matched: none
Hugging Face · open-source · 2025-06-28
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-06-26
No feed summary available yet.
Watchlist Matched: open-source
Hugging Face · open-source · 2025-06-23
No feed summary available yet.
Watchlist Matched: none
Modular · inference-infra · 2025-06-20
How is Modular Democratizing AI Compute? (Democratizing AI Compute, Part 11)
Watchlist Matched: none
Hugging Face · open-source · 2025-06-19
No feed summary available yet.
Watchlist Matched: fine-tuning, lora
Modular · inference-infra · 2025-06-18
Modular 25.4: One Container, AMD and NVIDIA GPUs, No Lock-In
Watchlist Matched: none
Together AI · inference-infra · 2025-06-12
No feed summary available yet.
Watchlist Matched: none
Together AI · inference-infra · 2025-06-12
Build a data scientist agent using Together’s open-source models and Code Interpreter—easy to implement, solid benchmarks, and full code on GitHub.
Watchlist Matched: benchmarks, agent, open-source
Hugging Face · open-source · 2025-06-12
No feed summary available yet.
Watchlist Matched: training, post-training
Replicate · inference-infra · 2025-06-10
Learn expert prompting techniques to create stunning videos with Google's Veo 3.
Watchlist Matched: none
Together AI · inference-infra · 2025-06-09
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2025-06-06
We're sharing our experiments and tips on Google's new Veo 3 model.
Watchlist Matched: model
Hugging Face · open-source · 2025-06-03
No feed summary available yet.
Watchlist Matched: agent
Replicate · inference-infra · 2025-06-02
FLUX.1 Kontext is everywhere - see what folks are cooking.
Watchlist Matched: none
Replicate · inference-infra · 2025-05-29
This is how to get the most from Black Forest Labs' new image editing model.
Watchlist Matched: model
Together AI · inference-infra · 2025-05-29
No feed summary available yet.
Watchlist Matched: fine-tuning
Modal · inference-infra · 2025-05-28
Twirl, a Stockholm-based data orchestration platform, is joining Modal.
Watchlist Matched: none
Together AI · inference-infra · 2025-05-28
No feed summary available yet.
Watchlist Matched: training, post-training, agents, open-source
Hugging Face · open-source · 2025-05-28
No feed summary available yet.
Watchlist Matched: none
Modular · inference-infra · 2025-05-27
Exploring Metaprogramming in Mojo
Watchlist Matched: none
Hugging Face · open-source · 2025-05-25
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-05-23
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-05-23
No feed summary available yet.
Watchlist Matched: agent, agents, mcp
Replicate · inference-infra · 2025-05-22
OpenAI's latest models are now available on Replicate, including GPT-4.1, GPT-4o, and the o-series.
Watchlist Matched: none
Hugging Face · open-source · 2025-05-21
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-05-21
No feed summary available yet.
Watchlist Matched: quantization
Hugging Face · open-source · 2025-05-21
No feed summary available yet.
Watchlist Matched: none
Modal · inference-infra · 2025-05-20
We've supercharged our Dicts to support new caching and locking workflows—oh, and unlimited items.
Watchlist Matched: none
Together AI · inference-infra · 2025-05-20
No feed summary available yet.
Watchlist Matched: none
Together AI · inference-infra · 2025-05-20
No feed summary available yet.
Watchlist Matched: api
Hugging Face · open-source · 2025-05-19
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-05-15
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2025-05-15
We've partnered with Hugging Face to bring Replicate inference to their platform.
Watchlist Matched: inference
Together AI · inference-infra · 2025-05-15
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-05-12
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-05-11
No feed summary available yet.
Watchlist Matched: none
Modular · inference-infra · 2025-05-08
Modular’s bet to break out of the Matrix (Democratizing AI Compute, Part 10)
Watchlist Matched: none
Replicate · inference-infra · 2025-05-07
Ideogram 3.0 is packed with powerful design, style transfer, and realism capabilities.
Watchlist Matched: none
Modular · inference-infra · 2025-05-06
Modular Platform 25.3: 450K+ Lines of Open Source Code and pip Packaging
Watchlist Matched: open source
Replicate · inference-infra · 2025-05-06
MiniMax's Speech-02 models give you high-quality text-to-speech with voice cloning, emotional expression, and multilingual support.
Watchlist Matched: api
Modal · inference-infra · 2025-04-30
Today we're releasing lightweight client libraries for JavaScript and Go, making it easier to start sandboxes and call serverless functions — no Python required.
Watchlist Matched: none
Hugging Face · open-source · 2025-04-30
No feed summary available yet.
Watchlist Matched: mcp
Hugging Face · open-source · 2025-04-30
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-04-29
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-04-26
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-04-25
No feed summary available yet.
Watchlist Matched: agent, agents, mcp
Modular · inference-infra · 2025-04-23
A New, Simpler License for MAX and Mojo
Watchlist Matched: none
Hugging Face · open-source · 2025-04-23
No feed summary available yet.
Watchlist Matched: finetuning
Modular · inference-infra · 2025-04-22
Why do HW companies struggle to build AI software? (Democratizing AI Compute, Part 9)
Watchlist Matched: none
Together AI · inference-infra · 2025-04-21
No feed summary available yet.
Watchlist Matched: training
Modal · inference-infra · 2025-04-17
Welcome to another round of Modal Product Updates! Here's what's new this month.
Watchlist Matched: none
Together AI · inference-infra · 2025-04-17
No feed summary available yet.
Watchlist Matched: training, fine-tuning
Hugging Face · open-source · 2025-04-16
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2025-04-16
Advanced face swap and AI avatars from Easel AI are now on Replicate.
Watchlist Matched: none
Modal · inference-infra · 2025-04-15
Behind the scenes of updating our visual identity and launching our first-ever out-of-home campaign in San Francisco.
Watchlist Matched: none
Hugging Face · open-source · 2025-04-14
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-04-14
No feed summary available yet.
Watchlist Matched: open-source
Hugging Face · open-source · 2025-04-11
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-04-09
No feed summary available yet.
Watchlist Matched: none
Modular · inference-infra · 2025-04-08
What about the MLIR compiler infrastructure? (Democratizing AI Compute, Part 8)
Watchlist Matched: none
Hugging Face · open-source · 2025-04-05
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-04-04
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-04-03
No feed summary available yet.
Watchlist Matched: none
SqueezeBits · korea · 2025-04-02
From Edge AI to NVIDIA GTC: SqueezeBits team members share firsthand stories from global AI events, including networking insights, technical trends, and conference experiences.
Watchlist Matched: none
Replicate · inference-infra · 2025-04-01
One of the most fun ways to use Wan2.1 is video style transfer. Learn how here.
Watchlist Matched: none
Hugging Face · open-source · 2025-03-31
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2025-03-28
We take a quick look at the latest creative models, experiments, and community projects.
Watchlist Matched: lora
Hugging Face · open-source · 2025-03-27
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-03-26
No feed summary available yet.
Watchlist Matched: training, finetuning
BAIR · research · 2025-03-25
We deployed 100 reinforcement learning (RL)-controlled cars into rush-hour highway traffic to smooth congestion and reduce fuel consumption for everyone. Our goal is to tackle "stop-and...
Watchlist Matched: throughput, kernel, performance, model, paper, training, agent, agents
Hugging Face · open-source · 2025-03-20
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-03-19
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-03-18
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-03-18
No feed summary available yet.
Watchlist Matched: none
Modal · inference-infra · 2025-03-17
Welcome to another round of Modal Product Updates! Here's what's new this month.
Watchlist Matched: none
Modular · inference-infra · 2025-03-12
What about TVM, XLA, and AI compilers? (Democratizing AI Compute, Part 6)
Watchlist Matched: none
Hugging Face · open-source · 2025-03-12
No feed summary available yet.
Watchlist Matched: long context
Hugging Face · open-source · 2025-03-12
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-03-11
No feed summary available yet.
Watchlist Matched: open-source
SqueezeBits · korea · 2025-03-10
This article describes when to use the Fits on Chips toolkit, with specific use cases.
Watchlist Matched: none
Replicate · inference-infra · 2025-03-05
We've been playing with Alibaba's WAN2.1 text-to-video model lately. What happens when you tweak those mysterious parameters? Let's find out.
Watchlist Matched: model
Hugging Face · open-source · 2025-03-04
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-03-04
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-02-28
No feed summary available yet.
Watchlist Matched: evaluate, agent
Modular · inference-infra · 2025-02-27
Modverse #46: MAX 25.1, MAX Builds, and Democratizing AI Compute
Watchlist Matched: none
Hugging Face · open-source · 2025-02-25
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-02-21
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-02-20
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-02-19
No feed summary available yet.
Watchlist Matched: none
Modal · inference-infra · 2025-02-14
Welcome to another round of Modal Product Updates! Here's what's new this month.
Watchlist Matched: none
Hugging Face · open-source · 2025-02-14
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-02-14
No feed summary available yet.
Watchlist Matched: leaderboard
Hugging Face · open-source · 2025-02-13
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-02-12
No feed summary available yet.
Watchlist Matched: none
SkyPilot · open-source · 2025-02-11
SkyPilot takes image-to-image and text-to-image search from 120 hours to 1 hour and from $$$ to $.
Watchlist Matched: none
Hugging Face · open-source · 2025-02-11
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-02-10
No feed summary available yet.
Watchlist Matched: leaderboard
SqueezeBits · korea · 2025-02-06
This article explores the rise and fall of ONNX, from its early success as a unifying standard for AI frameworks to its gradual shift into a niche tool in the era of PyTorch 2.0.
Watchlist Matched: none
Hugging Face · open-source · 2025-02-04
No feed summary available yet.
Watchlist Matched: agents, open-source
Hugging Face · open-source · 2025-02-04
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-02-02
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-01-31
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-01-31
No feed summary available yet.
Watchlist Matched: none
Modular · inference-infra · 2025-01-30
DeepSeek's Impact on AI (Democratizing AI Compute, Part 1)
Watchlist Matched: none
Hugging Face · open-source · 2025-01-30
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-01-28
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-01-24
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2025-01-24
Train your own versions of Tencent's HunyuanVideo for style, motion, and characters on Replicate.
Watchlist Matched: open-source
Hugging Face · open-source · 2025-01-23
No feed summary available yet.
Watchlist Matched: none
Modular · inference-infra · 2025-01-23
Use MAX with Open WebUI for RAG and Web Search
Watchlist Matched: rag
Modal · inference-infra · 2025-01-21
Welcome to another round of Modal Product Updates! Here's what's new this month.
Watchlist Matched: none
Modal · inference-infra · 2025-01-21
Sandboxes are a new way to run code in Modal, with a focus on security and isolation.
Watchlist Matched: none
Modular · inference-infra · 2025-01-21
Hands-on with Mojo 24.6
Watchlist Matched: none
Hugging Face · open-source · 2025-01-21
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2025-01-17
Create AI videos with a convenient workflow.
Watchlist Matched: none
Hugging Face · open-source · 2025-01-15
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2025-01-13
No feed summary available yet.
Watchlist Matched: agents
Hugging Face · open-source · 2025-01-10
No feed summary available yet.
Watchlist Matched: retrieval
Modal · inference-infra · 2025-01-02
Modal is excited to announce its SOC 2 Type II certification.
Watchlist Matched: none
Modal · inference-infra · 2024-12-28
Welcome to another round of Modal Product Updates! Here's what's new this month.
Watchlist Matched: none
SqueezeBits · korea · 2024-12-23
This article provides a comparative analysis of automatic prefix caching.
Watchlist Matched: none
Hugging Face · open-source · 2024-12-20
No feed summary available yet.
Watchlist Matched: evaluating
Modular · inference-infra · 2024-12-19
Evaluating Llama Guard with MAX 24.6 and Hugging Face
Watchlist Matched: evaluating
Hugging Face · open-source · 2024-12-17
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2024-12-16
There are lots of models that are as good as OpenAI's Sora now.
Watchlist Matched: none
Modal · inference-infra · 2024-12-10
An intro to fine-tuning large language models in 2025
Watchlist Matched: fine-tuning
Modal · inference-infra · 2024-12-09
Highlights from our 2024 internal hackathon, showcasing innovative projects built using Modal.
Watchlist Matched: none
Hugging Face · open-source · 2024-12-05
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-12-05
No feed summary available yet.
Watchlist Matched: none
Modal · inference-infra · 2024-12-04
Find out how a top-tier European soccer team uses Modal for computer vision on game data.
Watchlist Matched: none
Modal · inference-infra · 2024-12-02
Now your Modal containers around the world can have static outbound IPs. Featuring WireGuard, policy-based routing, and NAT.
Watchlist Matched: none
Hugging Face · open-source · 2024-12-02
No feed summary available yet.
Watchlist Matched: open source
SqueezeBits · korea · 2024-11-26
This article provides a comparative analysis of different parallelism strategies on vLLM and TensorRT-LLM frameworks.
Watchlist Matched: none
Hugging Face · open-source · 2024-11-26
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2024-11-26
We've made running fine-tunes on Replicate much faster, and the optimizations are open-source.
Watchlist Matched: open-source
Hugging Face · open-source · 2024-11-25
No feed summary available yet.
Watchlist Matched: state of the art
Replicate · inference-infra · 2024-11-21
A new set of image generation capabilities for FLUX models, including inpainting, outpainting, canny edge detection, and depth maps.
Watchlist Matched: generation
Modal · inference-infra · 2024-11-20
Find out how OpenArt uses Modal to deploy highly customized ComfyUI workflows used by millions of customers
Watchlist Matched: none
Hugging Face · open-source · 2024-11-20
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-11-20
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-11-19
No feed summary available yet.
Watchlist Matched: none
SkyPilot · open-source · 2024-11-14
Announcing SkyPilot 0.7.
Watchlist Matched: none
Hugging Face · open-source · 2024-11-12
No feed summary available yet.
Watchlist Matched: none
Modal · inference-infra · 2024-11-08
Welcome to another round of Modal Product Updates! Here's what's new this month.
Watchlist Matched: none
Modal · inference-infra · 2024-11-07
Tidbyt, a NYC-based hardware manufacturer, is joining Modal
Watchlist Matched: none
Hugging Face · open-source · 2024-11-05
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-10-28
No feed summary available yet.
Watchlist Matched: rag
Modular · inference-infra · 2024-10-25
Understanding SIMD: Infinite Complexity of Trivial Problems
Watchlist Matched: none
SqueezeBits · korea · 2024-10-24
This article provides a comparative analysis of schedulers in vLLM and TensorRT-LLM frameworks.
Watchlist Matched: none
Hugging Face · open-source · 2024-10-24
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-10-23
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2024-10-22
Stability AI's latest text-to-image model is now available on Replicate and you can run it with an API.
Watchlist Matched: model, api
Hugging Face · open-source · 2024-10-22
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-10-22
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-10-22
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-10-21
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-10-16
No feed summary available yet.
Watchlist Matched: none
Modular · inference-infra · 2024-10-10
Community Spotlight: Writing Mojo with Cursor
Watchlist Matched: none
Hugging Face · open-source · 2024-10-10
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2024-10-10
FLUX is now much faster on Replicate, and we’ve made our optimizations open-source so you can see exactly how they work and build upon them.
Watchlist Matched: open-source, open source
Hugging Face · open-source · 2024-10-09
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-10-09
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-10-05
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-10-03
No feed summary available yet.
Watchlist Matched: none
Modular · inference-infra · 2024-10-01
Hands-on with Mojo 24.5
Watchlist Matched: none
Hugging Face · open-source · 2024-10-01
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-09-30
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-09-25
No feed summary available yet.
Watchlist Matched: none
Modal · inference-infra · 2024-09-24
Build intelligent applications with Modal's serverless infrastructure and MongoDB Atlas's data platform.
Watchlist Matched: none
Hugging Face · open-source · 2024-09-23
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-09-23
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-09-20
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2024-09-20
It's easy to fine-tune Flux, but sometimes you need to do a little more work to get the best results. This post covers techniques you can use to improve your fine-tuned Flux models.
Watchlist Matched: training
Modal · inference-infra · 2024-09-18
Learn how Contextual AI accelerated their developer iteration speed by using Modal to run tests on GPUs.
Watchlist Matched: none
Hugging Face · open-source · 2024-09-18
No feed summary available yet.
Watchlist Matched: fine-tuning, quantization
Hugging Face · open-source · 2024-09-13
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2024-09-09
Create and run your own fine-tuned Flux models programmatically using Replicate's HTTP API.
Watchlist Matched: api
Modal · inference-infra · 2024-09-06
Welcome to another round of Modal Product Updates! Here's what's new this month.
Watchlist Matched: none
Modal · inference-infra · 2024-09-04
You can now enter BAAs with Modal to run HIPAA-compliant workloads.
Watchlist Matched: none
Hugging Face · open-source · 2024-09-04
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2024-08-30
Create your own fine-tuned Flux model to generate new images of yourself.
Watchlist Matched: model
Hugging Face · open-source · 2024-08-27
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2024-08-23
Flux LoRAs, Hot Zuck, and Replicate on Lex Fridman
Watchlist Matched: none
Hugging Face · open-source · 2024-08-22
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-08-21
No feed summary available yet.
Watchlist Matched: training
Modal · inference-infra · 2024-08-16
How we built an in-browser code playground using Modal Sandboxes.
Watchlist Matched: none
Replicate · inference-infra · 2024-08-16
Fine tune FLUX.1, generative video games, a vision for the metaverse
Watchlist Matched: none
Replicate · inference-infra · 2024-08-15
We've added fine-tuning (LoRA) support to FLUX.1 image generation models. You can train FLUX.1 on your own images with one line of code using Replicate's API.
Watchlist Matched: generation, fine-tuning, lora, api
Hugging Face · open-source · 2024-08-14
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-08-13
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-08-12
No feed summary available yet.
Watchlist Matched: tool use
Replicate · inference-infra · 2024-08-09
Flux developments, Minecraft bot, Streamlit cookbook with Zeke
Watchlist Matched: none
Hugging Face · open-source · 2024-08-08
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-08-06
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2024-08-02
Open source frontier image model, cut objects from videos, new Python web framework from Jeremy Howard
Watchlist Matched: model, open source
Replicate · inference-infra · 2024-08-02
We explore FLUX.1's unique strengths and aesthetics to see what we can generate.
Watchlist Matched: none
Replicate · inference-infra · 2024-08-01
FLUX.1 is a new text-to-image model from Black Forest Labs, the creators of Stable Diffusion, that exceeds the capabilities of previous open-source models.
Watchlist Matched: model, api, open-source
Hugging Face · open-source · 2024-07-31
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-07-30
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2024-07-26
A top-tier open-ish language model, new safety classifiers, model search API
Watchlist Matched: model, api
Modular · inference-infra · 2024-07-23
Announcing stack-pr: an open source tool for managing stacked PRs on GitHub
Watchlist Matched: open source
Hugging Face · open-source · 2024-07-23
No feed summary available yet.
Watchlist Matched: long context
SkyPilot · open-source · 2024-07-23
Operational guide to finetune Llama 3.1, with everything packaged in a simple SkyPilot YAML.
Watchlist Matched: none
Hugging Face · open-source · 2024-07-22
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-07-18
No feed summary available yet.
Watchlist Matched: none
Modular · inference-infra · 2024-07-16
Debugging in Mojo🔥
Watchlist Matched: none
Hugging Face · open-source · 2024-07-16
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-07-16
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2024-07-12
Data curation, data generation, data data data
Watchlist Matched: generation
Hugging Face · open-source · 2024-07-11
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-07-10
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-07-10
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-07-10
No feed summary available yet.
Watchlist Matched: none
Modular · inference-infra · 2024-07-09
Take control of your AI
Watchlist Matched: none
Modular · inference-infra · 2024-07-09
Develop locally, deploy globally
Watchlist Matched: none
Hugging Face · open-source · 2024-07-09
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-07-08
No feed summary available yet.
Watchlist Matched: none
Modal · inference-infra · 2024-07-03
Learn how Basis partnered with Modal to bring the spirit of competitive programming to prompt engineering.
Watchlist Matched: none
Modular · inference-infra · 2024-07-03
A brief guide to the Mojo n-body example
Watchlist Matched: none
Replicate · inference-infra · 2024-06-28
Google's Gemma2 models, language model leaderboard, tips for Stable Diffusion 3
Watchlist Matched: model, leaderboard
Hugging Face · open-source · 2024-06-27
No feed summary available yet.
Watchlist Matched: none
Modular · inference-infra · 2024-06-25
What's new in MAX 24.4? MAX on macOS, fast local Llama3, native quantization and GGUF support
Watchlist Matched: quantization, gguf
Hugging Face · open-source · 2024-06-24
No feed summary available yet.
Watchlist Matched: fine-tuning
Hugging Face · open-source · 2024-06-24
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2024-06-21
Really good coding model, AI search breakthroughs, Discord support bot
Watchlist Matched: model
Hugging Face · open-source · 2024-06-20
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-06-19
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2024-06-18
We show you how to use Stable Diffusion 3 to get the best images, including new techniques for prompting.
Watchlist Matched: none
Replicate · inference-infra · 2024-06-18
A step-by-step guide to generating images with Stable Diffusion 3 on your M-series Mac using MPS acceleration.
Watchlist Matched: none
Modular · inference-infra · 2024-06-17
What’s new in Mojo 24.4? Improved collections, new traits, os module features and core language enhancements
Watchlist Matched: none
Replicate · inference-infra · 2024-06-14
Copy and paste a few commands into terminal to play with Stable Diffusion 3 on your own GPU-powered machine.
Watchlist Matched: gpu
Replicate · inference-infra · 2024-06-14
Find concepts in GPT models, real-time speech to text in the browser, H100s are coming
Watchlist Matched: none
Hugging Face · open-source · 2024-06-13
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-06-12
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-06-12
No feed summary available yet.
Watchlist Matched: rlhf
Replicate · inference-infra · 2024-06-07
Garden State Llama, applied LLMs guide, real-time image generation
Watchlist Matched: generation
Hugging Face · open-source · 2024-06-07
No feed summary available yet.
Watchlist Matched: none
Modal · inference-infra · 2024-06-06
See how Modal combats cryptomining abuse with syscall-based program analysis, to secure GPUs for legitimate users.
Watchlist Matched: none
Hugging Face · open-source · 2024-06-06
No feed summary available yet.
Watchlist Matched: leaderboard
SkyPilot · open-source · 2024-06-04
Announcing SkyPilot 0.6.
Watchlist Matched: api
Modular · inference-infra · 2024-06-04
Deep dive into ownership in Mojo
Watchlist Matched: none
Replicate · inference-infra · 2024-05-31
Faster image generation, AI-powered world simulator, insights on AI dataset complexity
Watchlist Matched: generation
Hugging Face · open-source · 2024-05-31
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-05-28
No feed summary available yet.
Watchlist Matched: training, finetuning
SqueezeBits · korea · 2024-05-27
SqueezeBits' IT exhibition recap: from AI model compression demos to hands-on OwLite experiences, booth visitor reactions, and more. Read our on-the-ground event story!
Watchlist Matched: model
Replicate · inference-infra · 2024-05-24
DIY Llama 3 implementation, open-source smart glasses, steering language models with dictionary learning
Watchlist Matched: open-source
Modal · inference-infra · 2024-05-23
Find out how Hunch uses Modal to run AI code even its users don't trust.
Watchlist Matched: none
Replicate · inference-infra · 2024-05-23
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-05-22
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-05-21
No feed summary available yet.
Watchlist Matched: none
Modular · inference-infra · 2024-05-20
Fast⚡k-means clustering in Mojo🔥: a guide to porting Python to Mojo🔥 for accelerated k-means clustering
Watchlist Matched: none
SqueezeBits · korea · 2024-05-16
An introduction to tokenizers and their implications in language models.
Watchlist Matched: none
Hugging Face · open-source · 2024-05-14
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-05-09
No feed summary available yet.
Watchlist Matched: none
Modular · inference-infra · 2024-05-08
Developer Voices: Deep Dive with Chris Lattner on Mojo
Watchlist Matched: none
Modular · inference-infra · 2024-05-02
What’s New in Mojo 24.3: Community Contributions, Pythonic Collections and Core Language Enhancements
Watchlist Matched: none
Hugging Face · open-source · 2024-04-30
No feed summary available yet.
Watchlist Matched: none
Modal · inference-infra · 2024-04-26
Fine-tune on just a few hundred examples and kick off your very own data flywheel.
Watchlist Matched: none
Hugging Face · open-source · 2024-04-22
No feed summary available yet.
Watchlist Matched: agent
Hugging Face · open-source · 2024-04-19
No feed summary available yet.
Watchlist Matched: leaderboard
Modal · inference-infra · 2024-04-18
Easily develop and deploy custom ETL jobs while saving 99% on sync costs.
Watchlist Matched: none
Hugging Face · open-source · 2024-04-18
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-04-16
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-04-16
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-04-16
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-04-11
No feed summary available yet.
Watchlist Matched: none
Modal · inference-infra · 2024-04-10
Celebrating the best in enterprise tech.
Watchlist Matched: none
Hugging Face · open-source · 2024-04-08
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-04-04
No feed summary available yet.
Watchlist Matched: api
Modular · inference-infra · 2024-04-02
What’s new in Mojo 24.2: Mojo Nightly, Enhanced Python Interop, OSS stdlib and more
Watchlist Matched: oss
Modular · inference-infra · 2024-03-28
The Next Big Step in Mojo🔥 Open Source
Watchlist Matched: open source
Modal · inference-infra · 2024-03-26
Find out how Ramp uses Modal to customize open source LLMs to automate receipt processing.
Watchlist Matched: open source
Hugging Face · open-source · 2024-03-25
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-03-22
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-03-22
No feed summary available yet.
Watchlist Matched: quantization, retrieval
Hugging Face · open-source · 2024-03-20
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-03-20
No feed summary available yet.
Watchlist Matched: training
Hugging Face · open-source · 2024-03-18
No feed summary available yet.
Watchlist Matched: quantization
Hugging Face · open-source · 2024-03-15
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-03-15
No feed summary available yet.
Watchlist Matched: none
Modal · inference-infra · 2024-03-14
In this post, we'll talk about how Modal handles real-time HTTP requests and WebSockets in serverless functions.
Watchlist Matched: none
Hugging Face · open-source · 2024-03-04
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-02-28
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-02-27
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-02-26
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-02-23
No feed summary available yet.
Watchlist Matched: fine-tuning
Hugging Face · open-source · 2024-02-23
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-02-21
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-02-19
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-02-16
No feed summary available yet.
Watchlist Matched: open source
Modal · inference-infra · 2024-02-15
We've been busy in 2024 so far, bringing you WebSockets, interactive commands, H100s and more. Learn about what's new at Modal.
Watchlist Matched: none
Hugging Face · open-source · 2024-02-14
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-02-08
No feed summary available yet.
Watchlist Matched: api
Hugging Face · open-source · 2024-02-03
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-02-02
No feed summary available yet.
Watchlist Matched: leaderboard
Hugging Face · open-source · 2024-02-01
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-02-01
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-01-29
No feed summary available yet.
Watchlist Matched: leaderboard
Hugging Face · open-source · 2024-01-26
No feed summary available yet.
Watchlist Matched: leaderboard
Hugging Face · open-source · 2024-01-25
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-01-24
No feed summary available yet.
Watchlist Matched: agents, open-source
Modal · inference-infra · 2024-01-23
Leverage Modal’s parallel batch jobs and in-house storage features to quickly generate embeddings for billions of tokens.
Watchlist Matched: none
Hugging Face · open-source · 2024-01-19
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-01-19
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-01-18
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-01-14
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2024-01-12
No feed summary available yet.
Watchlist Matched: leaderboard
Hugging Face · open-source · 2024-01-10
No feed summary available yet.
Watchlist Matched: fine-tuning
Hugging Face · open-source · 2024-01-02
No feed summary available yet.
Watchlist Matched: training, lora
Modal · inference-infra · 2023-12-20
An operational guide to fine-tuning an LLM on any dataset in minutes (ft. CodeLlama, Llama 2, Mistral, and more)
Watchlist Matched: fine-tuning
Hugging Face · open-source · 2023-12-18
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-12-06
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2023-12-06
Or, how I met a virtual David Attenborough.
Watchlist Matched: none
Replicate · inference-infra · 2023-12-06
We’ve added fine-tuning for realistic voice cloning (RVC). You can train RVC on your own dataset from a YouTube video with a few lines of code using Replicate's API.
Watchlist Matched: fine-tuning, api, open-source
Replicate · inference-infra · 2023-12-05
We've raised a $40 million Series B led by a16z.
Watchlist Matched: open-source
Hugging Face · open-source · 2023-12-01
No feed summary available yet.
Watchlist Matched: leaderboard
Replicate · inference-infra · 2023-11-23
The Yi series models are large language models trained from scratch by developers at 01.AI. Learn how to run them in the cloud with one line of code.
Watchlist Matched: cloud, api
Replicate · inference-infra · 2023-11-22
We've added a CLI command that makes it easy to get started with Replicate.
Watchlist Matched: none
Hugging Face · open-source · 2023-11-09
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2023-11-08
We’ve added chord conditioning to Meta’s MusicGen model, so you can create automatic backing tracks in any style using text prompts and chord progressions.
Watchlist Matched: model
Hugging Face · open-source · 2023-10-27
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-10-25
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-10-24
No feed summary available yet.
Watchlist Matched: rlhf
Hugging Face · open-source · 2023-10-24
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-10-19
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2023-10-13
We’ve added fine-tuning support to MusicGen. You can train the small, medium and melody models on your own audio files using Replicate.
Watchlist Matched: fine-tuning
Modal · inference-infra · 2023-10-10
Modal officially launches today with no waitlist. And we also raised a Series A!
Watchlist Matched: none
Replicate · inference-infra · 2023-10-09
How to use Llama 2 models with grammars for information extraction tasks.
Watchlist Matched: none
Hugging Face · open-source · 2023-10-04
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2023-10-04
Combine AnimateDiff and the ST-MFNet frame interpolator to create smooth and realistic videos from a text prompt
Watchlist Matched: none
Hugging Face · open-source · 2023-09-29
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-09-29
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-09-28
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-09-19
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-09-18
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-09-18
No feed summary available yet.
Watchlist Matched: leaderboard
Hugging Face · open-source · 2023-09-15
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-09-13
No feed summary available yet.
Watchlist Matched: fine-tuning
Hugging Face · open-source · 2023-09-12
No feed summary available yet.
Watchlist Matched: quantization
Hugging Face · open-source · 2023-09-11
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-09-06
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2023-09-06
We've made some dramatic improvements to cold boots for fine-tuned models.
Watchlist Matched: none
Hugging Face · open-source · 2023-08-30
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-08-25
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-08-25
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-08-23
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2023-08-22
With the recent release of Stable Diffusion XL fine-tuning on Replicate, and today being the 1-year anniversary of Stable Diffusion, now feels like the perfect opportunity to take a step back and reflect on how text-to-image AI has improve...
Watchlist Matched: release, fine-tuning
Replicate · inference-infra · 2023-08-16
The price of public models is being cut in half, and soon we'll start charging new users for setup and idle time on private models.
Watchlist Matched: none
Replicate · inference-infra · 2023-08-14
Learn the art of the Llama prompt.
Watchlist Matched: none
Replicate · inference-infra · 2023-08-14
Our API now supports server-sent event streams for language models. Learn how to use them to make your apps more responsive.
Watchlist Matched: api
Hugging Face · open-source · 2023-08-10
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-08-09
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-08-09
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-08-08
No feed summary available yet.
Watchlist Matched: dpo
Hugging Face · open-source · 2023-08-08
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2023-08-08
We’ve added fine-tuning (Dreambooth, Textual Inversion and LoRA) support to SDXL 1.0. You can train SDXL on your own images with one line of code using the Replicate API.
Watchlist Matched: fine-tuning, lora, api
Hugging Face · open-source · 2023-08-02
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-08-02
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-07-27
No feed summary available yet.
Watchlist Matched: quantization
Replicate · inference-infra · 2023-07-26
How to run Stable Diffusion XL 1.0 using the Replicate API
Watchlist Matched: api
Hugging Face · open-source · 2023-07-24
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2023-07-22
How to run Llama 2 on Mac, Linux, Windows, and your phone.
Watchlist Matched: none
Hugging Face · open-source · 2023-07-21
No feed summary available yet.
Watchlist Matched: open source
Hugging Face · open-source · 2023-07-20
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2023-07-20
So you want to train a llama...
Watchlist Matched: none
Hugging Face · open-source · 2023-07-18
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-07-17
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-07-14
No feed summary available yet.
Watchlist Matched: fine-tuning
Hugging Face · open-source · 2023-07-05
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-07-03
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-07-01
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-06-29
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-06-26
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-06-23
No feed summary available yet.
Watchlist Matched: leaderboard
Hugging Face · open-source · 2023-06-22
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-06-20
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-06-19
No feed summary available yet.
Watchlist Matched: adapter
Hugging Face · open-source · 2023-06-16
No feed summary available yet.
Watchlist Matched: none
Modal · inference-infra · 2023-06-15
Modal is excited to announce that it has successfully completed a System and Organization Controls (SOC) 2 Type 1 audit.
Watchlist Matched: none
Hugging Face · open-source · 2023-06-15
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-06-15
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-06-15
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-06-12
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-06-12
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-06-07
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-06-06
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-06-05
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-06-02
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-06-01
No feed summary available yet.
Watchlist Matched: open source
Hugging Face · open-source · 2023-05-25
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-05-24
No feed summary available yet.
Watchlist Matched: qlora, quantization
Hugging Face · open-source · 2023-05-23
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-05-23
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2023-05-18
We've added a status page to provide real-time updates on the health of Replicate.
Watchlist Matched: none
Hugging Face · open-source · 2023-05-16
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-05-16
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-05-15
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-05-09
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-05-08
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-05-04
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-05-01
No feed summary available yet.
Watchlist Matched: api
Hugging Face · open-source · 2023-04-26
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-04-26
No feed summary available yet.
Watchlist Matched: training
Hugging Face · open-source · 2023-04-21
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2023-04-19
Give it a machine learning directory, and AutoCog will generate predict.py and cog.yaml, retrying until it successfully runs a prediction.
Watchlist Matched: none
Hugging Face · open-source · 2023-04-17
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-04-14
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-04-12
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-04-06
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-04-05
No feed summary available yet.
Watchlist Matched: rlhf
Replicate · inference-infra · 2023-04-05
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-03-30
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-03-27
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-03-24
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-03-23
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2023-03-18
A roundup of recent developments from the llamaverse.
Watchlist Matched: none
Replicate · inference-infra · 2023-03-17
With a small amount of data and an hour of training you can make LLaMA output text in the voice of the dataset.
Watchlist Matched: training
Replicate · inference-infra · 2023-03-16
We'll show you how to train Alpaca, a fine-tuned version of LLaMA that can respond to instructions like ChatGPT.
Watchlist Matched: none
Hugging Face · open-source · 2023-03-10
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-03-03
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-03-03
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-03-02
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-03-01
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-02-24
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-02-24
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-02-23
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-02-21
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2023-02-21
Lots of people want to build things with machine learning, but they don't have the expertise to use it.
Watchlist Matched: none
Hugging Face · open-source · 2023-02-10
No feed summary available yet.
Watchlist Matched: fine-tuning
Hugging Face · open-source · 2023-02-08
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-02-07
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-02-06
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-02-03
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-01-30
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-01-26
No feed summary available yet.
Watchlist Matched: fine-tuning, lora
Hugging Face · open-source · 2023-01-24
No feed summary available yet.
Watchlist Matched: agent
Hugging Face · open-source · 2023-01-24
No feed summary available yet.
Watchlist Matched: training
Hugging Face · open-source · 2023-01-19
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-01-17
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-01-16
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-01-09
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-01-03
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-01-02
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2023-01-02
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-12-21
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-12-15
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-12-15
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-12-09
No feed summary available yet.
Watchlist Matched: rlhf
Hugging Face · open-source · 2022-12-09
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-12-02
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-12-01
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-12-01
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-11-30
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-11-29
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-11-25
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-11-23
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-11-21
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-11-17
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-11-08
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-11-07
No feed summary available yet.
Watchlist Matched: training
Hugging Face · open-source · 2022-11-03
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-11-02
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-10-13
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-10-05
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-10-03
No feed summary available yet.
Watchlist Matched: evaluate
Hugging Face · open-source · 2022-09-28
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-09-27
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-09-26
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-09-22
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-09-12
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-09-08
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-08-31
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2022-08-29
How to use Replicate to integrate Stable Diffusion into hacks, apps, and projects
Watchlist Matched: api
Replicate · inference-infra · 2022-08-25
A tutorial for building a chat bot that replies to prompts with the output of a text-to-image model.
Watchlist Matched: model
Hugging Face · open-source · 2022-08-24
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-08-22
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-08-22
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-08-19
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-08-18
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-08-17
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-08-12
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2022-08-11
We're bringing people together to explore what's being created with machine learning.
Watchlist Matched: none
Hugging Face · open-source · 2022-08-10
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-08-05
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2022-08-05
Using CLIP and LAION5B to collect thousands of captioned images.
Watchlist Matched: none
Hugging Face · open-source · 2022-08-02
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-07-22
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2022-07-18
The basics of using the API to create your own images from text.
Watchlist Matched: api
Hugging Face · open-source · 2022-07-14
No feed summary available yet.
Watchlist Matched: training
Hugging Face · open-source · 2022-07-13
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-07-07
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-06-30
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-06-29
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-06-23
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-06-22
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-06-15
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-06-14
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-06-07
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2022-05-27
An introduction to differentiable programming and the process of refining generative art models.
Watchlist Matched: none
Hugging Face · open-source · 2022-05-23
No feed summary available yet.
Watchlist Matched: training
Hugging Face · open-source · 2022-05-20
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-05-19
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-05-18
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-05-17
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-05-17
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-05-16
No feed summary available yet.
Watchlist Matched: none
Replicate · inference-infra · 2022-05-16
We're a small team of engineers and machine learning enthusiasts working to make machine learning more accessible.
Watchlist Matched: none
Hugging Face · open-source · 2022-05-13
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-05-13
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-05-09
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-05-06
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-05-04
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-04-28
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-04-27
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-04-26
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-04-25
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-04-22
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-04-13
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-04-05
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-03-23
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-03-16
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-02-11
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-02-02
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-02-01
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-01-25
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-01-21
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2022-01-12
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2021-12-23
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2021-12-21
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2021-12-08
No feed summary available yet.
Watchlist Matched: training
Hugging Face · open-source · 2021-11-30
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2021-11-15
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2021-10-26
No feed summary available yet.
Watchlist Matched: launch
Hugging Face · open-source · 2021-10-26
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2021-10-20
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2021-10-13
No feed summary available yet.
Watchlist Matched: fine tuning
Hugging Face · open-source · 2021-10-05
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2021-10-05
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2021-09-24
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2021-09-14
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2021-07-15
No feed summary available yet.
Watchlist Matched: training
Hugging Face · open-source · 2021-07-13
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2021-06-28
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2021-05-25
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2021-03-31
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2021-03-12
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2021-03-09
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2021-02-25
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2021-02-09
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2021-01-26
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2021-01-19
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2020-11-03
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2020-11-02
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2020-10-10
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2020-09-10
No feed summary available yet.
Watchlist Matched: none
Hugging Face · open-source · 2020-07-03
No feed summary available yet.
Watchlist Matched: none