Fireworks AI · inference-infra · 2026-06-03
Introducing Fireworks on Microsoft Foundry: Bringing Best-in-Class Open Model Inference to Azure
No feed summary available yet.
High signal Matched: inference, model, open model
PyTorch Foundation · open-source · 2026-05-27
The PyTorch Foundation, a community-driven hub for open source AI under the Linux Foundation, is announcing today that Alibaba Cloud has joined as a Platinum member. Alibaba Cloud is a...
High signal Matched: cloud, open source
LMCache · open-source · 2026-05-27
A collaboration story about LMCache multiprocess mode + MooncakeStore: from 0 to 1, from functional to optimized. 1. Before We Begin: Recently, the LMCache community and the Mooncake community carried out a series of valuable open-source c...
High signal Matched: lmcache, adapter, open-source, open source
Lambda · cloud · 2026-05-22
After 15 months of incremental updates, leaks, and rumored leaks, DeepSeek released version 4. It arrived without the fanfare R1 and R1-preview commanded in early 2025. That quiet reception is the most interesting thing about the release....
High signal Matched: inference, serving, performance, cost, release, model, open-source
AMD ROCm Blogs · hardware · 2026-05-22
Triton Inference Server is an open-source platform designed to streamline AI inferencing. It supports the deployment, scaling, and inference of trained models from multiple frameworks, including ONNX Runtime, TensorFlow, PyTorch, and other...
High signal Matched: inference, inferencing, serving, triton, benchmark, model, cloud, open-source
AMD ROCm Blogs · hardware · 2026-05-20
AMD released ROCm Core 7.13, the AMD GPU Driver 31.30, and AMD GPU Virtualization 9.0. With these releases, ROCm software expands hardware support across enterprise datacenters. The platform introduces AMD’s latest Instinct accelerators, e...
High signal Matched: performance, gpu, rocm, open-source
Microsoft Research · big-tech · 2026-05-14
mimalloc is an open-source, modern, scalable memory allocator that is a drop-in replacement for malloc and free. It is relatively small (~12K lines), with clear internal data structures, and is easy to build and integrate into other projec...
High signal Matched: performance, research, open-source
NVIDIA Technical Blog · hardware · 2026-04-28
Agentic systems often reason across screens, documents, audio, video, and text within a single perception‑to‑action loop. However, they still rely on...
High signal Matched: model, open model, agent, agentic
Together AI · inference-infra · 2026-04-28
NVIDIA Nemotron 3 Nano Omni is now on Together AI: a single open model that reasons across video, images, audio, and text, built for agentic workloads at scale.
High signal Matched: model, open model, agentic
Nota AI · korea · 2026-04-22
Jaehoon Lee, Technical Content Manager, Nota AI. Series Notice: NetsPresso® Technical Blog, Part 2. In Part 1, we walked through a scenario of deploying Llama 3.2 1B on an edge device to illustrate the NetsPresso® workflow. The f...
High signal Matched: inference, kernel, cuda, matmul, benchmark, performance, latency, cost, npu, model, weights, paper, research, evaluation, furiosa, training, quantization, int8, int4, awq, gptq, sdk, open-source
LMCache · open-source · 2026-04-18
GTC wrapped up a month ago. Our open-source KV cache management library, LMCache, was shown in Jensen Huang’s keynote and spotlighted by NVIDIA SVP Kevin Deierling, and I was invited to speak at the first-ever industry KV cache tutorial...
High signal Matched: kv cache, lmcache, open-source
SqueezeBits · korea · 2026-04-14
Check out highlights from the 2nd vLLM Korea Meetup: open-source use cases and real-world production examples that showcase vLLM's technical maturity.
High signal Matched: korea, open-source
NVIDIA Technical Blog · hardware · 2026-04-09
Slurm is an open source cluster management and job scheduling system for Linux. It manages job scheduling for over 65% of TOP500 systems. Most organizations...
High signal Matched: gpu, open source
AI2 · research · 2026-04-07
WildDet3D is an open model that predicts 3D bounding boxes from a single image. It generalizes across cameras and object categories, and folds in depth signals when available—alongside a new dataset of verified 3D annotations.
High signal Matched: introducing, model, open model
vLLM Project · open-source · 2026-04-02
With the debut of Gemma 4, vLLM introduces immediate support for Google's most sophisticated open model lineup, spanning multiple hardware backends, with first-ever Day 0 support on Google TPUs,...
High signal Matched: model, open model
Together AI · inference-infra · 2026-03-31
1.25x over a well-trained static speculator. Aurora is an open-source RL framework that turns speculative decoding from a one-time offline setup into a self-improving system that learns from every request it serves.
High signal Matched: decoding, speculative decoding, open-source
Nota AI · korea · 2026-03-23
Jaehoon Lee, Technical Content Manager, Nota AI. GTC has evolved far beyond a technology conference, drawing attention from global economies and financial markets alike. This year, CEO Jensen Huang took the stage in his tradema...
High signal Matched: inference, prefill, generation, throughput, cuda, kv cache, performance, latency, cost, gpu, npu, launch, model, research, cloud, training, long-context, context window, agent, agents, agentic, open-source
Together AI · inference-infra · 2026-03-17
Meet Mamba-3: the SSM built for inference. Faster than Transformers at decode, stronger than Mamba-2, and open-source from day one.
High signal Matched: inference, open-source
Nota AI · korea · 2026-03-13
Hancheol Park, Ph.D., AI Research Engineer, Nota AI; Tairen Piao, AI Research Engineer, Nota AI; Tae-Ho Kim, CTO & Co-Founder, Nota AI. ✔️ Resource: The official quantized model of Solar-Open-100B, which passed the first round of Sout...
High signal Matched: inference, serving, prefill, generation, throughput, moe, router, benchmark, performance, latency, ttft, tpot, blackwell, release, model, weights, open model, research, evaluation, korea, korean, upstage, training, post-training, quantization, quantized, int4, evaluate, benchmarks, mmlu, long-context
Together AI · inference-infra · 2026-03-02
We've refreshed our visual identity — designed with Pentagram to express how Together AI connects open-source innovation, systems research, and builders to unlock new possibilities.
High signal Matched: introducing, research, open-source
Together AI · inference-infra · 2026-02-02
Fine-tuned open-source LLM judges can outperform GPT-5.2 at evaluating model outputs. Using Direct Preference Optimization on just 5,400 preference pairs, we trained GPT-OSS 120B to beat GPT-5.2 on human preference alignment—at 15x lower c...
High signal Matched: inference, cost, model, fine-tuning, evaluating, open-source, oss
Together AI · inference-infra · 2026-02-02
Together Evaluations now supports OpenAI, Anthropic, and Google models for cross-provider benchmarking. Compare open-source, fine-tuned, and proprietary models side-by-side to make data-driven decisions on quality, cost, and performance—al...
High signal Matched: performance, cost, open-source, open source
vLLM Project · open-source · 2026-02-01
TL;DR: In collaboration with the open-source community, vLLM and NVIDIA have achieved significant performance milestones on the gpt-oss-120b model running on NVIDIA's Blackwell GPUs. Through deep...
High signal Matched: performance, blackwell, model, open-source, oss
Together AI · inference-infra · 2026-01-26
Introducing DSGym, a holistic evaluation and training framework for LLM-based data science agents. Features 90+ bioinformatics tasks, 92 Kaggle competitions, and synthetic trajectory generation. Our 4B model achieves state-of-the-art perform...
High signal Matched: generation, performance, introducing, model, evaluation, training, evaluating, agents, open-source
Together AI · inference-infra · 2026-01-08
Learn how to choose the right open-source model for production by evaluating model quality, benchmarking performance, and deploying open models that balance cost, speed, and accuracy.
High signal Matched: performance, cost, model, open model, evaluating, open-source
Together AI · inference-infra · 2025-12-01
Together AI achieves up to 2x faster inference for top open-source models like Qwen, DeepSeek, and Kimi through GPU optimization, advanced speculative decoding, and FP4 quantization—ranking #1 in speed benchmarks on NVIDIA Blackwell archit...
High signal Matched: inference, decoding, speculative decoding, gpu, blackwell, quantization, benchmarks, open-source
Together AI · inference-infra · 2025-11-04
Together AI launches the fastest voice AI stack: streaming Whisper STT, serverless open-source TTS (Orpheus & Kokoro), and Voxtral transcription. Sub-second latency for production voice agents.
High signal Matched: inference, latency, agents, open-source
Hugging Face · open-source · 2025-10-16
No feed summary available yet.
High signal Matched: cloud, oss
Together AI · inference-infra · 2025-08-19
Customize OpenAI’s gpt-oss-20B/120B with Together AI’s fine-tuning: train, optimize, and instantly deploy domain experts with enterprise reliability and cost efficiency.
High signal Matched: cost, fine-tuning, oss
Together AI · inference-infra · 2025-08-15
Parsed fine-tuned a 27B open-source model to beat Claude Sonnet 4 by 60% on a real-world healthcare task—while running 10–100x cheaper.
High signal Matched: model, fine-tuning, open-source
SkyPilot · open-source · 2025-08-12
Your AI writes code. Now what? If you’re building AI agents in 2025, you probably wondered that as well. Your LLM generates some Python code that analyzes data, manipulates files, or calls APIs. But where does it run? Most people eit...
High signal Matched: cloud, agent, agents, open-source
Together AI · inference-infra · 2025-08-05
Access OpenAI’s gpt-oss-120B on Together AI: Apache-2.0 open-weight model with serverless & dedicated endpoints, $0.50/1M in, $1.50/1M out, 99.9% SLA.
High signal Matched: model, oss
Hugging Face · open-source · 2025-08-05
No feed summary available yet.
High signal Matched: model, open-source, oss
Together AI · inference-infra · 2025-07-28
Together Evaluations is a flexible framework for benchmarking LLMs using strong open-source models as judges. Skip manual labeling and rigid metrics—get fast, customizable insights into model quality for your specific tasks.
High signal Matched: benchmark, model, open-source
Together AI · inference-infra · 2025-07-17
Together AI inference is now among the world’s fastest, most capable platforms for running open-source reasoning models like DeepSeek-R1 at scale, thanks to our new inference engine designed for NVIDIA HGX B200.
High signal Matched: inference, b200, blackwell, open-source
Together AI · inference-infra · 2025-07-14
Run Kimi K2 (1T params) on Together AI—frontier open model for agentic reasoning and coding, serverless deployment, 99.9% SLA, lower cost and instant scaling.
High signal Matched: cost, model, open model, agentic, open-source
llm-d · open-source · 2025-05-20
Red Hat launches llm-d: Open source distributed AI inference platform backed by NVIDIA, Google Cloud, IBM. Scale generative AI with intelligent routing on Kubernetes.
High signal Matched: inference, distributed, release, cloud, open source
Nota AI · korea · 2025-05-07
Jewon Lee | Ki-Ung Song | Seungmin Yang | Donguk Lim | Jaeyeon Kim | Wooksu Shin | Bo-Kyeong Kim | Tae-Ho Kim (EdgeFM Team, Nota AI); Yong Jae Lee, Ph.D., Associate Professor, UW-Madison. Summary: Our method, Trimmed-Llama, reduces t...
High signal Matched: inference, generation, kv cache, benchmark, performance, latency, model, weights, research, training, benchmarks, open-source
SqueezeBits · korea · 2025-03-26
With TensorRT-LLM now open source, we can finally take a deep dive into the secret sauce behind its impressive performance.
High signal Matched: performance, open source
Replicate · inference-infra · 2025-03-05
Wan2.1 is the most capable open-source video generation model, producing coherent and high-quality outputs. Learn how to run it in the cloud with a single line of code.
High signal Matched: generation, model, cloud, api, open-source
AIBrix · open-source · 2025-02-21
Open-source large language models (LLMs) such as LLaMA, DeepSeek, Qwen, and Mistral have surged in popularity, offering enterprises greater flexibility, cost savings, and control over their AI deployments. These models have empowered organ...
High signal Matched: inference, generation, latency, cost, introducing, model, agents, open-source
SqueezeBits · korea · 2025-02-10
This article is about an open-source library for direct conversion of PyTorch models to TensorRT-LLM.
High signal Matched: open-source
Hugging Face · open-source · 2024-12-10
No feed summary available yet.
High signal Matched: research, open source
AIBrix · open-source · 2024-11-13
In recent years, large language models (LLMs) have revolutionized AI applications, powering solutions in areas like chatbots, automated content generation, and advanced recommendation engines. Services like OpenAI’s have gained significant...
High signal Matched: decoding, prefill, generation, kv cache, performance, cost, gpu, release, introducing, cloud, open-source
Nota AI · korea · 2024-08-02
Jaeyeon Kim, Research Engineer, Nota AI; Geonmin Kim, Research Engineer, Nota AI; Hancheol Park, Team Lead of NetsPresso Application, Nota AI. Introduction: Recent large language models (LLMs) have demonstrated unprecedented performance...
High signal Matched: decoding, benchmark, performance, latency, tokens/sec, model, arxiv, research, technical report, evaluation, cloud, training, lora, benchmarks, leaderboard, open-source
Replicate · inference-infra · 2024-07-23
Llama 3.1 405B is the most powerful open-source language model from Meta. Learn how to run it in the cloud with one line of code.
High signal Matched: model, cloud, api, open-source
Replicate · inference-infra · 2024-04-23
Arctic is a new open-source language model from Snowflake. Learn how to run it in the cloud with one line of code.
High signal Matched: model, cloud, api, open-source
Replicate · inference-infra · 2024-01-30
Code Llama 70B is a powerful open-source code generation model. Learn how to run it in the cloud with one line of code.
High signal Matched: generation, cloud, api, open-source
Replicate · inference-infra · 2023-11-10
An interactive example showing how to embed text using a state-of-the-art embedding model that beats OpenAI's embeddings API on price and performance.
High signal Matched: performance, model, api, open-source
Replicate · inference-infra · 2023-10-06
Mistral 7B is an open-source large language model. Learn what it's good at and how to run it in the cloud with one line of code.
High signal Matched: model, cloud, api, open-source
Replicate · inference-infra · 2023-07-27
Llama 2 is the first open source language model of the same caliber as OpenAI’s models. Learn how to run it in the cloud with one line of code.
High signal Matched: model, cloud, api, open source
Replicate · inference-infra · 2023-07-19
A roundup of recent developments from the llamaverse following the second major release of Meta's open-source large language model.
High signal Matched: release, model, open-source
Hugging Face · open-source · 2023-07-17
No feed summary available yet.
High signal Matched: generation, open-source
Replicate · inference-infra · 2023-04-21
A roundup of recent developments from the world of open-source language models.
High signal Matched: model, open-source
Runpod · cloud · 2026-06-03
No feed summary available yet.
Watchlist Matched: open-source
Modal · inference-infra · 2026-06-01
What we've seen helping teams run Reinforcement Learning at scale on Modal. Plus an open-source library to skip the scaffolding.
Watchlist Matched: open-source
Together AI · inference-infra · 2026-05-14
Violin is an open-source AI video translation tool that combines speech recognition, LLM translation, and text-to-speech to make video content accessible across languages.
Watchlist Matched: open-source
Lambda · cloud · 2026-04-30
Harnesses: If you've used Claude Code or Codex, you've used a harness. A harness is the infrastructure layer that wraps an AI coding agent and decides how it operates, what it can touch, and how you measure whether it worked. It's how most...
Watchlist Matched: gpu, training, post-training, agent, agents, open-source
NVIDIA Technical Blog · hardware · 2026-04-20
The boom in open source generative AI models is pushing beyond data centers into machines operating in the physical world. Developers are eager to deploy these...
Watchlist Matched: open source
Hugging Face · open-source · 2026-03-18
No feed summary available yet.
Watchlist Matched: open source
Hugging Face · open-source · 2026-03-10
No feed summary available yet.
Watchlist Matched: open-source
Hugging Face · open-source · 2026-02-04
No feed summary available yet.
Watchlist Matched: open-source
Hugging Face · open-source · 2026-01-28
No feed summary available yet.
Watchlist Matched: open-source
Hugging Face · open-source · 2026-01-27
No feed summary available yet.
Watchlist Matched: training, agentic, oss
Hugging Face · open-source · 2025-12-04
No feed summary available yet.
Watchlist Matched: open source
Hugging Face · open-source · 2025-10-24
No feed summary available yet.
Watchlist Matched: oss
Hugging Face · open-source · 2025-09-11
No feed summary available yet.
Watchlist Matched: oss
Together AI · inference-infra · 2025-08-11
No feed summary available yet.
Watchlist Matched: oss
Hugging Face · open-source · 2025-08-05
No feed summary available yet.
Watchlist Matched: open-source
Replicate · inference-infra · 2025-07-31
Wan 2.2 is our fastest, cheapest video model.
Watchlist Matched: model, open source
Hugging Face · open-source · 2025-07-09
No feed summary available yet.
Watchlist Matched: open-source
Hugging Face · open-source · 2025-06-26
No feed summary available yet.
Watchlist Matched: open-source
Together AI · inference-infra · 2025-06-12
Build a data scientist agent using Together’s open-source models and Code Interpreter—easy to implement, solid benchmarks, and full code on GitHub.
Watchlist Matched: benchmarks, agent, open-source
Together AI · inference-infra · 2025-05-28
No feed summary available yet.
Watchlist Matched: training, post-training, agents, open-source
Modular · inference-infra · 2025-05-06
Modular Platform 25.3: 450K+ Lines of Open Source Code and pip Packaging
Watchlist Matched: open source
Hugging Face · open-source · 2025-04-14
No feed summary available yet.
Watchlist Matched: open-source
Hugging Face · open-source · 2025-03-11
No feed summary available yet.
Watchlist Matched: open-source
Hugging Face · open-source · 2025-02-04
No feed summary available yet.
Watchlist Matched: agents, open-source
Replicate · inference-infra · 2025-01-24
Train your own versions of Tencent's HunyuanVideo for style, motion, and characters on Replicate.
Watchlist Matched: open-source
Hugging Face · open-source · 2024-12-02
No feed summary available yet.
Watchlist Matched: open source
Replicate · inference-infra · 2024-11-26
We've made running fine-tunes on Replicate much faster, and the optimizations are open-source.
Watchlist Matched: open-source
Replicate · inference-infra · 2024-10-10
FLUX is now much faster on Replicate, and we’ve made our optimizations open-source so you can see exactly how they work and build upon them.
Watchlist Matched: open-source, open source
Replicate · inference-infra · 2024-08-02
Open source frontier image model, cut objects from videos, new Python web framework from Jeremy Howard
Watchlist Matched: model, open source
Replicate · inference-infra · 2024-08-01
FLUX.1 is a new text-to-image model from Black Forest Labs, the creators of Stable Diffusion, that exceeds the capabilities of previous open-source models.
Watchlist Matched: model, api, open-source
Modular · inference-infra · 2024-07-23
Announcing stack-pr: an open source tool for managing stacked PRs on GitHub
Watchlist Matched: open source
Replicate · inference-infra · 2024-05-24
DIY Llama 3 implementation, open-source smart glasses, steering language models with dictionary learning
Watchlist Matched: open-source
Modular · inference-infra · 2024-04-02
What’s new in Mojo 24.2: Mojo Nightly, Enhanced Python Interop, OSS stdlib and more
Watchlist Matched: oss
Modular · inference-infra · 2024-03-28
The Next Big Step in Mojo🔥 Open Source
Watchlist Matched: open source
Modal · inference-infra · 2024-03-26
Find out how Ramp uses Modal to customize open source LLMs to automate receipt processing.
Watchlist Matched: open source
Hugging Face · open-source · 2024-02-16
No feed summary available yet.
Watchlist Matched: open source
Hugging Face · open-source · 2024-01-24
No feed summary available yet.
Watchlist Matched: agents, open-source
Replicate · inference-infra · 2023-12-06
We’ve added fine-tuning for realistic voice cloning (RVC). You can train RVC on your own dataset from a YouTube video with a few lines of code using Replicate's API.
Watchlist Matched: fine-tuning, api, open-source
Replicate · inference-infra · 2023-12-05
We've raised a $40 million Series B led by a16z.
Watchlist Matched: open-source
Hugging Face · open-source · 2023-07-21
No feed summary available yet.
Watchlist Matched: open source
Hugging Face · open-source · 2023-06-01
No feed summary available yet.
Watchlist Matched: open source