Modular - MLSys Blogs

Modular · inference-infra · 2026-05-21

Why LLM Inference Needs a New Kind of Router - Part 2

Score 14

inference moe

High signal Matched: inference, router

Modular · inference-infra · 2026-05-18

Hippocratic AI partners with Modular to power flexible, high-quality inference for real-time patient conversations

Score 10

inference

High signal Matched: inference

Modular · inference-infra · 2026-05-12

Inkwell: Why Your Inference Platform Matters As Much As Your Model

Score 14

inference model-release

High signal Matched: inference, model

Modular · inference-infra · 2026-05-08

Why LLM Inference Needs a New Kind of Router - Part 1

Score 14

inference moe

High signal Matched: inference, router

Modular · inference-infra · 2026-04-13

TileTensor Part 1 - Safer, More Efficient GPU Kernels

Score 10

hardware

High signal Matched: gpu

Modular · inference-infra · 2026-04-02

Day Zero Launch: Fastest Performance for Gemma 4 on NVIDIA and AMD

Score 14

benchmark model-release

High signal Matched: performance, launch

Modular · inference-infra · 2026-03-30

Software Pipelining for GPU Kernels: Part 1 - The Pipeline Problem

Score 10

hardware

High signal Matched: gpu

Modular · inference-infra · 2026-03-19

Modular 26.2: State-of-the-Art Image Generation and Upgraded AI Coding with Mojo

Score 10

inference

High signal Matched: generation

Modular · inference-infra · 2026-03-16

Modular at NVIDIA GTC 2026: MAX on Blackwell, Mojo Kernel Porting, and DeepSeek V3 on B200

Score 18

kernel hardware

High signal Matched: kernel, b200, blackwell

Modular · inference-infra · 2026-03-06

Modverse #53: Community Builds, Research Milestones, and a Growing Ecosystem

Score 10

research

High signal Matched: research

Modular · inference-infra · 2026-03-05

Structured Mojo Kernels Part 1 - Peak Performance, Half the Code

Score 10

benchmark

High signal Matched: performance

Modular · inference-infra · 2026-01-14

How to Beat Unsloth's CUDA Kernel Using Mojo—With Zero GPU Experience

Score 18

kernel cuda hardware

High signal Matched: kernel, cuda, gpu

Modular · inference-infra · 2025-11-20

Modular 25.7: Faster Inference, Safer GPU Programming, and a More Unified Developer Experience

Score 14

inference hardware

High signal Matched: inference, gpu

Modular · inference-infra · 2025-11-07

"TTS 1 Max" (powered by Modular Platform) Ranked #1 Speech Model on Artificial Analysis

Score 10

model-release

High signal Matched: model

Modular · inference-infra · 2025-10-17

Achieving State-of-the-Art Performance on AMD MI355 — in Just 14 Days

Score 10

benchmark

High signal Matched: performance

Modular · inference-infra · 2025-09-19

Matrix Multiplication on Blackwell: Part 4 - Breaking SOTA

Score 10

hardware frontier-model

High signal Matched: blackwell, sota

Modular · inference-infra · 2025-09-12

Matrix Multiplication on Blackwell: Part 3 - The Optimizations Behind 85% of SOTA Performance

Score 14

benchmark hardware frontier-model

High signal Matched: performance, blackwell, sota

Modular · inference-infra · 2025-09-05

Matrix Multiplication on Blackwell: Part 2 - Using Hardware Features to Optimize Matmul

Score 14

kernel hardware

High signal Matched: matmul, blackwell

Modular · inference-infra · 2025-08-28

Matrix Multiplication on Blackwell: Part 1 - Introduction

Score 10

hardware

High signal Matched: blackwell

Modular · inference-infra · 2025-08-05

Modular Platform 25.5: Introducing Large Scale Batch Inference

Score 14

inference model-release

High signal Matched: inference, introducing

Modular · inference-infra · 2025-07-31

SF Compute and Modular Partner to Revolutionize AI Inference Economics

Score 10

inference

High signal Matched: inference

Modular · inference-infra · 2025-06-10

Introducing Mammoth: Enterprise-Scale GenAI Deployments Made Simple

Score 10

model-release

High signal Matched: introducing

Modular · inference-infra · 2025-06-10

Modular + AMD: Unleashing AI performance on AMD GPUs

Score 10

benchmark

High signal Matched: performance

Modular · inference-infra · 2025-05-29

Modverse #48: Modular Platform 25.3, MAX AI Kernels, and the Modular GPU Kernel Hackathon

Score 14

kernel hardware

High signal Matched: kernel, gpu

Modular · inference-infra · 2025-05-20

Modular GPU Kernel Hackathon Highlights: Innovation, Community, & Mojo🔥

Score 14

kernel hardware

High signal Matched: kernel, gpu

Modular · inference-infra · 2025-04-17

Modverse #47: MAX 25.2 and an evening of GPU programming at Modular HQ

Score 10

hardware

High signal Matched: gpu

Modular · inference-infra · 2025-03-26

What about Triton and Python eDSLs? (Democratizing AI Compute, Part 7)

Score 10

kernel triton

High signal Matched: triton

Modular · inference-infra · 2025-03-25

MAX 25.2: Unleash the power of your H200's–without CUDA!

Score 14

kernel cuda hardware

High signal Matched: cuda, h200

Modular · inference-infra · 2025-03-05

What about OpenCL and CUDA C++ alternatives? (Democratizing AI Compute, Part 5)

Score 10

kernel cuda

High signal Matched: cuda

Modular · inference-infra · 2025-02-20

CUDA is the incumbent, but is it any good? (Democratizing AI Compute, Part 4)

Score 10

kernel cuda

High signal Matched: cuda

Modular · inference-infra · 2025-02-18

MAX 25.1 - Introducing MAX Builds

Score 10

model-release

High signal Matched: introducing

Modular · inference-infra · 2025-02-12

How did CUDA succeed? (Democratizing AI Compute, Part 3)

Score 10

kernel cuda

High signal Matched: cuda

Modular · inference-infra · 2025-02-06

Paged Attention & Prefix Caching Now Available in MAX Serve

Score 14

serving kv-cache

High signal Matched: serve, paged attention

Modular · inference-infra · 2025-02-05

What exactly is “CUDA”? (Democratizing AI Compute, Part 2)

Score 10

kernel cuda

High signal Matched: cuda

Modular · inference-infra · 2025-01-30

Agentic Building Blocks: Creating AI Agents with MAX Serve and OpenAI Function Calling

Score 10

serving agents

High signal Matched: serve, agents, agentic, function calling

Modular · inference-infra · 2024-12-17

Introducing MAX 24.6: A GPU Native Generative AI Platform

Score 14

hardware model-release

High signal Matched: gpu, introducing

Modular · inference-infra · 2024-12-17

MAX GPU: State of the Art Throughput on a New GenAI platform

Score 14

serving benchmark hardware frontier-model

High signal Matched: throughput, gpu, state of the art

Modular · inference-infra · 2024-12-17

Build a Continuous Chat Interface with Llama 3 and MAX Serve

Score 10

serving

High signal Matched: serve

Modular · inference-infra · 2024-09-13

MAX 24.5 - With SOTA CPU Performance for Llama 3.1

Score 10

benchmark frontier-model

High signal Matched: performance, sota

Modular · inference-infra · 2024-07-09

Bring your own PyTorch model

Score 10

model-release

High signal Matched: model

Modular · inference-infra · 2024-06-07

MAX 24.4 - Introducing quantization APIs and MAX on macOS

Score 10

model-release quantization

High signal Matched: introducing, quantization

Modular · inference-infra · 2024-05-29

What ownership is really about: a mental model approach

Score 10

model-release

High signal Matched: model

Modular · inference-infra · 2024-05-02

MAX 24.3 - Introducing MAX Engine Extensibility

Score 10

model-release

High signal Matched: introducing

Modular · inference-infra · 2024-04-10

Row-major vs. Column-major Matrices: A Performance Analysis in Mojo and NumPy

Score 10

benchmark

High signal Matched: performance

Modular · inference-infra · 2026-05-29

Three trends from MLSys 2026

Score 2

Watchlist Matched: none

Modular · inference-infra · 2026-05-19

How I built a pure Mojo app (and 10 libraries) with AI agents

Score 1

agents

Watchlist Matched: agents

Modular · inference-infra · 2026-05-13

Translating to Mojo via AI Agents

Score 1

agents

Watchlist Matched: agents

Modular · inference-infra · 2026-05-07

Modular 26.3: Mojo 1.0 Beta, MAX Video Gen, and more

Score 1

Watchlist Matched: none

Modular · inference-infra · 2026-05-04

Modverse #54: AMD AI DevDay, New Modular Offices, and a Community That Keeps Shipping

Score 1

Watchlist Matched: none

Modular · inference-infra · 2026-04-16

How Frontier Coding Agents Built a Video Diffusion Pipeline on MAX

Score 1

agents

Watchlist Matched: agents

Modular · inference-infra · 2026-04-10

Modular Opens Edinburgh & San Francisco Offices

Score 1

Watchlist Matched: none

Modular · inference-infra · 2026-04-03

Structured Mojo Kernels Part 4 - Portability and the Road Ahead

Score 1

Watchlist Matched: none

Modular · inference-infra · 2026-03-31

Modverse #54: From GTC to Edinburgh, a Community Building Momentum

Score 1

Watchlist Matched: none

Modular · inference-infra · 2026-03-26

Structured Mojo Kernels Part 3 - Composition in Practice

Score 1

Watchlist Matched: none

Modular · inference-infra · 2026-03-11

Structured Mojo Kernels Part 2 - The Three Pillars

Score 1

Watchlist Matched: none

Modular · inference-infra · 2026-02-18

The Claude C Compiler: What It Reveals About the Future of Software

Score 1

Watchlist Matched: none

Modular · inference-infra · 2026-02-10

BentoML Joins Modular

Score 1

Watchlist Matched: none

Modular · inference-infra · 2026-02-05

The Five Eras of KVCache

Score 1

Watchlist Matched: none

Modular · inference-infra · 2026-01-29

Modular 26.1: A Big Step Towards More Programmable and Portable AI Infrastructure

Score 1

Watchlist Matched: none

Modular · inference-infra · 2025-12-19

🔥 Modular 2025 Year in Review

Score 1

Watchlist Matched: none

Modular · inference-infra · 2025-12-05

The path to Mojo 1.0

Score 1

Watchlist Matched: none

Modular · inference-infra · 2025-12-03

Modverse #52: Advancing AI Together — Community Projects & Platform Milestones

Score 1

Watchlist Matched: none

Modular · inference-infra · 2025-11-06

PyTorch and LLVM in 2025 — Keeping up With AI Innovation

Score 1

Watchlist Matched: none

Modular · inference-infra · 2025-09-24

Modular Raises $250M to scale AI's Unified Compute Layer

Score 1

Watchlist Matched: none

Modular · inference-infra · 2025-09-22

Modular 25.6: Unifying the latest GPUs from NVIDIA, AMD, and Apple

Score 1

Watchlist Matched: none

Modular · inference-infra · 2025-09-19

Modverse #51: Modular x Inworld x Oracle, Modular Meetup Recap and Community Projects

Score 1

Watchlist Matched: none

Modular · inference-infra · 2025-08-21

Modverse #50: Modular Platform 25.5, Community Meetups, and Mojo's Debut in the Stack Overflow Developer Survey

Score 1

Watchlist Matched: none

Modular · inference-infra · 2025-07-16

AI Agents for AWS Marketplace

Score 1

agents

Watchlist Matched: agents

Modular · inference-infra · 2025-07-09

Modverse #49: Modular Platform 25.4, Modular 🤝 AMD, and Modular Hack Weekend

Score 1

Watchlist Matched: none

Modular · inference-infra · 2025-07-03

Inside Modular Hack Weekend: Top Projects and Community Highlights

Score 1

Watchlist Matched: none

Modular · inference-infra · 2025-06-20

How is Modular Democratizing AI Compute? (Democratizing AI Compute, Part 11)

Score 1

Watchlist Matched: none

Modular · inference-infra · 2025-06-18

Modular 25.4: One Container, AMD and NVIDIA GPUs, No Lock-In

Score 1

Watchlist Matched: none

Modular · inference-infra · 2025-05-27

Exploring Metaprogramming in Mojo

Score 1

Watchlist Matched: none

Modular · inference-infra · 2025-05-08

Modular’s bet to break out of the Matrix (Democratizing AI Compute, Part 10)

Score 1

Watchlist Matched: none

Modular · inference-infra · 2025-05-06

Modular Platform 25.3: 450K+ Lines of Open Source Code and pip Packaging

Score 1

open-source

Watchlist Matched: open source

Modular · inference-infra · 2025-04-23

A New, Simpler License for MAX and Mojo

Score 1

Watchlist Matched: none

Modular · inference-infra · 2025-04-22

Why do HW companies struggle to build AI software? (Democratizing AI Compute, Part 9)

Score 1

Watchlist Matched: none

Modular · inference-infra · 2025-04-08

What about the MLIR compiler infrastructure? (Democratizing AI Compute, Part 8)

Score 1

Watchlist Matched: none

Modular · inference-infra · 2025-03-12

What about TVM, XLA, and AI compilers? (Democratizing AI Compute, Part 6)

Score 1

Watchlist Matched: none

Modular · inference-infra · 2025-02-27

Modverse #46: MAX 25.1, MAX Builds, and Democratizing AI Compute

Score 1

Watchlist Matched: none

Modular · inference-infra · 2025-01-30

DeepSeek's Impact on AI (Democratizing AI Compute, Part 1)

Score 1

Watchlist Matched: none

Modular · inference-infra · 2025-01-23

Use MAX with Open WebUI for RAG and Web Search

Score 1

rag

Watchlist Matched: rag

Modular · inference-infra · 2025-01-21

Hands-on with Mojo 24.6

Score 1

Watchlist Matched: none

Modular · inference-infra · 2024-12-19

Evaluating Llama Guard with MAX 24.6 and Hugging Face

Score 1

evals

Watchlist Matched: evaluating

Modular · inference-infra · 2024-10-25

Understanding SIMD: Infinite Complexity of Trivial Problems

Score 1

Watchlist Matched: none

Modular · inference-infra · 2024-10-10

Community Spotlight: Writing Mojo with Cursor

Score 1

Watchlist Matched: none

Modular · inference-infra · 2024-10-01

Hands-on with Mojo 24.5

Score 1

Watchlist Matched: none

Modular · inference-infra · 2024-07-23

Announcing stack-pr: an open source tool for managing stacked PRs on GitHub

Score 1

open-source

Watchlist Matched: open source

Modular · inference-infra · 2024-07-16

Debugging in Mojo🔥

Score 1

Watchlist Matched: none

Modular · inference-infra · 2024-07-09

Take control of your AI

Score 1

Watchlist Matched: none

Modular · inference-infra · 2024-07-09

Develop locally, deploy globally

Score 1

Watchlist Matched: none

Modular · inference-infra · 2024-07-03

A brief guide to the Mojo n-body example

Score 1

Watchlist Matched: none

Modular · inference-infra · 2024-06-25

What's new in MAX 24.4? MAX on macOS, fast local Llama3, native quantization and GGUF support

Score 1

quantization

Watchlist Matched: quantization, gguf

Modular · inference-infra · 2024-06-17

What’s new in Mojo 24.4? Improved collections, new traits, os module features and core language enhancements

Score 1

Watchlist Matched: none

Modular · inference-infra · 2024-06-04

Deep dive into ownership in Mojo

Score 1

Watchlist Matched: none

Modular · inference-infra · 2024-05-20

Fast⚡k-means clustering in Mojo🔥: a guide to porting Python to Mojo🔥 for accelerated k-means clustering

Score 1

Watchlist Matched: none

Modular · inference-infra · 2024-05-08

Developer Voices: Deep Dive with Chris Lattner on Mojo

Score 1

Watchlist Matched: none

Modular · inference-infra · 2024-05-02

What’s New in Mojo 24.3: Community Contributions, Pythonic Collections and Core Language Enhancements

Score 1

Watchlist Matched: none

Modular · inference-infra · 2024-04-02

What’s new in Mojo 24.2: Mojo Nightly, Enhanced Python Interop, OSS stdlib and more

Score 1

open-source

Watchlist Matched: oss

Modular · inference-infra · 2024-03-28

The Next Big Step in Mojo🔥 Open Source

Score 1

open-source

Watchlist Matched: open source