MLSys Radar

Together AI

AI cloud provider covering inference, fine-tuning, GPU clusters, optimization, and research.

Country
Unknown
Category
inference-infra
Blog
https://www.together.ai/blog
Feed
Feed discovery status
pending

Together AI · inference-infra · 2026-05-11

Serving DeepSeek-V4: why million-token context is an inference systems problem

Score 22

DeepSeek-V4 makes million-token context a serving-systems problem. Together AI explores the inference work behind V4 on NVIDIA HGX B200, including compressed KV layouts, prefix caching, kernel maturity, and endpoint profiles for long-conte...

inference serving kernel hardware long-context api

High signal Matched: inference, serving, endpoint, kernel, b200, long-context

Together AI · inference-infra · 2026-05-08

Deploy and inference any model from HuggingFace

Score 20

Learn how to deploy any Hugging Face model in one session using Goose and Together's Dedicated Container Inference. Skip the setup complexity — one prompt gets your model running in a production-grade GPU environment on release day.

inference hardware model-release

High signal Matched: inference, gpu, release, model

Together AI · inference-infra · 2026-04-29

DeepSeek-V4 Pro now available on Together AI

Score 10

DeepSeek-V4 Pro is now available on Together AI with 512K context, controllable reasoning modes, and cached-input pricing for long-context reasoning workloads like code agents, document intelligence, and research synthesis.

research long-context agents

High signal Matched: research, long-context, agents

Together AI · inference-infra · 2026-04-07

What is an AI Native Cloud?

Score 12

AI-native companies need infrastructure built for models, not legacy workloads. Learn what defines an AI Native Cloud and why it matters for the next platform shift.

cloud

High signal Matched: cloud

Together AI · inference-infra · 2026-04-01

Inside the Together AI kernels team

Score 16

The team behind FlashAttention and ThunderKittens — how Together AI's kernel researchers close the gap between GPU hardware and production AI.

kernel hardware

High signal Matched: kernel, flashattention, gpu

Together AI · inference-infra · 2026-03-31

Aurora

Score 12

A 1.25x speedup over a well-trained static speculator. Aurora is an open-source RL framework that turns speculative decoding from a one-time offline setup into a self-improving system that learns from every request it serves.

inference speculative-decoding open-source

High signal Matched: decoding, speculative decoding, open-source

Together AI · inference-infra · 2026-03-17

Mamba-3

Score 10

Meet Mamba-3: the SSM built for inference. Faster than Transformers at decode, stronger than Mamba-2, and open-source from day one.

inference open-source

High signal Matched: inference, open-source

Together AI · inference-infra · 2026-03-12

Build real-time voice agents on Together AI

Score 10

Build real-time voice agents on Together AI with co-located STT, LLM, and TTS infrastructure, native Deepgram and Cartesia support, and end-to-end latency under 500ms.

benchmark agents

High signal Matched: latency, agents

Together AI · inference-infra · 2026-03-04

Cache-aware prefill–decode disaggregation (CPD) for up to 40% faster long-context LLM serving

Score 20

Serving long prompts doesn't have to mean slow responses. Learn how Together AI's CPD architecture separates warm and cold inference workloads to deliver 40% higher throughput and dramatically lower time-to-first-token for long-context LLM...

inference serving benchmark long-context

High signal Matched: inference, serving, prefill, throughput, long-context

Together AI · inference-infra · 2026-03-02

Introducing Together AI’s new look

Score 14

We've refreshed our visual identity — designed with Pentagram to express how Together AI connects open-source innovation, systems research, and builders to unlock new possibilities.

model-release research open-source

High signal Matched: introducing, research, open-source

Together AI · inference-infra · 2026-02-06

What do LLMs think when you don't tell them what to think about?

Score 10

What do language models generate when you don't tell them what to generate? New research reveals that LLM families have distinct 'knowledge priors'—GPT models default to code and math, Llama favors narratives, DeepSeek generates religious...

research

High signal Matched: research

Together AI · inference-infra · 2026-02-02

Fine-tuning open LLM judges to outperform GPT-5.2

Score 14

Fine-tuned open-source LLM judges can outperform GPT-5.2 at evaluating model outputs. Using Direct Preference Optimization on just 5,400 preference pairs, we trained GPT-OSS 120B to beat GPT-5.2 on human preference alignment—at 15x lower c...

inference benchmark model-release fine-tuning evals open-source

High signal Matched: inference, cost, model, fine-tuning, evaluating, open-source, oss

Together AI · inference-infra · 2026-01-26

DSGym: A holistic framework for evaluating and training data science agents

Score 18

Introducing DSGym—a holistic evaluation and training framework for LLM-based data science agents. Features 90+ bioinformatics tasks, 92 Kaggle competitions, and synthetic trajectory generation. Our 4B model achieves state-of-the-art perform...

inference benchmark model-release research training evals agents open-source

High signal Matched: generation, performance, introducing, model, evaluation, training, evaluating, agents, open-source

Together AI · inference-infra · 2026-01-13

Learn how Cursor partnered with Together AI to deliver real-time, low-latency inference at scale

Score 24

Together AI teamed with Cursor to build the real-time inference stack that keeps in-editor agents fast and reliable. They productionized NVIDIA Blackwell (B200/GB200), tuning ARM hosts, kernels, and FP4/TensorRT quantization for low latenc...

inference benchmark hardware model-release quantization agents

High signal Matched: inference, latency, b200, gb200, blackwell, model, quantization, agents

Together AI · inference-infra · 2025-12-17

Research POV: Yes, AGI Can Happen – A Computational Perspective

Score 14

Dan Fu, our VP of Kernels, has published a new post challenging the idea that AI is hitting a hardware wall. He argues that we are vastly underutilizing current chips and that better software-hardware co-design will unlock the next order o...

benchmark research

High signal Matched: performance, research

Together AI · inference-infra · 2025-12-03

Introducing AutoJudge: Streamlined inference acceleration via automated dataset curation

Score 20

AutoJudge accelerates LLM inference by identifying which token mismatches actually matter. Using self-supervised learning to train a lightweight classifier, it accepts up to 40 draft tokens per cycle—delivering 1.5–2× speedups over standar...

inference speculative-decoding model-release

High signal Matched: inference, decoding, speculative decoding, introducing

Together AI · inference-infra · 2025-12-01

Together AI delivers fastest inference for the top open-source models

Score 20

Together AI achieves up to 2x faster inference for top open-source models like Qwen, DeepSeek, and Kimi through GPU optimization, advanced speculative decoding, and FP4 quantization—ranking #1 in speed benchmarks on NVIDIA Blackwell archit...

inference speculative-decoding hardware quantization evals open-source

High signal Matched: inference, decoding, speculative decoding, gpu, blackwell, quantization, benchmarks, open-source

Together AI · inference-infra · 2026-04-30

Announcing Together AI and Adaption Partnership

Score 3

Together AI and Adaption partner to bring Together Fine-Tuning natively into Adaptive Data, helping teams optimize datasets, run fine-tuning, evaluate results, and deploy stronger open models.

fine-tuning evals

Watchlist Matched: fine-tuning, evaluate

Together AI · inference-infra · 2025-12-18

Rime voice models now available on Together AI

Score 3

Two enterprise-grade Rime TTS models now available on Together AI. Co-locate with LLM and STT on dedicated infrastructure. Proven at billions of calls.

Watchlist Matched: none

Together AI · inference-infra · 2025-06-09

The Frontier is Open

Score 3

No feed summary available yet.

Watchlist Matched: none