Together AI - MLSys Blogs

Together AI · inference-infra · 2026-06-02

Serving MiniMax-M3 for efficient inference: Unlocking 1M-Token Context and Multimodality Without Regrets

Score 17

How Together served MiniMax-M3 efficiently with KV-block-major sparse attention, paged MSA decode, optimized index scoring, and a Rust-based multimodal gateway.

inference serving

High signal Matched: inference, serving

Together AI · inference-infra · 2026-05-29

How Together AI built the world’s fastest speech-to-text stack

Score 13

Together AI built the fastest speech-to-text stack on Artificial Analysis by treating ASR as a full-path systems problem, not just a GPU inference problem.

inference hardware

High signal Matched: inference, gpu

Together AI · inference-infra · 2026-05-19

Benchmarking inference at scale: coding agents

Score 16

Real-world inference benchmarks for coding agents: 31% more TPS than TensorRT-LLM, 2× better TTFT at saturation, and 76% lower cost than Claude Opus 4.6.

inference benchmark evals agents

High signal Matched: inference, ttft, cost, benchmarks, agents

Together AI · inference-infra · 2026-05-15

Together AI and Pearl Research Labs Team Up to Reduce the Cost of AI Inference

Score 24

Together AI partners with Pearl Research Labs to launch a discounted Pearl-powered inference endpoint for Gemma-4-31B-it-pearl, using Proof of Useful Work to turn AI workloads into crypto emissions.

inference serving benchmark model-release research api

High signal Matched: inference, endpoint, cost, launch, research

Together AI · inference-infra · 2026-05-12

Introducing voice finder — a new tool to quickly find the right voice for your app from over 600+ voices

Score 12

Voice finder helps developers search, match, filter, and audition 600+ voices across Together AI TTS models using natural-language prompts or uploaded audio samples.

model-release

High signal Matched: introducing

Together AI · inference-infra · 2026-05-11

Serving DeepSeek-V4: why million-token context is an inference systems problem

Score 22

DeepSeek-V4 makes million-token context a serving-systems problem. Together AI explores the inference work behind V4 on NVIDIA HGX B200, including compressed KV layouts, prefix caching, kernel maturity, and endpoint profiles for long-conte...

inference serving kernel hardware long-context api

High signal Matched: inference, serving, endpoint, kernel, b200, long-context

Together AI · inference-infra · 2026-05-08

Deploy and inference any model from HuggingFace

Score 20

Learn how to deploy any Hugging Face model in one session using Goose and Together's Dedicated Container Inference. Skip the setup complexity — one prompt gets your model running in a production-grade GPU environment on release day.

inference hardware model-release

High signal Matched: inference, gpu, release, model

Together AI · inference-infra · 2026-05-04

Foundational research powering efficient inference at scale

Score 16

As AI moves from research to production, the challenge for AI-native teams shifts from building models to running them — efficiently, reliably, and at scale.

inference research

High signal Matched: inference, research

Together AI · inference-infra · 2026-04-29

DeepSeek-V4 Pro now available on Together AI

Score 10

DeepSeek-V4 Pro is now available on Together AI with 512K context, controllable reasoning modes, and cached-input pricing for long-context reasoning workloads like code agents, document intelligence, and research synthesis.

research long-context agents

High signal Matched: research, long-context, agents

Together AI · inference-infra · 2026-04-28

Together AI Brings NVIDIA Nemotron 3 Nano Omni to Developers on Day 0

Score 12

NVIDIA Nemotron 3 Nano Omni is now on Together AI: a single open model that reasons across video, images, audio, and text, built for agentic workloads at scale.

model-release agents open-source

High signal Matched: model, open model, agentic

Together AI · inference-infra · 2026-04-24

Accelerate RL rollouts by up to 50% with distribution-aware speculative decoding

Score 16

Rollout is the silent bottleneck in RL post-training. DAS fixes it with adaptive speculative decoding — up to 50% faster, zero degradation in reward quality.

inference speculative-decoding training

High signal Matched: decoding, speculative decoding, training, post-training

Together AI · inference-infra · 2026-04-21

Capacity without conflict: A guide to multi-tenant GPU cluster design for AI-native teams

Score 12

Learn how AI-native companies design multi-tenant GPU clusters that pool capacity without sacrificing team isolation — and how Together AI makes it work in practice.

hardware

High signal Matched: gpu

Together AI · inference-infra · 2026-04-15

Parcae: Doing more with fewer parameters using stable looped models

Score 14

Parcae is a stable looped language model that matches the quality of a Transformer twice its size — a 770M model reaching 1.3B-level performance. We introduce the first scaling laws for looping and show that increasing recurrence, not just...

benchmark model-release

High signal Matched: performance, model

Together AI · inference-infra · 2026-04-07

What is an AI Native Cloud?

Score 12

AI-native companies need infrastructure built for models, not legacy workloads. Learn what defines an AI Native Cloud and why it matters for the next platform shift.

cloud

High signal Matched: cloud

Together AI · inference-infra · 2026-04-03

Wan 2.7 video model suite now available on Together AI

Score 14

A four-model video suite for generation, continuation, reference-driven workflows, and editing, rolling out on Together AI starting with text-to-video.

inference model-release

High signal Matched: generation, model

Together AI · inference-infra · 2026-04-03

AI for Systems: Using LLMs to Optimize Database Query Execution

Score 10

New research shows LLMs can optimize database query execution plans—achieving up to 4.78x speedups by correcting the cardinality estimation errors that statistical heuristics miss.

research

High signal Matched: research

Together AI · inference-infra · 2026-04-02

Deepgram speech-to-text and voice models now available natively on Together AI

Score 14

Production STT and TTS from Deepgram, available on Together AI Dedicated Model Inference for real-time voice agents.

inference model-release agents

High signal Matched: inference, model, agents

Together AI · inference-infra · 2026-04-01

Inside the Together AI kernels team

Score 16

The team behind FlashAttention and ThunderKittens — how Together AI's kernel researchers close the gap between GPU hardware and production AI.

kernel hardware

High signal Matched: kernel, flashattention, gpu

Together AI · inference-infra · 2026-03-31

Aurora

Score 12

Aurora is an open-source RL framework that turns speculative decoding from a one-time offline setup into a self-improving system that learns from every request it serves, reaching 1.25x over a well-trained static speculator.

inference speculative-decoding open-source

High signal Matched: decoding, speculative decoding, open-source

Together AI · inference-infra · 2026-03-26

Plan, divide, and conquer: How weak models excel at long context tasks

Score 10

As context windows grow, LLM performance degrades in unexpected ways. We show how a "Divide & Conquer" framework — breaking long documents into parallel chunks with a planner, workers, and manager — lets smaller models like Llama-3-70B and...

benchmark long-context

High signal Matched: performance, long context

Together AI · inference-infra · 2026-03-18

Together AI expands fine-tuning service with tool calling, reasoning, and vision support

Score 14

Together AI expands fine-tuning with native support for tool calling, reasoning, and vision-language models, plus 100B+ model training, up to 6× higher throughput, and job cost and ETA estimates.

serving benchmark model-release training fine-tuning

High signal Matched: throughput, cost, model, training, fine-tuning

Together AI · inference-infra · 2026-03-17

Mamba-3

Score 10

Meet Mamba-3: the SSM built for inference. Faster than Transformers at decode, stronger than Mamba-2, and open-source from day one.

inference open-source

High signal Matched: inference, open-source

Together AI · inference-infra · 2026-03-16

Together AI at NVIDIA GTC 2026: Explore our latest innovations across research and products

Score 14

Together AI arrives at NVIDIA GTC 2026 with new launches in inference, agents, voice AI, and open models — plus technical sessions from its research and engineering leaders.

inference research agents

High signal Matched: inference, research, agents

Together AI · inference-infra · 2026-03-12

Build real-time voice agents on Together AI

Score 10

Build real-time voice agents on Together AI with co-located STT, LLM, and TTS infrastructure, native Deepgram and Cartesia support, and end-to-end latency under 500ms.

benchmark agents

High signal Matched: latency, agents

Together AI · inference-infra · 2026-03-11

Together AI Brings NVIDIA Nemotron 3 to Developers on Day 0

Score 10

NVIDIA Nemotron 3 Super is now available on Together AI Dedicated Inference, delivering efficient multi-agent reasoning, a 1M-token context window, and production-grade deployment on managed infrastructure.

inference long-context agents

High signal Matched: inference, context window, agent

Together AI · inference-infra · 2026-03-10

New in Together GPU Clusters: Autoscaling, observability, and self-healing

Score 12

Together GPU Clusters now include built-in autoscaling, RBAC, full-stack observability, and self-healing node repair—giving teams production-ready GPU infrastructure that scales efficiently, stays resilient, and supports shared enterprise...

hardware

High signal Matched: gpu

Together AI · inference-infra · 2026-03-05

FlashAttention-4: Algorithm and Kernel Pipelining Co-Design for Asymmetric Hardware Scaling

Score 20

As GPU throughput outpaces memory bandwidth, kernels must evolve. We introduce FlashAttention-4, featuring new pipelining for maximum overlap, 2-CTA MMA modes to reduce shared memory traffic, and a hardware-software hybrid approach to soft...

serving kernel benchmark hardware

High signal Matched: throughput, kernel, flashattention, gpu

Together AI · inference-infra · 2026-03-05

Key research and product announcements at the AI Native Conf

Score 18

At AI Native Conf, Together AI announced breakthroughs across kernels, RL, and inference optimization — including FlashAttention-4, ThunderAgent, and together.compile. Research that ships to production. That's the AI Native Cloud.

inference kernel research cloud

High signal Matched: inference, flashattention, research, cloud

Together AI · inference-infra · 2026-03-04

Cache-aware prefill–decode disaggregation (CPD) for up to 40% faster long-context LLM serving

Score 20

Serving long prompts doesn't have to mean slow responses. Learn how Together AI's CPD architecture separates warm and cold inference workloads to deliver 40% higher throughput and dramatically lower time-to-first-token for long-context LLM...

inference serving benchmark long-context

High signal Matched: inference, serving, prefill, throughput, long-context

Together AI · inference-infra · 2026-03-02

Introducing Together AI’s new look

Score 14

We've refreshed our visual identity — designed with Pentagram to express how Together AI connects open-source innovation, systems research, and builders to unlock new possibilities.

model-release research open-source

High signal Matched: introducing, research, open-source

Together AI · inference-infra · 2026-02-23

How speech models fail where it matters the most and what to do about it

Score 10

State-of-the-art speech models like Whisper and Deepgram score near-human on benchmarks — then fail 39% of the time on street names. New research from Together AI exposes the gap and a fix.

research evals

High signal Matched: research, benchmarks

Together AI · inference-infra · 2026-02-19

Consistency diffusion language models: Up to 14x faster inference without sacrificing quality

Score 14

Standard diffusion language models can't use KV caching and need too many refinement steps to be practical. CDLM fixes both with a post-training recipe that enables exact block-wise KV caching and trajectory-consistent step reduction — del...

inference benchmark training

High signal Matched: inference, latency, training, post-training

Together AI · inference-infra · 2026-02-12

Introducing Dedicated Container Inference: Delivering 2.6x faster inference for custom AI models

Score 16

Together AI launches production-grade orchestration for custom AI models with 1.4x–2.6x faster inference.

inference model-release

High signal Matched: inference, introducing

Together AI · inference-infra · 2026-02-06

What do LLMs think when you don't tell them what to think about?

Score 10

What do language models generate when you don't tell them what to generate? New research reveals that LLM families have distinct 'knowledge priors'—GPT models default to code and math, Llama favors narratives, DeepSeek generates religious...

research

High signal Matched: research

Together AI · inference-infra · 2026-02-02

Fine-tuning open LLM judges to outperform GPT-5.2

Score 14

Fine-tuned open-source LLM judges can outperform GPT-5.2 at evaluating model outputs. Using Direct Preference Optimization on just 5,400 preference pairs, we trained GPT-OSS 120B to beat GPT-5.2 on human preference alignment—at 15x lower c...

inference benchmark model-release fine-tuning evals open-source

High signal Matched: inference, cost, model, fine-tuning, evaluating, open-source, oss

Together AI · inference-infra · 2026-02-02

Together Evaluations now supports comparing top commercial APIs vs. open source models

Score 12

Together Evaluations now supports OpenAI, Anthropic, and Google models for cross-provider benchmarking. Compare open-source, fine-tuned, and proprietary models side-by-side to make data-driven decisions on quality, cost, and performance—al...

benchmark open-source

High signal Matched: performance, cost, open-source, open source

Together AI · inference-infra · 2026-01-26

DSGym: A holistic framework for evaluating and training data science agents

Score 18

Introducing DSGym: a holistic evaluation and training framework for LLM-based data science agents. Features 90+ bioinformatics tasks, 92 Kaggle competitions, and synthetic trajectory generation. Our 4B model achieves state-of-the-art perform...

inference benchmark model-release research training evals agents open-source

High signal Matched: generation, performance, introducing, model, evaluation, training, evaluating, agents, open-source

Together AI · inference-infra · 2026-01-22

Optimizing inference speed and costs: Lessons learned from large-scale deployments

Score 22

Learn how to reduce inference latency without massive cost using proven inference optimization tactics — improving throughput, GPU utilization, and cost efficiency while balancing throughput vs. latency tradeoffs.

inference serving benchmark hardware

High signal Matched: inference, throughput, latency, cost, gpu

Together AI · inference-infra · 2026-01-13

Learn how Cursor partnered with Together AI to deliver real-time, low-latency inference at scale

Score 24

Together AI teamed with Cursor to build the real-time inference stack that keeps in-editor agents fast and reliable. They productionized NVIDIA Blackwell (B200/GB200), tuning ARM hosts, kernels, and FP4/TensorRT quantization for low latenc...

inference benchmark hardware model-release quantization agents

High signal Matched: inference, latency, b200, gb200, blackwell, model, quantization, agents

Together AI · inference-infra · 2026-01-12

Inside multi-node training: How to scale model training across GPU clusters

Score 22

Learn how foundation models are trained at scale using multi-node GPU clusters, including distributed training techniques, infrastructure requirements, and practical steps to scale training efficiently.

distributed hardware model-release training

High signal Matched: distributed, multi-node, gpu, model, training, distributed training

Together AI · inference-infra · 2026-01-08

How to choose the right open model for production

Score 20

Learn how to choose the right open-source model for production by evaluating model quality, benchmarking performance, and deploying open models that balance cost, speed, and accuracy.

benchmark model-release evals open-source

High signal Matched: performance, cost, model, open model, evaluating, open-source

Together AI · inference-infra · 2025-12-23

MiniMax Speech 2.6 Turbo now available natively on Together AI

Score 10

MiniMax Speech 2.6 Turbo: State-of-the-art multilingual TTS with human-level emotional awareness, sub-250ms latency, and 40+ languages—now on Together AI.

benchmark

High signal Matched: latency

Together AI · inference-infra · 2025-12-17

Research POV: Yes, AGI Can Happen – A Computational Perspective

Score 14

Dan Fu, our VP of Kernels, has published a new post challenging the idea that AI is hitting a hardware wall. He argues that we are vastly underutilizing current chips and that better software-hardware co-design will unlock the next order o...

benchmark research

High signal Matched: performance, research

Together AI · inference-infra · 2025-12-15

Announcing native availability of NVIDIA Nemotron 3 Nano, NVIDIA’s latest reasoning model

Score 14

Nemotron 3 Nano, NVIDIA’s newest reasoning model, is now available on Together AI, the AI Native Cloud.

model-release cloud frontier-model

High signal Matched: model, cloud, reasoning model

Together AI · inference-infra · 2025-12-03

Introducing AutoJudge: Streamlined inference acceleration via automated dataset curation

Score 20

AutoJudge accelerates LLM inference by identifying which token mismatches actually matter. Using self-supervised learning to train a lightweight classifier, it accepts up to 40 draft tokens per cycle—delivering 1.5–2× speedups over standar...

inference speculative-decoding model-release

High signal Matched: inference, decoding, speculative decoding, introducing

Together AI · inference-infra · 2025-12-03

Together AI and Meta partner to bring PyTorch Reinforcement Learning to the AI Native Cloud

Score 12

Build, train, and deploy advanced AI agents with integrated reinforcement learning on the Together platform.

cloud agents

High signal Matched: cloud, agents

Together AI · inference-infra · 2025-12-03

How to run TorchForge reinforcement learning pipelines in the Together AI Native Cloud

Score 12

No feed summary available yet.

cloud

High signal Matched: cloud

Together AI · inference-infra · 2025-12-01

Together AI delivers fastest inference for the top open-source models

Score 20

Together AI achieves up to 2x faster inference for top open-source models like Qwen, DeepSeek, and Kimi through GPU optimization, advanced speculative decoding, and FP4 quantization—ranking #1 in speed benchmarks on NVIDIA Blackwell archit...

inference speculative-decoding hardware quantization evals open-source

High signal Matched: inference, decoding, speculative decoding, gpu, blackwell, quantization, benchmarks, open-source

Together AI · inference-infra · 2025-11-25

FLUX.2: Multi-reference image generation now available on Together AI

Score 12

Production-grade image generation with multi-reference consistency, exact brand colors, and reliable text rendering. FLUX.2 from Black Forest Labs, now on Together AI's platform.

inference

High signal Matched: generation

Together AI · inference-infra · 2025-11-04

Announcing the fastest inference for realtime voice AI agents

Score 14

Together AI launches the fastest voice AI stack: streaming Whisper STT, serverless open-source TTS (Orpheus & Kokoro), and Voxtral transcription. Sub-second latency for production voice agents.

inference benchmark agents open-source

High signal Matched: inference, latency, agents, open-source

Together AI · inference-infra · 2025-11-04

How to evaluate and benchmark Large Language Models (LLMs)

Score 12

Understanding how to evaluate and benchmark Large Language Models (LLMs): test, compare, and understand LLMs.

benchmark evals

High signal Matched: benchmark, evaluate

Together AI · inference-infra · 2025-10-22

Large Reasoning Models Fail to Follow Instructions During Reasoning: A Benchmark Study

Score 12

ReasonIF finds frontier LRMs fail to follow reasoning instructions >75% of the time; introduces a benchmark across languages, formatting, and length.

benchmark

High signal Matched: benchmark

Together AI · inference-infra · 2025-10-21

Expanding Together AI Model Library into multimedia generation with 40+ new image and video models

Score 16

Together AI adds 40+ image & video models, including Sora 2 and Veo 3, to build end-to-end multimodal apps with unified OpenAI-compatible APIs and transparent pricing.

inference model-release api

High signal Matched: generation, model, openai-compatible

Together AI · inference-infra · 2025-10-15

Announcing the Together AI Startup Accelerator, purpose-built for AI Native Apps

Score 12

We've launched the Together AI Startup Accelerator: Up to $50K credits, expert engineering hours, GTM support, community and VC access for AI-native apps in build–scale tiers.

hardware

High signal Matched: accelerator

Together AI · inference-infra · 2025-10-10

AdapTive-LeArning Speculator System (ATLAS): A New Paradigm in LLM Inference via Runtime-Learning Accelerators

Score 20

LLM inference that gets faster as you use it. Our runtime-learning accelerator adapts continuously to your workload, delivering 500 TPS on DeepSeek-V3.1, a 4x speedup over baseline performance without manual tuning.

inference moe benchmark hardware

High signal Matched: inference, deepseek-v3, performance, accelerator

Together AI · inference-infra · 2025-09-15

Improved Batch Inference API: Enhanced UI, Expanded Model Support, and 3000× Rate Limit Increase

Score 18

Our new Batch Inference API makes large-scale AI workloads simpler, faster, and cheaper. With a streamlined UI, universal model support, and 3000× higher rate limits—now up to 30B tokens—you can process massive datasets at half the cost of...

inference benchmark model-release api

High signal Matched: inference, cost, model, api

Together AI · inference-infra · 2025-09-09

Announcing General Availability of Together Instant Clusters, offering ready to use, self-service NVIDIA GPUs

Score 18

Together AI launches Instant Clusters: self-service GPU clusters with NVIDIA H100/B200, ready in minutes for training or inference at any scale.

inference hardware training

High signal Matched: inference, gpu, h100, b200, training

Together AI · inference-infra · 2025-08-27

DeepSeek-V3.1: Hybrid Thinking Model Now Available on Together AI

Score 16

Access DeepSeek-V3.1 on Together AI: MIT-licensed hybrid model with thinking/non-thinking modes, 66% SWE-bench Verified, serverless deployment, 99.9% SLA.

moe model-release evals

High signal Matched: deepseek-v3, model, swe-bench

Together AI · inference-infra · 2025-08-21

How Together AI Uses AI Agents to Automate Complex Engineering Tasks: Lessons from Developing Efficient LLM Inference Systems

Score 16

Build AI agents for complex, long-running engineering tasks. Learn key patterns from a case study: accelerating LLM inference with speculative decoding.

inference speculative-decoding agents

High signal Matched: inference, decoding, speculative decoding, agents

Together AI · inference-infra · 2025-08-19

Transform OpenAI gpt-oss Models into Domain Experts with Together AI Fine-Tuning

Score 10

Customize OpenAI’s gpt-oss-20B/120B with Together AI’s fine-tuning: train, optimize, and instantly deploy domain experts with enterprise reliability and cost efficiency.

benchmark fine-tuning open-source

High signal Matched: cost, fine-tuning, oss

Together AI · inference-infra · 2025-08-15

Fine-Tuning Small Open-Source LLMs to Outperform Large Closed-Source Models by 60% on Specialized Tasks

Score 12

Parsed fine-tuned a 27B open-source model to beat Claude Sonnet 4 by 60% on a real-world healthcare task—while running 10–100x cheaper.

model-release fine-tuning open-source

High signal Matched: model, fine-tuning, open-source

Together AI · inference-infra · 2025-08-05

Announcing the Availability of OpenAI's Open Models on Together AI

Score 12

Access OpenAI’s gpt-oss-120B on Together AI: Apache-2.0 open-weight model with serverless & dedicated endpoints, $0.50/1M in, $1.50/1M out, 99.9% SLA.

model-release open-source

High signal Matched: model, oss

Together AI · inference-infra · 2025-07-28

Together Evaluations: Benchmark Models for Your Tasks

Score 16

Together Evaluations is a flexible framework for benchmarking LLMs using strong open-source models as judges. Skip manual labeling and rigid metrics—get fast, customizable insights into model quality for your specific tasks.

benchmark model-release open-source

High signal Matched: benchmark, model, open-source

Together AI · inference-infra · 2025-07-25

Qwen3-Coder: The Most Capable Agentic Coding Model Now Available on Together AI

Score 12

Unlock agentic coding with Qwen3-Coder on Together AI: 256K context, SWE-bench rivaling Claude Sonnet 4, zero-setup instant deployment.

model-release evals agents

High signal Matched: model, swe-bench, agentic

Together AI · inference-infra · 2025-07-17

Together AI Delivers Top Speeds for DeepSeek-R1-0528 Inference on NVIDIA Blackwell

Score 18

Together AI inference is now among the world’s fastest, most capable platforms for running open-source reasoning models like DeepSeek-R1 at scale, thanks to our new inference engine designed for NVIDIA HGX B200.

inference hardware open-source

High signal Matched: inference, b200, blackwell, open-source

Together AI · inference-infra · 2025-07-14

Kimi K2: Leading Open-Source Model Now Available on Together AI

Score 16

Run Kimi K2 (1T params) on Together AI—frontier open model for agentic reasoning and coding, serverless deployment, 99.9% SLA, lower cost and instant scaling.

benchmark model-release agents open-source

High signal Matched: cost, model, open model, agentic, open-source

Together AI · inference-infra · 2025-07-10

Together AI Launches Speech-to-Text: High-Performance Whisper APIs

Score 12

No feed summary available yet.

benchmark

High signal Matched: performance

Together AI · inference-infra · 2025-06-11

Introducing the Together AI Batch API: Process Thousands of LLM Requests at 50% Lower Cost

Score 16

No feed summary available yet.

benchmark model-release api

High signal Matched: cost, introducing, api

Together AI · inference-infra · 2025-06-05

Model-Preserving Adaptive Rounding with YAQA

Score 12

No feed summary available yet.

model-release

High signal Matched: model

Together AI · inference-infra · 2025-05-20

Introducing Together Code Sandbox & Together Code Interpreter: SOTA code execution for AI

Score 12

No feed summary available yet.

model-release frontier-model

High signal Matched: introducing, sota

Together AI · inference-infra · 2025-05-12

Boosting DeepSeek-R1’s Speed with Customized Speculative Decoding

Score 16

No feed summary available yet.

inference speculative-decoding

High signal Matched: decoding, speculative decoding

Together AI · inference-infra · 2025-05-05

From AWS to Together Dedicated Endpoints: Arcee AI's journey to greater inference flexibility

Score 12

No feed summary available yet.

inference

High signal Matched: inference

Together AI · inference-infra · 2025-04-24

Salesforce, Zoom, InVideo Train Faster with Together AI Turbocharged with NVIDIA Blackwell

Score 12

No feed summary available yet.

hardware

High signal Matched: blackwell

Together AI · inference-infra · 2026-05-14

Violin: An open-source video translation skill that breaks language barriers

Score 3

Violin is an open-source AI video translation tool that combines speech recognition, LLM translation, and text-to-speech to make video content accessible across languages.

open-source

Watchlist Matched: open-source

Together AI · inference-infra · 2026-04-30

From 732 bytes to nowhere: shutting down Copy Fail in production

Score 3

No feed summary available yet.

Watchlist Matched: none

Together AI · inference-infra · 2026-04-30

Announcing Together AI and Adaption Partnership

Score 3

Together AI and Adaption partner to bring Together Fine-Tuning natively into Adaptive Data, helping teams optimize datasets, run fine-tuning, evaluate results, and deploy stronger open models.

fine-tuning evals

Watchlist Matched: fine-tuning, evaluate

Together AI · inference-infra · 2026-04-13

EinsteinArena: Harnessing the collective intelligence of agents in the wild to advance science

Score 3

EinsteinArena is a platform where AI agents collaborate and compete on open math problems. AI agents on EinsteinArena have already set 11 new state-of-the-art results on open math problems — including pushing the kissing number lower bound...

agents

Watchlist Matched: agents

Together AI · inference-infra · 2026-02-25

CoderForge-Preview: SOTA open dataset for training efficient coding agents

Score 3

No feed summary available yet.

training agents frontier-model

Watchlist Matched: training, agents, sota

Together AI · inference-infra · 2026-02-04

Rime Arcana V3 Turbo and Rime Arcana V3 now available on Together AI

Score 3

No feed summary available yet.

Watchlist Matched: none

Together AI · inference-infra · 2026-02-03

Together AI welcomes Alon Gavrielov as VP of Infrastructure Strategy

Score 0

Hiring Alon Gavrielov further deepens Together AI’s commitment to building AI factories that deliver the most reliable, efficient, and scalable infrastructure for AI-native teams.

Watchlist Matched: none

Together AI · inference-infra · 2025-12-18

Rime voice models now available on Together AI

Score 3

Two enterprise-grade Rime TTS models now available on Together AI. Co-locate with LLM and STT on dedicated infrastructure. Proven at billions of calls.

Watchlist Matched: none

Together AI · inference-infra · 2025-12-12

Announcing Together Python SDK v2.0

Score 3

No feed summary available yet.

api

Watchlist Matched: sdk

Together AI · inference-infra · 2025-10-28

Dynamic AI agent testing for the real world with Collinear Simulations and Together Evals

Score 3

Test AI agents in the real world with Collinear TraitMix and Together Evals: dynamic persona simulations, multi-turn dialogs, and LLM-as-judge scoring.

evals agents

Watchlist Matched: evals, agent, agents

Together AI · inference-infra · 2025-09-10

Fine-Tuning Platform Upgrades: Larger Models, Longer Contexts, Enhanced Hugging Face Integrations

Score 3

Together AI expands Fine-Tuning Platform: train 100B+ models, extend context lengths, integrate with Hugging Face Hub, and access new DPO options.

training fine-tuning

Watchlist Matched: dpo, fine-tuning

Together AI · inference-infra · 2025-09-10

Together AI welcomes Mahadev Konar as SVP for Infrastructure Engineering

Score 2

Hiring Mahadev Konar further deepens Together AI’s commitment to deliver the most reliable and scalable GPU infrastructure.

hardware

Watchlist Matched: gpu

Together AI · inference-infra · 2025-08-11

OpenAI's New Open gpt-oss Models vs o4-mini: A Real-World Comparison

Score 3

No feed summary available yet.

open-source

Watchlist Matched: oss

Together AI · inference-infra · 2025-07-29

VirtueGuard: Enterprise-Grade AI Security and Safety Now on Together AI

Score 3

No feed summary available yet.

Watchlist Matched: none

Together AI · inference-infra · 2025-07-08

Powering Secure AI: Together AI Achieves SOC 2 Type 2 Compliance

Score 3

Build and deploy AI with peace of mind—Together AI is now SOC 2 Type 2 certified, proving our encryption, access controls, and 24/7 monitoring meet the highest security standards.

Watchlist Matched: none

Together AI · inference-infra · 2025-07-02