AI cloud provider covering inference, fine-tuning, GPU clusters, optimization, and research.
Together AI · inference-infra · 2026-06-02
Score 17
How Together served MiniMax-M3 efficiently with KV-block-major sparse attention, paged MSA decode, optimized index scoring, and a Rust-based multimodal gateway.
High signal Matched: inference, serving
Together AI · inference-infra · 2026-05-29
Score 13
Together AI built the fastest speech-to-text stack on Artificial Analysis by treating ASR as a full-path systems problem, not just a GPU inference problem.
High signal Matched: inference, gpu
Together AI · inference-infra · 2026-05-19
Score 16
Real-world inference benchmarks for coding agents: 31% more TPS than TensorRT-LLM, 2× better TTFT at saturation, and 76% lower cost than Claude Opus 4.6.
High signal Matched: inference, ttft, cost, benchmarks, agents
Together AI · inference-infra · 2026-05-15
Score 24
Together AI partners with Pearl Research Labs to launch a discounted Pearl-powered inference endpoint for Gemma-4-31B-it-pearl, using Proof of Useful Work to turn AI workloads into crypto emissions.
High signal Matched: inference, endpoint, cost, launch, research
Together AI · inference-infra · 2026-05-12
Score 12
Voice finder helps developers search, match, filter, and audition 600+ voices across Together AI TTS models using natural-language prompts or uploaded audio samples.
High signal Matched: introducing
Together AI · inference-infra · 2026-05-11
Score 22
DeepSeek-V4 makes million-token context a serving-systems problem. Together AI explores the inference work behind V4 on NVIDIA HGX B200, including compressed KV layouts, prefix caching, kernel maturity, and endpoint profiles for long-conte...
High signal Matched: inference, serving, endpoint, kernel, b200, long-context
Together AI · inference-infra · 2026-05-08
Score 20
Learn how to deploy any Hugging Face model in one session using Goose and Together's Dedicated Container Inference. Skip the setup complexity — one prompt gets your model running in a production-grade GPU environment on release day.
High signal Matched: inference, gpu, release, model
Together AI · inference-infra · 2026-05-04
Score 16
As AI moves from research to production, the challenge for AI-native teams shifts from building models to running them — efficiently, reliably, and at scale.
High signal Matched: inference, research
Together AI · inference-infra · 2026-04-29
Score 10
DeepSeek-V4 Pro is now available on Together AI with 512K context, controllable reasoning modes, and cached-input pricing for long-context reasoning workloads like code agents, document intelligence, and research synthesis.
High signal Matched: research, long-context, agents
Together AI · inference-infra · 2026-04-28
Score 12
NVIDIA Nemotron 3 Nano Omni is now on Together AI: a single open model that reasons across video, images, audio, and text, built for agentic workloads at scale.
High signal Matched: model, open model, agentic
Together AI · inference-infra · 2026-04-24
Score 16
Rollout is the silent bottleneck in RL post-training. DAS fixes it with adaptive speculative decoding — up to 50% faster, zero degradation in reward quality.
High signal Matched: decoding, speculative decoding, training, post-training
Together AI · inference-infra · 2026-04-21
Score 12
Learn how AI-native companies design multi-tenant GPU clusters that pool capacity without sacrificing team isolation — and how Together AI makes it work in practice.
High signal Matched: gpu
Together AI · inference-infra · 2026-04-15
Score 14
Parcae is a stable looped language model that matches the quality of a Transformer twice its size — a 770M model reaching 1.3B-level performance. We introduce the first scaling laws for looping and show that increasing recurrence, not just...
High signal Matched: performance, model
Together AI · inference-infra · 2026-04-07
Score 12
AI-native companies need infrastructure built for models, not legacy workloads. Learn what defines an AI Native Cloud and why it matters for the next platform shift.
High signal Matched: cloud
Together AI · inference-infra · 2026-04-03
Score 14
A four-model video suite for generation, continuation, reference-driven workflows, and editing, rolling out on Together AI starting with text-to-video.
High signal Matched: generation, model
Together AI · inference-infra · 2026-04-03
Score 10
New research shows LLMs can optimize database query execution plans—achieving up to 4.78x speedups by correcting the cardinality estimation errors that statistical heuristics miss.
High signal Matched: research
Together AI · inference-infra · 2026-04-02
Score 14
Production STT and TTS from Deepgram, available on Together AI Dedicated Model Inference for real-time voice agents.
High signal Matched: inference, model, agents
Together AI · inference-infra · 2026-04-01
Score 16
The team behind FlashAttention and ThunderKittens — how Together AI's kernel researchers close the gap between GPU hardware and production AI.
High signal Matched: kernel, flashattention, gpu
Together AI · inference-infra · 2026-03-31
Score 12
1.25x over a well-trained static speculator. Aurora is an open-source RL framework that turns speculative decoding from a one-time offline setup into a self-improving system that learns from every request it serves.
High signal Matched: decoding, speculative decoding, open-source
Together AI · inference-infra · 2026-03-26
Score 10
As context windows grow, LLM performance degrades in unexpected ways. We show how a "Divide & Conquer" framework — breaking long documents into parallel chunks with a planner, workers, and manager — lets smaller models like Llama-3-70B and...
High signal Matched: performance, long context
Together AI · inference-infra · 2026-03-18
Score 14
Together AI expands fine-tuning with native support for tool-calling, reasoning, and vision-language models, plus 100B+ model training, up to 6× higher throughput, and job cost and ETA estimates.
High signal Matched: throughput, cost, model, training, fine-tuning
Together AI · inference-infra · 2026-03-17
Score 10
Meet Mamba-3: the SSM built for inference. Faster than Transformers at decode, stronger than Mamba-2, and open-source from day one.
High signal Matched: inference, open-source
Together AI · inference-infra · 2026-03-16
Score 14
Together AI arrives at NVIDIA GTC 2026 with new launches in inference, agents, voice AI, and open models — plus technical sessions from its research and engineering leaders.
High signal Matched: inference, research, agents
Together AI · inference-infra · 2026-03-12
Score 10
Build real-time voice agents on Together AI with co-located STT, LLM, and TTS infrastructure, native Deepgram and Cartesia support, and end-to-end latency under 500ms.
High signal Matched: latency, agents
Together AI · inference-infra · 2026-03-11
Score 10
NVIDIA Nemotron 3 Super is now available on Together AI Dedicated Inference, delivering efficient multi-agent reasoning, a 1M-token context window, and production-grade deployment on managed infrastructure.
High signal Matched: inference, context window, agent
Together AI · inference-infra · 2026-03-10
Score 12
Together GPU Clusters now include built-in autoscaling, RBAC, full-stack observability, and self-healing node repair—giving teams production-ready GPU infrastructure that scales efficiently, stays resilient, and supports shared enterprise...
High signal Matched: gpu
Together AI · inference-infra · 2026-03-05
Score 20
As GPU throughput outpaces memory bandwidth, kernels must evolve. We introduce FlashAttention-4, featuring new pipelining for maximum overlap, 2-CTA MMA modes to reduce shared memory traffic, and a hardware-software hybrid approach to soft...
High signal Matched: throughput, kernel, flashattention, gpu
Together AI · inference-infra · 2026-03-05
Score 18
At AI Native Conf, Together AI announced breakthroughs across kernels, RL, and inference optimization — including FlashAttention-4, ThunderAgent, and together.compile. Research that ships to production. That's the AI Native Cloud.
High signal Matched: inference, flashattention, research, cloud
Together AI · inference-infra · 2026-03-04
Score 20
Serving long prompts doesn't have to mean slow responses. Learn how Together AI's CPD architecture separates warm and cold inference workloads to deliver 40% higher throughput and dramatically lower time-to-first-token for long-context LLM...
High signal Matched: inference, serving, prefill, throughput, long-context
Together AI · inference-infra · 2026-03-02
Score 14
We've refreshed our visual identity — designed with Pentagram to express how Together AI connects open-source innovation, systems research, and builders to unlock new possibilities.
High signal Matched: introducing, research, open-source
Together AI · inference-infra · 2026-02-23
Score 10
State-of-the-art speech models like Whisper and Deepgram score near-human on benchmarks — then fail 39% of the time on street names. New research from Together AI exposes the gap and a fix.
High signal Matched: research, benchmarks
Together AI · inference-infra · 2026-02-19
Score 14
Standard diffusion language models can't use KV caching and need too many refinement steps to be practical. CDLM fixes both with a post-training recipe that enables exact block-wise KV caching and trajectory-consistent step reduction — del...
High signal Matched: inference, latency, training, post-training
Together AI · inference-infra · 2026-02-12
Score 16
Together AI launches production-grade orchestration for custom AI models with 1.4x–2.6x faster inference.
High signal Matched: inference, introducing
Together AI · inference-infra · 2026-02-06
Score 10
What do language models generate when you don't tell them what to generate? New research reveals that LLM families have distinct 'knowledge priors'—GPT models default to code and math, Llama favors narratives, DeepSeek generates religious...
High signal Matched: research
Together AI · inference-infra · 2026-02-02
Score 14
Fine-tuned open-source LLM judges can outperform GPT-5.2 at evaluating model outputs. Using Direct Preference Optimization on just 5,400 preference pairs, we trained GPT-OSS 120B to beat GPT-5.2 on human preference alignment—at 15x lower c...
High signal Matched: inference, cost, model, fine-tuning, evaluating, open-source, oss
Together AI · inference-infra · 2026-02-02
Score 12
Together Evaluations now supports OpenAI, Anthropic, and Google models for cross-provider benchmarking. Compare open-source, fine-tuned, and proprietary models side-by-side to make data-driven decisions on quality, cost, and performance—al...
High signal Matched: performance, cost, open-source, open source
Together AI · inference-infra · 2026-01-26
Score 18
Introducing DSGym, a holistic evaluation and training framework for LLM-based data science agents. Features 90+ bioinformatics tasks, 92 Kaggle competitions, and synthetic trajectory generation. Our 4B model achieves state-of-the-art perform...
High signal Matched: generation, performance, introducing, model, evaluation, training, evaluating, agents, open-source
Together AI · inference-infra · 2026-01-22
Score 22
Learn how to reduce inference latency without massive cost using proven inference optimization tactics — improving throughput, GPU utilization, and cost efficiency while balancing throughput vs. latency tradeoffs.
High signal Matched: inference, throughput, latency, cost, gpu
Together AI · inference-infra · 2026-01-13
Score 24
Together AI teamed with Cursor to build the real-time inference stack that keeps in-editor agents fast and reliable. They productionized NVIDIA Blackwell (B200/GB200), tuning ARM hosts, kernels, and FP4/TensorRT quantization for low latenc...
High signal Matched: inference, latency, b200, gb200, blackwell, model, quantization, agents
Together AI · inference-infra · 2026-01-12
Score 22
Learn how foundation models are trained at scale using multi-node GPU clusters, including distributed training techniques, infrastructure requirements, and practical steps to scale training efficiently.
High signal Matched: distributed, multi-node, gpu, model, training, distributed training
Together AI · inference-infra · 2026-01-08
Score 20
Learn how to choose the right open-source model for production by evaluating model quality, benchmarking performance, and deploying open models that balance cost, speed, and accuracy.
High signal Matched: performance, cost, model, open model, evaluating, open-source
Together AI · inference-infra · 2025-12-23
Score 10
MiniMax Speech 2.6 Turbo: State-of-the-art multilingual TTS with human-level emotional awareness, sub-250ms latency, and 40+ languages—now on Together AI.
High signal Matched: latency
Together AI · inference-infra · 2025-12-17
Score 14
Dan Fu, our VP of Kernels, has published a new post challenging the idea that AI is hitting a hardware wall. He argues that we are vastly underutilizing current chips and that better software-hardware co-design will unlock the next order o...
High signal Matched: performance, research
Together AI · inference-infra · 2025-12-15
Score 14
Nemotron 3 Nano, NVIDIA’s newest reasoning model, is now available on Together AI, the AI Native Cloud.
High signal Matched: model, cloud, reasoning model
Together AI · inference-infra · 2025-12-03
Score 20
AutoJudge accelerates LLM inference by identifying which token mismatches actually matter. Using self-supervised learning to train a lightweight classifier, it accepts up to 40 draft tokens per cycle—delivering 1.5–2× speedups over standar...
High signal Matched: inference, decoding, speculative decoding, introducing
Together AI · inference-infra · 2025-12-03
Score 12
Build, train, and deploy advanced AI agents with integrated reinforcement learning on the Together platform.
High signal Matched: cloud, agents
Together AI · inference-infra · 2025-12-03
Score 12
No feed summary available yet.
High signal Matched: cloud
Together AI · inference-infra · 2025-12-01
Score 20
Together AI achieves up to 2x faster inference for top open-source models like Qwen, DeepSeek, and Kimi through GPU optimization, advanced speculative decoding, and FP4 quantization—ranking #1 in speed benchmarks on NVIDIA Blackwell archit...
High signal Matched: inference, decoding, speculative decoding, gpu, blackwell, quantization, benchmarks, open-source
Together AI · inference-infra · 2025-11-25
Score 12
Production-grade image generation with multi-reference consistency, exact brand colors, and reliable text rendering. FLUX.2 from Black Forest Labs, now on Together AI's platform.
High signal Matched: generation
Together AI · inference-infra · 2025-11-04
Score 14
Together AI launches the fastest voice AI stack: streaming Whisper STT, serverless open-source TTS (Orpheus & Kokoro), and Voxtral transcription. Sub-second latency for production voice agents.
High signal Matched: inference, latency, agents, open-source
Together AI · inference-infra · 2025-11-04
Score 12
Understanding how to evaluate and benchmark Large Language Models (LLMs). Test, compare, and understand LLMs.
High signal Matched: benchmark, evaluate
Together AI · inference-infra · 2025-10-22
Score 12
ReasonIF finds frontier LRMs fail to follow reasoning instructions >75% of the time; introduces a benchmark across languages, formatting, and length.
High signal Matched: benchmark
Together AI · inference-infra · 2025-10-21
Score 16
Together AI adds 40+ image & video models, including Sora 2 and Veo 3, to build end-to-end multimodal apps with unified OpenAI-compatible APIs and transparent pricing.
High signal Matched: generation, model, openai-compatible
Together AI · inference-infra · 2025-10-15
Score 12
We've launched the Together AI Startup Accelerator: Up to $50K credits, expert engineering hours, GTM support, community and VC access for AI-native apps in build–scale tiers.
High signal Matched: accelerator
Together AI · inference-infra · 2025-10-10
Score 20
LLM inference that gets faster as you use it. Our runtime-learning accelerator adapts continuously to your workload, delivering 500 TPS on DeepSeek-V3.1, a 4x speedup over baseline performance without manual tuning.
High signal Matched: inference, deepseek-v3, performance, accelerator
Together AI · inference-infra · 2025-09-15
Score 18
Our new Batch Inference API makes large-scale AI workloads simpler, faster, and cheaper. With a streamlined UI, universal model support, and 3000× higher rate limits—now up to 30B tokens—you can process massive datasets at half the cost of...
High signal Matched: inference, cost, model, api
Together AI · inference-infra · 2025-09-09
Score 18
Together AI launches Instant Clusters: self-service GPU clusters with NVIDIA H100/B200, ready in minutes for training or inference at any scale.
High signal Matched: inference, gpu, h100, b200, training
Together AI · inference-infra · 2025-08-27
Score 16
Access DeepSeek-V3.1 on Together AI: MIT-licensed hybrid model with thinking/non-thinking modes, 66% SWE-bench Verified, serverless deployment, 99.9% SLA.
High signal Matched: deepseek-v3, model, swe-bench
Together AI · inference-infra · 2025-08-21
Score 16
Build AI agents for complex, long-running engineering tasks. Learn key patterns from a case study: accelerating LLM inference with speculative decoding.
High signal Matched: inference, decoding, speculative decoding, agents
Together AI · inference-infra · 2025-08-19
Score 10
Customize OpenAI’s gpt-oss-20B/120B with Together AI’s fine-tuning: train, optimize, and instantly deploy domain experts with enterprise reliability and cost efficiency.
High signal Matched: cost, fine-tuning, oss
Together AI · inference-infra · 2025-08-15
Score 12
Parsed fine-tuned a 27B open-source model to beat Claude Sonnet 4 by 60% on a real-world healthcare task—while running 10–100x cheaper.
High signal Matched: model, fine-tuning, open-source
Together AI · inference-infra · 2025-08-05
Score 12
Access OpenAI’s gpt-oss-120B on Together AI: Apache-2.0 open-weight model with serverless & dedicated endpoints, $0.50/1M in, $1.50/1M out, 99.9% SLA.
High signal Matched: model, oss
Together AI · inference-infra · 2025-07-28
Score 16
Together Evaluations is a flexible framework for benchmarking LLMs using strong open-source models as judges. Skip manual labeling and rigid metrics—get fast, customizable insights into model quality for your specific tasks.
High signal Matched: benchmark, model, open-source
Together AI · inference-infra · 2025-07-25
Score 12
Unlock agentic coding with Qwen3-Coder on Together AI: 256K context, SWE-bench rivaling Claude Sonnet 4, zero-setup instant deployment.
High signal Matched: model, swe-bench, agentic
Together AI · inference-infra · 2025-07-17
Score 18
Together AI inference is now among the world’s fastest, most capable platforms for running open-source reasoning models like DeepSeek-R1 at scale, thanks to our new inference engine designed for NVIDIA HGX B200.
High signal Matched: inference, b200, blackwell, open-source
Together AI · inference-infra · 2025-07-14
Score 16
Run Kimi K2 (1T params) on Together AI—frontier open model for agentic reasoning and coding, serverless deployment, 99.9% SLA, lower cost and instant scaling.
High signal Matched: cost, model, open model, agentic, open-source
Together AI · inference-infra · 2025-07-10
Score 12
No feed summary available yet.
High signal Matched: performance
Together AI · inference-infra · 2025-06-11
Score 16
No feed summary available yet.
High signal Matched: cost, introducing, api
Together AI · inference-infra · 2025-06-05
Score 12
No feed summary available yet.
High signal Matched: model
Together AI · inference-infra · 2025-05-20
Score 12
No feed summary available yet.
High signal Matched: introducing, sota
Together AI · inference-infra · 2025-05-12
Score 16
No feed summary available yet.
High signal Matched: decoding, speculative decoding
Together AI · inference-infra · 2025-05-05
Score 12
No feed summary available yet.
High signal Matched: inference
Together AI · inference-infra · 2025-04-24
Score 12
No feed summary available yet.
High signal Matched: blackwell
Together AI · inference-infra · 2026-05-14
Score 3
Violin is an open-source AI video translation tool that combines speech recognition, LLM translation, and text-to-speech to make video content accessible across languages.
Watchlist Matched: open-source
Together AI · inference-infra · 2026-04-30
Score 3
No feed summary available yet.
Watchlist Matched: none
Together AI · inference-infra · 2026-04-30
Score 3
Together AI and Adaption partner to bring Together Fine-Tuning natively into Adaptive Data, helping teams optimize datasets, run fine-tuning, evaluate results, and deploy stronger open models.
Watchlist Matched: fine-tuning, evaluate
Together AI · inference-infra · 2026-04-13
Score 3
EinsteinArena is a platform where AI agents collaborate and compete on open math problems. AI agents on EinsteinArena have already set 11 new state-of-the-art results on open math problems — including pushing the kissing number lower bound...
Watchlist Matched: agents
Together AI · inference-infra · 2026-02-25
Score 3
No feed summary available yet.
Watchlist Matched: training, agents, sota
Together AI · inference-infra · 2026-02-04
Score 3
No feed summary available yet.
Watchlist Matched: none
Together AI · inference-infra · 2026-02-03
Score 0
Hiring Alon Gavrielov further deepens Together AI’s commitment to building AI factories that deliver the most reliable, efficient, and scalable infrastructure for AI-native teams.
Watchlist Matched: none
Together AI · inference-infra · 2025-12-18
Score 3
Two enterprise-grade Rime TTS models now available on Together AI. Co-locate with LLM and STT on dedicated infrastructure. Proven at billions of calls.
Watchlist Matched: none
Together AI · inference-infra · 2025-12-12
Score 3
No feed summary available yet.
Watchlist Matched: sdk
Together AI · inference-infra · 2025-10-28
Score 3
Test AI agents in the real world with Collinear TraitMix and Together Evals: dynamic persona simulations, multi-turn dialogs, and LLM-as-judge scoring.
Watchlist Matched: evals, agent, agents
Together AI · inference-infra · 2025-09-10
Score 3
Together AI expands Fine-Tuning Platform: train 100B+ models, extend context lengths, integrate with Hugging Face Hub, and access new DPO options.
Watchlist Matched: dpo, fine-tuning
Together AI · inference-infra · 2025-09-10
Score 2
Hiring Mahadev Konar further deepens Together AI’s commitment to delivering the most reliable and scalable GPU infrastructure.
Watchlist Matched: gpu
Together AI · inference-infra · 2025-08-11
Score 3
No feed summary available yet.
Watchlist Matched: oss
Together AI · inference-infra · 2025-07-29
Score 3
No feed summary available yet.
Watchlist Matched: none
Together AI · inference-infra · 2025-07-08
Score 3
Build and deploy AI with peace of mind—Together AI is now SOC 2 Type 2 certified, proving our encryption, access controls, and 24/7 monitoring meet the highest security standards.
Watchlist Matched: none
Together AI · inference-infra · 2025-07-02
Score 3
No feed summary available yet.
Watchlist Matched: training, agent
Together AI · inference-infra · 2025-06-12
Score 3
No feed summary available yet.
Watchlist Matched: none
Together AI · inference-infra · 2025-06-12
Score 3
Build a data scientist agent using Together’s open-source models and Code Interpreter—easy to implement, solid benchmarks, and full code on GitHub.
Watchlist Matched: benchmarks, agent, open-source
Together AI · inference-infra · 2025-06-09
Score 3
No feed summary available yet.
Watchlist Matched: none
Together AI · inference-infra · 2025-05-29
Score 3
No feed summary available yet.
Watchlist Matched: fine-tuning
Together AI · inference-infra · 2025-05-28
Score 3
No feed summary available yet.
Watchlist Matched: training, post-training, agents, open-source
Together AI · inference-infra · 2025-05-20
Score 3
No feed summary available yet.
Watchlist Matched: none
Together AI · inference-infra · 2025-05-20
Score 3
No feed summary available yet.
Watchlist Matched: api
Together AI · inference-infra · 2025-05-15
Score 3
No feed summary available yet.
Watchlist Matched: none
Together AI · inference-infra · 2025-04-21
Score 3
No feed summary available yet.
Watchlist Matched: training
Together AI · inference-infra · 2025-04-17
Score 3
No feed summary available yet.
Watchlist Matched: training, fine-tuning