Together AI has announced the general availability of Instant Clusters, a self-service feature for provisioning large-scale GPU compute on demand for AI model training and deployment.

Instant Clusters eliminates GPU procurement delays, letting AI teams train models at any scale after minutes of setup and pay only for the compute they actually use.
Signal analysis
Together AI has launched Instant Clusters, a service that allows developers to provision large-scale GPU compute resources in minutes rather than weeks. The service provides access to clusters ranging from 8 to 512 NVIDIA H100 GPUs with NVLink interconnects, enabling distributed training of large language models and other compute-intensive AI workloads. This release directly addresses the GPU availability crisis that has constrained AI development teams throughout 2025.
The technical architecture leverages Together's partnership with multiple data center providers to offer geographic diversity across US-West, US-East, and EU-West regions. Each cluster comes pre-configured with popular ML frameworks including PyTorch, JAX, and TensorFlow, with container images optimized for multi-node training. The InfiniBand networking achieves up to 3.2 Tbps bandwidth between nodes, eliminating the communication bottlenecks that plague loosely coupled cloud GPU instances.
Pricing follows a per-GPU-hour model with commitments as short as one hour for burst workloads, though significant discounts apply to weekly and monthly reservations. Together claims their pricing undercuts hyperscaler alternatives by 30-40% due to operational efficiencies and purpose-built infrastructure. The on-demand nature means developers no longer need to maintain idle capacity during development phases, paying only for actual training runs.
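The per-GPU-hour arithmetic is easy to sketch. The rate and discount tiers below are illustrative placeholders, not Together's published prices:

```python
# Rough cost sketch for an on-demand cluster. The hourly rate and
# reservation discounts are hypothetical placeholders -- substitute
# the figures from Together's pricing page.
HOURLY_RATE_PER_GPU = 2.50  # USD per GPU-hour, illustrative only
DISCOUNTS = {"hourly": 0.0, "weekly": 0.15, "monthly": 0.30}  # assumed tiers

def estimate_cost(gpus: int, hours: float, commitment: str = "hourly") -> float:
    """Return the estimated USD cost of a training run."""
    rate = HOURLY_RATE_PER_GPU * (1 - DISCOUNTS[commitment])
    return round(gpus * hours * rate, 2)

# e.g. a 64-GPU fine-tune running for 12 hours on the hourly tier:
# estimate_cost(64, 12) -> 1920.0
```

The point of the model is visible in the arithmetic: a burst workload pays only for its own GPU-hours, while longer reservations trade flexibility for a lower effective rate.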
AI startups training foundation models or large-scale fine-tunes stand to benefit most from Instant Clusters. The ability to access hundreds of GPUs without enterprise sales cycles or long-term commitments removes a major barrier to scaling. Teams that previously had to distribute training across multiple smaller instances - with the associated complexity of managing distributed training - can now request properly interconnected clusters sized to their workload.
Research teams at universities and labs gain democratized access to resources previously available only at well-funded institutions. A PhD student training a novel architecture can now access the same compute infrastructure as Google or OpenAI, leveling the playing field for AI research. The hourly billing model aligns with academic funding structures where committing to annual GPU reservations isn't feasible.
Enterprise ML teams with variable training demands can finally right-size their compute usage. Rather than maintaining expensive reserved capacity that sits idle between training runs, teams can spin up clusters for specific experiments and return resources when complete. This shift from capital expense to operational expense provides finance teams with predictable, project-based cost attribution.
Getting started requires a Together AI account with a verified payment method. Navigate to the Clusters section of the Together console and select 'Create Instant Cluster'. Choose your GPU count (8, 16, 32, 64, 128, 256, or 512 H100s), region, and expected duration. The system provides real-time availability and a cost estimate before confirmation. Click 'Launch' to begin provisioning.
Cluster provisioning typically completes in 2-8 minutes depending on size and availability. Once ready, you receive SSH credentials and API endpoints. The cluster comes with a shared filesystem mounted across all nodes, allowing you to upload training data once and access it from any node. Configure your training script to use the provided MASTER_ADDR and MASTER_PORT environment variables for distributed training coordination.
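With PyTorch, the cluster-provided variables plug directly into the standard `torch.distributed` rendezvous. A minimal sketch, assuming the cluster also exports `RANK` and `WORLD_SIZE` per process (not stated in the release notes):

```python
import os

def rendezvous_url(default_port: int = 29500) -> str:
    """Build a tcp:// init method from the cluster-provided env vars."""
    addr = os.environ.get("MASTER_ADDR", "localhost")
    port = int(os.environ.get("MASTER_PORT", default_port))
    return f"tcp://{addr}:{port}"

def init_distributed() -> None:
    # Requires torch on the cluster image; RANK and WORLD_SIZE are
    # assumed to be set for each worker process.
    import torch.distributed as dist
    dist.init_process_group(
        backend="nccl",                 # NCCL for multi-GPU clusters
        init_method=rendezvous_url(),
        rank=int(os.environ["RANK"]),
        world_size=int(os.environ["WORLD_SIZE"]),
    )
```

Launchers such as `torchrun` can derive the same rendezvous from `--master_addr` and `--master_port` flags, so passing the two environment variables through is usually all the coordination setup a training script needs.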
Monitoring and management are handled through Together's dashboard and CLI. Use `together cluster status` to check utilization, `together cluster logs` to stream training output, and `together cluster terminate` when your run completes. Automatic termination can be configured as a safety measure to prevent runaway costs if a training job crashes but the cluster remains running.
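A thin wrapper around those subcommands keeps run management scriptable. A sketch — only the `status`, `logs`, and `terminate` subcommands come from the docs above; the argument shapes are assumptions:

```python
import subprocess

# Subcommands documented for the Together CLI; anything else is rejected.
DOCUMENTED_ACTIONS = {"status", "logs", "terminate"}

def cluster_cmd(action: str, *extra_args: str) -> list[str]:
    """Build an argv list for the `together cluster` CLI."""
    if action not in DOCUMENTED_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    return ["together", "cluster", action, *extra_args]

def terminate_cluster(cluster_id: str) -> None:
    # Hypothetical usage: pass the cluster id as a positional argument.
    subprocess.run(cluster_cmd("terminate", cluster_id), check=True)
```

Wiring `terminate_cluster` into a job's exit handler gives a belt-and-braces complement to the platform's own auto-termination setting.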
Compared to AWS p5 instances (H100-equipped), Together Instant Clusters offer significantly lower latency to provision - minutes versus hours or days for large reservations. AWS requires navigating capacity limits and potentially filing support tickets for multi-node requests, while Together's interface is self-service for any available configuration. The networking difference is also substantial; Together's InfiniBand interconnects outperform AWS EFA for distributed training workloads.
Lambda Labs offers competitive GPU pricing but currently caps cluster sizes at smaller configurations than Together's 512-GPU maximum. For mid-size training runs (8-64 GPUs), Lambda remains price-competitive, but organizations training larger models have limited alternatives. CoreWeave provides similar scale but requires more setup and configuration, targeting teams with more infrastructure expertise.
The trade-off is ecosystem maturity. AWS and GCP offer richer integrations with data storage, experiment tracking, and deployment services. Together's focus is purely on compute - you'll need to bring your own MLOps stack. For teams already using tools like Weights & Biases, MLflow, and external storage, this isn't a limitation. Teams expecting an integrated platform may find Together's offering more modular than desired.
Together has announced plans to add AMD MI300X clusters to their Instant Clusters offering in late 2026, providing an alternative to NVIDIA's GPUs. This diversification addresses concerns about NVIDIA supply constraints and offers developers the flexibility to optimize for price or performance. Early benchmarks suggest MI300X configurations will be priced 20% below equivalent H100 clusters.
The integration roadmap includes native support for popular training frameworks like HuggingFace Accelerate, DeepSpeed, and Megatron-LM. Rather than requiring developers to configure distributed training manually, these integrations will allow single-command launches of pre-configured training environments optimized for Together's interconnect architecture.
The broader market trajectory suggests on-demand GPU clusters will become standard infrastructure for AI teams by 2027. The significant capital requirements for building owned GPU capacity push all but the largest organizations toward rental models. Together's early-mover advantage in self-service provisioning positions them well, though increasing competition from Azure, GCP, and specialized providers will likely compress margins.