AWS What's New June 03, 2026

Amazon SageMaker AI launches multi-turn reinforcement learning for AI agent model customization


Summary

Amazon Web Services has launched multi-turn reinforcement learning (RL) in SageMaker AI, a new serverless model customization technique for fine-tuning AI models on complex, multi-step agent tasks. Developers can train models against their own agent environments, with rewards assigned to complete decision sequences across tasks. According to AWS, this lets smaller, cost-effective models match or exceed the performance of larger general-purpose models on specific workloads.

The offering integrates with Amazon Bedrock AgentCore Runtime for managed hosting, or it can connect to custom infrastructure running on Amazon EKS, Amazon EC2, AWS Fargate, or external platforms. The service manages the entire training loop, including rollout orchestration, trajectory collection, training, and checkpoint management, and provides built-in MLflow tracking for inspecting agent trajectories, rewards, and traces. Because the capability is fully serverless, users pay only for the tokens processed and do not need to provision or manage the underlying infrastructure.

The service is available now through SageMaker Studio and the SageMaker Python SDK. Supported models include Qwen 3.6 27B, Nova Lite 2.0, GPT-OSS-20B, and Gemma 31B, in the us-west region.
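The release does not show SageMaker's SDK calls, so the sketch below does not attempt to reproduce them. It is a minimal, framework-free illustration of the general idea of rewarding a complete decision sequence: a hypothetical toy environment (`ToyAgentEnv`) scores the whole episode once it ends, a rollout loop collects the multi-turn trajectory, and the single terminal reward is then credited back to every turn. All names here (`ToyAgentEnv`, `rollout`, `per_turn_returns`, `trajectory_advantages`) are illustrative assumptions, not SageMaker AI or AgentCore APIs.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# NOTE: hypothetical, illustrative names only; not SageMaker AI or AgentCore APIs.


@dataclass
class Trajectory:
    """One complete multi-turn episode: every (observation, action) pair plus one final reward."""
    turns: List[Tuple[str, str]] = field(default_factory=list)
    reward: float = 0.0


class ToyAgentEnv:
    """Hypothetical task: the agent should 'search' before it 'answer's, within max_turns.

    Reward is only produced when the episode ends, so it scores the whole decision
    sequence rather than any single turn.
    """

    def __init__(self, max_turns: int = 4) -> None:
        self.max_turns = max_turns
        self.history: List[str] = []

    def reset(self) -> str:
        self.history = []
        return "task: find and report the value"

    def step(self, action: str) -> Tuple[str, bool, float]:
        self.history.append(action)
        done = action == "answer" or len(self.history) >= self.max_turns
        reward = 0.0
        if done:
            # Trajectory-level reward: 1.0 only if a 'search' preceded the final 'answer'.
            reward = 1.0 if action == "answer" and "search" in self.history[:-1] else 0.0
        return f"observed: {action}", done, reward


Policy = Callable[[str], str]


def rollout(env: ToyAgentEnv, policy: Policy) -> Trajectory:
    """Run one episode to completion and record every turn (trajectory collection)."""
    traj = Trajectory()
    obs, done, reward = env.reset(), False, 0.0
    while not done:
        action = policy(obs)
        traj.turns.append((obs, action))
        obs, done, reward = env.step(action)
    traj.reward = reward  # the reward returned on the final step scores the whole sequence
    return traj


def per_turn_returns(traj: Trajectory, gamma: float = 1.0) -> List[float]:
    """Credit the terminal reward back to each turn, discounted by distance from the end."""
    T = len(traj.turns)
    return [traj.reward * gamma ** (T - 1 - t) for t in range(T)]


def trajectory_advantages(trajs: List[Trajectory]) -> List[float]:
    """Compare each trajectory's reward with the batch mean (a simple baseline)."""
    baseline = sum(t.reward for t in trajs) / len(trajs)
    return [t.reward - baseline for t in trajs]


def scripted(actions: List[str]) -> Policy:
    """Deterministic policy that replays a fixed action list (for demonstration)."""
    it = iter(actions)
    return lambda obs: next(it)


def random_policy(rng: random.Random) -> Policy:
    """Stand-in for a model being trained: picks actions uniformly at random."""
    return lambda obs: rng.choice(["search", "answer", "think"])


if __name__ == "__main__":
    good = rollout(ToyAgentEnv(), scripted(["search", "answer"]))
    bad = rollout(ToyAgentEnv(), scripted(["answer"]))
    print(good.reward, bad.reward)               # 1.0 0.0
    print(trajectory_advantages([good, bad]))    # [0.5, -0.5]
```

Subtracting the batch-mean reward in `trajectory_advantages` is one common way to reduce variance when many rollouts of the same task are collected; the release does not say which credit-assignment or baseline scheme SageMaker AI actually uses.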

Why It Matters

This launch addresses a common obstacle in AI development: training reliable multi-step agent models typically requires weeks of custom infrastructure work. By offering this as a managed, serverless service, AWS lowers the barrier for organizations building specialized AI agents and may reduce costs by letting them use smaller, task-optimized models in place of expensive, large general-purpose ones. This could accelerate enterprise adoption of AI agents for complex workflows and strengthen AWS's competitive position in the fast-growing AI infrastructure market.


Note

This summary is generated using AI analysis of the original press release. Always refer to the original source for complete details.