With OptScale, ML/AI and data engineering teams get a tool for tracking and profiling ML/AI model training and other related tasks. OptScale collects a holistic set of internal and external performance metrics, along with model-specific metrics, and uses them to give performance and cost optimization recommendations for ML/AI experiments and production tasks. OptScale's integration with Apache Spark makes profiling Spark ML/AI tasks more efficient and transparent.

By integrating with the ML/AI model training process, OptScale highlights bottlenecks and provides clear recommendations for optimizing ML/AI performance. The recommendations cover using Reserved/Spot instances and Savings Plans, rightsizing and instance family migration, idle Spark executors, and detecting CPU, I/O, and IOPS inconsistencies that can be caused by data transformations or inefficient model code.

OptScale enables ML/AI engineers to run a set of training jobs within a predefined budget, using different hyperparameters and hardware (including Reserved/Spot instances), to find the best and most cost-efficient results for your ML/AI model training.
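To make the idea of a budget-bounded set of training runs more concrete, here is a minimal Python sketch. It is not OptScale's API or its actual selection logic: the function `plan_runs`, the instance names, the hourly prices, and the duration estimates are all hypothetical. It enumerates hyperparameter/hardware combinations, estimates each run's cost, and uses a simple cheapest-first greedy rule to fit as many runs as possible into the budget.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class RunPlan:
    learning_rate: float
    batch_size: int
    instance_type: str
    estimated_cost: float


def plan_runs(learning_rates, batch_sizes, hourly_prices, est_hours, budget):
    """Hypothetical planner: pick hyperparameter/hardware combinations that fit a budget.

    hourly_prices: instance type -> price per hour (illustrative numbers only)
    est_hours: callable (learning_rate, batch_size, instance_type) -> estimated run hours
    """
    candidates = []
    for lr, bs, inst in product(learning_rates, batch_sizes, hourly_prices):
        cost = hourly_prices[inst] * est_hours(lr, bs, inst)
        candidates.append(RunPlan(lr, bs, inst, round(cost, 2)))

    # Greedy choice: cheapest runs first, so as many combinations as possible fit.
    candidates.sort(key=lambda r: r.estimated_cost)
    selected, spent = [], 0.0
    for run in candidates:
        if spent + run.estimated_cost <= budget:
            selected.append(run)
            spent += run.estimated_cost
    return selected, round(spent, 2)


# Example with made-up prices: a Spot GPU is assumed to cost a third of On-Demand,
# and batch size 32 is assumed to take twice as long as batch size 64.
selected, spent = plan_runs(
    learning_rates=[0.01, 0.001],
    batch_sizes=[32, 64],
    hourly_prices={"spot-gpu": 1.0, "on-demand-gpu": 3.0},
    est_hours=lambda lr, bs, inst: 2.0 if bs == 32 else 1.0,
    budget=10.0,
)
for run in selected:
    print(run)
print("total estimated cost:", spent)
```

With a budget of 10, the sketch selects the four Spot runs plus one On-Demand run, for an estimated 9.0 in total. A real setup would replace the fixed duration guesses with measured run times and would likely rank runs by expected model quality as well as cost.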


After profiling ML/AI model training, OptScale gives dozens of practical optimization recommendations and an in-depth cost analysis, which help minimize cloud costs for ML/AI experiments and development. The tool also tracks ML/AI metrics and KPIs, providing full transparency across ML/AI teams.

