Recognized by Forrester as a leading cloud cost management solution
ML/AI model training tracking & profiling, internal/external performance metrics
Granular ML/AI optimization recommendations
Runsets to identify the most efficient ML/AI model training results
Spark integration
OptScale profiles machine learning models and provides a deep analysis of internal and external metrics to identify training issues and bottlenecks. ML/AI model training is a complex process whose outcome depends on the chosen hyperparameter set, the hardware, and cloud resource usage. By profiling this process, OptScale helps teams achieve optimal performance and reach the best outcome of their ML/AI experiments.
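The core idea of pairing internal training metrics (such as loss) with external timing data to surface bottlenecks can be sketched as follows. The `TrainingProfiler` class and the simulated training loop are purely illustrative assumptions, not the OptScale API:

```python
import time

class TrainingProfiler:
    """Collects internal (loss) and external (wall-clock) metrics per epoch."""
    def __init__(self):
        self.records = []

    def log_epoch(self, epoch, loss):
        # Record an internal metric alongside an external timestamp.
        self.records.append({"epoch": epoch, "loss": loss, "timestamp": time.time()})

    def slowest_epoch(self):
        # Flag the epoch that took the longest as a potential bottleneck.
        durations = [
            (self.records[i]["epoch"],
             self.records[i]["timestamp"] - self.records[i - 1]["timestamp"])
            for i in range(1, len(self.records))
        ]
        return max(durations, key=lambda d: d[1])[0] if durations else None

profiler = TrainingProfiler()
loss = 1.0
for epoch in range(5):
    time.sleep(0.05 if epoch == 3 else 0.01)  # epoch 3 simulates an I/O stall
    loss *= 0.8  # simulated convergence
    profiler.log_epoch(epoch, loss)

print(profiler.slowest_epoch())  # -> 3, the stalled epoch
```

A real profiler would additionally sample hardware counters (CPU, IO, IOPS) per interval; the principle of correlating slow intervals with training steps is the same.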
OptScale provides full transparency across the entire ML/AI model training process and across teams, capturing ML/AI metrics and KPIs to help identify complex issues in ML/AI training jobs. To improve performance, OptScale gives users tangible recommendations: utilizing Reserved/Spot instances and Savings Plans, rightsizing and instance family migration, detecting CPU/IO and IOPS inconsistencies that may be caused by data transformations, effective usage of cross-regional traffic, avoiding idle Spark executors, and running comparisons based on segment duration.
OptScale enables ML/AI engineers to run a set of training jobs (a runset) based on a pre-defined budget, different hyperparameters, and hardware (leveraging Reserved/Spot instances) to reveal the best and most efficient outcome for your ML/AI model training.
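A budget-capped hyperparameter sweep of this kind can be sketched as below; the hyperparameter grid, the per-run cost, and the `train` stand-in are illustrative assumptions, not OptScale's runset mechanism:

```python
import itertools

# Hypothetical runset: try hyperparameter combinations until a cost budget is spent.
learning_rates = [0.1, 0.01]
batch_sizes = [32, 64]
cost_per_run = 2.0   # illustrative cost per training job, in dollars
budget = 7.0

def train(lr, batch):
    # Stand-in for a real training job; returns a simulated validation score.
    return 0.9 - abs(lr - 0.01) - 0.001 * (64 - batch) / 32

results = []
spent = 0.0
for lr, batch in itertools.product(learning_rates, batch_sizes):
    if spent + cost_per_run > budget:
        break  # stop launching jobs once the next run would exceed the budget
    spent += cost_per_run
    results.append(((lr, batch), train(lr, batch)))

best_params, best_score = max(results, key=lambda r: r[1])
print(best_params, spent)  # -> (0.01, 32) 6.0
```

In practice the training jobs would run on real (Reserved/Spot) hardware and the budget would track actual cloud spend, but the selection logic is the same: launch runs within the budget, then keep the configuration with the best result.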
OptScale supports Spark to make profiling Spark ML/AI tasks more efficient and transparent. The set of recommendations OptScale delivers after profiling ML/AI models includes avoiding idle Spark executors.
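One standard remediation for idle executors is Spark's dynamic allocation, which releases executors that have been idle beyond a timeout. The configuration keys below are real Spark properties; the script name `train_model.py` and the executor limits are placeholder assumptions:

```shell
spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
  --conf spark.dynamicAllocation.executorIdleTimeout=60s \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=20 \
  train_model.py
```

With these settings, executors idle for more than 60 seconds are returned to the cluster, so a training job only pays for executors while they are doing work.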
A full description of OptScale as a FinOps and Test Environment Management platform that organizes shared IT environment usage and optimizes and forecasts Kubernetes and cloud costs