Key MLOps processes (part 3): Automated machine learning workflow

In this article, we describe block D, devoted to the automated machine learning workflow.

Please find the complete scheme describing the key MLOps processes here. The main parts of the scheme are horizontal blocks, inside which the procedural aspects of MLOps are described (they are labeled A, B, C, D). Each of them is designed to solve specific tasks within the framework of ensuring the uninterrupted operation of the company's ML services.

[Diagram: MLOps automated machine learning workflow]

Before we begin, we would like to mention that in this article, by ML system, we mean an information system that contains one or more components with a trained model performing some part of the overall business logic. The better the ML model is developed, the greater the impact of its operation. The trained model processes an incoming stream of requests and provides predictions in response, automating some parts of the analysis or decision-making process.

The process of using the model to generate predictions is called inference, and the process of training the model is called training. A clear explanation of the difference between them can be borrowed from Gartner – here we’ll use cats as an example.

[Figure: IoT data input to ML models]

For the effective operation of a production ML system, it is important to monitor the inference metrics of the model. As soon as they begin to decline, the model needs to be retrained or replaced with a new one. This often happens due to changes in the input data (data drift). For example, suppose the model's business task is to recognize bread in photos, but it is given a photo of a corgi instead. The dogs in the example are for balance:

[Figure: dog or bread?]

The model knew nothing about corgis from its initial dataset, so it predicts incorrectly. Therefore, the dataset needs to be changed and new experiments conducted. At the same time, the new model should be put into production as soon as possible: users cannot be prevented from uploading images of corgis, and until the model is updated they will keep getting erroneous results.
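The monitoring that triggers this retraining can be sketched in a few lines. This is a minimal illustration, not any specific tooling: the `InferenceMonitor` class, the baseline value, and the tolerance threshold are all hypothetical names and numbers.

```python
from collections import deque


class InferenceMonitor:
    """Tracks a rolling window of an inference quality metric and flags
    degradation relative to the level observed at deployment time."""

    def __init__(self, baseline: float, tolerance: float = 0.05, window: int = 100):
        self.baseline = baseline           # metric level at deployment time
        self.tolerance = tolerance         # allowed relative drop before alerting
        self.values = deque(maxlen=window) # most recent metric observations

    def record(self, value: float) -> None:
        """Feed in a metric value as it becomes available (e.g. from delayed labels)."""
        self.values.append(value)

    def needs_retraining(self) -> bool:
        """True once the windowed average falls below baseline * (1 - tolerance)."""
        if len(self.values) < self.values.maxlen:
            return False  # not enough evidence yet
        current = sum(self.values) / len(self.values)
        return current < self.baseline * (1 - self.tolerance)
```

In a real system the `record` calls would be wired to whatever delayed ground truth the business provides (purchases, clicks, manual labels), and a `needs_retraining() == True` would raise the event that the orchestrator listens for.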

Now, let’s look at more real-life examples: let’s consider the recommendation system of a marketplace. Based on a user’s purchase history, purchases of similar users, and other parameters, a model or ensemble of models generates a block with recommendations. It includes products whose purchase revenue is regularly calculated and tracked.

Something happens, and the needs of buyers change. Therefore, the recommendations for them become outdated, and demand for the recommended products decreases. All this leads to a decrease in revenue.

Next come the cries of managers and demands to restore everything by tomorrow, which lead nowhere. Why? There is not enough data about buyers' new preferences, so you can't even create a new model yet. Of course, you can take some basic recommendation algorithm (item-based collaborative filtering) and push it to production. This way, recommendations will work somehow, but it's only a temporary "band-aid."

Ideally, the process should be set up so that, based on metrics and without the guidance of managers, the process of retraining or experimenting with different models is launched. And the best one eventually replaces the current one in production. In the diagram, this is the Automated ML Workflow Pipeline (block D), which is triggered in some orchestration tool.
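A toy stand-in for such an orchestrator trigger might look like the sketch below. The `Orchestrator` class and its `tick` interface are invented for illustration; real tools such as Airflow or Kubeflow Pipelines provide schedules and event sensors for this instead.

```python
class Orchestrator:
    """Toy workflow orchestrator: launches the retraining pipeline either on
    a fixed schedule or when an external event (e.g. a drift alert) arrives."""

    def __init__(self, pipeline, interval_s: float):
        self.pipeline = pipeline       # callable implementing block D
        self.interval_s = interval_s   # scheduled period between runs
        self.last_run = float("-inf")  # never launched yet

    def tick(self, now: float, event: bool = False) -> bool:
        """Called periodically; returns True if the pipeline was launched."""
        if event or now - self.last_run >= self.interval_s:
            self.last_run = now
            self.pipeline()
            return True
        return False
```

The point of the sketch is the dual trigger: the pipeline fires on schedule even with no alerts, and a metric-driven event forces an immediate run without waiting for managers to notice the revenue drop.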


This is probably the most heavily loaded section of the scheme. The operation of Block D involves several key external components:

  • the workflow orchestrator launches the pipeline on a specified schedule or event;
  • the feature store provides the feature data required by the model;
  • the model registry and ML metadata store receive the models and their metrics after the launched pipeline completes.

The structure of the block essentially combines the stages of experimentation (C) and feature development (B2). This is not surprising, given that these processes need to be automated. The main differences are in the last two stages:

  • export model
  • push to the model registry

The other stages are identical to those described above.
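Under heavily simplified, mocked interfaces (the feature store and the model registry are plain dicts, and the "model" is just a feature mean; every name here is hypothetical), one run of the block-D pipeline can be sketched end to end:

```python
import pickle
import tempfile
from pathlib import Path


def retraining_pipeline(feature_store: dict, registry: dict, version: str) -> float:
    """One run of the automated workflow: fetch features, train, evaluate,
    export the model artifact, and push it to the model registry."""
    # 1. The feature store provides the data required for the model
    features = feature_store["training_set"]
    # 2. Train: a toy "model" that memorises the feature mean
    model = {"mean": sum(features) / len(features)}
    # 3. Evaluate: a toy metric, mean absolute error against the training set
    metric = sum(abs(x - model["mean"]) for x in features) / len(features)
    # 4. Export the model as a serialized artifact
    artifact = Path(tempfile.mkdtemp()) / f"model-{version}.pkl"
    artifact.write_bytes(pickle.dumps(model))
    # 5. Push to the model registry together with its metrics
    registry[version] = {"artifact": str(artifact), "metric": metric}
    return metric
```

In production each numbered step is its own pipeline stage with its own logs and retries, and the registry entry is what downstream deployment tooling picks up.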

It is worth noting separately the service artifacts required by the orchestrator to launch the model retraining pipelines. In essence, this is code that is stored in a repository and runs on dedicated servers. It is versioned and maintained according to all the rules of software development. It is this code that implements the model retraining pipeline, and the result depends on its correctness.

It is also worth noting that fully automating experimentation in the general case is impossible. Of course, you can add AutoML to the process, but to date, there is no recognized solution that could be used with the same results for any experiment topic.

In general, AutoML works as follows:

  1. It somehow forms a set of combinations of model operating parameters.
  2. It launches an experiment for each resulting combination.
  3. It records the metrics for each experiment, based on which the best model is selected.

In essence, AutoML performs all the manipulations that a hypothetical Junior/Middle Data Scientist would perform across a range of more or less standard tasks.
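The three steps above can be sketched as a plain grid search; `auto_ml` and its `train_eval` callback are hypothetical names for illustration, and exhaustive enumeration is only the simplest of the search strategies real AutoML tools use.

```python
import itertools


def auto_ml(train_eval, param_grid: dict) -> tuple:
    """Minimal AutoML loop: enumerate parameter combinations, run an
    experiment for each one, and keep the best-scoring model's parameters.

    `train_eval` maps a params dict to a metric (higher is better)."""
    best_params, best_metric = None, float("-inf")
    keys = sorted(param_grid)
    for values in itertools.product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))  # step 1: one parameter combination
        metric = train_eval(params)       # step 2: launch the experiment
        if metric > best_metric:          # step 3: record metrics, keep the best
            best_params, best_metric = params, metric
    return best_params, best_metric
```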

That covers automation in brief. Next, it is necessary to organize the delivery of the new model version into production.

💡 You might also be interested in our article ‘Key MLOps processes (part 2): Feature engineering, or the development of features’ → https://hystax.com/key-mlops-processes-part-2-feature-engineering-or-the-development-of-features.

✔️ OptScale, a FinOps & MLOps open source platform that helps companies optimize cloud costs and bring more transparency to cloud usage, is fully available under Apache 2.0 on GitHub → https://github.com/hystax/optscale.
