MLOps, from value to adoption at @decathlon_digital

Introduction and Overview

In this section, Hugo Hamad and Corentin Vasseur introduce themselves and give an overview of Decathlon, a leading sports company focused on product creation, distribution, and digital innovation.

Introductions

  • Hugo Hamad has been the Live Factory Manager at Decathlon for the past three months.
  • Corentin Vasseur has been working at Decathlon for almost 8 years as a machine learning engineer.

About Decathlon

  • Decathlon is one of the favorite sports companies in France.
  • It is a leader in the sports market, with activities in both product creation and distribution.
  • It has over 1,750 stores worldwide and 105,000 employees.
  • Decathlon Digital is its digital division, focused on e-commerce, data, cybersecurity, product design, and innovation.

The Role of Live Factory

  • Live Factory is an internal center of expertise within Decathlon focused on innovative data solutions.
  • They collaborate with various data ecosystems within Decathlon to drive digital innovation.
  • Their teams work on areas such as data value engineering and innovation.

Importance of MLOps

In this section, the speakers discuss the importance of MLOps (Machine Learning Operations) in managing the lifecycle of ML models. They emphasize the need to shift from a focus solely on algorithmic approaches to considering the entire project lifecycle.

Understanding MLOps

  • MLOps refers to a culture, mindset, and practices for managing the lifecycle of ML models.
  • It involves not only developing high-performing ML models but also considering other aspects such as needs analysis, packaging, deployment infrastructure, architecture, etc.

Shifting Mindset towards Industrial Solutions

  • In the past, there was often a focus on developing ML models without considering the broader project context.
  • The shift towards MLOps involves considering the entire project lifecycle, including needs analysis, algorithmic approaches, packaging, deployment, and production.
  • This mindset change is crucial to develop industrial solutions rather than just focusing on innovation for innovation's sake.

Live Factory and Innovation

In this section, the speakers discuss how Live Factory at Decathlon fosters innovation by collaborating with different teams and creating cross-functional projects. They highlight the importance of addressing specific business needs while driving innovation.

Role of Live Factory in Innovation

  • Live Factory collaborates with various data ecosystems within Decathlon.
  • They work on innovative and industrial solutions for data-related challenges.
  • Their teams include Data Value Engineering and Innovation teams.

Cross-functional Collaboration

  • Live Factory collaborates with different teams within Decathlon to deliver value.
  • They have vertical-specific fabriques (factories) focused on areas such as pricing, personalization, and offers.
  • They can also create cross-functional teams to address specific use cases that require expertise from multiple domains.

MLOps Culture and Practices

In this section, the speakers delve deeper into MLOps culture and practices. They emphasize the importance of considering all components of a project beyond ML code when working on data projects.

Beyond ML Code

  • ML code is just one component of a broader set of modules in a data project.
  • Other components include preprocessing, post-processing, data cleaning, infrastructure setup, governance, monitoring, etc.
  • Working on all these components from the start helps ensure a holistic approach to project development.

Importance of Industrial Solutions

  • Developing ML models is important but not sufficient for successful data projects.
  • Taking an industrial approach involves considering all aspects of a project's lifecycle from inception to production deployment.
  • This includes packaging models effectively and ensuring they align with business needs.

Conclusion and Debt Management

In this section, the speakers conclude by emphasizing the importance of considering all aspects of a data project and managing technical debt. They highlight the need to work on the entire project from the start rather than focusing solely on ML code.

Managing Technical Debt

  • Working on data projects involves managing technical debt, similar to other software development projects.
  • It is crucial to consider infrastructure, governance, monitoring, and other components beyond ML code.
  • Addressing technical debt from the beginning helps ensure a more robust and scalable solution.

Importance of Holistic Approach

  • A holistic approach to data projects involves considering all components from needs analysis to production deployment.
  • This includes understanding business requirements, developing algorithmic approaches, packaging solutions effectively, and deploying them in production.

Principles of MLOps

In this section, the speaker discusses the principles of MLOps and emphasizes the importance of following these principles for achieving maturity in ML projects.

Key Points:

  • MLOps is not just about developing ML models; it is a maturity trajectory for the business.
  • Teams should measure their maturity level and identify steps to move towards higher maturity.
  • Monitoring is crucial for putting AI models into production.
  • Development of ML projects should be treated like software development, with specific considerations for data and models.
  • Industrial thinking should be applied from the beginning, focusing on automation, testing, and documentation.

Three Pillars of Decathlon's MLOps Approach

The speaker introduces Decathlon's approach to adopting MLOps culture and highlights the three pillars: process, people, and platform.

Key Points:

  • Decathlon's MLOps approach is based on three pillars: process, people, and platform.
  • Process refers to having a well-defined lifecycle for ML projects that includes key stages such as problem definition, data collection and preparation, model training, error analysis, deployment, and monitoring.
  • People are essential in driving the adoption of MLOps culture within an organization.
  • Platform refers to the tools and infrastructure needed to support MLOps practices.

Process as a Key Element in MLOps

The speaker emphasizes the importance of having a well-defined process in MLOps and shares insights on how Decathlon has structured its process based on key stages in the ML project lifecycle.

Key Points:

  • A well-defined process is crucial for developing ML systems at scale.
  • Decathlon has identified key stages in the ML project lifecycle: problem definition, data collection and preparation, model training, error analysis, deployment, and monitoring.
  • Having a shared understanding of these stages helps in planning projects and minimizing surprises.
  • Each stage involves specific tasks and critical milestones that need to be addressed.
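The stage list above can be sketched as a small data structure. This is only an illustration: the stage names come from the talk summary, while the `Stage` enum and the `next_stage` helper are hypothetical names introduced here, not Decathlon's tooling.

```python
from enum import Enum
from typing import Optional, Set


class Stage(Enum):
    """Lifecycle stages as listed in the summary, in order."""
    PROBLEM_DEFINITION = 1
    DATA_COLLECTION_AND_PREPARATION = 2
    MODEL_TRAINING = 3
    ERROR_ANALYSIS = 4
    DEPLOYMENT = 5
    MONITORING = 6


def next_stage(completed: Set[Stage]) -> Optional[Stage]:
    """Return the earliest stage whose milestone is not yet validated."""
    # Enum members iterate in definition order, so the first gap wins.
    for stage in Stage:
        if stage not in completed:
            return stage
    return None  # every stage has been completed
```

Keeping the stages in one shared definition is one way to give every team the same vocabulary when planning a project.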

Importance of Monitoring in MLOps

The speaker highlights the significance of monitoring in MLOps and how it enables understanding the functioning of ML systems with real data.

Key Points:

  • Monitoring is a crucial step after delivering a solution and ensuring its proper functioning.
  • It provides insight into how the system performs on real data, which offline ("cold") testing alone cannot show.
  • Monitoring helps identify issues, track performance, and make necessary improvements to the ML solution.

Conclusion: Decathlon's MLOps Journey

The speaker concludes by sharing key takeaways from Decathlon's journey in adopting MLOps culture and emphasizes the importance of process, people, and platform in achieving successful ML projects.

Key Points:

  • Decathlon's MLOps journey is based on three pillars: process, people, and platform.
  • Process plays a vital role in structuring ML projects and minimizing surprises along the way.
  • People are essential for driving cultural change within an organization towards embracing MLOps practices.
  • Platform refers to the tools and infrastructure needed to support efficient development, deployment, and monitoring of ML solutions.

Understanding the Project Lifecycle

In this section, the speaker discusses the project lifecycle of machine learning projects and the challenges that can arise.

Importance of Defining the Problem

  • Defining the problem is a crucial step in the success of any ML project.
  • Developing a great solution for a wrong problem does not create value.
  • It is important to clearly define the problem to be addressed and understand its impact on Decathlon's business.

Identifying Important Factors

  • Understanding Decathlon's priorities and business strategy helps identify what is important for them.
  • Collaborating with domain experts who have knowledge about processes and tools used by users is essential.
  • Data design thinking workshops help uncover real business problems and stimulate value creation.

Stimulation of Value Creation

  • Once the project is identified, it is important to determine the value it can generate.
  • Functional constraints and success criteria should be well-defined to ensure acceptance by stakeholders.
  • Feasibility from an ML perspective needs to be considered, including methodology selection and deployment constraints.

Building Strong Foundations

  • Access to high-quality data through data governance teams accelerates ML projects.
  • Empowering data scientists with autonomous access to computational resources helps in resource allocation based on project requirements.
  • Tracking costs at both project and tool levels allows for optimization while ensuring accountability.
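As a rough illustration of tracking costs at both project and tool level, the sketch below aggregates cost records along the two axes. The record layout `(project, tool, cost)` and the function name `summarize_costs` are assumptions made for this example; the talk does not describe the actual cost-tracking setup.

```python
from collections import defaultdict


def summarize_costs(records):
    """Aggregate (project, tool, cost) records by project and by tool."""
    by_project = defaultdict(float)
    by_tool = defaultdict(float)
    for project, tool, cost in records:
        by_project[project] += cost
        by_tool[tool] += cost
    # Return plain dicts so callers do not accidentally create new keys.
    return dict(by_project), dict(by_tool)
```

The per-project view supports accountability, while the per-tool view shows where optimization effort would pay off.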

Deployment and Monitoring

  • The deployment phase involves designing solutions, considering business problems, ML methodologies, and technical constraints.
  • Monitoring models during training, tracking performance metrics, and optimizing resources are crucial for innovation while meeting constraints.

The transcript was provided in French.

Overview of the Process

This section discusses the process followed in organizations to facilitate communication between technical and business teams, focusing on developing clear standards and automating processes.

Communication Between Teams

  • The central technical data platform team is responsible for providing quality data and services to the business teams.
  • Two teams are involved: the platform team aims for a secure and stable platform, while the business team focuses on flexibility and delivering value to the business.

Defining Clear Standards

  • The first step is to define clear standards for product development that allow easy communication between the two teams.
  • Existing best practices are applied, such as continuous integration and deployment, to ensure efficient development.

Automating Processes

  • Once clear standards are established, the next step is to automate the process of transferring work from one team to another while adhering to these standards.
  • Tools like continuous integration and deployment are utilized, along with automation of manual tasks related to data processing.

Monitoring in Production

This section focuses on monitoring models in production, including defining contracts with stakeholders, creating dashboards for performance monitoring, evaluating risks, and maintaining documentation.

Defining Contracts with Stakeholders

  • It is crucial to establish a contract with stakeholders regarding responsibilities for data quality, model robustness, deployment in production, and proper monitoring.
  • Performance monitoring dashboards can be created using historical data to ensure consistency in predictions.
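One part of such a contract, data quality, can be checked automatically. The sketch below is a minimal example under stated assumptions: rows are dictionaries, the contract is a list of required fields plus a maximum missing-value rate, and `check_data_contract` is a hypothetical helper, not something shown in the talk.

```python
def check_data_contract(rows, required_fields, max_missing_rate=0.05):
    """Return a list of violations; an empty list means the batch passes."""
    if not rows:
        return ["batch is empty"]
    violations = []
    for field_name in required_fields:
        # A field counts as missing when it is absent or explicitly None.
        missing = sum(1 for row in rows if row.get(field_name) is None)
        rate = missing / len(rows)
        if rate > max_missing_rate:
            violations.append(
                f"{field_name}: {rate:.0%} missing (limit {max_missing_rate:.0%})"
            )
    return violations
```

Returning readable messages rather than a boolean makes it easier to route a failure to whoever owns that part of the contract.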

Evaluating Risks

  • Assessing model robustness and performance helps determine appropriate actions if predictions deviate from expectations.
  • Continual training or fallback solutions may be implemented based on the risk evaluation.
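A minimal sketch of how a deviation from expectations could trigger a fallback: compare the mean of recent predictions with the historical mean, measured in historical standard deviations. The threshold `max_z`, the function name, and the choice of a simple z-score are all assumptions for illustration; the talk does not specify which test is used.

```python
from statistics import mean, stdev


def check_predictions(historical, recent, max_z=3.0):
    """Return 'ok' or 'fallback' based on drift of recent predictions.

    `historical` needs at least two values so a spread can be estimated.
    """
    baseline = mean(historical)
    spread = stdev(historical)
    if spread == 0:
        # No historical variation: any shift at all counts as a deviation.
        return "ok" if mean(recent) == baseline else "fallback"
    z = abs(mean(recent) - baseline) / spread
    return "ok" if z <= max_z else "fallback"
```

In practice the "fallback" branch might serve a simpler rule-based prediction while the model is investigated or retrained.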

Documentation Importance

  • Documentation plays a vital role in facilitating understanding and maintenance of models by different individuals involved in production.
  • It helps explain weaknesses or issues encountered and enables effective communication with the right person for problem-solving.

Importance of People in the Process

This section emphasizes the importance of people in the process, highlighting the need for collaboration among different profiles and the transition from a single-person responsibility to feature teams.

Collaboration Among Profiles

  • Relying on full-stack "unicorns" who handle every aspect of a project is not efficient.
  • Feature teams consisting of various profiles, including business analysts, data engineers, and scientists, work together to ensure successful project execution.

Transition to Feature Teams

  • Decathlon transitioned from relying on a single person for end-to-end project responsibility to feature teams.
  • These teams collaborate on different stages of the project, such as business requirements gathering, data collection, model development, deployment, and monitoring.

Importance of Collaboration and Coordination

This section emphasizes the significance of collaboration and coordination among different profiles in a project. It highlights the importance of teamwork, agility, and mutual support to achieve successful implementation.

Collaboration and Coordination

  • Effective coordination and collaboration between different profiles are crucial for project success.
  • Agility is necessary to respond to unexpected challenges.
  • Mutual support among team members is essential for achieving goals.

Understanding Business Objectives

This section discusses the importance of understanding business objectives when implementing a project. It emphasizes the need for all stakeholders to have a clear understanding of what they aim to achieve.

Understanding Business Objectives

  • All project stakeholders should have a comprehensive understanding of the overall business objectives.
  • Technical coordination and collaboration are required to align with these objectives.
  • Consideration should be given to technical feasibility and alignment with business goals.

Platform as an Essential Pillar

This section focuses on the technical aspect of implementing a platform. It highlights the importance of choosing appropriate technical implementations based on existing solutions, budget, team experience, and specific business needs.

Platform Implementation

  • The choice of technical implementation depends on existing solutions within the organization.
  • Various tools and options are available for platform implementation.
  • The selection process considers factors such as production environment, budget, team expertise, and specific business requirements.

Types of Tools for Implementation

This section categorizes tools into three main types based on flexibility, control, and operational complexity. It provides an overview of each type's characteristics.

Types of Tools

  1. Managed Services:
  • Ready-to-use services provided by cloud providers.
  • Offers convenience but limited control.
  2. Open Source Solutions:
  • Solutions that require implementation and operation within the organization.
  • Provides flexibility but requires more effort.
  3. Hybrid Solutions:
  • Combination of managed services and open source tools.
  • Offers adaptability based on specific needs.

Choosing the Right Tools

This section emphasizes the importance of selecting appropriate tools for specific purposes and avoiding using tools for unintended purposes. It provides examples and compares tool selection to choosing the right golf club for different situations.

Tool Selection

  • Avoid using tools for purposes they were not designed for.
  • Consider the specific requirements of each situation when selecting tools.
  • Examples include data exploration, code versioning, documentation, orchestration, model versioning, etc.

Three Pillars of ML Projects

This section introduces three main pillars of machine learning projects: collaborative project management, development feasibility assessment, and deployment in production.

Three Pillars

  1. Collaborative Project Management:
  • Facilitates communication and interaction among team members with different profiles.
  2. Development Feasibility Assessment:
  • Evaluates the technical feasibility and scientific validation of proposed ideas or concepts.
  3. Deployment in Production:
  • Involves putting machine learning models into production use.

Available Tools and Services

This section provides an overview of various tools and services available for data exploration, code versioning, documentation, orchestration, model versioning, etc., which can be used to enhance machine learning projects' industrialization process.

Available Tools and Services

  • Data exploration: notebooks, Databricks
  • Code versioning: GitHub, Poetry
  • Technical documentation: Sphinx
  • Orchestration: Airflow
  • Model versioning: MLflow

The Importance of Maintaining Simplicity in ML Projects

In this section, the speaker discusses the importance of maintaining simplicity in machine learning (ML) projects and avoiding complex solutions that can become difficult to maintain.

Simplifying ML Projects

  • It is not advisable to apply practices that lead to complex and convoluted solutions in ML projects.
  • While these solutions may initially accelerate development, they require significant effort and resources to maintain.
  • Complex solutions often result in a tangled web of dependencies and can be challenging to maintain over time.
  • It is crucial to focus on the core activities of the business and avoid unnecessary complexities.
  • If a tool or solution already exists in the open-source domain, it is recommended to leverage it instead of reinventing the wheel.
  • Consider the costs associated with maintaining and evolving the technical implementation of a solution, which are often overlooked during budgeting phases.

Building Standards and Best Practices for ML Projects

This section emphasizes the importance of building standards and best practices for ML projects to enhance team skills, facilitate knowledge transfer, and ensure efficient development cycles.

Establishing Standards

  • Building standards is essential for enhancing team competencies and facilitating knowledge transfer within an organization.
  • Utilizing consistent best practices across teams helps streamline processes when transferring solutions from one team to another.
  • Development in ML projects differs from traditional software development as it involves experimentation. Flexibility is crucial for data scientists to explore different models or test new features while tracking their results effectively.

Leveraging Experimentation Tracking and Pipelines

This section highlights the significance of leveraging experimentation tracking tools and creating pipelines in ML projects for better model management, reproducibility, and adaptability.

Experimentation Tracking

  • Experimentation tracking tools like Databricks' experiment manager enable data scientists to track model performance and store necessary information for production deployment.
  • Experimentation tracking helps identify models with good performance and provides the required information for seamless production deployment.
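To make the idea concrete, here is a deliberately tiny in-memory tracker. It is not the Databricks or MLflow API: `Run`, `ExperimentTracker`, `log_run`, and `best_run` are hypothetical names that only illustrate what such tools record (parameters and metrics per run) and how a well-performing run can be picked for deployment.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """One experiment: the parameters tried and the metrics observed."""
    name: str
    params: dict
    metrics: dict = field(default_factory=dict)


class ExperimentTracker:
    """Keeps every run so results stay comparable and reproducible."""

    def __init__(self):
        self.runs = []

    def log_run(self, name, params, metrics):
        # Copy the dicts so later mutation by the caller cannot alter history.
        run = Run(name, dict(params), dict(metrics))
        self.runs.append(run)
        return run

    def best_run(self, metric, higher_is_better=True):
        candidates = [r for r in self.runs if metric in r.metrics]
        if not candidates:
            raise ValueError(f"no run recorded the metric {metric!r}")
        chooser = max if higher_is_better else min
        return chooser(candidates, key=lambda r: r.metrics[metric])
```

A real tracking tool adds persistence, artifacts, and a UI on top of this, but the core record is the same: what was tried and how well it worked.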

Creating Pipelines

  • Creating pipelines, such as data validation, feature engineering, training, inference, and hyperparameter tuning pipelines, is crucial in ML projects.
  • Platforms like Amazon SageMaker or Databricks can be used to version ML pipelines.
  • Versioning workflows and models ensures reliability, traceability, and easy rollback to previous versions if needed.
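The sketch below shows the two ideas in their simplest form: a pipeline as an ordered list of named steps, and a version identifier derived from the step names and parameters. Both helpers (`run_pipeline`, `pipeline_version`) and the hashing scheme are assumptions for illustration; managed platforms provide their own versioning mechanisms.

```python
import hashlib
import json


def run_pipeline(steps, data):
    """Apply each (name, function) step in order and return the result."""
    for _name, func in steps:
        data = func(data)
    return data


def pipeline_version(step_names, params):
    """Derive a short, deterministic version id from steps and parameters."""
    # sort_keys makes the hash independent of dict insertion order.
    payload = json.dumps({"steps": step_names, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
```

Because the identifier changes whenever a step or parameter changes, it can be stored next to a trained model to trace which pipeline produced it and to roll back if needed.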

Importance of Monitoring and Adaptability in ML Projects

This section emphasizes the importance of monitoring ML models' weaknesses and being able to adapt quickly to changes in real-world scenarios.

Monitoring Models

  • Monitoring models is essential for understanding their weaknesses and adapting them to changing contexts.
  • Similar to a kitesurfer adjusting their equipment based on wind intensity, data scientists should be capable of changing models or equipment in production based on real-world observations.

Iterative Approach for MLOps Adoption

This section discusses the iterative approach for adopting MLOps practices rather than aiming for perfection from the start. It highlights the importance of delivering business value while continuously improving ML capabilities.

Iterative Approach

  • The goal is not to achieve perfection from the beginning but rather respond quickly to business needs by delivering value.
  • Adopt an iterative approach that focuses on solving specific business problems while continuously improving ML practices.
  • Maturity matrices exist to measure how far a team has come in its MLOps adoption.

Defining Principles and Components for Successful MLOps Implementation

This section emphasizes the significance of defining principles and components when implementing successful MLOps practices.

Defining Principles

  • Defining principles is crucial for establishing good practices and components within an organization.
  • The ML design document presented by Alfonso Carta highlights the importance of defining principles and components for successful MLOps implementation.

Principles and Best Practices

The team discussed the principles and best practices for improving MLOps maturity. They identified the components needed, assessed their availability, and determined areas for improvement.

Defining Components

  • Identified necessary components for MLOps.
  • Assessed availability of existing components.
  • Identified areas that require further development to improve MLOps maturity.

Component Lifecycle

The team discussed the lifecycle of components in MLOps and how to develop, deploy, and run solutions effectively. Roles and responsibilities were also emphasized.

Developing Solutions

  • Emphasized the importance of thinking through a solution from the very start of its development.
  • Discussed phases such as design, deployment, and running the solution.
  • Highlighted the significance of defining roles and responsibilities.

Architecture Examples

The team presented examples of architecture products developed within the community to inspire and assist other teams in their own projects.

Sharing Architecture Examples

  • Shared examples of architecture products developed by the community.
  • Aimed to inspire and help other teams adapt or replicate these architectures based on their specific contexts.

Continuous Improvement

The team discussed future plans for continuous improvement in MLOps at Decathlon. They emphasized iterative approaches, standardization through tools, documentation, automation, and proper management of ML solutions.

Iterative Approach

  • Emphasized continuing an iterative approach to improve MLOps maturity.
  • Acknowledged that different projects may have varying levels of maturity based on their criticality.
  • Aimed to standardize processes through tools, versioning, documentation, automation, etc., while considering unique requirements for ML solutions.

Managing ML Solutions

  • Discussed the need to manage ML solutions with similar standards as SRE (Site Reliability Engineering).
  • Aimed to structure and manage ML solutions effectively within the organization.

Machine Learning Observability

The team discussed the importance of observability in machine learning and how it differs from traditional monitoring. They highlighted the need for tools that provide analytics and insights to understand and improve ML solutions.

Observability vs Monitoring

  • Differentiated observability from traditional monitoring.
  • Highlighted the importance of understanding why issues occur, not just detecting them.
  • Discussed the availability of tools for overall solution observability, enabling analysis and continuous improvement.

Trustworthiness and Security

The team emphasized the increasing focus on trustworthiness and security in MLOps. While this is a relatively new concept at Decathlon, it is essential for ensuring reliable ML solutions.

Trustworthiness and Security

  • Highlighted the importance of trustworthiness and security in MLOps.
  • Acknowledged that this is a new area for Decathlon but emphasized its significance.
  • Aimed to incorporate trustworthiness practices into overall MLOps processes.

Continuous Improvement Metrics

The team discussed metrics for measuring continuous improvement in MLOps. They aimed to establish scores or benchmarks for each solution, allowing teams to track progress and identify areas for improvement.

Measuring Progress

  • Aimed to establish a scoring system or metrics to measure progress in MLOps maturity.
  • Intended to track levels of maturity achieved by each solution.
  • Planned to use these scores as a guide for further improvements.
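A scoring system of this kind could look like the sketch below. The practices, the 0 to 4 scale, and the function `maturity_score` are hypothetical; the talk only says that scores per solution are planned, not how they are computed.

```python
def maturity_score(levels, max_level=4):
    """Return (score in percent, weakest practices) for one ML solution.

    `levels` maps a practice name to a self-assessed level in 0..max_level.
    """
    if not levels:
        raise ValueError("at least one practice is required")
    for practice, level in levels.items():
        if not 0 <= level <= max_level:
            raise ValueError(f"{practice}: level must be between 0 and {max_level}")
    score = 100.0 * sum(levels.values()) / (max_level * len(levels))
    lowest = min(levels.values())
    # The weakest practices are the natural next targets for improvement.
    weakest = sorted(p for p, lvl in levels.items() if lvl == lowest)
    return score, weakest
```

Reporting the weakest practices alongside the score turns the number into a guide for the next iteration rather than a grade.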

MLOps Culture and Mindset

The team emphasized the importance of cultivating a culture and mindset that aligns with MLOps principles. They discussed the need for cross-functional collaboration, adopting the right tools, measuring maturity, and following best practices.

Cultivating Culture and Mindset

  • Highlighted the importance of cultivating a culture and mindset aligned with MLOps principles.
  • Emphasized cross-functional collaboration between data scientists, data engineers, ML engineers, etc.
  • Stressed the need for appropriate tools to support MLOps practices.

Maturity Assessment and Planning

The team discussed the significance of assessing maturity levels in MLOps and creating a roadmap for improvement. They emphasized the importance of selecting suitable tools, following best practices, and setting goals for achieving higher maturity levels.

Assessing Maturity

  • Emphasized the importance of assessing current maturity levels in MLOps.
  • Encouraged teams to identify requirements for reaching higher maturity levels.
  • Discussed the need to create a roadmap or plan for improving maturity.

Conclusion

The team concluded by highlighting that MLOps is not just about technology but also about culture and mindset. They stressed the importance of nurturing a collaborative environment, choosing appropriate tools, measuring progress through metrics, and following best practices to achieve successful implementation of MLOps.

Video description

Want 15 minutes with a Hymaia MLOps expert? https://meetings-eu1.hubspot.com/meetings/francois-laurain
Find our definition of MLOps: https://www.hymaia.com/qu-est-ce-que/mlops

Hugo Hamad joined @decathlon_digital as AI Director, with the ambition of helping the group scale up its use and management of Machine Learning in production. Hugo will present his current challenges and his goal of placing MLOps at the heart of the best practices and culture of Decathlon's data teams. He will be joined by Alfonso Carta, Data Science Manager & AI Lab Leader, and Corentin Vasseur, ML Engineer (XAI & MLOps).

00:00 Introduction
04:23 Why MLOps?
10:17 How Decathlon adopted MLOps
15:20 MLOps lessons learned
17:14 Data Design Thinking
21:16 Deployment and monitoring - sound not working :(
22:53 Process automation - sound OK :)
23:40 Monitoring with MLOps
29:36 Technical implementation of the data platform
33:14 Overview of data tools
37:00 Managing experimentation phases in MLOps
39:35 Start small! Iterate quickly!
42:25 Conclusion