Introduction to Data Mesh - Zhamak Dehghani

Introduction to Data Mesh

In this section, Zhamak introduces herself and explains why she is excited to talk about Data Mesh. She also highlights the importance of data in every company's strategy and shares some mission statements from different industries.

Importance of Data

  • Data is at the core of every company's strategy.
  • Different companies have great expectations around data.
  • Companies want to use data and AI to change their offerings, streamline their workforce, optimize their business, and improve customer experience.

Challenges with Big Data Management

  • Despite large investments in data and AI, measurable results remain poor.
  • The share of companies that describe themselves as having a data culture or being data-driven is low relative to the amount invested.
  • Companies fail to scale big data platforms in two directions: sourcing data from a diverse set of domains that produce large volumes at a rapid pace, and scaling consumption of that data.

Introduction to Data Mesh

  • Data Mesh arose from failure symptoms seen in big data platforms: companies succeeded in building them but then failed to scale them.
  • We need to challenge assumptions and approach managing and collecting big data differently.

Current State of Big Data

In this section, Zhamak discusses the current state of big data management. She talks about operational vs analytical data, ETL processes, and various technologies used for managing big data.

Operational vs Analytical Data

  • There is a great divide between operational and analytical data for good reason.
  • Operational data runs the business, while analytical data feeds big data platforms.

Technologies for Managing Big Data

  • For decades, we have been working on different approaches to manage data at scale or aggregate data.
  • Data analytics and early versions of data warehousing have supported business intelligence and business reporting since the 1960s.
  • Various technologies, often proprietary, extract data from these sources and model it into multidimensional systems that you can then query and build on.

Introduction

The speaker introduces herself and the topic of Data Mesh. She talks about the importance of data in every company's strategy and shares mission statements from different companies.

Introducing Data Mesh

  • Zhamak introduces herself and her role as the director of Emerging Technologies.
  • She emphasizes that data is at the core of every company's strategy.
  • Zhamak shares mission statements from different companies, highlighting their focus on using data and AI to improve their operations.

Challenges with Big Data Platforms

The speaker discusses the challenges faced by big data platforms, including difficulty scaling the consumption of data and failure to realize value from it.

Competing on Data

  • Zhamak asks if we have a data culture, pointing out that despite large investments in big data platforms, only a small percentage of companies are competing on data.
  • She notes that many organizations try to build big data platforms but fail to scale them or consume the data effectively.
  • Zhamak suggests that we need to think outside the box and approach these challenges differently.

Analytical Data Management

The speaker discusses analytical data management, including technologies used for extracting and querying data.

From ETL to Data Warehousing

  • Zhamak explains how traditional approaches to managing analytical data involved extracting, transforming, and loading (ETL), as well as using technologies like data warehousing for business intelligence.
  • She notes that newer technologies allow for extracting data from various sources and running queries to build analytical models.
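
As a rough illustration of the extract, transform, load pattern described above, the following sketch uses only Python's standard library. The source rows, the field names, and the `daily_revenue` table are hypothetical and invented for this example; none of them come from the talk.

```python
import sqlite3
from collections import defaultdict

# Hypothetical operational records; in practice these would come from a source system.
SOURCE_ROWS = [
    {"order_id": 1, "day": "2020-01-01", "amount_cents": 1250},
    {"order_id": 2, "day": "2020-01-01", "amount_cents": 750},
    {"order_id": 3, "day": "2020-01-02", "amount_cents": 2000},
]


def extract():
    """Extract: read raw rows from the (hypothetical) operational source."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: aggregate order amounts into revenue per day."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["day"]] += row["amount_cents"]
    return sorted(totals.items())


def load(conn, daily_totals):
    """Load: write the aggregated rows into an analytical table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_revenue "
        "(day TEXT PRIMARY KEY, revenue_cents INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO daily_revenue VALUES (?, ?)", daily_totals)
    conn.commit()


def run_etl():
    """Run the three steps against an in-memory database and return the loaded rows."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract()))
    return conn.execute(
        "SELECT day, revenue_cents FROM daily_revenue ORDER BY day"
    ).fetchall()
```

Calling `run_etl()` returns the per-day totals that a reporting query would read. The in-memory SQLite database stands in for a warehouse only to keep the sketch self-contained.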

Introducing Data Mesh

The speaker introduces the concept of Data Mesh as a solution to the challenges faced by big data platforms.

The Birth of Data Mesh

  • Zhamak explains that Data Mesh came about as a hypothesis to address the symptoms of failure in big data platforms.
  • She describes how traditional approaches to managing data involve centralizing it, while Data Mesh proposes a decentralized approach where each domain owns its own data products.
  • Zhamak outlines the four core principles of Data Mesh, including domain-oriented decentralized data ownership and federated governance.

Data Mesh

In this section, the speaker discusses the concept of Data Mesh and why it is necessary in today's world.

Introduction to Data Mesh

  • The speaker introduces herself and explains that she works with many leaders at different companies.
  • The speaker mentions that there are great expectations around big data management.
  • The speaker recalls sensing, a few years back, a crisis in big data management that called for managing and collecting big data differently.

Challenges with Traditional Approaches

  • The speaker questions what assumptions have been made regarding operational data and analytical data.
  • The speaker talks about how fragile the architecture for managing big data is.
  • The speaker discusses how we have been working on different approaches for decades, including data analytics and business reporting since the 1960s.

Need for Data Mesh

  • The speaker asks if we have become truly data-driven despite our efforts.
  • The speaker talks about how companies often fail to scale the sourcing of data from diverse sets of domains that produce large volumes at a rapid pace.
  • According to the speaker, the answer to these points of friction lies in challenging some assumptions and managing big data differently.

Conclusion

  • Traditional modeling into star schemas or various forms of schemas doesn't serve today's analytical models such as running and training machine learning models. Instead, a late model approach is suggested where we get the data from operational systems.

The transcript discusses the concept of Data Mesh and why it is necessary in today's world. It also highlights the challenges with traditional approaches to big data management and the need for a new approach. The speaker suggests that we need to challenge some assumptions and manage big data differently.

Challenges of AI-Driven Expert Platform

The speaker discusses the challenges faced by an AI-driven expert platform.

Key Points:

  • The platform faces many challenges around the world.
  • The goal is to use data and AI to change their offerings.
  • Balancing between what they offer and how they empower their users and small-medium business owners.
  • Decentralizing decision-making local to the domains.

Components of Governance for Financial Stats Provider Company

The speaker briefly explains the components of governance that many financial stats provider companies use in their products.

Key Points:

  • The company wants to use data and AI to include thinking about global concerns.
  • They want to improve their workforce, streamline it, optimize their business, and serve customers using data.
  • Defining principles that define the boundary of the scope of governance.
  • Principles should define how decisions are made, whether a concern is local or global.

Principles for Healthcare Provider

The speaker talks about defining principles for a healthcare provider using data and AI.

Key Points:

  • Mission is to improve every single member's experience at every touchpoint with organizations through data and AI.
  • Principles should define how decisions are made, whether a concern is local or global.
  • Defining those global policies as part of those definitions.
  • Embedding computational policies into every single product blueprint.
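
One way to read "embedding computational policies into every product blueprint" is that global policies are written as code and checked automatically against each data product. The sketch below assumes that reading. The `DataProduct` descriptor, the two example policies, and the `verify` function are all hypothetical names chosen for illustration; the talk does not specify an implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class DataProduct:
    """Hypothetical descriptor that a blueprint might generate for each data product."""
    name: str
    domain: str
    owner: Optional[str] = None
    pii_columns: List[str] = field(default_factory=list)
    encrypted_columns: List[str] = field(default_factory=list)


# A policy takes a data product and returns a list of violation messages.
Policy = Callable[[DataProduct], List[str]]


def require_owner(product: DataProduct) -> List[str]:
    """Example global policy: every data product must name an owner."""
    return [] if product.owner else [f"{product.name}: no owner assigned"]


def require_pii_encryption(product: DataProduct) -> List[str]:
    """Example global policy: every column marked as PII must be encrypted."""
    unprotected = [c for c in product.pii_columns if c not in product.encrypted_columns]
    return [f"{product.name}: PII column '{c}' is not encrypted" for c in unprotected]


# Global policies, assumed here to be agreed on by a federated governance group.
GLOBAL_POLICIES: List[Policy] = [require_owner, require_pii_encryption]


def verify(product: DataProduct, policies: List[Policy] = GLOBAL_POLICIES) -> List[str]:
    """Run every policy and collect violations; an empty list means the product passes."""
    violations: List[str] = []
    for policy in policies:
        violations.extend(policy(product))
    return violations
```

In this sketch the domains stay free to define their own data products, while the shared policy list captures the concerns treated as global. A platform could run `verify` automatically whenever a data product is created or changed.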

Automation Verification

The speaker talks about automation verification in creating schemas for all domains.

Key Points:

  • Governance is going to look very different despite having the same objective of providing quality data across organizations safely and securely.
  • Embedding capabilities to enable the platform with automation and verification.
  • Extensive tooling can support the platform's automation and verification.
  • Do we have a data culture? Have we become data-driven?

Failures That Led to Data Mesh

The speaker talks about why Data Mesh came about as a hypothesis.

Key Points:

  • The reason Data Mesh came about as a hypothesis was that they started seeing these failures.
  • Despite having the same objective of providing quality data across organizations safely and securely, governance is going to look very different.
  • Embedding computational policies into every single product blueprint.

Introduction

In this section, the speaker introduces the topic of federated learning and explains how it differs from traditional machine learning approaches.

What is Federated Learning?

  • Federated learning is a machine learning approach that allows multiple parties to collaborate on building a shared model without sharing their data.
  • This approach is particularly useful in situations where data privacy concerns prevent centralized training of models.
  • Instead of sending data to a central server for processing, each party trains its own local model using its own data.
  • The local models are then combined into a global model that can be used by all parties.

How Does Federated Learning Work?

  • In federated learning, each party trains its own local model using its own data.
  • The local models are then sent to a central server, which combines them into a global model.
  • The central server sends the updated global model back to each party, which uses it to improve their local models.
  • This process continues until the global model reaches an acceptable level of accuracy.
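
The training loop described above resembles what is commonly called federated averaging. The sketch below is a minimal toy version under stated assumptions: a one-parameter linear model `y = w * x`, plain gradient descent for local training, and a server that averages local weights in proportion to each party's sample count. The function names and the toy data are hypothetical, and the loop runs a fixed number of rounds rather than stopping at a target accuracy.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs held privately by one party


def local_update(w: float, data: Dataset, lr: float = 0.05, steps: int = 5) -> float:
    """Train locally, starting from the global weight, on one party's own data."""
    for _ in range(steps):
        # Gradient of mean squared error for the model y = w * x.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def aggregate(local_weights: List[float], sizes: List[int]) -> float:
    """Server step: average local weights, weighted by each party's sample count."""
    total = sum(sizes)
    return sum(w * n for w, n in zip(local_weights, sizes)) / total


def federated_training(parties: List[Dataset], rounds: int = 20) -> float:
    """Alternate local training and server aggregation for a fixed number of rounds."""
    w_global = 0.0
    for _ in range(rounds):
        # Only model weights leave each party; the raw data never does.
        local_weights = [local_update(w_global, data) for data in parties]
        w_global = aggregate(local_weights, [len(d) for d in parties])
    return w_global


# Toy data: every point held by either party lies on the line y = 3x.
PARTIES = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5), (1.5, 4.5), (3.0, 9.0)],
]
```

Running `federated_training(PARTIES)` should drive the shared weight close to 3, the slope underlying both parties' data. Real systems add secure aggregation, client sampling, and far larger models, none of which is shown here.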

Advantages of Federated Learning

In this section, the speaker discusses some advantages of federated learning over traditional machine learning approaches.

Data Privacy

  • One advantage of federated learning is that it allows multiple parties to collaborate on building a shared model without sharing their data.
  • This is particularly useful in situations where data privacy concerns prevent centralized training of models.

Efficiency

  • Federated learning can be more efficient than centralized training in some settings because the training work is spread across the parties and raw data does not have to be moved to a central location.
  • This can reduce the time and resources required to train a model.

Flexibility

  • Federated learning is a flexible approach that can be used in a variety of settings, including healthcare, finance, and transportation.
  • It can also be used with different types of data, such as text, images, and audio.

Challenges of Federated Learning

In this section, the speaker discusses some challenges associated with federated learning.

Communication Overhead

  • One challenge of federated learning is that it requires communication between multiple parties.
  • This can result in increased communication overhead and slower training times.

Heterogeneous Data

  • Another challenge of federated learning is that each party may have different types or amounts of data.
  • This can make it difficult to combine local models into a global model that accurately represents all parties' data.

Security Risks

  • Federated learning introduces new security risks because it involves sharing models between multiple parties.
  • These risks include the possibility of model poisoning attacks or other forms of malicious behavior.

Video description

Today, data is ubiquitous. Data is the by-product of any and every action we take. Everything, every system, every process, every sensor generates data. Technology makes it easier for organizations to collect and store data, for businesses to leverage to make better decisions or create more tailored experiences for their customers. However, organizations are struggling to enable and empower their employees to make the most informed and timely decisions possible. Centralized data platform architectures fail to deliver data with the speed and flexibility scaling organizations need. Data mesh is a response to this problem. Data mesh applies the principles of modern software engineering and the learnings from building robust, internet-scale solutions to unlock the true potential of enterprise data. Learn more at https://www.thoughtworks.com/what-we-do/data-and-ai/data-mesh