Karol Tajduś

Join the Data Mesh Revolution

Updated: Aug 21, 2023



"The tree of liberty must be refreshed from time to time with the blood of patriots and tyrants."

Inspired by the words of Thomas Jefferson, who spoke about the necessity of refreshing the 'tree of liberty,' we recognize that data and its architecture must also be re-examined periodically through the prism of innovation and experience.


Just four or five years ago, thinking about data meant thinking about traditional data management platforms: respected, powerful, but often lagging behind fast-changing business problems. They also carried flaws that nobody intended, which stemmed from gaps in the functional or business knowledge of the technical teams who managed them. This approach to data architecture needed a revolution, and its name is Data Mesh.

It's not just a trendy phrase; it's an entirely new way of approaching data infrastructure. So let's say goodbye to monolithic platforms and raise our banners for the era of Data Mesh, a time of true data equality and effective data orchestration.


"Liberty, Equality, Fraternity"

Data Mesh is not just an update of current rules but a radical shift in architectural paradigm with game-changing potential. Essentially, this architecture challenges the conventional, centralized approach to data platforms. Instead of creating one massive, monolithic warehouse, it promotes a decentralized model where different business domains, such as marketing, sales, or logistics, manage and deliver their own data as independent products. Imagine a dynamic revolution where each sector or ideology develops with its own unique characteristics, yet all coexist harmoniously under the broad banner of a common goal. Such is the essence of Data Mesh.



The architecture is based on four key principles:


1. Domain-oriented, decentralized ownership and data architecture: Instead of one team controlling all the data, ownership is dispersed among various teams or domains.


2. Data as a Product: Data is treated with the same rigor and discipline as any other product. It is maintained, improved, and delivered to meet the needs of its users and stakeholders.


3. Self-serve data infrastructure as a platform: Teams are given the tools, infrastructure, and guidelines they need to build and deliver their data products independently.


4. Federated governance (shared control): Although each domain has autonomy, a unifying governance layer ensures consistent standards and protocols.


By adopting the Data Mesh philosophy, we address challenges that often plague centralized systems: data silos, slow response times, and the constant risk of declining data quality. Moreover, treating data as a product, which is the basis of the Data Mesh philosophy, gives data the attention and care it deserves, leading to better, faster, and more innovative insights.
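To make "data as a product" under shared standards a little more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `DataProduct` descriptor, its fields, and the rules in `governance_violations` are hypothetical and do not come from any particular Data Mesh tool or standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor a domain team publishes for its data product."""
    name: str
    domain: str            # owning business domain, e.g. "sales"
    owner: str             # accountable team or person
    schema: dict           # column name -> type name
    description: str = ""
    tags: tuple = ()


def governance_violations(product: DataProduct) -> list:
    """Return the shared, organization-wide rules that the product breaks.

    The rules are illustrative only; a real organization defines its own.
    """
    problems = []
    if not product.owner:
        problems.append("missing owner")
    if not product.description:
        problems.append("missing description")
    if not product.schema:
        problems.append("empty schema")
    # Assumed convention: a product exposing an "email" column must carry a "pii" tag.
    if "email" in product.schema and "pii" not in product.tags:
        problems.append("email column without pii tag")
    return problems


orders = DataProduct(
    name="orders_daily",
    domain="sales",
    owner="sales-data-team",
    schema={"order_id": "string", "amount": "decimal", "email": "string"},
    description="One row per order, refreshed daily.",
)
print(governance_violations(orders))  # ['email column without pii tag']
```

The domain team owns the descriptor and its contents, while the check encodes rules that apply to every domain. That split is one possible reading of autonomy combined with a unifying governance layer.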


"Power to the People!"

At first glance, Data Mesh has only advantages: increased data availability, agility, clear ownership, and closer alignment between technological and business domains. Its decentralized nature can reduce bottlenecks and foster a more democratic and empowered data culture.


However, changing organizational culture can be as difficult as winning a nation over to your cause. It requires redefining roles, upskilling teams, and shaping a new approach to data. On the technical front, integrating decentralized data sets, ensuring uniform data quality, and configuring cross-domain access can feel like preparing an army in which each regiment follows its own independent orders.


What may help in implementing such significant changes is that Data Mesh architecture is not homogeneous. Different types of organizations can implement different architectural concepts, choosing the one that best meets their business needs. Here we distinguish two basic concepts:


Full decentralization – All data is fully divided among, and managed by, the different departments of the organization. This is the quintessence of Data Mesh. However, it is best suited to companies that are cloud-native, young, and staffed with many skilled software engineers. For organizations with a more complicated structure or low technology adoption, it may be too demanding to implement.



Decentralization of Analytical Products – This is the most commonly used version of Data Mesh. The organization maintains what is known as a Federated-zone: a centralized data lake that holds data in the formats in which it arrives from source systems. All the business know-how is applied in analytical products (Domain Zones). This allows the most important principles of Data Mesh to be implemented while reducing the responsibilities placed on domain owners.


In this setup, the architecture allows for a kind of 'best of both worlds' scenario. While raw data is stored in a centralized data lake, domain-specific know-how is layered on top of this raw data through analytical products that are owned by individual business units or 'domains.' This approach maintains some centralized control and data governance through the Federated-zone, while still giving each domain the autonomy to apply its expertise and generate insights through its analytical products. This effectively decentralizes some aspects of data ownership and analysis, while not entirely relinquishing centralized control, thereby reducing the burden on domain owners.
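The split between a Federated-zone and Domain Zones can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `FEDERATED_ZONE` dictionary stands in for a centralized data lake, and the record shape, the `read_raw` helper, and both domain functions are hypothetical names invented for this example.

```python
# Federated zone: raw records kept as they arrive from source systems (assumed shape).
FEDERATED_ZONE = {
    "crm.orders": [
        {"order_id": "A1", "customer": "c-1", "amount_cents": 1250, "status": "PAID"},
        {"order_id": "A2", "customer": "c-2", "amount_cents": 900, "status": "CANCELLED"},
        {"order_id": "A3", "customer": "c-1", "amount_cents": 3000, "status": "PAID"},
    ],
}


def read_raw(source: str) -> list:
    """Central, governed access point to raw data (hypothetical helper)."""
    # Copies are returned so that domain code cannot mutate the raw zone.
    return [dict(row) for row in FEDERATED_ZONE[source]]


def sales_revenue_by_customer() -> dict:
    """Domain zone owned by sales: applies the sales definition of revenue."""
    revenue = {}
    for row in read_raw("crm.orders"):
        if row["status"] != "PAID":  # business rule owned by the sales domain
            continue
        customer = row["customer"]
        revenue[customer] = revenue.get(customer, 0) + row["amount_cents"] / 100
    return revenue


def logistics_orders_to_ship() -> list:
    """Domain zone owned by logistics: same raw data, different question."""
    return [row["order_id"] for row in read_raw("crm.orders") if row["status"] == "PAID"]


print(sales_revenue_by_customer())  # {'c-1': 42.5}
print(logistics_orders_to_ship())   # ['A1', 'A3']
```

Both domains read the same centrally stored records, but each applies its own know-how in its own analytical product, which is the division of responsibility described above.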



In real implementations, we see many permutations of both architectures, each aimed at optimizing data management for a particular organization. One example is a data mesh operating on domain clusters, for organizations without clearly divided business responsibilities, where some domains are managed by multiple departments. Thanks to this freedom, each company can decide which data mesh implementation is right for it. However, architecture is not the only thing to consider when implementing a data mesh; such a revolution also requires the right set of tools:


Infrastructure-as-Code (IaC): Tools such as Terraform allow teams to define and deliver their infrastructure through code. This is crucial in a decentralized system like Data Mesh, where domains have to autonomously deploy and manage their data products.


Metadata Management: The compass of our toolset. In a vast, decentralized system, it is easy to get lost. Tools like Amundsen, Collibra, or DataHub track where data comes from and how it transforms over time. They are key to ensuring transparency and credibility.


Self-service platforms for data: By offering sandbox environments and tools, platforms like Dremio, BigQuery, and Snowflake allow domains to independently explore, transform, and deliver data, embodying the spirit of Data Mesh's "self-service" principle.


Monitoring and Observability Tools: The vigilant guardians of our toolset. Tools like Prometheus or Grafana ensure that everything runs smoothly. They offer insights into how data products operate, enabling proactive adjustments and optimizations.


Event Streaming Platforms: Think of them as incredibly fast roads connecting different parts of your business. Tools like Apache Kafka or Pub/Sub allow domains to publish and consume data in real time, keeping data flows fluid, efficient, and timely.
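The publish-and-consume pattern behind event streaming can be illustrated with a toy in-memory bus. This is a sketch for intuition only: `InMemoryEventBus`, the topic name, and the event fields are hypothetical, and the class does not mirror the API of Apache Kafka, Pub/Sub, or any other real platform.

```python
from collections import defaultdict
from typing import Callable


class InMemoryEventBus:
    """Toy stand-in for an event streaming platform; not a real platform's API."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)  # topic -> list of handlers

    def subscribe(self, topic: str, handler: Callable[[dict], object]) -> None:
        """Register a consumer for a topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> int:
        """Deliver the event to every subscriber and return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(dict(event))  # each consumer gets its own copy
        return len(handlers)


bus = InMemoryEventBus()
shipments = []             # state owned by a logistics consumer
campaign_audience = set()  # state owned by a marketing consumer

bus.subscribe("sales.order_placed", lambda e: shipments.append(e["order_id"]))
bus.subscribe("sales.order_placed", lambda e: campaign_audience.add(e["customer"]))

# The sales domain publishes once; it does not need to know who consumes.
delivered = bus.publish("sales.order_placed", {"order_id": "A1", "customer": "c-1"})
print(delivered, shipments, campaign_audience)  # 2 ['A1'] {'c-1'}
```

The point of the design is decoupling: the publishing domain does not know its consumers, so a new domain can start consuming a topic without any change on the producer's side.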


As we know from life, tools or architecture alone will not chart the course. Alongside these technologies, a mindset shift is essential. Teams must move from being mere consumers of data to managers of data products specific to their domain. Adopting this role - supported by the right set of tools - is key to unlocking the true power of Data Mesh.


"There is nothing more powerful than an idea whose time has come."

The magic of Data Mesh lies in its promise to break down the division between technology and business, creating a cohesive data ecosystem where domains are not just passive consumers but active curators of their data products. And when this paradigm takes hold, several predictions emerge:


Growth of domain-specific data experts: As domains take control of their data, we will see a rise in roles tailored to expertise in data specific to that domain. Think of data product managers or domain data architects, specialists who understand both the intricacies of their domain and the nuances of data management.


Innovation: With the decentralization of data, barriers to innovation may collapse. Domains, empowered by autonomy and appropriate tools, will prototype, iterate, and innovate faster, leading to rapid business transformation.


Broader governance and ethics: With great power comes great responsibility. As domains gain greater autonomy, focus on data ethics, security, and governance will be strengthened. Ensuring responsible and ethical use of data will be key.


In summing up our odyssey, let us pause for a moment. Although the road to the full realization of Data Mesh is long and full of challenges and complexities, the vision it presents is transformative. Organizations that dare to embark on this journey today may be the creators of tomorrow's successes. The future is tangled and undeniably promising. So, are you ready to join the revolution?
