Just an illustration – not the truth and we will pivot if it does not work.
I discovered Zhamak Dehghani’s first article about Data Mesh in August 2020. Thanks to YouTube, you can watch her illustrate it live in this video, with even more context and explanations. Then there is this second video, which introduces her second article (December 2020). It is really a journey with a brilliant person who has a true discovery mindset. She keeps making the concept better and clearer. Most of the time I only see her in a small video window, but even like this you can feel her very high level of assertiveness.
Her ideas are known by many now and you even have a community here.
The next question after reading her two articles: how do we implement it? In any organisation, you never have everything under control, and you have to consider the context and the starting point. It is always a work in progress too.
To be clear, I never had in mind that “this is what we should implement”. But I recognise that I fully agree with the analysis and the principles proposed to fix the problem.
I will try to answer the question “Could you illustrate your journey towards a Data Mesh?” with three articles: this one about data domains and Team Topologies, a second one devoted to architecture and technology, and a last one about change management and the skills needed.
1. Solving the right problem
Let’s say that your starting point is multiple data teams and multiple data platforms. It is a very common pattern for data (I call it the “it’s my data” syndrome): everybody is happy with their autonomy, but your data naturally ends up in silos, possibly across multiple technologies. It could be tempting to solve this with a single team working on a single data platform. In fact, that is a problem too.
This is what Zhamak Dehghani describes as “centralized and monolithic”, illustrated with this “big data” platform diagram.
Creating a central platform and a central team will not address two very important points from her first article: data is ubiquitous, and the need to innovate with data is urgent. With this kind of organisation, we create a single point under a very high level of pressure. The team and the platform are not just a bottleneck; they will concentrate all the frustrations. You can trust me, because I have been in this situation many times: it will never be enough if you have multiple business units or many countries. And because you can’t stop the sea with your arms, you will have silos again. Even without any dedicated technology, in the end people will use Excel or any other solution to fulfill their needs.
The right problem to solve is this one: having different teams work on different subjects (data domains) while still being able to share and cooperate.
In this article, Juan Sequeda gives maybe one of the best definitions of Data Mesh: “It is paradigm shift towards a distributed architecture that attempts to find an ideal balance between centralization and decentralization of metadata and data management.” I would have added data teams to metadata and data management.
2. From Data Domains to stream aligned teams
Everything starts with the data domains: you will have to divide all the data of your organisation into data domains. I will not go into how to define these domains (to stay focused on people and teams), but I strongly recommend this article by Ramesh Hariharan about Domain-Driven Design (DDD). This methodology is well suited to data because it is about the business problem space to model, the language and the context.
Now that you have these data domains, each one needs to be personified by a Data Domain Owner. You will find many definitions of what a Data Domain Owner should do. Some have a strong data governance flavour, very focused on glossaries, data quality and data stewardship. But you can also see the owner as a Data Strategist who translates business problems into analytical solutions and starts to think of data as a product. This is really about building the vision of what the data domain should be: scope of data, use cases and value.
Once you have this vision, how do you make it a reality? The only answer is to build a team (probably more than one) and, to be more specific, a stream-aligned team as defined by Matthew Skelton and Manuel Pais in this book: “A stream aligned team is a team aligned to a single valuable stream of work… Further, the team is empowered to build and deliver customer or user value as quickly, safely, and independently as possible, without requiring hand-offs to other teams to perform parts of the work”. The hand-off part is very important, and you can illustrate it with the “you build it, you own it, you run it” principle.
Because we use Scrum as a framework, a stream-aligned team on a data domain will include:
- a data product owner: a very important role, because he/she converts the vision of a product into a deliverable or increment. This is not just a classic PO; he/she has strong data and analytics skills too.
- a scrum master: because facilitating the team is crucial
- the developers (including a tech lead): the people who will “code” the solution.
This is one team that is focused on value and on data as a product. Everyone in the team has a good understanding of the business challenge and of the data they are manipulating. We want to avoid the left side of this diagram by Zhamak Dehghani (people building data pipelines without understanding what they are doing) and be on the right side (building a data product that includes data pipelines, but not just that).
You can create as many teams as needed (if you have the resources) because all these teams are independent. A team can cover a whole data domain (full scope), or only part of it, for example one country or one business unit. From a data governance point of view, all these teams share the same categorization by data domain and sub-domain.
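To make the relationship between data domains, their sub-domains and the teams more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the class names, the `scope` labels and the sample values are hypothetical and do not come from any library or from the articles mentioned above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataDomain:
    """A data domain and the sub-domain categorization shared by its teams."""
    name: str
    owner: str  # the Data Domain Owner
    sub_domains: tuple[str, ...] = ()


@dataclass
class StreamAlignedTeam:
    """A Scrum team aligned to a domain, optionally limited to a country or business unit."""
    name: str
    domain: DataDomain
    scope: str = "full"  # hypothetical labels, e.g. "full", "country:FR", "bu:retail"
    product_owner: str = ""
    scrum_master: str = ""
    developers: list[str] = field(default_factory=list)


def teams_by_domain(teams: list[StreamAlignedTeam]) -> dict[str, list[str]]:
    """Group team names by domain: the view a governance function would look at."""
    grouped: dict[str, list[str]] = {}
    for team in teams:
        grouped.setdefault(team.domain.name, []).append(team.name)
    return grouped


# Two independent teams working on the same domain with different scopes.
customer = DataDomain("customer", owner="Alice", sub_domains=("profile", "consent"))
teams = [
    StreamAlignedTeam("customer-global", customer),
    StreamAlignedTeam("customer-fr", customer, scope="country:FR"),
]
print(teams_by_domain(teams))
```

The point of the sketch is that both teams reference the same `DataDomain` object: they stay independent in their delivery, but the categorization is defined once and shared.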
The most important ritual is going to be the “sprint review” of all these teams, one after the other. It can last all day, but this is where each Data Domain Team demonstrates its product and sees the work of the other teams, asks questions and imagines collaborations and partnerships. It is a ritual that develops transparency and shows the progress of each data domain. With virtual meetings, this ritual is very different from the physical one we used to have: it is not rare to have more than 80 people attending.
The agile methodology is a very important framework for running the teams (explained here in a previous article), and it is very important to develop the mindset if you do not want to do “fake agile”. The change management effort is huge and underestimated, because you will never find someone who says they are against being agile. But in fact, every organisation develops many anti-agile patterns because of its culture or its history.
3. The Platform Team
After creating these autonomous teams, you will need to support them and to ensure consistency in the way they deliver their data products. As defined in Team Topologies, “the purpose of a platform team is to enable stream-aligned teams to deliver work with substantial autonomy… The platform team provides internal services to reduce the cognitive load that would be required from stream-aligned teams to develop these underlying services”.
You have the same organisation, with a product owner, a scrum master and the developers. The “product” is (most of the time) an API service that will not just ease the work of the developers in the stream-aligned teams but also standardise it.
You can have a VERY long list of services, but the priorities should be:
- Data object creation (ingestion, transformation and exposure)
- Data observability (as defined by Barr Moses here)
- Orchestration (from real time to complex workflows)
- Security & Access management
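The four priorities listed above could be exposed as one small contract that every stream-aligned team codes against. The following Python sketch is only one possible shape, under my own assumptions: the `DataPlatform` interface, its method names and the in-memory implementation are hypothetical and are not the API of any real product.

```python
from abc import ABC, abstractmethod


class DataPlatform(ABC):
    """Hypothetical contract offered by the platform team to stream-aligned teams."""

    @abstractmethod
    def create_data_object(self, domain: str, name: str) -> str:
        """Data object creation: register an object and return its identifier."""

    @abstractmethod
    def record_check(self, object_id: str, check: str, passed: bool) -> None:
        """Data observability: store the result of a quality check."""

    @abstractmethod
    def schedule(self, object_id: str, cron: str) -> None:
        """Orchestration: attach a refresh schedule to the object."""

    @abstractmethod
    def grant_access(self, object_id: str, principal: str) -> None:
        """Security & access management: allow a principal to read the object."""


class InMemoryPlatform(DataPlatform):
    """Toy implementation that keeps everything in a dict, for illustration only."""

    def __init__(self) -> None:
        self.objects: dict = {}

    def create_data_object(self, domain: str, name: str) -> str:
        object_id = f"{domain}.{name}"
        self.objects[object_id] = {"checks": [], "schedule": None, "readers": set()}
        return object_id

    def record_check(self, object_id: str, check: str, passed: bool) -> None:
        self.objects[object_id]["checks"].append((check, passed))

    def schedule(self, object_id: str, cron: str) -> None:
        self.objects[object_id]["schedule"] = cron

    def grant_access(self, object_id: str, principal: str) -> None:
        self.objects[object_id]["readers"].add(principal)


# A stream-aligned team uses the same four calls whatever the technology behind them.
platform = InMemoryPlatform()
oid = platform.create_data_object("customer", "profiles")
platform.record_check(oid, "row_count_not_zero", True)
platform.schedule(oid, "0 6 * * *")
platform.grant_access(oid, "marketing-analytics")
print(oid, platform.objects[oid]["schedule"])
```

Keeping the contract this small is a design choice: the stream-aligned teams only learn four verbs, and the platform team stays free to change the technology behind them.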
This is where you have the link with the DataOps movement. In this previous article, you have all the basics needed for a “Data Platform as a Service”.
It is totally aligned with this diagram from Zhamak Dehghani in her first article.
This team makes the difference between “we are like any other organisation” and the efficient organisation described by Nicole Forsgren in her book Accelerate. The team also takes part in the sprint review ritual described for the stream-aligned teams. The intent is the same: develop transparency and get feedback from the other teams. The success of this team rests on its ability to develop two opposite things:
- Empathy for the developers in the stream-aligned teams (by building feedback loops)
- Great services with a 99.9996% quality (by staying focused and concentrating on the quality of the development)
There will be friction between the “data infra engineers” and the data engineers in the stream-aligned teams. The best image to use is the fable of the chicken and the pig: the developers in the stream-aligned teams are committed, while the platform team is only involved. The idea is to have only pigs and no chickens: every team should be committed. Working on this relationship will be a key success factor, even if both sides have the same leader.
The choice of technology will play an important role (it will be covered in the second article), and your decision on which services to implement should be based on how they improve the velocity of the teams, not on a supposed “state of the art”.
4. The Enabling Team
It is not finished yet. There is still one missing part: a team (as defined in Team Topologies) “composed of specialists in a given technical (or product) domain”. In the data context, many cross-cutting subjects require a very high level of expertise, such as:
- Your Cloud Service Provider (mastering the cloud services, the connection to your source systems, the costs with FinOps)
- Data Security & Access management and Data Privacy
- Data Management, such as data modelling, catalogs or data quality
Because data is an asset, you must have governance. And to close the loop: because you have created these data domains, you need to govern them all with the same rules. These topics cut across the stream-aligned teams and the platform team alike.
In Zhamak Dehghani’s second article, the closest thing to this Enabling team is what she calls federated computational governance.
The mistake would be to let each stream-aligned team and the platform team handle these topics by themselves, because it takes a lot of effort (and meetings…). That is why you need this Enabling team, made up of individuals with a very high level of expertise and a “strong collaborative nature”. They are the link (or the proxy) between these two team topologies (stream-aligned and platform) and the other departments of your very large organization. The perfect wrong thing to do would be to have all your stream-aligned teams attend a meeting on data security run by the security department and then figure out how to apply it in their own context.
In real life, the enablers can very quickly turn into a big blocking point, for these reasons:
- Lost in the processes of the organisation: they were there to help others navigate, but they are the first to need help.
- Lost their credibility: they do not have enough expertise or empathy, and every team leader will try to avoid this team or its members as much as possible.
- Lost in their expertise: “I am in my ivory tower and I strongly believe that no one else can understand the subject.” They are not seen as doers.
The key to building this team is not selecting the right expertise but selecting the right mindset, one that fits the culture of all the other teams. Let’s face reality: Data Governance needs a paradigm shift too, and I strongly recommend the work of Barr Moses and the blue/red pill analogy.
You now have the 3 team topologies for a data mesh organisation. It looks like this, and you have all the key elements:
- Domain-oriented data, with the stream-aligned teams building data as a product
- Data services as a platform with the platform team
- Global Data Governance with the enabling team
Conclusion : the link between Team Topologies & Data Mesh
I just wanted to map the data mesh principles (described by Zhamak Dehghani in her two articles) to the 4 Team Topologies (yes, there are 4 and not 3; the 4th one could be all the teams managing the IT systems you need as sources). There is so much more in the book, with subjects like:
- Software sizing and cognitive load (very high when it is about data)
- Heuristics for Conway’s Law (the link between the architecture chosen and your teams)
- Patterns for team interactions (the success of your teams is there)
- Triggers for change and evolution (because nothing is static)
I like the subtitle of the book too: “organizing business and technology teams for fast flow”. You will not find a better summary of the data challenge in any organisation.
Do not be misled, though: the heart of the Data Mesh is the architecture (I will come back to that in the second article), but everything starts with data teams and how they are organized. It is very far from the one central team that will save the world because it has a big data platform. And because of Conway’s Law, we all know about the link between your teams and the design of your system.
If you want to start your journey to a data mesh, it will for sure start with the way you have organized your data teams and how each team can contribute and share to build the data asset of your organisation.
My next article will cover how to do it from an architecture and technology point of view.