Towards a Data Mesh (part 1): Data Domains and Team Topologies

Just an illustration – not the truth and we will pivot if it does not work.

I discovered Zhamak Dehghani’s first article about Data Mesh in August 2020. Thanks to YouTube, you can watch her present it live in this video, with even more context and explanations. Then there is this second video, which introduces her second article (December 2020). It is really a journey with a brilliant person who truly has a discovery mindset. She keeps making this concept better and clearer. Most of the time I only see her in a small video window, but even so you can feel her very high level of assertiveness.

Her ideas are known by many now and you even have a community here.

The next question after reading her two articles: how do we implement it? In any organisation, you never have everything under control, and you have to consider the context and the starting point. It is always a work in progress too.

To be clear, I never had in mind that “this is what we should implement”. But I recognise that I fully agree with the analysis and with the principles proposed to fix the problem.

I will try to answer the question “Could you illustrate your journey towards a Data Mesh?” with 3 articles: this one about Data Domains and Team Topologies, a second one devoted to the architecture and the technology, and a last one about change management and the skills needed.

1. Solving the right problem

Let’s say that your starting point is multiple data teams and multiple data platforms. It is a very common pattern for data (I call it the “it’s my data” syndrome): everybody is happy with the autonomy, but your data naturally ends up in silos, possibly on multiple technologies. It could be tempting to solve this with a single team working on a single data platform. In fact, that is a problem too.

This is what has been described by Zhamak Dehghani as “centralized and monolithic” illustrated with this “big data” platform schema.

Creating a central platform and a central team will not address two very important points mentioned in her first article: data is ubiquitous, and the need to innovate with data is urgent. With this kind of organisation, we are creating a single point with a very high level of pressure on it. This team and its platform are not just a bottleneck; they will concentrate all the frustrations. You can trust me because I have been in this situation so many times: it will never be enough if you have multiple business units or many countries. And because you can’t stop the sea with your arms, you will have silos again. Even without any technology, in the end, people will use Excel or any other solution to fulfil their needs.

The right problem to solve is this one: to have different teams working on different subjects (data domains) that are still able to share and cooperate.

In this article, Juan Sequeda gives maybe one of the best definitions of Data Mesh: “It is paradigm shift towards a distributed architecture that attempts to find an ideal balance between centralization and decentralization of metadata and data management.” I would have added data teams to metadata and data management.

2. From Data Domains to stream aligned teams

Everything starts with the data domains: you will have to divide all the data of your organisation into data domains. I will not go into how you can define these domains (to stay focused on people and teams), but I strongly recommend this article by Ramesh Hariharan about Domain Driven Design (DDD). This methodology is well suited to data because it is about the business problem space to model, the language and the context.

Now that you have these data domains, you will need to give each of them a face with a Data Domain Owner. You will find many definitions of what a Data Domain Owner should do. Some have a strong Data Governance flavour, very focused on glossaries, data quality and data stewardship. But you can also see this role as a Data Strategist who will translate business problems into analytical solutions and start to think of data as a product. This is really about building the vision of what the data domain should be: scope of data, use cases and value.

When you have this vision, how do you make it a reality? The only answer is to build a team (probably more than one), and more specifically a stream aligned team as defined by Matthew Skelton and Manuel Pais in this book. “A stream aligned team is a team aligned to a single valuable stream of work… Further, the team is empowered to build and deliver customer or user value as quickly, safely, and independently as possible, without requiring hand-offs to other teams to perform parts of the work”. The hand-off part is very important, and you can illustrate it with the « you build it, you own it, you run it » principle.

Because we are using Scrum as a framework, a stream aligned team on a data domain will include:

  • a data product owner: a very important role because he/she will convert the vision of a product into a deliverable or increment. Not just a classic PO: he/she has strong data and analytics skills too.
  • a scrum master: because facilitating the team is crucial
  • the developers (including a tech lead): people who will “code” the solution.

This is one team that is focused on value and on data as a product. Everyone in this team has a good understanding of the business challenge and of the data they are manipulating. We want to avoid the left side of this schema defined by Zhamak Dehghani (people building data pipelines without understanding what they are doing) and be on the right side (building a data product that includes data pipelines, but not just that).

You can create as many teams as needed (if you have the resources) because all these teams are independent. A team can cover a full data domain, but it can also cover only a part of it, for a country or for a business unit. From a data governance point of view, all these teams will share the same categorization by data domain and sub-domain.
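To make the shared categorization concrete, here is a minimal sketch in Python. It is only an illustration: the class `StreamAlignedTeam`, the helper `teams_by_domain` and all team, domain and sub-domain names are hypothetical and do not come from any real tool. The idea is that several teams (global, country or business unit) declare the same domain and sub-domain labels, so governance can group them consistently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamAlignedTeam:
    """A team attached to one data domain (all names are hypothetical)."""
    name: str
    domain: str            # shared data domain label, e.g. "product"
    sub_domain: str        # shared sub-domain label within that domain
    scope: str = "global"  # "global", a country code, or a business unit


def teams_by_domain(teams):
    """Group team names by the data domain they work on."""
    grouped = {}
    for team in teams:
        grouped.setdefault(team.domain, []).append(team.name)
    return grouped


teams = [
    StreamAlignedTeam("product-core", "product", "catalog"),
    StreamAlignedTeam("product-fr", "product", "catalog", scope="FR"),
    StreamAlignedTeam("finance-core", "finance", "turnover"),
]

print(teams_by_domain(teams))
```

Here two teams share the "product" domain with different scopes, while staying independent in how they deliver.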

The most important ritual is going to be the « sprint review » of all these teams, one by one. It can last all day, but this is the place where each Data Domain Team demonstrates its product and can see the work of the other teams, ask questions and imagine collaborations and partnerships. It is a ritual to develop transparency and to see the progress of each data domain. With virtual meetings, this ritual is very different from the physical one we used to have; it is not rare to have more than 80 people attending.

The agile methodology is a very important framework to drive the teams (explained here in a previous article), and it is very important to develop the mindset if you do not want to do « fake agile ». The change management effort is huge and underestimated, because you will never find someone who says they are against being agile. But in fact, every organisation develops many anti-agile patterns because of its culture or its history.

3. The Platform Team

After creating these autonomous teams, you will need to support them and to ensure consistency in the way they deliver their data products. As defined in Team Topologies, « the purpose of a platform team is to enable stream-aligned teams to deliver work with substantial autonomy… The platform team provides internal services to reduce the cognitive load that would be required from stream-aligned teams to develop these underlying services ».

You have the same organisation with a product owner, a scrum master and the developers. The « product » is (most of the time) an API Service that will not just ease the work of the developers in the stream aligned team but also standardise their work.

You can have a VERY long list of services, but the priorities should be:

  • Data objects creation (ingestion, transformation and exposure)
  • Data observability (as defined by Barr Moses here)
  • Orchestration (from real time to complex workflows)
  • Security & Access management
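As an illustration only, the four priorities above could be exposed to the stream aligned teams through one small, standardised API. The sketch below is a toy in-memory stand-in written in Python; `DataPlatform`, `DataObject` and every method name are hypothetical and do not describe any existing product or the platform used by the author.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    """Catalog entry for one data object (hypothetical structure)."""
    name: str
    domain: str
    readers: set = field(default_factory=set)   # teams allowed to read
    checks: list = field(default_factory=list)  # observability checks
    schedule: str = ""                          # orchestration schedule


class DataPlatform:
    """Toy in-memory stand-in for a self-service platform API."""

    def __init__(self):
        self.catalog = {}

    def create_data_object(self, name, domain):
        # Data objects creation: register the object in the catalog.
        obj = DataObject(name=name, domain=domain)
        self.catalog[name] = obj
        return obj

    def add_quality_check(self, name, check):
        # Data observability: attach a named check to the object.
        self.catalog[name].checks.append(check)

    def set_schedule(self, name, cron):
        # Orchestration: store a cron-like schedule string.
        self.catalog[name].schedule = cron

    def grant_read(self, name, team):
        # Security & access management: allow a team to read.
        self.catalog[name].readers.add(team)

    def can_read(self, name, team):
        return team in self.catalog[name].readers


platform = DataPlatform()
platform.create_data_object("turnover_by_product", domain="finance")
platform.add_quality_check("turnover_by_product", "freshness < 24h")
platform.set_schedule("turnover_by_product", "0 6 * * *")
platform.grant_read("turnover_by_product", "product-core")
```

Because every team goes through the same calls, the platform both eases their work and standardises it, which is the point made above about the API service.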

This is where you have the link with the DataOps movement. In this previous article, you will find all the basics needed for a « Data Platform As A Service ».

It is totally aligned with this schema from Zhamak Dehghani in her first article

This team is what makes the difference between « we are like any other organisation » and the efficient one described by Nicole Forsgren in her book Accelerate. They are also involved in the sprint review ritual described for the stream aligned teams. The intent is the same: develop transparency and get feedback from the other teams. The success of this team is based on their ability to develop two opposite things:

  • Empathy for the developers in the stream aligned teams (by building feedback loops)
  • Great services with a 99.9996% quality (by staying focused and concentrating on the quality of the development)

There will be friction between the « data infra engineers » and the data engineers in the stream aligned teams. The best image to use is the fable of the chicken and the pig. The developers in the stream aligned teams are committed; the platform team is only involved. The idea is to have only pigs and no chickens: every team should be committed. Working on this relationship will be a key success factor, even if the teams have the same leader.

The choice of technology will play an important role (it will be covered in the second article), and your decision on which service to implement should be based on how it improves the velocity of the teams and not on a supposed « state of the art ».

4. The Enabling Team

It is not finished yet. There is still one missing part: a team (as defined in Team Topologies) « composed of specialists in a given technical (or product) domain ». In the data context, you have many transversal subjects that need a very high level of expertise, like:

  • Your Cloud Service Provider (mastering the cloud services, the connection to your source systems, the costs with FinOps)
  • Data Security & Access management and Data Privacy
  • Data Management like data modelling, catalogs or data quality

Because data is an asset, you must have governance. And to close the loop: because you have created these data domains, you need to govern them all with the same rules. These topics are transversal for the stream aligned teams and the platform team as well.

In Zhamak Dehghani’s second article, the closest thing to this Enabling team is what she calls federated computational governance.

The mistake would be to let each stream aligned team and the platform team handle these topics by themselves, because it takes a lot of effort (and meetings…). That is why you need this Enabling team, made up of individuals with a very high level of expertise and with a “strong collaborative nature”. They are the link (or the proxy) between these two team topologies (stream aligned and platform team) and the other services in your very large organization. The perfect wrong thing to do would be to have all your stream aligned teams attend a meeting on data security run by the security department and then figure out how to apply it in their own context.

In real life, the enablers can very rapidly turn into a big blocking point, for these reasons:

  • Lost in the processes of the organisation : they were here to help to navigate but they are the first to need help.
  • Lost their credit : they do not have enough expertise or empathy and every team leader will try to avoid this team or individuals as much as possible.
  • Lost in their expertise: “I am in my ivory tower and I strongly believe that no one else can understand the subject.” They are not seen as doers.

The key to building this team is not selecting the right expertise but the right mindset, one that will fit with the culture of all the other teams. Let’s face reality: Data Governance needs a paradigm shift too, and I strongly recommend the work of Barr Moses and the blue/red pill analogy.

You now have the 3 team topologies for a data mesh organisation. It looks like this, and you have all the key elements:

  • Data domains oriented with the stream aligned teams building data as a product
  • Data services as a platform with the platform team
  • Global Data Governance with the enabling team

Conclusion : the link between Team Topologies & Data Mesh

I just wanted to map the data mesh principles (described by Zhamak Dehghani in her two articles) to the 4 Team Topologies (yes, there are 4 and not 3; the 4th one could be all the teams managing the IT systems you need as a source). You have so much more in the book, with subjects like:

  • Software sizing and the cognitive load (very high when it is about data)
  • Heuristics for Conway’s Law (the link between the architecture chosen and your teams)
  • Patterns for team interactions (the success of your teams is there)
  • Triggers for change and evolution (because nothing is static)

I like the subtitle of the book too : “organizing business and technology teams for fast flow”. You will not have a better summary when we are talking about the data challenge in any organisation.

Do not be misled either: the heart of the Data Mesh is about the architecture (I will come back to that in the second article), but everything starts with Data Teams and how they are organized. It is very far from the one central team that will save the world because they have a big data platform. And because of Conway’s Law, we all know about the link between teams and the design of your system.

If you want to start your journey to a data mesh, it will start for sure with the way you have organized your data teams and how each team can contribute and share to build the data asset of your organisation.

My next article will cover how to do it from an architecture and technology point of view.

4 thoughts on “Towards a Data Mesh (part 1): Data Domains and Team Topologies”

  1. That’s a great article on Data Mesh topic, very comprehensive and convincing.

    Sorry for the long rant but I do have one question though about the management buy-in on such an approach. Most organizations will see these “Data Teams” as too rigid and academic an approach to doing things. Setting up 3 teams for this purpose will be “laughed at”, at least where I’m working. Having 1 person in each team playing all the roles will be counter-productive.

    In large matrix organizations, it is very hard to advocate for such an approach because every business is creating their own data product/mesh for themselves and their needs only. They simply do not care about the users of their data outside their domain. Even when the benefits of sharing/using well-defined, quality data from multiple domains are clear, management is often unsupportive of such initiatives. We may not associate ourselves with tribes but we do carry the tribal mentality in the 21st century.

    Data Mesh (or any other activity on Data) should tackle the mindset of the management who takes the decisions of go/no go. The technology, methodology, team setups all fall apart if no one is interested at upper management.

    I might be going philosophical here but we need “an activity” before proposing Data Mesh or another topic. This activity should feed the upper management to get their buy-in for such initiatives. Questions like what is the value/benefits for me/my team/my business should be addressed. Maybe it will help to avoid the situations where normally business says ok show me X in action before I commit. But creating that X as MVP needs resources from the same business which they will provide after seeing things in action. It quickly gets into a chicken-and-egg problem.

    Again sorry for the long rant, I guess I’m frustrated at not being able to make any headway on my end to do things.

    Some minor questions on the article itself.
    i) Do you suggest any order to create these data teams? I mean start with the Enabling team first and then Stream and lastly Platform. I find it difficult that all 3 will be setup at the same time.

    ii) In my opinion marketing and offers development teams should also be included somewhere. These guys will provide impetus to the work since they know the customer need and market trends better than technical guys.

    Thanks a lot and I’m eagerly waiting for your next article.


  2. Dear Lubna,

    First of all, thanks a lot because as we say in my company “feedback is a gift”.

    I am in a very decentralized organisation (it is in its DNA), so we already have many teams taking care of the same topic. I just want to underline the fact that having one team doing everything (a central team on a central platform) will crystallize many frustrations like “it is too slow”, “we are never in their priorities”, “I need it now because I am going to lose business”.

    About this point, “They simply do not care about the users of their data outside their domain”: you are very right, but if you have the right data domains, you will see that everybody needs each other. To give you an example, let’s imagine that you have the product data domain (all the data related to your products). They can’t really do anything without the data coming from the finance data domain (turnover by product) if they want to develop a use case about better product portfolio management. The best use cases are the ones where you can cross many data sets together, so you have an interest in developing collaboration. It is also going to be less expensive if they use existing datasets.

    About the management mindset, that is true too but between total decentralisation (aka the jungle) and total centralisation (aka the dictator), you do not have too much choice than promoting collective intelligence and collaboration.

    If you want to find an activity before jumping into it, my recommendation would be to find the best potential product owner on the business side, with a big pain where data is needed, create a team with the best scrum master, tech lead and developers you can find, and tell them “your only goal is to fix the pain”. Every 3 weeks, you should publicize the sprint review and see what’s happening. Like in the book Team Topologies, the belief is that cross-functional teams are the only way to solve problems. The team can be small (4-5) and we just let them play for 3 months.

    The right order for me is 1) the platform team, 2) the stream aligned team, 3) the Enabling team. The rationale is that you need to prepare the apartment before someone moves in; then you can create your stream aligned team, and when you have several stream aligned teams, it becomes urgent to have the enabling team.

    A data product owner with a strong knowledge about the customers and the market trends is the best option. If they are too busy, it is important that the product owner has this kind of network.

    Stay safe,
    Francois
