Facebook DataWarehouse (maybe the most important article for my professional life – reboot 2020 – 10 years later)

This 2009 article totally changed my professional life, and surprisingly it is still online in 2020. More than 10 years later, everything in it looks obvious, but at the time it was like a UFO to me, and I read it again and again to understand each concept. Let me come back to the five most striking points.

1. Size does matter

In 2011, a petabyte-scale data warehouse was out of my… scale, with my 60 TB data warehouse (at that time one of the biggest in France, and the telecom industry is known to be super data-rich). Today it is still a strong maturity milestone in your data journey, but it no longer looks impossible with cloud technologies. More importantly, the resources needed to run it are nearly zero if you go serverless.

2. It is about Engineering

Data engineers are now everywhere: it is an obvious role in a data team. But at the end of the 2000s, nobody was talking about it, because everything was based on commercial software.

You bought software, installed it, configured it and then deployed it: nothing was “engineered”, and skills were focused on knowledge of the software.

In an “everything as code” cloud world, everything is now data engineering centric.

The “Facebook Engineering” label itself suggested a new way of looking at things.

3. How much is it? Free? Really? Open what?

I still remember my “favorite” database vendor telling me: “If it’s free, it is not good. People are not stupid enough to pay for something that could be free.”

In fact, it is not free: Silicon Valley tech companies pooled their best talent to build something they could never have built individually. Open source is not based on charity. It is the belief that a community will outperform any software vendor. If Facebook dedicates 20 of its best engineers, and eBay, Yahoo or Google do the same, the result will be a valuable investment.

Being open is also the best way to recruit your next talent. Anyone on the planet can learn the project and show their skills by contributing to it or challenging it.

There are still valuable commercial solutions in the data world today, but at least half of your data platform should be based on open source packaged as a cloud service.

4. In SQL we trust

Strangely, the arrival of big data put SQL (a 1970s language!) at the center of everything. I remember learning SQL as a student, and at the time it felt like a second-class language compared to others, with a limited grammar. I was totally wrong: it is very powerful and easy to read!
I have been annoying my kids with only one language to learn: SQL.

5. Not the same business users

For my users, a drag-and-drop tool was already too hard… A one-week training course was necessary to use it. So discovering that business users at Facebook could write SQL queries was like going to Mars and finding alien business users. Facebook was not an isolated case: I later met many other organisations like this (mainly GAFA, data startups, etc.).

As with my vendor and open source, you will always find people telling you: “Never, you are dreaming.”

Conclusion

11 years later, the Facebook data warehouse looks like a fine exhibit for a museum of information technology, which shows how fast things have changed. But the principles are still valid: open source, scalability, unlimited size at a low price, SQL and tech-savvy business users.

A 100% serverless data warehouse with a pay-as-you-go model is just the upgraded version of what Facebook did.

4 thoughts on “Facebook DataWarehouse (maybe the most important article for my professional life – reboot 2020 – 10 years later)”

  1. Yes, the “real” virtualisation: the kind where, overall, one server is made up of a multitude of physical machines, and not the kind where several virtual servers run on a single physical machine.

  2. Love your point on business users learning SQL. I was trained in it for my first job as a marketing analyst and still use it many, many years later.
