1. Size does matter
2. It is about Engineering
Data engineers are now everywhere; it is an obvious role in any data team. But at the end of the 2010s, nobody was talking about it, because everything was based on commercial software.
You bought a piece of software, installed it, configured it and then deployed it: nothing was “engineered”, and skills were focused on knowledge of the software itself.
In an “everything as code” world in the cloud, everything is now centered on data engineering.
The name “Facebook Engineering” itself offered a new way of looking at things.
3. How much is it? Free? Really? Open what?
I still remember my “favorite” database vendor telling me: “If it’s free, it is not good. People are not stupid enough to pay for something that could be free.”
Being open is also the best way to recruit your next talent. Anyone on this planet can learn the project and start to show their skills by contributing to it or challenging it.
There are still valuable commercial solutions in the data world today, but at least half of your data platform should be based on open source packaged in a cloud service.
4. In SQL we trust
5. Not the same business users
For my users, a drag-and-drop tool was already too hard: a one-week training course was necessary to use it. So discovering that business users at Facebook could write SQL queries was like going to Mars and finding alien business users. Facebook was not an isolated case; I would later meet many other organisations like this (mainly GAFA, data startups, etc.).
It is the same as with the vendor and open source: you will always have people telling you “never, you are dreaming”.
11 years later, the Facebook data warehouse looks like a good piece for a museum of information technology. You can see how fast things have changed. But the principles are still valid: open source, scalability, unlimited size at a cheap price, SQL and tech-savvy business users.
4 thoughts on “Facebook DataWarehouse (maybe the most important article for my professionnal life – reboot 2020 – 10 years later)”
Hello, so we can speak of server virtualization and of virtual servers. Nathalie Mollet
Yes, the "real" virtualization: the kind where, overall, a single server is made up of a multitude of physical machines, and not the kind where several virtual servers run on a single physical machine.
Love your point on business users learning SQL. I was trained in it for my first job as a marketing analyst and still use it many, many years later.
I have pestered my children to learn only one language: SQL!