Explaining and Exploring Microsoft Fabric

Exploring Microsoft's end-to-end analysis stack as a rookie analyst


Disclaimer: I am a Microsoft employee. This post represents my personal understanding and thoughts on the Microsoft Fabric reveal, and all my information is gathered from public sources linked throughout the post.

Introduction

In late May, during its annual developer conference, Microsoft Build, Microsoft announced Microsoft Fabric, a brand new product that combines the low-code analytics engine Power BI with many of Microsoft's existing data platform solutions (namely Synapse Analytics) to provide a single SaaS product that can cover an entire analytics workload end to end.

Microsoft Fabric Reveal

At first glance, even as a Microsoft employee, this product didn't seem terribly interesting to me - existing products bundled into a convenient package make for a great sales pitch, but I wasn't clamouring to try it out as I'd used tools like Synapse Analytics and Power BI on their own before.

It wasn't until I started to see videos from the likes of Guy in a Cube and Pragmatic Works (if you somehow don't already watch these two channels, make sure you check them out) that I began to realise the significance of Fabric, particularly for organisations that may not have access to large-scale data infrastructure, or those that have already invested heavily in existing solutions for storing data outside of the Microsoft ecosystem.

I'm hoping to use this blog post to collect my thoughts on just how cool this new product is, from the perspective of someone new to the world of data analysis and engineering.

The Pitch

So here is the elevator pitch: Microsoft Fabric is a single solution that allows organisations to easily pull in relevant business data from all locations (AWS, Google, Azure) and store it in one centralised location for all forms of analysis.

Put simply, Microsoft Fabric allows organisations to create a "source of truth": one location where data is not only accessible to both business users and analysts, but can also be transformed, queried, and automated - all on the web. The secret sauce that powers the integration between end-user business intelligence, analytical workloads, and data storage is a data lake solution that Microsoft is calling OneLake.

Diagram showing the function and structure of OneLake.

The OneLake architecture

OneLake is a unified data storage location for all business data. Whether your business data is structured into tables, semi-structured through loosely defined schemas, or even unstructured, your OneLake can store this data through a system of Data Warehouses and a more modern approach, the Lakehouse. I intend to dive a little deeper into exactly what Data Warehouses, Lakes, and Lakehouses are in a future article, but suffice it to say if you have data, it'll fit somewhere in Microsoft Fabric's structure.

Once data is processed and stored in your OneLake, the power of Microsoft's analytical stack is unlocked for all users in your business. Fabric is integrated directly with almost every analytical tool an organisation could want. Users exploring machine learning and data science workloads can easily connect to business data to build machine learning models with one click using Synapse Data Science, analysts can quickly query data using the SQL endpoint integrated into every Lakehouse, and business users can connect directly to pre-made Power BI datasets for quick analysis in a visual environment.

A screenshot of the "Create New" page inside of Fabric, detailing every item that can be created within the platform

The incredible number of options under the "New" button in a Fabric Workspace

Use Cases

It can get really overwhelming to try and comprehend exactly how all of these moving pieces come together to be beneficial to a business, particularly if you're new to concepts that might traditionally fall under the discipline of data engineering like data preparation and warehousing. Here are a few quick examples that have come up in my recent conversations with colleagues and customers:

Historical Data

I can't tell you how many times I've encountered this problem: an organisation wishes to use Power BI to analyse historical data from their sales tools, but the tool itself only keeps historical data for a finite period (usually to save on server space in the case of SaaS products).

Fabric allows any IT worker in an organisation to quickly create a Lakehouse directly from the web, and then a Data Pipeline and/or Dataflow to append historical data for analysis directly into OneLake. This means that even if the data is erased from its source, it is still available for analysis inside our "source of truth", OneLake. Business users can connect directly to the pre-created Power BI dataset that comes with every Lakehouse and begin analysis right away.
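To make the append idea concrete, here is a minimal sketch of the pattern in plain Python. It is not Fabric code: I'm using an in-memory SQLite database as a stand-in for a Lakehouse table, and the table name sales_history, its columns, and the append_snapshot helper are all hypothetical names I made up for illustration. The point is simply that each scheduled load adds whatever the source currently exposes, and never deletes what was stored earlier.

```python
import sqlite3


def append_snapshot(conn, rows):
    """Append rows from the source system, skipping any already stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_history ("
        "order_id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)"
    )
    # INSERT OR IGNORE leaves existing rows untouched, so re-running a load is safe.
    conn.executemany("INSERT OR IGNORE INTO sales_history VALUES (?, ?, ?)", rows)
    conn.commit()


# In-memory SQLite database as a stand-in for the Lakehouse table (an assumption).
conn = sqlite3.connect(":memory:")

# First load: the source tool currently exposes orders 1 and 2.
append_snapshot(conn, [(1, "2023-01-05", 120.0), (2, "2023-02-11", 80.0)])

# Later load: the source has purged order 1 and added order 3.
append_snapshot(conn, [(2, "2023-02-11", 80.0), (3, "2023-03-02", 45.5)])

history_ids = [
    row[0]
    for row in conn.execute("SELECT order_id FROM sales_history ORDER BY order_id")
]
print(history_ids)  # [1, 2, 3] - order 1 survives even though the source dropped it
```

In Fabric itself, the scheduling and the append would be handled by the pipeline or dataflow rather than hand-written code, but the shape of the logic is the same.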

Unstructured Data

It's no secret that the modern data stack of most organisations can be a little mismatched. Depending on the size and age of an organisation, data that is used for analysis can sometimes come from vastly different sources. An organisation may be able to connect directly to a SQL database to ingest sales data, while their HR system may only export data in the form of CSV files.

This mismatch of formats and structure can cause great difficulty when trying to combine and analyse critical business data from multiple sources, and this is where the Lakehouse in Fabric shines. A Lakehouse can house traditional structured data in the form of tables, while also being able to store data in file formats like JSON, CSV, XML, and Parquet. A Lakehouse can even have files uploaded directly via the web, meaning that you can simply upload a static CSV file and have it queryable right alongside traditional data tables.
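As a rough illustration of what "queryable right alongside traditional data tables" means in practice, here is a small sketch in plain Python. Again, this is not the Fabric API: SQLite stands in for the Lakehouse, the CSV content is an inline string rather than an uploaded file, and the sales and hr_staff tables are hypothetical examples of my own.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the Lakehouse (an assumption)

# Structured data, as if ingested from a SQL database.
conn.execute("CREATE TABLE sales (employee_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)", [(1, 200.0), (2, 150.0), (1, 50.0)]
)

# A static CSV export, as if uploaded from an HR system.
hr_csv = io.StringIO("employee_id,name\n1,Ana\n2,Ben\n")
conn.execute("CREATE TABLE hr_staff (employee_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO hr_staff VALUES (?, ?)",
    [(int(row["employee_id"]), row["name"]) for row in csv.DictReader(hr_csv)],
)

# Once both live in one place, a single query can combine them.
totals = conn.execute(
    "SELECT h.name, SUM(s.amount) FROM sales s "
    "JOIN hr_staff h ON h.employee_id = s.employee_id "
    "GROUP BY h.name ORDER BY h.name"
).fetchall()
print(totals)  # [('Ana', 250.0), ('Ben', 150.0)]
```

The value of the Lakehouse is that this "load the file, then join it" step doesn't need a separate database or any infrastructure of your own.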

Processing offload

Both of the use cases above are general examples of how businesses could solve common issues with Microsoft Fabric, but what about existing Microsoft developers? As a Power BI developer myself, my first thought when seeing what Fabric can do was: "I can already do most of this in Power BI", but I was missing something big: Performance.

Recently I was asked to perform a Power Query data transformation for a Power BI report that was relatively simple, but it needed to be processed across tens of thousands of rows. The time and processing power it would have taken to perform this every single time my dataset was refreshed meant that this report would likely not be viable. For a business-critical report, there may have been justification to take the time (and cost) to set up a separate data warehouse and to perform the data transformation before the data was stored. In many cases, however, the cost and labour required to set up this type of workflow simply would not be worth the result.

Using Fabric, I could set up a Dataflow Gen2 directly from the browser to quickly ingest and transform this data, storing it all directly inside of a warehouse with a Power BI dataset pre-created for my analysis. No special setup is required, and I could easily paste my Power Query over to the dataflow itself, meaning I didn't have to rewrite my entire solution.
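The performance win comes from where the work happens, and a tiny sketch may help show it. This is plain Python rather than Power Query or Fabric code, and clean_row, refresh_report, and the sample rows are hypothetical stand-ins of my own. The transformation runs once when the data is ingested, and every later report refresh only reads the stored result.

```python
transform_calls = 0  # counts how often the expensive step actually runs


def clean_row(row):
    """Hypothetical per-row transformation (stand-in for the Power Query step)."""
    global transform_calls
    transform_calls += 1
    return {"customer": row["customer"].strip().title(), "amount": row["amount"]}


raw_rows = [
    {"customer": "  acme ltd ", "amount": 100.0},
    {"customer": "CONTOSO", "amount": 40.0},
]

# Ingestion: transform once and store the result (the dataflow's job).
stored_rows = [clean_row(row) for row in raw_rows]


def refresh_report(rows):
    """A report refresh only reads stored rows; no transformation runs here."""
    return sum(row["amount"] for row in rows)


# Three refreshes later, the transformation has still only run once per row.
totals = [refresh_report(stored_rows) for _ in range(3)]
print(transform_calls, totals)
```

Without the offload, the equivalent of clean_row would run on every row at every refresh, which is exactly the cost that made my original report impractical.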

These are just a few basic use cases I've encountered in the last few weeks of learning the platform. I have no doubt we will begin to see amazing stories of how organisations use Fabric once general availability begins.

Getting Started

If you're reading this article and you're thinking that you'd like to give Fabric a try, consider heading over to Microsoft Learn to get information on starting a trial today. I'm currently exploring the free trial myself, and it's worth noting that Fabric is still in preview, meaning you will likely run into bugs and unexpected behaviours. If you encounter any issues, or you want to join the discussion on how this tool could be made better, head over to the Fabric Ideas Forum to leave your thoughts.

As a user with ample experience developing and consuming analysis in the Power BI platform, I found the Fabric interface incredibly easy to follow, and I had a Python notebook querying data directly from a Lakehouse within my first few hours. If you are new to the Microsoft data ecosystem, however, it's worth following a tutorial relevant to your discipline. Microsoft have collated a few end-to-end articles to follow.

Conclusion

Whether you're a seasoned data professional or just a beginner like me, Microsoft Fabric makes it hard to not get excited about the future of analytics tools. Advancements like Fabric allow data analysts, data scientists, and even business users to spend less time worrying about complicated infrastructure, and more time on discovering actionable insights to improve their business.

Keep your eyes peeled for more articles coming up as I dive even deeper into the world of the Microsoft analytics stack. Consider following me on Twitter, where I post updates on what I'm learning, where I'm succeeding, and where I'm failing in my journey to better understand the world of data.