As we discussed in our webinar, A Data Scientist's Guide to Microsoft's Modern Data Stack, presented by our Microsoft Data expert Faheem Javed, the landscape of Microsoft data analytics is shifting rapidly, moving from the 'Classic' on-premises tools we’ve known for decades to a dynamic, cloud-native ecosystem. Navigating new terminology like 'Lakehouse,' determining when to use Microsoft Fabric versus Azure Databricks, and understanding the role of MLOps can be daunting, even for seasoned professionals.
To help clarify these concepts and assist you in planning your team's upskilling journey, we’ve compiled this FAQ guide with some short video clips from the webinar. It addresses the most pressing questions we encounter regarding architectural decisions, the evolution of the Data Scientist role, and how to modernize your data strategy for the future.
Table of Contents:
- What is the difference between the "Classic" Microsoft BI stack and the "Modern" Data Stack?
- Should I use Microsoft Fabric or Azure Databricks?
- What is a "Lakehouse" and how does it differ from a Data Warehouse?
- Can I perform Data Engineering or ETL without writing code?
- How does a Data Scientist visualize data if standard Power BI visuals aren't enough?
- How do I automate the deployment of Machine Learning models?
- How can I incorporate Generative AI and Large Language Models (LLMs) into my workflows?
- Where does Azure Data Factory (ADF) fit into the modern stack?
- How do I ensure data security and governance across these new tools?
- What is the underlying technology powering these modern analytics platforms?
-
What is the difference between the "Classic" Microsoft BI stack and the "Modern" Data Stack?
The "Classic" stack typically refers to on-premises tools like SQL Server Integration Services (SSIS), Analysis Services (SSAS), and Reporting Services (SSRS).
While these tools were great for simpler scenarios with limited data volume, the "Modern" stack (involving Cloud, Fabric, and Databricks) is designed to handle the "3 Vs":
- Large Volume (terabytes/petabytes)
- High Velocity (streaming data)
- High Variety (structured tables mixed with raw files)
The modern stack moves away from the "BI Developer" title toward specialized roles like Data Engineer and Data Scientist.
Watch this short video on why the classic BI model falls short.
-
Should I use Microsoft Fabric or Azure Databricks?
This is the most common architectural question we face.
- Microsoft Fabric is an all-in-one, SaaS (Software as a Service) analytics platform. It is ideal if you want a complete solution (Data Engineering, Science, and BI) without needing a full-time administrator to manage the infrastructure. This short video on Fabric dives deeper into this option.
- Azure Databricks is often preferred by architects who want deep control over their Spark environment or are already heavily invested in the Databricks ecosystem. It uses the Unity Catalog for governance and supports both structured tables and raw files. Watch this short video from the webinar on Databricks for more information.
-
What is a "Lakehouse" and how does it differ from a Data Warehouse?
In the classic stack, we used Data Warehouses, which store strictly structured data (rows and columns) for reporting. A Lakehouse combines a Data Warehouse with a Data Lake. This allows you to store and process both structured tables and raw files (like CSVs, Parquet, or images) in a single high-speed location. Both Fabric and Databricks rely heavily on Lakehouse architecture to support both BI and Machine Learning workloads.
This video clip from the webinar explains more about Lakehouse vs. Warehouse.
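The idea of one storage location serving both BI and ML can be sketched in miniature. This standard-library example uses a temporary folder as a stand-in for OneLake or ADLS storage, with the Tables/Files split that Fabric Lakehouses use; the file names and data are illustrative.

```python
import csv
import json
import tempfile
from pathlib import Path

# A Lakehouse keeps structured tables and raw files side by side in one
# storage location (a temp folder stands in for OneLake/ADLS here).
lakehouse = Path(tempfile.mkdtemp()) / "sales_lakehouse"
(lakehouse / "Tables").mkdir(parents=True)  # structured zone for BI
(lakehouse / "Files").mkdir()               # raw zone for ML workloads

# Structured data: rows and columns, ready for reporting queries.
with open(lakehouse / "Tables" / "orders.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows([["order_id", "amount"], [1, 120], [2, 75]])

# Raw data: semi-structured files land unchanged, ready for data science.
(lakehouse / "Files" / "clickstream.json").write_text(
    json.dumps([{"user": "a", "event": "view"}])
)

# Both workload types read from the same location.
with open(lakehouse / "Tables" / "orders.csv") as f:
    total = sum(int(row["amount"]) for row in csv.DictReader(f))
print(total)  # 195
```

In a real Lakehouse the Tables zone would hold Delta/Parquet tables rather than CSVs, but the key property is the same: no copying data between a separate warehouse and lake.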
-
Can I perform Data Engineering or ETL without writing code?
Yes. Microsoft recognizes that not everyone is a hardcore programmer.
- In Microsoft Fabric: You can use Dataflows, a no-code/low-code tool based on the same Power Query editor found in Excel and Power BI.
- In Azure Machine Learning: There is an Automated ML (AutoML) feature that lets you train models without writing code, or a designer-based approach where you drag and drop activities.
This short video explains more about building ML models without code.
-
How does a Data Scientist visualize data if standard Power BI visuals aren't enough?
While Power BI has great out-of-the-box visuals, Data Scientists often need specialized plots (like box plots or radar charts). You can use the Python visual within Power BI to write Python code and utilize libraries like matplotlib or seaborn directly in your reports. Alternatively, you can use Notebooks in Fabric or Databricks to visualize data using Python or R before it ever reaches a report.
See how you can use custom visuals in Power BI in this short video.
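To make this concrete, here is a minimal sketch of the kind of script you might paste into a Power BI Python visual. Power BI injects your report data as a pandas DataFrame named `dataset`; since this example runs standalone, we fabricate one with hypothetical columns, and we save the figure to a file instead of relying on Power BI's rendering.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so this sketch runs standalone
import matplotlib.pyplot as plt

# Stand-in for the `dataset` DataFrame Power BI provides automatically.
dataset = pd.DataFrame({
    "Region": ["North"] * 4 + ["South"] * 4,
    "Sales": [10, 12, 11, 30, 8, 9, 7, 25],
})

# A box plot per region -- a visual Power BI does not ship out of the box.
dataset.boxplot(column="Sales", by="Region")
plt.suptitle("")
plt.title("Sales distribution by region")
plt.savefig("sales_boxplot.png")  # inside Power BI the visual renders itself
```

Swapping matplotlib for seaborn works the same way, and the identical code runs unchanged in a Fabric or Databricks notebook.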
-
How do I automate the deployment of Machine Learning models?
We use a concept called MLOps (Machine Learning Operations), which is like DevOps but for data science. This allows you to automatically retrain and deploy your models whenever your source data changes. You can implement lifecycle management using tools available in Azure Machine Learning, Fabric, and Databricks.
This short video explains more about MLOps and governance.
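The retrain-on-data-change loop at the heart of MLOps can be sketched with the standard library alone. Real pipelines would use Azure ML, Fabric, or Databricks jobs and a proper model registry; the function and variable names below are illustrative.

```python
import hashlib
import json
import statistics

def fingerprint(rows):
    """Hash the source data so a change can trigger retraining."""
    return hashlib.sha256(json.dumps(rows, sort_keys=True).encode()).hexdigest()

def train(rows):
    """Stand-in 'model': predict the mean of the target column."""
    return {"prediction": statistics.mean(r["y"] for r in rows)}

registry = {}  # stands in for a model registry / deployment target

def deploy_if_changed(rows):
    """Retrain and 'deploy' only when the source data has changed."""
    fp = fingerprint(rows)
    if registry.get("data_fingerprint") != fp:
        registry.update(model=train(rows), data_fingerprint=fp)
        return True   # retrained and deployed
    return False      # data unchanged, keep the current model

data = [{"y": 10}, {"y": 20}]
assert deploy_if_changed(data) is True    # first run deploys
assert deploy_if_changed(data) is False   # no change, no redeploy
data.append({"y": 30})
assert deploy_if_changed(data) is True    # new data triggers a retrain
```

In production the fingerprint step would be an event or schedule trigger on the data source, and the registry would version each model so you can roll back.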
-
How can I incorporate Generative AI and Large Language Models (LLMs) into my workflows?
As a Data Scientist in the modern stack, you aren't limited to traditional predictive models. You can now integrate Generative AI directly into your pipelines using Prompt Flow within Azure Machine Learning. This tool allows you to connect to Large Language Models (LLMs) such as OpenAI's GPT models, Google's Gemini, or Anthropic's Claude to build intelligent applications. Moving forward, these capabilities are being integrated directly into Microsoft Fabric to create a one-stop solution for all AI workloads.
Watch this short video on Azure ML for working with LLMs.
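The core pattern behind a Prompt Flow pipeline is a templated prompt feeding an LLM step. The sketch below shows that shape in plain Python; `call_llm` is a hypothetical stub, where a real flow would post the prompt to a deployed model endpoint.

```python
# Templated prompt: the flow fills in runtime data before calling the model.
PROMPT_TEMPLATE = (
    "Summarize the following customer feedback in one sentence:\n{feedback}"
)

def call_llm(prompt: str) -> str:
    # Hypothetical stub -- in Azure ML this step would call a deployed
    # LLM endpoint and return the model's completion.
    return f"[LLM summary of {len(prompt)} chars of input]"

def summarize_feedback(feedback: str) -> str:
    """One node of the flow: render the template, then invoke the LLM."""
    prompt = PROMPT_TEMPLATE.format(feedback=feedback)
    return call_llm(prompt)

print(summarize_feedback("The new dashboard is fast but hard to navigate."))
```

Prompt Flow's value is managing this pattern at scale: versioning the templates, chaining multiple LLM and Python steps, and evaluating output quality before deployment.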
-
Where does Azure Data Factory (ADF) fit into the modern stack?
Azure Data Factory is the enterprise-grade "successor" to SSIS in the cloud. While users can clean data themselves using Excel or Power BI (the self-service approach), ADF is designed for centralized, enterprise-wide Data Engineering. It provides a graphical interface to build robust ETL (Extract, Transform, Load) pipelines that move and clean data at scale. It is the best choice if your organization is already utilizing the Azure Cloud Platform and has dedicated administrators to manage the infrastructure.
Learn more about Azure Data Factory in this short video.
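What an ADF pipeline automates can be condensed to a toy Extract-Transform-Load in plain Python. This is not the ADF API, just the shape of the work: the data and cleaning rules are illustrative, and in ADF each function would be a copy activity or data-flow transformation you configure graphically.

```python
def extract():
    # In ADF this is a copy activity pulling rows from a source system.
    return [
        {"name": " Alice ", "amount": "120"},
        {"name": "bob", "amount": "75"},
        {"name": "", "amount": "oops"},   # dirty row
    ]

def transform(rows):
    # Cleaning rules ADF would express as data-flow transformations.
    clean = []
    for r in rows:
        name = r["name"].strip().title()
        if not name or not r["amount"].isdigit():
            continue  # drop rows that fail validation
        clean.append({"name": name, "amount": int(r["amount"])})
    return clean

def load(rows, sink):
    # In ADF this writes the cleaned rows to a warehouse or Lakehouse table.
    sink.extend(rows)

warehouse = []
load(transform(extract()), warehouse)
print(warehouse)  # two clean rows; the dirty row is dropped
```

The point of ADF is running this kind of pipeline centrally, on a schedule, with monitoring and retries, rather than everyone cleaning their own copy of the data in Excel.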
-
How do I ensure data security and governance across these new tools?
With data being accessed by more roles (Data Engineers, Scientists, and Analysts), security is critical. We use Microsoft Purview to implement comprehensive data governance. It allows you to manage your data estate, control who has access to specific datasets, and even implement data masking to protect sensitive information.
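Purview applies masking as a governance policy rather than code, but the effect on a sensitive column resembles this toy function (illustrative only, not the Purview API):

```python
def mask_email(email: str) -> str:
    """Keep the first character and the domain, hide the rest of the
    local part -- one common masking pattern for contact data."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}{'*' * max(len(local) - 1, 0)}@{domain}"

print(mask_email("jane.doe@example.com"))
```

With policy-based masking, an Analyst querying the dataset sees the masked value while a privileged role can see the original, without either party changing their queries.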
-
What is the underlying technology powering these modern analytics platforms?
Both Microsoft Fabric and Azure Databricks are built on top of Apache Spark, a massively parallel processing framework. Spark distributes your data workloads across multiple nodes (computers) running in parallel. This architecture is what allows these platforms to process enormous volumes of data (terabytes or petabytes) at speeds that traditional SQL servers simply cannot match.
Watch this short video on Spark with Databricks.
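Spark's split-process-combine idea can be shown in miniature with the standard library. This sketch uses threads on one machine where Spark uses worker nodes across a cluster, but the partition-then-reduce shape is the same.

```python
from concurrent.futures import ThreadPoolExecutor

def process_partition(partition):
    # Each worker handles its own slice of the data independently,
    # like a Spark task running on one node.
    return sum(x * x for x in partition)

data = list(range(1_000))
# Split the dataset into 4 partitions, one per worker, the way Spark
# partitions data across the cluster.
partitions = [data[i::4] for i in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(process_partition, partitions))
total = sum(partial_sums)  # combine results, like Spark's reduce step
print(total)  # 332833500
```

Because each partition is processed independently, adding more workers (or, in Spark's case, more nodes) scales the work out rather than up, which is what makes terabyte-scale processing feasible.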