I have over 26 Excel files, each named in the format DPI-YYYY-MM-DD (e.g., DPI-2023-01-06, DPI-2023-01-20, DPI-2023-02-03, etc.). Each file contains seven sheets (Sheet 1 to Sheet 7), and I need to load them into a Data Warehouse in MS Fabric. What would be the most efficient and ideal approach to accomplish this? Any suggestions?
It really depends on your skill set and the maintenance effort required. For a one-time task, you could upload the files into the lakehouse, use a notebook for the transformation, and then load the transformed data into both the lakehouse and the warehouse (a rough sketch follows below). However, if this is an ongoing project where files are uploaded to SharePoint monthly, you could use Dataflow Gen 2 to load the data into the warehouse, with the process orchestrated by a pipeline and a scheduled refresh.
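For the one-time notebook route, here's a minimal sketch, assuming the Excel files have been uploaded to the lakehouse under Files/dpi, all seven sheets share the same schema, and the pre-defined `spark` session of a Fabric notebook is available. The folder path and the `dpi_staging` table name are hypothetical; adjust them to your setup.

```python
import os
import pandas as pd

# Assumed upload location inside the default lakehouse (hypothetical path)
SOURCE_DIR = "/lakehouse/default/Files/dpi"
frames = []

for file_name in os.listdir(SOURCE_DIR):
    # Only pick up files matching the DPI-YYYY-MM-DD naming pattern
    if not (file_name.startswith("DPI-") and file_name.endswith(".xlsx")):
        continue
    # sheet_name=None loads all sheets as a dict of {sheet_name: DataFrame}
    sheets = pd.read_excel(os.path.join(SOURCE_DIR, file_name), sheet_name=None)
    for sheet_name, df in sheets.items():
        df["source_file"] = file_name    # keep lineage for auditing
        df["source_sheet"] = sheet_name
        frames.append(df)

# Works only if every sheet really does share one schema
combined = pd.concat(frames, ignore_index=True)

# Write to a lakehouse Delta table; from there the data can be loaded into
# the warehouse (e.g., with COPY INTO or a cross-database query).
spark.createDataFrame(combined).write.mode("overwrite").saveAsTable("dpi_staging")
```

The lineage columns make it easy to reprocess or validate a single file later without reloading all 26+ of them.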
Hi everyone, now that we have CI/CD integration in place, has anyone tested it and figured out how to implement deployment rules, or an alternative way to switch sources between stages? Stephane
I encountered a challenge while deploying a semantic model connected to Dataflow Gen 2 (DFG2) from the development environment to another environment. It seems like the auto-binding issue is still there.
Hi all, what I'm trying to do here is connect my semantic model directly to Dataflow Gen 2 (CI/CD) instead of ingesting the data via the warehouse. The reason is that I'd like to experiment with version control for the dataflow, and I don't really need the warehouse or lakehouse solution. However, I'm encountering this issue: "When deploying the items below, any related dataflow must be included in the deployment or must already exist in the target folder", even though the Dataflow Gen 2 has already been deployed to the QA/Prod environments. On top of that, as of today I'm unable to connect to the Dataflow Gen 2 (CI/CD) via Power BI Desktop. 😂
Leave a comment with the scariest thing you can say to a Microsoft Fabric Data Engineer / Data Analyst / Data Scientist / Fabric Security Engineer 👀 Scariest saying wins 🎃
A lot of changes are coming to the certification from November onwards, and a few topics have been removed: PySpark is gone, but KQL has been introduced. https://learn.microsoft.com/en-us/credentials/certifications/resources/study-guides/dp-600
The removal of PySpark from DP-600 is due to the introduction of the new DP-700 certification, where PySpark is more relevant given its data engineering focus.
Hi BI Guru! Based on the attached image, and in your experience, which option (A or B) would be the best choice for implementing a Fabric Data Lake and Data Warehouse, and why? Thanks!