Activity
[Contribution activity heatmap, Nov–Sep]

Memberships

Learn Microsoft Fabric

12.8k members • Free

Fabric Dojo 织物

341 members • $447/y

28 contributions to Learn Microsoft Fabric
Extracting ~5m rows from D365 F&O to Fabric.
Hi, I have a table containing around 5m rows, which I connect to through an OData feed. I need help with the following: when I connect to it with a Dataflow Gen2, the refresh takes around 7-8 hours and then fails due to the timeout. When I tried connecting through a data pipeline instead, I created a cloud gateway following Microsoft's "Set up your OData connection" documentation, set the authentication method to OAuth2, and entered the same organizational credentials I used in my Dataflow Gen2. But when I tried connecting in my pipeline, I could not find that newly created connection gateway. How can I achieve a successful connection?
1 like • May 30
Have you tried Fabric Link? D365 F&O is on Dataverse, so if you're trying to get base tables, you're fine. The problem is that most people work with entities, which are not supported with Fabric Link. You can recreate the entities in the Warehouse if you have the definitions, but I recommend just working off of the base tables when you can. It's an hourly trickle-feed shortcut that you don't have to bother ETLing. I will warn you that it requires manual refreshing in two places every 60 days or so ... ask your MS account manager for the details. https://learn.microsoft.com/en-us/power-apps/maker/data-platform/azure-synapse-link-view-in-fabric
Stored Proc vs. PySpark Notebook
My company is transitioning to Fabric from Azure Synapse. Currently, our data team is debating whether to use PySpark notebooks or SQL stored procs for our data transformations. Our ultimate goal is to be platform agnostic and more future-proof, meaning that if we ever want to switch to another data platform like AWS, GCP, or even Snowflake, the transition would be much easier, like a lift-and-shift migration. What would you recommend?
0 likes • May 8
@Piotr Prussak I think the issue with this "AI can fix it" perspective is that it doesn't consider the complexities of translating from one language to another. For example, if you're using correlated subqueries in your T-SQL sprocs, good luck translating that to Spark SQL, even with AI. It still requires quite a bit of guidance and testing.
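A minimal sketch of the kind of rewrite that's usually needed, assuming hypothetical Customer and Sales tables already registered in the lakehouse: the T-SQL correlated scalar subquery is shown in the comment, followed by a pre-aggregate-and-join version that Spark SQL handles comfortably.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Original T-SQL pattern (correlated scalar subquery in the SELECT list):
#   SELECT c.CustomerID,
#          (SELECT MAX(s.OrderDate)
#             FROM Sales s
#            WHERE s.CustomerID = c.CustomerID) AS LastOrder
#   FROM Customer c;
#
# A common Spark SQL rewrite: aggregate once, then join.
last_orders = spark.sql("""
    SELECT c.CustomerID, s.LastOrder
    FROM Customer c
    LEFT JOIN (
        SELECT CustomerID, MAX(OrderDate) AS LastOrder
        FROM Sales
        GROUP BY CustomerID
    ) s
      ON s.CustomerID = c.CustomerID
""")
last_orders.show()

AI can draft something like this, but verifying that the join reproduces the subquery's row-by-row semantics is exactly the guidance and testing part.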
1 like • May 8
@Lori Keller As usual, I think Lori is spot on here. If the team's skillset is TSQL and they don't (and won't) have the bandwidth to learn PySpark and the nuances between TSQL and Spark SQL, stick with the sprocs. However, if there's room for and interest in learning PySpark, DO IT! It forces you to get back to basics and really evaluate what you're doing. Having the flexibility to do either notebooks or sprocs is a major advantage. There are also a lot of really great functions in PySpark that simply don't exist in TSQL. Deduping is incredibly easy in PySpark. Lateral column aliasing is SUPER convenient. explode is also a really nice PySpark function.
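A minimal sketch of the two PySpark conveniences mentioned above, deduping and explode, on a small hypothetical DataFrame (lateral column aliasing is the Spark SQL feature that lets you write SELECT amount * 2 AS doubled, doubled + 1 AS plus_one within a single SELECT):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [(1, "2024-01-02", ["red", "blue"]),
     (1, "2024-01-01", ["red"]),
     (2, "2024-01-01", ["green"])],
    ["id", "updated_at", "tags"],
)

# Dedupe: one row per id (dropDuplicates keeps an arbitrary row per key;
# use a window ordered by updated_at if you need the latest row instead).
deduped = df.dropDuplicates(["id"])

# explode: one output row per element of the tags array.
exploded = df.select("id", F.explode("tags").alias("tag"))

deduped.show()
exploded.show()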
Quick question - delta tables
I am working on a function to clean and optimize multiple Delta tables in a lakehouse. Do we OPTIMIZE before VACUUM, or the other way around? Which is the best approach?
3 likes • Feb 18
Typically optimize, then vacuum. I've never heard of anyone doing it in reverse. Optimize compacts the small files into larger ones, and vacuum then removes the older, now-unreferenced files. If you vacuum before you optimize, the small files are still the current files, so there's nothing for vacuum to clean up yet.
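A minimal sketch of that order in a notebook, assuming a hypothetical list of lakehouse table names (the default VACUUM retention of 7 days applies unless you override it):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

tables = ["dim_customer", "fact_sales"]  # hypothetical table names

for t in tables:
    # Compact the small files into larger ones first...
    spark.sql(f"OPTIMIZE {t}")
    # ...then remove the old, now-unreferenced files.
    spark.sql(f"VACUUM {t}")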
DAX Help - Trailing average for the previous 12 weeks
Our team needs a little DAX help from any experts out there... We are looking to do, for each row, a trailing average over the previous 12 weeks for a metric lifts per hour = successful lifts / time on clock. I attached an image of the grid and put in what our DAX query looks like at the moment.

Code:
Trailing_12_Week_Avg_Lifts_Per_Hour =
VAR CurrentDate = MAX ( dim_date[event_date] )
VAR PreviousDate = CurrentDate - 84
VAR Result =
    CALCULATE (
        [lifts_per_hour],
        DATESINPERIOD ( dim_date[event_date], MAX ( dim_date[event_date] ), -84, DAY )
    )
RETURN
    Result

Any assistance would be appreciated!
0 likes • Feb 13
@Justin Sweet Can you provide the DAX for your [lifts_per_hour]? If it's not something like lifts_per_hour = DIVIDE(SUM('Table'[Lifts]), SUM('Table'[Time])), that might be the issue. The other possibility is that you actually want to average each individual day's value rather than take the true average (total lifts / total time), in which case you'd need to be using AVERAGEX in your Trailing_12_Week_Avg_Lifts_Per_Hour. See the sketch below.
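A sketch of the two variants, assuming the 'Table'[Lifts], 'Table'[Time], and dim_date[event_date] columns from the thread; the first returns the true average (total lifts over total time across the 84-day window), the second averages each day's ratio with AVERAGEX:

Code:
Trailing_12_Week_LPH_TrueAvg =
CALCULATE (
    DIVIDE ( SUM ( 'Table'[Lifts] ), SUM ( 'Table'[Time] ) ),
    DATESINPERIOD ( dim_date[event_date], MAX ( dim_date[event_date] ), -84, DAY )
)

Trailing_12_Week_LPH_AvgOfDays =
AVERAGEX (
    DATESINPERIOD ( dim_date[event_date], MAX ( dim_date[event_date] ), -84, DAY ),
    [lifts_per_hour]
)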
ERROR: FAILED TO RENDER CONTENT
So I have been trying to ingest over 7 million rows from an API into a lakehouse table using a notebook, and due to rate limiting it has taken quite a while (longer than expected). Two challenges: 1. I get a "Failed to render content" screen whenever I try to access the pipeline while it runs. 2. The pipeline times out after 12 hours. Is this normal?
2 likes • Feb 12
@Wilfred Kihara Can you provide a little more context? What SKU are you running on? 7 million rows shouldn't be enough to challenge any SKU from F8 up. You mention both a notebook and a pipeline. Are you running the notebook from the pipeline? If so, have you successfully tested the notebook outside the pipeline? And are you passing parameters into the notebook?
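On the parameters point, a minimal sketch of the usual pattern, assuming hypothetical parameter names: the notebook declares defaults in the cell designated as the parameter cell, and the pipeline's Notebook activity overrides them with its base parameters at run time.

# Parameters cell (designated as the parameter cell in the notebook UI).
# The pipeline's Notebook activity base parameters override these defaults.
api_base_url = "https://example.com/api"  # hypothetical
page_size = 1000                          # hypothetical
max_runtime_minutes = 600                 # hypothetical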
Anthony Kain
4
79 points to level up
@anthony-kain-6916
Senior Data Consultant focused on enterprise data modernization. Connect with me on LinkedIn: https://www.linkedin.com/in/tonykain/

Active 8d ago
Joined Sep 9, 2024
INTJ
St. Louis, MO