Activity

[Contribution activity heatmap, November through September]

Memberships

Learn Microsoft Fabric

12.8k members • Free

2 contributions to Learn Microsoft Fabric
Efficient Fabric storage
What is the most efficient way of storing data in the lakehouse, considering future reruns of the process and recovery from a point of failure? Ingest data --> write as Parquet (backup file) --> write to a Delta table, OR ingest data --> write directly to a Delta table?
0 likes • May 22
@Sambhav R, thanks for your input. I believe we may need to keep history for at least a month, so that means it is still safer to write it out to Parquet first, right?
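For reference, a minimal sketch of the backup-then-Delta pattern discussed in this thread, assuming a PySpark notebook over the lakehouse; the source file, backup path, and table name are hypothetical, not from the original post:

# Minimal sketch: land a Parquet backup first, then load the Delta table.
# Paths and names below are assumptions for illustration only.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# 1. Ingest the raw data (hypothetical source file).
raw_df = spark.read.option("header", True).csv("Files/landing/sales_2025.csv")

# 2. Write a dated Parquet backup so this run can be replayed later.
backup_path = "Files/backup/sales/2025-05-22"
raw_df.write.mode("overwrite").parquet(backup_path)

# 3. Load the Delta table from the backup; a failed load can be retried
#    from this step without re-ingesting from the source.
(spark.read.parquet(backup_path)
    .write.format("delta")
    .mode("append")
    .saveAsTable("sales_bronze"))

With the Parquet copy dated per run, history can be retained for the month discussed above and a failed or incorrect Delta load can be replayed from step 3 alone.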
Stored Proc vs. PySpark Notebook
My company is transitioning to Fabric from Azure Synapse. Currently, our data team is debating between PySpark notebooks and SQL stored procs for our data transformations. Our ultimate goal is to be platform agnostic and more future-proof, meaning that if we ever want to switch to another data platform like AWS, GCP, or even Snowflake, the transition would be much easier, more like a lift-and-shift migration. What would you recommend?
1 like • Apr 25
@Piotr Prussak, thanks for your input. Ahh right, I did not consider external API or even SFTP integration. Good point on that. Agreed that SQL is a bit generic, and if we end up using SQL it should be standardized on a dialect that is broadly supported, so I guess stick with ANSI SQL coding standards to stay cross-platform?
0 likes • Apr 29
@Lori Keller, thank you for sharing your actual experience around this decision point; I appreciate it. I share the same sentiment of wanting to learn something new with notebooks and all the capabilities they bring to the table, especially for data transformation processes, given there is some wiggle room with time. The team is now leaning towards PySpark notebooks + SQL, to leverage the power and efficiency of notebooks while not totally rewriting established SQL transformations into Python, so we do not spend extra time on that conversion. While tinkering around Fabric workspaces, I noticed it takes around 2.5 minutes to spin up a notebook when you just want a simple DML statement against a table, but once the Spark cluster is up, processes execute instantaneously.
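As a rough illustration of the notebooks + SQL approach mentioned above, here is a minimal sketch assuming a PySpark notebook attached to the lakehouse; the table and column names are hypothetical. Existing SQL transformations can run unchanged through spark.sql(), so only the orchestration moves to Python:

# Minimal sketch: reuse an existing SQL transformation inside a PySpark notebook.
# Table and column names below are hypothetical, for illustration only.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Run the established ANSI-style SQL as-is; no rewrite into DataFrame code needed.
transformed_df = spark.sql("""
    SELECT customer_id,
           SUM(amount) AS total_amount
    FROM sales_bronze
    GROUP BY customer_id
""")

# Persist the result as a Delta table in the lakehouse.
transformed_df.write.format("delta").mode("overwrite").saveAsTable("sales_silver")

Keeping the transformation logic in ANSI SQL while only the surrounding orchestration is Python also keeps the lift-and-shift option open if the platform changes later.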
Dexter Wagang
Level 2 • 12 points to level up
@dexter-wagang-5380
Data Architect

Active 126d ago
Joined Apr 21, 2025