Data Cleansing
Are there any preferred approaches or tips to follow when cleansing data?
As a first step, data profiling makes sense. Public profiling libraries, e.g. ydata-profiling and Great Expectations, can assist here, after which you will have an idea of how the data is stored and used.
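To make the profiling step concrete, here is a rough sketch of the kind of per-column summary I mean. It is plain Python and does not use the ydata-profiling or Great Expectations APIs; `profile_column` and the metric names are hypothetical, chosen only for illustration.

```python
def profile_column(values):
    """Return a few simple quality metrics for one column of values."""
    non_null = [v for v in values if v is not None]
    texts = [v for v in non_null if isinstance(v, str)]
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        # Text values containing at least one non-ASCII character.
        "non_ascii": sum(1 for t in texts if not t.isascii()),
        # Text values containing two or more spaces in a row.
        "repeated_spaces": sum(1 for t in texts if "  " in t),
        "max_length": max((len(t) for t in texts), default=0),
    }


# Made-up sample values, for illustration only.
sample = ["Café", "a  b", None, "a  b"]
print(profile_column(sample))
```

Counts like these would show which of the basic cleansing operations a column actually needs before any rules are written.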
Apart from the basic cleansing operations, e.g. removing special and non-ASCII characters from text fields and removing runs of more than two consecutive spaces, what else would you suggest? I guess this depends on the data owners, whose requirements could vary.
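For reference, the basic text cleansing I have in mind looks roughly like the sketch below. The function name `clean_text` and the allow-list of "non-special" characters are my own assumptions, and the sketch collapses any run of repeated spaces into a single space; the pattern would need adjusting if two consecutive spaces should be allowed.

```python
import re

# Anything outside the 7-bit ASCII range.
_NON_ASCII = re.compile(r"[^\x00-\x7F]")
# Assumed allow-list: letters, digits, space and a little punctuation.
_SPECIAL = re.compile(r"[^A-Za-z0-9 .,'-]")
# Two or more spaces in a row.
_SPACES = re.compile(r" {2,}")


def clean_text(value):
    """Apply the basic text cleansing rules to a single field value."""
    if value is None:
        return None
    text = _NON_ASCII.sub("", value)   # drop non-ASCII characters
    text = _SPECIAL.sub("", text)      # drop characters not in the allow-list
    text = _SPACES.sub(" ", text)      # collapse repeated spaces
    return text.strip()


print(clean_text("Café  & Co.   Ltd"))
```

Note that dropping non-ASCII characters loses information (the "é" above simply disappears), which is one reason the rules probably need sign-off from the data owners.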
More granular cleansing operations would require mappings from a source value to a target value. These could be stored in a config table and read during notebook processing when moving data from bronze to silver.
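A minimal sketch of the mapping idea, assuming the config table has `column`, `from_value` and `to_value` fields (hypothetical names). In a Fabric notebook the rows would come from a table and the replacement would more likely be done with Spark; plain Python is used here only to show the logic.

```python
# Hypothetical rows, shaped as they might be read from a config table.
MAPPING_ROWS = [
    {"column": "country", "from_value": "UK", "to_value": "United Kingdom"},
    {"column": "country", "from_value": "U.K.", "to_value": "United Kingdom"},
    {"column": "status", "from_value": "Y", "to_value": "Active"},
]


def build_lookup(mapping_rows):
    """Group the mapping rows into {column: {from_value: to_value}}."""
    lookup = {}
    for row in mapping_rows:
        lookup.setdefault(row["column"], {})[row["from_value"]] = row["to_value"]
    return lookup


def apply_mappings(record, lookup):
    """Return a copy of the record with mapped values replaced."""
    cleaned = dict(record)
    for column, mapping in lookup.items():
        if column in cleaned:
            # Values without a mapping are left unchanged.
            cleaned[column] = mapping.get(cleaned[column], cleaned[column])
    return cleaned


lookup = build_lookup(MAPPING_ROWS)
print(apply_mappings({"country": "UK", "status": "N", "name": "x"}, lookup))
```

Keeping the mappings as data means a new from/to pair is a row insert rather than a notebook change.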
Are there any other comments, steps, or tips to consider as part of a best-practice approach?
Zach Bester
Learn Microsoft Fabric
skool.com/microsoft-fabric