Using ChatGPT/LLMs for learning Fabric (be careful!)
I get it, it's an attractive proposition: type any technical question into a chat window and get an instant response. Unfortunately (at the moment), it's not quite as simple as that.

I think we all know that ChatGPT and other large language models (LLMs) can hallucinate, i.e. confidently give you answers that:
- are wrong
- are misleading
- were maybe right 6 months ago, but are now irrelevant or inaccurate.

With Fabric, there are a few factors that increase the likelihood of hallucinations, and you need to be very aware of them:
- Fabric is fast moving: things change weekly and monthly. A feature, method or piece of documentation that was part of the last LLM training run 6 months ago might no longer be relevant, or new features might have superseded previous approaches.
- Fabric is the evolution of previous Microsoft data products. This is good in some ways, but catastrophic for LLMs (and learners relying on LLMs). There is vastly more training data out on the internet for Azure Data Factory, for example, than for Fabric Data Factory. The same goes for Azure Synapse Data Engineering versus Fabric Data Engineering.

And yes, there are similarities in how the old tools work vs the new tools, but you need to be super careful that the LLM generates a response for FABRIC Data Pipelines, rather than Azure Data Factory pipelines, for example. Or that it generates Fabric Data Warehouse compliant T-SQL code, rather than Azure SQL code. This is very difficult to check unless you know how both products work (which most learners and beginners don't!).

I'm not saying don't use LLMs for studying, just that you need to be super careful. I can think of two lower-risk use cases for LLM+Fabric: Spark syntax generation and KQL syntax generation. That's because Spark and KQL are very mature ecosystems, with lots of training data on the internet, and their syntax won't change too much over the months and years.
Fabric Data Warehouse T-SQL code generation is trickier and riskier, because the way the Fabric Data Warehouse works is quite different from a conventional SQL Server (which is what most of the training data will be based on).
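
One cheap sanity check for the old-product problem is to scan an LLM's answer for the names of the predecessor products before you trust it. Here is a rough Python sketch of that idea. The function name `find_legacy_mentions`, the term list (taken from the examples in this post) and the sample answer are all made up for illustration. A clean result only means none of these names appeared, not that the answer is correct for Fabric.

```python
import re
from typing import List

# Predecessor product names mentioned in this post. Illustrative only,
# not an exhaustive list; extend it to suit your own questions.
LEGACY_TERMS = [
    "Azure Data Factory",
    "Azure Synapse",
    "Azure SQL",
]


def find_legacy_mentions(answer: str) -> List[str]:
    """Return the legacy product names that appear in an LLM answer."""
    found = []
    for term in LEGACY_TERMS:
        # Match case-insensitively and tolerate extra whitespace between words.
        pattern = r"\s+".join(re.escape(word) for word in term.split())
        if re.search(pattern, answer, flags=re.IGNORECASE):
            found.append(term)
    return found


# Hypothetical LLM answer that has drifted to the older product.
answer = "In Azure Data Factory, create a pipeline and add a Copy activity."
print(find_legacy_mentions(answer))  # prints ['Azure Data Factory']
```

If the list comes back non-empty, that's a prompt to re-ask the question with the Fabric product named explicitly, and to check the answer against the current Fabric documentation.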