DP-600 - Dataflows Gen2 and Data Pipelines

Dataflow Gen2 Overview

Dataflow Gen2 is a no-code/low-code data integration tool in Fabric built on Power Query. It allows you to connect to 150+ data sources, apply transformations using the Power Query UI, and write results to a destination.

Key improvements in Gen2 over Gen1:

  • Output Destination: Data can be written directly to a Lakehouse, Warehouse, Azure SQL, or other destination - not just the dataflow's internal storage
  • Staging: Uses Fabric Lakehouse as staging storage for intermediate results (faster, no Power BI dataset dependency)
  • Faster refresh: Can leverage Fabric capacity compute
  • Template saving: Dataflow queries can be saved as reusable templates

Power Query Transformations

Common transformations available in the Power Query Editor (Dataflow Gen2):

  • Filter Rows: Keep only rows that meet a condition
  • Remove Duplicates: Remove rows with duplicate values in selected columns
  • Fill Down / Up: Fill null values with the previous/next non-null value
  • Replace Values: Find and replace specific values in a column
  • Split Column: Split by delimiter or position
  • Group By: Aggregate rows by key columns
  • Merge Queries: Join two queries (equivalent to SQL JOIN)
  • Append Queries: Union multiple queries (equivalent to SQL UNION ALL)
  • Pivot / Unpivot: Rotate rows to columns (Pivot) or columns to rows (Unpivot)
  • Add Custom Column: Add a column using an M formula
  • Change Type: Convert a column to the correct data type

All transformations generate M (Power Query Formula Language) code in the formula bar. Advanced users can edit M directly.
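The join and union semantics behind Merge Queries and Append Queries can be pictured in plain Python (a conceptual sketch only - real dataflows express these as M steps, and the sample tables here are made up for illustration):

```python
# Conceptual sketch (plain Python, not M): how Merge Queries (join)
# and Append Queries (union all) combine two tables of rows.
orders = [
    {"order_id": 1, "cust_id": "A", "amount": 120},
    {"order_id": 2, "cust_id": "B", "amount": 75},
]
customers = [{"cust_id": "A", "name": "Contoso"},
             {"cust_id": "B", "name": "Fabrikam"}]

def merge_queries(left, right, key):
    """Inner-join two row lists on a key column (like Merge Queries)."""
    lookup = {row[key]: row for row in right}
    return [{**l, **lookup[l[key]]} for l in left if l[key] in lookup]

def append_queries(*tables):
    """Stack tables vertically (like Append Queries / UNION ALL)."""
    return [row for table in tables for row in table]

merged = merge_queries(orders, customers, "cust_id")
print(merged[0]["name"])                     # Contoso
print(len(append_queries(orders, orders)))   # 4
```

Merge widens each row with columns from the matching row; Append only stacks rows, so the queries should share a schema.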

Output Destination

Dataflow Gen2 allows setting an Output Destination for each query. Supported destinations:

  • Fabric Lakehouse (written as a Delta table - most common)
  • Fabric Data Warehouse
  • Azure SQL Database
  • Azure Data Explorer

Write modes:

  • Replace: Truncates the destination table and loads fresh data
  • Append: Adds new rows to the existing table

When loading to a Lakehouse via Dataflow Gen2, the output table is automatically created as a managed Delta table in the Lakehouse Tables section and becomes queryable via the SQL Analytics Endpoint.
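The two write modes behave as follows (a conceptual Python sketch; the `destination` list, sample rows, and `write` helper are illustrative stand-ins, not a Fabric API):

```python
# Conceptual sketch: Replace vs Append write modes against a destination
# table, modeled here as a plain list of rows.
destination = [{"id": 1, "val": "old"}]

def write(dest, new_rows, mode):
    if mode == "Replace":      # truncate the table, then load fresh data
        dest.clear()
        dest.extend(new_rows)
    elif mode == "Append":     # keep existing rows, add the new ones
        dest.extend(new_rows)
    else:
        raise ValueError(f"unknown write mode: {mode}")
    return dest

write(destination, [{"id": 2, "val": "new"}], "Append")
print(len(destination))   # 2
write(destination, [{"id": 3, "val": "fresh"}], "Replace")
print(len(destination))   # 1
```

Replace is idempotent per refresh (reruns produce the same table), while Append accumulates rows, so repeated refreshes of the same source data will duplicate them.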

Data Pipelines Overview

Fabric Data Pipelines (based on Azure Data Factory) orchestrate data movement and transformation activities. Unlike Dataflow Gen2 (focus: transformation), pipelines focus on orchestration - running activities in sequence or parallel, handling branching logic, looping, error handling, and scheduling.

A pipeline can call:

  • Copy Data activity
  • Dataflow Gen2 activity
  • Notebook activity
  • Stored procedure activity
  • Other pipelines (pipeline invoke)
  • Azure Function, Web activities

Common Pipeline Activities

  • Copy Data: Move data from source to sink. Supports 100+ connectors. Can use Fast Copy for high-throughput loads into a Fabric Lakehouse/Warehouse.
  • ForEach: Iterate over an array and run a child activity for each element (e.g., process each file in a folder)
  • If Condition: Branch logic - run different activities based on a true/false expression
  • Lookup: Read a single value or rowset from a source to use as input in subsequent activities
  • Get Metadata: Retrieve properties of a file or folder (size, modified time, list of files)
  • Until: Repeat activities until a condition becomes true (polling loop)
  • Wait: Pause pipeline execution for n seconds
  • Set Variable: Store a value in a pipeline variable for later use
  • Dataflow: Run a Dataflow Gen2 refresh as part of the pipeline
  • Notebook: Execute a Fabric Spark notebook
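The control-flow activities above map onto ordinary loops and branches. A conceptual Python sketch (the file names and the `run_copy_activity` stub are made up for illustration; real pipelines express this with activities and expressions, not code):

```python
# Conceptual sketch: ForEach, If Condition, and Until expressed as
# plain Python control flow.
files = ["sales_01.csv", "sales_02.csv", "ref_products.csv"]

def run_copy_activity(path):        # stand-in for a Copy Data run
    return {"file": path, "rows": 100}

results = []
for f in files:                     # ForEach: one child run per element
    if f.startswith("sales_"):      # If Condition: branch on an expression
        results.append(run_copy_activity(f))

attempts = 0
while True:                         # Until: poll until a condition is true
    attempts += 1
    if attempts >= 3:               # e.g. "the source file has arrived"
        break

print(len(results), attempts)       # 2 3
```

In a real pipeline, the ForEach body can also run its iterations in parallel rather than sequentially as this loop does.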

Fast Copy

Fast Copy is a performance feature of the Copy Data activity in Fabric pipelines that uses a high-throughput direct path to load data into Fabric Lakehouse or Warehouse, bypassing the standard row-by-row writer. It:

  • Leverages Fabric's native Parquet/Delta ingestion path for bulk loading
  • Automatically activates when source and destination meet eligibility criteria (source is file, destination is Lakehouse/Warehouse)
  • Can load hundreds of GB per hour
  • Staging files are written to OneLake and committed atomically

Fast Copy is used automatically by the Copy Data activity when Fabric detects it can use the optimized path. No special configuration is required - Fabric switches to Fast Copy based on the connector and destination type.
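The eligibility decision can be pictured as a simple predicate. A heavily simplified sketch - the source and destination sets below are assumptions made for illustration, not the actual connector criteria Fabric evaluates:

```python
# Simplified sketch of the kind of check that decides whether Copy Data
# takes the Fast Copy path. The sets below are illustrative assumptions,
# not Fabric's real, connector-specific rules.
FILE_SOURCES = {"csv", "parquet", "azure_blob"}
FAST_DESTINATIONS = {"lakehouse", "warehouse"}

def uses_fast_copy(source_type: str, destination_type: str) -> bool:
    return source_type in FILE_SOURCES and destination_type in FAST_DESTINATIONS

print(uses_fast_copy("parquet", "lakehouse"))   # True
print(uses_fast_copy("parquet", "azure_sql"))   # False
```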

Scheduling and Triggers

Pipelines and Dataflows can be triggered:

  • Scheduled: Run at a specific time/interval (every 15 minutes, daily at 2AM, etc.)
  • Manual: Run on demand from the UI or via REST API
  • Event-based (Pipeline only): Trigger on file arrival in OneLake or Azure Blob using Storage Events trigger

Dataflow Gen2 refresh is scheduled directly in the Fabric portal on the Dataflow settings page. Pipelines have their own scheduling configured in the pipeline designer.
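Interval scheduling amounts to rounding the current time forward to the next boundary. A sketch for an every-15-minutes schedule (illustrative only - not how Fabric's scheduler is implemented):

```python
from datetime import datetime, timedelta

# Sketch: computing the next run time for an "every 15 minutes" schedule
# by rounding the current time up to the next interval boundary.
def next_run(now: datetime, interval_minutes: int = 15) -> datetime:
    minutes_past = now.minute % interval_minutes
    delta = timedelta(minutes=interval_minutes - minutes_past,
                      seconds=-now.second, microseconds=-now.microsecond)
    return now + delta

print(next_run(datetime(2024, 5, 1, 2, 7, 30)))   # 2024-05-01 02:15:00
```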

Dataflow Gen2 vs Pipeline - When to Use

  • Skill level: Dataflow Gen2 - no-code/low-code (Power Query UI); Pipeline - configuration-based (ADF-style UI + expressions)
  • Transformation: Dataflow Gen2 - rich Power Query transforms; Pipeline - minimal, orchestrates other activities
  • Orchestration: Dataflow Gen2 - no loop/branch logic; Pipeline - full ForEach, If, Until, and parallel activities
  • Best for: Dataflow Gen2 - ETL (connect, transform, load to a destination); Pipeline - ELT orchestration (ingest raw, trigger notebooks, sequence activities)
  • Calling notebooks: Dataflow Gen2 - cannot directly call notebooks; Pipeline - can call notebooks, dataflows, and stored procedures

Typical pattern: A Fabric pipeline orchestrates the end-to-end flow - Copy Data (ingest) -> Notebook activity (transform) -> Semantic Model refresh - while Dataflow Gen2 handles self-contained transformation tasks for business users.
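This sequential pattern can be sketched as a simple run loop (the step names and functions are hypothetical placeholders for pipeline activities, not a Fabric API):

```python
# Conceptual sketch of the typical pattern: a pipeline running ingest,
# transform, and refresh steps in sequence, stopping at the first failure.
def copy_data():      return "raw files landed"        # Copy Data activity
def run_notebook():   return "silver/gold tables built"  # Notebook activity
def refresh_model():  return "semantic model refreshed"  # model refresh

def run_pipeline(steps):
    log = []
    for step in steps:
        try:
            log.append(step())
        except Exception as exc:             # a real pipeline would route
            log.append(f"failed: {exc}")     # this to an on-failure branch
            break
    return log

print(run_pipeline([copy_data, run_notebook, refresh_model]))
```

The key property mirrored here is ordering with fail-fast behavior: a downstream step only runs if every upstream step succeeded.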
