DP-600 - Dataflows Gen2 and Data Pipelines
Dataflow Gen2 Overview
Dataflow Gen2 is a no-code/low-code data integration tool in Fabric built on Power Query. It allows you to connect to 150+ data sources, apply transformations using the Power Query UI, and write results to a destination.
Key improvements in Gen2 over Gen1:
- Output Destination: Data can be written directly to a Lakehouse, Warehouse, Azure SQL, or other destination - not just the dataflow's internal storage
- Staging: Uses Fabric Lakehouse as staging storage for intermediate results (faster, no Power BI dataset dependency)
- Faster refresh: Can leverage Fabric capacity compute
- Template saving: Dataflow queries can be saved as reusable templates
Power Query Transformations
Common transformations available in the Power Query Editor (Dataflow Gen2):
| Transformation | Purpose |
|---|---|
| Filter Rows | Keep only rows that meet a condition |
| Remove Duplicates | Remove rows with duplicate values in selected columns |
| Fill Down / Up | Fill null values with previous/next non-null value |
| Replace Values | Find and replace specific values in a column |
| Split Column | Split by delimiter or position |
| Group By | Aggregate rows by key columns |
| Merge Queries | Join two queries (equivalent to SQL JOIN) |
| Append Queries | Union multiple queries (equivalent to SQL UNION ALL) |
| Pivot / Unpivot | Rotate columns to rows or rows to columns |
| Add Custom Column | Add a column using M formula |
| Change Type | Convert column to correct data type |
All transformations generate M (Power Query Formula Language) code in the formula bar. Advanced users can edit M directly.
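To make the Merge vs Append distinction from the table concrete, here is a plain-Python sketch of the two operations' semantics (this is illustrative only — not M and not Fabric code; the row data and column names are invented):

```python
# Merge Queries ≈ SQL JOIN on a key; Append Queries ≈ SQL UNION ALL.
orders = [
    {"order_id": 1, "cust_id": "A"},
    {"order_id": 2, "cust_id": "B"},
]
customers = [
    {"cust_id": "A", "name": "Contoso"},
    {"cust_id": "B", "name": "Fabrikam"},
]

def merge_queries(left, right, key):
    """Inner-join two row lists on a shared key column (Merge Queries)."""
    lookup = {row[key]: row for row in right}
    return [{**l, **lookup[l[key]]} for l in left if l[key] in lookup]

def append_queries(*tables):
    """Stack row lists vertically, keeping all rows (Append Queries)."""
    return [row for table in tables for row in table]

merged = merge_queries(orders, customers, "cust_id")
combined = append_queries(orders, orders)
```

In the real editor both operations are configured through the UI and emit `Table.NestedJoin` / `Table.Combine` M expressions rather than anything like the code above.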
Output Destination
Dataflow Gen2 allows setting an Output Destination for each query. Supported destinations:
- Fabric Lakehouse (→ Delta table - most common)
- Fabric Data Warehouse
- Azure SQL Database
- Azure Data Explorer
Write modes:
| Mode | Behavior |
|---|---|
| Replace | Truncates the destination table and loads fresh data |
| Append | Adds new rows to the existing table |
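The difference between the two write modes can be sketched in plain Python by treating the destination table as a list of rows (illustrative only — Fabric writes Delta tables, not Python lists):

```python
destination = [{"id": 1}, {"id": 2}]   # rows already in the destination table
new_batch   = [{"id": 2}, {"id": 3}]   # rows produced by this refresh

def write_replace(dest, batch):
    """Replace: truncate the destination, then load the fresh rows."""
    return list(batch)

def write_append(dest, batch):
    """Append: keep existing rows and add the new ones (duplicates possible)."""
    return dest + list(batch)

replaced = write_replace(destination, new_batch)   # 2 rows, old data gone
appended = write_append(destination, new_batch)    # 4 rows, id 2 twice
```

Note that Append does not deduplicate — if the same refresh runs twice in Append mode, the rows land twice.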
Data Pipelines Overview
Fabric Data Pipelines (based on Azure Data Factory) orchestrate data movement and transformation activities. Where Dataflow Gen2 focuses on transformation, pipelines focus on orchestration - running activities in sequence or parallel, handling branching logic, looping, error handling, and scheduling.
A pipeline can call:
- Copy Data activity
- Dataflow Gen2 activity
- Notebook activity
- Stored procedure activity
- Other pipelines (pipeline invoke)
- Azure Function, Web activities
Common Pipeline Activities
| Activity | Purpose |
|---|---|
| Copy Data | Move data from source to sink. Supports 100+ connectors. Can use Fast Copy for high-throughput loads into Fabric Lakehouse/Warehouse. |
| ForEach | Iterate over an array and run a child activity for each element (e.g., process each file in a folder) |
| If Condition | Branch logic - run different activities based on true/false expression |
| Lookup | Read a single value or rowset from a source to use as input in subsequent activities |
| Get Metadata | Retrieve properties of a file or folder (size, modified time, list of files) |
| Until | Repeat activities until a condition becomes true (polling loop) |
| Wait | Pause pipeline execution for n seconds |
| Set Variable | Store a value in a pipeline variable for later use |
| Dataflow | Run a Dataflow Gen2 activity as part of the pipeline |
| Notebook | Execute a Fabric Spark Notebook |
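The control-flow activities above (ForEach, If Condition, Until, Wait) can be sketched as ordinary Python control flow — a toy model of the semantics, not pipeline syntax; the file names and readiness check are invented:

```python
import time

files = ["sales_01.csv", "sales_02.csv", "inventory.csv"]  # invented file list

def copy_data(path):
    # Stand-in for a Copy Data activity run on one file
    return f"copied {path}"

log = []
# ForEach: run a child activity once per array element
for f in files:
    # If Condition: branch on an expression evaluated per item
    if f.startswith("sales"):
        log.append(copy_data(f))
    else:
        log.append(f"skipped {f}")

# Until: repeat until a condition becomes true (polling loop),
# typically with a Wait activity between attempts
attempts = 0
while True:
    attempts += 1
    ready = attempts >= 3      # stand-in for checking some external state
    if ready:
        break
    time.sleep(0)              # Wait activity (0 seconds here to keep it fast)
```

In a real pipeline the same logic is configured graphically, with the conditions written as pipeline expressions rather than Python.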
Fast Copy
Fast Copy is a performance feature of the Copy Data activity in Fabric pipelines that uses a high-throughput direct path to load data into Fabric Lakehouse or Warehouse, bypassing the standard row-by-row writer. It:
- Leverages Fabric's native Parquet/Delta ingestion path for bulk loading
- Activates automatically when source and destination meet eligibility criteria (file-based source, Lakehouse/Warehouse destination)
- Can load hundreds of GB per hour
- Staging files are written to OneLake and committed atomically
Scheduling and Triggers
Pipelines and Dataflows can be triggered:
- Scheduled: Run at a specific time/interval (every 15 minutes, daily at 2AM, etc.)
- Manual: Run on demand from the UI or via REST API
- Event-based (Pipeline only): Trigger on file arrival in OneLake or Azure Blob using Storage Events trigger
Dataflow Gen2 refresh is scheduled directly in the Fabric portal on the Dataflow settings page. Pipelines have their own scheduling configured in the pipeline designer.
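For the on-demand REST API path, a run is triggered by POSTing to the workspace item's job-instances endpoint with a bearer token. The sketch below only builds the URL; the endpoint pattern is assumed from Fabric's on-demand job API and the IDs are placeholders — verify the exact route against current Microsoft documentation before relying on it:

```python
from urllib.parse import quote

FABRIC_API = "https://api.fabric.microsoft.com/v1"

def on_demand_run_url(workspace_id: str, item_id: str,
                      job_type: str = "Pipeline") -> str:
    """Build the POST URL for an on-demand item job run.

    Assumed endpoint pattern (Fabric job scheduler API):
    POST /workspaces/{workspaceId}/items/{itemId}/jobs/instances?jobType=...
    """
    return (f"{FABRIC_API}/workspaces/{quote(workspace_id)}"
            f"/items/{quote(item_id)}/jobs/instances?jobType={quote(job_type)}")

# Placeholder GUIDs for illustration; POST this URL with an
# Authorization: Bearer <token> header to trigger the run.
url = on_demand_run_url("11111111-2222-3333-4444-555555555555",
                        "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee")
```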
Dataflow Gen2 vs Pipeline - When to Use
| Aspect | Dataflow Gen2 | Data Pipeline |
|---|---|---|
| Skill level | No-code / low-code (Power Query UI) | Configuration-based (ADF-style UI + expressions) |
| Transformation | Rich Power Query transforms | Minimal - orchestrates other activities |
| Orchestration | No loop/branch logic | Full ForEach, If, Until, parallel activities |
| Best for | ETL: connect, transform, load to destination | ELT orchestration: ingest raw, trigger notebooks, sequence activities |
| Calling notebooks | Cannot directly call notebooks | Can call notebooks, dataflows, stored procedures |
Typical pattern: A Fabric pipeline orchestrates the end-to-end flow - Copy Data (ingest) -> Notebook activity (transform) -> Semantic Model refresh - while Dataflow Gen2 handles self-contained transformation tasks for business users.
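The end-to-end pattern above is essentially a dependency chain of activities. A minimal sketch of that structure (illustrative Python, not actual pipeline JSON; activity names are invented):

```python
# Each activity lists the activities that must succeed before it runs.
pipeline = [
    {"name": "IngestRaw",       "type": "CopyData",             "depends_on": []},
    {"name": "TransformSilver", "type": "Notebook",             "depends_on": ["IngestRaw"]},
    {"name": "RefreshModel",    "type": "SemanticModelRefresh", "depends_on": ["TransformSilver"]},
]

def run_order(activities):
    """Order activities so each runs only after its dependencies succeed."""
    done, order = set(), []
    while len(order) < len(activities):
        for a in activities:
            if a["name"] not in done and all(d in done for d in a["depends_on"]):
                order.append(a["name"])
                done.add(a["name"])
    return order
```

Activities with no dependency between them (empty or already-satisfied `depends_on`) could run in parallel; the pipeline designer expresses the same thing with on-success connectors between activity boxes.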