DP-600 - Dataflows Gen2 and Data Pipelines
Dataflow Gen2 Overview
Dataflow Gen2 is a no-code/low-code data integration tool in Fabric built on Power Query. It allows you to connect to 150+ data sources, apply transformations using the Power Query UI, and write results to a destination.
Key improvements in Gen2 over Gen1:
- Output Destination: Data can be written directly to a Lakehouse, Warehouse, Azure SQL, or other destination - not just the dataflow's internal storage
- Staging: Uses Fabric Lakehouse as staging storage for intermediate results (faster, no Power BI dataset dependency)
- Faster refresh: Can leverage Fabric capacity compute
- Template saving: Dataflow queries can be saved as reusable templates
Power Query Transformations
Common transformations available in the Power Query Editor (Dataflow Gen2):
| Transformation | Purpose |
|---|---|
| Filter Rows | Keep only rows that meet a condition |
| Remove Duplicates | Remove rows with duplicate values in selected columns |
| Fill Down / Up | Fill null values with previous/next non-null value |
| Replace Values | Find and replace specific values in a column |
| Split Column | Split by delimiter or position |
| Group By | Aggregate rows by key columns |
| Merge Queries | Join two queries (equivalent to SQL JOIN) |
| Append Queries | Union multiple queries (equivalent to SQL UNION ALL) |
| Pivot / Unpivot | Rotate columns to rows or rows to columns |
| Add Custom Column | Add a column using M formula |
| Change Type | Convert column to correct data type |
All transformations generate M (Power Query Formula Language) code in the formula bar. Advanced users can edit M directly.
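To make the Merge vs Append distinction from the table concrete, here is a plain-Python sketch of the two operations' semantics (this is illustrative only — not M and not Fabric code; the row data and column names are invented):

```python
# Merge Queries ≈ SQL JOIN on a key; Append Queries ≈ SQL UNION ALL.
orders = [
    {"order_id": 1, "cust_id": "A"},
    {"order_id": 2, "cust_id": "B"},
]
customers = [
    {"cust_id": "A", "name": "Contoso"},
    {"cust_id": "B", "name": "Fabrikam"},
]

def merge_queries(left, right, key):
    """Inner-join two row lists on a shared key column (Merge Queries)."""
    lookup = {row[key]: row for row in right}
    return [{**l, **lookup[l[key]]} for l in left if l[key] in lookup]

def append_queries(*tables):
    """Stack row lists vertically, keeping all rows (Append Queries)."""
    return [row for table in tables for row in table]

merged = merge_queries(orders, customers, "cust_id")
combined = append_queries(orders, orders)
```

In the real editor both operations are configured through the UI and emit `Table.NestedJoin` / `Table.Combine` M expressions rather than anything like the code above.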
Output Destination
Dataflow Gen2 allows setting an Output Destination for each query. Supported destinations:
- Fabric Lakehouse (→ Delta table - most common)
- Fabric Data Warehouse
- Azure SQL Database
- Azure Data Explorer
Write modes:
| Mode | Behavior |
|---|---|
| Replace | Truncates the destination table and loads fresh data |
| Append | Adds new rows to the existing table |
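The difference between the two write modes can be sketched in plain Python by treating the destination table as a list of rows (illustrative only — Fabric writes Delta tables, not Python lists):

```python
destination = [{"id": 1}, {"id": 2}]   # rows already in the destination table
new_batch   = [{"id": 2}, {"id": 3}]   # rows produced by this refresh

def write_replace(dest, batch):
    """Replace: truncate the destination, then load the fresh rows."""
    return list(batch)

def write_append(dest, batch):
    """Append: keep existing rows and add the new ones (duplicates possible)."""
    return dest + list(batch)

replaced = write_replace(destination, new_batch)   # 2 rows, old data gone
appended = write_append(destination, new_batch)    # 4 rows, id 2 twice
```

Note that Append does not deduplicate — if the same refresh runs twice in Append mode, the rows land twice.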
Data Pipelines Overview
Fabric Data Pipelines (based on Azure Data Factory) orchestrate data movement and transformation activities. Where Dataflow Gen2 focuses on transformation, pipelines focus on orchestration - running activities in sequence or parallel, handling branching logic, looping, error handling, and scheduling.
A pipeline can call:
- Copy Data activity
- Dataflow Gen2 activity
- Notebook activity
- Stored procedure activity
- Other pipelines (pipeline invoke)
- Azure Function, Web activities
Common Pipeline Activities
| Activity | Purpose |
|---|---|
| Copy Data | Move data from source to sink. Supports 100+ connectors. Can use Fast Copy for high-throughput loads into Fabric Lakehouse/Warehouse. |
| ForEach | Iterate over an array and run a child activity for each element (e.g., process each file in a folder) |
| If Condition | Branch logic - run different activities based on true/false expression |
| Lookup | Read a single value or rowset from a source to use as input in subsequent activities |
| Get Metadata | Retrieve properties of a file or folder (size, modified time, list of files) |
| Until | Repeat activities until a condition becomes true (polling loop) |
| Wait | Pause pipeline execution for n seconds |
| Set Variable | Store a value in a pipeline variable for later use |
| Dataflow | Run a Dataflow Gen2 activity as part of the pipeline |
| Notebook | Execute a Fabric Spark Notebook |
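The control-flow activities above (ForEach, If Condition, Until, Wait) can be sketched as ordinary Python control flow — a toy model of the semantics, not pipeline syntax; the file names and readiness check are invented:

```python
import time

files = ["sales_01.csv", "sales_02.csv", "inventory.csv"]  # invented file list

def copy_data(path):
    # Stand-in for a Copy Data activity run on one file
    return f"copied {path}"

log = []
# ForEach: run a child activity once per array element
for f in files:
    # If Condition: branch on an expression evaluated per item
    if f.startswith("sales"):
        log.append(copy_data(f))
    else:
        log.append(f"skipped {f}")

# Until: repeat until a condition becomes true (polling loop),
# typically with a Wait activity between attempts
attempts = 0
while True:
    attempts += 1
    ready = attempts >= 3      # stand-in for checking some external state
    if ready:
        break
    time.sleep(0)              # Wait activity (0 seconds here to keep it fast)
```

In a real pipeline the same logic is configured graphically, with the conditions written as pipeline expressions rather than Python.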
Fast Copy
Fast Copy is a performance feature of the Copy Data activity in Fabric pipelines that uses a high-throughput direct path to load data into Fabric Lakehouse or Warehouse, bypassing the standard row-by-row writer. It:
- Leverages Fabric's native Parquet/Delta ingestion path for bulk loading
- Activates automatically when source and destination meet eligibility criteria (file-based source, Lakehouse/Warehouse destination)
- Can load hundreds of GB per hour
- Staging files are written to OneLake and committed atomically
Scheduling and Triggers
Pipelines and Dataflows can be triggered:
- Scheduled: Run at a specific time/interval (every 15 minutes, daily at 2AM, etc.)
- Manual: Run on demand from the UI or via REST API
- Event-based (Pipeline only): Trigger on file arrival in OneLake or Azure Blob using Storage Events trigger
Dataflow Gen2 refresh is scheduled directly in the Fabric portal on the Dataflow settings page. Pipelines have their own scheduling configured in the pipeline designer.
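For the on-demand REST API path, a run is triggered by POSTing to the workspace item's job-instances endpoint with a bearer token. The sketch below only builds the URL; the endpoint pattern is assumed from Fabric's on-demand job API and the IDs are placeholders — verify the exact route against current Microsoft documentation before relying on it:

```python
from urllib.parse import quote

FABRIC_API = "https://api.fabric.microsoft.com/v1"

def on_demand_run_url(workspace_id: str, item_id: str,
                      job_type: str = "Pipeline") -> str:
    """Build the POST URL for an on-demand item job run.

    Assumed endpoint pattern (Fabric job scheduler API):
    POST /workspaces/{workspaceId}/items/{itemId}/jobs/instances?jobType=...
    """
    return (f"{FABRIC_API}/workspaces/{quote(workspace_id)}"
            f"/items/{quote(item_id)}/jobs/instances?jobType={quote(job_type)}")

# Placeholder GUIDs for illustration; POST this URL with an
# Authorization: Bearer <token> header to trigger the run.
url = on_demand_run_url("11111111-2222-3333-4444-555555555555",
                        "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee")
```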
Dataflow Gen2 vs Pipeline - When to Use
| Aspect | Dataflow Gen2 | Data Pipeline |
|---|---|---|
| Skill level | No-code / low-code (Power Query UI) | Configuration-based (ADF-style UI + expressions) |
| Transformation | Rich Power Query transforms | Minimal - orchestrates other activities |
| Orchestration | No loop/branch logic | Full ForEach, If, Until, parallel activities |
| Best for | ETL: connect, transform, load to destination | ELT orchestration: ingest raw, trigger notebooks, sequence activities |
| Calling notebooks | Cannot directly call notebooks | Can call notebooks, dataflows, stored procedures |
Typical pattern: A Fabric pipeline orchestrates the end-to-end flow - Copy Data (ingest) -> Notebook activity (transform) -> Semantic Model refresh - while Dataflow Gen2 handles self-contained transformation tasks for business users.
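The end-to-end pattern above is essentially a dependency chain of activities. A minimal sketch of that structure (illustrative Python, not actual pipeline JSON; activity names are invented):

```python
# Each activity lists the activities that must succeed before it runs.
pipeline = [
    {"name": "IngestRaw",       "type": "CopyData",             "depends_on": []},
    {"name": "TransformSilver", "type": "Notebook",             "depends_on": ["IngestRaw"]},
    {"name": "RefreshModel",    "type": "SemanticModelRefresh", "depends_on": ["TransformSilver"]},
]

def run_order(activities):
    """Order activities so each runs only after its dependencies succeed."""
    done, order = set(), []
    while len(order) < len(activities):
        for a in activities:
            if a["name"] not in done and all(d in done for d in a["depends_on"]):
                order.append(a["name"])
                done.add(a["name"])
    return order
```

Activities with no dependency between them (empty or already-satisfied `depends_on`) could run in parallel; the pipeline designer expresses the same thing with on-success connectors between activity boxes.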