Compositions

Compositions are reusable pipeline fragments that can be imported into multiple pipelines. They encapsulate common transform patterns, such as date derivations, address normalization, and currency conversion, into self-contained, testable units.

Using a composition

A composition node in your pipeline references an external .comp.yaml file:

- type: composition
  name: fiscal_dates
  input: invoices
  use: "./compositions/fiscal_date.comp.yaml"
  config:
    start_month: 4

The use: field points to the composition definition file. The config: block passes parameters that customize the composition’s behavior for this specific invocation.

Composition definition file

A .comp.yaml file declares the composition’s interface – what fields it requires from upstream and what fields it produces:

# compositions/fiscal_date.comp.yaml
composition:
  name: fiscal_date
  description: "Derive fiscal year, quarter, and period from a date field"

  requires:
    - { name: invoice_date, type: date }

  produces:
    - { name: fiscal_year, type: int }
    - { name: fiscal_quarter, type: string }
    - { name: fiscal_period, type: int }

  params:
    - name: start_month
      type: int
      default: 1
      description: "First month of the fiscal year (1-12)"
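
Because start_month declares a default, a node that leaves it out would presumably fall back to January. The following is a minimal sketch, assuming defaults apply whenever the config: block (or the individual key) is absent. The node name calendar_dates is hypothetical.

```yaml
# Hypothetical node: no config block, so start_month is assumed
# to fall back to the declared default of 1 (January).
- type: composition
  name: calendar_dates
  input: invoices
  use: "./compositions/fiscal_date.comp.yaml"
```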

Composition fields

| Field | Required | Description |
|-------|----------|-------------|
| name | Yes | Composition identifier |
| description | No | Human-readable purpose |
| requires | Yes | Input fields the composition needs from upstream (name + type) |
| produces | Yes | Output fields the composition adds to the record (name + type) |
| params | No | Configurable parameters with optional defaults |
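
To illustrate how these fields combine, here is a sketch of a second definition file for the currency conversion pattern mentioned in the introduction. Every name in it (the file path, currency_convert, and the field and parameter names) is hypothetical, and it assumes that string is accepted as a parameter type in the same way it is accepted as a field type.

```yaml
# compositions/currency_convert.comp.yaml (hypothetical file)
composition:
  name: currency_convert
  description: "Convert an amount into a target currency"

  # Fields that must already exist on the upstream record.
  requires:
    - { name: amount, type: float }
    - { name: currency, type: string }

  # Field added to the record by this composition.
  produces:
    - { name: amount_converted, type: float }

  # Optional parameter; the default is used when a node does not set it.
  params:
    - name: target_currency
      type: string
      default: "USD"
      description: "Currency code to convert into"
```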

Advanced wiring

For compositions with multiple input or output ports, the node supports explicit port bindings:

- type: composition
  name: enrich_address
  input: customers
  use: "./compositions/address_normalize.comp.yaml"
  inputs:
    primary: customers
    reference: zip_lookup
  outputs:
    normalized: next_stage
  config:
    country_code: "US"
  resources:
    zip_database: "./data/zipcodes.csv"

Port and resource fields

| Field | Required | Description |
|-------|----------|-------------|
| inputs | No | Map of composition input ports to upstream node references |
| outputs | No | Map of composition output ports to downstream node references |
| config | No | Parameter overrides (key-value pairs) |
| resources | No | External resource bindings (file paths, connection strings) |
| alias | No | Namespace prefix for expanded node names (avoids collisions) |
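
The alias field is not shown in the wiring example. Here is a sketch of how it might be used when the same composition is included twice, assuming alias sits at the node level alongside config and takes a plain string. The node names and alias values are hypothetical.

```yaml
# Two invocations of one composition on the same input, each with a
# different fiscal calendar. Each alias is assumed to prefix the
# expanded node names so the two expansions do not collide.
- type: composition
  name: fiscal_april
  input: invoices
  use: "./compositions/fiscal_date.comp.yaml"
  alias: fy_apr
  config:
    start_month: 4

- type: composition
  name: fiscal_october
  input: invoices
  use: "./compositions/fiscal_date.comp.yaml"
  alias: fy_oct
  config:
    start_month: 10
```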

Complete example

pipeline:
  name: invoice_pipeline

nodes:
  - type: source
    name: invoices
    config:
      name: invoices
      type: csv
      path: "./data/invoices.csv"
      schema:
        - { name: invoice_id, type: int }
        - { name: customer_id, type: int }
        - { name: invoice_date, type: date }
        - { name: amount, type: float }

  - type: composition
    name: fiscal_dates
    input: invoices
    use: "./compositions/fiscal_date.comp.yaml"
    config:
      start_month: 4

  - type: transform
    name: final_enrich
    input: fiscal_dates
    config:
      cxl: |
        emit invoice_id = invoice_id
        emit customer_id = customer_id
        emit amount = amount
        emit fiscal_year = fiscal_year
        emit fiscal_quarter = fiscal_quarter

  - type: output
    name: result
    input: final_enrich
    config:
      name: result
      type: csv
      path: "./output/invoices_enriched.csv"

Current status

Note: Composition support is under development in Phase 16c. The YAML shape already parses and validates, but compilation currently returns an E100 diagnostic for each composition node. This page describes the intended design; full compilation and expansion will land when Phase 16c is complete.