X12 Format

Clinker reads and writes ANSI ASC X12 interchanges alongside CSV, JSON, XML, fixed-width, and EDIFACT. An X12 interchange is a finite file with a three-tier envelope: an ISA..IEA interchange wraps one or more GS..GE functional groups, and each functional group wraps one or more ST..SE transaction sets. The reader streams one segment at a time and the writer reconstructs the three envelope tiers around emitted records.

The three tiers surface as nested document-context levels: the ISA interchange becomes the file-level $doc document, and each GS group and ST set opens a nested level whose $doc sections layer over the enclosing tiers. A body record therefore sees every enclosing tier’s fields through one $doc.<section>.<field> lookup.
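
The layering can be pictured with a small Python sketch. It is only an illustration, not Clinker's implementation: the dictionaries, the `lookup` helper, and the choice to return `None` for a missing section or field are all assumptions, and the sample values are made up.

```python
from collections import ChainMap

# Hypothetical per-tier contexts: section name -> positional fields.
file_level = {"interchange": {"e13": "000000001"}}
group_level = {"functional_group": {"e06": "1"}}
set_level = {"transaction_set": {"e01": "850", "e02": "0001"}}

# Innermost level first: a lookup falls through to the enclosing tiers.
doc = ChainMap(set_level, group_level, file_level)


def lookup(doc, section, field):
    """Resolve $doc.<section>.<field>; None when absent (an assumption)."""
    return doc.get(section, {}).get(field)


isa13 = lookup(doc, "interchange", "e13")
st02 = lookup(doc, "transaction_set", "e02")
```

A body record inside the ST set resolves both the set-level and the file-level section through the same `doc` object, which is the behaviour the paragraph above describes.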

Delimiters and the ISA header

Unlike EDIFACT’s optional UNA service-string advice, X12 declares its delimiters in a fixed-length 106-byte ISA header. Three delimiter bytes live at structural positions within it:

| Role | Source in the ISA |
|---|---|
| Element (data) separator | The byte immediately after the ISA tag |
| Sub-element (component) separator | ISA16, the last single-byte ISA element |
| Segment terminator | The byte immediately after ISA16 |

The reader reads these three bytes from the header rather than assuming a fixed delimiter set, so an interchange that uses */:/~, |/^/newline, or any other producer-chosen delimiters parses correctly. The ISA13 interchange control number is located as the 13th element of the header split on the discovered element separator — structurally, not by an absolute byte offset — so producer padding quirks do not misalign it.
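
A minimal Python sketch of this structural discovery follows. The function name `discover_delimiters` and the returned dictionary shape are hypothetical, chosen for illustration; the sketch only mirrors the rules stated above and is not the reader's actual code.

```python
def discover_delimiters(data: bytes) -> dict:
    """Read the three delimiters and ISA13 from the start of an interchange."""
    if not data.startswith(b"ISA") or len(data) < 5:
        raise ValueError("input does not start with an ISA header")
    element_sep = data[3:4]  # byte immediately after the ISA tag
    # Split into the tag plus 16 elements; the final part begins with ISA16.
    parts = data.split(element_sep, 16)
    if len(parts) != 17 or len(parts[16]) < 2:
        raise ValueError("ISA header does not carry 16 elements")
    return {
        "element_sep": element_sep,
        "component_sep": parts[16][0:1],  # ISA16
        "segment_term": parts[16][1:2],   # byte immediately after ISA16
        # ISA13 is found by position in the split, not by byte offset.
        "isa13": parts[13].decode("ascii").strip(),
    }
```

Because ISA13 is taken from the split rather than from a fixed offset, a header whose padding differs from the usual layout still yields the right control number in this sketch.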

No escape character

X12 has no release/escape character (EDIFACT’s ? has no X12 equivalent). A data value that contains a delimiter byte is therefore unrepresentable. On output the writer rejects any element value carrying the element separator or the segment terminator with a precise error rather than silently corrupting the interchange; re-encode the value or choose delimiters the data does not contain.

The sub-element (component) separator inside an element (e.g. the : in a composite A:B:C) is kept as part of the element’s text and is not split — the positional element model works above component resolution, so a composite element round-trips unchanged.
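
The two rules above can be sketched in Python as follows. `encode_segment` is a hypothetical helper, and the error wording is invented for illustration; the writer's real diagnostics may differ.

```python
def encode_segment(seg_id, elements, element_sep="*", segment_term="~"):
    """Serialize one segment, refusing values X12 cannot represent."""
    for position, value in enumerate(elements, start=1):
        for name, delim in (("element separator", element_sep),
                            ("segment terminator", segment_term)):
            if delim in value:
                raise ValueError(
                    f"{seg_id} e{position:02d} contains the {name} {delim!r}"
                )
    # The component separator (e.g. ':') is deliberately not checked:
    # a composite such as 'A:B:C' is ordinary element text at this level.
    return element_sep.join([seg_id, *elements]) + segment_term
```

With the default delimiters, `encode_segment("REF", ["A:B:C", "x"])` produces `REF*A:B:C*x~`, while a value such as `"a*b"` raises instead of being written.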

Newlines between segments

Some producers insert CR/LF after each segment terminator for readability. Those bytes are insignificant and are stripped between segments; CR/LF that appears inside an element is preserved.
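
As a rough Python sketch of that rule (the helper name `split_segments` is hypothetical, and a real streaming reader would not hold the whole text in memory):

```python
def split_segments(text, terminator="~"):
    """Split on the segment terminator, dropping CR/LF between segments."""
    segments = []
    for raw in text.split(terminator):
        # Only CR/LF at the very start of a segment is cosmetic;
        # CR/LF inside an element is left untouched.
        segment = raw.lstrip("\r\n")
        if segment:
            segments.append(segment)
    return segments
```

Stripping only the leading bytes of each segment is what keeps an embedded newline inside an element value intact.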

Record shape

Each non-service segment becomes one record under a fixed positional schema:

| Column | Meaning |
|---|---|
| seg_id | The segment tag (BEG, PO1, …) |
| set_ref | The enclosing transaction set control number (ST02) |
| set_type | The transaction set identifier code (ST01, e.g. 850) |
| e01, e02, … | The segment’s positional data elements |

Service segments (ISA, IEA, GS, GE, SE) are consumed by the reader to drive the envelope and validation — they are never emitted as body records. The ST segment that opens a transaction set is emitted as a body record (its seg_id is ST), carrying the set reference and type.

The number of eNN columns is controlled by the source max_elements option (default 32). A segment carrying more data elements than that is rejected with guidance rather than silently truncated. Absent trailing elements read as null.

nodes:
  - type: source
    name: orders
    config:
      name: orders
      type: x12
      glob: ./inbox/*.x12
      options:
        max_elements: 48      # widen the positional schema for exotic segments
      schema:
        - { name: seg_id, type: string }
        - { name: set_ref, type: string }
        - { name: e01, type: string }
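
To make the positional schema concrete, here is a small Python sketch of how one segment could map onto a record. `segment_to_record` is a hypothetical helper, and keeping an empty middle element as an empty string is an assumption the text above does not settle.

```python
def segment_to_record(segment, set_ref, set_type,
                      element_sep="*", max_elements=32):
    """Map one segment onto the seg_id / set_ref / set_type / eNN columns."""
    seg_id, *elements = segment.split(element_sep)
    if len(elements) > max_elements:
        raise ValueError(
            f"{seg_id} carries {len(elements)} elements; "
            f"raise max_elements (currently {max_elements})"
        )
    record = {"seg_id": seg_id, "set_ref": set_ref, "set_type": set_type}
    for i in range(1, max_elements + 1):
        # Absent trailing elements read as None (null).
        record[f"e{i:02d}"] = elements[i - 1] if i <= len(elements) else None
    return record
```

Every record carries the same set of `eNN` columns regardless of how many elements the segment actually had, which is why an over-long segment has to be rejected rather than fitted.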

Envelope sections over the three tiers

The interchange header ISA is extractable as a file-level document envelope section, exposing its positional elements to CXL as $doc.<section>.<field>. Use the segment extract rule with the section field names matching the positional keys e01, e02, …:

envelope:
  sections:
    interchange:
      extract: { segment: "ISA" }
      fields:
        e13: string          # interchange control number (ISA13)

The GS functional group and the ST transaction set surface automatically as the nested $doc sections functional_group and transaction_set, each keyed by positional eNN elements — no envelope declaration is needed for them. A Transform on any body record can read all three tiers at once:

emit isa13 = $doc.interchange.e13       # interchange control number
emit gs06  = $doc.functional_group.e06  # group control number (GS06)
emit st02  = $doc.transaction_set.e02   # set control number (ST02)

Only the ISA header is extractable as a declared envelope section. Trailer segments (SE, GE, IEA) arrive after the body they close and cannot become $doc fields without buffering the whole interchange — their control counts are instead validated inline by the reader (see below). A segment extract naming any tag other than ISA, or an xml_path / json_pointer extract against an X12 source, is rejected at startup.

Control-count validation

The reader validates the structural integrity claims carried in the trailers as they arrive, failing the run on a mismatch (a truncation or corruption signal):

  • SE segment count (SE01) — must equal the number of segments in the transaction set, counting the ST and SE themselves.
  • SE set control number (SE02) — must echo the opening ST02.
  • GE transaction-set count (GE01) — must equal the number of ST sets in the functional group.
  • GE group control number (GE02) — must echo the GS06.
  • IEA functional-group count (IEA01) — must equal the number of GS groups in the interchange.
  • IEA control number (IEA02) — must echo the ISA13.

A missing IEA at end of input is a truncation error; content after the IEA trailer is rejected.
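
The checks above can be sketched as one pass over the segments. This Python version is illustrative only: `validate_control_counts` and `_check` are hypothetical names, segments are assumed to arrive already split into `[tag, e01, e02, ...]` lists, and the error messages are invented.

```python
def _check(ok, message):
    if not ok:
        raise ValueError(message)


def validate_control_counts(segments):
    """Validate trailer claims over segments given as [tag, e01, ...] lists."""
    isa13 = gs06 = st02 = None
    groups = sets = seg_count = 0
    closed = False
    for seg in segments:
        _check(not closed, "content after the IEA trailer")
        tag = seg[0]
        if tag == "ISA":
            isa13, groups = seg[13], 0
        elif tag == "GS":
            gs06, sets = seg[6], 0
        elif tag == "ST":
            st02, seg_count = seg[2], 1  # the ST itself is counted
        elif tag == "SE":
            seg_count += 1               # and so is the SE
            _check(int(seg[1]) == seg_count,
                   f"SE01 is {seg[1]} but the set has {seg_count} segments")
            _check(seg[2] == st02, "SE02 does not echo ST02")
            sets += 1
        elif tag == "GE":
            _check(int(seg[1]) == sets,
                   f"GE01 is {seg[1]} but the group has {sets} sets")
            _check(seg[2] == gs06, "GE02 does not echo GS06")
            groups += 1
        elif tag == "IEA":
            _check(int(seg[1]) == groups,
                   f"IEA01 is {seg[1]} but the interchange has {groups} groups")
            _check(seg[2] == isa13, "IEA02 does not echo ISA13")
            closed = True
        else:
            seg_count += 1               # ordinary body segment
    _check(closed, "missing IEA at end of input (truncated interchange)")
```

Each trailer is checked at the moment it arrives, so nothing beyond a handful of counters and the three opening control numbers needs to be held in memory.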

Writing X12

An X12 Output node reconstructs the three-tier envelope around emitted records. Records map by the same positional columns (seg_id, set_ref, set_type, eNN); trailing null/empty elements are trimmed so no fabricated delimiters appear, and a column the writer does not recognize is an error (project the record to the X12 columns first). Engine-internal $-namespaced columns are excluded automatically.

nodes:
  - type: output
    name: out
    input: messages
    config:
      name: out
      type: x12
      path: ./out/result.x12
      options:
        interchange:
          ["00", "          ", "00", "          ", "ZZ", "SENDER         ",
           "ZZ", "RECEIVER       ", "240101", "1200", "U", "00401",
           "000000001", "0", "P", ":"]
        group_header: ["PO", "SENDER", "RECEIVER", "20240101", "1200", "1", "X", "004010"]
        set_type: "850"
        segment_newline: true

Output options:

| Option | Meaning |
|---|---|
| interchange | Literal ISA data elements (the 16 fixed-width ISA fields). |
| interchange_from_doc | Name of a $doc section to echo the ISA elements from (round-trip). |
| group_header | Literal GS01..GS08 elements (GS06 control number recomputed). |
| set_type | Fallback ST01 set type when a record carries no set_type value. |
| segment_newline | Write a newline after each segment terminator (default true). |

Consecutive records are grouped into ST..SE transaction sets on set_ref transitions, and all sets are wrapped in a single GS..GE functional group. The writer recomputes the SE segment count, the GE transaction-set count, and the IEA functional-group count, and echoes the set, group, and interchange control numbers, so the output passes its own count validation on re-read.
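
The grouping and the recomputed counts can be sketched in Python as below. `plan_envelope` is a hypothetical helper that only plans the envelope rather than serializing it, and it assumes the incoming records are body segments without their own ST records, so each set's SE01 is the body count plus the ST and SE.

```python
from itertools import groupby


def plan_envelope(records, fallback_set_type):
    """Group consecutive records by set_ref and compute the trailer counts."""
    sets = []
    # groupby only merges adjacent records, so a new set starts at every
    # set_ref transition.
    for set_ref, run in groupby(records, key=lambda r: r["set_ref"]):
        run = list(run)
        sets.append({
            "st01": run[0].get("set_type") or fallback_set_type,
            "st02": set_ref,
            "se01": len(run) + 2,  # body segments plus the ST and SE
            "segments": [r["seg_id"] for r in run],
        })
    # One GS..GE group wraps every set, so IEA01 is always 1 here.
    return {"sets": sets, "ge01": len(sets), "iea01": 1}
```

Because the counts are derived from what is actually written rather than copied from the input, an interchange produced this way satisfies the control-count checks when it is read back.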

interchange_from_doc echoes the ISA header from a record’s document context. To populate that context, declare a segment: "ISA" envelope section on the source; it then travels with every body record through the pipeline, including to a sink that sits directly downstream of the source with no intervening Transform. The reader stashes the complete, ordered ISA element list, so the reconstructed header is faithful. When the records have no source ISA section to echo, supply literal interchange elements instead.

Limitations

  • Charset. Element text is decoded as UTF-8. Non-UTF-8 interchanges are rejected explicitly rather than silently corrupted.
  • No escape character. X12 has no release mechanism, so a data value that contains a delimiter byte is rejected on output rather than corrupting the interchange.
  • One functional group on output. The writer wraps all transaction sets in a single GS..GE functional group; the reader handles any number of groups on input. A multi-group output shape requires multiple runs.
  • Output splitting. An interchange is a single ISA..IEA envelope and cannot be divided across files. An x12 output combined with a split: block is rejected at config-validation time (diagnostic E338) rather than emitting a structurally corrupt interchange.