X12 Format
Clinker reads and writes ANSI ASC X12 interchanges alongside CSV, JSON,
XML, fixed-width, and EDIFACT. An X12 interchange is a finite file with a
three-tier envelope: an ISA..IEA interchange wraps one or more GS..GE
functional groups, and each functional group wraps one or more ST..SE
transaction sets. The reader streams one segment at a time and the writer
reconstructs the three envelope tiers around emitted records.
The three tiers surface as nested document-context levels: the ISA
interchange becomes the file-level $doc document, and each GS group and
ST set opens a nested level whose $doc sections layer over the
enclosing tiers. A body record therefore sees every enclosing tier’s
fields through one $doc.<section>.<field> lookup.
Delimiters and the ISA header
Unlike EDIFACT’s optional UNA service-string advice, X12 declares its
delimiters in a fixed-length 106-byte ISA header. Three delimiter bytes
live at structural positions within it:
| Role | Source in the ISA |
|---|---|
| Element (data) separator | The byte immediately after the ISA tag |
| Sub-element (component) sep. | ISA16, the last single-byte ISA element |
| Segment terminator | The byte immediately after ISA16 |
The reader reads these three bytes from the header rather than assuming a
fixed delimiter set, so an interchange that uses * / : / ~, | / ^ / newline
(element separator / sub-element separator / segment terminator), or any
other producer-chosen delimiters parses correctly.
The ISA13 interchange control number is located as the 13th element of
the header split on the discovered element separator — structurally, not by
an absolute byte offset — so producer padding quirks do not misalign it.
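The delimiter discovery described above can be sketched in a few lines of Python. This is a minimal illustration, not Clinker's implementation: the names `Delimiters`, `discover_delimiters`, and `interchange_control_number` are hypothetical, and the sketch works on an in-memory string rather than a byte stream.

```python
from typing import NamedTuple


class Delimiters(NamedTuple):
    element: str     # element (data) separator
    component: str   # sub-element (component) separator, carried in ISA16
    terminator: str  # segment terminator


def _split_isa(data: str) -> list[str]:
    """Split the ISA header on the byte that follows the ISA tag."""
    if len(data) < 4 or not data.startswith("ISA"):
        raise ValueError("input does not start with an ISA header")
    # At most 16 splits: the tag, ISA01..ISA15, then ISA16 plus everything after it.
    parts = data.split(data[3], 16)
    if len(parts) != 17 or len(parts[16]) < 2:
        raise ValueError("ISA header is incomplete")
    return parts


def discover_delimiters(data: str) -> Delimiters:
    """Read the three delimiters from their structural positions."""
    parts = _split_isa(data)
    # ISA16 is a single byte; the byte right after it is the segment terminator.
    return Delimiters(data[3], parts[16][0], parts[16][1])


def interchange_control_number(data: str) -> str:
    """Locate ISA13 as the 13th element, not by absolute byte offset."""
    return _split_isa(data)[13]
```

Because both helpers split on the discovered separator, a header using | / ^ / newline is handled the same way as one using * / : / ~.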
No escape character
X12 has no release/escape character (EDIFACT’s ? has no X12
equivalent). A data value that contains a delimiter byte is therefore
unrepresentable. On output the writer rejects any element value carrying
the element separator or the segment terminator with a precise error rather
than silently corrupting the interchange; re-encode the value or choose
delimiters the data does not contain.
The sub-element (component) separator inside an element (e.g. the : in a
composite A:B:C) is kept as part of the element’s text and is not split —
the positional element model works above component resolution, so a
composite element round-trips unchanged.
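A minimal sketch of the output-side check, assuming hypothetical helper names (`check_element`, `render_segment`) and default * and ~ delimiters. It rejects values that carry the element separator or the segment terminator, and passes a composite through untouched because the component separator is treated as ordinary element text.

```python
def check_element(value: str, element_sep: str, terminator: str) -> str:
    """Reject a value that X12 cannot represent (there is no escape character)."""
    for name, delim in (("element separator", element_sep),
                        ("segment terminator", terminator)):
        if delim in value:
            raise ValueError(
                f"element value {value!r} contains the {name} {delim!r}; "
                "re-encode the value or choose different delimiters"
            )
    return value


def render_segment(seg_id: str, elements: list[str],
                   element_sep: str = "*", terminator: str = "~") -> str:
    """Join a segment's elements after checking each one."""
    checked = [check_element(e, element_sep, terminator) for e in elements]
    return element_sep.join([seg_id, *checked]) + terminator
```

For example, `render_segment("N1", ["ST", "A:B:C"])` keeps the composite `A:B:C` as one element, while a value such as `"A*B"` raises `ValueError`.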
Newlines between segments
Some producers insert CR/LF after each segment terminator for readability. Those bytes are insignificant and are stripped between segments; CR/LF that appears inside an element is preserved.
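The rule can be illustrated with a small splitter. This is a sketch under stated assumptions: `split_segments` is a hypothetical name, it works on an in-memory string rather than streaming, and it assumes the terminator itself is not a CR or LF byte.

```python
def split_segments(body: str, terminator: str = "~") -> list[str]:
    """Split on the terminator, dropping CR/LF that follows each terminator."""
    segments = []
    for chunk in body.split(terminator):
        # Only leading CR/LF is removed, so a line break inside an element survives.
        segment = chunk.lstrip("\r\n")
        if segment:
            segments.append(segment)
    return segments
```

Stripping only the leading bytes of each chunk is what distinguishes the readability line breaks from CR/LF that is part of an element value.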
Record shape
Each non-service segment becomes one record under a fixed positional schema:
| Column | Meaning |
|---|---|
| seg_id | The segment tag (BEG, PO1, …) |
| set_ref | The enclosing transaction set control number (ST02) |
| set_type | The transaction set identifier code (ST01, e.g. 850) |
| e01, e02, … | The segment’s positional data elements |
Service segments (ISA, IEA, GS, GE, SE) are consumed by the
reader to drive the envelope and validation — they are never emitted as
body records. The ST segment that opens a transaction set is emitted
as a body record (its seg_id is ST), carrying the set reference and
type.
The number of eNN columns is controlled by the source max_elements
option (default 32). A segment carrying more data elements than that is
rejected with guidance rather than silently truncated. Absent trailing
elements read as null.
nodes:
  - type: source
    name: orders
    config:
      name: orders
      type: x12
      glob: ./inbox/*.x12
      options:
        max_elements: 48   # widen the positional schema for exotic segments
      schema:
        - { name: seg_id, type: string }
        - { name: set_ref, type: string }
        - { name: e01, type: string }
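The positional record shape can be sketched as a plain mapping. The function name `segment_to_record` is hypothetical, and the sketch assumes that an empty element in the middle of a segment is kept as an empty string; only absent trailing elements become null, as described above.

```python
def segment_to_record(segment: str, set_ref: str, set_type: str,
                      element_sep: str = "*", max_elements: int = 32) -> dict:
    """Map one body segment onto seg_id, set_ref, set_type, e01..eNN."""
    seg_id, *elements = segment.split(element_sep)
    if len(elements) > max_elements:
        # Reject with guidance instead of silently truncating.
        raise ValueError(
            f"segment {seg_id} carries {len(elements)} elements; "
            f"raise max_elements above {max_elements}"
        )
    record = {"seg_id": seg_id, "set_ref": set_ref, "set_type": set_type}
    for i in range(1, max_elements + 1):
        # Absent trailing elements read as null (None).
        record[f"e{i:02d}"] = elements[i - 1] if i <= len(elements) else None
    return record
```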
Envelope sections over the three tiers
The interchange header ISA is extractable as a file-level document
envelope section, exposing its positional elements to CXL as
$doc.<section>.<field>. Use the segment extract rule with the section
field names matching the positional keys e01, e02, …:
envelope:
  sections:
    interchange:
      extract: { segment: "ISA" }
      fields:
        e13: string   # interchange control number (ISA13)
The GS functional group and the ST transaction set surface
automatically as the nested $doc sections functional_group and
transaction_set, each keyed by positional eNN elements — no envelope
declaration is needed for them. A Transform on any body record can read
all three tiers at once:
emit isa13 = $doc.interchange.e13 # interchange control number
emit gs06 = $doc.functional_group.e06 # group control number (GS06)
emit st02 = $doc.transaction_set.e02 # set control number (ST02)
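One way to picture the layering behind these lookups is a chain of mappings, where each nested level adds its own section on top of the enclosing tiers. The sketch below uses Python's `collections.ChainMap` purely as an analogy; `open_level` and `lookup` are hypothetical names and the values are made up for illustration.

```python
from collections import ChainMap


def open_level(parent: ChainMap, section: str, fields: dict) -> ChainMap:
    """Open a nested level whose section layers over the enclosing tiers."""
    return parent.new_child({section: fields})


def lookup(doc: ChainMap, section: str, field: str) -> str:
    """Resolve a $doc.<section>.<field> style lookup through every tier."""
    return doc[section][field]


# File level: the declared ISA envelope section.
file_doc = ChainMap({"interchange": {"e13": "000000001"}})
# Each GS group and ST set opens a nested level over its parent.
group_doc = open_level(file_doc, "functional_group", {"e06": "1"})
set_doc = open_level(group_doc, "transaction_set", {"e02": "0001"})
```

A body record inside the set resolves all three sections through `set_doc`, while `file_doc` itself never sees the nested sections.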
Only the ISA header is extractable as a declared envelope section.
Trailer segments (SE, GE, IEA) arrive after the body they close and
cannot become $doc fields without buffering the whole interchange — their
control counts are instead validated inline by the reader (see below). A
segment extract naming any tag other than ISA, or an xml_path /
json_pointer extract against an X12 source, is rejected at startup.
Control-count validation
The reader validates the structural integrity claims carried in the trailers as they arrive, failing the run on a mismatch (a truncation or corruption signal):
- SE segment count (SE01): must equal the number of segments in the transaction set, counting the ST and SE themselves.
- SE set control number (SE02): must echo the opening ST02.
- GE transaction-set count (GE01): must equal the number of ST sets in the functional group.
- GE group control number (GE02): must echo the GS06.
- IEA functional-group count (IEA01): must equal the number of GS groups in the interchange.
- IEA control number (IEA02): must echo the ISA13.
A missing IEA at end of input is a truncation error; content after the
IEA trailer is rejected.
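The checks above amount to a few running counters that are compared against each trailer as it arrives. The following is a minimal sketch, not the reader's actual code: `validate_counts` is a hypothetical name, and each segment is assumed to be already split into a list of the form `[tag, e01, e02, ...]`.

```python
def _expect(actual, expected, what: str) -> None:
    if actual != expected:
        raise ValueError(f"{what}: trailer says {actual!r}, reader saw {expected!r}")


def validate_counts(segments) -> None:
    """Validate SE/GE/IEA claims inline while walking segments in file order."""
    isa13 = gs06 = st02 = None
    groups = sets = seg_count = 0
    closed = False
    for seg in segments:
        tag = seg[0]
        if closed:
            raise ValueError("content after the IEA trailer")
        if tag == "ISA":
            isa13, groups = seg[13], 0
        elif tag == "GS":
            gs06, sets = seg[6], 0
            groups += 1
        elif tag == "ST":
            st02, seg_count = seg[2], 1      # the ST itself counts
            sets += 1
        elif tag == "SE":
            seg_count += 1                   # the SE itself counts
            _expect(int(seg[1]), seg_count, "SE01 segment count")
            _expect(seg[2], st02, "SE02 set control number")
        elif tag == "GE":
            _expect(int(seg[1]), sets, "GE01 transaction-set count")
            _expect(seg[2], gs06, "GE02 group control number")
        elif tag == "IEA":
            _expect(int(seg[1]), groups, "IEA01 functional-group count")
            _expect(seg[2], isa13, "IEA02 interchange control number")
            closed = True
        else:
            seg_count += 1                   # ordinary body segment
    if not closed:
        raise ValueError("missing IEA at end of input (truncated interchange)")
```

Nothing is buffered beyond the counters and the three control numbers, which is why the trailers can be validated without holding the whole interchange in memory.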
Writing X12
An X12 Output node reconstructs the three-tier envelope around emitted
records. Records map by the same positional columns (seg_id, set_ref,
set_type, eNN); trailing null/empty elements are trimmed so no
fabricated delimiters appear, and a column the writer does not recognize is
an error (project the record to the X12 columns first). Engine-internal
$-namespaced columns are excluded automatically.
nodes:
  - type: output
    name: out
    input: messages
    config:
      name: out
      type: x12
      path: ./out/result.x12
      options:
        interchange:
          ["00", "          ", "00", "          ", "ZZ", "SENDER         ",
           "ZZ", "RECEIVER       ", "240101", "1200", "U", "00401",
           "000000001", "0", "P", ":"]
        group_header: ["PO", "SENDER", "RECEIVER", "20240101", "1200", "1", "X", "004010"]
        set_type: "850"
        segment_newline: true
Output options:
| Option | Meaning |
|---|---|
| interchange | Literal ISA data elements (the 16 fixed-width ISA fields). |
| interchange_from_doc | Name of a $doc section to echo the ISA elements from (round-trip). |
| group_header | Literal GS01..GS08 elements (GS06 control number recomputed). |
| set_type | Fallback ST01 set type when a record carries no set_type value. |
| segment_newline | Write a newline after each segment terminator (default true). |
Consecutive records are grouped into ST..SE transaction sets on set_ref
transitions, and all sets are wrapped in a single GS..GE functional
group. The writer recomputes the SE segment count, the GE
transaction-set count, and the IEA functional-group count, and echoes the
set, group, and interchange control numbers, so the output passes its own
count validation on re-read.
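The grouping and count recomputation can be sketched with `itertools.groupby`, which splits a sequence on consecutive key changes. This is an illustration only: `render_sets` and `trim_trailing` are hypothetical names, records carry their elements as a list instead of eNN columns, the input is assumed to contain no ST records of its own, and the GS/GE and ISA/IEA tiers are left out.

```python
from itertools import groupby


def trim_trailing(elements: list) -> list[str]:
    """Drop trailing null/empty elements so no fabricated delimiters appear."""
    trimmed = list(elements)
    while trimmed and trimmed[-1] in (None, ""):
        trimmed.pop()
    return ["" if e is None else e for e in trimmed]


def render_sets(records: list[dict], fallback_set_type: str,
                sep: str = "*", term: str = "~") -> str:
    """Wrap consecutive records in ST..SE sets on set_ref transitions."""
    segments = []
    for set_ref, rows in groupby(records, key=lambda r: r["set_ref"]):
        rows = list(rows)
        set_type = rows[0].get("set_type") or fallback_set_type
        body = [sep.join([r["seg_id"], *trim_trailing(r["elements"])]) for r in rows]
        segments.append(sep.join(["ST", set_type, set_ref]))
        segments.extend(body)
        # SE01 counts the body segments plus the ST and SE themselves.
        segments.append(sep.join(["SE", str(len(body) + 2), set_ref]))
    return "".join(s + term for s in segments)
```

Because `groupby` only groups adjacent records, two runs with the same set_ref separated by another set would produce two sets, which matches grouping on transitions rather than on distinct values.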
interchange_from_doc echoes the header from a record’s document context.
That context is populated by a source’s ISA envelope section (declare a
segment: "ISA" envelope section on the source) and travels with every
body record through the pipeline — including to a sink that sits directly
downstream of the source with no intervening Transform. The reader stashes
the complete, ordered ISA element list, so the reconstructed header is
faithful. Supply interchange literal elements instead when the records
have no source ISA section to echo.
Limitations
- Charset. Element text is decoded as UTF-8. Non-UTF-8 interchanges are rejected explicitly rather than silently corrupted.
- No escape character. X12 has no release mechanism, so a data value that contains a delimiter byte is rejected on output rather than corrupting the interchange.
- One functional group on output. The writer wraps all transaction sets in a single GS..GE functional group; the reader handles any number of groups on input. A multi-group output shape requires multiple runs.
- Output splitting. An interchange is a single ISA..IEA envelope and cannot be divided across files. An x12 output combined with a split: block is rejected at config-validation time (diagnostic E338) rather than emitting a structurally corrupt interchange.