XProc 3.1: Invisible XML

Community Group Report

Latest editor’s draft:
https://spec.xproc.org/master/head/ixml/
Editor:
Norman Walsh
Participate:
GitHub xproc/3.0-steps

This document is also available in these non-normative formats: XML and HTML with automatic change markup courtesy of DeltaXML.


Abstract

This specification describes the p:ixml step for XProc 3.1: An XML Pipeline Language.

Status of this Document

This specification was published by the XProc Next Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.

If you wish to make comments regarding this document, please send them to xproc-dev@w3.org.

Note

This draft is a “last call” draft. This version is stable and will not be updated.

1. Introduction

This specification describes the p:ixml XProc step. A machine-readable description of this step may be found in steps.xpl.

Familiarity with the general nature of [XProc 3.1] steps is assumed.

2. Step library

2.1. p:ixml

The p:ixml step performs Invisible XML processing per [Invisible XML]. It transforms a non-XML input into XML by applying the specified Invisible XML grammar.

Summary

Input port    Primary   Sequence   Content types
grammar                 ✔          text xml
source        ✔                    any -xml -html

Output port   Primary   Sequence   Content types
result        ✔         ✔          any

Option name     Type                      Default value
fail-on-error   xs:boolean                true()
parameters      map(xs:QName, item()*)?   ()

Errors

Error code    Description
err:XC0205    It is a dynamic error if the source document cannot be parsed by the provided grammar.
err:XC0211    It is a dynamic error if more than one document appears on the grammar port.
err:XC0212    It is a dynamic error if the grammar provided is not a valid Invisible XML grammar.

Implementation details

Implementation   Description
Defined          It is implementation-defined if other result formats are possible.
Defined          The parameters are implementation-defined.

Declaration

<p:declare-step type="p:ixml">
     <p:input port="grammar" sequence="true" content-types="text xml"/>
     <p:input port="source" primary="true" content-types="any -xml -html"/>
     <p:output port="result" sequence="true" content-types="any"/>
     <p:option name="parameters" as="map(xs:QName, item()*)?"/>    
     <p:option name="fail-on-error" as="xs:boolean" select="true()"/>
</p:declare-step>

If no grammar is provided on the grammar port, the grammar for Invisible XML is assumed. If an XML or text grammar is provided it must be an Invisible XML grammar. It is a dynamic error (err:XC0212) if the grammar provided is not a valid Invisible XML grammar. It is a dynamic error (err:XC0211) if more than one document appears on the grammar port.
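In practice a grammar is often kept in a separate file rather than inline. The following is a minimal sketch of that arrangement, not a normative example: the file name date.ixml is hypothetical, and the content type is stated explicitly on the assumption that a processor may not recognize the file extension. The source port of p:ixml is primary, so it is connected implicitly to the pipeline’s own source port.

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                version="3.1">
<p:input port="source" content-types="text"/>
<p:output port="result"/>

<p:ixml>
  <p:with-input port="grammar">
    <!-- date.ixml is a hypothetical file name used for illustration -->
    <p:document href="date.ixml" content-type="text/plain"/>
  </p:with-input>
</p:ixml>

</p:declare-step>

Because exactly one document is bound to the grammar port, err:XC0211 cannot arise here; err:XC0212 would still be raised if the file did not contain a valid Invisible XML grammar.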

The source to be processed is usually text, but there’s nothing in principle that prevents an Invisible XML grammar from applying to an arbitrary sequence of characters.

The result should be XML. It is implementation-defined if other result formats are possible. (An implementation might, for example, provide a way for the p:ixml step to compile an Invisible XML grammar into some format that can be processed more efficiently.)

  • The parameters are implementation-defined. An implementation might provide parameters to select among different ambiguous parses or choose alternate representations.

  • If fail-on-error is true, the step will raise an error if the input cannot be parsed by the grammar. It is a dynamic error (err:XC0205) if the source document cannot be parsed by the provided grammar. If fail-on-error is false, no error will be raised.

    The Invisible XML specification provides a mechanism to identify failed parses in the output.
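Both options can also be set with p:with-option instead of an attribute. The sketch below, offered only as an illustration, lets the caller of a pipeline decide whether failed parses raise an error. The option name strict and the step name main are assumptions introduced for this example.

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                name="main" version="3.1">
<p:input port="source" primary="true" content-types="text"/>
<p:input port="grammar" content-types="text xml"/>
<p:output port="result" sequence="true"/>
<!-- "strict" is a hypothetical pipeline option, not part of p:ixml -->
<p:option name="strict" as="xs:boolean" select="true()"/>

<p:ixml>
  <p:with-input port="grammar" pipe="grammar@main"/>
  <!-- Pass the caller's choice through to fail-on-error -->
  <p:with-option name="fail-on-error" select="$strict"/>
</p:ixml>

</p:declare-step>

When strict is false, the result has to be inspected for the failure marker described by the Invisible XML specification, because no error is raised.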

2.1.1. Examples

Several examples demonstrate features of the step.

2.1.1.1. Parsing an Invisible XML grammar

In this first example, no grammar is provided, so the pipeline parses the Invisible XML grammar on the source port and returns its XML representation:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                version="3.1">
<p:output port="result"/>

<p:ixml>
  <p:with-input port="grammar"><p:empty /></p:with-input>
  <p:with-input port="source">
    <p:inline content-type="text/plain">
date: s?, day, s, month, (s, year)? .
-s: -" "+ .
day: digit, digit? .
-digit: "0"; "1"; "2"; "3"; "4"; "5"; "6"; "7"; "8"; "9".
month: "January"; "February"; "March"; "April";
       "May"; "June"; "July"; "August";
       "September"; "October"; "November"; "December".
year: (digit, digit)?, digit, digit .
    </p:inline>
  </p:with-input>
</p:ixml>

</p:declare-step>

This would produce an XML version of the grammar:

<ixml>
   <rule name="date">
      <alt>
         <option>
            <nonterminal name="s"/>
         </option>
         <nonterminal name="day"/>
         <nonterminal name="s"/>
         <nonterminal name="month"/>
         <option>
            <alts>
               <alt>
                  <nonterminal name="s"/>
                  <nonterminal name="year"/>
               </alt>
            </alts>
         </option>
      </alt>
   </rule>
   <!-- … remaining rules elided for brevity … -->
</ixml>

2.1.1.2. Parsing a date

When a grammar is provided on the grammar port, it is used to parse the source, here the string “31 December 2021”:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                version="3.1">
<p:output port="result"/>

<p:ixml>
  <p:with-input port="grammar">
    <p:inline content-type="text/plain">
date: s?, day, s, month, (s, year)? .
-s: -" "+ .
day: digit, digit? .
-digit: "0"; "1"; "2"; "3"; "4"; "5"; "6"; "7"; "8"; "9".
month: "January"; "February"; "March"; "April";
       "May"; "June"; "July"; "August";
       "September"; "October"; "November"; "December".
year: (digit, digit)?, digit, digit .
    </p:inline>
  </p:with-input>
  <p:with-input port="source">
    <p:inline content-type="text/plain">31 December 2021</p:inline>
  </p:with-input>
</p:ixml>

</p:declare-step>

This would produce an XML version of the date:

<date><day>31</day><month>December</month><year>2021</year></date>

2.1.1.3. Failed parses

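If a parse fails, the implementation must indicate this, and it may also provide information about where the processing failed. In this example fail-on-error is false, so the failure is reported in the result document instead of being raised as err:XC0205.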

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                version="3.1">
<p:output port="result"/>

<p:ixml fail-on-error="false">
  <p:with-input port="grammar">
    <p:inline content-type="text/plain">
date: s?, day, s, month, (s, year)? .
-s: -" "+ .
day: digit, digit? .
-digit: "0"; "1"; "2"; "3"; "4"; "5"; "6"; "7"; "8"; "9".
month: "January"; "February"; "March"; "April";
       "May"; "June"; "July"; "August";
       "September"; "October"; "November"; "December".
year: (digit, digit)?, digit, digit .
    </p:inline>
  </p:with-input>
  <p:with-input port="source">
    <p:inline content-type="text/plain">31 Mumble 2021</p:inline>
  </p:with-input>
</p:ixml>

</p:declare-step>

Here the output might be something like this:

<error xmlns:ixml="http://invisiblexml.org/NS"
       xmlns:ex="http://example.com/NS"
       ixml:state="failed" ex:lastChar="4">
<parse>
month ->  •  M  a  r  c  h
month ->  M  •  a  r  c  h
</parse>
<parse>
month ->  •  M  a  y
month ->  M  •  a  y
</parse>
</error>

In the case of failure, Invisible XML requires that the ixml:state attribute appear on the root element containing the token “failed”. It doesn’t constrain the implementation’s choice of the root element or the content of the document.
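Since only the ixml:state attribute is standardized, a pipeline that wants to react to a failed parse can test for the “failed” token and ignore the rest of the document. The following is a sketch under stated assumptions: the step name main and the replacement element unparsed are hypothetical, and the test assumes the step produced a single XML document.

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ixml="http://invisiblexml.org/NS"
                name="main" version="3.1">
<p:input port="source" primary="true" content-types="text"/>
<p:input port="grammar" content-types="text xml"/>
<p:output port="result"/>

<p:ixml fail-on-error="false">
  <p:with-input port="grammar" pipe="grammar@main"/>
</p:ixml>

<p:choose>
  <!-- ixml:state is a list of tokens, so test for the token itself -->
  <p:when test="contains-token(/*/@ixml:state, 'failed')">
    <p:identity>
      <p:with-input>
        <!-- "unparsed" is a hypothetical placeholder document -->
        <p:inline><unparsed/></p:inline>
      </p:with-input>
    </p:identity>
  </p:when>
  <p:otherwise>
    <p:identity/>
  </p:otherwise>
</p:choose>

</p:declare-step>

The same test with the token “ambiguous” would detect an ambiguous parse.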

2.1.1.4. Ambiguous parses

An ixml grammar may be ambiguous. In the grammar below, there are three different possible ways to parse the input. By default, one of them is returned.

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                version="3.1">
<p:output port="result"/>

<p:ixml>
  <p:with-input port="grammar">
    <p:inline content-type="text/plain">
letters: X, (A; B; C) .
A: digits .
B: digits .
C: digits .
X: "a" .
digits: ["0"-"9"]+ .
    </p:inline>
  </p:with-input>
  <p:with-input port="source">
    <p:inline content-type="text/plain">a123</p:inline>
  </p:with-input>
</p:ixml>

</p:declare-step>

This might return any one of these parses:

<letters ixml:state="ambiguous" xmlns:ixml="http://invisiblexml.org/NS"><X>a</X><C><digits>123</digits></C></letters>

or

<letters ixml:state="ambiguous" xmlns:ixml="http://invisiblexml.org/NS"><X>a</X><A><digits>123</digits></A></letters>

or

<letters ixml:state="ambiguous" xmlns:ixml="http://invisiblexml.org/NS"><X>a</X><B><digits>123</digits></B></letters>

All are equally correct.

2.1.1.5. Ambiguous parse selection

An implementation might provide a parameter to allow the author to select a particular parse:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ex="http://example.com/"
                version="3.1">
<p:output port="result"/>

<p:ixml parameters="map{'ex:select':2}">
  <p:with-input port="grammar">
    <p:inline content-type="text/plain">
letters: X, (A; B; C) .
A: digits .
B: digits .
C: digits .
X: "a" .
digits: ["0"-"9"]+ .
    </p:inline>
  </p:with-input>
  <p:with-input port="source">
    <p:inline content-type="text/plain">a123</p:inline>
  </p:with-input>
</p:ixml>

</p:declare-step>

This might return:

<letters ixml:state="ambiguous" xmlns:ixml="http://invisiblexml.org/NS"><X>a</X><A><digits>123</digits></A></letters>

2.1.1.6. Multiple ambiguous outputs

Alternatively, a processor might provide a parameter that returns all of the parses:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ex="http://example.com/"
                version="3.1">
<p:output port="result"/>

<p:ixml parameters="map{'ex:select':'all'}">
  <p:with-input port="grammar">
    <p:inline content-type="text/plain">
letters: X, (A; B; C) .
A: digits .
B: digits .
C: digits .
X: "a" .
digits: ["0"-"9"]+ .
    </p:inline>
  </p:with-input>
  <p:with-input port="source">
    <p:inline content-type="text/plain">a123</p:inline>
  </p:with-input>
</p:ixml>

</p:declare-step>

This might return three documents:

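<letters ixml:state="ambiguous" xmlns:ixml="http://invisiblexml.org/NS"><X>a</X><C><digits>123</digits></C></letters>
<letters ixml:state="ambiguous" xmlns:ixml="http://invisiblexml.org/NS"><X>a</X><B><digits>123</digits></B></letters>
<letters ixml:state="ambiguous" xmlns:ixml="http://invisiblexml.org/NS"><X>a</X><A><digits>123</digits></A></letters>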

As before, there is nothing standardized about the results in this case.
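Because the result port is declared as a sequence, a following step can consume however many documents the processor returns. As a rough sketch, the pipeline below counts them with p:count. The ex:select parameter is the same hypothetical, implementation-defined parameter used above, and the step name main is an assumption of this example.

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ex="http://example.com/"
                name="main" version="3.1">
<p:input port="source" primary="true" content-types="text"/>
<p:input port="grammar" content-types="text xml"/>
<p:output port="result"/>

<!-- ex:select is hypothetical; a real processor may offer no such parameter -->
<p:ixml parameters="map{'ex:select':'all'}">
  <p:with-input port="grammar" pipe="grammar@main"/>
</p:ixml>

<!-- Emits a c:result element containing the number of documents received -->
<p:count/>

</p:declare-step>

If a processor honored such a parameter and returned all three parses of “a123”, the count would be 3; a processor that ignores the parameter would presumably yield 1.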

Document properties

No document properties are preserved.

3. Step Errors

This step can raise dynamic errors.

[Definition: A dynamic error is one which occurs while a pipeline is being evaluated.] Examples of dynamic errors include references to URIs that cannot be resolved, steps which fail, and pipelines that exhaust the capacity of an implementation (such as memory or disk space). For a more complete discussion of dynamic errors, see Dynamic Errors in XProc 3.1: An XML Pipeline Language.

If a step fails due to a dynamic error, failure propagates upwards until either a p:try is encountered or the entire pipeline fails. In other words, outside of a p:try, step failure causes the entire pipeline to fail.
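As an illustration of how p:try contains a failure, the sketch below catches only err:XC0205 and substitutes a placeholder document, so that an unparseable source does not stop the pipeline. The step name main and the unparsed element are assumptions of this example; any other error, such as err:XC0212, still propagates.

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:err="http://www.w3.org/ns/xproc-error"
                name="main" version="3.1">
<p:input port="source" primary="true" content-types="text"/>
<p:input port="grammar" content-types="text xml"/>
<p:output port="result" sequence="true"/>

<p:try>
  <!-- fail-on-error defaults to true, so a failed parse raises err:XC0205 -->
  <p:ixml>
    <p:with-input port="grammar" pipe="grammar@main"/>
  </p:ixml>
  <p:catch code="err:XC0205">
    <p:identity>
      <p:with-input>
        <!-- "unparsed" is a hypothetical placeholder document -->
        <p:inline><unparsed/></p:inline>
      </p:with-input>
    </p:identity>
  </p:catch>
</p:try>

</p:declare-step>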

The following errors can be raised by this step:

err:XC0205

It is a dynamic error if the source document cannot be parsed by the provided grammar.

See: p:ixml

err:XC0211

It is a dynamic error if more than one document appears on the grammar port.

See: p:ixml

err:XC0212

It is a dynamic error if the grammar provided is not a valid Invisible XML grammar.

See: p:ixml

A. Conformance

Conformant processors must implement all of the features described in this specification except those that are explicitly identified as optional.

Some aspects of processor behavior are not completely specified; those features are either implementation-dependent or implementation-defined.

[Definition: An implementation-dependent feature is one where the implementation has discretion in how it is performed. Implementations are not required to document or explain how implementation-dependent features are performed.]

[Definition: An implementation-defined feature is one where the implementation has discretion in how it is performed. Conformant implementations must document how implementation-defined features are performed.]

A.1. Implementation-defined features

The following features are implementation-defined:

  1. It is implementation-defined if other result formats are possible. See Section 2.1, “p:ixml”.
  2. The parameters are implementation-defined. See Section 2.1, “p:ixml”.

A.2. Implementation-dependent features

This step has no implementation-dependent features.

B. References

[XProc 3.1] XProc 3.1: An XML Pipeline Language. Norman Walsh, Achim Berndzen, Gerrit Imsieke and Erik Siegel, editors.

[Invisible XML] Invisible XML Specification, version 1.0. Steven Pemberton, editor. Version 2022-06-20.

C. Glossary

dynamic error

A dynamic error is one which occurs while a pipeline is being evaluated.

implementation-defined

An implementation-defined feature is one where the implementation has discretion in how it is performed. Conformant implementations must document how implementation-defined features are performed.

implementation-dependent

An implementation-dependent feature is one where the implementation has discretion in how it is performed. Implementations are not required to document or explain how implementation-dependent features are performed.

D. Ancillary files

This specification includes by reference a number of ancillary files.

steps.xpl

An XProc step library for the declared steps.