XProc 3.0: Invisible XML

Draft Community Group Report

Editor's Draft at (build 10)
Latest editor’s draft:
https://spec.xproc.org/master/head/ixml/
Editor:
Norman Walsh
Participate:
GitHub xproc/3.0-steps
Report an issue
Changes:
Diff against current “status quo” draft
Commits for this specification

This document is also available in these non-normative formats: XML and HTML with automatic change markup courtesy of DeltaXML.


Abstract

This specification describes the p:ixml step for XProc 3.0: An XML Pipeline Language.

Status of this Document

This document is an editor's draft that has no official standing.

This specification was published by the XProc Next Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.

If you wish to make comments regarding this document, please send them to xproc-dev@w3.org (subscribe, archives).

1. Introduction

This specification describes the p:ixml XProc step. A machine-readable description of this step may be found in steps.xpl.

Familiarity with the general nature of [XProc 3.0] steps is assumed; for background details, see [XProc 3.0 Steps].

2. p:ixml

The p:ixml step performs Invisible XML processing per [Invisible XML]. It transforms a non-XML input into XML by applying the specified Invisible XML grammar.

<p:declare-step type="p:ixml">
     <p:input port="grammar" sequence="true" content-types="any"/>
     <p:input port="source" primary="true" content-types="any -xml -html"/>
     <p:output port="result" content-types="any"/>
     <p:option name="parameters" as="map(xs:QName, item()*)?"/>    
     <p:option name="fail-on-error" as="xs:boolean" select="true()"/>
</p:declare-step>

If no grammar is provided on the grammar port, the grammar for Invisible XML is assumed. If an XML or text grammar is provided it should be an Invisible XML grammar. If any other grammar format is provided, its interpretation is implementation-defined.
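
A grammar does not have to be written inline. As a minimal sketch, the pipeline below reads a text grammar and the source text from files. The file names date.ixml and date.txt are hypothetical, and the explicit content-type="text/plain" is an assumption made here so that both documents are loaded as text however the processor maps those file extensions.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                version="3.0">
<p:output port="result"/>

<p:ixml>
  <!-- date.ixml is a hypothetical file holding an Invisible XML grammar -->
  <p:with-input port="grammar">
    <p:document href="date.ixml" content-type="text/plain"/>
  </p:with-input>
  <!-- date.txt is a hypothetical file holding the text to be parsed -->
  <p:with-input port="source">
    <p:document href="date.txt" content-type="text/plain"/>
  </p:with-input>
</p:ixml>

</p:declare-step>
```

Leaving out the p:with-input for the grammar port entirely would correspond to the first case above, where the grammar for Invisible XML itself is assumed.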

The source to be processed is usually text, but there’s nothing in principle that prevents an Invisible XML grammar from applying to an arbitrary sequence of characters.

The result should be XML. It is implementation-defined if other result formats are possible. (An implementation might, for example, provide a way for the p:ixml step to compile an Invisible XML grammar into some format that can be processed more efficiently.)

  • The parameters are implementation-defined. An implementation might provide parameters to select among different ambiguous parses or choose alternate representations.

  • If fail-on-error is true and the input cannot be parsed by the grammar, the step raises an error: it is a dynamic error (err:XC0205) if the source document cannot be parsed by the provided grammar. If fail-on-error is false, no error is raised.

    The Invisible XML specification provides a mechanism to identify failed parses in the output.

2.1. Example

The following pipeline parses an Invisible XML grammar and returns its XML representation:

Example 1. Parsing an ixml grammar
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                version="3.0">
<p:output port="result"/>

<p:ixml>
  <p:with-input port="source">
    <p:inline>
date: s?, day, s, month, (s, year)? .
-s: -" "+ .
day: digit, digit? .
-digit: "0"; "1"; "2"; "3"; "4"; "5"; "6"; "7"; "8"; "9".
month: "January"; "February"; "March"; "April";
       "May"; "June"; "July"; "August";
       "September"; "October"; "November"; "December".
year: (digit, digit)?, digit, digit .
    </p:inline>
  </p:with-input>
</p:ixml>

</p:declare-step>

This would produce an XML version of the grammar:

<ixml>
   <rule name="date">
      <alt>
         <option>
            <nonterminal name="s"/>
         </option>
         <nonterminal name="day"/>
         <nonterminal name="s"/>
         <nonterminal name="month"/>
         <option>
            <alts>
               <alt>
                  <nonterminal name="s"/>
                  <nonterminal name="year"/>
               </alt>
            </alts>
         </option>
      </alt>
   </rule>
   <!-- … remaining rules elided for brevity … -->
</ixml>

Providing the “date” grammar allows the step to parse dates:

Example 2. Parsing a date with ixml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                version="3.0">
<p:output port="result"/>

<p:ixml>
  <p:with-input port="grammar">
    <p:inline>
date: s?, day, s, month, (s, year)? .
-s: -" "+ .
day: digit, digit? .
-digit: "0"; "1"; "2"; "3"; "4"; "5"; "6"; "7"; "8"; "9".
month: "January"; "February"; "March"; "April";
       "May"; "June"; "July"; "August";
       "September"; "October"; "November"; "December".
year: (digit, digit)?, digit, digit .
    </p:inline>
  </p:with-input>
  <p:with-input port="source">
    <p:inline>31 December 2021</p:inline>
  </p:with-input>
</p:ixml>

</p:declare-step>

This would produce an XML version of the date:

<date><day>31</day><month>December</month><year>2021</year></date>

If a parse fails, the implementation must indicate this, and it may also provide information about where the processing failed.

Example 3. Failing to parse a date with ixml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                version="3.0">
<p:output port="result"/>

<p:ixml fail-on-error="false">
  <p:with-input port="grammar">
    <p:inline>
date: s?, day, s, month, (s, year)? .
-s: -" "+ .
day: digit, digit? .
-digit: "0"; "1"; "2"; "3"; "4"; "5"; "6"; "7"; "8"; "9".
month: "January"; "February"; "March"; "April";
       "May"; "June"; "July"; "August";
       "September"; "October"; "November"; "December".
year: (digit, digit)?, digit, digit .
    </p:inline>
  </p:with-input>
  <p:with-input port="source">
    <p:inline>31 Mumble 2021</p:inline>
  </p:with-input>
</p:ixml>

</p:declare-step>

Here the output might be something like this:

<ixml xmlns:ixml="http://invisiblexml.org/NS"
      xmlns:ex="http://example.com/NS"
      ixml:state="failed" ex:lastChar="4">
<parse>
month ->  •  M  a  r  c  h
month ->  M  •  a  r  c  h
</parse>
<parse>
month ->  •  M  a  y
month ->  M  •  a  y
</parse>
</ixml>

There is nothing standard about this markup except the ixml:state attribute with the value “failed”.
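
Because only the ixml:state attribute is reliable, a pipeline that sets fail-on-error to false would probably test that attribute and nothing else. The following is a sketch under stated assumptions, not a prescribed pattern: the invalid-date element is a hypothetical placeholder, the step's own source port is assumed to receive the text to parse, and the attribute value is tokenized on the assumption that it may carry more than one word.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ixml="http://invisiblexml.org/NS"
                version="3.0">
<p:input port="source" content-types="text"/>
<p:output port="result"/>

<!-- Parse the step's input; a failed parse does not raise an error -->
<p:ixml fail-on-error="false">
  <p:with-input port="grammar">
    <p:inline>
date: s?, day, s, month, (s, year)? .
-s: -" "+ .
day: digit, digit? .
-digit: "0"; "1"; "2"; "3"; "4"; "5"; "6"; "7"; "8"; "9".
month: "January"; "February"; "March"; "April";
       "May"; "June"; "July"; "August";
       "September"; "October"; "November"; "December".
year: (digit, digit)?, digit, digit .
    </p:inline>
  </p:with-input>
</p:ixml>

<p:choose>
  <!-- True when the root element is marked as a failed parse -->
  <p:when test="tokenize(/*/@ixml:state, '\s+') = 'failed'">
    <p:identity>
      <p:with-input>
        <!-- invalid-date is a hypothetical placeholder result -->
        <p:inline><invalid-date/></p:inline>
      </p:with-input>
    </p:identity>
  </p:when>
  <!-- Otherwise pass the parsed date through unchanged -->
  <p:otherwise>
    <p:identity/>
  </p:otherwise>
</p:choose>

</p:declare-step>
```

Testing only ixml:state keeps the pipeline independent of whatever diagnostic markup a particular implementation produces.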

An ixml grammar may be ambiguous. In the grammar below, there are three different possible ways to parse the input. By default, one of them is returned.

Example 4. Parsing an ambiguous grammar
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                version="3.0">
<p:output port="result"/>

<p:ixml>
  <p:with-input port="grammar">
    <p:inline>
letters: X, (A; B; C) .
A: digits .
B: digits .
C: digits .
X: "a" .
digits: ["0"-"9"]+ .
    </p:inline>
  </p:with-input>
  <p:with-input port="source">
    <p:inline>a123</p:inline>
  </p:with-input>
</p:ixml>

</p:declare-step>

This might return any one of these parses:

<letters ixml:state="ambiguous" xmlns:ixml="http://invisiblexml.org/NS"><X>a</X><C><digits>123</digits></C></letters>

or

<letters ixml:state="ambiguous" xmlns:ixml="http://invisiblexml.org/NS"><X>a</X><A><digits>123</digits></A></letters>

or

<letters ixml:state="ambiguous" xmlns:ixml="http://invisiblexml.org/NS"><X>a</X><B><digits>123</digits></B></letters>

All are equally correct.

An implementation might provide a parameter to allow the author to select a particular parse:

Example 5. Selecting a particular parse
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ex="http://example.com/"
                version="3.0">
<p:output port="result"/>

<p:ixml parameters="map{'ex:select':2}">
  <p:with-input port="grammar">
    <p:inline>
letters: X, (A; B; C) .
A: digits .
B: digits .
C: digits .
X: "a" .
digits: ["0"-"9"]+ .
    </p:inline>
  </p:with-input>
  <p:with-input port="source">
    <p:inline>a123</p:inline>
  </p:with-input>
</p:ixml>

</p:declare-step>

This might return:

<letters><X>a</X><A><digits>123</digits></A></letters>

Or a processor might provide a parameter to return all of the parses:

Example 6. Returning all of the parses
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ex="http://example.com/"
                version="3.0">
<p:output port="result"/>

<p:ixml parameters="map{'ex:select':'all'}">
  <p:with-input port="grammar">
    <p:inline>
letters: X, (A; B; C) .
A: digits .
B: digits .
C: digits .
X: "a" .
digits: ["0"-"9"]+ .
    </p:inline>
  </p:with-input>
  <p:with-input port="source">
    <p:inline>a123</p:inline>
  </p:with-input>
</p:ixml>

</p:declare-step>

This might return:

<ixml parseCount='3'>
<letters><X>a</X><C><digits>123</digits></C></letters>
<letters><X>a</X><B><digits>123</digits></B></letters>
<letters><X>a</X><A><digits>123</digits></A></letters>
</ixml>

As before, there is nothing standardized about the results in this case.

2.2. Document properties

No document properties are preserved.

3. Step Errors

This step can raise dynamic errors.

[Definition: A dynamic error is one which occurs while a pipeline is being evaluated.] Examples of dynamic errors include references to URIs that cannot be resolved, steps which fail, and pipelines that exhaust the capacity of an implementation (such as memory or disk space). For a more complete discussion of dynamic errors, see Dynamic Errors in XProc 3.0: An XML Pipeline Language.

If a step fails due to a dynamic error, failure propagates upwards until either a p:try is encountered or the entire pipeline fails. In other words, outside of a p:try, step failure causes the entire pipeline to fail.
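
As an illustration of that propagation, the sketch below wraps p:ixml in a p:try so that a parse failure is caught instead of ending the pipeline. It assumes the default fail-on-error value of true and that the step's own source port receives the text to parse; the unparsed element is a hypothetical placeholder result.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:err="http://www.w3.org/ns/xproc-error"
                version="3.0">
<p:input port="source" content-types="text"/>
<p:output port="result"/>

<p:try>
  <!-- fail-on-error defaults to true, so an unparseable input raises err:XC0205 -->
  <p:ixml>
    <p:with-input port="grammar">
      <p:inline>
letters: X, (A; B; C) .
A: digits .
B: digits .
C: digits .
X: "a" .
digits: ["0"-"9"]+ .
      </p:inline>
    </p:with-input>
  </p:ixml>
  <!-- Only the parse failure is caught; any other error still propagates -->
  <p:catch code="err:XC0205">
    <p:identity>
      <p:with-input>
        <!-- unparsed is a hypothetical placeholder result -->
        <p:inline><unparsed/></p:inline>
      </p:with-input>
    </p:identity>
  </p:catch>
</p:try>

</p:declare-step>
```

Restricting the p:catch to err:XC0205 means that unrelated failures, such as an unreadable input, are not silently turned into the placeholder.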

The following errors can be raised by this step:

err:XC0205

It is a dynamic error if the source document cannot be parsed by the provided grammar.

See: p:ixml

A. Conformance

Conformant processors must implement all of the features described in this specification except those that are explicitly identified as optional.

Some aspects of processor behavior are not completely specified; those features are either implementation-dependent or implementation-defined.

[Definition: An implementation-dependent feature is one where the implementation has discretion in how it is performed. Implementations are not required to document or explain how implementation-dependent features are performed.]

[Definition: An implementation-defined feature is one where the implementation has discretion in how it is performed. Conformant implementations must document how implementation-defined features are performed.]

A.1. Implementation-defined features

The following features are implementation-defined:

  1. If any other grammar format is provided, its interpretation is implementation-defined. See Section 2, “p:ixml”.
  2. It is implementation-defined if other result formats are possible. See Section 2, “p:ixml”.
  3. The parameters are implementation-defined. See Section 2, “p:ixml”.

A.2. Implementation-dependent features

This step has no implementation-dependent features.

B. References

[XProc 3.0] XProc 3.0: An XML Pipeline Language. Norman Walsh, Achim Berndzen, Gerrit Imsieke and Erik Siegel, editors.

[XProc 3.0 Steps] XProc 3.0 Steps: An Introduction. Norman Walsh, Achim Berndzen, Gerrit Imsieke and Erik Siegel, editors.

[Invisible XML] Invisible XML Specification, version 1.0. Steven Pemberton, editor. Version 2022-06-20.

C. Glossary

dynamic error

A dynamic error is one which occurs while a pipeline is being evaluated.

implementation-defined

An implementation-defined feature is one where the implementation has discretion in how it is performed. Conformant implementations must document how implementation-defined features are performed.

implementation-dependent

An implementation-dependent feature is one where the implementation has discretion in how it is performed. Implementations are not required to document or explain how implementation-dependent features are performed.

D. Ancillary files

This specification includes by reference a number of ancillary files.

steps.xpl

An XProc step library for the declared steps.