XProc 3.0: validation steps

Editor's Draft

This Version:
https://xproc.github.io/3.0-steps/master/head/validation/
Latest Version:
http://spec.xproc.org/master/head/validation/
Editors:
Norman Walsh
Achim Berndzen
Gerrit Imsieke
Erik Siegel
Repository:
This specification on GitHub
Report an issue
Changes:
Diff against current “status quo” draft
Commits for this specification

This document is also available in these non-normative formats: XML, automatic change markup from the previous draft courtesy of DeltaXML.


Abstract

This specification describes the p:validate-with-nvdl, p:validate-with-relax-ng, p:validate-with-schematron, and p:validate-with-xml-schema steps for XProc 3.0: An XML Pipeline Language.

Status of this Document

This document is an editor's draft that has no official standing.

This section describes the status of this document at the time of its publication. Other documents may supersede this document.

This document is derived from XProc: An XML Pipeline Language published by the W3C.


1 Introduction

This specification describes the p:validate-with-relax-ng, p:validate-with-schematron, p:validate-with-xml-schema, p:validate-with-nvdl, and the generic p:validate steps. Each is independently optional. A machine-readable description of these steps may be found in steps.xpl.

Note

p:validate is not included yet. We should discuss Norm’s proposal and decide whether we want to include such a step. In addition, an idea was floated that such a generic step might also look at the xml-model processing instructions that might be prepended to a document. The problem is that these PIs are missing after the document has been parsed, unless an extra effort is made to re-attach them to the document node. One option is a document property, models, that holds a c:models document with an XML representation of the xml-model PIs, but all kinds of document loading mechanisms (p:load, p:document, p:http-request) would need to be extended to process these prepended xml-model PIs when loading a document.

Familiarity with the general nature of [XProc 3.0] steps is assumed; for background details, see [XProc 3.0 Steps].

2 Common Options and Outputs

All steps in this specification provide a boolean option assert-valid. If assert-valid is true and any of the validated documents is found to be invalid according to the respective schema (and any other parameters that influence the determination of validity), a dynamic error is raised.

Note

Historically, the validation steps (apart from p:validate-with-schematron) could only report errors by setting assert-valid to true and catching the errors. A c:errors document on the error port of the corresponding p:catch recovery pipeline had to be sent to an output, either verbatim or after postprocessing. Now, if assert-valid is false, an XVRL document will be available on the report port of the validation step. It uses the XVRL severity vocabulary to indicate whether the validation failed, and to what degree. This allows more nuance in reporting errors. Previously, assert-valid="true" on p:validate-with-schematron would always throw an error even if the reported findings were only intended as less severe, for example if the schema author used info or warning in sch:report/@role.

If no such error is raised, each step generates at least one validation report document on its report port. Unless another serialization format is requested, the mandatory report document should be valid with respect to the [XVRL] schema. There may be multiple report documents on the report port, including other vocabularies such as SVRL. If a step performs multiple validations, as may be the case for p:validate, all individual XVRL reports should be consolidated into a single XVRL report by the step.
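As a non-normative sketch of the behavior described above, the following pipeline runs a validation step with assert-valid set to false and exposes the report port alongside the result. The schema location schema.rng and the step name validate are hypothetical, and the example assumes a processor that implements p:validate-with-relax-ng.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source"/>

  <!-- Copy of the validated source document (bound to the primary
       output of the last step by default). -->
  <p:output port="result" primary="true"/>

  <!-- One or more report documents; at least one should be XVRL. -->
  <p:output port="report" sequence="true" pipe="report@validate"/>

  <!-- With assert-valid="false", an invalid document does not raise
       a dynamic error; findings are reported on the report port. -->
  <p:validate-with-relax-ng name="validate" assert-valid="false">
    <!-- schema.rng is a hypothetical schema location. -->
    <p:with-input port="schema" href="schema.rng"/>
  </p:validate-with-relax-ng>
</p:declare-step>
```

A caller can then inspect the severity information in the XVRL report to decide how to proceed, rather than relying on a raised error.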

Note

Should the XVRL spec be built using the XProc infrastructure? I added a preliminary biblioref that points to http://spec.xproc.org/xvrl/

Note

Should we also allow text and json content types for the report port? Only as an additional alternative, or instead of the XML report?

Should we add a parameter to p:cast-content-type that allows conversion between the XML and JSON XVRL serialization formats (once JSON serialization is specified for XVRL)?

Each of the validation steps has a parameters option. The parameters supported by the validation steps and their semantics are implementation-defined, and they can be different for each validation step. Keys in the c namespace, http://www.w3.org/ns/xproc-step, namely c:compile and c:xvrl, hold maps themselves that control schema compilation and XVRL generation, respectively. Schema compilation is, for example, the process of converting a Schematron schema into an XSLT stylesheet. For the c:xvrl map, XProc implementations that implement any of the validation steps should also support the basic parameters that are defined in the [XVRL] specification, format and severity.

3 Validate with NVDL

The p:validate-with-nvdl step applies [NVDL] validation to the source document.

<p:declare-step type="p:validate-with-nvdl">
     <p:input port="source" primary="true" content-types="xml html"/>
     <p:input port="nvdl" content-types="xml"/>
     <p:input port="schemas" sequence="true" content-types="text xml"/>
     <p:output port="result" primary="true" content-types="xml html"/>
     <p:output port="report" sequence="true" content-types="application/xml json"/>
     <p:option name="assert-valid" select="true()" as="xs:boolean"/>
     <p:option name="parameters" as="map(xs:QName,item()*)?"/>     
</p:declare-step>

The source document is validated using the namespace dispatching rules contained in the nvdl document.

The dispatching rules may contain URI references that point to the actual schemas to be used. As long as these schemas are accessible, it is not necessary to pass anything on the schemas port. However, if one or more schemas are provided on the schemas port, then these schemas should be used in validation.

It is a dynamic error (err:XC0053) if the assert-valid option is true and the input document is not valid.

The output from this step is a copy of the input. The output of this step may include PSVI annotations.
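A minimal, non-normative sketch of an invocation follows. The file names dispatch.nvdl, xhtml.rng, and svg.rng are hypothetical; the schemas port is populated only to illustrate that supplied schemas should be used in validation.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- assert-valid defaults to true, so an invalid source document
       raises err:XC0053. -->
  <p:validate-with-nvdl>
    <!-- Namespace dispatching rules (hypothetical location). -->
    <p:with-input port="nvdl" href="dispatch.nvdl"/>

    <!-- Optional: schemas supplied directly to the step
         (hypothetical locations). -->
    <p:with-input port="schemas">
      <p:document href="xhtml.rng"/>
      <p:document href="svg.rng"/>
    </p:with-input>
  </p:validate-with-nvdl>
</p:declare-step>
```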

Note

Should the step also provide the dtd-attribute-values and dtd-id-idref-warnings options for RELAX NG validations? Is there a way to instruct Jing to use these options, maybe in NVDL extension attributes? Probably not in the foreseeable future.

3.1 Document properties

All document properties on the source port are preserved on the result port. No document properties on the schemas and nvdl ports are preserved. No document properties are preserved on the report port.

4 Validate with RELAX NG

The p:validate-with-relax-ng step applies [RELAX NG] validation to the source document.

<p:declare-step type="p:validate-with-relax-ng">
     <p:input port="source" primary="true" content-types="xml html"/>
     <p:input port="schema" content-types="text xml"/>
     <p:output port="result" primary="true" content-types="xml html"/>
     <p:output port="report" sequence="true" content-types="application/xml json"/>
     <p:option name="dtd-attribute-values" select="false()" as="xs:boolean"/>
     <p:option name="dtd-id-idref-warnings" select="false()" as="xs:boolean"/>
     <p:option name="assert-valid" select="true()" as="xs:boolean"/>
     <p:option name="parameters" as="map(xs:QName,item()*)?"/>     
</p:declare-step>

The values of the dtd-attribute-values and dtd-id-idref-warnings options must be booleans.

If the schema document has an XML media type, then it must be interpreted as a RELAX NG Grammar. If the schema document has a text media type, then it must be interpreted as a [RELAX NG Compact Syntax] document for validation.

If the dtd-attribute-values option is true, then the attribute value defaulting conventions of [RELAX NG DTD Compatibility] are also applied.

If the dtd-id-idref-warnings option is true, then the validator should treat a schema that is incompatible with the ID/IDREF/IDREFS feature of [RELAX NG DTD Compatibility] as if the document were invalid.

It is a dynamic error (err:XC0053) if the assert-valid option is true and the input document is not valid.

The output from this step is a copy of the input, possibly augmented by application of the [RELAX NG DTD Compatibility]. The output of this step may include PSVI annotations.

Support for [RELAX NG DTD Compatibility] is implementation-defined.
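The following non-normative sketch shows one way a compact syntax schema might be supplied. The file name schema.rnc is hypothetical, and the explicit content-type on p:document is an assumption made so that the schema arrives with a text media type regardless of how the processor maps file extensions.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Request attribute value defaulting per RELAX NG DTD
       Compatibility; whether this is supported depends on the
       implementation. -->
  <p:validate-with-relax-ng dtd-attribute-values="true">
    <p:with-input port="schema">
      <!-- A text media type signals RELAX NG Compact Syntax. -->
      <p:document href="schema.rnc" content-type="text/plain"/>
    </p:with-input>
  </p:validate-with-relax-ng>
</p:declare-step>
```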

4.1 Document properties

All document properties on the source port are preserved on the result port. No document properties on the schema port are preserved. No document properties are preserved on the report port.

5 Validate with Schematron

The p:validate-with-schematron step applies [Schematron] processing to the source document.

<p:declare-step type="p:validate-with-schematron">
     <p:input port="source" primary="true" content-types="xml html"/>
     <p:input port="schema" content-types="xml"/>
     <p:output port="result" primary="true" content-types="application/xml"/>
     <p:output port="report" sequence="true" content-types="application/xml json"/>
     <p:option name="parameters" as="map(xs:QName,item()*)?"/>     
     <p:option name="phase" select="'#DEFAULT'" as="xs:string"/>   
     <p:option name="assert-valid" select="true()" as="xs:boolean"/>
</p:declare-step>

It is a dynamic error (err:XC0054) if the assert-valid option is true and any Schematron assertions fail or reports succeed.

The value of the phase option identifies the Schematron validation phase with which validation begins.

The parameters option provides name/value pairs which correspond to Schematron external variables, to parameters that influence code generation, or to parameters that influence SVRL to XVRL conversion.

There are multiple Schematron implementations. An XProc processor that implements this step should at least support the [Schematron Skeleton] implementation.

How the Schematron implementation is selected is implementation-defined. A processor might select an implementation based on the schema’s queryBinding attribute and/or provide configuration options.

The parameters map may contain two special entries, c:compile and c:xvrl, both of which are maps. If a code-generating implementation such as Skeleton is used, the entries of the c:compile map, for example allow-foreign, will be passed to the code generator.

The SVRL to XVRL conversion can be influenced by the entries of the c:xvrl map. Which parameters the SVRL to XVRL conversion supports is implementation-defined.

All other entries of the parameters option will be passed to the generated code if applicable, or to a hypothetical native Schematron validator that works without code generation.
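The structure of the parameters map can be sketched as follows. This is a non-normative illustration: the schema location rules.sch, the phase name structure, the external variable max-depth, and all parameter values are hypothetical. The use of string keys inside the c:compile and c:xvrl maps is likewise an assumption, since the supported entries are implementation-defined.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                version="3.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- "structure" is a hypothetical phase defined in rules.sch. -->
  <p:validate-with-schematron phase="structure" assert-valid="false">
    <p:with-input port="schema" href="rules.sch"/>

    <!-- c:compile : passed to the code generator, if one is used.
         c:xvrl    : influences the SVRL to XVRL conversion.
         max-depth : any other entry is treated as a Schematron
                     external variable (hypothetical name). -->
    <p:with-option name="parameters" select="map {
        QName('http://www.w3.org/ns/xproc-step', 'c:compile'):
            map { 'allow-foreign': 'true' },
        QName('http://www.w3.org/ns/xproc-step', 'c:xvrl'):
            map { 'severity': 'warning' },
        QName('', 'max-depth'): 3
    }"/>
  </p:validate-with-schematron>
</p:declare-step>
```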

The result output from this step is a copy of the input.

In addition to the mandatory XVRL report, a Schematron Validation Report Language (SVRL) report should be provided on the report port.

The output of this step may include PSVI annotations.

5.1 Document properties

All document properties on the source port are preserved on the result port. No document properties on the schema port are preserved. No document properties are preserved on the report port.

6 Validate with XML Schema

The p:validate-with-xml-schema step applies [W3C XML Schema: Part 1] validity assessment to the source input.

<p:declare-step type="p:validate-with-xml-schema">
     <p:input port="source" primary="true" content-types="xml html"/>
     <p:input port="schema" sequence="true" content-types="xml"/>
     <p:output port="result" primary="true" content-types="xml html"/>
     <p:output port="report" sequence="true" content-types="application/xml json"/>
     <p:option name="use-location-hints" select="false()" as="xs:boolean"/>
     <p:option name="try-namespaces" select="false()" as="xs:boolean"/>
     <p:option name="assert-valid" select="true()" as="xs:boolean"/>
     <p:option name="parameters" as="map(xs:QName,item()*)?"/>     
     <p:option name="mode" select="'strict'" as="xs:token" values="('strict','lax')"/>
     <p:option name="version" as="xs:string?"/>                    
</p:declare-step>

The values of the use-location-hints, try-namespaces, and assert-valid options must be boolean.

The value of the mode option must be an NMTOKEN whose value is either “strict” or “lax”.

Validation is performed against the set of schemas represented by the documents on the schema port. These schemas must be used in preference to any schema locations provided by schema location hints encountered during schema validation, that is, schema locations supplied for xs:import or xsi:schemaLocation, or determined by schema-processor-defined namespace-based strategies, for the namespaces covered by the documents available on the schema port.

If xs:include elements occur within the supplied schema documents, they are treated like any other external documents (see [XProc 3.0]). It is implementation-defined whether the documents supplied on the schema port are considered when resolving xs:include elements in the schema documents provided.

The use-location-hints and try-namespaces options allow the pipeline author to control how the schema processor should attempt to locate schema documents necessary but not provided on the schema port. Any schema documents provided on the schema port must be used in preference to schema documents located by other means.

If the use-location-hints option is “true”, the processor should make use of schema location hints to locate schema documents. If the option is “false”, the processor should ignore any such hints.

If the try-namespaces option is “true”, the processor should attempt to dereference the namespace URI to locate schema documents. If the option is “false”, the processor should not dereference namespace URIs.

The mode option allows the pipeline author to control how schema validation begins. The “strict” mode means that the document element must be declared and schema-valid, otherwise it will be treated as invalid. The “lax” mode means that the absence of a declaration for the document element does not itself count as an unsuccessful outcome of validation.

If the step specifies a version, then that version of XML Schema must be used to process the validation. It is a dynamic error (err:XC0011) if the specified schema version is not available. If the step does not specify a version, the implementation may use any version it has available and may use any means to determine what version to use, including, but not limited to, examining the version of the schema(s).

It is a dynamic error (err:XC0053) if the assert-valid option is true and the input document is not valid. If the assert-valid option is false, it is not an error for the document to be invalid. In this case, if the implementation does not support the PSVI, p:validate-with-xml-schema is essentially just an “identity” step, but if the implementation does support the PSVI, then the resulting document will have additional type information (at least for the subtrees that are valid).

When XML Schema validation assessment is performed, the processor is invoked in the mode specified by the mode option. It is a dynamic error (err:XC0055) if the implementation does not support the specified mode.

The result of the assessment is a document with the Post-Schema-Validation-Infoset (PSVI) ([W3C XML Schema: Part 1]) annotations, if the pipeline implementation supports such annotations. If not, the input document is reproduced with any defaulting of attributes and elements performed as specified by the XML Schema recommendation.
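The options described in this section might be combined as in the following non-normative sketch. The schema locations main.xsd and types.xsd are hypothetical, and the version value 1.1 is an assumption about what a given implementation accepts.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- mode="lax": a missing declaration for the document element is
       not in itself a validation failure.
       version="1.1": raises err:XC0011 if that schema version is
       not available.
       Location hints and namespace dereferencing are both disabled,
       so only the documents on the schema port are used. -->
  <p:validate-with-xml-schema mode="lax"
                              version="1.1"
                              use-location-hints="false"
                              try-namespaces="false">
    <p:with-input port="schema">
      <p:document href="main.xsd"/>
      <p:document href="types.xsd"/>
    </p:with-input>
  </p:validate-with-xml-schema>
</p:declare-step>
```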

6.1 Document properties

All document properties on the source port are preserved on the result port. No document properties on the schema port are preserved.

7 Step Errors

The steps in this specification can raise dynamic errors.

[Definition: A dynamic error is one which occurs while a pipeline is being evaluated.] Examples of dynamic errors include references to URIs that cannot be resolved, steps which fail, and pipelines that exhaust the capacity of an implementation (such as memory or disk space). For a more complete discussion of dynamic errors, see Dynamic Errors in XProc 3.0: An XML Pipeline Language.

If a step fails due to a dynamic error, failure propagates upwards until either a p:try is encountered or the entire pipeline fails. In other words, outside of a p:try, step failure causes the entire pipeline to fail.
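As a non-normative sketch, a validation failure can be intercepted with p:try and p:catch. The schema location schema.xsd and the name invalid are hypothetical. The example returns the error document in place of the validated document, which is only one of several possible recovery strategies.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:err="http://www.w3.org/ns/xproc-error"
                version="3.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <p:try>
    <!-- assert-valid defaults to true, so an invalid document
         raises err:XC0053. -->
    <p:validate-with-xml-schema>
      <p:with-input port="schema" href="schema.xsd"/>
    </p:validate-with-xml-schema>

    <!-- Handle only the "document is not valid" error; any other
         dynamic error still propagates upwards. -->
    <p:catch code="err:XC0053" name="invalid">
      <p:identity>
        <!-- The error port of p:catch carries a c:errors document. -->
        <p:with-input pipe="error@invalid"/>
      </p:identity>
    </p:catch>
  </p:try>
</p:declare-step>
```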

The following errors can be raised by these steps:

err:XC0011

It is a dynamic error if the specified schema version is not available.

See: Validate with XML Schema

err:XC0053

It is a dynamic error if the assert-valid option is true and the input document is not valid.

See: Validate with NVDL, Validate with RELAX NG, Validate with XML Schema

err:XC0054

It is a dynamic error if the assert-valid option is true and any Schematron assertions fail or reports succeed.

See: Validate with Schematron

err:XC0055

It is a dynamic error if the implementation does not support the specified mode.

See: Validate with XML Schema

A Conformance

Conformant processors must implement all of the features described in this specification except those that are explicitly identified as optional.

Some aspects of processor behavior are not completely specified; those features are either implementation-dependent or implementation-defined.

[Definition: An implementation-dependent feature is one where the implementation has discretion in how it is performed. Implementations are not required to document or explain how implementation-dependent features are performed.]

[Definition: An implementation-defined feature is one where the implementation has discretion in how it is performed. Conformant implementations must document how implementation-defined features are performed.]

A.1 Implementation-defined features

The following features are implementation-defined:

  1. The parameters supported by the validation steps and their semantics are implementation-defined, and they can be different for each validation step. See Section 2, “Common Options and Outputs”.
  2. How the Schematron implementation is selected is implementation-defined. See Section 5, “Validate with Schematron”.
  3. Which parameters the SVRL to XVRL conversion supports is implementation-defined. See Section 5, “Validate with Schematron”.
  4. It is implementation-defined whether the documents supplied on the schema port are considered when resolving xs:include elements in the schema documents provided. See Section 6, “Validate with XML Schema”.

A.2 Implementation-dependent features

The following features are implementation-dependent:

B References

    [XProc 3.0] XProc 3.0: An XML Pipeline Language. Norman Walsh, Achim Berndzen, Gerrit Imsieke and Erik Siegel, editors.

    [XProc 3.0 Steps] XProc 3.0 Steps: An Introduction. Norman Walsh, Achim Berndzen, Gerrit Imsieke and Erik Siegel, editors.

    [Schematron] ISO/IEC JTC 1/SC 34. ISO/IEC 19757-3:2016(E) Document Schema Definition Languages (DSDL) — Part 3: Rule-based validation — Schematron 2016.

    [NVDL] ISO/IEC JTC 1/SC 34. ISO/IEC 19757-4:2006(E) Document Schema Definition Languages (DSDL) — Part 4: Namespace-based Validation Dispatching Language (NVDL) 2006.

    [Schematron Skeleton] Schematron “Skeleton” Implementation 2017.

    [RELAX NG Compact Syntax] ISO/IEC JTC 1/SC 34. ISO/IEC 19757-2:2003/Amd 1:2006 Document Schema Definition Languages (DSDL) — Part 2: Grammar-based validation — RELAX NG AMENDMENT 1 Compact Syntax 2006.

    [RELAX NG DTD Compatibility] RELAX NG DTD Compatibility. OASIS Committee Specification. 3 December 2001.

    [W3C XML Schema: Part 1] XML Schema Part 1: Structures Second Edition. Henry S. Thompson, David Beech, Murray Maloney, et al., editors. World Wide Web Consortium, 28 October 2004.

C Glossary

    dynamic error

    A dynamic error is one which occurs while a pipeline is being evaluated.

    implementation-defined

    An implementation-defined feature is one where the implementation has discretion in how it is performed. Conformant implementations must document how implementation-defined features are performed.

    implementation-dependent

    An implementation-dependent feature is one where the implementation has discretion in how it is performed. Implementations are not required to document or explain how implementation-dependent features are performed.

D Ancillary files

    This specification includes by reference a number of ancillary files.

    steps.xpl

    An XProc step library for the declared steps.