XProc 3.0: validation steps

Draft Community Group Report

Editor's Draft at (build 456)
Latest editor’s draft:
http://spec.xproc.org/master/head/validation/
Editors:
Norman Walsh
Achim Berndzen
Gerrit Imsieke
Erik Siegel
Participate:
GitHub xproc/3.0-steps
Report an issue
Changes:
Diff against current “status quo” draft
Commits for this specification

This document is also available in these non-normative formats: XML, automatic change markup from the previous draft courtesy of DeltaXML.


Abstract

This specification describes the p:validate-with-nvdl, p:validate-with-relax-ng, p:validate-with-schematron, and p:validate-with-xml-schema steps for XProc 3.0: An XML Pipeline Language.

Status of this Document

This document is an editor's draft that has no official standing.

This specification was published by the XProc Next Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.

If you wish to make comments regarding this document, please send them to xproc-dev@w3.org (subscribe, archives).

This document is derived from XProc: An XML Pipeline Language published by the W3C.

1. Introduction

This specification describes the p:validate-with-relax-ng, p:validate-with-schematron, p:validate-with-xml-schema, and p:validate-with-nvdl steps. Each is independently optional. A machine-readable description of these steps may be found in steps.xpl.

Steps for validating non-XML documents, such as JSON, might be added in a future revision of this specification.

Familiarity with the general nature of [XProc 3.0] steps is assumed; for background details, see [XProc 3.0 Steps].

2. Common Options and Outputs

All steps in this specification provide a boolean option assert-valid. If assert-valid is true and any of the validated documents is found to be invalid according to the respective schema (and any other parameters that influence the determination of validity), a dynamic error is raised.

Note

Historically, the validation steps (apart from p:validate-with-schematron) could only report errors by setting assert-valid to true and catching the errors. A c:errors document on the error port of the corresponding p:catch recovery pipeline had to be sent to an output, either verbatim or after postprocessing. Now, if assert-valid is false, an XVRL document will be available on the report port of the validation step. It uses the XVRL severity vocabulary to indicate whether the validation failed, and to which degree. This allows more nuance in reporting errors. Previously, assert-valid="true" on p:validate-with-schematron would always throw an error even if the reported findings were only intended as less severe, for example if the schema author used info or warning in sch:report/@role.

If no such error is raised, each step generates at least one validation report document on its report port. Unless another format is requested, the mandatory report document for all steps except p:validate-with-schematron should be an [XVRL] document. A report format may be requested by the report-format option. The supported values for the report-format option are implementation-defined. A processor should at least support the value “xvrl” for the XML validation steps and “svrl” for p:validate-with-schematron.

It is a dynamic error (err:XC0117) if a report-format option was specified that the processor does not support.

If a step performs multiple validations on a single document (for example, Schematron validations embedded in a RELAX NG schema), the step needs to consolidate all individual XVRL reports into a single XVRL report.
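As a non-normative illustration of the common options, the following sketch validates a document without raising an error and exposes the report alongside the result. The schema location schema.rng and the step name validate are assumptions made for this example only.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source" primary="true"/>
  <!-- Copy of the source document, whether or not it is valid -->
  <p:output port="result" primary="true" pipe="result@validate"/>
  <!-- Validation report(s) produced because assert-valid is false -->
  <p:output port="report" sequence="true" pipe="report@validate"/>

  <p:validate-with-relax-ng name="validate"
                            assert-valid="false"
                            report-format="xvrl">
    <!-- schema.rng is a hypothetical schema location -->
    <p:with-input port="schema" href="schema.rng"/>
  </p:validate-with-relax-ng>
</p:declare-step>
```

With assert-valid="true" (the default), the same step would instead raise a dynamic error for an invalid document, and a processor that does not support the requested report-format would raise err:XC0117.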

Each of the validation steps has a parameters option. The parameters supported by the validation steps and their semantics are implementation-defined, and they can be different for each validation step. A special key in the c namespace, http://www.w3.org/ns/xproc-step, called c:compile, can hold a map itself that controls schema compilation. Schema compilation is, for example, the process of converting a Schematron schema into an XSLT stylesheet. The c:compile map will be used as parameters for the compilation process.

Map entries in the xvrl namespace, http://www.xproc.org/ns/xvrl, will be passed as parameters to the XVRL generation process. XProc implementations that implement any of the XML validation steps should support the basic parameters that are defined in the [XVRL] specification: xvrl:default-severity, xvrl:language, xvrl:map-to-severity, and xvrl:xpath-notation.
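The following non-normative sketch shows one way such a parameters map might be written. The schema location rules.sch, the string key and value inside the c:compile map, and the value given for xvrl:language are assumptions for illustration; the parameters a processor actually accepts are implementation-defined.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:xvrl="http://www.xproc.org/ns/xvrl"
                version="3.0">
  <p:input port="source" primary="true"/>
  <p:output port="result" primary="true"/>

  <p:validate-with-schematron assert-valid="false">
    <!-- rules.sch is a hypothetical schema location -->
    <p:with-input port="schema" href="rules.sch"/>
    <!-- c:compile holds a nested map for schema compilation;
         entries in the xvrl namespace go to XVRL generation.
         The nested key and both values are assumed for this sketch. -->
    <p:with-option name="parameters" select="map{
        xs:QName('c:compile'): map{ 'allow-foreign': 'true' },
        xs:QName('xvrl:language'): 'en'
      }"/>
  </p:validate-with-schematron>
</p:declare-step>
```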

3. Validate with NVDL

The p:validate-with-nvdl step applies [NVDL] validation to the source document.

<p:declare-step type="p:validate-with-nvdl">
     <p:input port="source" primary="true" content-types="xml html"/>
     <p:input port="nvdl" content-types="xml"/>
     <p:input port="schemas" sequence="true" content-types="text xml"/>
     <p:output port="result" primary="true" content-types="xml html"/>
     <p:output port="report" sequence="true" content-types="xml json"/>
     <p:option name="assert-valid" select="true()" as="xs:boolean"/>
     <p:option name="report-format" select="'xvrl'" as="xs:string"/>
     <p:option name="parameters" as="map(xs:QName,item()*)?"/>     
</p:declare-step>

The source document is validated using the namespace dispatching rules contained in the nvdl document.

The dispatching rules may contain URI references that point to the actual schemas to be used. As long as these schemas are accessible, it is not necessary to pass anything on the schemas port. However, if one or more schemas are provided on the schemas port, then these schemas should be used in validation.

It is a dynamic error (err:XC0053) if the assert-valid option is true and the input document is not valid.

The output from this step is a copy of the input. The output of this step may include PSVI annotations.
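A non-normative sketch of an NVDL validation follows. The file names dispatch.nvdl, xhtml.rng, and svg.rnc are hypothetical; the second schema is loaded with a text content type on the assumption that it is written in a text-based schema syntax.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source" primary="true"/>
  <p:output port="result" primary="true"/>

  <p:validate-with-nvdl assert-valid="true">
    <!-- Namespace dispatching rules (hypothetical location) -->
    <p:with-input port="nvdl" href="dispatch.nvdl"/>
    <!-- Schemas supplied explicitly; these should be used in validation -->
    <p:with-input port="schemas">
      <p:document href="xhtml.rng"/>
      <p:document href="svg.rnc" content-type="text/plain"/>
    </p:with-input>
  </p:validate-with-nvdl>
</p:declare-step>
```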

Note

Should the step also provide the dtd-attribute-values and dtd-id-idref-warnings options for Relax NG validations? Is there a way to instruct Jing to use these options, maybe in NVDL extension attributes? Probably not in the foreseeable future.

Document properties

All document properties on the source port are preserved on the result port. No document properties on the schemas and nvdl ports are preserved. No document properties are preserved on the report port.

4. Validate with RELAX NG

The p:validate-with-relax-ng step applies [RELAX NG] validation to the source document.

<p:declare-step type="p:validate-with-relax-ng">
     <p:input port="source" primary="true" content-types="xml html"/>
     <p:input port="schema" content-types="text xml"/>
     <p:output port="result" primary="true" content-types="xml html"/>
     <p:output port="report" sequence="true" content-types="xml json"/>
     <p:option name="dtd-attribute-values" select="false()" as="xs:boolean"/>
     <p:option name="dtd-id-idref-warnings" select="false()" as="xs:boolean"/>
     <p:option name="assert-valid" select="true()" as="xs:boolean"/>
     <p:option name="report-format" select="'xvrl'" as="xs:string"/>
     <p:option name="parameters" as="map(xs:QName,item()*)?"/>     
</p:declare-step>

The values of the dtd-attribute-values and dtd-id-idref-warnings options must be booleans.

If the schema document has an XML media type, then it must be interpreted as a RELAX NG Grammar. If the schema document has a text media type, then it must be interpreted as a [RELAX NG Compact Syntax] document for validation.

If the dtd-attribute-values option is true, then the attribute value defaulting conventions of [RELAX NG DTD Compatibility] are also applied.

If the dtd-id-idref-warnings option is true, then the validator should treat a schema that is incompatible with the ID/IDREF/IDREFS feature of [RELAX NG DTD Compatibility] as if the document were invalid.

It is a dynamic error (err:XC0053) if the assert-valid option is true and the input document is not valid.

The output from this step is a copy of the input, possibly augmented by application of the [RELAX NG DTD Compatibility]. The output of this step may include PSVI annotations.

Support for [RELAX NG DTD Compatibility] is implementation-defined.
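The following non-normative sketch validates against a schema in the compact syntax with both DTD compatibility options enabled. The file name schema.rnc is hypothetical, and the example assumes a processor that supports [RELAX NG DTD Compatibility].

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source" primary="true"/>
  <p:output port="result" primary="true"/>

  <p:validate-with-relax-ng dtd-attribute-values="true"
                            dtd-id-idref-warnings="true">
    <p:with-input port="schema">
      <!-- A text media type marks the schema as RELAX NG Compact Syntax -->
      <p:document href="schema.rnc" content-type="text/plain"/>
    </p:with-input>
  </p:validate-with-relax-ng>
</p:declare-step>
```

Loading the same schema with an XML media type would instead cause it to be interpreted as a RELAX NG Grammar.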

Document properties

All document properties on the source port are preserved on the result port. No document properties on the schema port are preserved. No document properties are preserved on the report port.

5. Validate with Schematron

The p:validate-with-schematron step applies [Schematron] processing to the source document.

<p:declare-step type="p:validate-with-schematron">
     <p:input port="source" primary="true" content-types="xml html"/>
     <p:input port="schema" content-types="xml"/>
     <p:output port="result" primary="true" content-types="xml html"/>
     <p:output port="report" sequence="true" content-types="xml json"/>
     <p:option name="parameters" as="map(xs:QName,item()*)?"/>     
     <p:option name="phase" select="'#DEFAULT'" as="xs:string"/>   
     <p:option name="assert-valid" select="true()" as="xs:boolean"/>
     <p:option name="report-format" select="'svrl'" as="xs:string"/>
</p:declare-step>

It is a dynamic error (err:XC0054) if the assert-valid option is true and any Schematron assertions fail or reports succeed.

Note

A Schematron validation with assert-valid="true" will fail if any validation message is produced by sch:assert or sch:report, even if the severity level of the failed assertion or the successful report is below a certain threshold, for example if there is only an info message. (The severity is conventionally conveyed by the @role attribute on sch:assert or sch:report.)

The value of the phase option identifies the Schematron validation phase with which validation begins.

The parameters option provides name/value pairs which correspond to Schematron external variables, to parameters that influence code generation, or to parameters that influence SVRL to XVRL conversion.

There are multiple Schematron implementations. How the Schematron implementation is selected is implementation-defined. A processor might select an implementation based on the schema’s queryBinding attribute and/or provide configuration options. In addition, the special parameter map entry c:implementation (value: QName) may be used to select a Schematron implementation that the processor supports. The list of supported Schematron implementations and their associated values is implementation-defined. If a requested implementation is not available, the processor may throw an error or select another implementation.

The parameters map may contain two special entries, c:compile and c:xvrl, both of which are maps. If a code-generating implementation such as [Schematron Skeleton] is used, the entries of the c:compile map, for example allow-foreign, will be passed to the code generator. Which parameters the c:compile map supports for a given Schematron implementation is implementation-defined.

If the Schematron implementation produces SVRL by default, the SVRL to XVRL conversion can be influenced by the entries of the c:xvrl map. The same map, with potentially another set of allowed keys and values, can be used to influence XVRL generation from another reporting language. Which parameters this conversion from native reporting format to XVRL supports is implementation-defined.

All other parameters of the parameters option will be passed to the generated code if applicable, or to a hypothetical native Schematron validator that does without code generation.
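As a non-normative sketch, the pipeline below starts validation in a named phase and passes one external variable to the schema. The schema location rules.sch, the phase name publication, and the variable max-depth are hypothetical and would have to be defined by the Schematron schema in use.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                version="3.0">
  <p:input port="source" primary="true"/>
  <p:output port="result" primary="true" pipe="result@check"/>
  <p:output port="report" sequence="true" pipe="report@check"/>

  <p:validate-with-schematron name="check"
                              phase="publication"
                              assert-valid="false">
    <p:with-input port="schema" href="rules.sch"/>
    <!-- An entry that is neither c:compile nor c:xvrl is treated here
         as a Schematron external variable (assumed name and value) -->
    <p:with-option name="parameters"
                   select="map{ xs:QName('max-depth'): 3 }"/>
  </p:validate-with-schematron>
</p:declare-step>
```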

The result output from this step is a copy of the input.

In addition to the mandatory XVRL report, a Schematron Validation Report Language (SVRL) report should be provided on the report port.

The output of this step may include PSVI annotations.

Document properties

All document properties on the source port are preserved on the result port. No document properties on the schema port are preserved. No document properties are preserved on the report port.

6. Validate with XML Schema

The p:validate-with-xml-schema step applies [W3C XML Schema: Part 1] validity assessment to the source input.

<p:declare-step type="p:validate-with-xml-schema">
     <p:input port="source" primary="true" content-types="xml html"/>
     <p:input port="schema" sequence="true" content-types="xml"/>
     <p:output port="result" primary="true" content-types="xml html"/>
     <p:output port="report" sequence="true" content-types="xml json"/>
     <p:option name="use-location-hints" select="false()" as="xs:boolean"/>
     <p:option name="try-namespaces" select="false()" as="xs:boolean"/>
     <p:option name="assert-valid" select="true()" as="xs:boolean"/>
     <p:option name="parameters" as="map(xs:QName,item()*)?"/>     
     <p:option name="mode" select="'strict'" as="xs:token" values="('strict','lax')"/>
     <p:option name="version" as="xs:string?"/>                    
     <p:option name="report-format" select="'xvrl'" as="xs:string"/>
</p:declare-step>

The values of the use-location-hints, try-namespaces, and assert-valid options must be boolean.

The value of the mode option must be an NMTOKEN whose value is either “strict” or “lax”.

Validation is performed against the set of schemas represented by the documents on the schema port. These schemas must be used in preference to any schema locations provided by schema location hints encountered during schema validation, that is, schema locations supplied for xs:import or xsi:schemaLocation, or determined by schema-processor-defined namespace-based strategies, for the namespaces covered by the documents available on the schema port.

If xs:include elements occur within the supplied schema documents, they are treated like any other external documents (see [XProc 3.0]). It is implementation-defined if the documents supplied on the schema port are considered when resolving xs:include elements in the schema documents provided.

The use-location-hints and try-namespaces options allow the pipeline author to control how the schema processor should attempt to locate schema documents necessary but not provided on the schema port. Any schema documents provided on the schema port must be used in preference to schema documents located by other means.

If the use-location-hints option is “true”, the processor should make use of schema location hints to locate schema documents. If the option is “false”, the processor should ignore any such hints.

If the try-namespaces option is “true”, the processor should attempt to dereference the namespace URI to locate schema documents. If the option is “false”, the processor should not dereference namespace URIs.

The mode option allows the pipeline author to control how schema validation begins. The “strict” mode means that the document element must be declared and schema-valid, otherwise it will be treated as invalid. The “lax” mode means that the absence of a declaration for the document element does not itself count as an unsuccessful outcome of validation.

If the step specifies a version, then that version of XML Schema must be used to process the validation. It is a dynamic error (err:XC0011) if the specified schema version is not available. If the step does not specify a version, the implementation may use any version it has available and may use any means to determine what version to use, including, but not limited to, examining the version of the schema(s).

It is a dynamic error (err:XC0053) if the assert-valid option is true and the input document is not valid. If the assert-valid option is false, it is not an error for the document to be invalid. In this case, if the implementation does not support the PSVI, p:validate-with-xml-schema is essentially just an “identity” step, but if the implementation does support the PSVI, then the resulting document will have additional type information (at least for the subtrees that are valid).

When XML Schema validation assessment is performed, the processor is invoked in the mode specified by the mode option. It is a dynamic error (err:XC0055) if the implementation does not support the specified mode.

The result of the assessment is a document with the Post-Schema-Validation-Infoset (PSVI) ([W3C XML Schema: Part 1]) annotations, if the pipeline implementation supports such annotations. If not, the input document is reproduced with any defaulting of attributes and elements performed as specified by the XML Schema recommendation.
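The following non-normative sketch combines the options described above. The schema file names are hypothetical, and requesting version 1.1 assumes a processor that has that version available; otherwise err:XC0011 would be raised.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source" primary="true"/>
  <p:output port="result" primary="true" pipe="result@xsd"/>
  <p:output port="report" sequence="true" pipe="report@xsd"/>

  <!-- Rely only on the supplied schemas: ignore location hints and
       do not dereference namespace URIs -->
  <p:validate-with-xml-schema name="xsd"
                              use-location-hints="false"
                              try-namespaces="false"
                              mode="lax"
                              version="1.1"
                              assert-valid="false">
    <p:with-input port="schema">
      <p:document href="order.xsd"/>
      <p:document href="common-types.xsd"/>
    </p:with-input>
  </p:validate-with-xml-schema>
</p:declare-step>
```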

Document properties

All document properties on the source port are preserved on the result port. No document properties on the schema port are preserved.

7. Step Errors

The steps in this specification can raise dynamic errors.

[Definition: A dynamic error is one which occurs while a pipeline is being evaluated.] Examples of dynamic errors include references to URIs that cannot be resolved, steps which fail, and pipelines that exhaust the capacity of an implementation (such as memory or disk space). For a more complete discussion of dynamic errors, see Dynamic Errors in XProc 3.0: An XML Pipeline Language.

If a step fails due to a dynamic error, failure propagates upwards until either a p:try is encountered or the entire pipeline fails. In other words, outside of a p:try, step failure causes the entire pipeline to fail.
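A non-normative sketch of recovering from a validation failure with p:try follows. The schema location schema.xsd and the name invalid given to the p:catch are assumptions for this example.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:err="http://www.w3.org/ns/xproc-error"
                version="3.0">
  <p:input port="source" primary="true"/>
  <p:output port="result" primary="true"/>

  <p:try>
    <!-- Raises err:XC0053 if the source document is not valid -->
    <p:validate-with-xml-schema assert-valid="true">
      <p:with-input port="schema" href="schema.xsd"/>
    </p:validate-with-xml-schema>
    <!-- Only validity failures are caught; other errors propagate -->
    <p:catch name="invalid" code="err:XC0053">
      <p:identity>
        <!-- Pass on the error document from the catch's error port -->
        <p:with-input port="source" pipe="error@invalid"/>
      </p:identity>
    </p:catch>
  </p:try>
</p:declare-step>
```

In this sketch the pipeline result is either the validated document or the error document; setting assert-valid to false and reading the report port is the alternative that avoids raising the error in the first place.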

The following errors can be raised by these steps:

err:XC0011

It is a dynamic error if the specified schema version is not available.

See: Validate with XML Schema

err:XC0053

It is a dynamic error if the assert-valid option is true and the input document is not valid.

See: Validate with NVDL, Validate with RELAX NG, Validate with XML Schema

err:XC0054

It is a dynamic error if the assert-valid option is true and any Schematron assertions fail or reports succeed.

See: Validate with Schematron

err:XC0055

It is a dynamic error if the implementation does not support the specified mode.

See: Validate with XML Schema

err:XC0117

It is a dynamic error if a report-format option was specified that the processor does not support.

See: Common Options and Outputs

A. Conformance

Conformant processors must implement all of the features described in this specification except those that are explicitly identified as optional.

Some aspects of processor behavior are not completely specified; those features are either implementation-dependent or implementation-defined.

[Definition: An implementation-dependent feature is one where the implementation has discretion in how it is performed. Implementations are not required to document or explain how implementation-dependent features are performed.]

[Definition: An implementation-defined feature is one where the implementation has discretion in how it is performed. Conformant implementations must document how implementation-defined features are performed.]

A.1. Implementation-defined features

The following features are implementation-defined:

  1. The supported values for the report-format option are implementation-defined. A processor should at least support the value “xvrl” for the XML validation steps and “svrl” for p:validate-with-schematron. See Section 2, “Common Options and Outputs”.
  2. The parameters supported by the validation steps and their semantics are implementation-defined, and they can be different for each validation step. See Section 2, “Common Options and Outputs”.
  3. How the Schematron implementation is selected is implementation-defined. See Section 5, “Validate with Schematron”.
  4. The list of supported Schematron implementations and their associated values is implementation-defined. See Section 5, “Validate with Schematron”.
  5. Which parameters the c:compile map supports for a given Schematron implementation is implementation-defined. See Section 5, “Validate with Schematron”.
  6. Which parameters this conversion from native reporting format to XVRL supports is implementation-defined. See Section 5, “Validate with Schematron”.
  7. It is implementation-defined if the documents supplied on the schema port are considered when resolving xs:include elements in the schema documents provided. See Section 6, “Validate with XML Schema”.

A.2. Implementation-dependent features

This specification does not identify any implementation-dependent features.

B. References

[XProc 3.0] XProc 3.0: An XML Pipeline Language. Norman Walsh, Achim Berndzen, Gerrit Imsieke and Erik Siegel, editors.

[XProc 3.0 Steps] XProc 3.0 Steps: An Introduction. Norman Walsh, Achim Berndzen, Gerrit Imsieke and Erik Siegel, editors.

[Schematron] ISO/IEC JTC 1/SC 34. ISO/IEC 19757-3:2016(E) Document Schema Definition Languages (DSDL) — Part 3: Rule-based validation — Schematron 2016.

[NVDL] ISO/IEC JTC 1/SC 34. ISO/IEC 19757-4:2006(E) Document Schema Definition Languages (DSDL) — Part 4: Namespace-based Validation Dispatching Language (NVDL) 2006.

[Schematron Skeleton] Schematron “Skeleton” Implementation 2017.

[RELAX NG Compact Syntax] ISO/IEC JTC 1/SC 34. ISO/IEC 19757-2:2003/Amd 1:2006 Document Schema Definition Languages (DSDL) — Part 2: Grammar-based validation — RELAX NG AMENDMENT 1 Compact Syntax 2006.

[RELAX NG DTD Compatibility] RELAX NG DTD Compatibility. OASIS Committee Specification. 3 December 2001.

[W3C XML Schema: Part 1] XML Schema Part 1: Structures Second Edition. Henry S. Thompson, David Beech, Murray Maloney, et al., editors. World Wide Web Consortium, 28 October 2004.

C. Glossary

dynamic error

A dynamic error is one which occurs while a pipeline is being evaluated.

implementation-defined

An implementation-defined feature is one where the implementation has discretion in how it is performed. Conformant implementations must document how implementation-defined features are performed.

implementation-dependent

An implementation-dependent feature is one where the implementation has discretion in how it is performed. Implementations are not required to document or explain how implementation-dependent features are performed.

D. Ancillary files

This specification includes by reference a number of ancillary files.

steps.xpl

An XProc step library for the declared steps.