Extensible Validation Report Language (XVRL)

Draft Community Group Report

Editor's Draft at 20:06 UTC (build 14)
Latest editor’s draft:
http://spec.xproc.org/master/head/xvrl/
Editors:
Gerrit Imsieke
Matthieu Ricaud-Dussarget
Norman Walsh
Participate:
GitHub
Report an issue
Changes:
Diff against current “status quo” draft
Commits for this specification

This document is also available in these non-normative formats: XML, automatic change markup from the previous draft courtesy of DeltaXML.


Abstract

This specification describes a unified vocabulary for validation reports. Its main focus is to express the findings of the most common XML validation languages: Schematron, XML Schema, DTD, and Relax NG. It is meant to be extensible in multiple ways. It should be able to express both the results of other XML validation methods and the results of validation methods that apply to non-XML formats such as JSON or RDF graphs (irrespective of their serialization format). Another extension axis is that it allows the addition of custom attributes or elements. While XVRL at its core is specified in terms of an XML vocabulary with a Relax NG schema, there may also be non-normative serialization formats and schemas, namely a JSON serialization and schema.

Status of this Document

This document is an editor's draft that has no official standing.

This section describes the status of this document at the time of its publication. Other documents may supersede this document.

1. Introduction

XVRL provides a unified format for expressing the results of possibly multiple validation methods, applied to possibly multiple documents. The need arises because not every validation language has a standardized report format, which makes it difficult to render the results of multiple validations in a single report.

2. XVRL Vocabulary

XVRL elements are in the namespace http://www.xproc.org/ns/xvrl. XVRL documents may contain elements in other namespaces at certain locations. The XVRL elements and attributes and their semantics are given in the following lists. More details about the XVRL grammar are encoded in the Relax NG Compact Syntax version of the XVRL schema, which is also normative.

2.1. Document Structure

detection element

A single finding, typically with an associated error code and/or message(s). A report element primarily contains detection elements. See Section 2.2, “Detection” for details.

digest element

A report may contain a digest element in order to provide a summary of the detection elements. For the distinct severity levels, counts of the detection elements for a given level may be specified on digest, for example in an @error-count attribute. In addition, the @worst attribute may give the highest severity level that occurs in the detection elements that are contained by the digest’s parent element.

A digest element may occur in addition to or instead of detection elements. If no detection element is included, a digest element must be included.

All information in digest is understood to be aggregated at some point from the actual detection elements. It is the responsibility of an XVRL creating/processing application to keep them up to date or to remove them when the underlying detection information is changed. A digest may be inserted either before or after the detection elements.
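
The aggregation described above can be illustrated with a short, non-normative Python sketch that derives digest attribute values from the detection children of a report. The function name digest_attributes is hypothetical. Ranking “unspecified” below “info” and using “nothing” for a report without detections are assumptions of this sketch, not requirements stated in this specification.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XVRL_NS = "http://www.xproc.org/ns/xvrl"
# Highest impact first, as listed for the severity attribute.
# Ranking "unspecified" last is an assumption of this sketch.
SEVERITIES = ["fatal-error", "error", "warning", "info", "unspecified"]

def digest_attributes(report):
    """Return digest attribute values aggregated from a report's detections."""
    counts = Counter(
        d.get("severity", "unspecified")  # a missing attribute equals "unspecified"
        for d in report.findall(f"{{{XVRL_NS}}}detection")
    )
    # Only emit counts for severity levels that actually occur.
    attrs = {f"{s}-count": str(counts[s]) for s in SEVERITIES if counts[s]}
    # "nothing" is used here for a report without detections (assumption).
    attrs["worst"] = next((s for s in SEVERITIES if counts[s]), "nothing")
    return attrs

# Hypothetical sample report with four detections.
report = ET.fromstring(
    f'<report xmlns="{XVRL_NS}">'
    '<metadata/>'
    '<detection severity="warning"/><detection severity="error"/>'
    '<detection severity="warning"/><detection/>'
    '</report>'
)
print(digest_attributes(report))
```

An application that edits the detections would need to recompute these values (or drop the digest) to keep the summary consistent.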

metadata element

Information about the time of validation, the validator used, the document(s) under test, etc. See Section 2.3, “Metadata” for details.

A single metadata element need not contain all relevant metadata. Metadata information is inherited from the metadata of surrounding report and reports elements. That is, if a given metadata element does not provide validator information but the parent reports/metadata does, the parent’s metadata/validator also pertains to the current metadata element’s siblings and their descendants, unless overridden further down.
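
One possible way to resolve inherited metadata is sketched below in Python (non-normative). The helper name inherited and the sample validator names are invented for illustration. The sketch walks from an element to its ancestors and returns the first metadata child of the requested kind. Because the standard ElementTree API has no parent pointers, a parent map is built first.

```python
import xml.etree.ElementTree as ET

XVRL_NS = "http://www.xproc.org/ns/xvrl"
NS = {"x": XVRL_NS}

def inherited(node, name, parents):
    """Find metadata/<name> on the node itself or on its nearest ancestor."""
    while node is not None:
        found = node.find(f"x:metadata/x:{name}", NS)
        if found is not None:
            return found
        node = parents.get(node)  # None once the root has been passed
    return None

# Hypothetical sample: the first report inherits the validator,
# the second report overrides it.
doc = ET.fromstring(
    f'<reports xmlns="{XVRL_NS}">'
    '<metadata><validator name="outer-validator"/></metadata>'
    '<report><metadata/><digest/></report>'
    '<report>'
    '<metadata><validator name="inner-validator"/></metadata>'
    '<digest/>'
    '</report>'
    '</reports>'
)
parents = {child: parent for parent in doc.iter() for child in parent}
names = [
    inherited(report, "validator", parents).get("name")
    for report in doc.findall("x:report", NS)
]
print(names)
```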

report element

The result of a single validation method, typically using a single schema, typically applied to a single document (also referred to as the source document). The individual errors (or other findings) are included as detection elements.

Naming things…

Previously, what is called “detection” here was called “report”, while a collection of detections was called “validation-report”. Now this collection is called “report”, while a collection of (new-terminology) reports is called “reports” (previously: “validation-reports”). I changed the names because I didn’t think that an individual finding and a collection of individual findings should both be called a “report”, with only the prefix “validation-” distinguishing them. Since the individual finding is also the result of a validation, there is no reason it couldn’t have been called “validation-report” in the first place. It took quite some time to come up with a term for the individual findings. Candidates were: “finding”, “observation”, “detection”, “incidence”, and “incident”. I’m willing to rename it to something that seems more fitting.

reports element

A collection of report elements. It may contain the same metadata information as a single report in order to denote common information, for example if all validations have been applied to the same document or if all validations use the same schema or validation engine.

reports elements may nest in order to group report elements with common sets of metadata.
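
To make the overall structure concrete, the following non-normative Python sketch parses a small hand-written XVRL document and lists its findings. The sample document, its URIs, the validator name, and the error code are invented for illustration. Only the element and attribute names are taken from this specification.

```python
import xml.etree.ElementTree as ET

XVRL_NS = "http://www.xproc.org/ns/xvrl"
NS = {"x": XVRL_NS}

# Hypothetical sample: a reports collection with shared document metadata,
# containing one report with a single detection followed by a digest.
SAMPLE = """\
<reports xmlns="http://www.xproc.org/ns/xvrl">
  <metadata>
    <document href="file:/tmp/example.xml"/>
  </metadata>
  <report>
    <metadata>
      <validator name="example-validator" version="1.0"/>
    </metadata>
    <detection severity="error" code="EX0001">
      <location xpath="/doc/p[1]"/>
      <message xml:lang="en">Paragraph must not be empty.</message>
    </detection>
    <digest error-count="1" worst="error"/>
  </report>
</reports>
"""

root = ET.fromstring(SAMPLE)
# Collect (severity, code, first message text) for every detection.
findings = [
    (
        d.get("severity", "unspecified"),
        d.get("code"),
        d.findtext("x:message", namespaces=NS),
    )
    for d in root.iterfind(".//x:report/x:detection", NS)
]
print(findings)
```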

2.2. Detection

As described in Section 2.1, “Document Structure”, detection is the main container for individual validation findings. It contains optional severity and code attributes, and the following elements in arbitrary order:

category element(s)

In order to filter or group messages for a formatted report, individual detections may be categorized according to arbitrary category systems, using the repeatable category element. Its optional attribute vocabulary can hold a string that designates the category system. There are no pre-defined values to choose from.

Categorization that applies to all detections in a report can be included in the report’s metadata.

code attribute

An error code. The term “error code” is used in a colloquial sense here. It need not relate to an error, but to any kind of message that has a distinctive identifying string.

context element

The purpose of this element is to present a piece of content that surrounds the element that the detection pertains to. It contains an optional location element, followed by (optional) arbitrary text or non-XVRL element content.

location element

Within a single detection element, the location in the source document that the validation error, warning, etc. pertains to is given by the location element’s attributes.

If not present, href is taken from the closest ancestor’s metadata/document/@href attribute. If there are multiple document/@href attributes in the closest ancestor’s metadata, the href attribute should not be omitted on location, or at least a disambiguating relative URI should be given in the location/@href attribute.

The attribute xpath contains an XPath expression that gives the location within the document. The in-scope value of the attribute xpath-default-namespace that is permitted on any element may give a namespace for the element names in this XPath expression. Apart from that, the Q{namespace-uri}local-name syntax should be used, but in-scope namespace prefixes or XPath predicates like [namespace-uri() = 'uri'] may also be used.

The attributes line and column may also be used to point at lines and columns in a textual representation of the source document.

The attribute octet-position may be used to give the byte position of the error, counted from 1 (that is, the first byte has position 1). This may be useful for binary documents.

In order to support JSON document validations, the attributes jsonpath and jsonpointer may be used.

Giving multiple alternative pointers is not forbidden. However, it is beyond the scope of this specification to define mechanisms to enforce or check consistency between the attribute values. It is evident that jsonpointer or jsonpath are meaningless in the context of XML documents.

Other attributes are permitted if they are in a non-XVRL namespace.
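
The href fallback described for location can be sketched as follows (non-normative Python). The helper name location_href and the sample URIs are hypothetical. Returning None when the closest metadata lists several documents is a choice made for this sketch, and resolving relative URIs against xml:base is left out.

```python
import xml.etree.ElementTree as ET

XVRL_NS = "http://www.xproc.org/ns/xvrl"
NS = {"x": XVRL_NS}

def location_href(location, parents):
    """Return the document URI a location refers to, or None if undetermined."""
    href = location.get("href")
    if href is not None:
        return href
    node = parents.get(location)
    while node is not None:
        hrefs = [
            d.get("href")
            for d in node.findall("x:metadata/x:document[@href]", NS)
        ]
        if hrefs:
            # With several candidate documents the choice is ambiguous;
            # returning None in that case is a decision of this sketch.
            return hrefs[0] if len(hrefs) == 1 else None
        node = parents.get(node)
    return None

# Hypothetical sample: one location inherits href, one states it explicitly.
doc = ET.fromstring(
    f'<report xmlns="{XVRL_NS}">'
    '<metadata><document href="file:/tmp/input.xml"/></metadata>'
    '<detection><location line="3" column="7"/></detection>'
    '<detection><location href="file:/tmp/other.xml" xpath="/a/b[2]"/></detection>'
    '</report>'
)
parents = {child: parent for parent in doc.iter() for child in parent}
resolved = [
    location_href(loc, parents)
    for loc in doc.iter(f"{{{XVRL_NS}}}location")
]
print(resolved)
```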

message element

An error message that pertains to a detection. There may be multiple message elements in a single detection element, typically to convey localized versions of essentially the same message. A message may contain arbitrary markup in non-XVRL namespaces. Messages are typically generated for consumption by humans.

Note

Whenever the term “error message” is used in a colloquial sense (that is, not highlighted as the severity level “error” or as the XVRL element “message”) throughout this specification, a detection element with any @severity level, not necessarily “error”, and any number of localized messages is implied. Likewise, the term “error code” does not imply the severity level “error”.

provenance element

In multi-step conversion pipelines it is sometimes required to save a common origin location that a portion of the validated document is derived from. This may be necessary in order to patch back error messages of later conversion stages into the source document.

The optional provenance element within a detection conveys exactly this information, in a contained location element that points to the provenance location in the original source document. A provenance element may contain multiple location elements; it is up to processing applications to discern between different roles that they may have.

Although it is possible to omit the @href attribute in the contained location elements, this URI is not inherited from a containing element’s metadata/document/@href attribute.

severity attribute

The severity attribute is permitted on a detection element. XVRL establishes a finite set of error levels that correspond to the impact of a detected issue. Each detection element may have a severity level, from highest (worst impact) to lowest, of “fatal-error”, “error”, “warning”, or “info”. In addition, the severity attribute may have the value “unspecified” which is equivalent to omitting the attribute.

Note

Which severity level is attached to a given error code depends on, among other things, the audience that the validation report is prepared for. For Schematron’s SVRL output, the values of @role will typically translate to XVRL @severity attributes, but this mapping may be configured; see xvrl:map-to-severity in Appendix B, Parameters for Controlling XVRL Generation.

summary element(s)

An abstract of a report, a reports collection, or an individual detection. This element is repeatable, for example, in order to support multiple natural languages. In the context of detection, it can serve as an abridged version of a full message that contains lengthy lists and the like.

supplemental element(s)

This repeatable element may contain arbitrary textual or non-XVRL element content. It may appear in metadata and in detection. Its role attribute may be used to further classify the purpose of its content. Like any other XVRL element, it may be localized using the xml:lang attribute.

Purposes can be, but are not limited to, conveying the SVRL source that the XVRL report was created from, or a disclaimer, a confidentiality statement, or introductory content that should be included in a rendered report.

2.3. Metadata

All elements in this section are optional within metadata. The order in which they appear is arbitrary. Some are repeatable.

creator element

Information about the tool that created the source document. There is no content, only the optional attributes @name, @version, and @invocation. The @invocation attribute is meant to hold a command line that contains the invocation of the program that was responsible for generating the source document. This information can be useful for later diagnosing dependencies between errors and command line options.

category element(s)

See category element(s).

document element(s)

The URI of a source document may be specified in the href attribute. In addition or instead, the document may be given as the element content. See also location element about inheritance of href.

schema element(s)

The URI of a schema document may be specified in the href attribute. In addition or instead, the document may be given as the element content. The namespace of the schema may be given in the attribute schematypens. The version of the schema language may be given in the attribute version.

summary element(s)

See summary element(s).

supplemental element(s)

See supplemental element(s).

timestamp element

The content needs to be an xsd:dateTime value, for example “2017-12-04T12:21:37.381+01:00”.
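
A consumer might read the timestamp content as sketched below (non-normative Python). The function name parse_timestamp is hypothetical. The sketch covers only a common subset of xsd:dateTime lexical forms. For example, fraction lengths other than three or six digits may be rejected by older Python versions.

```python
from datetime import datetime, timedelta, timezone

def parse_timestamp(text):
    """Parse a common subset of xsd:dateTime lexical forms."""
    text = text.strip()
    # Python versions before 3.11 do not accept a trailing "Z" for UTC.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)

ts = parse_timestamp("2017-12-04T12:21:37.381+01:00")
# Normalize to UTC, for example to compare reports from different time zones.
print(ts.astimezone(timezone.utc).isoformat())
```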

title element(s)

The title of a report, a reports collection, or an individual detection. This element is repeatable, for example, in order to support multiple natural languages.

validator element

Information about the validation program that generated the report(s) or the underlying messages (if XVRL was not generated natively by the program). There are optional attributes @name and @version, both of which are arbitrary strings. The element may contain arbitrary text or element content in a non-XVRL namespace, for example to describe a configuration or to include an actual configuration file.

A. The XVRL Schema

# Schema for Extensible Validation Report Language; adapted from a proposal
# by Matthieu Ricaud incorporating some suggestions by Gerrit Imsieke
# See https://github.com/xproc/3.0-steps/issues/15

namespace a = "http://relaxng.org/ns/compatibility/annotations/1.0"
namespace local = ""
default namespace xvrl = "http://www.xproc.org/ns/xvrl"

start = reports | report

xmllang.attr = attribute xml:lang { xsd:language }
xmlid.attr = attribute xml:id { xsd:ID }
xmlbase.attr = attribute xml:base { xsd:anyURI }
# Default namespace URI for location XPaths:
xpdns.attr = attribute xpath-default-namespace { xsd:anyURI }
anyother.attr = attribute * - (local:* | xvrl:* | xml:*) { text }
any.attr = attribute * - xvrl:* { text }
message.attr  = attribute (* - (xvrl:* | xml:*)) { text }


common.attr =
  xmllang.attr?
  & xmlid.attr?
  & xmlbase.attr?
  & xpdns.attr?
  & anyother.attr*

href.attr =
  attribute href { xsd:anyURI }
  
any.element = 
  element * - xvrl:* { (any.attr | text | any.element)* }

reports =
  element reports {
    common.attr,
    report.metadata,
    (report | reports | digest)+
  }
report =
  element report {
    common.attr,
    report.metadata,
    ((digest?, detection+) | (detection+, digest) | digest)
  }

## All information in digest is understood to be aggregated at some point from the actual detection elements. 
## It is the responsibility of an XVRL creating/processing application to keep them up to date or to remove them 
## when the underlying detection information is changed. If the individual detections are omitted, a digest must be present. 
## A digest may be inserted either before or after the detection elements.
digest =
  element digest {
    common.attr,
    attribute valid { "true" | "false" | "partial" | "undetermined" }?,
    attribute fatal-error-count { xsd:integer }?,
    attribute error-count { xsd:integer }?,
    attribute warning-count { xsd:integer }?,
    attribute info-count { xsd:integer }?,
    attribute unspecified-count { xsd:integer }?,
    attribute fatal-error-codes { list { token* } }?,
    attribute error-codes { list { token* } }?,
    attribute warning-codes { list { token* } }?,
    attribute info-codes { list { token* } }?,
    attribute unspecified-codes { list { token* } }?,
    attribute worst { "fatal-error" | "error" | "warning" | "info" | "nothing" | "unspecified" }?
  }
report.metadata =
  element metadata {
    common.attr,
    (timestamp?
     & validator?
     & creator?
     & document*
     & title*
     & summary*
     & category*
     & schema*
     & supplemental*)
  }

document = 
  element document {
    common.attr,
    href.attr?,
    (text | any.element)
  }
  
detection =
  element detection {
    common.attr,
    attribute severity { "info" | "warning" | "error" | "fatal-error" | "unspecified" }?,
    attribute code { text }?,
    (location?
     & provenance?
     & title*
     & summary*
     & category*
     & (let*, message*)
     & supplemental*
     & context?)
  }

let =
    element let {
        common.attr,
        attribute name { xsd:QName },
        (attribute value {xsd:string} | (text | any.element))*
    }

value-of =
    element value-of {
        attribute name { xsd:QName }
    }

location = element location { location.model }
location.model =
  xpdns.attr?,
  # XPaths may use the Q{namespace-uri}local-name notation.
  attribute xpath { text }?,
  # These are different syntaxes to address JSON documents.
  # JSON docs may be represented as XPath maps and arrays
  # and then addressed via, e.g., xpath=".(3)('foo')"
  # for the 3rd array item, which is a map, and then the map’s
  # value for the 'foo' key.
  attribute jsonpointer { text }?,
  attribute jsonpath { text }?,
  attribute href { xsd:anyURI }?,
  attribute line { xsd:positiveInteger }?,
  attribute column { xsd:positiveInteger }?,
  # For binary data:
  attribute octet-position { xsd:positiveInteger }?,
  anyother.attr*
provenance = element provenance { location+ }

message =
  element message {
    common.attr,
    (text | message.element)*
  }
message.element =
    element (* - (xvrl:* - xvrl:value-of)) {
      (message.attr | text | message.element | value-of)*
  }

supplemental =
  element supplemental { common.attr, (text | any.element)* }
context =
  element context {
    common.attr,
    location?,
    (text | any.element)*
  }
validator =
  element validator {
    common.attr,
    attribute name { text },
    attribute version { text }?,
    (text | any.element)*
  }
creator =
  element creator {
    common.attr,
    attribute name { text },
    attribute version { text }?,
    element invocation { text }?
  }
schema =
  element schema {
    common.attr,
    attribute href { xsd:anyURI }?,
    attribute schematypens { xsd:anyURI },
    attribute version { text }?,
    (text | any.element)?
  }
title = element title { common.attr, (text | any.element)* }
summary = element summary { common.attr, (text | any.element)* }
category =
  element category {
    common.attr,
    attribute vocabulary { xsd:token }?,
    (text | any.element)*
  }
timestamp = element timestamp { common.attr, xsd:dateTime }

B. Parameters for Controlling XVRL Generation

The following parameters should be understood by XVRL report generators when converting underlying validation reports, for example, from SVRL or from the XProc error vocabulary, c:errors etc.

xvrl:default-severity

When no severity is associated with a source vocabulary element that is mapped to detection, this property can be specified in order to assign a default severity to any of these source vocabulary constructs. It can be argued that the XProc error vocabulary, c:error, already conveys the severity level “error”. This specification regards these messages as generic findings of severity “error”, but xvrl:default-severity may be given to override this.

Implementations are free to provide other parameters, in a different namespace, that permit a more detailed mapping, for example from error code to severity.

xvrl:serialization-format

Anticipates future alternative serialization formats. If no value is given, xml is assumed. Other possible values may be, but are not limited to, json, rdf/xml, turtle.

xvrl:language

A space-separated list of language abbreviations, typically according to ISO 639-1. The preferred language is given first, followed by fallback languages. The result is that localized elements within a detection will be reduced to messages, categories, or summaries in the preferred language. Example: de en instructs the XVRL generator to include German messages only and to use an English message when no German message is present. If no language matches for a given localizable element in a detection context, a corresponding element with the same attributes, but with no xml:lang attribute, should be included. Localizable elements with an xml:lang attribute that is not listed in this property should be ignored.
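
The selection rule can be read as the following non-normative Python sketch. The function name select_by_language and the sample messages are hypothetical. Exact, case-insensitive comparison of language tags and the fallback to elements without xml:lang are one interpretation of the paragraph above.

```python
import xml.etree.ElementTree as ET

XVRL_NS = "http://www.xproc.org/ns/xvrl"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def select_by_language(elements, preference):
    """Keep the elements in the first preferred language that occurs.

    Falls back to elements without xml:lang when no listed language matches.
    Matching is by exact, case-insensitive tag comparison (assumption).
    """
    for lang in preference.split():
        hits = [
            e for e in elements
            if (e.get(XML_LANG) or "").lower() == lang.lower()
        ]
        if hits:
            return hits
    return [e for e in elements if e.get(XML_LANG) is None]

# Hypothetical detection with two localized messages and a neutral one.
detection = ET.fromstring(
    f'<detection xmlns="{XVRL_NS}">'
    '<message xml:lang="en">Title is missing.</message>'
    '<message xml:lang="fr">Le titre est manquant.</message>'
    '<message>TITLE_MISSING</message>'
    '</detection>'
)
messages = detection.findall(f"{{{XVRL_NS}}}message")
with_fallback = [m.text for m in select_by_language(messages, "de en")]
no_match = [m.text for m in select_by_language(messages, "de it")]
print(with_fallback, no_match)
```

The same function would be applied separately to each kind of localizable element (message, summary, category, and so on) within a detection.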

xvrl:map-to-severity

This parameter contains space-separated QNames that correspond to elements or attributes of an underlying reporting language, in particular SVRL attributes. A value of flag role instructs an SVRL-to-XVRL converter to preferentially map the SVRL flag attribute to the XVRL severity attribute. If the flag attribute is not present or its value cannot be mapped, the converter should try to map the SVRL role value to XVRL’s severity.

The following attribute values are considered mappable, after folding the source value to lower case: “information”, “informational” map to “info”; “warn” maps to “warning”; “fatal” maps to “fatal-error”. A conversion tool may consider other variants, including translations that correspond to the natural language of the corresponding error message, for mapping.

If the content of an (for example) SVRL attribute cannot be mapped, it should be attached to the corresponding XVRL detection either as a category or as a namespaced attribute (that is, role="foo" in SVRL may become svrl:role="foo" in XVRL, with xmlns:svrl="http://purl.oclc.org/dsdl/svrl").
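
The mapping rules above could be implemented along the following lines (non-normative Python sketch). The function name map_severity and its parameters are hypothetical. The parameters map_to_severity and default_severity merely mirror xvrl:map-to-severity and xvrl:default-severity. The sketch assumes that the four XVRL severity names map to themselves, that the default applies whenever nothing could be mapped, and that unmappable values are kept as svrl:-prefixed attributes rather than as categories.

```python
XVRL_SEVERITIES = {"fatal-error", "error", "warning", "info"}
SYNONYMS = {
    "information": "info",
    "informational": "info",
    "warn": "warning",
    "fatal": "fatal-error",
}

def map_severity(svrl_attrs, map_to_severity="flag role", default_severity=None):
    """Return (severity, leftovers) for the attributes of one SVRL finding."""
    severity = None
    leftovers = {}
    # Attribute names are tried in the order given by the parameter.
    for name in map_to_severity.split():
        value = svrl_attrs.get(name)
        if value is None:
            continue
        folded = value.strip().lower()
        mapped = SYNONYMS.get(
            folded, folded if folded in XVRL_SEVERITIES else None
        )
        if mapped is None:
            # Unmappable values survive as namespaced attributes.
            leftovers[f"svrl:{name}"] = value
        elif severity is None:
            severity = mapped
    return severity or default_severity, leftovers

print(map_severity({"role": "WARN"}))
print(map_severity({"flag": "foo", "role": "Fatal"}))
print(map_severity({"role": "foo"}, default_severity="error"))
```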

xvrl:xpath-notation

This parameter controls how XPath attributes given in location elements should be structured. Possible values are “Q”, “namespace-uri”, and “name”.

Example: The path /TEI/text[1] in the namespace http://www.tei-c.org/ns/1.0 will be represented in these notations as follows:

Q

/Q{http://www.tei-c.org/ns/1.0}TEI/Q{http://www.tei-c.org/ns/1.0}text[1]

namespace-uri

/*:TEI[namespace-uri()='http://www.tei-c.org/ns/1.0']/*:text[namespace-uri()='http://www.tei-c.org/ns/1.0'][1]

This corresponds to the parameter setting full-path-notation=1 in the SVRL output of the Schematron skeleton implementation.

name

/tei:TEI/tei:text[1]

This corresponds to the parameter setting full-path-notation=2 in the SVRL output of the Schematron skeleton implementation. It takes namespace prefix declarations from the source document and it needs to copy these declarations to an appropriate location in the resulting XVRL document.

If the XVRL attribute xpath-default-namespace is present on an ancestor element, the namespace URI given in this attribute on the closest ancestor should be used to omit this namespace from the resulting XPath. If xpath-default-namespace="http://www.tei-c.org/ns/1.0" is in force in a given context, the paths in any of the three notations should reduce to /TEI/text[1].

If an XVRL-generating application is unable to generate the preferred notation, any XPath notation that it can produce is acceptable.
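
The three notations and the effect of xpath-default-namespace can be reproduced with a small, non-normative Python sketch. The function name format_xpath and the representation of a path as (namespace URI, local name, position) tuples are assumptions made for illustration. The prefix map for the “name” notation has to be supplied by the caller.

```python
def format_xpath(steps, notation="Q", prefixes=None, default_namespace=None):
    """Render location steps as an XPath string.

    steps: list of (namespace_uri, local_name, position or None) tuples.
    """
    prefixes = prefixes or {}
    parts = []
    for uri, local, position in steps:
        if not uri or uri == default_namespace:
            # Names in the default namespace are written without qualification.
            step = local
        elif notation == "Q":
            step = f"Q{{{uri}}}{local}"
        elif notation == "namespace-uri":
            step = f"*:{local}[namespace-uri()='{uri}']"
        elif notation == "name":
            step = f"{prefixes[uri]}:{local}"
        else:
            raise ValueError(f"unknown notation: {notation}")
        if position is not None:
            step += f"[{position}]"
        parts.append(step)
    return "/" + "/".join(parts)

TEI = "http://www.tei-c.org/ns/1.0"
steps = [(TEI, "TEI", None), (TEI, "text", 1)]

for notation in ("Q", "namespace-uri", "name"):
    print(format_xpath(steps, notation, prefixes={TEI: "tei"}))
print(format_xpath(steps, "Q", default_namespace=TEI))
```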

C. XSLT Stylesheets (Non-Normative)

As a convenience, XSLT stylesheets will be made available for the following purposes:

SVRL → XVRL conversion

It is recommended that XProc processor vendors make this XSLT available under the import URI http://xproc.org/xvrl/xsl/svrl2xvrl.xsl.

c:errors → XVRL conversion

It is recommended that XProc processor vendors make this XSLT available under the import URI http://xproc.org/xvrl/xsl/c-errors2xvrl.xsl.

Filter/transform XVRL

It is recommended that XProc processor vendors make this XSLT available under the import URI http://xproc.org/xvrl/xsl/xvrl2xvrl.xsl.

All stylesheets accept the parameters given in Appendix B, Parameters for Controlling XVRL Generation.