XProc 3.0: file steps

Copyright 2018, 2019, 2020 the Contributors to the XProc 3.0 Standard Step Library specifications

Norman Walsh, Achim Berndzen, Gerrit Imsieke, Erik Siegel

This specification describes the file related steps
for
XProc 3.0: An XML Pipeline Language.

This specification was published by the
XProc
Next Community Group. It is not a W3C Standard nor is it on
the W3C Standards Track. Please note that under the
W3C
Community Contributor License Agreement (CLA) there is a limited
opt-out and other conditions apply. Learn more about W3C Community and Business
Groups.
If you wish to make comments regarding this document, please
send them to
xproc-dev@w3.org.
(subscribe,
archives).
Introduction

This specification describes the file related XProc steps.
A machine-readable description of
these steps may be found in
steps.xpl.
Familiarity with the
general nature of
steps is assumed; for background details, see
.

p:directory-list

The p:directory-list step produces a list of the contents
of a specified directory.

Conformant processors must support directory paths whose
scheme is file. It is
implementation-defined what other schemes are
supported by p:directory-list, and what the interpretation
of ‘directory’, ‘file’ and ‘contents’ is for those schemes. It is a dynamic error if an
implementation does not support directory listing for a specified scheme.

If the directory path is relative, it is made absolute against the
base URI of the element on which it is specified
(p:with-option or p:directory-list in the case of a
syntactic shortcut value). It is a dynamic
error if the base URI is not both absolute and valid according to . It is a
dynamic error if the absolute path does not
identify a directory. It is a
dynamic error if the contents of the directory
path are not available to the step due to access restrictions in the
environment in which the pipeline is run.

If the option requesting details is true, the pipeline
author is requesting additional information about the matching entries,
see .

The option that sets the maximum depth may contain either the string “unbounded” or a string
that may be cast to a non-negative integer. An integer value of 0 means that only information
about the specified directory itself is returned. A value of
1, which is the default, means that information about the top-level directory’s
immediate children is also included. For larger values, the content of
directories is considered recursively up to the maximum depth and is included as children of the
corresponding c:directory elements.

If present, the value of the include or exclude filter
option must be a sequence of strings, each
one representing a regular expression as specified in ,
section 7.61 “Regular Expression Syntax”. It is a dynamic
error if a specified value is not a valid XPath regular
expression.

The regular expressions will be matched against an item’s file system path relative to the
top-level path that was given in the option. If the item is a directory,
a trailing slash will be appended. The matching is done unanchored: it is a match if the
regular expression matches part of the item’s relative file system path. Informally: matching behaves like applying
the XPath matches#2 function, as in
matches($path, $regular-expression).

Examples: A file file.txt in the specified directory will remain
file.txt, a relative path dir1/file.txt will remain
dir1/file.txt, while a relative path dir1/dir2 will become
dir1/dir2/ if dir2 is a directory.

Regular expressions that match a/a/b/file.txt are, for example,
^(\w+/){2,3}.+\.txt$, a/a/b/, or /file\.[^/]+$.

If any pattern matches the slash-augmented relative path, the entry is included in the output.
If a directory’s path matches the inclusion regex, the directory’s content will not automatically be included, too.
The content needs to match the regular expression as well. So the filter regex ^dir/ will match the directory
content but ^dir/$ won’t, and as a consequence the directory’s content will not be included in the result.

If a relative path is matched by an include filter,
all its ancestor directories starting from the initial directory (but not their content if not included explicitly)
will be included, too.

Sample Directory List Output for a Single File

For a file a/a/b/file.txt below the initial directory
/home/jane, this output will be produced, omitting content that might be present in the
intermediate directories:

<c:directory xml:base="file:///home/jane/" name="jane">
  <c:directory xml:base="a/" name="a">
    <c:directory xml:base="a/" name="a">
      <c:directory xml:base="b/" name="b">
        <c:file xml:base="file.txt" name="file.txt"/>
      </c:directory>
    </c:directory>
  </c:directory>
</c:directory>

If an exclude filter pattern matches the slash-augmented relative path, the entry (and all of
its content in case of a directory) is excluded from the output.

If both options are
provided, the include filter is processed first, then the exclude
filter. As a result, an item is included if it matches (at least) one
of the include filter values and none of the exclude filter
values.

If no include filter is given, that is, if it is an empty
sequence, any item will be included in the result (unless it is excluded by the exclude filter).

There is no way to specify a list of values using attribute value
templates. If the option shortcut syntax is used to provide the
include or exclude filter option,
it will consist of a single regular expression. To specify a list of
regular expressions, you must use the p:with-option
syntax.
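
The matching rules for the two filters can be tried out with a small sketch. The following Python fragment is an illustration only and not part of this specification: the function names augment and is_selected are hypothetical, and Python's re module stands in for the XPath regular expression dialect, which differs in some details, so only patterns from the common subset (such as the examples above) can be expected to behave the same way. The sketch models the trailing slash for directories, unanchored matching, and the include-then-exclude rule. It does not model the inclusion of ancestor directories or the exclusion of a directory's content.

```python
import re


def augment(relative_path: str, is_directory: bool) -> str:
    """Return the path used for matching: directories get a trailing slash."""
    if is_directory and not relative_path.endswith("/"):
        return relative_path + "/"
    return relative_path


def is_selected(path: str, include=(), exclude=()) -> bool:
    """Apply the include filter first, then the exclude filter.

    re.search is unanchored, in the spirit of matches($path, $regex).
    An empty include sequence selects every path.
    """
    included = not include or any(re.search(p, path) for p in include)
    excluded = any(re.search(p, path) for p in exclude)
    return included and not excluded
```

Under these assumptions, is_selected("dir/file.txt", include=[r"^dir/"]) is true, while the same path is rejected by the include pattern ^dir/$, which only matches the directory entry dir/ itself.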

A further option can be used to partially override the
content-type determination mechanism. This works just like the corresponding
option of p:archive-manifest and
p:unarchive (see ), except
that the regular expression matching is done against the paths as used for the matching of the
include and exclude filter options.

The result document produced for the specified directory path has a c:directory
document element whose base URI, attached as an xml:base attribute, is the absolute
directory path (expressed as a URI that ends in a slash) and whose name attribute
(without a trailing slash) is the last segment of the directory path. The same base URI is attached as the
resulting document’s base-uri property and, accordingly, as its document node’s base URI.

Its contents are determined as follows, based on the entries in the directory identified by the directory path.
For each entry in the directory and subject to the rules that are imposed by the
maximum depth, include filter, and exclude filter options, a c:file, a
c:directory, or a c:other element is produced, as follows:

A c:directory is produced for each subdirectory not determined to be special. Depending on the
values of the three options, it may contain child elements for the directory’s content.

A c:file is produced for each file
not determined to be special.

Any file or directory determined to be
special by the p:directory-list step may be output using a
c:other element but the criteria for marking a file as
special are implementation-defined.

Each of the elements c:file, c:directory,
and c:other has a name attribute, whose
value is a relative IRI reference, giving the (local) file or
directory name.

Each of these elements also contains the corresponding resource’s URI in an xml:base
attribute, which may be a relative URI for any but the top-level c:directory element. In the case of
c:directory, it must end in a trailing slash. This way, users will always be able to compute the
absolute URI for any of these elements by applying fn:base-uri() to it.

Directory list details

If the option requesting details is false, then only the
name and xml:base attributes are expected on
c:file, c:directory, or c:other
elements.

If the option requesting details is true, then the pipeline author
is expecting additional details about each entry. The following attributes
should be provided by the implementation:

content-type: The content type of the respective file. The value “application/octet-stream”
will be used if the processor is not able to identify another content type.
readable: “true” if the entry is readable.
writable: “true” if the entry is writable.
hidden: “true” if the entry is hidden.
last-modified: The last modification time of the entry, expressed as a lexical
xs:dateTime in UTC.
size: The size of the entry in bytes.

The precise meaning of these properties is
implementation-defined and may vary according
to the URI scheme of the directory path.
If the value of an attribute is “false” or if it has no
meaningful value, the attribute may be omitted.

Any other attributes on
c:file, c:directory, or c:other
are implementation-defined, but they must be in a namespace.

Document properties

Besides the content-type property,
the resulting document has a base-uri. Its value is identical to the top-level
element’s xml:base attribute, that is, to the directory’s URI.

p:file-copy

The p:file-copy step copies a file or a directory to a given target.

The p:file-copy step copies the specified file or directory to the new position given as the
target. Any non-existent directory contained in the target path will be created before copying starts.
If the target is a directory, the step attempts to copy the file or directory into that directory,
preserving its base name. It is a dynamic error if the
source is a directory but the target is a file.

If the option that permits overwriting evaluates to false, no existing file will be changed.

Conformant processors must support URIs whose
scheme is file for the options of p:file-copy
that identify the source and the target.
It is implementation-defined what other schemes are
supported by p:file-copy, and what the interpretation
of ‘directory’, ‘file’ and ‘contents’ is for those schemes. It is a dynamic error if an
implementation does not support p:file-copy for a specified scheme.

If the source or target URI is relative, it is made absolute against the
base URI of the element on which it is specified
(p:with-option or p:file-copy in the case of a
syntactic shortcut value). It is a dynamic
error if the base URI is not both absolute and valid according to . It is a
dynamic error if p:file-copy is not available to the step due to access restrictions
in the environment in which the pipeline is run.

If no error occurs, the step returns a c:result element containing the absolute URI of the
target.

If an error occurs and the option that controls failure on error is false, the step returns a
c:error element which may contain additional, implementation-defined, information about the nature of
the error. In the case of a recursive copy, processing stops at the first error.

If an error occurs and that option is true, one of the following errors is
raised:

It is a dynamic error if the source resource
does not exist, cannot be accessed or is not a file or directory.

It is a dynamic error if the file or directory cannot be copied
to the specified location.

Copying directories

If the source identifies a directory and
target also identifies a directory, or does not exist,
then the p:file-copy step attempts to copy the entire
directory tree identified by the source: the directory and
all of its descendants.

In this case:

If the option that controls failure on error is false and an error occurs, no further copying
is attempted after the first error is detected.

If the option that permits overwriting is true, err:XC0157 does not apply to
descendants. A directory under the source may replace a file with the corresponding
name under the target.
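
The placement rule stated for p:file-copy, that a target directory receives the copied item under its base name, can be illustrated with a short sketch. This Python fragment is an illustration under stated assumptions and does not describe the behaviour of any particular processor: the function name effective_target and the target_is_directory flag are hypothetical, the caller is assumed to have already determined whether the target is an existing directory, and the URIs are assumed to carry no query or fragment. Only the resulting URI is computed; nothing is copied.

```python
def effective_target(source_uri: str, target_uri: str, target_is_directory: bool) -> str:
    """Compute the URI at which the copied item ends up (no file system access)."""
    if not target_is_directory:
        # The target names the new file (or the new directory) itself.
        return target_uri
    # Base name of the source; a trailing slash on a directory URI is ignored.
    base_name = source_uri.rstrip("/").rsplit("/", 1)[-1]
    # Make sure the target directory URI ends in a slash before appending.
    directory = target_uri if target_uri.endswith("/") else target_uri + "/"
    return directory + base_name
```

For example, copying file:///home/jane/a/file.txt to the existing directory file:///tmp/out yields file:///tmp/out/file.txt in this model.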

Document properties

The resulting document has no properties
apart from content-type. In particular, it has no base-uri.

p:file-delete

The p:file-delete step deletes a file or a directory.

The p:file-delete step attempts to delete the specified file or directory, if it
exists. If the named file or directory does not exist, the step just returns a
c:result element as described below.

Conformant processors must support URIs whose
scheme is file for the option of p:file-delete that identifies the file or directory.
It is implementation-defined what other schemes are
supported by p:file-delete, and what the interpretation
of ‘directory’, ‘file’ and ‘contents’ is for those schemes. It is a dynamic error if an
implementation does not support p:file-delete for a specified scheme.

If the URI is relative, it is made absolute against the
base URI of the element on which it is specified
(p:with-option or p:file-delete in the case of a
syntactic shortcut value). It is a dynamic
error if the base URI is not both absolute and valid according to . It is a
dynamic error if p:file-delete is not available to the step due to access restrictions
in the environment in which the pipeline is run.

If the URI specifies a directory, it can only be deleted if the option that permits deleting non-empty directories
is true or if the specified directory is empty.

The step returns a c:result element containing the absolute URI of the file or directory.

If an error occurs and the option that controls failure on error is false, the step returns a
c:error element which may contain additional, implementation-defined, information about the nature of
the error.

If an error occurs and that option is true, one of the following errors is
raised:

It is a dynamic error if the specified resource
cannot be accessed or is not a file or directory.

It is a dynamic error if an attempt is made to delete a non-empty
directory and the option that permits this was set to false.

Document properties

The resulting document has no properties
apart from content-type. In particular, it has no base-uri.

p:file-info

The p:file-info step returns information about a file, directory or other file system
object.

The p:file-info step returns information about the specified file, directory or other file system object.

Conformant processors must support URIs whose
scheme is file for the option of p:file-info that identifies the object.
It is implementation-defined what other schemes are
supported by p:file-info, and what the interpretation
of ‘directory’, ‘file’ and ‘contents’ is for those schemes. It is a dynamic error if an
implementation does not support p:file-info for a specified scheme.

If the URI is relative, it is made absolute against the
base URI of the element on which it is specified
(p:with-option or p:file-info in the case of a
syntactic shortcut value). It is a dynamic
error if the base URI is not both absolute and valid according to . It is a
dynamic error if p:file-info is not available to the step due to access restrictions
in the environment in which the pipeline is run.

If the URI is a file: URI, the step returns:

If it references a file: A c:file element with standard attributes (see
below).

If it references a directory: A c:directory element with standard
attributes (see below).

If it references any other file system object: Implementation-defined (for example
a c:other or c:device element). It is advised to use the standard attributes (see below)
if applicable.

A further option can be used to partially override the
content-type determination mechanism for files. This works just like the corresponding
option of p:archive-manifest and
p:unarchive (see ), except
that the regular expression matching is done against the absolute URI of the file.

Each of the elements c:file, c:directory,
and c:other has a name attribute, whose
value is a relative IRI reference, giving the (local) file or
directory name.

The following attributes are standard on a returned c:file or c:directory element. All
attributes are optional and must be absent if not applicable. Additional implementation-defined attributes may be
present, but they must be in a namespace.

Attribute      Type         Description
readable       xs:boolean   true if the object is readable.
writable       xs:boolean   true if the object is writable.
hidden         xs:boolean   true if the object is hidden.
last-modified  xs:dateTime  The last modification time of the object expressed in UTC.
size           xs:integer   The size of the object in bytes.
content-type   xs:string    The content type, if the object is a file.

If an error occurs and the option that controls failure on error is false, the step returns a
c:error element which may contain additional, implementation-defined, information about the nature of
the error.

If an error occurs and that option is true, one of the following errors is
raised:

It is a dynamic error if the specified resource
does not exist, cannot be accessed or is not a file, directory or other file
system object.

Document properties

The resulting document has no properties
apart from content-type. In particular, it has no base-uri.

p:file-mkdir

The p:file-mkdir step creates a directory.

The p:file-mkdir step creates the specified directory. If this includes
more than one directory component, all of the intermediate components are created. If the directory already exists
the step just returns the c:result element as described below. The path separator is
implementation-defined.

Conformant processors must support URIs whose
scheme is file for the option of p:file-mkdir that identifies the directory.
It is implementation-defined what other schemes are
supported by p:file-mkdir, and what the interpretation
of ‘directory’, ‘file’ and ‘contents’ is for those schemes. It is a dynamic error if an
implementation does not support p:file-mkdir for a specified scheme.

If the URI is relative, it is made absolute against the
base URI of the element on which it is specified
(p:with-option or p:file-mkdir in the case of a
syntactic shortcut value). It is a dynamic
error if the base URI is not both absolute and valid according to . It is a
dynamic error if p:file-mkdir is not available to the step due to access restrictions
in the environment in which the pipeline is run.

The step returns a c:result element containing the absolute URI of the directory.

If an error occurs and the option that controls failure on error is false, the step returns a
c:error element which may contain additional, implementation-defined, information about the nature of
the error.

If an error occurs and that option is true, the following error is
raised:

It is a dynamic error if the specified directory
cannot be created.

Document properties

The resulting document has no properties
apart from content-type. In particular, it has no base-uri.

p:file-move

The p:file-move step moves a file or directory.

The p:file-move step moves the specified file or directory to the new location
given as the target. If the target is an
existing directory, the step attempts to move the file or directory into
that directory, preserving its base name. It is a dynamic error if the
source is a directory but the target is a file.

Conformant processors must support URIs whose
scheme is file for the options of p:file-move
that identify the source and the target.
It is implementation-defined what other schemes are
supported by p:file-move, and what the interpretation
of ‘directory’, ‘file’ and ‘contents’ is for those schemes. It is a dynamic error if an
implementation does not support p:file-move for a specified scheme.

If the source or target URI is relative, it is made absolute against the
base URI of the element on which it is specified
(p:with-option or p:file-move in the case of a
syntactic shortcut value). It is a dynamic
error if the base URI is not both absolute and valid according to . It is a
dynamic error if p:file-move is not available to the step due to access restrictions
in the environment in which the pipeline is run.

If the option specifies a device or other special kind of object, the results are
implementation-defined.

If the move is successful, the step returns a c:result element containing the absolute URI of the
target.

If an error occurs and the option that controls failure on error is false, the step returns a
c:error element which may contain additional, implementation-defined, information about the nature of
the error.

If an error occurs and that option is true, one of the following errors is
raised:

It is a dynamic error if the source resource
does not exist, cannot be accessed or is not a file or
directory.

It is a dynamic error if the target
is an existing file or other file system object.

It is a dynamic error if the directory cannot be
moved to the specified location.

Document properties

The resulting document has no properties
apart from content-type. In particular, it has no base-uri.

p:file-create-tempfile

The p:file-create-tempfile step creates a temporary file.

The p:file-create-tempfile step creates a temporary file. The temporary file is guaranteed not to already exist
when the step is called.

If a directory is specified, it must be the URI of an existing directory. The temporary file
is created there. If no directory is specified, the location of the temporary file is
implementation-defined, usually the operating system's default location for temporary files.

Conformant processors must support URIs whose
scheme is file for the option of p:file-create-tempfile that identifies the directory. It is
implementation-defined what other schemes are
supported by p:file-create-tempfile, and what the interpretation
of ‘directory’, ‘file’ and ‘contents’ is for those schemes. It is a dynamic error if an
implementation does not support p:file-create-tempfile for a specified scheme.

If the directory URI is relative, it is made absolute against the
base URI of the element on which it is specified
(p:with-option or p:file-create-tempfile in the case of a
syntactic shortcut value). It is a dynamic
error if the base URI is not both absolute and valid according to . It is a
dynamic error if p:file-create-tempfile cannot be completed due to access
restrictions in the environment in which the pipeline is run.

If a prefix is specified, the filename will begin with that prefix. If a
suffix is specified, the filename will end with that suffix.

If the option requesting automatic deletion is true, an attempt will be made to
delete the temporary file when the processor terminates the pipeline. No error will be raised if this is
unsuccessful.

If the temporary file creation is successful, the step returns a c:result element containing the
absolute URI of this file.

If an error occurs and the option that controls failure on error is false, the step returns a
c:error element which may contain additional, implementation-defined, information about the nature of
the error.

If an error occurs and that option is true, one of the following errors is
raised:

It is a dynamic error if the specified
directory does not exist, cannot be accessed or is not a directory.

It is a dynamic error if the temporary file could not be
created.

Document properties

The resulting document has no properties
apart from content-type. In particular, it has no base-uri.

p:file-touch

The p:file-touch step updates the modification
timestamp of a file.

The p:file-touch step updates the modification
timestamp of the specified file.
If the file does not
exist, an empty file will be created at the given location.

Conformant processors must support URIs whose
scheme is file for the option of p:file-touch that identifies the file. It is
implementation-defined what other schemes are
supported by p:file-touch, and what the interpretation
of ‘directory’, ‘file’ and ‘contents’ is for those schemes. It is a dynamic error if an
implementation does not support p:file-touch for a specified scheme.

If the URI is relative, it is made absolute against the
base URI of the element on which it is specified
(p:with-option or p:file-touch in the case of a
syntactic shortcut value). It is a dynamic
error if the base URI is not both absolute and valid according to . It is a
dynamic error if p:file-touch cannot be completed due to access
restrictions in the environment in which the pipeline is run.

If a timestamp is given, the file's
timestamp is set to this value. Otherwise the file's timestamp is set
to the system's current date and time.

If the operation is successful, the step returns a c:result element containing the absolute URI
of the file.

If an error occurs and the option that controls failure on error is
false, the step returns a c:error element
which may contain additional, implementation-defined, information
about the nature of the error.

If an error occurs and that option is
true, the following error is raised:

It is a dynamic error if the specified file
does not exist and cannot be created
or exists and cannot be accessed.

Document properties

The resulting document has no properties
apart from content-type. In particular, it has no base-uri.

Step Errors

These steps can raise
dynamic errors.
A dynamic
error is one which occurs while a pipeline is being
evaluated. Examples of dynamic errors include references to
URIs that cannot be resolved, steps which fail, and pipelines that
exhaust the capacity of an implementation (such as memory or disk
space). For a more complete discussion of dynamic errors, see
.
If a step fails due to a dynamic error, failure propagates
upwards until either a p:try is encountered or the entire
pipeline fails. In other words, outside of a p:try, step
failure causes the entire pipeline to fail.

The following specific errors can be raised by these steps:

Conformance

Conformant processors must implement all of the features
described in this specification except those that are explicitly identified
as optional.

Some aspects of processor behavior are not completely specified; those
features are either implementation-dependent or
implementation-defined.

An implementation-dependent feature is one where the
implementation has discretion in how it is performed.
Implementations are not required to document or explain
how implementation-dependent features are performed.

An implementation-defined feature is one where the
implementation has discretion in how it is performed.
Conformant implementations must document
how implementation-defined features are performed.

Implementation-defined features

The following features are implementation-defined:

Implementation-dependent features

The following features are implementation-dependent:

References

[XProc 3.0] XProc 3.0: An XML Pipeline Language.
Norman Walsh, Achim Berndzen, Gerrit Imsieke and Erik Siegel, editors.

[XProc 3.0 Steps] XProc 3.0 Steps: An Introduction.
Norman Walsh, Achim Berndzen, Gerrit Imsieke and Erik Siegel, editors.

[XPath and XQuery Functions and Operators 3.1] XPath and XQuery Functions and Operators 3.1. Michael Kay, editor.
W3C Recommendation. 21 March 2017.

[RFC 3986] RFC 3986: Uniform Resource Identifier (URI): Generic Syntax.
T. Berners-Lee, R. Fielding, and L. Masinter, editors.
Internet Engineering Task Force. January, 2005.

Glossary

dynamic error: A dynamic error is one which occurs while a pipeline is being
evaluated.

implementation-defined: An implementation-defined feature is one where the
implementation has discretion in how it is performed.
Conformant implementations must document
how implementation-defined features are performed.

implementation-dependent: An implementation-dependent feature is one where the
implementation has discretion in how it is performed.
Implementations are not required to document or explain
how implementation-dependent features are performed.

Ancillary files

This specification includes by reference a number of
ancillary files.

An XProc step library for the declared steps.