Notes on XQuery to XSLT

2 Copying Namespace Nodes

Considering how close XQuery and XSLT are in most other respects, the two systems differ by a remarkably large amount in their handling of namespace nodes.

2.1 Copying namespace nodes in XQuery

(Officially XQuery does not recognise namespace nodes in the data model, but when constructing (rather than querying) it is easier to think in terms of namespace nodes, which also aligns with the XSLT model.)

XQuery does not offer any control over the copying of namespace nodes associated with a specific element node, but it does have a global declaration, declare copy-namespaces, that sets two boolean flags, "preserve" and "inherit" (so there are four possible copying modes, the default mode being implementation-defined).

Consider the following XQuery:

declare variable $x := <x xmlns:preserve="preserve"><z/></x>;
declare variable $y := <y xmlns:inherit="inherit"><z xmlns:a="a"/>{$x}</y>;

<a>{
for $n in in-scope-prefixes($y/z) order by $n return $n
  }|{
for $n in in-scope-prefixes($y/*/*) order by $n return $n
}</a>

The query is run four times, prepended in turn with each of the four possible copy-namespaces declarations:

declare copy-namespaces no-preserve,no-inherit;
declare copy-namespaces preserve,no-inherit;
declare copy-namespaces no-preserve,inherit;
declare copy-namespaces preserve,inherit;

The results returned in each case are:

<a>a inherit xml|xml</a>
<a>a inherit xml|preserve xml</a>
<a>a inherit xml|inherit xml</a>
<a>a inherit xml|inherit preserve xml</a>

Note that the copy-namespaces declaration has no effect on the first clause, which shows the namespaces in the directly nested direct element constructor. It does, however, affect the namespaces that are copied when the sequence $x is copied into the content of the newly constructed element as a result of evaluating the enclosed expression {$x}.
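The second clause of the four results can be summarised by a small model. The following Python sketch is an illustration only and is not part of xq2xsl: the function name copied_prefixes is hypothetical, and the model assumes that the copied element's own name and attributes use no prefixes (a real processor always keeps namespaces that are used in names).

```python
def copied_prefixes(own, ancestors, preserve, inherit):
    """In-scope prefixes of an element after it is copied (simplified model)."""
    prefixes = {"xml"}              # the xml prefix is always in scope
    if preserve:
        prefixes |= set(own)        # keep the bindings the element had before the copy
    if inherit:
        prefixes |= set(ancestors)  # pick up the bindings of its new ancestors
    return " ".join(sorted(prefixes))

# The z element inside $x has "preserve" in scope; its new ancestor y declares "inherit".
# Modes are (preserve, inherit), in the same order as the four declarations above.
modes = [(False, False), (True, False), (False, True), (True, True)]
results = [copied_prefixes({"preserve"}, {"inherit"}, p, i) for p, i in modes]
print(results)
```

The four strings produced correspond to the part after the "|" in each of the four results listed above.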

XQuery does not have any instruction to create namespace nodes; however, they can be created using "namespace declaration attributes" on direct element constructors.

2.2 Copying namespace nodes in XSLT

XSLT has no global control over the namespace copying policy. It also offers no local control over namespace copying when sequences (typically generated by xsl:sequence) are implicitly copied into newly constructed elements. However, it does have instructions (xsl:copy, xsl:copy-of) to explicitly copy nodes, and these instructions take an attribute, copy-namespaces? = "yes" | "no", which more or less corresponds to the preserve flag in the XQuery declaration. In addition, xsl:element and literal result elements take an attribute, [xsl:]inherit-namespaces? = "yes" | "no", that more or less corresponds to the inherit flag in XQuery.

XSLT also offers more possibilities for the creation of namespace nodes: in addition to the namespace declarations on literal result elements, there is an xsl:namespace instruction to explicitly create namespace nodes (which means that, if necessary, they can be added to newly created element nodes that have been generated by xsl:element). Use of xsl:element (even for element names that are known statically, and so could be generated by literal result elements) provides an extra possibility to control the copying of namespace nodes, as by default xsl:element copies fewer namespace nodes than a literal result element.

2.3 Translating from XQuery namespace handling to XSLT

To reproduce the XQuery behaviour in XSLT, a direct element constructor is translated to xsl:element if no-inherit is not set, and to a literal result element specifying xsl:inherit-namespaces="no" if no-inherit is set. Any namespace attributes on the direct element constructor are translated to both a namespace node on the literal result element and an xsl:namespace instruction generating a namespace node as the first node in the element's content. The namespace node is needed on the xsl:element to ensure that the statically known namespaces in the stylesheet correspond to the statically known namespaces in the original query. However, xsl:element does not automatically produce namespace nodes in the result, so xsl:namespace is also needed.

The preserve setting is implemented in the following way: if no-preserve is set in the prolog, the result of each enclosed expression in a direct element constructor is not generated directly as child nodes of the xsl:element or literal result element. Instead, it is first generated to a temporary tree, which is then copied to the final result using xsl:copy-of with copy-namespaces="no".

3 White Space

This processor defaults to stripping boundary space. However, if preserve is specified in the prolog, all boundary space text nodes are copied to xsl:text elements in the generated stylesheet, and so produce space in the result of running the query.

The only other issue of note regarding white space is that white space is added between items of a sequence but not between adjacent enclosed expressions.

<a>{1,2}{3}</a>

This produces:

<a>1 23</a>

where a space has been added between 1 and 2, but not between this sequence and the 3 in the sequence generated by the next enclosed expression.
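The rule can be modelled in a few lines. This Python sketch is an illustration only: the function name element_content is hypothetical, and it handles atomic values only (no nodes), which is all the example above needs.

```python
def element_content(*enclosed_expressions):
    """Build element content from the values of successive enclosed expressions."""
    return "".join(
        " ".join(str(value) for value in sequence)  # space between items of one sequence
        for sequence in enclosed_expressions        # nothing between enclosed expressions
    )

# Models <a>{1,2}{3}</a>
content = element_content([1, 2], [3])
print(f"<a>{content}</a>")  # prints <a>1 23</a>
```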

XSLT has an almost identical rule for attribute value templates, but doesn't really have an analogue of enclosed expressions for element content. xq2xsl currently generates the following, where an empty text node generated by <xsl:text/> is inserted after each enclosed expression, which prevents white space being inserted.

         <xsl:element name="a">
            <xsl:sequence select=" 1 "/>
            <xsl:sequence select=" 2 "/>
            <xsl:text/>
            <xsl:sequence select=" 3 "/>
            <xsl:text/>
         </xsl:element>

4 typeswitch

XQuery for some reason doesn't have a general multi-way conditional expression, but does have one restricted to testing the type of an expression. xq2xsl maps typeswitch to XSLT's xsl:choose instruction, testing on the value of "instance of" expressions. I need to check whether this fully captures the semantics of typeswitch, especially the rules as to which errors a system may, or may not, raise in branches that are not executed. (It is probably as close as you can get in XSLT anyway, but any differences will be documented here.)

  typeswitch (2)
  case $zip as element(*)
       return 7
  case $postal as element(ll )
       return 8
  default $x return 9

  <xsl:variable as="item()*" name="xq:ts" select="( 2 ,())"/>
  <xsl:choose>
    <xsl:when test="$xq:ts instance of  element(  * )">
        <xsl:variable as=" element(  * )" name="zip" select="$xq:ts"/>
        <xsl:sequence select=" 7 "/>
     </xsl:when>
     <xsl:when test="$xq:ts instance of  element(  ll )">
        <xsl:variable as=" element(  ll )" name="postal" select="$xq:ts"/>
        <xsl:sequence select=" 8 "/>
     </xsl:when>
     <xsl:otherwise>
        <xsl:variable as="item()*" name="x" select="$xq:ts"/>
        <xsl:sequence select=" 9 "/>
     </xsl:otherwise>
  </xsl:choose>

The spurious trailing ",()" in the definition of the main variable xq:ts whose type is to be tested is there to ensure that the XSLT compiler does not statically determine its type (which would allow it to raise static errors on branches of the xsl:choose that could not arise). A smarter XSLT compiler would require more obfuscation here, or a translation with a schema-aware XSLT engine could perhaps detect the static type of the expression and omit any branches of the xsl:choose that could error in that case.

5 FLWOR: order by

The order by clause in a FLWOR expression is the one XQuery feature that does not easily translate to XSLT. This is essentially because it doesn't easily fit in the XQuery/XPath data model: its natural semantics requires nested sequences (called tuples in the XQuery documentation), but XPath sequences cannot contain sequences, only individual items (nodes and atomic values). The XQuery Formal Semantics also explicitly omits to give semantics to order by for the same reason.

The xq2xsl translation does (barring bugs) provide a full translation of a FLWOR expression into XSLT. However, as the translation of the most general case is rather verbose, the system detects some special cases and translates them more directly. These special cases will be discussed first.

5.1 FLWOR: XPath

If the convertor is in XPath mode, and detects a FLWOR expression that corresponds to an XPath For expression (specifically, it has no "at" clause, no variable type declaration, and no let, where or order by clauses), then the expression essentially translates to itself.

If any other FLWOR expression occurs in XPath mode, then the convertor switches to XSLT mode (which typically means generating a new function call).

5.2 FLWOR: XSLT, no tuples

If the convertor is in XSLT mode and the FLWOR expression has no order by clause, or if there is an order by clause but only one for-variable (and no let-variables) then the FLWOR expression is translated fairly naturally to a nested set of xsl:for-each instructions, and each clause of the order by is converted to an xsl:sort instruction (or two xsl:sort instructions if empty-first ordering is required).

5.3 FLWOR: XSLT, Cartesian Product

Note this case is not currently detected by the system, and FLWOR expressions that could be converted as described here are converted as described in the next section, which produces a more verbose and possibly less efficient translation.

Consider a general 2-variable for expression, in which the sequence over which the inner variable ranges does not depend on the outer variable. This means that we can assume that each for-variable is ranging over a sequence contained in a variable that has been calculated before the FLWOR expression.

for $i in $is,  $j in $js
order by f($i,$j)
return
g($i,$j)

where f and g are arbitrary functions. This can be rewritten as:

let $ci := count($is) return
let $cj := count($js) return
for $n in (0 to $ci * $cj - 1)
let $i := $is[($n mod $ci) + 1]
let $j := $js[($n idiv $ci) + 1]
order by f($i, $j)
return
g($i, $j)

This rewrite has produced a FLWOR expression with just one variable, so may be translated to XSLT for-each as described in the previous section.

Cases with more variables could be handled in exactly the same way, with appropriate idiv and mod expressions to calculate the required indices.
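The index arithmetic is easy to get wrong by one, so it is worth checking outside XQuery. The following Python sketch is an illustration only (the function name pairs_by_index is hypothetical): it enumerates the pairs from a single counter n running from 0 to ci*cj - 1, using zero-based indexing where XQuery uses one-based positions.

```python
from itertools import product

def pairs_by_index(items_i, items_j):
    """Enumerate every (i, j) pair from a single counter n."""
    ci, cj = len(items_i), len(items_j)
    result = []
    for n in range(ci * cj):     # n runs from 0 to ci*cj - 1
        i = items_i[n % ci]      # XQuery: $is[($n mod $ci) + 1]
        j = items_j[n // ci]     # XQuery: $js[($n idiv $ci) + 1]
        result.append((i, j))
    return result

pairs = pairs_by_index(["a", "b", "c"], [1, 2])
# Every pair of the Cartesian product appears exactly once.
print(sorted(pairs) == sorted(product(["a", "b", "c"], [1, 2])))
```

Note that the counter visits the pairs with the first variable changing fastest, which differs from the nested for order. That only matters for items with equal sort keys.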

5.4 FLWOR: XSLT, Dependent Product (general case)

The general two variable case is:

 for $i in $is,
 $j in F($i)
 order by f($i,$j)
 return
 g($i,$j)

However this may be rewritten as:

let $one := (
 for $i at $ip in $is,
 $j at $jp in F($i)
 return
 encode($ip,$jp)
)
 return

 for $z in $one
 let $indexes:=decode($z)
 let $ip:=$indexes[1]
 let $i:= $is[$ip]
 let $jp:=$indexes[2]
 let $j:= (F($i))[$jp]
 order by f($i,$j)
 return
 g($i,$j)

where encode is anything that encodes a sequence of integers as a single item and decode is anything that gets the sequence of integers back. Specifically, the current convertor encodes a sequence of integers as a string using:
codepoints-to-string(($x1+32,$x2+32,...))
and decodes using
for $i in string-to-codepoints(.) return($i - 32)

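Note that this uses a single character per integer, which limits the method to about a million entries per sequence (fewer in practice, since code points that are not legal XML characters, such as the surrogate range, cannot be passed to codepoints-to-string). The encoding could easily use more characters per integer, so this is not a real restriction of the method.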
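The same encoding can be sketched in Python to show the round trip. This is an illustration only: the names encode, decode and OFFSET are chosen here to mirror the description above, and Python's chr and ord stand in for codepoints-to-string and string-to-codepoints.

```python
OFFSET = 32  # shifts positions past the control characters, as in the XQuery version

def encode(indexes):
    """Pack a sequence of positions into a string, one character per position."""
    return "".join(chr(n + OFFSET) for n in indexes)

def decode(key):
    """Recover the sequence of positions from an encoded string."""
    return [ord(ch) - OFFSET for ch in key]

key = encode([1, 7, 300])
print(len(key), decode(key))
```

Unlike codepoints-to-string, Python's chr does not reject code points that are illegal in XML, so this sketch does not reproduce that restriction.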

The main disadvantage of all this is that the expressions F($i) above that generate the sequences get evaluated multiple times and might be expensive. This could be optimised by the convertor (for example, by detecting the case where the sequences do not depend on the range variables, and so using the method in the previous section), but some of the optimisations are probably more easily done by the XSLT compiler. For example, I believe Saxon does not evaluate variables that are never used, so there is no real need for the convertor to analyse the expressions and see if all the variables defined are needed.
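The cost of the repeated evaluation can be seen in a small model of the rewrite. The following Python sketch is an illustration only and is not the convertor's output: the names flwor_direct and flwor_by_position are hypothetical, positions are kept as tuples rather than encoded strings, and Python's stable sort stands in for order by.

```python
def flwor_direct(items, F, f, g):
    """Reference version: nested loops, then sort the (i, j) tuples on f."""
    tuples = [(i, j) for i in items for j in F(i)]
    return [g(i, j) for i, j in sorted(tuples, key=lambda t: f(*t))]

def flwor_by_position(items, F, f, g):
    """Rewrite: pass 1 records positions only, pass 2 loops over one variable."""
    keys = [(ip, jp) for ip, i in enumerate(items) for jp, _ in enumerate(F(i))]

    def lookup(key):
        ip, jp = key
        i = items[ip]
        return i, F(i)[jp]  # F(i) is evaluated again on every lookup

    ordered = sorted(keys, key=lambda k: f(*lookup(k)))
    return [g(*lookup(k)) for k in ordered]

call_log = []  # records every evaluation of F

def F(i):
    call_log.append(i)
    return list(range(i))

def f(i, j):
    return i + j

def g(i, j):
    return (i, j)

items = [3, 1, 2]
direct = flwor_direct(items, F, f, g)
direct_calls = len(call_log)
call_log.clear()
rewritten = flwor_by_position(items, F, f, g)
rewrite_calls = len(call_log)
print(direct == rewritten, direct_calls, rewrite_calls)
```

With these inputs both versions return the same sequence, but F is evaluated 3 times by the direct version and 15 times by the rewrite (3 in the first pass, then once per tuple for the sort key and once per tuple for the result).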

6 Current Node

The major difference between an XQuery For expression in one variable and an xsl:for-each is that the xsl:for-each changes the current item (.) to be the current item in the sequence being processed, whereas a FLWOR expression never changes the current item.

A simple example:

for $i in (1,2,3)
return
$i

This is encoded as:

   <xsl:variable name="xq:here" select="."/>
   <xsl:for-each select="( 1 , 2 , 3 )">
      <xsl:variable name="i" select="."/>
      <xsl:for-each select="$xq:here">
         <xsl:sequence select="$i"/>
      </xsl:for-each>
   </xsl:for-each>

The xq:here variable stores the current item before the For expression, and restores it inside the loop.

The main complication is that the current item is not always defined, but there is no test you can do within XSLT to check whether it is defined. If it is not defined, the above generates an error on the first line, as the expression . raises an error in that case.

In XSLT the current item only becomes undefined in a function body, which isn't so bad: this case can be detected statically, and it would be possible to avoid setting and restoring the current item there. A harder case is the initial context. This is set (or not) outside the query, so there is no way of knowing whether the current item will be set.

The solution taken here is that xq2xsl always ensures that there is a current item: The implicit initial context is never used, and instead any initial context has to be passed in as the parameter "input". (This name may be changed to the converter's own xq: namespace to avoid possible clash with any external variable named input in the query being converted.) If $input is not set it takes a default value (the integer 1). I believe this behaviour is conformant as an XQuery processor has pretty wide latitude over how the initial context is set up. So the initial template of an xq2xsl derived stylesheet always looks like this:

   <xsl:param name="input" as="item()" select="1"/>

   <xsl:template name="main">
      <xsl:for-each select="$input">
...
      </xsl:for-each>
   </xsl:template>

7 Built in types

Both XQuery and XSLT define two levels of support for schema types. A full processor can handle schema import and user defined types; a basic processor cannot. Unfortunately the conformance levels are not aligned. An XQuery processor, even at the basic level of conformance, has access to all the built in schema types, which means that they can be used in sequence type expressions and as constructor functions. A basic XSLT processor, on the other hand, only has access to the primitive types (and xs:integer); other built in derived types are not available.

There is essentially nothing that can be done about this. The stylesheet has some experimental support to change any reference to a built in schema derived type to the nearest base type that is supported by a basic XSLT processor. This will allow queries to run without error and in many cases with acceptable results, however in some cases this will produce incorrect results and this translation perhaps should be turned off by default. (Currently it is turned on if the XSLT processor executing the conversion is not schema-aware.)

xq2xml is a personal project undertaken by David Carlisle (davidc "at" nag "dot" co "dot" uk). It is however distributed with the knowledge of, and from a web site controlled by, my employer NAG Ltd.