Dimitre 的个人资料XSLT: Riding the challen...日志列表网络 工具 帮助

日志


1月26日

Some experience with Tail-Recursion Optimization by XSLT 2.0 processors

It is wellknown that recursion is one of the main techniques associated with the functional programming style and this goes hand-in-hand with the principles that variables are immutable and once defined their values cannot be reassigned.  

Using recursion is elegant and beautiful most of the time, but it can have its drawbacks, too. One is that processing a long (with length N) list of items  the system may crash due to call-stack overflow. Depending on the available amount of memory and the specific language/runtime parameters this may happen at different recursion depths, but as a rule on a typical computer call-stack overflow is most often observed when N >= 1000.  

Thus, for any practical functional programming applications it is of paramount importance to be able to avoid call-stack overflows. There are two main ways of solving this problem: tail-recursion optimization and the DVC (Divide and Conquer) principle. Today I'd discuss tail-recursion optimization and the state of this technique in some existing and wellknown XSLT (2.0 or 1.0) processors.

Definition(Wikipedia): "Tail recursion (or tail-end recursion) is a special case of recursion that can be easily transformed into an iteration. Such a transformation is possible if the recursive call is the last thing that happens in a function".  

Transforming recursion into iteration (similar to a C-like "for" or "while" loop) not only completely eliminates the possibility of call-stack overflow, but also often leads to lightning-fast resulting code.  

This month I have been playing with an XSLT 2.0 implementation of a simple general LR parser as described in the Wikipedia. The implementation was straightforward and simple enough and the code was very short and compact. What was best, I had a tail-recursive variant of my code! Tested with Saxon 8.7.3 it also ran extremely fast, dispelling any doubts that XSLT Might not be the best language to use for such kind of processing.

Before starting to implement my XSLT 2.0 and XPath 2.0 parsers written in XSLT 2.0 (  :oO)  ) I needed to run a simple experiment just to be sure I was on the safe side from the very start. I needed to verify that Saxon will indeed recognize the tail-recursion and optimize it, so that it would always be possible to parse very long "sentences" described by a particular grammar. I started doubling the length (number of terminal symbols) of the input and unfortunately! and completely to my horror Saxon crashed with the dreaded "stack overflow" exception when the length reached 1600.  

Imagine my ("mixed" to put it mildly) feelings.  

I reported the issue to the saxon-help mailing list. Got assurances from Dr. Kay that he would look into the issue. Sent him the tail-recursive variant of my code. Then waited. As of today, the issue is not listed in the tracker for the Saxon project. I know that Dr. Kay is a busy person and during this time period he had been involved in the final stages of making the XSLT 2.0 spec an official W3C Recommendation -- something really a lot more important.  I believe that sooner or later this issue will be fixed in Saxon.

Anyway, I did try the other two known XSLT 2.0 implementations -- Altova and Gexslt. Both failed the same way as Saxon.  

Then I decided to test a much more simple XSLT tail-recursive application -- both on available XSLT 1.0 and XSLT 2.0 processors. The code to produce the sum of the numbers 1,2,3,..., N

where N=1000000 (one million) is listed below:

<xsl:stylesheet version="2.0" 
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"                
 xmlns:xs="http://www.w3.org/2001/XMLSchema" 
 xmlns:f="http://fxsl.sf.net/" 
 > 
 <xsl:output method="text"/>

      <xsl:template match="/"> 
        <xsl:sequence select="f:sum(0, 1, 1000000)"/> 
      </xsl:template>      

      <xsl:function name="f:sum"> 
        <xsl:param name="pAccumulated" as="xs:integer"/> 
        <xsl:param name="pFrom" as="xs:integer"/> 
        <xsl:param name="pTo" as="xs:integer"/>        

        <xsl:sequence select= 
         "if($pFrom gt $pTo) 
            then $pAccumulated 
            else                                            
              f:sum($pAccumulated + $pFrom, $pFrom + 1, $pTo) 
         " 
         /> 
      </xsl:function>
</xsl:stylesheet>

This transformation runs with Saxon 8.7.3 (J) for about 2.5 to 3 seconds. Altova crashes with stack overflow again.

The case with Gexslt was more complicated -- it was processing this transformation for a very long time (more than 2 minutes CPU) and was gradually consuming all the memory of the computer.

I tried a similar, template-based transformation with a number of XSLT 1.0 processors. As it happens, .NET 2.0 XslCompiledTransform is the champion in this group, producing the correct result almost instantaneously.

I want to finish this story in an optimistic tone.

I contacted Colin-Paul Adams -- the author/developer of Gexslt and he responded very quickly. Both the LR-parsing tail-recursion and the sum() tail recursion problem were fixed in a couple of days.

So, it appears that right at this moment Gexslt handles correctly and much better some kinds of tail-recursive transformations (such as of the LR-parsing type) than any other available XSLT 2.0 processor.

This is the beauty of having more than one competing products. Many thanks to Colin!

 

 

 

1月23日

XSLT 2.0 has arrived !

This is the title of the message of Dr. Michael Kay, I just saw in xsl-list.

 

XSLT 2.0 is a Recommendation of  W3C.

 

Wow...  A very great milestone has been reached. This is the programming language I really love and have been intensively using in my free time for the past few years.

 

Many thanks to personally to Dr. Kay and to all the people in the Working Groups who have worked hard all this time.

1月21日

Dr. Michael Kay on MetaStylesheets, On how, and why, XSLT can be used to transform XSLT (or XML Schema, or XQuery)

 

MetaStylesheets, On how, and why, XSLT can be used to transform XSLT (or XML Schema, or XQuery) -- presentation by Michael Kay at XML 2006

It’s quite often the case that someone asks about real-world scenarios or use-cases of using XSLT.

For those of us, who did not attend XML 2006 and Michael Kay’s presentation, here’s the text:

     http://2006.xmlconference.org/proceedings/26/presentation.pdf

And the slides:

    http://2006.xmlconference.org/proceedings/26/slides.pdff

I am amazed at the huge expressiveness of this 8-page presentation, describing in detail three large-scale real-world use-cases for dynamically generating XSLT stylesheets, comparing the scope and appropriateness of XSLT and XQuery, managing to explain the strengths of FXSL in just one sentence (I remember a question by one of my colleagues why and where the FXSL approach would be useful – after a 40+ minutes FXSL presentation), speaking about XProc, …, etc.

Enjoy.

1月9日

Update: Cascade deletions with XSLT

Within the first 24 hours of posting Cascade deletions with XSLT,  David Carlisle came up with improved XSLT 2.0 solution (below). David's solution main improvement is not so much in being just 37 lines long (3 lines down from the 40-line solution I originally posted) but in defining with an <xsl:key> all nodes to be deleted. So, the intersect operator from my solution is not used in David's, which should translate in better speed. Thank you, David.

 

 Cascade deletions -- XSLT 2 (David Carlisle)

 

<xsl:stylesheet version="2.0
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/> 

 <xsl:key name="x" match="class
         use="@name"/> 

<xsl:key name="x" match="beast
         use="@class"/> 

<xsl:key name="beastbytype" match="beast
         use="@type"/> 

<xsl:key name="x" match="breed
 use="key('beastbytype',@type)/@class"/> 

<xsl:key name="breadbyname" match="breed
         use="@name"/> 

<xsl:key name="x" match="pet" use=
    "key('beastbytype',
          key('breadbyname',@breed)
                          /@type
          )
           /@class"/> 

<xsl:template match="*">
 <xsl:copy>
   <xsl:copy-of select="@*"/>
   <xsl:apply-templates/>
 </xsl:copy>
</xsl:template> 

<xsl:template match="key('x','Aves')"/> 

</xsl:stylesheet>

1月6日

Cascade deletions with XSLT

 

A few days ago this question popped up on comp.text.xml:

 

I have an xml document in which elements are hierarchically related to
each other conceptually.  Unfortunately, the hierarchical relationship
is not modelled in the schema (i.e., the elements are "flattened" in
the document".  I have a case in which I want to remove a high level
element using xslt and want all the related lower-level elements to be
removed as well. Is there an easy way to do this in a template?

For example, imagine a relationship of elements describing animals
starting with "animal" at the top of the hierarchy and ending in "pets"
at the bottom:

animal => class => beastie => breed => pet

Given the xml below, say I wanted to remove the class named "Aves" and
wanted  to cause a cascading removal of all other elements related to
"Aves".  Is there a simple way to do this using xsl that helps me avoid
writing specific templates for each child node of "animals"?

<?xml version="1.0" encoding="utf-8"?>
<animals>

   <classes>
      <class name="Mammalia"/>
      <class name="Aves"/>
   </classes>

   <beasties>
      <beast type="cat" class="Mammalia"/>
      <beast type="dog" class="Mammalia"/>
      <beast type="bird" class="Aves"/>
   </beasties>

   <breeds>
      <breed name="collie" type="dog"/>
      <breed name="beagle" type="dog"/>
      <breed name="persian" type="cat"/>
      <breed name="siamese" type="cat"/>
      <breed name="parakeet" type="bird"/>
      <breed name="crow" type="bird"/>
   </breeds>

   <pets>
      <pet name="rover" breed="collie"/>
      <pet name="fluffy" breed="persian"/>
      <pet name="tweety" breed="parakeet"/>
   </pets>

</animals>

The desired xml output would be this:

<?xml version="1.0" encoding="utf-8"?>
<animals>

   <classes>
      <class name="Mammalia"/>
   </classes>

   <beasties>
      <beast type="cat" class="Mammalia"/>
      <beast type="dog" class="Mammalia"/>
   </beasties>

   <breeds>
      <breed name="collie" type="dog"/>
      <breed name="beagle" type="dog"/>
      <breed name="persian" type="cat"/>
      <breed name="siamese" type="cat"/>
   </breeds>

   <pets>
      <pet name="rover" breed="collie"/>
      <pet name="fluffy" breed="persian"/>
   </pets>

</animals>

 

Here is  the first solution that came to my mind (one for XSLT 1.0 and for XSLT 2.0):

 Cascade Deletes (XSLT 1.0), 49 lines

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform
">
 <xsl:output omit-xml-declaration="yes" indent="yes
"/>
 <xsl:strip-space elements="*"/>     

 <xsl:param name="pClassToDel" select="'Aves'"/>     

 <xsl:key name="kBeastByType" match="beast"
 
         use="@type"/>     

 <xsl:key name="kBreedByName" match="breed
          use="@name"/>     

 <xsl:template match="node()|@*" name="identity">
   <xsl:copy
>
      <xsl:apply-templates select="node()|@*
"/>
   </xsl:copy
>
 </xsl:template>     

 <xsl:template match="class">
   <xsl:if test="not(@name = $pClassToDel)
">
      <xsl:call-template name="identity
"/>
   </xsl:if
>
 </xsl:template>     

 <xsl:template match="beast">
   <xsl:if test="not(@class = $pClassToDel)
">
      <xsl:call-template name="identity
"/>
   </xsl:if
>
 </xsl:template>     

 <xsl:template match="breed">
   <xsl:if test

      "
not(key('kBeastByType', @type)/@class 
          = 
           $pClassToDel)">

      <xsl:call-template name="identity"/>
   </xsl:if>

 </xsl:template>     

 <xsl:template match="pet">
   <xsl:if test

      "
not(key('kBeastByType', 
                key('kBreedByName', @breed)/@type 
                )/@class 
          = 
            $pClassToDel)">

      <xsl:call-template name="identity"/>
   </xsl:if
>
 </xsl:template>
</xsl:stylesheet>

 

        Cascade Deletes (XSLT 2.0), 40 lines

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>     

 <xsl:param name="pClassToDel"select="'Aves'"/>     

 <xsl:key name="kDeleted" match="beast" use=
       "string(@class = $pClassToDel)
"/>      
 

 <xsl:key name="kDeleted" match="class"
    
use="string(@name=$pClassToDel)"/>       

<xsl:key name="kDeleted" match="breed" use=
  "string(key('kBeastByType',@type)/@class
          =
          $pClassToDel)
"/>

 <xsl:key name="kDeleted" match="pet" use=
  "
string(key('kBeastByType',
               key('kBreedByName',@breed)/@type
              )
               
/@class
              =
              $pClassToDel)"

/> 

 <xsl:key name="kBeastByType" match="beast"
         
use="@type"/>     

 <xsl:key name="kBreedByName" match="breed
          use="@name"/>      

 <xsl:template match="node()|@*" name="identity">
   <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
   </xsl:copy>
 </xsl:template>     

<xsl:template match=
       
"*[. intersect key('kDeleted', 'true')]"/>
</xsl:stylesheet>

 

Both transformations produce the wanted results:

<animals>
  <classes>
    <class name="Mammalia"/>
  </classes>

  <beasties>
    <beast type="cat" class="Mammalia"/>
    <beast type="dog" class="Mammalia"/>
  </beasties>

  <breeds>
    <breed name="collie" type="dog"/>
    <breed name="beagle" type="dog"/>
    <breed name="persian" type="cat"/>
    <breed name="siamese" type="cat"/>
  </breeds>

  <pets>
    <pet name="rover" breed="collie"/>
    <pet name="fluffy" breed="persian"/>
  </pets>
</animals>

 

The solutions above are the first that came into mind -- probably the XSLT community will come with new, even better ones.