XQuery by Example

Friday 16 July 2010

testing migration from 0.9-ml to 1.0-ml

Chances are that every so often you'll come across some legacy XQuery code in 0.9-ml that should really be upgraded to 1.0-ml for the obvious reasons, and everyone seems to be ever so scared of touching the code, because "it works".

So ultimately your duty is to make sure the code operates exactly as it did before. Ideally you should run each function in each module on the legacy code and on the refactored code, and then compare the output. At the very least you should run tests on the entry point functions in your modules.

Now that's easier said than done. If you have followed a TDD methodology from the outset, great, all you have to do is run the tests against the refactored code, and Bob's your uncle, voila . . . ready to go. But as we all know, it is often the case that no tests are available for the legacy code, or the tests are targeted at abstracted Java DAO interfaces and really don't dive deep enough into the fragments returned by your XQuery.

Some of the queries may even return large XML fragments, making it difficult to compare the output visually, and very expensive to use fn:deep-equal().
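
For small fragments a direct structural comparison is still practical. A minimal sketch using the standard fn:deep-equal() function, with two made-up fragments standing in for real query results:

```xquery
xquery version "1.0";

(: Two hypothetical result fragments; in a real test these would be
   the outputs of the legacy and the refactored function. :)
let $legacy     := <record><title>High Beach</title></record>
let $refactored := <record><title>High Beach</title></record>
return fn:deep-equal($legacy, $refactored)
```

This should return true, and it flips to false as soon as an element name, attribute or text value differs.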

If you come across one or more of the above scenarios, then here's how I have dealt with it: First I create a sub-folder containing a clone of the original modules in ML, and refactor the code in the clone. Then you need to import both modules into a single test module.

Note that both sets of modules will need different namespaces, even if just temporarily. Only then can the test module import both modules, using a different namespace prefix for each.

You can then run the same functions side by side on both the legacy code and the refactored code. Assign a variable for the outcome of each, and wrap the function call in an xdmp:quote() and then in an xdmp:md5(). Now you can do a simple string comparison of the md5 of both results, and they should match. If they don't, your test has failed.

Here's an example:


xquery version "1.0-ml";

import module namespace old = "ns:old-module" at "/folder/module.xqy";
import module namespace new = "ns:new-module" at "/folder/1.0-ml/module.xqy";

declare variable $item1 := old:function1("param 1");
declare variable $item2 := new:function1("param 1");

xdmp:md5(xdmp:quote($item1)) eq xdmp:md5(xdmp:quote($item2))
=> true
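
For the two imports above to work, the refactored copy has to declare its own namespace. Here is a rough sketch of what the top of the cloned module might look like. The signature and body of function1 are placeholders, since the real function isn't shown in this post:

```xquery
xquery version "1.0-ml";

(: Temporary namespace for the refactored clone, so that it can be
   imported alongside the legacy module, which keeps "ns:old-module". :)
module namespace new = "ns:new-module";

(: Placeholder only: the real function1 is whatever the legacy module defines. :)
declare function new:function1($param as xs:string) as item()*
{
  $param
};
```

Once the tests pass and the legacy module is retired, the namespace can go back to the original one.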

You can make a function of it and bang it in a test module that can be re-used in the rest of your tests . . .


xquery version "1.0-ml";

module namespace t = "ns:migration-test-suite";

declare function t:assertEquals($item1 as item()*, $item2 as item()*) as xs:string
{if(xdmp:md5(xdmp:quote($item1)) eq xdmp:md5(xdmp:quote($item2))) then "passed" else "failed"};
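
A test for a single entry point could then look something like the sketch below. The path /test/migration-test-suite.xqy is an assumption about where the test module was saved, and the test element is just one way of labelling the result. The other two imports are the ones from the earlier example.

```xquery
xquery version "1.0-ml";

import module namespace t   = "ns:migration-test-suite" at "/test/migration-test-suite.xqy";
import module namespace old = "ns:old-module" at "/folder/module.xqy";
import module namespace new = "ns:new-module" at "/folder/1.0-ml/module.xqy";

(: Run the same call against both versions and label the outcome. :)
element test {
  attribute name { "function1" },
  t:assertEquals(old:function1("param 1"), new:function1("param 1"))
}
```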

Tuesday 13 July 2010

MarkLogic cts:query serialization

If you work with MarkLogic you've surely come across cts functions (Built-In: Search), cts:query, and its closest counterparts.

One useful feature available in recent releases is the serialization of cts queries, which allows for the conversion of cts:query types to xml fragments, and the other way round.

If you take a look at the code below, you'll notice the local:mySearch() function, which takes a cts:query as its only parameter. You'll also note that cts:search() is used inside this function and references the $query parameter, abstracting the query executed by cts:search(). Most interestingly, you'll also find a conditional statement invoking an XPath on the $query variable. "But the parameter is strongly typed to cts:query!" . . . you ask. Well, that's because it is possible to serialize what would otherwise be "cts:element-value-query(xs:QName("filename"), "myFile", ("lang=en"), 1)" to XML, just by embedding the query in a parent element.

let $query as element(query) := element query {$query}

It is equally possible to do the reverse operation, simply by invoking an xpath returning cts elements in your fragment, wrapped in a cts:query() function.

cts:query($query/*)

This feature also allows for cts queries to be built dynamically as xml fragments rather than concatenated strings.


xquery version '1.0-ml';

declare function local:mySearch($query as cts:query) as element(response)
{
 let $query as element(query) := element query {$query}
 let $searchResults as element(record)* := cts:search(/record, cts:query($query/*))
 
 return 
 element response 
 {
  $query,
  element results {
   if($query/cts:element-value-query/cts:element/text() eq "filename") 
   then $searchResults//element1
   else $searchResults//element2
  }
 }
};

declare function local:searchByFilename($filename as xs:string) as element(response)
{local:mySearch(cts:element-value-query(xs:QName("filename"), $filename))};

declare function local:searchByVolume($volume as xs:string) as element(response)
{local:mySearch(cts:element-value-query(xs:QName("volume"), $volume))};


local:searchByFilename("myFile")

returns


<response>
    <query>
        <cts:element-value-query xmlns:cts="http://marklogic.com/cts">
            <cts:element>filename</cts:element>
            <cts:text xml:lang="en">myFile</cts:text>
        </cts:element-value-query>
    </query>
    <results>
        <element1>some content</element1>
        <element1>some other content</element1>
    </results>
</response>
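
The earlier point about building cts queries dynamically as XML fragments, rather than as concatenated strings, can be sketched as follows. The helper name local:value-query is made up for this illustration, and it assumes the element being searched is in no namespace:

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: assembles the serialized form of an
   element-value-query and turns it back into a cts:query. :)
declare function local:value-query($field as xs:string, $value as xs:string) as cts:query
{
  cts:query(
    <cts:element-value-query xmlns:cts="http://marklogic.com/cts">
      <cts:element>{$field}</cts:element>
      <cts:text xml:lang="en">{$value}</cts:text>
    </cts:element-value-query>
  )
};

cts:search(/record, local:value-query("filename", "myFile"))
```

Because the query is ordinary XML until the last moment, it can be stored, logged or modified with XPath before it is executed.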

Friday 13 February 2009

Basic Custom functions

The heart and soul of XQuery is an interconnected network of functions distributed throughout library modules, offering reusable functionality on a par with most procedural languages.

If you're familiar with XSLT 2.0, you've probably had the chance to use functions in a format very similar to the way functions are implemented in XQuery.

Ultimately the main goal of functions is to group functionality into smaller portions, facilitating reusability of code. When you call a function you can pass values, known as parameters, to its local scope. Functions can either return something, or return the equivalent of void in other languages, in the form of an empty sequence.


declare function local:myFunction() as xs:string
{"myFunction"};

local:myFunction()

The above code will return the string "myFunction". Note that function names are case sensitive, so the call has to match the declaration exactly.

The declare function bit is self-explanatory; in local:myFunction, local is the namespace prefix and myFunction the function name. The local prefix is predeclared in XQuery, and traditionally used for locally declared functions (I'll probably write another blog explaining modules and external custom functions).

The () represents an empty parameter list. In this example no parameters are required by the function signature.

"as xs:string" is the return type, in this case a simple string. You don't need to strongly type the return type of a function, but in my opinion it keeps the code clean and forces type errors if the values returned by a function aren't what you're expecting, making debugging much easier.

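To see what a declared return type buys you, here is a small sketch of it going wrong. The function local:answer is invented for this illustration:

```xquery
xquery version "1.0";

(: The declared return type is xs:integer, but the body yields a string.
   A conforming processor should reject this with a type error
   (XPTY0004) instead of quietly returning the wrong thing. :)
declare function local:answer() as xs:integer
{
    "forty-two"
};

local:answer()
```

Without the "as xs:integer" the query would happily return the string, and the mistake would surface somewhere further down the line.
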
Ok, let's look at a more complex example:

XML Input


<?xml version="1.0" encoding="utf-8"?>
<geo:location xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
    <geo:lat>51.667795</geo:lat>
    <geo:long>0.040683</geo:long>
</geo:location>

XQuery


xquery version "1.0";

declare namespace geo="http://www.w3.org/2003/01/geo/wgs84_pos#";
declare namespace georss="http://www.georss.org/georss";

declare function local:get-georss($_latitude as xs:float, $_longitude as xs:float) as element(georss:point)
{
    element georss:point {fn:concat($_latitude, " ", $_longitude)}
};


<rss version="2.0" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:georss="http://www.georss.org/georss">
    <channel>
        <title>Some RSS feed with geodata</title>
        <link>http://www.foo.com/locations</link>
        <description>Geo RSS location info</description>
        <pubDate>Sat, 7 Feb 2009 22:24:09 -0800</pubDate>
        <lastBuildDate>Sat, 7 Feb 2009 22:24:09 -0800</lastBuildDate>
        <item>
        <title>High Beach</title>
        <link>http://www.foo.com/location?lat={fn:data(/geo:location/geo:lat)}&amp;long={fn:data(/geo:location/geo:long)}</link>
        <description>High Beach, Epping Forest, London, UK</description>
        <pubDate>Sat, 7 Feb 2009 22:24:09 -0800</pubDate>
        {
            (
                local:get-georss(
                                    xs:float(/geo:location/geo:lat), 
                                    xs:float(/geo:location/geo:long)
                                ),
                /geo:location/geo:lat,
                /geo:location/geo:long
            )
        }
        </item>
    </channel>
</rss>

returns


<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:georss="http://www.georss.org/georss"
    xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
    version="2.0">
    <channel>
        <title>Some RSS feed with geodata</title>
        <link>http://www.foo.com/locations</link>
        <description>Geo RSS location info</description>
        <pubDate>Sat, 7 Feb 2009 22:24:09 -0800</pubDate>
        <lastBuildDate>Sat, 7 Feb 2009 22:24:09 -0800</lastBuildDate>
        <item>
            <title>High Beach</title>
            <link>http://www.foo.com/location?lat=51.667795&amp;long=0.040683</link>
            <description>High Beach, Epping Forest, London, UK</description>
            <pubDate>Sat, 7 Feb 2009 22:24:09 -0800</pubDate>
            <georss:point>51.667793 0.040683</georss:point>
            <geo:lat>51.667795</geo:lat>
            <geo:long>0.040683</geo:long>
        </item>
    </channel>
</rss>

This time the function local:get-georss() accepts two parameters: the first an xs:float representing the latitude, and the second, again an xs:float, representing the longitude. The function local:get-georss() must return a georss:point element.
The body of the function first creates the element georss:point, then concatenates the latitude and longitude values, separated by a single space character, as its text node.

In the call to local:get-georss(), you'll notice the values passed have been converted to the data type required for each parameter, in this case xs:float. When the query invokes the following function call to local:get-georss(), it returns <georss:point>51.667793 0.040683</georss:point>


local:get-georss(
                 xs:float(/geo:location/geo:lat), 
                 xs:float(/geo:location/geo:long)
                )
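
You may have spotted that the georss:point reads 51.667793 while the input latitude is 51.667795. That is xs:float at work: it is single precision and only holds around seven significant digits. If that matters for your data, one option is to type the parameters as xs:double instead. A minimal sketch, where local:get-georss-double is a made-up variant of the function above and the input is inlined to keep the example self-contained:

```xquery
xquery version "1.0";

declare namespace geo="http://www.w3.org/2003/01/geo/wgs84_pos#";
declare namespace georss="http://www.georss.org/georss";

(: Same body as local:get-georss(), but with double precision parameters. :)
declare function local:get-georss-double($_latitude as xs:double, $_longitude as xs:double) as element(georss:point)
{
    element georss:point {fn:concat($_latitude, " ", $_longitude)}
};

let $location :=
    <geo:location>
        <geo:lat>51.667795</geo:lat>
        <geo:long>0.040683</geo:long>
    </geo:location>
return local:get-georss-double(
           xs:double($location/geo:lat),
           xs:double($location/geo:long)
       )
```

With xs:double the coordinates should come out with the same digits they went in with.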

Friday 6 February 2009

FLOWR - A Beauty Of Nature

If you ever plan to write XQuery, the FLOWR expression will in a way become your bread and butter. (Officially it is spelled FLWOR and pronounced "flower"; I'll stick with FLOWR in this post.)

What does FLOWR stand for:

F - for
L - let
O - order by
W - where
R - return

You could say it is just a conventional procedural language for loop with a couple of SQL-like features, in "order by" and "where". Here's an example


for $child in ("Merlyn", "Kai", "Eden")
return <child name="{$child}"/>

the above FLOWR expression returns


<child name="Merlyn"/>
<child name="Kai"/>
<child name="Eden"/>

Let's now try a more complex example using some of the other features of FLOWR


let $children as xs:string* := ("Merlyn", "Kai", "Eden")
return 
element children
{
    for $child as xs:string at $position in $children
    let $older-sibling as xs:string? := $children[($position - 1)]
    where fn:contains($child,"e")
    order by $position descending
    return element child 
           {
               (
                   attribute name {$child},
                   if ($older-sibling)
                   then attribute older-sibling {$older-sibling}
                   else ()
               )
           }
}

This FLOWR expression returns :


<children>
   <child name="Eden" older-sibling="Kai"/>
   <child name="Merlyn"/>
</children>

A number of interesting things are happening in this expression. The goal is to get a sequence of child elements in reverse order, each carrying an older-sibling attribute naming the child's next older sibling. Oh, and to throw in some random complexity, we only want children whose name contains the character "e".

First the variable $children is declared, bound to the xs:string data type. You'll notice the * character appended; this represents zero or more occurrences of that type, in other words a sequence of strings, equivalent to an array or vector of strings in other languages. $children will hold a sequence of my children's names from oldest to youngest, as Merlyn is my oldest son, and Eden my youngest daughter.

The next step is to create the children element as the root element, holding the result sequence returned by the FLOWR expression. Inside it resides our beloved FLOWR.

The FLOWR expression starts by declaring the $child variable bound to a single string in each iteration of the loop.

Then you may notice the "at $position" variable declaration; this variable holds the item's position in the loop's input sequence. If you're used to XSLT, you'll find this to be the equivalent of position() in an xsl:template match or an xsl:for-each.
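
Stripped of everything else, the positional variable works like this (a minimal sketch using the same names):

```xquery
for $child at $position in ("Merlyn", "Kai", "Eden")
return fn:concat($position, ": ", $child)
```

This returns the three strings "1: Merlyn", "2: Kai" and "3: Eden". Note that positions start at 1, not 0.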

Lastly, you'll notice that in this example we're using the $children variable instead of the hard coded sequence expression. The $children variable could be a reference to some xml document, or even the output of a module's function (we'll cover that in a later blog), the result of an XPath expression, etc.

Now that the "F" in FLOWR has been initiated, it is possible to start filtering and modeling its output.

We'll first use the "L" in FLOWR, by declaring the variable $older-sibling, which will hold a reference to the older sibling of each child, using the $position value to select the equivalent of its preceding-sibling axis. Only in this case we're dealing with strings, so we can't use axes; if the items in the sequence were nodes rather than atomic values, then we could.
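
For comparison, here is a sketch of how the same lookup might go if the children were element nodes rather than strings. The inline children document is made up for the example:

```xquery
let $children :=
    <children>
        <child name="Merlyn"/>
        <child name="Kai"/>
        <child name="Eden"/>
    </children>
for $child in $children/child
(: preceding-sibling is a reverse axis, so [1] is the nearest older sibling :)
let $older-sibling as element(child)? := $child/preceding-sibling::child[1]
return element child
       {
           (
               attribute name {$child/@name},
               if ($older-sibling)
               then attribute older-sibling {$older-sibling/@name}
               else ()
           )
       }
```

Here Kai picks up Merlyn as older sibling, Eden picks up Kai, and Merlyn gets no older-sibling attribute at all. No position arithmetic is needed.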

Next the "W" in FLOWR. This is where FLOWR lets you express filters on the input sequence, and where our FLOWR example filters children that contain the "e" character, using the fn:contains() function. To be fair I usually prefer to use predicates in the input sequence, . . . $children[fn:contains(.,"e")] . . . same filtering, although $position would then count positions in the filtered sequence rather than in $children.
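
The predicate form of the same filter, as a minimal sketch:

```xquery
for $child in ("Merlyn", "Kai", "Eden")[fn:contains(., "e")]
return $child
```

This returns "Merlyn" and "Eden", exactly what a where fn:contains($child, "e") clause would let through.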

The official spelling, FLWOR, has the "W" ahead of the "O", and not for aesthetic reasons: in XQuery 1.0 the "where" has to come before the "order by". If you define the "order by" before the "where" you'll get a - syntax error in #...scending where fn:contains#: expected "return", found "null"
So order by $position descending is self-explanatory, gimme stuff in reverse order . . .
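
A stripped-down sketch with the two clauses in the order XQuery 1.0 expects, this time ordering by name instead of position:

```xquery
for $child in ("Merlyn", "Kai", "Eden")
where fn:contains($child, "e")
order by $child ascending
return $child
```

This returns "Eden" followed by "Merlyn". Swap the where and order by lines and the query should no longer parse in a 1.0 processor.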

And now to the meat. The big "R", return me stuff. This is where we format the output of the FLOWR expression. In this case we are building a sequence of child elements, each containing a sequence of attributes: name and older-sibling.

You may notice in the output that Merlyn hasn't got an older-sibling attribute. That's thanks to the conditional statement in the attribute sequence that checks whether $older-sibling exists.
For Merlyn, $position - 1 is 0, so the $children[($position - 1)] expression returns an empty sequence, which counts as false in a condition. The "?" character in the type declaration of $older-sibling allows for an empty sequence to be returned, so the older-sibling attribute won't be present in the child element for "Merlyn".
Of course Merlyn is my oldest son, and hasn't got any older siblings . . . not that I'm aware of anyway.

So there, what a beautiful FLOWR can do for you in XQuery. Enjoy . . .

Wednesday 4 February 2009

Hello World

To conform to tradition, when you're learning a new language, the first lesson has to be the Hello World application.

So here it is:

"hello world"

If you just add the above line to a text file, usually with the *.xqy, *.xq or *.xquery extension, then initiate a transformation using your favorite XQuery processor, you should see the string "hello world" in the output.

What processor to use?
Well, I would go for something like Saxon if you're just querying static XML files on your file system, or passing a standalone XML file to the processor at run time.

If you really want to see XQuery at its best, I'd recommend using an XML database as the repository for your XML content. For that you can try MarkLogic or eXist, amongst others.

Most of the examples you'll see in future blogs are based on MarkLogic, as it is the processor I have the most experience with, but given that ML has just moved closer to full compliance with the W3C XQuery 1.0 standard, chances are that the examples you'll come across will be just as applicable in any other XQuery processor.

Now back to our code . . . What does "hello world" really mean?
Well, the value between the quotes represents an item() of type xs:string. XQuery uses the same data model as XPath 2.0, so if you're familiar with XSLT and XPath 1.0, then you'll find it really easy to understand the concepts of XQuery.

For a better understanding of data types in XQuery have a look at the XQuery 1.0 and XPath 2.0 Data Model.
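
If you want to poke at the types yourself, the instance of operator is a quick way to do it. A minimal sketch:

```xquery
(
    "hello world" instance of xs:string,
    "hello world" instance of item(),
    ("hello", "world") instance of xs:string*
)
```

All three checks return true: a string literal is an xs:string, every value is an item(), and a sequence of strings matches xs:string*.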

Just to keep it simple, I'll give you an example of a query that interrogates an XML file, passed as the input XML to Saxon.

The input xml:
<groups><group id="1"/><group id="2"/></groups>

The XQuery file: my-lovely-query.xqy

/groups/group

This query returns:
<group id="1"/>
<group id="2"/>

The above represents a sequence of items of type element(group). This means the XPath expression "/groups/group" returned all "group" child element nodes of the root element "groups".
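
As a small next step, a predicate narrows the sequence down. A sketch against the same input XML:

```xquery
/groups/group[@id = "2"]
```

This returns only <group id="2"/>, a sequence holding a single element(group) item.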

I think this illustrates nicely the very basics of XQuery. I'll get into more detail as new blogs are added.