Friday, 13 February 2009

Basic Custom functions

The heart and soul of XQuery is an interconnected network of functions distributed throughout library modules, offering reusable functionality that rivals most procedural languages.

If you're familiar with XSLT 2.0, you've probably already used functions in a format very similar to the way functions are implemented in XQuery.

Ultimately, the main goal of functions is to group functionality into smaller portions, making code easier to reuse. When you call a function you can pass values, known as parameters, into its local scope. A function can either return something or return an empty sequence, the equivalent of void in other languages.


declare function local:myFunction() as xs:string
{"myFunction"};

local:myFunction()


The above code will return the string "myFunction"

The declare function bit is self-explanatory; local:myFunction is the namespace prefix and function name respectively. The local prefix is a reserved namespace prefix built into XQuery, and traditionally used for locally declared functions (I'll probably write another blog explaining modules and external custom functions).

The () represents an empty list of parameters. In this example no parameters are required by the function signature.

"as xs:string" is the return type, in this case a simple string. You don't need to declare the return type of a function, but in my opinion it keeps the code clean and raises a type error at run time if the value returned by a function isn't what you're expecting, which makes debugging much easier.

OK, let's look at a more complex example:
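
To illustrate parameters and the void-like return mentioned earlier, here is a minimal sketch. The function names local:greet and local:nothing are made up for this illustration and don't appear anywhere else in these posts.

```xquery
xquery version "1.0";

(: Hypothetical helper: takes one xs:string parameter and returns an xs:string. :)
declare function local:greet($name as xs:string) as xs:string
{
  fn:concat("Hello, ", $name)
};

(: Hypothetical helper: declared to return nothing, the closest thing to void. :)
declare function local:nothing() as empty-sequence()
{
  ()
};

(: The empty sequence disappears when sequences are combined,
   so only the greeting is left in the result. :)
(local:greet("world"), local:nothing())
```

This should return the single string "Hello, world". Calling local:greet(42) instead should be rejected with a type error, because an xs:integer isn't an xs:string.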

XML Input

<?xml version="1.0" encoding="utf-8"?>
<geo:location xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
<geo:lat>51.667795</geo:lat>
<geo:long>0.040683</geo:long>
</geo:location>


XQuery

xquery version "1.0";

declare namespace geo="http://www.w3.org/2003/01/geo/wgs84_pos#";
declare namespace georss="http://www.georss.org/georss";

declare function local:get-georss($_latitude as xs:float, $_longitude as xs:float) as element(georss:point)
{
element georss:point {fn:concat($_latitude, " ", $_longitude)}
};


<rss version="2.0" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:georss="http://www.georss.org/georss">
<channel>
<title>Some RSS feed with geodata</title>
<link>http://www.foo.com/locations</link>
<description>Geo RSS location info</description>
<pubDate>Sat, 7 Feb 2009 22:24:09 -0800</pubDate>
<lastBuildDate>Sat, 7 Feb 2009 22:24:09 -0800</lastBuildDate>
<item>
<title>High Beach</title>
<link>http://www.foo.com/location?lat={fn:data(/geo:location/geo:lat)}&amp;long={fn:data(/geo:location/geo:long)}</link>
<description>High Beach, Epping Forest, London, UK</description>
<pubDate>Sat, 7 Feb 2009 22:24:09 -0800</pubDate>
{
(
local:get-georss(
xs:float(/geo:location/geo:lat),
xs:float(/geo:location/geo:long)
),
/geo:location/geo:lat,
/geo:location/geo:long
)
}
</item>
</channel>
</rss>


returns

<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:georss="http://www.georss.org/georss"
xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
version="2.0">
<channel>
<title>Some RSS feed with geodata</title>
<link>http://www.foo.com/locations</link>
<description>Geo RSS location info</description>
<pubDate>Sat, 7 Feb 2009 22:24:09 -0800</pubDate>
<lastBuildDate>Sat, 7 Feb 2009 22:24:09 -0800</lastBuildDate>
<item>
<title>High Beach</title>
<link>http://www.foo.com/location?lat=51.667795&amp;long=0.040683</link>
<description>High Beach, Epping Forest, London, UK</description>
<pubDate>Sat, 7 Feb 2009 22:24:09 -0800</pubDate>
<georss:point>51.667793 0.040683</georss:point>
<geo:lat>51.667795</geo:lat>
<geo:long>0.040683</geo:long>
</item>
</channel>
</rss>


This time the function local:get-georss() accepts two parameters: the first an xs:float representing the latitude, and the second, again an xs:float, representing the longitude. The function local:get-georss() must return a georss:point element.
The body of the function creates the element georss:point, then concatenates the latitude and longitude values, separated by a single space character, as its text node.

In the call to local:get-georss(), you'll notice the values passed have been converted to the data type required for each parameter, in this case xs:float. When the query invokes the following function call to local:get-georss(), it returns <georss:point>51.667793 0.040683</georss:point>


local:get-georss(
xs:float(/geo:location/geo:lat),
xs:float(/geo:location/geo:long)
)
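
You may have spotted that the latitude went in as 51.667795 and came out of the function as 51.667793. xs:float is a single precision type, so it can't hold every decimal digit exactly. The sketch below is a variant, not part of the original query: the name local:get-georss-decimal is made up, and it assumes you'd rather keep the digits as written by using xs:decimal.

```xquery
xquery version "1.0";

declare namespace georss="http://www.georss.org/georss";

(: Hypothetical variant of local:get-georss() that takes xs:decimal
   parameters, so the digits are kept exactly as written in the input. :)
declare function local:get-georss-decimal($_latitude as xs:decimal, $_longitude as xs:decimal) as element(georss:point)
{
  element georss:point {fn:concat($_latitude, " ", $_longitude)}
};

(: Values are inlined here so the query runs without an input document. :)
local:get-georss-decimal(xs:decimal("51.667795"), xs:decimal("0.040683"))
```

The text node of the returned georss:point should now read 51.667795 0.040683.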

Friday, 6 February 2009

FLOWR - A Beauty Of Nature

If you ever plan to write XQuery, the FLOWR expression (spelled FLWOR in the XQuery specification, and pronounced "flower") will in a way become your bread and butter.

What does FLOWR stand for?

F - for
L - let
O - order by
W - where
R - return

You could say it is just a conventional procedural-language for loop with a couple of SQL-like features, "order by" and "where". Here's an example:

for $child in ("Merlyn", "Kai", "Eden")
return <child name="{$child}"/>

The above FLOWR expression returns:

<child name="Merlyn"/>
<child name="Kai"/>
<child name="Eden"/>

Let's now try a more complex example using some of the other features of FLOWR

let $children as xs:string* := ("Merlyn", "Kai", "Eden")
return
element children
{
for $child as xs:string at $position in $children
let $older-sibling as xs:string? := $children[($position - 1)]
where fn:contains($child,"e")
order by $position descending
return element child
{
(
attribute name {$child},
if ($older-sibling)
then attribute older-sibling {$older-sibling}
else ()
)
}
}


This FLOWR expression returns :

<children>
<child name="Eden" older-sibling="Kai"/>
<child name="Merlyn"/>
</children>


A number of interesting things are happening in this expression. The goal is to get a sequence of child elements in reverse order, each containing the attribute older-sibling representing the child's next older sibling. Oh, and to throw in some random complexity, we only want children whose names contain the character "e".

First the variable $children is declared, bound to the xs:string data type. You'll notice the * character appended; this is an occurrence indicator meaning zero or more occurrences of that type, in other words a sequence of strings, equivalent to an array or vector of strings in other languages. $children will hold a sequence of my children's names in descending order of age, as Merlyn is my oldest son, and Eden my youngest daughter.
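
As a quick sketch of the occurrence indicators you can append to a type (the variable names below are made up for this illustration): * allows zero or more items, + requires at least one, and ? allows zero or one.

```xquery
xquery version "1.0";

(: "*" accepts the empty sequence, "+" needs at least one item,
   and "?" accepts either nothing or exactly one item. :)
let $zero-or-more as xs:string* := ()
let $one-or-more as xs:string+ := ("Merlyn", "Kai", "Eden")
let $zero-or-one as xs:string? := ()
return (fn:count($zero-or-more), fn:count($one-or-more), fn:count($zero-or-one))
```

This should return the three counts 0, 3 and 0. Binding () to the xs:string+ variable instead would raise a type error.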

The next step is to create the children element as the root element, holding the result sequence returned by the FLOWR expression. Inside it resides our beloved FLOWR.

The FLOWR expression starts by declaring the $child variable bound to a single string in each iteration of the loop.

Then you may notice the "at $position" variable declaration. This positional variable holds the position of the current item in the loop's input sequence. If you're used to XSLT, you'll find this to be the equivalent of position() inside an xsl:for-each or a matched xsl:template.
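
A minimal sketch of the positional variable on its own, reusing the sequence from the first example (the position attribute is only added here for illustration):

```xquery
xquery version "1.0";

(: $position counts from 1 and follows the input sequence,
   independently of any later "where" or "order by" clause. :)
for $child at $position in ("Merlyn", "Kai", "Eden")
return <child position="{$position}" name="{$child}"/>
```

This should return three child elements with the positions 1, 2 and 3.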

Lastly, you'll notice that in this example we're using the $children variable instead of the hard coded sequence expression. The $children variable could be a reference to some xml document, or even the output of a module's function (we'll cover that in a later blog), the result of an XPath expression, etc.

Now that the "F" in FLOWR has been initiated, it is possible to start filtering and modeling its output.

We'll first use the "L" in FLOWR, by declaring the variable $older-sibling, which will hold a reference to the older sibling of each child, using the $position value to select the equivalent of its preceding sibling. In this case we're dealing with strings, so we can't use axes; if the items in the sequence were nodes rather than atomic values, then we could use the preceding-sibling axis.

Next the "W" in FLOWR. This is where FLOWR lets you express filters on the input sequence, and where our FLOWR example keeps only the children that contain the "e" character, using the fn:contains() function. To be fair I usually prefer to use predicates in the input sequence, . . . $children[fn:contains(.,"e")] . . . which filters the same items.
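
One thing to watch if you move the filter into a predicate: the positional variable then counts positions in the filtered sequence rather than in the original one. A small sketch of the difference, assuming the same names as above:

```xquery
xquery version "1.0";

let $children as xs:string* := ("Merlyn", "Kai", "Eden")
(: The predicate removes "Kai" before the loop starts,
   so "Eden" is seen at position 2 instead of 3. :)
for $child at $position in $children[fn:contains(., "e")]
return fn:concat($position, ":", $child)
```

This should return "1:Merlyn" and "2:Eden", whereas the "where" version keeps the original positions 1 and 3. That matters in the example above, because $older-sibling is looked up by position in $children.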

The "O" and "W" are swapped round in the FLOWR spelling: in XQuery 1.0 the clauses must appear in the order the official FLWOR spelling suggests, with "where" before "order by". If you put the "order by" before the "where" you'll get an error like this: syntax error in #...scending where fn:contains#: expected "return", found "null"
So order by $position descending is self-explanatory: give me stuff in reverse order . . .

And now to the meat. The big "R", return me stuff. This is where we format the output of the FLOWR expression. In this case we are building a sequence of child elements, each containing a sequence of attributes, namely the name and older-sibling attributes.

You may notice in the output that Merlyn hasn't got an older-sibling attribute. That's thanks to the conditional expression in the attribute sequence that checks whether $older-sibling exists.
For the first item, $position - 1 is 0, so the $children[($position - 1)] expression returns an empty sequence, and the effective boolean value of an empty sequence is false. Because the "?" character in the type declaration of $older-sibling allows an empty sequence, no error is raised, and the older-sibling attribute simply isn't added to the child element for "Merlyn".
Of course Merlyn is my oldest son, and hasn't got any older siblings . . . not that I'm aware of anyway.
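
To see why the test on $older-sibling fails for the first child, here is a small sketch using the same sequence. Nothing sits at position 0, so the predicate selects an empty sequence.

```xquery
xquery version "1.0";

let $children as xs:string* := ("Merlyn", "Kai", "Eden")
return (
  (: Positions start at 1, so position 0 selects nothing. :)
  fn:empty($children[0]),
  (: The effective boolean value of an empty sequence is false. :)
  fn:boolean($children[0]),
  (: Position 2 holds "Kai", so something exists there. :)
  fn:exists($children[2])
)
```

This should return true, false and true. Note that a plain if ($older-sibling) would also treat a zero-length string as false, so fn:exists() is the stricter test if empty names are possible.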

So there, what a beautiful FLOWR can do for you in XQuery. Enjoy . . .

Wednesday, 4 February 2009

Hello World

To conform to tradition, when you're learning a new language, the first lesson has to be the Hello World application.

So here it is:

"hello world"


If you just add the above line to a text file, usually with the *.xqy, *.xq or *.xquery extension, then initiate a transformation using your favorite XQuery processor, you should see the string "hello world" in the output.

What processor to use?
Well, I would go for something like Saxon if you're just querying static XML files on your file system, or passing a standalone XML file to the processor at run time.

If you really want to see XQuery at its best I'd recommend using an XML database as the repository for your XML content. For that you can try MarkLogic or eXist amongst others.

Most of the examples you'll see in future blogs are based on MarkLogic, as it is the processor I have the most experience with. Given that MarkLogic has just moved closer to full compliance with the W3C XQuery 1.0 standard, chances are that the examples you'll come across will work just as well in any other XQuery processor.

Now back to our code . . . What does "hello world" really mean?
Well, the value between the quotes represents an item() of type xs:string. XQuery uses the same data model as XPath 2.0, so if you're familiar with XSLT and XPath 1.0, then you'll find it really easy to understand the concepts of XQuery.

For a better understanding of data types in XQuery have a look at the XQuery 1.0 and XPath 2.0 Data Model.

Just to keep it simple I'll give you an example of a query that interrogates an XML file, passed as the input XML to Saxon.

The input xml:
<groups><group id="1"/><group id="2"/></groups>

the xquery file : my-lovely-query.xqy

/groups/group


This query returns:
<group id="1"/>
<group id="2"/>

The above represents a sequence of items of type element(group). This means the XPath expression "/groups/group" returned all "group" child element nodes of the root element "groups".
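
As a small follow-up sketch, the same path can be extended to count the elements and pull out the attribute values. The input document is inlined in a variable here (an assumption made so the query runs without an external file); with Saxon you would normally pass it in as the input XML instead.

```xquery
xquery version "1.0";

(: The input XML is built inline so the query is self-contained. :)
let $input := document { <groups><group id="1"/><group id="2"/></groups> }
return (
  (: How many group elements does the path select? :)
  fn:count($input/groups/group),
  (: The atomic values of the id attributes. :)
  fn:data($input/groups/group/@id)
)
```

This should return the count 2 followed by the two id values 1 and 2.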

I think this illustrates nicely the very basics of XQuery. I'll get into more detail as new blogs are added.