Virtual Mako (VMako): Aggregating Mako Servers
A VMako is a single Mako server that acts as a front end to multiple remote Mako instances. It maps its (virtual) collections onto a user-defined set of remote Mako collections, which reduces complexity for the client application by presenting a single virtualized interface to a number of distinct, federated Makos. The primary purpose of a Virtual Mako is to enable distributed query execution: requests are broken down into sub-queries and sent to the appropriate remote Makos, and the responses are then aggregated at the VMako and returned to the client. Additionally, collections added to a VMako can be distributed across the remote Makos, reducing the load carried by any single instance. The default "ingestor" class uses a round-robin algorithm to distribute load across the federated Makos.
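The two behaviors described above, round-robin distribution and scatter/gather querying, can be sketched in a few lines of Python. This is only a toy model, not the Mako implementation: the RemoteMako and VirtualCollection classes are hypothetical, a Python predicate stands in for an XPath query, and distributing individual documents (rather than some other unit) is an assumption made for illustration.

```python
from itertools import cycle


class RemoteMako:
    """Hypothetical stand-in for one remote Mako; holds documents per collection."""

    def __init__(self, service_id):
        self.service_id = service_id
        self.collections = {}  # collection name -> list of documents

    def store(self, collection, document):
        self.collections.setdefault(collection, []).append(document)

    def query(self, collection, predicate):
        # A real Mako would evaluate an XPath query; a predicate stands in here.
        return [d for d in self.collections.get(collection, []) if predicate(d)]


class VirtualCollection:
    """Toy model of one virtual collection spread across several remote Makos."""

    def __init__(self, name, makos):
        self.name = name
        self.makos = list(makos)
        self._next = cycle(self.makos)  # round-robin cursor

    def add(self, document):
        # Round-robin: each new document goes to the next Mako in turn.
        target = next(self._next)
        target.store(self.name, document)
        return target.service_id

    def query(self, predicate):
        # Scatter the sub-query to every Mako, then gather the results.
        results = []
        for mako in self.makos:
            results.extend(mako.query(self.name, predicate))
        return results


makos = [RemoteMako("MAKO://dc01"), RemoteMako("MAKO://dc02")]
vc = VirtualCollection("TestVirtualCollection", makos)
placements = [vc.add({"id": i}) for i in range(5)]
hits = vc.query(lambda d: d["id"] % 2 == 0)
```

Here the client only ever talks to the VirtualCollection object; which remote instance holds a given document is hidden behind it, which is the simplification the VMako is meant to provide.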
As in the other QuickStarts, before instantiating a VMako, you will want to edit the configuration
file. There is a localhost-vmako-config.xml file in the /conf directory.
It contains the configuration to stand up a basic VMako instance. In the
<MobiusNetworkServiceDescriptor ...> tag, edit the "hostname" attribute as
appropriate. Also set the "serviceId" in the <serviceIdentifier ...> tag,
immediately below it.
A VMako aggregates other Mako services, so it needs no MySQL configuration. You can bring it up in one of two ways. Either instantiate it alone and then use the Mako Viewer or the command line utilities to manually aggregate other Makos "within" it, or instantiate it so that it reads a cache file and automatically "ingests" collections from Makos that are already running.
In the config file, the resource element whose "name" attribute is set to "vmakoConfig"
contains a sub-element called <vmako-storage ...> which takes only a
"file" attribute. In the distribution's localhost-vmako-config.xml
file, that attribute is set to "vmako-storage.xml".
Let's start the VMako:
./startMako.bat ../conf/localhost-vmako-config.xml
If you have left the <vmako-storage ...> element with its default
"file" value, then any collections you add (via the Mako Viewer or the
command line) will be recorded in the vmako-storage.xml file. By default the file
is created in the /scripts directory (since that is where the
startMako script is), but you can also specify a path when
setting the value for "file". The next time you start the VMako, the file will
be read in and the collections recorded during the earlier session ingested.
In the /conf directory, there is an example VMako cache file
called vmako-cache-example.xml. To eliminate the need
to manually add collections, we can load the remote Makos and
collections specified in this file when we start our VMako.
You'll notice that the vmako-cache-example.xml file has two
child elements below the root <vmako> element. The first child
is <xml-data-services>. It contains tags that reference the
Makos that you desire to aggregate.
The second child (<virtual-collections>) contains two sub-elements. The
first (<virtual-collection ...>) specifies the name of a virtual
collection that you will use to run queries against. Queries run against that
virtual collection will be executed on the aggregated collections specified in
the other sub-element, <xml-collection-handle ...>. Each of these collection handle elements
takes the service ID of a Mako specified as an "xml-data-service" and the name
of the collection on the host where it resides (specified by the "collectionName" attribute).
The value of the "collectionName" attribute must specify a real collection on the host described by the "serviceId". Collection names do not have to be identical to the name of the virtual collection. In fact, the names will usually be different. For example, you might set up a VMako to aggregate data from a number of differently purposed Makos: you want to query, simultaneously, references to a specific allele from Makos holding data from cancer genetics, proteomics, and SNP studies. You craft your single XPath query so that it returns the appropriate set of attributes from the multiple, separate, virtually aggregated stores.
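To make the cache file layout concrete, here is a small Python sketch that parses a document shaped like the one described above and builds the mapping from each virtual collection to its remote collections. The element and attribute names come from the description; the exact nesting of <xml-collection-handle> inside <virtual-collection>, the "MAKO://..." form of the service IDs, and the collection names are assumptions for illustration, so check them against the vmako-cache-example.xml shipped with your distribution.

```python
import xml.etree.ElementTree as ET

# Assumed shape of a VMako cache file; nesting and values are illustrative only.
CACHE_XML = """
<vmako>
  <xml-data-services>
    <xml-data-service serviceId="MAKO://dc01"/>
    <xml-data-service serviceId="MAKO://dc02"/>
  </xml-data-services>
  <virtual-collections>
    <virtual-collection name="TestVirtualCollection">
      <xml-collection-handle serviceId="MAKO://dc01" collectionName="/cancerGenetics"/>
      <xml-collection-handle serviceId="MAKO://dc02" collectionName="/proteomics"/>
    </virtual-collection>
  </virtual-collections>
</vmako>
"""


def load_cache(xml_text):
    """Return (declared service IDs, {virtual name: [(serviceId, collectionName)]})."""
    root = ET.fromstring(xml_text)
    services = {s.get("serviceId") for s in root.iter("xml-data-service")}
    mapping = {}
    for vc in root.iter("virtual-collection"):
        mapping[vc.get("name")] = [
            (h.get("serviceId"), h.get("collectionName"))
            for h in vc.iter("xml-collection-handle")
        ]
    return services, mapping


def undeclared_services(services, mapping):
    """List service IDs used by a collection handle but never declared."""
    used = {sid for handles in mapping.values() for sid, _ in handles}
    return sorted(used - services)


services, mapping = load_cache(CACHE_XML)
missing = undeclared_services(services, mapping)
```

Note that the virtual collection name and the remote collection names differ, as the text says they usually will. A check like undeclared_services can catch a handle whose "serviceId" does not match any declared data service before you start the VMako.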
Let's set up our example. In the localhost-vmako-config.xml file, change the
<vmako-storage ...> element's "file" attribute so that it points
at your cache. For example, ../conf/vmako-cache-example.xml.
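If you would rather script this change than edit the file by hand, the following Python sketch shows one way to do it. The surrounding structure of the configuration file is an assumption: the root element name and the <resource> tag are hypothetical, and only the "vmakoConfig" name, the <vmako-storage> element, and its "file" attribute are taken from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down configuration; the real file contains much more.
CONFIG_XML = """
<config>
  <resource name="vmakoConfig">
    <vmako-storage file="vmako-storage.xml"/>
  </resource>
</config>
"""


def set_storage_file(xml_text, new_path):
    """Point the vmako-storage "file" attribute at new_path and return the XML."""
    root = ET.fromstring(xml_text)
    # Match any element named "vmakoConfig" so the parent tag name is not assumed.
    storage = root.find(".//*[@name='vmakoConfig']/vmako-storage")
    if storage is None:
        raise ValueError("no vmako-storage element under the vmakoConfig resource")
    storage.set("file", new_path)
    return ET.tostring(root, encoding="unicode")


updated = set_storage_file(CONFIG_XML, "../conf/vmako-cache-example.xml")
new_value = ET.fromstring(updated).find(".//vmako-storage").get("file")
```

Work on a copy of localhost-vmako-config.xml when trying this, since serializing with ElementTree does not preserve comments or the original formatting.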
Edit the vmako-cache-example.xml file and point the "serviceId" attribute of the
<xml-data-service ...> tag at a remote Mako service. In the
<xml-collection-handle ...> tag, repeat the service ID and
set the "collectionName" attribute to the XPath path of a
collection on that server that you would like to
aggregate under your VMako. In the <virtual-collection ...>
tag, set the "name" attribute to a virtual collection
name of your choosing (below we have used "TestVirtualCollection").
Restart the VMako, as above. You will see output similar to the following:
INFO - Nov 5, 2004 1:02:27 PM -- Adding service MAKO://dc01
INFO - Nov 5, 2004 1:02:27 PM -- Adding service MAKO://dc02
INFO - Nov 5, 2004 1:02:28 PM -- Adding the virtual collection TestVirtualCollection
INFO - Nov 5, 2004 1:02:28 PM -- Server started on localhost at Fri Nov 05 13:02:28 EST 2004
INFO - Nov 5, 2004 1:02:28 PM -- Setting Service Identifier to localhost
INFO - Nov 5, 2004 1:02:28 PM -- Listening using the protocol TCP on the host 0.0.0.0, port 3940
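A quick way to confirm that every service and virtual collection in your cache file was picked up is to scan the startup output for these messages. The Python sketch below does that for the sample output shown above; the summarize_startup helper is hypothetical, and the message wording is assumed to match this sample, so adjust the patterns if your version logs differently.

```python
import re

# Startup output copied from the sample above.
LOG = """\
INFO - Nov 5, 2004 1:02:27 PM -- Adding service MAKO://dc01
INFO - Nov 5, 2004 1:02:27 PM -- Adding service MAKO://dc02
INFO - Nov 5, 2004 1:02:28 PM -- Adding the virtual collection TestVirtualCollection
INFO - Nov 5, 2004 1:02:28 PM -- Server started on localhost at Fri Nov 05 13:02:28 EST 2004
INFO - Nov 5, 2004 1:02:28 PM -- Setting Service Identifier to localhost
INFO - Nov 5, 2004 1:02:28 PM -- Listening using the protocol TCP on the host 0.0.0.0, port 3940
"""


def summarize_startup(log_text):
    """Extract ingested services, virtual collections, and the listening port."""
    services = re.findall(r"Adding service (\S+)", log_text)
    collections = re.findall(r"Adding the virtual collection (\S+)", log_text)
    port_match = re.search(r"port (\d+)", log_text)
    port = int(port_match.group(1)) if port_match else None
    return services, collections, port


started_services, started_collections, listen_port = summarize_startup(LOG)
```

Comparing started_services against the service IDs in the cache file shows at a glance whether any remote Mako failed to register.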
Now, any XPath queries performed via command line or viewer against the
TestVirtualCollection collection will, in fact, be run against the collections on
the remote nodes you configured in the vmako-cache-example.xml file (in
the example above, dc01 and dc02).