Exalt Library Reference Documentation

Exalt is a C++ library for compression of XML documents. The library is an academic project, and is intended to serve as a platform for experiments rather than as a basis for industrial-strength applications. Speed and memory efficiency were not the main focus of the implementation. The goal was to create a framework for XML data compression that is easy to use and to extend. Please keep this in mind.

Exalt uses Expat, the excellent XML parser written by James Clark.

Exalt is free software, licensed under the GNU General Public License, Version 2. However, some parts of the sources (the arithmetic coding routines, to be exact) originate from Alistair Moffat (http://www.cs.mu.oz.au/~alistair/), who made them available only for academic purposes.

The library was written by Vojtech Toman. For the latest news and downloads, please visit the Exalt project home page.

Building and Installing
Using the Command-line application
Using the Library

Building and Installing

Exalt uses the GNU autotools. In order to build and install, you'll need to run the configure shell script to configure the Makefiles and headers for your system.

If you are happy with all the defaults that configure picks for you, and you have permission on your system to install into /usr/local, you can install Exalt with this sequence of commands:

$ ./configure
$ make
$ make install

There are some options that you can provide to the configure script, but the most important one is the --prefix option. You can find out all the options available by running configure with just the --help option.

By default, the configure script sets things up so that the library gets installed in /usr/local/lib and the associated header files in /usr/local/include. But if you were to give the option, --prefix=/home/me/mystuff, then the library and headers would get installed in /home/me/mystuff/lib and /home/me/mystuff/include respectively.

Since the library is loaded dynamically, make sure that the LD_LIBRARY_PATH environment variable contains the directory the library was installed in (/home/me/mystuff/lib in the example above).

Using the Command-line application

The distribution comes with a simple and easy to use command-line application that demonstrates the functionality of the library. It is named exalt and allows you to compress and decompress XML files in a convenient way.

After installation, you can test the application by typing:

$ exalt -h

This command executes the application and displays the usage information. (If nothing happens, please check your PATH environment variable.)

The arguments of exalt are file names and options in any order. The possible options are:

-s suf (or --suffix suf) - Use the suffix .suf on compressed files. Default suffix is .e
-d (or --decompress) - Decompress file(s)
-f (or --force) - Overwrite files, do not stop on errors
-c (or --stdout) - Write to standard output
-a (or --adaptive) - Use the adaptive model for compression
-x (or --erase) - Erase source files
-e enc (or --encoding enc) - Set the decompressed data encoding to enc
-l (or --list-encodings) - List the recognized (not necessarily supported!) encodings
-v (or --verbose) - Be verbose
-m (or --print-models) - Display the element models. This option makes sense only if the adaptive model is turned on. (Beware: the models may be huge!)
-g (or --print-grammar) - Display the generated grammar (Beware: the grammar may be huge!)
-V (or --version) - Display the version number
-L (or --license) - Display the version number and the software license
-h (or --help) - Display the usage information

The default action is to compress. If no file names are given, or if a file name is '-', exalt compresses or decompresses from standard input to standard output.

For example, if you want to compress file.xml in the verbose mode using the adaptive model, and if you wish also to display the generated grammar, use the following command:

$ exalt -a file.xml -v -g

If everything went all right, a file named file.xml.e will be created.

It is also easy to use exalt as a filter:

$ cat file.xml.e | exalt -d -c | more

Using the Library

The following text shows how to use the functionality of the library in a C++ program. Because the library has a simple and clean interface, this is quite a straightforward process.

Sample Application

In this section, a sample application using the Exalt library is presented. It takes the name of an XML file on the command line and compresses that file to standard output (be sure to redirect standard output to some file or to /dev/null to avoid terminal confusion). The main functionality of the Exalt library is encapsulated in the ExaltCodec class, so all you have to do is create an instance of this class and call the appropriate method. The methods used most often are encode() and decode(). In their basic variants, they both take two arguments: the name of the input file and the name of the output file. If the name of the input file is NULL, standard input is used. Similarly, if the name of the output file is NULL, standard output is used.

When (de)compressing, a variety of errors can occur (the input data is not well-formed XML, some of the files do not exist, etc.). To report these errors, Exalt uses C++ exceptions. Each exception is derived from ExaltException, so a handler for ExaltException catches all of them. For a more detailed description of the exceptions used by Exalt, please refer to the API documentation.

So let's have a look at the example code:

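#include <iostream>
#include <exaltcodec.h>

using namespace std;

int main(int argc, char **argv)
{
  ExaltCodec codec;

  // The name of the XML file to compress is required.
  if (argc < 2)
    return 1;

  try
    {
      // A null output file name selects standard output.
      codec.encode(argv[1], 0);
    }

  // Every Exalt exception derives from ExaltException.
  catch (const ExaltException &)
    {
      cerr << "Failed to compress " << argv[1] << endl;
      return 1;
    }

  return 0;
}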

The header file exaltcodec.h is the only header you have to include to use the library.

Save the example source as example.cpp. To compile it, you have to tell the compiler where to find the library and the headers. If Exalt has been installed in /home/me/mystuff, pass the following options to the compiler: -I/home/me/mystuff/include -L/home/me/mystuff/lib. In addition, the linker has to be instructed to link against the Exalt library. This is achieved with the option -lexalt.

Note: If Exalt has been installed in the standard location (the default), you probably do not have to specify the options mentioned above (except -lexalt, of course).

$ g++ -o example example.cpp -I/home/me/mystuff/include -L/home/me/mystuff/lib -lexalt

If everything went all right, a sample application has been built. So let's test it on some XML data:

$ ./example sample.xml > tmp

If the file sample.xml exists in the current directory, and the XML data is well-formed, the compressed data is written to the file tmp. If you compare the sizes of sample.xml and tmp, the latter file should be smaller. :-)
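
The catch clause in the sample application relies on every Exalt exception being derived from ExaltException. The sketch below illustrates that pattern with stand-in classes only, so that it compiles without the library: SketchException plays the role of ExaltException, while SketchNotWellFormed and SketchFileMissing are hypothetical derived exceptions invented for this illustration (the real exception names are listed in the API documentation). It shows that a more specific handler has to come before the base-class handler, and that the base-class handler catches everything else.

```cpp
#include <string>

namespace sketch {

// Stand-in for ExaltException (assumption: a plain polymorphic base class).
struct SketchException { virtual ~SketchException() {} };

// Hypothetical derived exceptions, invented for this illustration only.
struct SketchNotWellFormed : SketchException {};
struct SketchFileMissing : SketchException {};

// Runs op() and reports which handler caught the exception, if any.
template <class Operation>
std::string runAndReport(Operation op)
{
  try
    {
      op();
    }

  catch (const SketchNotWellFormed &)
    {
      // Specific handler: must precede the base-class handler.
      return "input is not well-formed";
    }

  catch (const SketchException &)
    {
      // Base-class handler: catches every other derived exception.
      return "operation failed";
    }

  return "ok";
}

} // namespace sketch
```

With the real library, the same ordering would apply: handlers for specific Exalt exceptions first, the handler for ExaltException last.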

Using the PUSH Interface

The Exalt library can work in two main modes: the PULL mode and the PUSH mode.

The PULL interface means that the input data is read from the input stream by the Exalt codec. This is useful mainly when you are (de)compressing files. (The sample application presented in the previous section demonstrates the use of the PULL interface.)

The PUSH interface means that the application "feeds" the Exalt codec with the data. This mode can be used for compression (not for decompression) of the data that is dynamically generated. In order to use the PUSH interface, you have to use these two methods of the ExaltCodec class: initializePushCoder() and encodePush().

The initializePushCoder() method MUST be called before any calls to encodePush() and initializes the coder in the PUSH mode. In its basic variant, the method requires a name of an output file as a parameter.

The encodePush() method encodes a given chunk of XML data. The method has three parameters: a pointer to the data, the length of the data, and a flag indicating whether this is the last chunk of data.

If you attempt to use the PUSH coder in the PULL mode (or vice versa), the ExaltCoderIsPushException (or ExaltCoderIsPullException) is raised.

Below you can see a snippet of code that demonstrates the PUSH functionality of the library:

...

ExaltCodec codec;
int length;
bool finished = false;

codec.initializePushCoder(fileName);

while (!finished)
  {
    data = generateData(&length, &finished);
    codec.encodePush(data, length, finished);
  }

...
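
One way to drive the PUSH interface from a buffer that is already in memory is to slice it into fixed-size chunks and to mark the final one. The helper below is only a sketch: pushInChunks and RecordingSink are hypothetical names, not part of Exalt. It assumes that encodePush() accepts a const char pointer, an int length and a bool last-chunk flag (the parameter order follows the description above; the exact types are assumptions to be checked against the API documentation), and that chunk boundaries may fall at arbitrary byte positions. RecordingSink is a stand-in that merely records what it receives, so the helper can be tried without the library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Feeds xml to sink.encodePush() in chunks of at most chunkSize bytes.
// Returns the number of chunks pushed; only the final call has last == true.
template <class Sink>
int pushInChunks(Sink &sink, const std::string &xml, std::size_t chunkSize)
{
  assert(chunkSize > 0);
  assert(!xml.empty());

  int chunks = 0;
  std::size_t offset = 0;

  while (offset < xml.size())
    {
      std::size_t remaining = xml.size() - offset;
      std::size_t length = remaining < chunkSize ? remaining : chunkSize;
      bool last = (offset + length == xml.size());

      // Assumed parameter types: (const char *, int, bool).
      sink.encodePush(xml.data() + offset, static_cast<int>(length), last);

      offset += length;
      ++chunks;
    }

  return chunks;
}

// Stand-in sink (not part of Exalt): records what it is given.
struct RecordingSink
{
  std::string received;
  int calls;
  bool sawLast;

  RecordingSink() : calls(0), sawLast(false) {}

  void encodePush(const char *data, int length, bool last)
  {
    received.append(data, length);
    ++calls;
    sawLast = last;
  }
};
```

Any object with a matching encodePush() method could be passed as the sink; under the assumptions above, that would include an ExaltCodec on which initializePushCoder() has already been called.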

Using the SAX Interface

Exalt can act (with some limitations) as an ordinary SAX parser on the compressed XML data. It can read the stream of compressed data and emit SAX events to the application. The SAX interface is similar to that of the Expat XML parser.

To use the SAX event facilities, you have to derive a class from SAXReceptor and reimplement the appropriate event handling methods (for details, please refer to the API documentation). The second step is to use a special variant of the decode() method of the ExaltCodec class, which takes a pointer to an instance of SAXReceptor instead of the name of the output file. The optional parameter of this method is a generic pointer to a user data structure. This pointer is passed to the handlers of the receptor.

The example below demonstrates how to handle the startElement SAX event:

#include <iostream>
#include <exaltcodec.h>

using namespace std;

class MySAXReceptor : public SAXReceptor
{
public:
  // attr holds alternating names and values and ends with a null pointer.
  void startElement(void *userData, const XmlChar *name, const XmlChar **attr)
  {
    cout << "Element " << name << endl;

    if (attr)
      for (int i = 0; attr[i]; i += 2)
        cout << "Attribute " << attr[i] << " has value " << attr[i+1] << endl;
  }
};


int main(int argc, char **argv)
{
  ExaltCodec codec;
  MySAXReceptor receptor;

  // The name of the compressed file is required.
  if (argc < 2)
    return 1;

  try
    {
      // SAX events are delivered to the receptor instead of an output file.
      codec.decode(argv[1], &receptor);
    }

  catch (const ExaltException &)
    {
      cerr << "Failed to decompress " << argv[1] << endl;
      return 1;
    }

  return 0;
}
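
The loop in startElement() relies on the attribute array layout known from Expat: names and values alternate, and a null pointer ends the array. As a small sketch of that convention, the helper below copies such an array into a std::map. attributesToMap and SketchChar are hypothetical names, not part of Exalt, and the sketch assumes that XmlChar is a plain char (if the library is built for a wider character type, the string type would have to change accordingly).

```cpp
#include <map>
#include <string>

// Assumption for this sketch: XmlChar is a plain char.
typedef char SketchChar;

// Copies a null-terminated array of alternating names and values
// into a map from attribute name to attribute value.
std::map<std::string, std::string> attributesToMap(const SketchChar **attr)
{
  std::map<std::string, std::string> result;

  if (!attr)
    return result;

  // Stop at the terminating null pointer; ignore a name without a value.
  for (int i = 0; attr[i] && attr[i + 1]; i += 2)
    result[attr[i]] = attr[i + 1];

  return result;
}
```

Inside a handler, such a map makes it possible to look up an attribute by name instead of scanning the array each time.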

Changing the Default Options

There are various options that affect the behaviour of the library. In most cases, there is no need to change the default settings, because the library works well without any user/programmer assistance.

The library uses a static class ExaltOptions for setting and reading options. This class contains the methods setOption() for setting option values and getOption() for reading them. The possible options and their values are listed below:

ExaltOptions::Verbose - Determines whether the library should be verbose. In the verbose mode, some textual information is displayed on the standard error output. Possible values:
- ExaltOptions::Yes - Be verbose
- ExaltOptions::No - Don't be verbose (default)
ExaltOptions::Model - Determines what model is used for the compression
- ExaltOptions::SimpleModel - Use the simple model for compression (default)
- ExaltOptions::AdaptiveModel - Use the adaptive model for compression
ExaltOptions::PrintGrammar - Determines whether to display the grammar generated from the input data. Beware: The grammar may be huge!
- ExaltOptions::Yes - Display the generated grammar
- ExaltOptions::No - Don't display the generated grammar (default)
ExaltOptions::PrintModels - Determines whether to display the element models generated from the input data. The models are displayed only when using the adaptive model. Beware: The models may be huge!
- ExaltOptions::Yes - Display the element models
- ExaltOptions::No - Don't display the element models (default)
ExaltOptions::Encoding - Determines the encoding of the decompressed data
- The MIB of the encoding (see the API documentation for details). The default encoding is either Encodings::UTF_8 or Encodings::UTF_16 (depends on the configuration of the Expat parser)

The options are set by the static method setOption() of the class ExaltOptions. To turn the verbose mode on, for example, you should call ExaltOptions::setOption(ExaltOptions::Verbose, ExaltOptions::Yes).

You can also read the values of the options with the static method getOption(). The call to ExaltOptions::getOption(ExaltOptions::Verbose) will return the current value of the "verbose" option.
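
Because the options are set through a static class, code that changes one temporarily may want to restore the previous value afterwards. The following sketch shows one possible way to do that with a scope guard built on the setOption()/getOption() pair. ScopedOption and SketchOptions are hypothetical names, not part of Exalt: SketchOptions is only a stand-in that mimics the two static methods so that the example compiles without the library, and the use of int for option names and values is an assumption (see the API documentation for the real types).

```cpp
#include <map>

// Stand-in for ExaltOptions (assumption: static setOption/getOption,
// with option names and values representable as int).
class SketchOptions
{
public:
  enum { Verbose = 0, Model = 1 };
  enum { No = 0, Yes = 1 };

  static void setOption(int option, int value) { values()[option] = value; }
  static int getOption(int option) { return values()[option]; }

private:
  // Options that were never set read back as 0 (No) in this stand-in.
  static std::map<int, int> &values()
  {
    static std::map<int, int> storage;
    return storage;
  }
};

// Sets an option for the lifetime of the guard, then restores the old value.
template <class Options>
class ScopedOption
{
public:
  ScopedOption(int option, int value)
    : option_(option), saved_(Options::getOption(option))
  {
    Options::setOption(option, value);
  }

  ~ScopedOption() { Options::setOption(option_, saved_); }

private:
  ScopedOption(const ScopedOption &);            // not copyable
  ScopedOption &operator=(const ScopedOption &); // not assignable

  int option_;
  int saved_;
};
```

With the real library, the template argument would be ExaltOptions, provided that its option and value types convert to and from int; if they do not, the member types of the guard would have to be adapted.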

Input and Output Devices

The preceding text discussed only work with files: the data was read from one file and written to another. However, the library allows you to use any "device" you desire, such as the network, a database, etc. To make this possible, the library works with so-called IO devices. From the library's point of view, a file is nothing but a special (and the most common) type of IO device.

There exists an abstract class IODevice that defines the functionality (see the API documentation) that every device has to implement. Using this class and the C++ inheritance mechanism, it is simple to create new devices.

How do you use a new device? It is quite straightforward, since the encode(), decode() and initializePushCoder() methods of the ExaltCodec class also exist in variants that accept pointers to IO devices as their parameters. Below you can see an example:

...

codec.encode(inputDevice, outputDevice);

...
