ACAP Tutorial

I've tried to write this so that, when you come to read the RFC, it'll make more sense. But please bear in mind that this is not a canonical document: mistakes, both misunderstandings on my part and deliberate simplifications, will undoubtedly exist.

Contents

Data structure and storage
What data can be stored, and how it is stored.
Protocol overview
What ACAP looks like.
Data Operations
How clients can access data.

Data structure and storage

ACAP is, fundamentally, a protocol for storing and accessing data. Unlike most protocols, it leaves what data can actually be stored somewhat undefined. The best that can be said is that, because there's no way of fetching only a portion of a value, it's best suited to relatively small amounts of data. Of course, if your client application expects larger amounts, and can live with them clogging the connection, then that's fine.

This chapter is intended to describe the layout with which ACAP stores data. Hopefully, after reading this, you'll have some idea about how to store your own data within ACAP, or at least how to understand how and why other people store their data the way they do.

Introduction and overview

At the top level, clients will deal with datasets. Datasets contain one or more entries, and these entries contain one or more attributes. The attributes actually contain all the data, so we'll start there and work upwards.

To begin with, let's pretend we're using ACAP to store our incredibly simplistic phone book. We want to have, basically, names and numbers. This should not actually be used: it's a very restrictive design and, moreover, there's no proper RFC for it.

Attributes

An attribute is, essentially, a bit of data. It's named, primarily so we know what sort of data to expect, and also so we can extract it. The names are constructed out of namespaces, so that two semantically different attributes don't end up sharing the same name and causing confusion.

In some ways, an attribute can be considered like a field in a record, or a property in OOP.

All the data is either a string (a bit of text, or possibly binary data) or a list of strings. A client can easily enough discover which it is, but every attribute defined so far is always one or the other: always a string, or always a list, never sometimes one and sometimes the other. I suggest you do the same.

In order to hold the name and the data, as well as other information like the data's size, an attribute actually consists of a set of metadata items, each of which has a name and a value.

The standard defines a handful of metadata items, and it's unlikely you'll come across more:

attribute
The name of the attribute. Always a string. I think of these as, and often refer to them as, the type of the attribute.
value
The actual data. Either a string, or a list of them, as (so far) determined by the attribute name.
size
The size of the data. If the data is a string, this will be an integer. If the data is a list, it will be a list of integers.
acl
The Access Control List, represented as a list of strings. Each string holds the username and a list of rights.
myrights
The list of rights that actually apply to "us", right now. It's a string, and pretty obviously has to be calculated on the fly.
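To make the metadata concrete, here's a minimal Python sketch of one attribute as a client might hold it in memory. The class and its layout are my own invention, not anything the RFC defines; the one rule it encodes is the one above, that "size" mirrors the shape of "value". I've assumed sizes are counted in octets of the UTF-8 encoding, and the ACL strings are kept opaque.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

# A value is either a single string or a list of strings.
Value = Union[str, List[str]]


@dataclass
class Attribute:
    attribute: str                                  # the name, e.g. "simplephonebook.name"
    value: Optional[Value] = None                   # None: this sketch's stand-in for "no value"
    acl: List[str] = field(default_factory=list)    # kept opaque; see the RFC for the format
    myrights: str = ""                              # filled in by the server, per connection

    @property
    def size(self) -> Union[int, List[int], None]:
        """Mirror the shape of "value": an int for a string, a list of ints for a list."""
        if self.value is None:
            return None
        if isinstance(self.value, list):
            return [len(v.encode("utf-8")) for v in self.value]
        return len(self.value.encode("utf-8"))
```

So a "simplephonebook.name" of "Dave" has a size of 4, while a list value of "01234" and "5678" has a size of [5, 4].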

For our purposes, we'll need a couple of sorts of attribute, each with its own name: one for the person's name, and one for their number. To avoid clashing with anyone else's "name" and "number", we'll call them "simplephonebook.name" and "simplephonebook.number".

The "simplephonebook." prefix indicates who defined these attributes, and which datasets they belong in. Vendor-specific prefixes start with "vendor.", followed by the vendor's tag and another ".", such as "vendor.infotrope.". Both sorts of prefix have to be registered formally: dataset class prefixes in a standard, and vendor prefixes with IANA. So we're being a bit naughty by inventing our own.

Attribute names can also be prefixed with "site.", to indicate local, site specific meanings, or "user.", followed by the username, followed by ".", to indicate they only have useful meaning for that user.
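The naming rules above are mechanical enough to sketch. This hypothetical `classify_attribute` helper (my own, not part of any ACAP library) sorts an attribute name by its prefix. It assumes vendor tags and usernames never contain a "." themselves, and treats anything without a dot as one of the standard, prefix-free attributes.

```python
from typing import Optional, Tuple


def classify_attribute(name: str) -> Tuple[str, Optional[str]]:
    """Return (kind, qualifier) for an attribute name, judged by its prefix.

    Assumes vendor tags and usernames contain no "." of their own.
    """
    parts = name.split(".")
    if len(parts) == 1:
        return ("standard", None)            # no prefix at all, e.g. "entry"
    head = parts[0]
    if head in ("vendor", "user") and len(parts) >= 3:
        return (head, parts[1])              # vendor.<tag>.* or user.<username>.*
    if head == "site":
        return ("site", None)                # local, site-specific meaning
    return ("class", head)                   # e.g. simplephonebook.*
```

For instance, "vendor.infotrope.colour" comes back as ("vendor", "infotrope"), and "simplephonebook.name" as ("class", "simplephonebook").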

Entries

For obvious reasons, attributes need to be grouped together. In our case, we don't just want a collection of names and a collection of numbers; we want a collection of names and numbers. Entries provide this glue. Each entry is a set of attributes, with each sort of attribute (that is, each name, or "attribute" metadata value) appearing only once.

When clients access the ACAP server, they'll search for matching entries, and pull over entire entries, or selected attributes from them. As such, an entry can be considered somewhat like a row in SQL, or an object in OOP. Clients can, unlike SQL, request specific metadata from the attributes, too.

Entries also have standard attributes, which hold information about the entry as a whole. These attributes have no namespace prefix in their names, because they're valid in any entry.

entry
This attribute holds the name of the entry as a whole. The value will be unique in each dataset, and a string. Moreover, the "acl" metadata has some special properties. We'll go into this more later.
modtime
This attribute holds the time of the last change, or last possible change, to the entry. If this changes, some of the attributes may have changed. Because of inheritance, though, it's possible that none of them actually has.
subdataset
This is a list of datasets which belong in this entry. More on this one later. This attribute is not always present.

Our entries will consist of the "entry" attribute, in which we'll store some opaque unique identifier, plus the two attributes we defined earlier. We could save ourselves an attribute by storing the person's name in "entry", but this can lead to problems. Firstly, "entry" is actually a bit restricted in what you can put there, and secondly, if we spelt a name wrong in our client, or the person simply changed their name, we'd need to rename the entry. Renaming entries is generally a bad idea, and probably best avoided; some server implementations don't even support the concept.
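Here's a small sketch of building such an entry on the client side. The function names are hypothetical; a hex UUID is just one convenient way to get an opaque identifier, and I'm assuming it keeps clear of whatever characters the server objects to in entry names ("/" in particular). The entry's full path is taken to be the dataset path followed by the entry name.

```python
import uuid
from typing import Dict


def new_phonebook_entry(name: str, number: str) -> Dict[str, str]:
    """Build the attributes for one simplephonebook entry."""
    return {
        # An opaque hex identifier, so the entry never needs renaming.
        "entry": uuid.uuid4().hex,
        "simplephonebook.name": name,
        "simplephonebook.number": number,
    }


def entry_path(dataset: str, entry: Dict[str, str]) -> str:
    """Address an entry by its dataset's path plus its entry name."""
    return dataset + entry["entry"]
```

Correcting a misspelt name then touches only "simplephonebook.name"; the entry's name, and so its path, stays put.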

Datasets

Datasets are collections of entries, and clients search through specific datasets to restrict the search scope to what interests them. Generally, they'll only be searching through the user's personal datasets, and usually only through one of them, say, the user's personal phonebook.

As such, we can consider them somewhat similar to SQL tables, or collections of objects. But our analogies are getting rather stretched now.

Each dataset contains at least one entry. This entry, which has an entry name of the empty string, contains information about the dataset as a whole, and allows us to change certain properties of the dataset, and certain default properties for the entries within it.

Because this entry is special, it has special attributes of its own, prefixed with "dataset." to indicate they're relevant only to datasets:

dataset.acl
This attribute holds an ACL, in the same format as the "acl" metadata for an attribute. It provides a default ACL for all attributes within the dataset.
dataset.acl.*
Attributes prefixed with "dataset.acl." provide default ACLs for specific attribute types, so "dataset.acl.entry" would provide a default ACL for all "entry" attributes, and "dataset.acl.simplephonebook.name" would provide a default ACL for all "simplephonebook.name" attributes within the dataset.
dataset.inherit
This controls inheritance of the dataset. Feel free to ignore this one for a bit.
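Even though most clients never need to work this out themselves, the defaulting can be sketched as a lookup that goes from the most specific ACL to the least. The order (the attribute's own "acl", then "dataset.acl.<name>", then "dataset.acl") is my reading of the descriptions above rather than a quote from the RFC, the special handling of the "entry" ACL is ignored, and ACL strings are treated as opaque. The helper name is hypothetical.

```python
from typing import Dict, List, Optional


def effective_acl(attr_name: str,
                  attr_acl: Optional[List[str]],
                  dataset_entry: Dict[str, List[str]]) -> Optional[List[str]]:
    """Pick the ACL that applies to one attribute, most specific first.

    attr_acl is the attribute's own "acl" metadata (None if unset);
    dataset_entry holds the attributes of the dataset's "" entry.
    """
    if attr_acl is not None:
        return attr_acl
    per_type = dataset_entry.get("dataset.acl." + attr_name)
    if per_type is not None:
        return per_type
    return dataset_entry.get("dataset.acl")    # may be None if nothing is set
```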

Clients generally can completely ignore dataset attributes, since everything will be handled transparently in any case. Getting the "myrights" metadata for attributes that interest you will be enough to tell you what rights you have, regardless of which ACL you've ended up using.

In fact, the actual ACL values are only of interest to clients acting as management agents, and not clients which are simply after the data. Because of a few subtle problems with ACLs and inheritance, ACLs are likely to contain implementation-specific identifiers anyway, meaning that a client cannot really know what the ACL means at all.

Dataset Classes

In order to help clients know what the data actually means (the semantics, if you prefer), dataset classes exist. In our example, the dataset class is "simplephonebook", which indicates to clients that entries stored within a dataset of class "simplephonebook" are "simplephonebook" entries, and will most likely have "simplephonebook."-prefixed attributes.

Dataset classes are only there to help clients, though. In general, a server will not care whether the attribute you're trying to use actually should exist in a dataset of that class or not. They're there for semantic aid, not syntactic.

Dataset paths

Keen eyes will spot that there's no attribute in the magic "" entry of a dataset to indicate what class the dataset is, but that's okay, because clients can tell this from the path.

Moreover, the path gives us much more information than just the class: it tells us whose data it is, too. A path consists of an optional hierarchy (or namespace) identifier, which tells us how the rest of the path is laid out; the dataset class; and the owner of the dataset. The owner is either the single component "site", meaning site-wide data, or one of "user", "group" or "host", followed by the identifier of the user, group or host that owns the data. The order of class and owner depends on the hierarchy. There are currently only two hierarchies: the default one, which organises datasets first by class, and "byowner", which organises them first by owner.

The individual components of a path are separated by "/", and a path always begins with a "/". Traditionally, paths end with a "/" as well. Some examples in the standard omit the terminating "/", but you should always supply it. Servers, on the other hand, have to try to cope with a missing one.

A dataset below another dataset in the path tree is known as a subdataset. If there is one, you'll find an entry in the parent dataset with a "subdataset" attribute containing a "." somewhere in its list. In many dataset classes, the use of subdatasets has a specific semantic meaning, over and above merely keeping entries separate from each other.

Some typical paths:

/simplephonebook/user/dwd/
My phonebook.
/byowner/user/dwd/simplephonebook/
Exactly the same dataset as above, but addressing it with the byowner heirarchy.
/simplephonebook/~/
The simplephonebook for the current user. The "~" path component is expanded to whoever's logged in, so if I were the currently logged in user for this connection, this would be yet another name for the same dataset.
/byowner/~/
A dataset containing entries indicating all the dataset classes the current user has available. There's no equivalent for this outside of /byowner/.
/byowner/site/
A dataset containing entries indicating all the site dataset classes in use. Note that individual users may be using dataset classes over and above these; this just indicates which site settings are actually defined.
/test/user/dwd/comparator/
A dataset with the class "test" (which is non-standard, and therefore naughty) owned by me. It's a subdataset of my main "test" dataset, "/test/user/dwd/", and I've called it "comparator". You might guess this is for comparator test data, but I couldn't possibly comment.
/test/
A dataset which will contain entries indicating the next level of datasets available. This might contain entries called "site", "user", "group" and/or "host". All entries here will have subdataset attributes, and not much else.
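The path rules described in this chapter can be sketched as a small parser, which also shows why several of the paths in the list name the same dataset. Everything here is hypothetical and simplified: it knows only the default and "byowner" hierarchies, assumes owner identifiers are single path components, and expands "~" to a user name you pass in.

```python
from typing import List, NamedTuple, Optional, Tuple


class DatasetPath(NamedTuple):
    dataset_class: str
    owner: str          # "site", or e.g. "user/dwd", "group/accounts"
    subpath: str        # components below the owner, e.g. "comparator"


def _take_owner(parts: List[str], current_user: str) -> Tuple[Optional[str], List[str]]:
    """Consume the owner components from the front of parts."""
    if not parts:
        return None, parts
    if parts[0] == "~":                    # expands to whoever is logged in
        return "user/" + current_user, parts[1:]
    if parts[0] == "site":
        return "site", parts[1:]
    if parts[0] in ("user", "group", "host") and len(parts) >= 2:
        return parts[0] + "/" + parts[1], parts[2:]
    return None, parts


def parse_path(path: str, current_user: str) -> Optional[DatasetPath]:
    """Split a dataset path into class, owner and subpath.

    Returns None for paths that don't name a single owner's dataset,
    such as "/test/" or "/byowner/~/", which list the next level down.
    """
    if not path.startswith("/"):
        raise ValueError("dataset paths always begin with '/'")
    parts = [p for p in path.split("/") if p]      # copes with a missing trailing "/"
    if parts and parts[0] == "byowner":            # /byowner/<owner>/<class>/...
        owner, rest = _take_owner(parts[1:], current_user)
        if owner is None or not rest:
            return None
        cls, rest = rest[0], rest[1:]
    else:                                          # default: /<class>/<owner>/...
        if not parts:
            return None
        cls = parts[0]
        owner, rest = _take_owner(parts[1:], current_user)
        if owner is None:
            return None
    return DatasetPath(cls, owner, "/".join(rest))
```

With "dwd" logged in, "/simplephonebook/user/dwd/", "/byowner/user/dwd/simplephonebook/" and "/simplephonebook/~/" all parse to the same class and owner.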

Clients are sometimes interested in a specific dataset, and other times interested in all the subdatasets, too. Therefore the protocol allows you to search either just a single dataset, or subdatasets in addition, down to a certain depth. Clients don't need permissions on the "subdataset entries" to gain access to the datasets themselves.

Dataset Inheritance

If "site" entries can exist as well as "user" and "group" entries, then it'd be useful for a client to get the whole lot at once. ACAP deals with this in a rather unique way - it allows one dataset to inherit from another.

In other words, consider the case where you work in a company with an ACAP server, which stores all its telephone details in our simplephonebook dataset class. The company defines various phone numbers useful to everyone in the company in the "site" dataset for the class, "/simplephonebook/site/". For people working in the accounts department, the "/simplephonebook/group/accounts/" dataset has all the useful contact details.

For an accountant, this would mean searching through "/simplephonebook/~/", "/simplephonebook/group/accounts/" and "/simplephonebook/site/" every time. That's not very efficient, and moreover it gives the client a lot more to think about; not least, the client has to know whether you're an accountant or not. Obviously this is something you don't really want to think about very often, if at all.

Instead, we simply set the "dataset.inherit" attribute of the "" entry in "/simplephonebook/~/" to point to "/simplephonebook/group/accounts/", and then do a similar trick with the accounts phonebook to make it inherit from the "/simplephonebook/site/" dataset.

Now, searching through "/simplephonebook/~/" gets us not only our own personal phone book, but the group's and the entire site's, without any thought at all. If we make changes to our own, they don't interfere with anyone else's, but equally, if a "site" phone number changes, then the change will be visible in our own. Unless we've overridden it, of course.
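Here's a rough model of what the server does on your behalf. It assumes, as a simplification, that inheritance works entry by entry and attribute by attribute: an entry in a nearer dataset overrides only the attributes it actually sets. The RFC's real rules are more involved (removing an inherited attribute, for one, is a subject of its own), so treat this as a mental model only. The `inherits` mapping stands in for each dataset's "dataset.inherit" attribute, and all names are hypothetical.

```python
from typing import Dict, List, Optional

Entry = Dict[str, str]              # attribute name -> value
Dataset = Dict[str, Entry]          # entry name -> entry


def inherit_chain(start: str, inherits: Dict[str, str]) -> List[str]:
    """Follow "dataset.inherit" links from start; the nearest dataset comes first."""
    chain: List[str] = []
    path: Optional[str] = start
    while path is not None and path not in chain:    # stop on a loop
        chain.append(path)
        path = inherits.get(path)
    return chain


def inherited_view(start: str, inherits: Dict[str, str],
                   store: Dict[str, Dataset]) -> Dataset:
    """Merge the chain, letting nearer datasets override attributes further up."""
    view: Dataset = {}
    for path in reversed(inherit_chain(start, inherits)):    # furthest first
        for name, attrs in store.get(path, {}).items():
            merged = dict(view.get(name, {}))
            merged.update(attrs)
            view[name] = merged
    return view
```

In the accountant example, a personal entry that sets only a new number for a site-wide "reception" entry keeps the site's name for it, while the group's and site's other entries show through unchanged.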

Similarly, for a typical configuration class, such as "option", the "site" dataset can provide default settings for the organisation or ISP, the "group" datasets can override these defaults to cater for certain classes of users or customers, and finally individuals can override them in turn, if the ACL allows, to fine-tune them to their own needs.

Data inheritance is, without a shadow of a doubt, a fine feature. It's also viciously complicated to get to grips with.

Protocol Overview

On the network itself, ACAP looks strikingly similar to IMAP, but with the syntax cleaned up quite a bit. If you've already written any kind of IMAP client, writing a simple ACAP client will probably take you all of ten minutes or so. This is by design: ACAP was originally designed as IMSP, or Internet Mail Support Protocol, by much the same people who have been working on IMAP.

Essentially, every command has a tag, which is basically an identifier for that instance of the command. Data coming back from the server is either tagged, so you know which command it came from, or untagged, because either it doesn't matter or it didn't come from any specific command.

This differs from IMAP, where pretty much everything except the completion response is untagged. In ACAP, it's easy to run multiple commands at the same time. Pipelining? HTTP, eat your heart out.
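Because results carry the tag of the command they belong to, a client can keep several commands in flight and sort the replies out afterwards. This hypothetical `demultiplex` function shows the idea. It's a simplification: it assumes every response fits on one line, ignoring literals entirely, and the sample lines mentioned below are schematic rather than exact ACAP syntax.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple


def demultiplex(lines: List[str]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Sort response lines into per-tag lists plus a list of untagged ones.

    Simplification: each response is assumed to fit on one line, which
    ignores literals (raw octets announced by a length) entirely.
    """
    by_tag: DefaultDict[str, List[str]] = defaultdict(list)
    untagged: List[str] = []
    for line in lines:
        tag, _, rest = line.partition(" ")
        if tag == "*":
            untagged.append(rest)          # not tied to any one command
        else:
            by_tag[tag].append(rest)       # intermediate results and the final OK/NO/BAD
    return dict(by_tag), untagged
```

Fed interleaved replies to two searches tagged A1 and A2, plus an untagged alert, it hands each command its own results in order, even when A2 finishes before A1.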

Also gone from IMAP are some of the more curious syntactic arrangements, square brackets and the like. IMAP was originally designed to be easy to parse in LISP. For some strange reason, this was less important with ACAP, and it's more language neutral, meaning it's easy to parse in any language.

ACAP is not connectionless - like IMAP, you log in, and stay logged in until you're done, then log out. But that's effectively it for state.

Once logged in (via SASL only), you can either perform a SEARCH or a STORE. These read and write data respectively, and both operate atomically: a SEARCH will show you a snapshot of the data, and a STORE will either succeed entirely or fail entirely. This makes it easy for clients to operate on the data.

Because clients obviously do care if the data changes while they're trying to use it, STORE allows you to specify that it should fail if changes have happened since a specific time, and SEARCH allows you to make Contexts, which deserve a chapter to themselves.

STORE

You'll need to refer to the RFC for the precise syntax, but effectively, the arguments to the STORE command are one or more store-lists. Each store-list operates on a single entry, but can change multiple attributes. By default, STORE will create any datasets and entries it needs in order to satisfy your request, although you can prevent this if you want; the entire STORE command will then fail if it would need to create anything for that store-list.

The only real gotcha with these, for a client author, is that you can't "add" to a list value. In other words, if you've an attribute whose value is a list, and you simply want to add another value to that list, then you have to specify the entire list again. Obviously, this is a fine time to say you only want the change to take effect if the entry hasn't changed since you read it.
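A sketch of that read-modify-write in Python. I've approximated the STORE grammar from memory: quoted strings with backslash escapes, a parenthesised list for multiple values, NIL to remove an attribute, and an UNCHANGEDSINCE modifier after the entry path carrying the "modtime" you read earlier. Check the RFC's STORE grammar before relying on any of it. The helper names and the "simplephonebook.altnumbers" list attribute are hypothetical, and strings that would need a literal are simply refused.

```python
from typing import Dict, List, Optional, Union

StoreValue = Union[None, str, List[str]]


def quote(s: str) -> str:
    """Quote s, backslash-escaping backslashes and double quotes."""
    if any(c in s for c in "\r\n\0"):
        # Such strings need a literal, which this sketch doesn't produce.
        raise ValueError("string needs a literal")
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def store_list(entry_path: str, attrs: Dict[str, StoreValue],
               unchanged_since: Optional[str] = None) -> str:
    """Render one store-list: entry path, optional modifier, attribute/value pairs."""
    parts = [quote(entry_path)]
    if unchanged_since is not None:
        parts += ["UNCHANGEDSINCE", unchanged_since]   # refuse if the entry changed later
    for name, value in attrs.items():
        parts.append(quote(name))
        if value is None:
            parts.append("NIL")                        # remove the attribute (as I read it)
        elif isinstance(value, list):
            parts.append("(" + " ".join(quote(v) for v in value) + ")")
        else:
            parts.append(quote(value))
    return "(" + " ".join(parts) + ")"


def append_to_list(tag: str, path: str, attr: str, current: List[str],
                   item: str, modtime: str) -> str:
    """There's no "append": resend the whole list, guarded by the modtime we read."""
    return f"{tag} STORE " + store_list(path, {attr: current + [item]}, modtime)
```

If the server refuses because the entry has changed, the client re-reads the list and tries again.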

SEARCH

Again, you'll need to refer to the RFC for syntax.

This has several gotchas for the client author. The big one is that the syntax of the data you get back varies in three ways, depending on what sort of data you asked for and how you asked for it. This is a right royal pain, and while I can understand why the more complex syntax exists, I can't understand why the simpler one does. But anyway, you have to cope.

Searching with no RETURN specified returns no results at all - at first this may seem useless, but it's useful with enumerated contexts.

CONTEXTs

Again, see the RFC for syntax. This isn't a command, though; it's something you can create with a SEARCH.

Contexts fall into two types. An ordinary context is simply a saved set of results on the server. This is useful if your client wants a snapshot or, alternatively, if you want to fetch your results in smaller chunks via the RANGE search key, which lets you pull back some of the results rather than all of them at once. To do this, you'll need to make the context ENUMERATEd.
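Fetching an ENUMERATEd context a page at a time mostly comes down to working out the ranges to ask for. This hypothetical helper assumes positions in the context are numbered from 1 and that a range includes both of its ends; that's my reading of RANGE, and the rest of the search key's syntax is left to the RFC.

```python
from typing import Iterator, Tuple


def pages(total: int, page_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) pairs covering positions 1..total, inclusive at both ends."""
    if page_size < 1:
        raise ValueError("page_size must be positive")
    for start in range(1, total + 1, page_size):
        yield start, min(start + page_size - 1, total)
```

For a context holding 25 entries and pages of 10, that gives (1, 10), (11, 20) and (21, 25).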

The other type is NOTIFY contexts, which aren't snapshots at all, but update, and more importantly, let the client know they've updated. This allows a client to "register an interest" in entries matching a search key, and then get updates on them as they change. Different servers appear to handle these updates with different speeds - the Cyrus smlacapd batches them up for a while, whereas Infotrope gets them to the client as fast as possible. ENUMERATEd contexts will update the order as well.
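On the client side, a NOTIFY context over an ENUMERATEd search means keeping a local copy in step with the server's notifications. This hypothetical class is loosely modelled on the ADDTO, REMOVEFROM and CHANGE notifications described in the RFC, with all the parsing left out. I've assumed positions count from 1, and that a CHANGE position says where the entry ends up.

```python
from typing import Dict, List


class NotifyCache:
    """Client-side mirror of an ENUMERATEd NOTIFY context (hypothetical design)."""

    def __init__(self) -> None:
        self.order: List[str] = []                   # entry names, in context order
        self.data: Dict[str, Dict[str, str]] = {}    # entry name -> returned attributes

    def add_to(self, entry: str, position: int, attrs: Dict[str, str]) -> None:
        """An entry now matches the search: insert it at its position."""
        self.order.insert(position - 1, entry)
        self.data[entry] = dict(attrs)

    def remove_from(self, entry: str) -> None:
        """An entry no longer matches: drop it."""
        self.order.remove(entry)
        del self.data[entry]

    def change(self, entry: str, new_position: int, attrs: Dict[str, str]) -> None:
        """An entry changed, possibly moving within the order."""
        self.order.remove(entry)
        self.order.insert(new_position - 1, entry)
        self.data[entry] = dict(attrs)
```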

Everything Else

Many of the other commands can be more or less directly translated into a SEARCH or a STORE, and are mostly there for clients to save themselves from having to understand too much of what's really going on. For instance, a SETACL is directly translated to a STORE in Infotrope, a MYRIGHTS is a (highly optimised) SEARCH, and a LISTRIGHTS is almost a highly optimised SEARCH. Use them instead of the SEARCH/STORE equivalent where you can, but note that a MYRIGHTS over many entries would be easier to handle with a SEARCH.

There are a few exceptions: LANG, STARTTLS (where supported), AUTHENTICATE, and LOGOUT all control aspects of the session, and FREECONTEXT/UPDATECONTEXT are both to do with fiddling with contexts directly. GETQUOTA is unsupported everywhere, as far as I can tell.