1. Introduction

Plaid is a platform, which means that it is not a complete app on its own. Rather, it provides an implementation of the "boring" parts of a linguistic annotation app, including a user system, a database layer, and more. Building an app with Plaid allows you to focus your attention on just the interesting parts of your app including the UI and logic which is specific to your annotation framework.

The purpose of this document is to teach you how to think about and use Plaid. If you’d rather just dive into a real app programmed with Plaid, check out the UD editor example app.

2. Key Concepts

2.1. Projects and Layers

In Plaid a project is a collection of documents which all have the same data model. Plaid’s data model is configurable on a per-project basis, allowing you to have as many or as few annotation types as you would like. For example, perhaps in one project you might only annotate a single part-of-speech tag per word, while in another you might want a POS tag, a lemma, and a gloss for each word. These projects would have different configurations in order to appropriately accommodate the annotation needs of each.

The means by which this configuration is performed is the layer. (If you have used ELAN before, note that this is similar to ELAN’s tiers.) Loosely speaking, a layer corresponds to a single "type" of annotation: for instance, in the example above where we want to collect a POS tag, lemma, and gloss for each word, we would have three span layers associated with the project. We will discuss layers more shortly.

2.2. Users and Permissions

Plaid offers a built-in user system which allows individuals to log in with a password. By default, each project is private, and one of three permissions levels is needed in order to interact with one:

  • Maintainers have full privileges for working with projects: they may edit documents and also modify the project’s configuration.

  • Writers may edit documents belonging to a project, but may not modify the project’s configuration.

  • Readers may only read documents belonging to a project—​they may not make any edits to the project’s documents or configuration.

Additionally, a global admin role exists, which allows the user to see and edit all data in the Plaid instance.

2.3. Clients

Plaid’s functionality is exposed for development as a REST API. This allows you to use Plaid from any programming language with an HTTP client library.

Additionally, we provide official clients for two programming languages, Python and JavaScript. These clients provide an API for interacting with Plaid that is idiomatic for each programming language and frees you from concern about low-level details of HTTP requests. Here is an example of how to use the Python client:

client = PlaidClient("http://localhost:8085", "<SECRET_TOKEN>")

projects = client.projects.list()
print("Available projects:", projects)

first_project_id = projects[0]["id"]
first_project = client.projects.get(first_project_id, include_documents=True)
print("First project:", first_project)

first_document_id = first_project["documents"][0]["id"]
first_document = client.documents.get(first_document_id, include_body=True)
print("First document:", first_document)

The same, in JavaScript:

client = PlaidClient("http://localhost:8085", "<SECRET_TOKEN>")

projects = client.projects.list()
console.log("Available projects:", projects)

firstProjectId = projects[0].id
firstProject= client.projects.get(firstProjectId, true)
console.log("First project:", firstProject)

firstDocumentId = firstProject.documents[0].id
firstDocument= client.documents.get(firstDocumentId, true)
console.log("First document:", firstDocument)

2.4. Time-traveling Database

Plaid uses the database XTDB to store data. An unusual feature of this database is that it is immutable. This means that whenever a value inside the database changes, the old value is not lost: rather, the old state of the database is still accessible for reading, even after a new version has been created. This allows Plaid to provide time travel, allowing you to access all historical states of your documents and projects.

It is simple to make use of this in the client: simply include an ISO-8601 time as the final argument of any .get method. (In Python, you may also use the keyword argument as_of.) Consider a simple example where we are modifying the name of a document:

# Read original name
client.documents.get(document_id)["name"]
#=> "Old Name"

# Update name
client.documents.update(document_id, name="New Name")

# Read current name
client.documents.get(document_id)["name"]
#=> "New Name"

# Read old name by providing time in the past
client.documents.get(document_id, "2025-07-08T09:00:00Z")["name"]
#=> "Old Name"

Additionally, an audit log provides an account of who performed every write to the database:

for entry in c.documents.audit("35424d64-a077-4a29-8006-5a0c3b76aedb"):
    time = entry["time"]
    username = entry["user"]["username"]
    description = "; ".join([o["description"] for o in entry["ops"]])
    print(f"{username}, {time}: {description}")

# Output:
# Luke G, 2025-07-05T09:13:39.614Z: Create document "Document 1" in project 0f0f0574-ae5a-4060-814c-c5bbdce14d67
# Luke G, 2025-07-09T20:27:59.611Z: Update document 35424d64-a077-4a29-8006-5a0c3b76aedb name to "New Document Name"

2.5. Real-time Messaging

Plaid offers a simple system for real-time communication on a per-project basis. This is intended to support two purposes:

  • Ad hoc client-to-client features which you will implement on top of this communication channel, such as chat between individual annotators or interaction with non-human clients such as AI models.

  • Audit log listening, allowing clients to receive immediate notice whenever a change has been made to any document in the project. (Note that these are sent automatically by Plaid.)

This functionality is exposed in two simple functions in the client. The send_message/sendMessage function allows a client to broadcast a message to all clients in the project:

client.messages.send_message(project_id, {"purpose": "ping", "message": "ping"})

Note that the second positional argument, the body, can be any JSON value.

On the other end, a client may listen like so. Note that there are two arguments for the message. event_type is "message" for data sent via send_message/sendMessage by another client, and "audit-log" for audit log notifications. Consider an example of listener setup:

def on_event(event_type, event_data):
    if event_type == "message":
        sender = event_data["user"]
        time = event_data["time"]
        contents = event_data["data"]
        print(f"User {sender} sent data `{contents}` at {time}")
    elif event_type == "audit-log":
        user = event_data["user"]
        time = event_data["time"]
        op = event_data["ops"][0]
        op_type = op["type"]
        document_id = op["document"]
        description = op["description"]
        print(f"User {user} performed operation `{op_type}` on document {document_id} at {time}: '{description}'")


client.messages.listen(project_id, on_event)

After the send_message invocation we just saw, this on_event function would produce the following output:

User user1@example.com sent data `{'purpose': 'ping', 'message': 'ping'}` at 2025-07-09T20:14:36.168Z

And suppose that another client executed the following code:

client.documents.update("35424d64-a077-4a29-8006-5a0c3b76aedb", name="New Document Name")

The listener’s code above would print this:

User user1@example.com performed operation `document:update` on document 35424d64-a077-4a29-8006-5a0c3b76aedb at 2025-07-09T20:27:59.616Z: 'Update document 35424d64-a077-4a29-8006-5a0c3b76aedb name to "New Document Name"'

3. Layer Types

Each project contains a configuration of layers which define a schema for all documents in the project. Each layer holds a single kind of annotation, and each project may have any number of each kind of layer. For instance, you might have two span layers: one for POS tags, and another for lemmas.

3.1. An Example

Suppose we’re working on a project where all we are doing is POS-tagging. The configuration of the project’s layers (in a simplified JSON representation) would look something like this:

{
  id: "1cce50df",
  name: "Example Project",
  textLayers: [
    {
      id: "6283144f",
      name: "Text",
      tokenLayers: [
        {
          id: "d1cc124f",
          name: "Words",
          spanLayers: [
            {
              id: "ad0f5f2c",
              name: "POS tags"
            }
          ]
        }
      ]
    }
  ]
}

This layer structure prescribes the structure of individual documents. Consider a document where we have POS tagged the sentence "Fido barks":

{
  id: "01d01a27",
  name: "Document 1",
  project: "1cce50df",
  textLayers: [
    {
      id: "6283144f",
      name: "Text",
      text: { id: "9cfafcc6", document: "01d01a27", body: "Fido barks" },
      tokenLayers: [
        {
          id: "d1cc124f",
          name: "Words",
          tokens: [
            { id: "54383a26", text: "9cfafcc6", begin: 0, end: 4 },
            { id: "a8758db2", text: "9cfafcc6", begin: 5, end: 10 }
          ],
          spanLayers: [
            {
              id: "ad0f5f2c",
              name: "POS tags",
              spans: [
                { id: "4ed828ea", value: "NOUN", tokens: [ "54383a26" ] },
                { id: "b4ef8082", value: "VERB", tokens: [ "a8758db2" ] }
              ]
            }
          ]
        }
      ]
    }
  ]
}

Notice the following:

  • Each layer has a corresponding kind of data in the document: the text layer has a text, the token layer has tokens, and the span layer has spans.

  • The layers are dependent on each other: the text layer is a dependent of the project, the token layer is a dependent of the text layer, and the span layer is a dependent of the token layer. This is a reflection of conceptual dependencies: tokens are defined as atomized substrings of a text, and spans are defined as groupings of one or more tokens.

  • Each individual entity—​whether it is a layer or some data within that layer—​has a unique ID

  • Entities refer to others with these IDs—​for instance, each span’s tokens value has a list of tokens which constitute that span.

We will continue discussing this example in more detail below.

3.2. Projects and Documents

A project is the root of a layer configuration and has a name.

{
  id: "1cce50df",
  name: "Example Project",
  textLayers: [/* ... */]
}

A project has many documents, and each has a name and a unique ID:

{ id: "01d01a27", name: "Document 1", project: "1cce50df" }

3.3. Texts

For each text layer, each document may have at most one text, which consists of a single string. This string holds all the text which is to be analyzed in dependent layers. A text object looks something like this:

{ id: "9cfafcc6", document: "01d01a27", body: "Fido barks" }

3.4. Tokens

For each token layer, each document may have many tokens, which are defined as substrings of a text:

{ id: "54383a26", text: "9cfafcc6", begin: 0, end: 4 }
{ id: "a8758db2", text: "9cfafcc6", begin: 5, end: 10 }

Note the following:

  1. begin and end must form valid substring indices for the given text.

  2. Zero-length tokens where begin == end are valid.

  3. Tokens may overlap.

  4. Plaid sorts tokens by begin when determining their linear order in the document. For tokens with identical begin, Plaid uses the optional prevalence value wherever available, such that tokens with lower precedence appear earlier in linear order.

Tokens are intended to serve ast he basic units for further linguistic analysis using spans and relations.

3.5. Spans

For each span layer, each document may have many spans, which are groupings of one or more tokens which have a single value:

{ id: "4ed828ea", value: "NOUN", tokens: [ "54383a26" ] }
{ id: "b4ef8082", value: "VERB", tokens: [ "a8758db2" ] }

There are no restrictions on spans, other than that they must hold at least one token, and that they all must belong to the span layer’s parent token layer.

3.6. Relations

For each relation layer, each document may have many relations, which are directed edges between two spans with a label. Both spans must belong to the relation layer’s parent span layer. For example, if we wanted to extend the example above with a syntactic dependency relation between "Fido" and "barks" expressing that "Fido" is the subject, we could have a relation like this:

{ id: "2f6080ff", source: "b4ef8082", target: "4ed828ea", value: "nsubj" }

3.7. Vocabs

The four basic layer types (text, token, span, and relation) are all project-specific and cannot be used in more than one project. The fifth layer type, the vocab layer, can be used across projects. As its name suggests, this layer is intended for recording occurrences of lexical entries.

The vocab layer itself has a name:

{ id: "2b75b0f9", name: "English" }

The vocab layer has vocab items, which represent lexical entries, each with a canonical form:

{ id: "da8d4549", form: "Fido" }
{ id: "b5c6e64c", form: "bark" }

Finally, vocab links are used to indicate occurrences of lexical entries. Recall the tokens from before:

// "Fido"
{ id: "54383a26", text: "9cfafcc6", begin: 0, end: 4 }
// "barks"
{ id: "a8758db2", text: "9cfafcc6", begin: 5, end: 10 }

We can create links between them and the above vocab items with vocab links like so:

{ vocabItem: "da8d4549", tokens: [ "54383a26" ] }
{ vocabItem: "b5c6e64c", tokens: [ "a8758db2" ] }

Notice that multiple tokens may be specified, allowing for multi-word and non-contiguous lexical items.

3.8. Metadata and Config

It is often desirable to enrich an entity with additional information—​for instance, you might want to record some information about the annotator’s confidence in whether a certain span value is correct. Additionally, you might want to do the same with a layer in order to e.g. specify what values are acceptable for spans in a given layer. To accommodate this, Plaid allows arbitrary data to be stored in the config attribute for layer types (project, text layer, token layer, span layer, relation layer, vocab layer) and in the metadata attribute for data types (document, text, token, span, relation, vocab item, vocab link).

You might use a config to store legal tags for a given layer:

{ id: "...", name: "POS Tags", config: { tags: ["NOUN", "VERB", /* ... */] } }

As for metadata, one use could be to store information about whether a human or an AI system produced the annotation, and in the latter case, store additional information about the system’s predictions:

// Human-made POS tag
{
  id: "...",
  tokens: [/*...*/],
  value: "NOUN",
  metadata: {
    userId: "human-user-id"
  }
}
// System-made POS tag
{
  id: "...",
  tokens: [/*...*/],
  value: "NOUN",
  metadata: {
    userId: "system-user-id",
    systemName: "stanza==1.10.1",
    tagProb: 0.8412,
    tagProbs: {
      NOUN: 0.8412,
      PROPN: 0.0966,
      ADJ: 0.00141,
      /* ... */
    }
  }
}
Note
Config and metadata values are schema-less, meaning that keys and values are unconstrained. If you’d like to see schema support for either of these, please feel free to open an issue.

4. Data Integrity

In collaborative annotation projects, it is crucial to take steps to ensure that data never reaches an invalid state. Plaid provides a few different means for maintaining data integrity, so that you may have confidence that your data will never become corrupt.

4.1. Core Data Integrity Constraints

In the previous section, we noted the constraints which Plaid enforces on each data type. Plaid guarantees that the database will never violate these, no matter what, by ensuring that invalid entities are never created, and often by deleting structures which are indirectly rendered invalid by another change. Consider these examples:

  • If a relation’s source span is deleted, then Plaid deletes the relation as well, because a relation must have a span on either end in order to remain valid.

  • If a few characters are deleted in a text, then all token indexes are updated to maintain validity: tokens containing those characters will shrink or get deleted (if they turn into zero-length tokens), and not containing those characters which are anchored to subsequent text will have their indices decremented by the number of deleted tokens.

  • If a span’s only token is deleted, then the span will deleted, along with any dependent relations.

These invariants have been incorporated into Plaid because of their broad desirability in linguistic annotation. However, some invariants will vary by annotation framework. For example, it is quite common to want a span layer’s spans to be in one-to-one correspondence with tokens in the parent token layer. This is not directly enforced by Plaid, but Plaid provides you with three mechanisms which allow you to enforce your own data integrity constraints.

4.2. Strict Mode

Multiple users may edit the same document simultaneously, and in some cases, undesirable conflicts may occur as users fail to take into account each other’s work. Suppose, for example, that one user is editing a sentence’s lemmas, and the other is editing a sentence’s POS tags. If the lemma editor doesn’t know that a certain POS tag has changed, they might make the wrong decision about which lemma to assign. Plaid clients' optional strict mode causes edits to fail when someone other than the current user has made an edit. Consider this exact scenario in code:

client1.spans.update(lemmaSpanOneId, "lemmaOne")
client2.spans.update(posTagSpanTwoId, "posTagTwo")
// Works fine
client1.spans.update(lemmaSpanTwoId, "lemmaTwo")

When client 1 executes the second lemma span’s value, unless they happened to have loaded the document anew after client 2’s change, they will not be aware of the new POS tag for the second word.

Strict mode causes requests to fail when someone other than the user in strict mode has edited a document since strict mode began. If client 1 had initiated strict mode at the beginning, then the second request would have failed:

client1.enterStrictMode(documentId)
client1.spans.update(lemmaSpanOneId, "lemmaOne")
client2.spans.update(posTagSpanTwoId, "posTagTwo")
// Fails with HTTP 409, since client 2 made a change
client1.spans.update(lemmaSpanTwoId, "lemmaTwo")
// Exit strict mode when desired
client1.exitStrictMode()

This failure gives client 1 the opportunity to reload the document only when it is necessary, allowing them to reconsider the current state of the document before making changes.

4.3. Locking

Sometimes a more heavyweight solution is needed. A lock gives a user exclusive permission to write to a document, preventing all other users from changing its contents. Locks have a 60 second expiration timer by default, and they may be released early or renewed by either explicit renewal or any write to the locked document. Consider:

client2.checkLock(documentId);
// => HTTP 204
client1.acquireLock(documentId);
// -> { userId: "client1", expiresAt: 1752260966446 }
client2.checkLock(documentId);
// -> { userId: "client1", expiresAt: 1752260966446 }
client1.releaseLock(documentId);
// -> HTTP 204
client2.checkLock(documentId);
// -> HTTP 204

Locks are useful for situations where a concurrent edit by another user could yield an invalid state with respect to data integrity constraints beyond what is enforced in Plaid’s core. They should be used only where necessary in order to minimize the risk of invariant violations stemming from concurrent modifications.

4.4. Atomic Batches

Finally, you may also submit multiple requests in batches. Batches are atomic meaning that we guarantee that either they will all succeed or all fail. This is a very useful guarantee whenever you have sophisticated data integrity requirements that must be orchestrated using more than one request.

Here is an example of how to submit a batch using the JavaScript client:

client.beginBatch();
client.documents.update(documentIdOne, "New Name for Doc 1");
client.documents.update(documentIdTwo, "New Name for Doc 2");
try {
    const result = await client.submitBatch();
    console.log("Batch success!")
    for (const response of result) {
        console.log(response);
    }
} catch (e) {
    console.error(`Batch failed: ${e}`)
}

Or in Python:

client.begin_batch()
client.documents.update(document_id_one, "New Name for Doc 1")
client.documents.update(document_id_one, "New Name for Doc 2")
try:
    result = client.submit_batch()
    print("Batch success!")
    for response in result:
        print(response)
except Exception as e:
    print(f"Batch failed: {e}")

Owing to some implementation details, no other writes may be executed while a batch is being processed, so only use them where necessary.

4.5. Trust

For all constraints which you might attempt to enforce using the three mechanisms described above, there is, of course, nothing stopping a malicious user from circumventing them and submitting changes which invalidate your formalism-specific data integrity constraints. For example, if you have a document where you want all spans to be in one-to-one correspondence with tokens, a malicious user could simply circumvent your UI code entirely and craft a malicious request to e.g. create multiple spans associated with a single token. Since there is no server-side validation for this constraint, the server will happily execute it and advance into a database state that does not violate any core data integrity constraints but does violate your formalism-specific constraints.

This is a fundamental limitation of Plaid which was deliberately adopted in order to support frontend-only development. You therefore must only grant write privileges only to users who you trust not to circumvent client-side validation guardrails. Fortunately, we think this is not an onerous imposition in most real-world circumstances.

5. Development

For information on how to work on Plaid itself (not an app which uses Plaid), see the development guide.