Plaid Query Language

Note	This is the reference for Plaid’s query language: the full clause set, the wire shape, and the engine’s limits. For an example-led introduction, see the Querying chapter of the manual.

A query searches Plaid data across every project you can read, in one request. You describe a pattern (the entities you are looking for, such as spans, tokens, relations, and vocab items, and the relationships between them), and the engine returns every match. It is exposed as POST /api/v1/query and as a query() method on the official clients. Examples here use the Python client, with a JavaScript snippet where the two differ (see Clients and casing).

1. Mental model

A query is a pattern you match against your data: you name the things you are looking for, say how they relate, and Plaid returns every combination that fits.

You declare variables (?s, ?t, …) for the entities you want.
You write a list of clauses. Each clause either describes a single entity (a span on the pos layer with value NOUN, call it ?s) or relates two of them (?s covers ?t).
All clauses must hold at once (an implicit AND); reusing a variable in two clauses joins them, so the name must stand for the same thing in both places.
You ask for some of those variables back, as ids, full entities, or a count.

Disjunction (or) and negation (not) are written as explicit clauses, and the seq shorthand covers the common "this token, then that token" case. All of them are built from the same handful of basic clauses.

from plaid_client import PlaidClient
client = PlaidClient.login(
    "http://localhost:8085", "you@example.com", "password")

# every NOUN immediately followed by a VERB, across all readable projects
result = client.query({
    "find": ["?s1", "?s2"],
    "where": [
        ["span", "?s1", {"layer": "pos", "value": "NOUN"}],
        ["span", "?s2", {"layer": "pos", "value": "VERB"}],
        ["covers", "?s1", "?t1"], ["covers", "?s2", "?t2"],
        ["precedes", "?t1", "?t2"],
    ],
})
# -> {"return": "ids", "columns": ["s1", "s2"],
#     "results": [["<noun-id>", "<verb-id>"], ...],
#     "count": N, "truncated": false}
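Since columns and each row share the same order, a response of this shape can be zipped into one dictionary per match. The sketch below works on a hand-written sample response, not a live server; rows_as_dicts is a hypothetical helper, not part of the client, and the ids are placeholders.

```python
def rows_as_dicts(result):
    """Pair each row's ids with the column names, one dict per match."""
    return [dict(zip(result["columns"], row)) for row in result["results"]]


# A hand-written response in the shape shown above (ids are placeholders).
sample = {
    "return": "ids",
    "columns": ["s1", "s2"],
    "results": [["noun-1", "verb-1"], ["noun-2", "verb-2"]],
    "count": 2,
    "truncated": False,
}

rows = rows_as_dicts(sample)
first_noun = rows[0]["s1"]  # look a column up by name instead of by position
```

Looking columns up by name keeps calling code stable if the find list is later reordered.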

2. Request shape

A query is a JSON object. The wire keys are below; the clients recase them from your language’s idiom (see Clients and casing).

Key	Required	Meaning
`find`	usually	Non-empty list of variables to return, in column order. Omitted only with an aggregate `return` (see Return shapes).
`where`	yes	Non-empty list of clauses (the pattern).
`scope`	no	Restrict to specific projects, by id (see Scope and access control). Default: every project you can read.
`limit`	no	Max rows. Default 1000, hard cap 100,000 (see Result-size guardrails).
`order-by`	no	Sort the rows (see Ordering). Without it, row order is unspecified.
`return`	no	`ids` (default), `entities`, `count`, or an aggregate spec (see Return shapes).
`bindings`	no	Substitute `?name` placeholders with literals you supply (see Bindings).

Note	For legibility, examples in this reference write a layer as a readable name (e.g. `pos`, `UPOS`) where a real query carries the layer’s UUID — layers are addressed by id only (see Layer addressing). Read `"layer": "pos"` as "the id of the layer called pos".

3. Variables

A variable is a string beginning with ?: "?s1", "?token", "?head". The name is arbitrary; what matters is that the same name in two clauses refers to the same entity (that is the join). Every variable in find must be bound by some positive clause in where. Names beginning with ?__ are reserved for the engine; a find variable may not use one.

4. Bindings

If some value, like the ID of a layer, is known before the query is executed, you may put it into the query as a literal, like the span layer ID below:

{
  "find": ["?s"],
  "where": [["span", "?s", {"layer": "0194e8a1-…-uuid", "value": "NOUN"}]]
}

However, it’s often more convenient to use bindings instead, where a variable takes the value’s place, and the variable is separately bound in a "bindings" map:

{
  "find": ["?s"],
  "where": [["span", "?s", {"layer": "?lyr", "value": "?tag"}]],
  "bindings": {"?lyr": "0194e8a1-…-uuid", "?tag": "NOUN"}
}

The two queries behave identically: before a query with bindings runs, every bound placeholder is replaced by its value, which turns the second query into the first.

A binding value is a scalar (string / number / boolean) or a non-empty list of scalars. A list in a value position matches any of them (see Value matching):

# with bindings {"?tags": ["NOUN", "PROPN"]}
["span", "?s", {"layer": "UPOS", "value": "?tags"}]

Bindings are strict. Every binding must be referenced, and a placeholder may only stand where a literal is allowed; using one where a variable is required (in find, as the entity slot of a clause, in order-by) is an error.
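The substitution step and the "every binding must be referenced" rule can be sketched in a few lines of Python. This illustrates the behavior described above and is not the engine’s implementation: expand_bindings and substitute are hypothetical helper names, and the sketch does not check that a placeholder stands only where a literal is allowed.

```python
def substitute(node, bindings, used):
    """Recursively replace bound ?placeholders with their literal values."""
    if isinstance(node, str) and node in bindings:
        used.add(node)
        return bindings[node]
    if isinstance(node, list):
        return [substitute(item, bindings, used) for item in node]
    if isinstance(node, dict):
        return {key: substitute(value, bindings, used) for key, value in node.items()}
    return node


def expand_bindings(query):
    """Return the query with bindings substituted and the bindings key dropped."""
    bindings = query.get("bindings", {})
    used = set()
    expanded = {key: substitute(value, bindings, used)
                for key, value in query.items() if key != "bindings"}
    unused = set(bindings) - used
    if unused:  # strictness: a binding nobody references is an error
        raise ValueError(f"unreferenced bindings: {sorted(unused)}")
    return expanded


with_bindings = {
    "find": ["?s"],
    "where": [["span", "?s", {"layer": "?lyr", "value": "?tag"}]],
    "bindings": {"?lyr": "0194e8a1-…-uuid", "?tag": "NOUN"},
}
expanded = expand_bindings(with_bindings)
```

Unbound names such as ?s pass through untouched, which is why ordinary variables and placeholders can share the ? prefix.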

5. Entity clauses

An entity clause is [kind, "?var", {constraints}]. The kind picks the entity type (and its table and access-control scope); the constraint map filters it.

Kind	Constraints	Notes
`span`	`layer`, `value`, `doc`, `metadata`	`value` matches the stored annotation value.
`token`	`layer`, `value`, `doc`, `begin`, `end`, `metadata`	`value` is the token’s surface text (the slice of the text body); `begin` / `end` are 0-based Unicode code-point offsets (begin inclusive, end exclusive).
`relation`	`layer`, `value`, `doc`, `source`, `target`, `metadata`	`source` / `target` may be inline variables (see Relationship clauses).
`vocab`	`layer`, `form`, `metadata`	Vocab items are global; scoped via project grants (see Scope and access control).
`document`	`name`, `id`, `metadata`	A document.
`text`	`body`, `doc`, `metadata`	A document’s text content; `body` is the raw text.

All constraints are optional, but a layer-less entity is still confined to your readable projects. Two clauses on the same variable conjoin their constraints: splitting compatible constraints across clauses combines them, while contradictory ones (a span ?s with value NOUN in one clause and VERB in another) match nothing.

["span", "?s", {"layer": "UPOS", "value": "NOUN"}]    # a NOUN annotation
["token", "?t", {"layer": "Words"}]                   # any word token
["token", "?t", {"layer": "Words", "begin": 0}]       # a word starting at 0
["relation", "?r", {"layer": "deprel", "value": "nsubj"}]  # an nsubj edge
["vocab", "?v", {"form": "Kemal"}]                    # the lexeme "Kemal"

6. Field access (dot paths)

A ?variable followed by a dot path (?t.begin) reads one of that entity’s fields. The result is a scalar, usable in a predicate, in order-by, or in an aggregate. So to keep only the tokens that begin at or after offset 5:

["token", "?t", {"layer": "Words"}],
[">=", "?t.begin", 5]

Supported names, by type:

Kind	Readable fields
`span`	`value`, `doc`, `id`, `layer`, `metadata.KEY…`
`token`	`value`, `begin`, `end`, `precedence`, `doc`, `id`, `layer`, `metadata.KEY…`
`relation`	`value`, `doc`, `id`, `layer`, `source`, `target`, `metadata.KEY…`
`vocab`	`form`, `id`, `layer`, `metadata.KEY…`
`document`	`name`, `id`, `metadata.KEY…`
`text`	`body`, `doc`, `id`, `metadata.KEY…`
layer variable	`name`, `id`, `config.KEY…`

doc is the document id and id is the entity’s own id. metadata and config go deeper: the segment after them is a key, and you can keep descending into nested objects (?s.metadata.author.name, ?sl.config.editor.color).

layer (every annotation entity — span/token/relation/vocab) and source/target (a relation’s endpoint spans) are reference fields — opaque id columns, the dot-path spelling of the layer / source / target constraint slots. Like other ids they take only =, !=, and in (no ordering), and they compare to a variable or an id, never a name: ["=", "?s.layer", "?sl"] joins to a layer variable, and ["=", "?r.source", "?sp"] is the same as the source clause. A bare layer name on the right (["=", "?s.layer", "pos"]) is a 400 — match a layer by name with the layer constraint or a layer-variable clause instead.

The field name itself (begin, value, the metadata/config word) is part of the query language, so it is matched loosely: case and -/_ are ignored, and you may spell it in whatever idiom your client uses (?t.begin and ?t.Begin are the same). The keys after metadata/config, though, are your data: they are matched exactly, case-sensitive and byte for byte (?s.metadata.caseMarker is not ?s.metadata.casemarker), mirroring how metadata and config are stored. A key containing a literal . can’t be reached with a dot path. Use the metadata constraint for those.
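As a rough illustration of these matching rules, the sketch below splits a dot path and normalizes only the field name. split_dot_path is a hypothetical helper, and lowercasing plus stripping - and _ is one assumed way to implement the loose match; the engine’s own normalization may differ in detail.

```python
def split_dot_path(path):
    """Split '?var.field.key...' into (variable, normalized field, data keys)."""
    variable, *segments = path.split(".")
    if not variable.startswith("?") or not segments:
        raise ValueError(f"not a dot path: {path!r}")
    # The field name belongs to the query language: ignore case, '-' and '_'.
    field = segments[0].lower().replace("-", "").replace("_", "")
    # Keys after metadata/config are user data: keep them byte for byte.
    return variable, field, segments[1:]


same_field = split_dot_path("?t.Begin") == split_dot_path("?t.begin")
exact_keys = split_dot_path("?s.metadata.caseMarker")[2]
```

Because the path is split on every dot, a metadata key that itself contains a dot cannot be expressed this way, which matches the limitation noted above.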

7. Constraints

A constraint slot can be filled in several ways. These forms apply wherever a constraint map appears, in entity clauses and in seq elements alike.

7.1. Value matching

A scalar constraint can be given four ways. Which forms are allowed depends on the key:

Form	Meaning	Allowed on
Literal `"NOUN"`	exact equality	any key
List `["NOUN", "PROPN"]`	any of these	`value`, `form`, `doc`, `begin`, `end`, `name`, `body`, `id`
Regex `{"regex": "^w"}`	pattern match (see Regular expressions)	`value`, `form`, `name`, `body`
Variable `{"var": "?v"}`	bind the column, not filter (see Value variables)	`value`, `form`, `begin`, `end`, `doc`

layer is not in any of these lists: it must resolve to a single layer (see Layer addressing) or hold a layer variable.

7.2. Regular expressions

A regex spec is {"regex": "<pattern>"}, optionally with "flags": "i" (case-insensitive, the only flag). Patterns are Java regular expressions, the syntax accepted by java.util.regex.Pattern. Matching is a substring search against the decoded value: a pattern matches anywhere unless you anchor it with ^ / $. Stick to common syntax (. * + ? [ ] ^ $ |) for patterns that port to other regex engines. A malformed pattern, or one longer than 512 characters, is a 400.

["span",  "?s", {"layer": "Lemma", "value": {"regex": "^walk"}}]
["vocab", "?v", {"form": {"regex": "ция$", "flags": "i"}}]
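The substring-search and flag behavior can be tried locally with Python’s re module as a stand-in for java.util.regex. The two engines agree on the common syntax listed above but differ on more exotic constructs, so treat this as an approximation; regex_matches is a hypothetical helper, not part of the client.

```python
import re

MAX_PATTERN_LENGTH = 512  # the limit stated above


def regex_matches(spec, value):
    """Approximate a regex spec: substring search, optional 'i' flag."""
    pattern = spec["regex"]
    if len(pattern) > MAX_PATTERN_LENGTH:
        raise ValueError("pattern longer than 512 characters")
    flags = re.IGNORECASE if spec.get("flags") == "i" else 0
    # re.search matches anywhere in the value unless the pattern is anchored.
    return re.search(pattern, value, flags) is not None


anchored = regex_matches({"regex": "^walk"}, "sleepwalk")    # anchored at the start
unanchored = regex_matches({"regex": "walk"}, "sleepwalk")   # found mid-string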

7.3. Value variables

Write {"var": "?v"} where a literal would go and the clause binds that column instead of filtering it. The same variable in two clauses joins them, so you can require two entities to share a value without knowing it up front:

# two DISTINCT spans on the same layer with the same value
["span", "?a", {"layer": "?L", "value": {"var": "?v"}}],
["span", "?b", {"layer": "?L", "value": {"var": "?v"}}],
["!=", "?a", "?b"]

The explicit {"var": …} wrapper is load-bearing: a plain string is always a literal, so a real value like "?x" is never mistaken for a variable. Value variables are join / comparison helpers only; they bind a value, not an entity, so they cannot appear in find.

7.4. Metadata

Any entity can carry metadata, an arbitrary JSON object. A metadata constraint names only the keys you care about and matches each one against the stored value, with these semantics:

A plain value matches by equality, JSON type included: {"score": 5} matches the number 5, not the string "5", and {"ok": true} matches the boolean, not the string "true".
A list means "any of these", a shorthand for several separate literals, not a value to match as a whole. The genre example below matches a document whose genre is the string "news" or the string "opinion"; it does not match one whose genre is the array ["news", "opinion"].
A regex runs against the value’s text: for a scalar, the decoded value (a string without its surrounding quotes, so ^[0-9]+$ matches the string "123"); for an array or object, the serialized JSON (for example ["news","opinion"]). That serialized form is the only way a regex reaches inside a non-scalar value.

Some examples:

["span", "?s", {"layer": "POS", "metadata": {"translation": "dog"}}]
["span", "?s", {"layer": "POS", "metadata": {"sense": {"regex": "^[0-9]+$"}}}]
["document", "?d", {"metadata": {"genre": ["news", "opinion"]}}]

Matching a metadata value that is itself an array or object is not well supported. If you need to do so, either restructure the metadata to remove the array/object, or match the serialized JSON with a regex. Suppose genre is stored as the array ["news", "opinion"]: a regex sees it as the text ["news","opinion"], so a pattern can pick out an element inside it. For example, to match documents whose genre array includes "news":

# genre stored as e.g. ["news", "opinion"]; match the serialized text
["document", "?d", {"metadata": {"genre": {"regex": "\"news\""}}}]

This is brittle: the regex has to match SQLite’s exact serialization (compact, no spaces, with object keys in whatever order they were stored), so reach for it only as a last resort.
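The three matching rules can be modeled on plain Python values. This is a sketch of the semantics described in this section, not the engine’s code: metadata_value_matches and json_type are hypothetical names, Python’s re stands in for Java regular expressions, and compact json.dumps output is only assumed to resemble the stored serialization.

```python
import json
import re


def json_type(value):
    """Name the JSON type of a Python value (bool is checked before int)."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return "null"


def metadata_value_matches(constraint, stored):
    if isinstance(constraint, dict) and "regex" in constraint:
        # Strings are matched without quotes; everything else as serialized JSON.
        text = stored if isinstance(stored, str) else json.dumps(
            stored, separators=(",", ":"), ensure_ascii=False)
        flags = re.IGNORECASE if constraint.get("flags") == "i" else 0
        return re.search(constraint["regex"], text, flags) is not None
    if isinstance(constraint, list):
        # A list is "any of these", never a whole-array comparison.
        return any(metadata_value_matches(item, stored) for item in constraint)
    # A plain value matches by equality, JSON type included.
    return json_type(constraint) == json_type(stored) and constraint == stored


genre_list_vs_array = metadata_value_matches(["news", "opinion"], ["news", "opinion"])
genre_regex_vs_array = metadata_value_matches({"regex": "\"news\""}, ["news", "opinion"])
```

Here genre_list_vs_array is False while genre_regex_vs_array is True, which is the asymmetry the paragraphs above describe.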

Keys are arbitrary strings, kept verbatim and case-sensitive. The lookup is indexed against the entity you have already narrowed to, so it is cheap, but metadata is free-form: treat it as a filter on a search you have already narrowed, not a primary search key.

7.5. Filtering by document

An annotation’s doc is its document’s ID. To select by document name (or any document property), bind the annotation’s doc to a value variable and equate it to a document clause:

["span", "?s", {"layer": "POS", "doc": {"var": "?dv"}}],
["document", "?d", {"name": {"regex": "^interview-"}}],
["=", "?dv", "?d"]   # the span's document_id equals this document's id

Two annotations sharing the same doc value variable means "in the same document" without a document clause at all.

8. Relationship clauses

A relationship clause is [op, "?a", "?b"], a named edge between two variables. This is where the structure between entities comes from. Each operator fixes the entity kind of both operands, shown as (kind, kind) after the signature; a variable used at the wrong kind (a span where a token is required) is a 400. Only spans, tokens, relations, and vocab items take part in these clauses; documents and texts do not.

In the diagrams below, tokens are boxed words, a bracket marks the tokens a span covers, and an arrow is a relation edge.

Coverage. ["covers", "?span", "?token"] (span, token): the span includes that token.

   ┌─ ?s ─┐
   [ Fido ] [ barks ]
      ?t

Precedence. ["precedes", "?t1", "?t2"] (token, token): ?t2 is the immediate next token after ?t1.

   [ dogs ][ run ]
      ?t1    ?t2
       └──▶──┘

["precedes*", "?t1", "?t2"] (token, token): ?t1 comes somewhere before ?t2, not just right before.

   [ the ][ big ][ dog ]
     ?t1           ?t2
      └──── … ───▶┘

"Next" and "before" follow Plaid’s standard reading order (begin, precedence, end, id), the order tokens appear in as you read a document. Both are scoped to a single text and a single token layer: a word token never precedes a morpheme token, and neither relation reaches across texts.

Nesting. ["within", "?child", "?parent"] (token, token): the extent of ?child sits inside the extent of ?parent.

   word       [  replayed  ]
   morphemes  [re][play][ed]
                   ?child

First token. ["first-in", "?token", "?container"] (token, token): ?token is within ?container and is the first token of its layer there.

   sentence  [ the dog runs ]
   words     [the][dog][runs]
              ?token

Both compare tokens by their code-point offsets, so they work across layers: naming child and parent on different token layers (morpheme within word within sentence) is how you walk a hierarchy. Nesting is non-strict, so a child whose extent exactly fills its parent still counts, but a token is never within itself. first-in also requires that no token on the same layer as ?token in the container comes before it in that same reading order.
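The token relations above can be modeled on plain dictionaries. The sketch assumes every token comes from one text, carries begin, end, precedence, id, and layer, and has a non-null precedence; the function names are hypothetical and this is a model of the described semantics, not the engine’s query plan.

```python
def reading_key(token):
    """Plaid's reading order: begin, precedence, end, id."""
    return (token["begin"], token["precedence"], token["end"], token["id"])


def precedes(a, b, tokens):
    """b is the immediate next token after a, on the same token layer."""
    if a["layer"] != b["layer"]:
        return False
    order = sorted((t for t in tokens if t["layer"] == a["layer"]), key=reading_key)
    return order.index(b) - order.index(a) == 1


def precedes_star(a, b):
    """a comes somewhere before b, on the same token layer."""
    return a["layer"] == b["layer"] and reading_key(a) < reading_key(b)


def within(child, parent):
    """Non-strict nesting by offsets; a token is never within itself."""
    return (child["id"] != parent["id"]
            and parent["begin"] <= child["begin"]
            and child["end"] <= parent["end"])


def first_in(token, container, tokens):
    """token is within container and nothing on its layer there comes earlier."""
    if not within(token, container):
        return False
    rivals = [t for t in tokens
              if t["layer"] == token["layer"] and within(t, container)]
    return all(reading_key(token) <= reading_key(t) for t in rivals)


def tok(id, layer, begin, end):
    return {"id": id, "layer": layer, "begin": begin, "end": end, "precedence": 0}


sentence = tok("s1", "Sentences", 0, 12)   # "the dog runs"
the, dog, runs = tok("w1", "Words", 0, 3), tok("w2", "Words", 4, 7), tok("w3", "Words", 8, 12)
tokens = [sentence, the, dog, runs]
```

With this data, precedes(the, dog, tokens) and first_in(the, sentence, tokens) hold, while precedes(the, runs, tokens) does not; precedes_star(the, runs) covers the non-adjacent case.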

Overlap. ["overlaps", "?a", "?b"] (span, span): the two spans share at least one token.

       [a][b][c][d]
   ?a  └───────┘
   ?b        └────┘

Containment. ["contains", "?a", "?b"] (span, span): span ?a covers every token span ?b does.

       [a][b][c]
   ?a  └───────┘
   ?b     └─┘

Coextension. ["coextensive", "?a", "?b"] (span, span): the two spans cover exactly the same tokens.

       [a][b][c]
   ?a  └───────┘
   ?b  └───────┘

All three compare spans by the set of tokens each covers, so a discontinuous span works too, and all three exclude a span from matching itself. That is why coextensive is useful: it finds two different spans over the same tokens (e.g. a POS span lined up with an NER span).
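Because all three relations reduce to comparisons of token sets, they map directly onto Python set operations. The sketch below is a model of the definitions above with made-up span and token ids; it is not how the engine evaluates them.

```python
def overlaps(a, b):
    """The two spans share at least one token."""
    return a["id"] != b["id"] and bool(a["tokens"] & b["tokens"])


def contains(a, b):
    """Span a covers every token span b does."""
    return a["id"] != b["id"] and a["tokens"] >= b["tokens"]


def coextensive(a, b):
    """The two spans cover exactly the same tokens."""
    return a["id"] != b["id"] and a["tokens"] == b["tokens"]


# Tokens are named a..d as in the diagrams; span ids are placeholders.
pos = {"id": "pos-1", "tokens": {"a", "b", "c"}}
ner = {"id": "ner-1", "tokens": {"a", "b", "c"}}
inner = {"id": "inner-1", "tokens": {"b"}}
gapped = {"id": "gap-1", "tokens": {"a", "c"}}   # a discontinuous span
tail = {"id": "tail-1", "tokens": {"c", "d"}}
```

The id check in each function is what keeps a span from matching itself, so coextensive(pos, ner) holds while coextensive(pos, pos) does not.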

Source. ["source", "?relation", "?span"] (relation, span): the span at the relation’s source end.

            ?relation
   [ Fido ] ──────▶ [ barks ]
     ?span

Target. ["target", "?relation", "?span"] (relation, span): the span at the relation’s target end.

            ?relation
   [ Fido ] ──────▶ [ barks ]
                       ?span

The two ends can be written inline on the relation clause or as separate edges; the two are equivalent:

["relation", "?r",
 {"layer": "deprel", "value": "nsubj", "source": "?h", "target": "?d"}]
# is the same as:
["relation", "?r", {"layer": "deprel", "value": "nsubj"}],
["source", "?r", "?h"],
["target", "?r", "?d"]

Reachability. ["related*", "?a", "?b", {layer}] (span, span): ?b is reached from ?a by following one or more relation edges on the named layer.

   [x] ──▶ [y] ──▶ [z]
   ?a               ?b

The trailing map is required: it names the relation layer, and may add a value each edge must carry (a literal or a list of literals only; a regex or value variable there is a 400). It follows one or more hops, never zero, so ?a related* ?a matches only through a cycle. Every span reached along the path must itself sit in a span layer within your scope, so the search never crosses into a project you cannot read.

# every node in ?head's dependency subtree, at any depth
["related*", "?head", "?node", {"layer": "dep"}]
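The "one or more hops, never zero" rule is the usual transitive closure without the reflexive step. The sketch below assumes edges are followed from source to target and represents them as (source, target) pairs already filtered to one relation layer; related_star is a hypothetical name, and the scope check on intermediate spans is not modeled.

```python
def related_star(edges, start):
    """Spans reachable from start by following one or more edges."""
    reached, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for source, target in edges:
            if source == node and target not in reached:
                reached.add(target)
                frontier.append(target)
    # start is in the result only if some path leads back to it (a cycle).
    return reached


chain = [("x", "y"), ("y", "z")]
cycle = chain + [("z", "x")]

from_x = related_star(chain, "x")
from_x_in_cycle = related_star(cycle, "x")
```

In the chain, from_x holds y and z but not x; adding the z to x edge makes x reachable from itself.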

Vocab link. ["vocab-link", "?token", "?vocab"] (token, vocab): the token is linked to that vocab item.

   [ Fido ] ──▶  vocab { form: "Fido" }
    ?token        ?vocab

9. Predicate clauses

A predicate compares two already-bound terms: [op, a, b] where op is one of =, !=, <, >, <=, >= and each term is a variable, a field path, or a literal.

["!=", "?s1", "?s2"]   # two DIFFERENT spans (compares entity ids)
["=",  "?v1", "?v2"]   # two value variables that must be equal
["<",  "?n", 5]        # a value variable bound to begin, vs a literal
["=",  "?s.value", "NOUN"]   # a field path vs a literal

On entity variables only = / != are allowed (ids have no order); this is how you say "two distinct matches" and keep an entity from matching itself.
On value variables and literals, all six operators work.
Both variable terms must be bound elsewhere in where; a predicate filters, it never introduces a variable. Predicates are not allowed inside not.

Two more predicates work on a field path (or, for in, also a variable):

["~",  "?s.value", "^N"]                      # regex match (bare pattern)
["~",  "?s.value", {"regex": "^n", "flags": "i"}]   # …with flags
["in", "?s.value", ["NOUN", "PROPN"]]         # membership / alternation

~ matches a text field (value, form, name, body, a metadata/config value, or a token’s surface value) against a regular expression. The right side is a bare pattern string or a {"regex": …, "flags": "i"?} spec (only the i — case-insensitive — flag is supported). Regexes run against the decoded text, so anchors (^, $) work. ~ on a number or id field is a 400.
in keeps rows whose left term equals one of the listed literals — the predicate form of a list constraint. The list must be non-empty literals. On a reference field the members must be ids, not names.
Like the comparison predicates, ~ and in are not allowed inside not.

9.1. Constraints, two ways

Most things you can say in an entity clause’s constraint map you can also say as a standalone predicate over a field path — the same filter, two spellings, the same result. The constraint map is the compact, common form; the predicate ("triple") form is the composable, Datalog-style one. Both are first-class; neither is preferred, and neither is going away. Pick whichever reads better.

Constraint map	Equivalent predicate form
`["span", "?s", {"value": "NOUN"}]`	`["span", "?s"], ["=", "?s.value", "NOUN"]`
`{"value": ["NOUN", "PROPN"]}`	`["in", "?s.value", ["NOUN", "PROPN"]]`
`{"value": {"regex": "^N"}}`	`["~", "?s.value", "^N"]`
`{"begin": 5}`	`["=", "?t.begin", 5]` (and `<`, `>`, …)
`{"layer": "?sl"}`	`["=", "?s.layer", "?sl"]`
`{"source": "?sp"}` (relation)	`["=", "?r.source", "?sp"]` (or the `source` clause)
`{"metadata": {"k": {"regex": "x"}}}`	`["~", "?s.metadata.k", "x"]`

The triple form’s strength is composition: a bare ["span", "?s"] plus predicates lets you filter on fields of different entities and join them, where a single constraint map only describes one entity.

The two forms return the same rows for the categorical (string) values that annotations almost always carry. The one nuance is a numeric annotation value: a predicate/field-path reads it as a number (so ["=", "?s.value", 5] and ["in", "?s.value", [5]] match a stored 5.0), while the constraint map compares the stored text ({"value": 5} does not). Stick to one form when comparing numeric annotation values.

10. Sequences (`seq`)

A seq clause walks one token layer; each element matches a token at the next position, and consecutive elements are adjacent by precedes. It is the readable form of a chain of covers + precedes.

# a determiner immediately followed by a noun, over the Words layer
["seq", {"layer": "Words"},
 ["span", {"layer": "UPOS", "value": "DET"}, "as", "?d"],
 ["span", {"layer": "UPOS", "value": "NOUN"}, "as", "?n"]]

   seq over Words: DET then NOUN
     Words  [ the ][ dog ]
              └───▶───┘
     UPOS   [DET ] [NOUN ]
             ?d      ?n

Each element is [kind, {constraints}], optionally followed by "as", "?var" to capture it. A span element matches a token that the span covers; a token element matches the sequence token directly. An element’s constraint map accepts the same keys as the standalone clause of that kind (a span element takes value / doc / metadata, a token element value / begin / end / metadata). The seq config map takes layer (required) and an optional doc to pin the whole sequence to one document.

10.1. Quantifiers

Wrap an element to repeat it:

Form	Meaning
`["?", element]`	0 or 1
`["rep", n, m, element]`	between `n` and `m` (inclusive, `0 ≤ n ≤ m ≤ 16`)

# DET, optional ADJ, NOUN
["seq", {"layer": "Words"},
 ["span", {"layer": "UPOS", "value": "DET"}, "as", "?d"],
 ["?", ["span", {"layer": "UPOS", "value": "ADJ"}]],
 ["span", {"layer": "UPOS", "value": "NOUN"}, "as", "?n"]]

   DET  ( ADJ )?  NOUN     ( )? matches ADJ 0 or 1 times
    ?d            ?n

   matches  the dog       (DET NOUN)
            the big dog   (DET ADJ NOUN)

Only fixed (non-quantified) elements may be named with "as"; a quantified element is anonymous filler. Unbounded quantifiers (*, +) are not supported; use a bounded rep. A bounded quantifier is tried at each allowed length, so the query above matches both DET NOUN and DET ADJ NOUN.
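One way to picture "tried at each allowed length" is to expand a quantified sequence into every fixed-length sequence it stands for. The sketch below does that on the element lists themselves; expand is a hypothetical helper and the engine may evaluate quantifiers differently, so read it as an illustration of which sequences match.

```python
from itertools import product

MAX_REPEAT = 16  # upper bound on rep, as stated above


def expand(elements):
    """List every fixed-length element sequence a quantified seq stands for."""
    options = []
    for element in elements:
        if element[0] == "?":                       # 0 or 1
            options.append([[], [element[1]]])
        elif element[0] == "rep":                   # between low and high
            _, low, high, inner = element
            if not 0 <= low <= high <= MAX_REPEAT:
                raise ValueError("rep bounds must satisfy 0 <= n <= m <= 16")
            options.append([[inner] * count for count in range(low, high + 1)])
        else:                                       # a fixed element
            options.append([[element]])
    return [[el for part in choice for el in part] for choice in product(*options)]


det = ["span", {"layer": "UPOS", "value": "DET"}, "as", "?d"]
adj = ["span", {"layer": "UPOS", "value": "ADJ"}]
noun = ["span", {"layer": "UPOS", "value": "NOUN"}, "as", "?n"]

alternatives = expand([det, ["?", adj], noun])
```

For the DET, optional ADJ, NOUN query, alternatives holds two sequences: DET NOUN and DET ADJ NOUN.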

11. Disjunction (`or`)

For "this OR that", use ["or", group, group, …] where each group is a list of clauses (all of which must hold together). The query matches if any group matches.

# a span whose value is NOUN or VERB
["or", [["span", "?s", {"layer": "UPOS", "value": "NOUN"}]],
       [["span", "?s", {"layer": "UPOS", "value": "VERB"}]]]

   ?s is NOUN or VERB:
          ┌─ NOUN
     ?s ──┤
          └─ VERB

Surrounding clauses (the ones outside the or) apply to every branch. Rules:

At least 2 groups, each a non-empty list of clauses.
Every find variable must be bound in every group, with the same kind in each, or it is a 400.
or may nest, and a group may contain a seq.
Each group runs as its own query and the results are combined, with duplicates removed (a row matching two groups appears once).
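The last rule amounts to a union of the per-group result sets. The sketch below shows that combination on hand-written rows; combine_groups is a hypothetical helper, and keeping first-seen order is a choice of this sketch, since without order-by the engine leaves row order unspecified.

```python
def combine_groups(*group_rows):
    """Union the rows of several groups, dropping duplicate rows."""
    seen, combined = set(), []
    for rows in group_rows:
        for row in rows:
            key = tuple(row)
            if key not in seen:
                seen.add(key)
                combined.append(list(row))
    return combined


noun_rows = [["span-1"], ["span-2"]]
verb_rows = [["span-2"], ["span-3"]]   # span-2 matched both groups

merged = combine_groups(noun_rows, verb_rows)
```

Here span-2 satisfies both groups yet appears once in merged.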

For "one field is one of several values", prefer a value list ({"value": ["NOUN", "PROPN"]}), which is simpler and faster than separate or branches.

12. Negation (`not`)

["not", clause, …] takes one or more clauses and matches when they have no joint match in your data.

# words with no NOUN annotation on them
["token", "?t", {"layer": "Words"}],
["not", ["covers", "?s", "?t"],
        ["span", "?s", {"layer": "UPOS", "value": "NOUN"}]]

   words with no covering NOUN span:
     [ runs ]       ✓ kept: no NOUN span covers it
        ?t

     ┌NOUN┐
     [ dog ]        ✗ dropped: a NOUN span covers it
        ?t

What the not means depends on which variables it shares with the rest of the query:

A variable also used outside the not is held to that binding, so the not is checked once per value. Above, ?t appears outside, so the not reads "this particular ?t has no covering NOUN span."
A variable used only inside the not is existential: "there is no such thing." Here ?s is only inside, giving "there is no NOUN span covering ?t." Such a variable stays local to the not and cannot be a find variable.

The body may itself contain or, seq, or another not.
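The shared-versus-local distinction can be seen in a small in-memory model of the "words with no NOUN annotation" query. The data and names below are made up for illustration; the token plays the role of the outer ?t and the span loop the role of the local, existential ?s.

```python
tokens = ["t-dog", "t-runs"]
spans = [
    {"id": "s1", "value": "NOUN", "covers": {"t-dog"}},
    {"id": "s2", "value": "VERB", "covers": {"t-runs"}},
]


def has_noun_cover(token_id):
    """The body of the not: some span ?s is a NOUN and covers this ?t."""
    return any(s["value"] == "NOUN" and token_id in s["covers"] for s in spans)


# ?t is bound outside the not, so the check runs once per token;
# ?s exists only inside it, so it never appears in the result.
kept = [t for t in tokens if not has_noun_cover(t)]
```

Only t-runs survives: it is covered by a span, but not by a NOUN span.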

13. Layer addressing

Wherever a clause takes a layer, a scalar reference is the layer’s ID (its UUID) — that is the only way to name a specific layer. There is no addressing by layer name or Project/Layer path: names are non-unique across a multi-tenant instance (two unrelated projects routinely share a layer name), so resolving one to a single layer is brittle, whereas an ID always identifies exactly one layer.

["span", "?s", {"layer": "98ef7a32-...-1327101afee3"}]   # by id

A non-ID reference (a name or path) is a 400. In practice this is no burden: anything that knows a layer’s name fetched it from the API and already holds its ID, so it just passes the ID. To select a layer by name, bind it with a layer variable and a *-layer clause (next section) — the match is then explicit, and you can see every layer it binds instead of a name silently resolving to one.

13.1. Layer variables

Instead of a reference, the layer slot can hold a variable ("?sl"), binding the entity’s layer as a first-class node. This does three things a reference cannot.

Same-layer join: two entities sharing a layer variable are forced onto the same (otherwise unspecified) layer:

# a NOUN and a VERB span on the SAME layer, whichever it is
["span", "?a", {"layer": "?sl", "value": "NOUN"}],
["span", "?b", {"layer": "?sl", "value": "VERB"}]

Project the layer: put a layer variable in find to get the layer ID back.

Match by name: constrain a layer variable with a *-layer clause. Since a scalar reference is an ID only, this is how you select layers by name; because it binds a variable, it transparently matches every same-named layer in your scope (across projects), rather than resolving to one:

# every NOUN span on a layer named "pos", in ALL readable projects
["span", "?s", {"layer": "?sl", "value": "NOUN"}],
["span-layer", "?sl", {"name": "pos"}]

The layer-constraint clause matches the entity’s kind: span-layer, token-layer, relation-layer, vocab-layer. It constrains by name; an unconstrained layer variable ranges over every layer of its kind in scope. A layer variable used for two different kinds is a 400 kind conflict.

13.2. Layer structure

A layer clause can also name the layer’s immutable parent layer: the edge the data model records (a token layer’s text layer and optional parent token layer, a span layer’s token layer, a relation layer’s span layer). The slot is named after the attribute and holds either a layer variable (binding the parent as a node you can constrain further or project) or a scalar layer reference (resolved to one layer), exactly like the layer slot on an entity.

Layer clause	Structural slots
`token-layer`	`text-layer`, `parent-token-layer`
`span-layer`	`token-layer`
`relation-layer`	`span-layer`

Chain the slots through shared variables to filter an entity by an ancestor layer:

# a span whose span layer's token layer is under a text layer "Transcription"
["span",        "?s",    {"layer": "?sl"}],
["span-layer",  "?sl",   {"token-layer": "?tl"}],
["token-layer", "?tl",   {"text-layer": "?txtl"}],
["text-layer",  "?txtl", {"name": "Transcription"}]

A scalar reference pins the parent in a single clause. Like the layer slot on an entity, it is a layer ID (see Layer addressing); to pin the parent by name instead, bind it as a variable and name it with a *-layer clause (the chained form above):

# the token layer's text layer, pinned by id
["token-layer", "?tl", {"text-layer": "98ef7a32-...-1327101afee3"}]

Token-layer nesting (sentence > word) uses parent-token-layer; the inverse direction (a text layer’s token layers) is written from the child side:

# the token layer named "words" whose parent layer is named "sentences"
["token-layer", "?w", {"parent-token-layer": "?s", "name": "words"}],
["token-layer", "?s", {"name": "sentences"}]

Unlike the other layer kinds, text-layer has no parent slot of its own — its parent is the project, which scope already confines. It is otherwise an ordinary layer variable (match it by name, project it in find); and like any layer variable, a structural join only ever reaches layers in projects you can read.

14. Scope and access control

A query only ever sees data in projects you can read; access control is applied server-side from your authenticated identity and cannot be widened by the request. By default the scope is all your readable projects.

Narrow it with scope, by project ID (projects, like layers, are identified by id only — project names are non-unique):

"scope": {"project_ids": ["<uuid>", "<uuid>"]}

The requested scope is intersected with what you can read, so you can only ever narrow, never widen. If that intersection is empty (you scoped only to projects you cannot read, or you can read none), the query is a 400, not an empty result. Vocab layers are global; a vocab item is visible when some project in your scope has been granted its layer.
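
The narrowing rule can be modelled in a few lines. This is a minimal sketch of the semantics described above, not the server's implementation: effective_scope and EmptyScopeError are hypothetical names, and the exception stands in for the HTTP 400.

```python
class EmptyScopeError(ValueError):
    """Hypothetical stand-in for the HTTP 400 returned on an empty scope."""


def effective_scope(readable_ids, requested_ids=None):
    """Return the project ids a query would actually run against."""
    readable = set(readable_ids)
    if requested_ids is None:
        # No "scope" in the request: default to every readable project.
        effective = readable
    else:
        # A requested scope can only narrow what is already readable.
        effective = readable & set(requested_ids)
    if not effective:
        raise EmptyScopeError("empty project scope")
    return effective


print(effective_scope({"p1", "p2"}, ["p2", "p3"]))  # {'p2'}
```

Asking for a project outside the readable set does not fail on its own; it is dropped, and only an empty intersection is an error.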

15. Ordering

Without order-by, rows come back in an unspecified order. order-by is a list of entries applied left to right; each entry is a dot path (["?t.begin"], or ["?t.begin", "desc"]) — or, equivalently, the longer [variable, attribute] / [variable, attribute, "desc"] form:

# NOUNs, sorted by document then by position in the text
client.query({
    "find": ["?t"],
    "where": [["token", "?t", {"layer": "Words"}],
              ["span",  "?s", {"layer": "POS", "value": "NOUN"}],
              ["covers", "?s", "?t"]],
    "order_by": [["?t", "doc"], ["?t", "begin"]],
})

The variable must be one you find (you can only sort by a returned column).
You can sort by any of that variable’s fields (see the Field access table for the full list per kind, e.g. a document by name, a text by body), including a metadata / config path, which is only expressible with the dotted form (["?s.metadata.score", "desc"]).
Direction is "asc" (default) or "desc". Missing values (e.g. a null precedence) always sort last, in either direction.
Ordering applies across the whole result, including or / seq queries (which Plaid merges into one result). It is not allowed with an aggregate return.
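
The missing-values rule is easy to get wrong when re-sorting rows on the client, because reversing a sorted list would move the missing values to the front. The following is a minimal sketch of the described behaviour for a single sort key, not the engine's code; sort_rows is a hypothetical helper and the rows are made up.

```python
def sort_rows(rows, column, direction="asc"):
    """Sort rows on one column index, keeping rows with a missing value last."""
    present = [row for row in rows if row[column] is not None]
    missing = [row for row in rows if row[column] is None]
    # Only the rows that have a value are affected by the direction.
    present.sort(key=lambda row: row[column], reverse=(direction == "desc"))
    return present + missing


rows = [["t1", 30], ["t2", None], ["t3", 4]]
print(sort_rows(rows, 1))          # [['t3', 4], ['t1', 30], ['t2', None]]
print(sort_rows(rows, 1, "desc"))  # [['t1', 30], ['t3', 4], ['t2', None]]
```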

16. Return shapes

Set return to choose the result shape.

16.1. IDs

The default. results is a list of rows, one ID per find variable in order; columns echoes the variable names without ?.

{"return": "ids",
 "columns": ["s1", "s2"],
 "results": [["<id>", "<id>"], ...],
 "count": 12,
 "truncated": false}
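
Because columns and each row share the same order, a result is easy to turn into one dict per match. A small sketch, assuming only the envelope shape shown above; rows_as_dicts is a hypothetical helper, not part of the client, and the ids are placeholders.

```python
def rows_as_dicts(envelope):
    """Pair each row's cells with the column names from the same envelope."""
    columns = envelope["columns"]
    return [dict(zip(columns, row)) for row in envelope["results"]]


envelope = {
    "return": "ids",
    "columns": ["s1", "s2"],
    "results": [["id-a", "id-b"], ["id-c", "id-d"]],  # placeholder ids
    "count": 2,
    "truncated": False,
}
print(rows_as_dicts(envelope))
# [{'s1': 'id-a', 's2': 'id-b'}, {'s1': 'id-c', 's2': 'id-d'}]
```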

16.2. Entities

The same envelope, but each cell is the full entity object, exactly the shape the corresponding GET endpoint returns (id, layer, value, and nested data like a span’s ordered token list or a relation’s endpoints). Hydrated in one pass, no N+1. A find column that holds a layer variable comes back as just {"id": …}.

r = client.query({..., "return": "entities"})
r["results"][0][0]
# -> {"id": "...", "layer": "...", "value": "NOUN", "tokens": ["..."], ...}

16.3. Count

A scalar count of distinct matches. It ignores limit and is computed up to 100,000; a larger result reports count: 100000, truncated: true. It counts distinct tuples of the find variables, so binding extra entities can change the number (see the fan-out note under Aggregates).

{"return": "count", "count": 38, "truncated": false}

16.4. Aggregates

Instead of a keyword, return can be an aggregate spec: group the matches by one or more variables and reduce each group.

client.query({
    "where": [["token", "?t", {"layer": "Words",
                               "doc": {"var": "?d"},
                               "begin": {"var": "?b"}}]],
    "return": {"group": ["?d"],
               "aggregates": [["count"], ["min", "?b"],
                              ["max", "?b"], ["avg", "?b"]]},
})
# -> {"return": "aggregate",
#     "columns": ["d", "count", "min_b", "max_b", "avg_b"],
#     "results": [["<doc1>", 137, 0, 980, 412.6], ...],
#     "count": 12, "truncated": false}   # count = number of group rows

Ops: count (counts matches, no source), and sum / avg / min / max over a value variable. sum / avg assume the value is numeric (a value is JSON-decoded first); min / max work on anything comparable.
Group keys are bound variables (an entity groups by its ID; a value variable groups by its value). "group": [] gives one overall row. Result columns are the group variable names followed by one column per aggregate (op for count, op_var otherwise).
Groups are not matches, so they are not subject to the 1000/100,000 row cap, but an explicit limit is honored (clamped to a backstop of 100,000 group rows); past that the result is truncated. The 30-second time limit still applies. The envelope’s top-level count is the number of group rows returned.
find and order-by are not used with an aggregate return.
When the pattern uses or / seq, every branch must bind the same set of entity and layer variables, not just the same find vars, else 400.

Grouping matters. A "match" is a binding of all the query’s variables, so a one-to-many join repeats the aggregated value once per match. If you bind extra entities (e.g. add ["covers", "?s", "?t"]), a span’s value is counted once per token it covers: sum / avg over ?s inflate, and the aggregate count reports span×token pairs, not distinct spans. (min / max are unaffected.) Keep the where pattern down to the entity you are measuring, plus pure filters, when you want faithful sum / avg / count. This is also why a group: [] count differs from return: "count": the former counts full matches, the latter counts distinct tuples of the find variables.
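
The fan-out effect can be reproduced on a toy data set. This sketch does not call Plaid; it only imitates what "one aggregated value per match" means, using made-up spans (s1 with value 10 covering three tokens, s2 with value 5 covering one).

```python
span_values = {"s1": 10, "s2": 5}  # span id -> numeric value (made up)
covers = [("s1", "t1"), ("s1", "t2"), ("s1", "t3"), ("s2", "t4")]


def aggregate(values):
    """Reduce a list of per-match values the way a group: [] row would."""
    return {
        "count": len(values),
        "sum": sum(values),
        "avg": sum(values) / len(values),
        "min": min(values),
        "max": max(values),
    }


# Pattern binds only the span: one match per span.
span_only = aggregate(list(span_values.values()))
# Pattern also binds the covered token: one match per (span, token) pair.
joined = aggregate([span_values[span] for span, _token in covers])
# What return: "count" with find ["?s"] would report: distinct find tuples.
distinct_spans = len({span for span, _token in covers})

print(span_only)       # {'count': 2, 'sum': 15, 'avg': 7.5, 'min': 5, 'max': 10}
print(joined)          # {'count': 4, 'sum': 35, 'avg': 8.75, 'min': 5, 'max': 10}
print(distinct_spans)  # 2
```

The join doubles the count and more than doubles the sum, while min and max stay the same, and the distinct count of spans is still 2.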

17. Result-size guardrails

ids / entities queries are capped by limit:

No limit: 1000 rows.
An explicit limit is honored up to a hard cap of 100,000 (a larger value is clamped). ids results are cheap; an entities result this large is slow, since each row is hydrated with a fetch.
truncated: true means you hit the effective limit and there may be more.

There is no cursor yet; to page a large result, add more selective clauses or query per project. count is bounded separately (above).
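
Querying per project can be wrapped in a small loop. This is a sketch under assumptions: query_per_project is a hypothetical helper, run_query is any callable that takes a query body and returns the ids / entities envelope (for example the client's query method), and the scope key uses the Python spelling project_ids.

```python
def query_per_project(run_query, query, project_ids):
    """Run the same query once per project and concatenate the rows.

    Returns the combined rows plus the ids of projects whose own result
    was truncated and therefore still needs a more selective pattern.
    """
    rows, truncated_in = [], []
    for project_id in project_ids:
        # Copy the query and replace any existing scope with this one project.
        body = {**query, "scope": {"project_ids": [project_id]}}
        page = run_query(body)
        rows.extend(page["results"])
        if page["truncated"]:
            truncated_in.append(project_id)
    return rows, truncated_in
```

Rows are concatenated project by project, so any order-by holds only within each project's slice, not across the combined list.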

Every query is also bounded by a 30-second time limit; a query that exceeds it is aborted with HTTP 408. If you hit it, the pattern is too broad: add a more selective clause or tighten scope.

Two structural limits bound the pattern itself: an or / seq expansion of more than 128 branches is rejected, as is clause nesting deeper than 64.

18. Errors

Author errors return HTTP 400 with a message:

{"error": "No span layer matching \"poss\" is visible in the queried project scope"}

The clients raise this as an exception (PlaidAPIError in Python). Common causes: an unbound or duplicate find variable, an unknown clause or constraint key, a layer reference that does not resolve or is ambiguous, an unbounded seq quantifier, a value variable in find, an empty project scope, or an unsupported return. Passing an as-of / time-travel parameter is also a 400: queries run against current state only.

A query that runs too long returns 408. Internal failures return 500 with a generic message (details are logged server-side, not exposed).
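
For code that talks to the HTTP endpoint directly, the three documented failure classes map onto status codes as sketched below. classify_status and its labels are hypothetical; only the status codes themselves come from this section.

```python
def classify_status(status):
    """Map an HTTP status from the query endpoint to a coarse outcome label."""
    if 200 <= status < 300:
        return "ok"
    if status == 400:
        return "author-error"  # the query itself is wrong; fix it before retrying
    if status == 408:
        return "timeout"       # pattern too broad; add a clause or tighten scope
    if status >= 500:
        return "internal"      # details are in the server log, not the response
    return "other"


print(classify_status(408))  # timeout
```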

19. Clients and casing

The official clients pass your query body through their normal request/response pipeline, which converts naming conventions automatically:

Top-level and scope keys are recased to the wire format. Python order_by / project_ids and JavaScript orderBy / projectIds both become order-by / project-ids on the wire.
Clause heads, variables, and values are plain string values, and pass through untouched: write "span", "?s1", "vocab-link", "NOUN" literally in every language. Constraint keys inside a clause (layer, value, begin, …) are single lowercase words and are identical across languages.
Response entity objects are recased and namespace-stripped, so a span comes back as {"value": "NOUN", "tokens": […]} (Python snake_case / JavaScript camelCase), not {"span/value": …}.

The same query in JavaScript; only the language idiom differs:

const result = await client.query({
  find: ['?d', '?n'],
  where: [
    ['seq', { layer: 'Words' },
      ['span', { layer: 'UPOS', value: 'DET' }, 'as', '?d'],
      ['?', ['span', { layer: 'UPOS', value: 'ADJ' }]],
      ['span', { layer: 'UPOS', value: 'NOUN' }, 'as', '?n']],
  ],
  scope: { projectIds: ['<uuid>'] },   // -> project-ids on the wire
  return: 'entities',
  limit: 20,
});
// result.results[0][0] -> { id: '...', value: 'DET', tokens: ['...'] }
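
The request-side recasing rules in this section can be modelled as follows for Python. This is an illustrative sketch of the described behaviour, not the client's actual pipeline; to_wire_key and recase_body are hypothetical names.

```python
def to_wire_key(key):
    """Turn a Python-style key (order_by) into its wire form (order-by)."""
    return key.replace("_", "-")


def recase_body(body):
    """Recase top-level and scope keys; leave clause contents untouched."""
    wire = {}
    for key, value in body.items():
        if key == "scope" and isinstance(value, dict):
            value = {to_wire_key(k): v for k, v in value.items()}
        wire[to_wire_key(key)] = value
    return wire


body = {
    "find": ["?s"],
    "where": [["span", "?s", {"layer": "my_layer"}]],
    "order_by": [["?s", "begin"]],
    "scope": {"project_ids": ["<uuid>"]},
}
print(recase_body(body))
# {'find': ['?s'], 'where': [['span', '?s', {'layer': 'my_layer'}]],
#  'order-by': [['?s', 'begin']], 'scope': {'project-ids': ['<uuid>']}}
```

The layer value my_layer keeps its underscore: only keys at the top level and inside scope are rewritten.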

20. Worked examples

These run against a Universal Dependencies project with a token hierarchy Sentences › Words › Morphemes, a UPOS span layer, and a Dependency Relations relation layer. Layers are addressed by path; swap in ids for real code.

P = "My UD Project"

# how many NOUNs?  -> {"return": "count", "count": 38, "truncated": false}
client.query({
    "find": ["?s"],
    "where": [["span", "?s", {"layer": f"{P}/UPOS", "value": "NOUN"}]],
    "return": "count",
})

# determiner immediately followed by a noun, as full entities
client.query({
    "find": ["?d", "?n"],
    "where": [["seq", {"layer": f"{P}/Morphemes"},
               ["span", {"layer": f"{P}/UPOS", "value": "DET"}, "as", "?d"],
               ["span", {"layer": f"{P}/UPOS", "value": "NOUN"}, "as", "?n"]]],
    "return": "entities",
    "limit": 20,
})

# every nsubj dependency whose dependent is sentence-initial
client.query({
    "find": ["?r"],
    "where": [
        ["relation", "?r", {"layer": f"{P}/Dependency Relations",
                            "value": "nsubj", "target": "?dep"}],
        ["covers", "?dep", "?td"],
        ["within", "?td", "?s"],
        ["token", "?s", {"layer": f"{P}/Sentences"}],
        ["first-in", "?td", "?s"],
    ],
})

# tokens linked to a vocab item
client.query({
    "find": ["?t"],
    "where": [["vocab", "?v", {"form": "Kemal"}],
              ["vocab-link", "?t", "?v"]],
})

# restrict to one project by ID
client.query({
    "find": ["?s"],
    "where": [["span", "?s", {"layer": "UPOS", "value": "NOUN"}]],
    "scope": {"project_ids": ["5b7ce985-...-6d1186cbd822"]},
})