Note: This is the reference for Plaid’s query language: the full clause set, the wire shape, and the engine’s limits. For an example-led introduction, see the Querying chapter of the manual.
A query searches Plaid data across every project you can read, in one
request. You describe a pattern (the entities you are looking for, which can be
spans, tokens, relations, or vocab items, and the relationships between them)
and the engine returns every match. It is exposed as POST /api/v1/query and as a
query() method on the official clients. Examples here use the Python client,
with a JavaScript snippet where the two differ (see Clients and casing).
1. Mental model
A query is a pattern you match against your data: you name the things you are looking for, say how they relate, and Plaid returns every combination that fits.
- You declare variables (?s, ?t, …) for the entities you want.
- You write a list of clauses. Each clause either describes a single entity (a span on the pos layer with value NOUN, call it ?s) or relates two of them (?s covers ?t).
- All clauses must hold at once (an implicit AND); reusing a variable in two clauses joins them, so the name must stand for the same thing in both places.
- You ask for some of those variables back, as ids, full entities, or a count.
Disjunction (or) and negation (not) are written as explicit clauses, and the
seq shorthand covers the common "this token, then that token" case. All of them
are built from the same handful of basic clauses.
from plaid_client import PlaidClient
client = PlaidClient.login(
"http://localhost:8085", "you@example.com", "password")
# every NOUN immediately followed by a VERB, across all readable projects
result = client.query({
"find": ["?s1", "?s2"],
"where": [
["span", "?s1", {"layer": "pos", "value": "NOUN"}],
["span", "?s2", {"layer": "pos", "value": "VERB"}],
["covers", "?s1", "?t1"], ["covers", "?s2", "?t2"],
["precedes", "?t1", "?t2"],
],
})
# -> {"return": "ids", "columns": ["s1", "s2"],
# "results": [["<noun-id>", "<verb-id>"], ...],
# "count": N, "truncated": false}
2. Request shape
A query is a JSON object. The wire keys are below; the clients recase them from your language’s idiom (see Clients and casing).
| Key | Required | Meaning |
|---|---|---|
| find | usually | Non-empty list of variables to return, in column order. Omitted only with an aggregate return. |
| where | yes | Non-empty list of clauses (the pattern). |
| scope | no | Restrict to specific projects, by id (see Scope and access control). Default: every project you can read. |
| limit | no | Max rows. Default 1000, hard cap 100,000 (see Result-size guardrails). |
| order-by | no | Sort the rows (see Ordering). Without it, row order is unspecified. |
| return | no | Result shape: ids (the default), entities, count, or an aggregate spec (see Return shapes). |
| bindings | no | Substitute literal values for placeholder variables in where (see Bindings). |
Note: For legibility, examples in this reference write a layer as a readable name
(e.g. pos, UPOS) where a real query carries the layer’s UUID, because layers are
addressed by id only (see Layer addressing). Read "layer": "pos" as "the id of
the layer called pos".
3. Variables
A variable is a string beginning with ?: "?s1", "?token", "?head".
The name is arbitrary; what matters is that the same name in two clauses refers to
the same entity (that is the join). Every variable in find must be bound by some
positive clause in where. Names beginning with ?__ are reserved for the
engine; a find variable may not use one.
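The two find rules can be checked before a query is sent. The following is a minimal sketch, not part of the client: variables_in and check_find are hypothetical helpers, and the check is deliberately loose (it treats every "?name" string as a variable, so it would also count a literal value that happens to start with ?, and it does not tell positive clauses from ones inside not).

```python
def variables_in(node):
    """Yield every "?name" string nested anywhere inside a clause list."""
    if isinstance(node, str):
        # A lone "?" is the seq quantifier head, not a variable.
        if node.startswith("?") and len(node) > 1:
            # "?t.begin" is a field path on ?t; keep only the variable part.
            yield node.split(".", 1)[0]
    elif isinstance(node, dict):
        for value in node.values():
            yield from variables_in(value)
    elif isinstance(node, (list, tuple)):
        for item in node:
            yield from variables_in(item)


def check_find(query):
    """Return a list of problems with the find variables (empty if none)."""
    bound = set(variables_in(query["where"]))
    problems = []
    for var in query.get("find", []):
        if var.startswith("?__"):
            problems.append(f"{var}: reserved prefix")
        elif var not in bound:
            problems.append(f"{var}: not bound in where")
    return problems


query = {
    "find": ["?s1", "?x", "?__tmp"],
    "where": [
        ["span", "?s1", {"layer": "pos", "value": "NOUN"}],
        ["covers", "?s1", "?t1"],
    ],
}
print(check_find(query))
# ['?x: not bound in where', '?__tmp: reserved prefix']
```

The server remains the authority; a local check like this only catches the obvious mistakes early.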
4. Bindings
If some value, like the ID of a layer, is known before the query is executed, you may put it into the query as a literal, like the span layer ID below:
{
"find": ["?s"],
"where": [["span", "?s", {"layer": "0194e8a1-…-uuid", "value": "NOUN"}]]
}
However, it’s often more convenient to use bindings instead, where a variable
takes the value’s place, and the variable is separately bound in a "bindings"
map:
{
"find": ["?s"],
"where": [["span", "?s", {"layer": "?lyr", "value": "?tag"}]],
"bindings": {"?lyr": "0194e8a1-…-uuid", "?tag": "NOUN"}
}
Bindings are purely a convenience. The two example queries behave identically: before the second one runs, every bound variable is replaced by its bound value, which turns it into the first.
A binding value is a scalar (string / number / boolean) or a non-empty
list of scalars. A list in a value position matches any of them
(see Value matching):
# with bindings {"?tags": ["NOUN", "PROPN"]}
["span", "?s", {"layer": "UPOS", "value": "?tags"}]
Bindings are strict. Every binding must be referenced, and a placeholder may only
stand where a literal is allowed; using one where a variable is required (in
find, as the entity slot of a clause, in order-by) is an error.
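To make the substitution concrete, here is a rough client-side model of it. This is a sketch under assumptions, not the engine's code: substitute_bindings is a hypothetical helper, LAYER_ID is a made-up stand-in for a real layer UUID, and the sketch only enforces the "every binding must be referenced" rule (it does not check that a placeholder sits where a literal is allowed).

```python
import copy


def substitute_bindings(query):
    """Return a copy of `query` with each placeholder in `where` replaced by its bound value."""
    bindings = query.get("bindings", {})
    used = set()

    def walk(node):
        if isinstance(node, str) and node in bindings:
            used.add(node)
            return copy.deepcopy(bindings[node])
        if isinstance(node, list):
            return [walk(item) for item in node]
        if isinstance(node, dict):
            return {key: walk(value) for key, value in node.items()}
        return node

    result = {key: value for key, value in query.items() if key != "bindings"}
    result["where"] = walk(query["where"])
    unused = sorted(set(bindings) - used)
    if unused:
        # Bindings are strict: an unreferenced binding is an error.
        raise ValueError(f"unreferenced bindings: {unused}")
    return result


LAYER_ID = "layer-id-placeholder"  # hypothetical; a real query carries a layer UUID

with_bindings = {
    "find": ["?s"],
    "where": [["span", "?s", {"layer": "?lyr", "value": "?tag"}]],
    "bindings": {"?lyr": LAYER_ID, "?tag": "NOUN"},
}
with_literals = {
    "find": ["?s"],
    "where": [["span", "?s", {"layer": LAYER_ID, "value": "NOUN"}]],
}
print(substitute_bindings(with_bindings) == with_literals)  # True
```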
5. Entity clauses
An entity clause is [kind, "?var", {constraints}]. The kind picks the entity
type (and its table and access-control scope); the constraint map filters it.
| Kind | Constraints | Notes |
|---|---|---|
| span | | |
| token | | |
| relation | | |
| vocab | | Vocab items are global; scoped via project grants (see Scope and access control). |
| document | | A document. |
| text | | A document’s text content. |
All constraints are optional, but a layer-less entity is still confined to your
readable projects. Two clauses on the same variable conjoin their constraints:
splitting compatible constraints across clauses combines them, while contradictory
ones (a span ?s with value NOUN in one clause and VERB in another) match
nothing.
["span", "?s", {"layer": "UPOS", "value": "NOUN"}] # a NOUN annotation
["token", "?t", {"layer": "Words"}] # any word token
["token", "?t", {"layer": "Words", "begin": 0}] # a word starting at 0
["relation", "?r", {"layer": "deprel", "value": "nsubj"}] # an nsubj edge
["vocab", "?v", {"form": "Kemal"}] # the lexeme "Kemal"
6. Field access (dot paths)
A ?variable followed by a dot path (?t.begin) reads one of that entity’s
fields. The result is a scalar, usable in a predicate, in
order-by, or in an aggregate. So to keep only the
tokens that begin at or after offset 5:
["token", "?t", {"layer": "Words"}],
[">=", "?t.begin", 5]
Supported names, by type:
| Kind | Readable fields |
|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
layer variable |
|
doc is the document id and id is the entity’s own id. metadata and
config go deeper: the segment after them is a key, and you can keep descending
into nested objects (?s.metadata.author.name, ?sl.config.editor.color).
layer (every annotation entity — span/token/relation/vocab) and source/target
(a relation’s endpoint spans) are reference fields — opaque id columns, the
dot-path spelling of the layer / source / target constraint slots. Like other ids they take only =, !=,
and in (no ordering), and they compare to a variable or an id, never a
name: ["=", "?s.layer", "?sl"] joins to a layer variable, and
["=", "?r.source", "?sp"] is the same as the source
clause. A bare layer name on the right (["=", "?s.layer", "pos"]) is a
400 — match a layer by name with the layer constraint or a layer-variable
clause instead.
The field name itself (begin, value, the metadata/config word) is part
of the query language, so it is matched loosely: case and -/_ are ignored,
and you may spell it in whatever idiom your client uses (?t.begin and
?t.Begin are the same). The keys after metadata/config, though, are your
data: they are matched exactly, case-sensitive and byte for byte
(?s.metadata.caseMarker is not ?s.metadata.casemarker), mirroring how
metadata and config are stored. A key containing a literal . can’t be reached
with a dot path. Use the metadata constraint for those.
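The loose-versus-exact matching rule can be illustrated with a small local model. This is only a sketch: read_path and normalize are hypothetical helpers operating on a plain dict, the sample span is invented, and unlike the engine the sketch lets you descend into any field rather than only metadata and config.

```python
def normalize(name):
    """Fold case and drop - / _ so that Begin, begin and be_gin compare equal."""
    return name.replace("-", "").replace("_", "").lower()


def read_path(entity, path):
    """Resolve a dot path such as "?s.metadata.author.name" against a plain dict."""
    _variable, field, *keys = path.split(".")
    by_loose_name = {normalize(key): value for key, value in entity.items()}
    value = by_loose_name[normalize(field)]  # field name: matched loosely
    for key in keys:
        value = value[key]                   # keys after it: matched exactly
    return value


span = {
    "begin": 5,
    "metadata": {"caseMarker": "erg", "author": {"name": "Ada"}},
}
print(read_path(span, "?s.Begin"))                 # 5
print(read_path(span, "?s.metadata.author.name"))  # Ada
print(read_path(span, "?s.metadata.caseMarker"))   # erg
# read_path(span, "?s.metadata.casemarker") raises KeyError: data keys are case-sensitive
```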
7. Constraints
A constraint slot can be filled in several ways. These forms apply wherever a
constraint map appears, in entity clauses and in seq elements alike.
7.1. Value matching
A scalar constraint can be given four ways. Which forms are allowed depends on the key:
| Form | Meaning | Allowed on |
|---|---|---|
| Literal | exact equality | any key |
| List | any of these | |
| Regex | pattern match (see Regular expressions) | |
| Variable | bind the column, not filter (see Value variables) | |
layer is not in any of these lists: it must resolve to a single layer (see
Layer addressing) or hold a layer variable.
7.2. Regular expressions
A regex spec is {"regex": "<pattern>"}, optionally with "flags": "i"
(case-insensitive, the only flag). Patterns are Java regular expressions, the
syntax accepted by
java.util.regex.Pattern.
Matching is a substring search against the decoded value: a pattern matches
anywhere unless you anchor it with ^ / $. Stick to common syntax
(. * + ? [ ] ^ $ |) for patterns that port to other regex engines. A malformed
pattern, or one longer than 512 characters, is a 400.
["span", "?s", {"layer": "Lemma", "value": {"regex": "^walk"}}]
["vocab", "?v", {"form": {"regex": "ция$", "flags": "i"}}]
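The substring-versus-anchored behaviour is easy to try locally. The sketch below uses Python's re module as a stand-in for the engine's Java regular expressions, which is an assumption that holds only for the common syntax listed above; regex_matches is a hypothetical helper.

```python
import re

MAX_PATTERN_LENGTH = 512  # the limit stated above


def regex_matches(spec, value):
    """Apply a {"regex": ..., "flags": "i"} spec to a decoded string value."""
    pattern = spec["regex"]
    if len(pattern) > MAX_PATTERN_LENGTH:
        raise ValueError("pattern longer than 512 characters")
    flags = re.IGNORECASE if spec.get("flags") == "i" else 0
    # re.search looks for the pattern anywhere in the value (substring search).
    return re.search(pattern, value, flags) is not None


print(regex_matches({"regex": "walk"}, "sleepwalking"))      # True: matches anywhere
print(regex_matches({"regex": "^walk"}, "sleepwalking"))     # False: anchored at the start
print(regex_matches({"regex": "^walk"}, "walked"))           # True
print(regex_matches({"regex": "^n", "flags": "i"}, "NOUN"))  # True: case-insensitive
```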
7.3. Value variables
Write {"var": "?v"} where a literal would go and the clause binds that column
instead of filtering it. The same variable in two clauses joins them, so you can
require two entities to share a value without knowing it up front:
# two DISTINCT spans on the same layer with the same value
["span", "?a", {"layer": "?L", "value": {"var": "?v"}}],
["span", "?b", {"layer": "?L", "value": {"var": "?v"}}],
["!=", "?a", "?b"]
The explicit {"var": …} wrapper is load-bearing: a plain string is always a
literal, so a real value like "?x" is never mistaken for a variable. Value
variables are join / comparison helpers only; they bind a value, not an entity, so
they cannot appear in find.
7.4. Metadata
Any entity can carry metadata, an arbitrary JSON object, and a query usually needs to constrain only some of its keys. Each key in a metadata constraint is matched on its own, by these rules:
- A plain value matches by equality, JSON type included: {"score": 5} matches the number 5, not the string "5", and {"ok": true} matches the boolean, not the string "true".
- A list means "any of these", a shorthand for several separate literals, not a value to match as a whole. The genre example matches a document whose genre is the string "news" or the string "opinion"; it does not match one whose genre is the array ["news", "opinion"].
- A regex runs against the value’s text: for a scalar, the decoded value (a string without its surrounding quotes, so ^[0-9]+$ matches the string "123"); for an array or object, the serialized JSON (for example ["news","opinion"]). That serialized form is the only way a regex reaches inside a non-scalar value.
Some examples:
["span", "?s", {"layer": "POS", "metadata": {"translation": "dog"}}]
["span", "?s", {"layer": "POS", "metadata": {"sense": {"regex": "^[0-9]+$"}}}]
["document", "?d", {"metadata": {"genre": ["news", "opinion"]}}]
Matching a metadata value that is itself an array or object is not well
supported. If you need to do so, either restructure the metadata to remove
the array/object, or match the serialized JSON with a regex. Suppose genre is
stored as the array ["news", "opinion"]: a regex sees it as the text
["news","opinion"], so a pattern can pick out an element inside it. For example,
to match documents whose genre array includes "news":
# genre stored as e.g. ["news", "opinion"]; match the serialized text
["document", "?d", {"metadata": {"genre": {"regex": "\"news\""}}}]
This is brittle: the regex has to match SQLite’s exact serialization (compact, no spaces, with object keys in whatever order they were stored), so reach for it only as a last resort.
Keys are arbitrary strings, kept verbatim and case-sensitive. The lookup is indexed against the entity you have already narrowed to, so it is cheap, but metadata is free-form: treat it as a filter on a search you have already narrowed, not a primary search key.
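The three matching rules can be modelled in a few lines. This is a sketch under assumptions: metadata_matches and same_scalar are hypothetical helpers, Python's re stands in for the engine's regex dialect, and json.dumps with compact separators stands in for the stored serialization, which the engine does not promise to match byte for byte.

```python
import json
import re


def same_scalar(a, b):
    """Equality that respects JSON types: 5 != "5" and True != "true"."""
    for json_type in (bool, str):
        if isinstance(a, json_type) or isinstance(b, json_type):
            return isinstance(a, json_type) and isinstance(b, json_type) and a == b
    return a == b


def metadata_matches(constraint, stored):
    """Check one metadata constraint (literal, list, or regex spec) against one stored value."""
    if isinstance(constraint, dict) and "regex" in constraint:
        if isinstance(stored, str):
            text = stored  # a string is matched without its quotes
        else:
            text = json.dumps(stored, separators=(",", ":"), ensure_ascii=False)
        flags = re.IGNORECASE if constraint.get("flags") == "i" else 0
        return re.search(constraint["regex"], text, flags) is not None
    if isinstance(constraint, list):
        # A list is "any of these", never a value to match as a whole.
        return any(same_scalar(option, stored) for option in constraint)
    return same_scalar(constraint, stored)


print(metadata_matches(5, 5))                                       # True
print(metadata_matches(5, "5"))                                     # False
print(metadata_matches(["news", "opinion"], "news"))                # True
print(metadata_matches(["news", "opinion"], ["news", "opinion"]))   # False
print(metadata_matches({"regex": "^[0-9]+$"}, "123"))               # True
print(metadata_matches({"regex": "\"news\""}, ["news", "opinion"])) # True
```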
7.5. Filtering by document
An annotation’s doc is its document’s ID. To select by document name (or any
document property), bind the annotation’s doc to a value variable and equate it
to a document clause:
["span", "?s", {"layer": "POS", "doc": {"var": "?dv"}}],
["document", "?d", {"name": {"regex": "^interview-"}}],
["=", "?dv", "?d"] # the span's document_id equals this document's id
Two annotations sharing the same doc value variable means "in the same document"
without a document clause at all.
8. Relationship clauses
A relationship clause is [op, "?a", "?b"], a named edge between two variables.
This is where the structure between entities comes from. Each operator fixes the
entity kind of both operands, shown as (kind, kind) after the signature; a
variable used at the wrong kind (a span where a token is required) is a 400. Only
spans, tokens, relations, and vocab items take part in these clauses; documents
and texts do not.
In the diagrams below, tokens are boxed words, a bracket marks the tokens a span covers, and an arrow is a relation edge.
Coverage. ["covers", "?span", "?token"] (span, token): the span
includes that token.
┌─ ?s ─┐
[ Fido ] [ barks ]
?t
Precedence. ["precedes", "?t1", "?t2"] (token, token): ?t2 is the
immediate next token after ?t1.
[ dogs ][ run ]
?t1 ?t2
└──▶──┘
["precedes*", "?t1", "?t2"] (token, token): ?t1 comes somewhere before
?t2, not just right before.
[ the ][ big ][ dog ]
?t1 ?t2
└──── … ───▶┘
"Next" and "before" follow Plaid’s standard reading order (begin, precedence,
end, id), the order tokens appear in as you read a document. Both are scoped to a
single text and a single token layer: a word token never precedes a morpheme
token, and neither relation reaches across texts.
Nesting. ["within", "?child", "?parent"] (token, token): the extent of
?child sits inside the extent of ?parent.
word [ replayed ]
morphemes [re][play][ed]
?child
First token. ["first-in", "?token", "?container"] (token, token):
?token is within ?container and is the first token of its layer there.
sentence [ the dog runs ]
words [the][dog][runs]
?token
Both compare tokens by their code-point offsets, so they work across layers: naming
child and parent on different token layers (morpheme within word within sentence)
is how you walk a hierarchy. Nesting is non-strict, so a child whose extent
exactly fills its parent still counts, but a token is never within itself.
first-in also requires that no token on the same layer as ?token in the
container comes before it in that same reading order.
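As a rough illustration of the offset comparison, the sketch below models within and first-in over toy tokens. Everything here is hypothetical (the Token tuple, the offsets, the helper names), and the reading order is simplified to (begin, end, id) because the toy tokens carry no precedence field.

```python
from collections import namedtuple

Token = namedtuple("Token", "id layer begin end")


def within(child, parent):
    """Non-strict containment by offsets; a token is never within itself."""
    return (child.id != parent.id
            and parent.begin <= child.begin
            and child.end <= parent.end)


def first_in(token, container, all_tokens):
    """`token` is within `container` and no same-layer token there reads earlier."""
    if not within(token, container):
        return False
    rivals = [t for t in all_tokens
              if t.layer == token.layer and within(t, container)]
    # Simplified reading order: (begin, end, id), with no precedence field.
    return min(rivals, key=lambda t: (t.begin, t.end, t.id)).id == token.id


sentence = Token("s1", "Sentences", 0, 12)   # "the dog runs"
the = Token("w1", "Words", 0, 3)
dog = Token("w2", "Words", 4, 7)
runs = Token("w3", "Words", 8, 12)
tokens = [sentence, the, dog, runs]

print(within(dog, sentence))            # True
print(within(sentence, sentence))       # False: never within itself
print(first_in(the, sentence, tokens))  # True
print(first_in(dog, sentence, tokens))  # False
```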
Overlap. ["overlaps", "?a", "?b"] (span, span): the two spans share at
least one token.
   [a][b][c][d]
?x └───────┘
?y       └────┘
Containment. ["contains", "?a", "?b"] (span, span): span ?a covers
every token span ?b does.
   [a][b][c]
?a └───────┘
?b    └─┘
Coextension. ["coextensive", "?a", "?b"] (span, span): the two spans
cover exactly the same tokens.
   [a][b][c]
?x └───────┘
?y └───────┘
All three compare spans by the set of tokens each covers, so a discontinuous span
works too, and all three exclude a span from matching itself. That is why
coextensive is useful: it finds two different spans over the same tokens (e.g. a
POS span lined up with an NER span).
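Because all three operators reduce to comparisons of token sets, they can be modelled with plain Python sets. The sketch below is illustrative only: the span ids and token ids are invented, and contains is modelled as a non-strict superset test, which is an assumption drawn from the wording above.

```python
# span id -> ids of the tokens it covers (hypothetical toy data)
COVERED = {
    "pos": frozenset({"a", "b", "c"}),
    "ner": frozenset({"a", "b", "c"}),
    "tail": frozenset({"c", "d"}),
    "gap": frozenset({"a", "c"}),  # a discontinuous span
}


def overlaps(x, y):
    """The two spans share at least one token."""
    return x != y and bool(COVERED[x] & COVERED[y])


def contains(x, y):
    """Span x covers every token span y does."""
    return x != y and COVERED[x] >= COVERED[y]


def coextensive(x, y):
    """The two spans cover exactly the same tokens."""
    return x != y and COVERED[x] == COVERED[y]


print(overlaps("pos", "tail"))    # True: both cover token c
print(contains("pos", "gap"))     # True: works for a discontinuous span
print(contains("pos", "tail"))    # False: token d is not covered by pos
print(coextensive("pos", "ner"))  # True: two different spans, same tokens
print(coextensive("pos", "pos"))  # False: a span never matches itself
```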
Source. ["source", "?relation", "?span"] (relation, span): the span at
the relation’s source end.
?relation
[ Fido ] ──────▶ [ barks ]
?span
Target. ["target", "?relation", "?span"] (relation, span): the span at
the relation’s target end.
?relation
[ Fido ] ──────▶ [ barks ]
?span
The two ends can be written inline on the relation clause or as separate edges;
the two are equivalent:
["relation", "?r",
{"layer": "deprel", "value": "nsubj", "source": "?h", "target": "?d"}]
# is the same as:
["relation", "?r", {"layer": "deprel", "value": "nsubj"}],
["source", "?r", "?h"],
["target", "?r", "?d"]
Reachability. ["related*", "?a", "?b", {layer}] (span, span): ?b is
reached from ?a by following one or more relation edges on the named layer.
[x] ──▶ [y] ──▶ [z]
?a              ?b
The trailing map is required: it names the relation layer, and may add a
value each edge must carry (a literal or a list of literals only; a regex or
value variable there is a 400). It follows one or more
hops, never zero, so ?a related* ?a matches only through a cycle. Every span
reached along the path must itself sit in a span layer within your scope, so the
search never crosses into a project you cannot read.
# every node in ?head's dependency subtree, at any depth
["related*", "?head", "?node", {"layer": "dep"}]
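The "one or more hops, never zero" rule is the usual transitive closure. Here is a minimal sketch of it over toy edges; reachable is a hypothetical helper, edges are assumed to be followed from source to target, and the layer, value, and scope checks the engine applies are left out.

```python
from collections import deque


def reachable(start, edges):
    """Spans reached from `start` by following one or more (source, target) edges."""
    successors = {}
    for source, target in edges:
        successors.setdefault(source, []).append(target)
    found = set()
    queue = deque(successors.get(start, []))  # seeding with the first hop excludes zero hops
    while queue:
        node = queue.popleft()
        if node not in found:
            found.add(node)
            queue.extend(successors.get(node, []))
    return found


chain = [("x", "y"), ("y", "z")]
print(sorted(reachable("x", chain)))  # ['y', 'z']
print("x" in reachable("x", chain))   # False: zero hops never match

cycle = [("a", "b"), ("b", "a")]
print("a" in reachable("a", cycle))   # True: reached again through the cycle
```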
Vocab link. ["vocab-link", "?token", "?vocab"] (token, vocab): the
token is linked to that vocab item.
[ Fido ] ──▶ vocab { form: "Fido" }
?token ?v
9. Predicate clauses
A predicate compares two already-bound terms: [op, a, b] where op is one of
=, !=, <, >, <=, >= and each term is a variable, a
field path, or a literal.
["!=", "?s1", "?s2"] # two DIFFERENT spans (compares entity ids)
["=", "?v1", "?v2"] # two value variables that must be equal
["<", "?n", 5] # a value variable bound to begin, vs a literal
["=", "?s.value", "NOUN"] # a field path vs a literal
- On entity variables only = / != are allowed (ids have no order); this is how you say "two distinct matches" and keep an entity from matching itself.
- On value variables and literals, all six operators work.
- Both variable terms must be bound elsewhere in where; a predicate filters, it never introduces a variable. Predicates are not allowed inside not.
Two more predicates work on a field path (or, for in, also a variable):
["~", "?s.value", "^N"] # regex match (bare pattern)
["~", "?s.value", {"regex": "^n", "flags": "i"}] # …with flags
["in", "?s.value", ["NOUN", "PROPN"]] # membership / alternation
- ~ matches a text field (value, form, name, body, a metadata/config value, or a token’s surface value) against a regular expression. The right side is a bare pattern string or a {"regex": …, "flags": "i"?} spec (only the i flag, case-insensitive, is supported). Regexes run against the decoded text, so anchors (^, $) work. ~ on a number or id field is a 400.
- in keeps rows whose left term equals one of the listed literals; it is the predicate form of a list constraint. The list must be a non-empty list of literals. On a reference field the members must be ids, not names.
- Like the comparison predicates, ~ and in are not allowed inside not.
9.1. Constraints, two ways
Most things you can say in an entity clause’s constraint map you can also say as a standalone predicate over a field path — the same filter, two spellings, the same result. The constraint map is the compact, common form; the predicate ("triple") form is the composable, Datalog-style one. Both are first-class; neither is preferred, and neither is going away. Pick whichever reads better.
| Constraint map | Equivalent predicate form |
|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The triple form’s strength is composition: a bare ["span", "?s"] plus
predicates lets you filter on fields of different entities and join them, where
a single constraint map only describes one entity.
The two forms return the same rows for the categorical (string) values that
annotations almost always carry. The one nuance is a numeric annotation value:
a predicate/field-path reads it as a number (so ["=", "?s.value", 5] and
["in", "?s.value", [5]] match a stored 5.0), while the constraint map
compares the stored text ({"value": 5} does not). Stick to one form when
comparing numeric annotation values.
10. Sequences (seq)
A seq clause walks one token layer; each element matches a token at the next
position, and consecutive elements are adjacent by precedes. It is the readable
form of a chain of covers + precedes.
# a determiner immediately followed by a noun, over the Words layer
["seq", {"layer": "Words"},
["span", {"layer": "UPOS", "value": "DET"}, "as", "?d"],
["span", {"layer": "UPOS", "value": "NOUN"}, "as", "?n"]]
seq over Words: DET then NOUN
Words [ the ][ dog ]
└───▶───┘
UPOS [DET ] [NOUN ]
?d ?n
Each element is [kind, {constraints}], optionally followed by "as", "?var" to
capture it. A span element matches a token that the span covers; a token
element matches the sequence token directly. An element’s constraint map accepts
the same keys as the standalone clause of that kind (a span element takes
value / doc / metadata, a token element value / begin / end / metadata). The
seq config map takes layer (required) and an optional doc to pin the whole
sequence to one document.
10.1. Quantifiers
Wrap an element to repeat it:
| Form | Meaning |
|---|---|
|
0 or 1 |
|
between |
# DET, optional ADJ, NOUN
["seq", {"layer": "Words"},
["span", {"layer": "UPOS", "value": "DET"}, "as", "?d"],
["?", ["span", {"layer": "UPOS", "value": "ADJ"}]],
["span", {"layer": "UPOS", "value": "NOUN"}, "as", "?n"]]
DET ( ADJ )? NOUN ( )? matches ADJ 0 or 1 times
?d ?n
matches the dog (DET NOUN)
the big dog (DET ADJ NOUN)
Only fixed (non-quantified) elements may be named with "as"; a quantified
element is anonymous filler. Unbounded quantifiers (*, +) are not supported;
use a bounded rep. A bounded quantifier is tried at each allowed length, so the
query above matches both DET NOUN and DET ADJ NOUN.
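"Tried at each allowed length" can be pictured as listing the concrete sequences a pattern permits. The sketch below does this for the ["?", element] form only; expand_optional is a hypothetical helper and says nothing about how the engine actually evaluates a seq.

```python
from itertools import product


def expand_optional(elements):
    """List every concrete element sequence a seq with ["?", element] wrappers allows."""
    choices = []
    for element in elements:
        if element[0] == "?":
            choices.append([[], [element[1]]])  # absent, or present once
        else:
            choices.append([[element]])         # fixed element: exactly once
    return [[el for part in combo for el in part] for combo in product(*choices)]


elements = [
    ["span", {"layer": "UPOS", "value": "DET"}, "as", "?d"],
    ["?", ["span", {"layer": "UPOS", "value": "ADJ"}]],
    ["span", {"layer": "UPOS", "value": "NOUN"}, "as", "?n"],
]
for sequence in expand_optional(elements):
    print([element[1]["value"] for element in sequence])
# ['DET', 'NOUN']
# ['DET', 'ADJ', 'NOUN']
```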
11. Disjunction (or)
For "this OR that", use ["or", group, group, …] where each group is a list of
clauses (all of which must hold together). The query matches if any group matches.
# a token tagged NOUN or VERB
["or", [["span", "?s", {"layer": "UPOS", "value": "NOUN"}]],
[["span", "?s", {"layer": "UPOS", "value": "VERB"}]]]
?s is NOUN or VERB:
┌─ NOUN
?s ──┤
└─ VERB
Surrounding clauses (the ones outside the or) apply to every branch. Rules:
- At least 2 groups, each a non-empty list of clauses.
- Every find variable must be bound in every group, with the same kind in each, or it is a 400.
- or may nest, and a group may contain a seq.
- Each group runs as its own query and the results are combined, with duplicates removed (a row matching two groups appears once).
For "one field is one of several values", prefer a value list
({"value": ["NOUN", "PROPN"]}), which is simpler and faster than separate or
branches.
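The "combined, with duplicates removed" step amounts to a set union over rows. A minimal sketch with invented row data follows; combine_branches is a hypothetical helper, and the order it produces is an artifact of the sketch, since without order-by the engine's row order is unspecified.

```python
def combine_branches(*branch_rows):
    """Concatenate rows from each or-group, keeping the first copy of each row."""
    seen = set()
    combined = []
    for rows in branch_rows:
        for row in rows:
            key = tuple(row)
            if key not in seen:
                seen.add(key)
                combined.append(row)
    return combined


noun_branch = [["s1"], ["s2"]]
verb_branch = [["s2"], ["s3"]]  # s2 matched both groups
print(combine_branches(noun_branch, verb_branch))  # [['s1'], ['s2'], ['s3']]
```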
12. Negation (not)
["not", clause, …] takes one or more clauses and matches when they have no
joint match in your data.
# words with no NOUN annotation on them
["token", "?t", {"layer": "Words"}],
["not", ["covers", "?s", "?t"],
["span", "?s", {"layer": "UPOS", "value": "NOUN"}]]
words with no covering NOUN span:
[ runs ] ✓ kept: no NOUN span covers it
?t
┌NOUN┐
[ dog ] ✗ dropped: a NOUN span covers it
?t
What the not means depends on which variables it shares with the rest of the
query:
- A variable also used outside the not is held to that binding, so the not is checked once per value. Above, ?t appears outside, so the not reads "this particular ?t has no covering NOUN span."
- A variable used only inside the not is existential: "there is no such thing." Here ?s is only inside, giving "there is no NOUN span covering ?t." Such a variable stays local to the not and cannot be a find variable.
The body may itself contain or, seq, or another not.
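The difference between the outer and the inner variable maps directly onto a loop and an any(). The sketch below replays the "words with no NOUN annotation" query over invented toy data; it is a model of the semantics, not of the engine.

```python
tokens = ["dog", "runs"]
spans = [
    {"value": "NOUN", "covers": {"dog"}},
    {"value": "VERB", "covers": {"runs"}},
]


def has_noun_span(token):
    # ?s exists only inside the not: "is there ANY NOUN span covering this token?"
    return any(s["value"] == "NOUN" and token in s["covers"] for s in spans)


# ?t is bound outside the not, so the check runs once per token.
kept = [t for t in tokens if not has_noun_span(t)]
print(kept)  # ['runs']
```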
13. Layer addressing
Wherever a clause takes a layer, a scalar reference is the layer’s ID (its
UUID) — that is the only way to name a specific layer. There is no addressing by
layer name or Project/Layer path: names are non-unique across a
multi-tenant instance (two unrelated projects routinely share a layer name), so
resolving one to a single layer is brittle, whereas an ID always identifies exactly
one layer.
["span", "?s", {"layer": "98ef7a32-...-1327101afee3"}] # by id
A non-ID reference (a name or path) is a 400. In practice this is no burden:
anything that knows a layer’s name fetched it from the API and already holds its ID,
so it just passes the ID. To select a layer by name, bind it with a layer variable
and a *-layer clause (next section) — the match is then explicit, and you can see
every layer it binds instead of a name silently resolving to one.
13.1. Layer variables
Instead of a reference, the layer slot can hold a variable ("?sl"), binding
the entity’s layer as a first-class node. This does three things a reference
cannot.
Same-layer join: two entities sharing a layer variable are forced onto the same (otherwise unspecified) layer:
# a NOUN and a VERB span on the SAME layer, whichever it is
["span", "?a", {"layer": "?sl", "value": "NOUN"}],
["span", "?b", {"layer": "?sl", "value": "VERB"}]
Project the layer: put a layer variable in find to get the layer ID back.
Match by name: constrain a layer variable with a *-layer clause. Since a scalar
reference is an ID only, this is how you select layers by name; and because it
binds a variable, it transparently matches every same-named layer in your scope
(across projects), rather than resolving to one:
# every NOUN span on a layer named "pos", in ALL readable projects
["span", "?s", {"layer": "?sl", "value": "NOUN"}],
["span-layer", "?sl", {"name": "pos"}]
The layer-constraint clause matches the entity’s kind: span-layer,
token-layer, relation-layer, vocab-layer. It constrains by name;
an unconstrained layer variable ranges over every layer of its kind in
scope. A layer variable used for two different kinds is a 400 kind conflict.
13.2. Layer structure
A layer clause can also name the layer’s immutable parent layer: the edge
the data model records (a token layer’s text layer and optional parent token
layer, a span layer’s token layer, a relation layer’s span layer). The slot is
named after the attribute and holds either a layer variable (binding the parent
as a node you can constrain further or project) or a scalar layer reference
(resolved to one layer), exactly like the layer slot on an entity.
| Layer clause | Structural slots |
|---|---|
| token-layer | text-layer, parent-token-layer |
| span-layer | token-layer |
| relation-layer | span-layer |
Chain the slots through shared variables to filter an entity by an ancestor layer:
# a span whose span layer's token layer is under a text layer "Transcription"
["span", "?s", {"layer": "?sl"}],
["span-layer", "?sl", {"token-layer": "?tl"}],
["token-layer", "?tl", {"text-layer": "?txtl"}],
["text-layer", "?txtl", {"name": "Transcription"}]
A scalar reference pins the parent in a single clause. Like the layer slot on an
entity, it is a layer ID (see Layer addressing); to pin the parent by name
instead, bind it as a variable and name it with a *-layer clause (the chained form
above):
# the token layer's text layer, pinned by id
["token-layer", "?tl", {"text-layer": "98ef7a32-...-1327101afee3"}]
Token-layer nesting (sentence > word) uses parent-token-layer; the inverse
direction (a text layer’s token layers) is written from the child side:
# the token layer named "words" whose parent layer is named "sentences"
["token-layer", "?w", {"parent-token-layer": "?s", "name": "words"}],
["token-layer", "?s", {"name": "sentences"}]
Unlike the other layer kinds, text-layer has no parent slot of its own — its
parent is the project, which scope already confines. It is otherwise an ordinary
layer variable (match it by name, project it in find); and like any
layer variable, a structural join only ever reaches layers in projects you can read.
14. Scope and access control
A query only ever sees data in projects you can read; access control is applied server-side from your authenticated identity and cannot be widened by the request. By default the scope is all your readable projects.
Narrow it with scope, by project ID (projects, like layers, are identified by
id only — project names are non-unique):
"scope": {"project_ids": ["<uuid>", "<uuid>"]}
The requested scope is intersected with what you can read, so you can only ever narrow, never widen. If that intersection is empty (you scoped only to projects you cannot read, or you can read none), the query is a 400, not an empty result. Vocab layers are global; a vocab item is visible when some project in your scope has been granted its layer.
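The narrowing rule is a set intersection with one twist: an empty result is an error rather than an empty answer. A small sketch of that logic, with effective_scope as a hypothetical helper and made-up project ids:

```python
def effective_scope(readable, requested=None):
    """Intersect the requested projects with the readable ones; empty is an error."""
    readable = set(readable)
    scope = readable if requested is None else readable & set(requested)
    if not scope:
        # The server answers this case with a 400, not with zero rows.
        raise ValueError("empty project scope")
    return scope


readable = ["p1", "p2", "p3"]
print(sorted(effective_scope(readable)))                # ['p1', 'p2', 'p3']
print(sorted(effective_scope(readable, ["p2", "p9"])))  # ['p2']: p9 is silently dropped
# effective_scope(readable, ["p9"]) raises ValueError: nothing readable is left
```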
15. Ordering
Without order-by, rows come back in an unspecified order. order-by is a list of
entries applied left to right; each entry is a dot path
(["?t.begin"], or ["?t.begin", "desc"]) — or, equivalently, the longer
[variable, attribute] / [variable, attribute, "desc"] form:
# NOUNs, sorted by document then by position in the text
client.query({
"find": ["?t"],
"where": [["token", "?t", {"layer": "Words"}],
["span", "?s", {"layer": "POS", "value": "NOUN"}],
["covers", "?s", "?t"]],
"order_by": [["?t", "doc"], ["?t", "begin"]],
})
- The variable must be one you find (you can only sort by a returned column).
- You can sort by any of that variable’s fields (see the Field access table for the full list per kind, e.g. a document by name, a text by body), including a metadata/config path, which is only expressible with the dotted form (["?s.metadata.score", "desc"]).
- Direction is "asc" (default) or "desc". Missing values (e.g. a null precedence) always sort last, in either direction.
- Ordering applies across the whole result, including or/seq queries (which Plaid merges into one result). It is not allowed with an aggregate return.
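"Missing values always sort last" differs from a plain reversed sort, where missing values would flip to the front. The sketch below shows the behaviour described above on invented rows; order_rows is a hypothetical helper.

```python
def order_rows(rows, field, direction="asc"):
    """Sort rows by one field, keeping rows with a missing value at the end."""
    present = [row for row in rows if row.get(field) is not None]
    missing = [row for row in rows if row.get(field) is None]
    present.sort(key=lambda row: row[field], reverse=(direction == "desc"))
    return present + missing


rows = [
    {"id": "t1", "precedence": 2},
    {"id": "t2", "precedence": None},
    {"id": "t3", "precedence": 1},
]
print([row["id"] for row in order_rows(rows, "precedence")])          # ['t3', 't1', 't2']
print([row["id"] for row in order_rows(rows, "precedence", "desc")])  # ['t1', 't3', 't2']
```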
16. Return shapes
Set return to choose the result shape.
16.1. IDs
The default. results is a list of rows, one ID per find variable in order;
columns echoes the variable names without ?.
{"return": "ids",
"columns": ["s1", "s2"],
"results": [["<id>", "<id>"], ...],
"count": 12,
"truncated": false}
16.2. Entities
The same envelope, but each cell is the full entity object, exactly the shape
the corresponding GET endpoint returns (id, layer, value, and nested data like a
span’s ordered token list or a relation’s endpoints). Hydrated in one pass, no
N+1. A find column that holds a layer variable comes back as just {"id": …}.
r = client.query({..., "return": "entities"})
r["results"][0][0]
# -> {"id": "...", "layer": "...", "value": "NOUN", "tokens": ["..."], ...}
16.3. Count
A scalar count of distinct matches. It ignores limit and is computed up to
100,000; a larger result reports count: 100000, truncated: true. It counts
distinct tuples of the find variables, so binding extra entities can change the
number (see the fan-out note under Aggregates).
{"return": "count", "count": 38, "truncated": false}
16.4. Aggregates
Instead of a keyword, return can be an aggregate spec: group the matches by
one or more variables and reduce each group.
client.query({
"where": [["token", "?t", {"layer": "Words",
"doc": {"var": "?d"},
"begin": {"var": "?b"}}]],
"return": {"group": ["?d"],
"aggregates": [["count"], ["min", "?b"],
["max", "?b"], ["avg", "?b"]]},
})
# -> {"return": "aggregate",
# "columns": ["d", "count", "min_b", "max_b", "avg_b"],
# "results": [["<doc1>", 137, 0, 980, 412.6], ...],
# "count": 12, "truncated": false} # count = number of group rows
- Ops: count (counts matches, no source), and sum / avg / min / max over a value variable. sum / avg assume the value is numeric (a value is JSON-decoded first); min / max work on anything comparable.
- Group keys are bound variables (an entity groups by its ID; a value variable groups by its value). "group": [] gives one overall row. Result columns are the group variable names followed by one column per aggregate (op for count, op_var otherwise).
- Groups are not matches, so they are not subject to the 1000/100,000 row cap, but an explicit limit is honored (clamped to a backstop of 100,000 group rows); past that the result is truncated. The 30-second time limit still applies. The envelope’s top-level count is the number of group rows returned.
- find and order-by are not used with an aggregate return.
- When the pattern uses or/seq, every branch must bind the same set of entity and layer variables, not just the same find vars, else 400.
Grouping matters. A "match" is a binding of all the query’s variables, so a
one-to-many join repeats the aggregated value once per match. If you bind extra
entities (e.g. add ["covers", "?s", "?t"]), a span’s value is counted once per
token it covers: sum / avg over ?s inflate, and the aggregate count
reports span×token pairs, not distinct spans. (min / max are unaffected.) Keep
the where pattern down to the entity you are measuring, plus pure filters, when
you want faithful sum / avg / count. This is also why a group: [] count
differs from return: "count": the former counts full matches, the latter counts
distinct tuples of the find variables.
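The fan-out effect is easiest to see with numbers. In this sketch (invented data: two spans carrying numeric values, one of them covering two tokens) each match is a (span, token) pair, as it would be after adding a covers clause.

```python
span_value = {"s1": 2.0, "s2": 3.0}
# Matches after joining spans to the tokens they cover: s1 covers two tokens.
matches = [("s1", "t1"), ("s1", "t2"), ("s2", "t3")]

match_count = len(matches)                               # counts span x token pairs
distinct_spans = len({span for span, _token in matches})
inflated_sum = sum(span_value[span] for span, _token in matches)
faithful_sum = sum(span_value.values())
max_over_matches = max(span_value[span] for span, _token in matches)

print(match_count, distinct_spans)  # 3 2
print(inflated_sum, faithful_sum)   # 7.0 5.0
print(max_over_matches)             # 3.0, unaffected by the repetition
```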
17. Result-size guardrails
ids / entities queries are paginated by limit:
- No limit: 1000 rows.
- An explicit limit is honored up to a hard cap of 100,000 (a larger value is clamped). ids results are cheap; an entities result this large is slow, since each row is hydrated with a fetch.
- truncated: true means you hit the effective limit and there may be more.
There is no cursor yet; to page a large result, add more selective clauses or query
per project. count is bounded separately (above).
Every query is also bounded by a 30-second time limit; a query that exceeds it
is aborted with HTTP 408. If you hit it, the pattern is too broad: add a more
selective clause or tighten scope.
Two structural limits bound the pattern itself: an or / seq expansion of more
than 128 branches is rejected, as is clause nesting deeper than
64.
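Since there is no cursor, one way to work through a large result is to run the same query once per project and watch the truncated flag. The sketch below takes the query function as a parameter so it runs without a server; query_per_project and fake_run are hypothetical, and in real use you would pass client.query.

```python
def query_per_project(run_query, query, project_ids):
    """Run `query` once per project; return all rows and the projects still truncated."""
    rows, truncated_projects = [], []
    for project_id in project_ids:
        scoped = dict(query, scope={"project_ids": [project_id]})
        result = run_query(scoped)
        rows.extend(result["results"])
        if result["truncated"]:
            truncated_projects.append(project_id)  # needs a more selective pattern
    return rows, truncated_projects


def fake_run(query):
    # Stand-in for client.query, used only to make the sketch runnable.
    project_id = query["scope"]["project_ids"][0]
    return {"results": [[f"{project_id}-span"]], "truncated": project_id == "p2"}


base = {"find": ["?s"], "where": [["span", "?s", {"value": "NOUN"}]]}
rows, truncated = query_per_project(fake_run, base, ["p1", "p2"])
print(rows)       # [['p1-span'], ['p2-span']]
print(truncated)  # ['p2']
```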
18. Errors
Author errors return HTTP 400 with a message:
{"error": "No span layer matching \"poss\" is visible in the queried project scope"}
The clients raise this as an exception (PlaidAPIError in Python). Common causes:
an unbound or duplicate find variable, an unknown clause or constraint key, a
layer reference that does not resolve or is ambiguous, an unbounded seq
quantifier, a value variable in find, an empty project scope, or an unsupported
return. Passing an
as-of / time-travel parameter is also a 400: queries run against current state
only.
A query that runs too long returns 408. Internal failures return 500 with a generic message (details are logged server-side, not exposed).
19. Clients and casing
The official clients pass your query body through their normal request/response pipeline, which converts naming conventions automatically:
- Top-level and scope keys are recased to the wire format. Python order_by / project_ids and JavaScript orderBy / projectIds both become order-by / project-ids on the wire.
- Clause heads, variables, and values are plain string values, and pass through untouched: write "span", "?s1", "vocab-link", "NOUN" literally in every language. Constraint keys inside a clause (layer, value, begin, …) are single lowercase words and are identical across languages.
- Response entity objects are recased and namespace-stripped, so a span comes back as {"value": "NOUN", "tokens": […]} (Python snake_case / JavaScript camelCase), not {"span/value": …}.
The same query in JavaScript; only the language idiom differs:
const result = await client.query({
find: ['?d', '?n'],
where: [
['seq', { layer: 'Words' },
['span', { layer: 'UPOS', value: 'DET' }, 'as', '?d'],
['?', ['span', { layer: 'UPOS', value: 'ADJ' }]],
['span', { layer: 'UPOS', value: 'NOUN' }, 'as', '?n']],
],
scope: { projectIds: ['<uuid>'] }, // -> project-ids on the wire
return: 'entities',
limit: 20,
});
// result.results[0][0] -> { id: '...', value: 'DET', tokens: ['...'] }
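The request-side recasing can be approximated in a few lines. This is a sketch of the rule as described above, not the clients' actual code: to_wire_key and to_wire are hypothetical names, and only top-level keys and the keys inside scope are touched.

```python
import re


def to_wire_key(key):
    """Turn snake_case or camelCase into the kebab-case used on the wire."""
    key = re.sub(r"(?<=[a-z0-9])([A-Z])", r"-\1", key)  # orderBy -> order-By
    return key.replace("_", "-").lower()                # order_by -> order-by


def to_wire(query):
    """Recase top-level keys and scope keys; clauses and values pass through untouched."""
    wire = {to_wire_key(key): value for key, value in query.items()}
    if isinstance(wire.get("scope"), dict):
        wire["scope"] = {to_wire_key(k): v for k, v in wire["scope"].items()}
    return wire


where = [["token", "?t", {"layer": "Words"}]]
python_style = {"find": ["?t"], "where": where,
                "order_by": [["?t.begin"]], "scope": {"project_ids": ["<uuid>"]}}
js_style = {"find": ["?t"], "where": where,
            "orderBy": [["?t.begin"]], "scope": {"projectIds": ["<uuid>"]}}

print(to_wire(python_style) == to_wire(js_style))  # True
print(sorted(to_wire(python_style)))               # ['find', 'order-by', 'scope', 'where']
```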
20. Worked examples
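These run against a Universal Dependencies project with a token hierarchy
Sentences › Words › Morphemes, a UPOS span layer, and a Dependency Relations
relation layer. For legibility the examples write each layer as a readable
Project/Layer path; a real query must carry the layer’s ID instead, since a name
or path is a 400 (see Layer addressing).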
P = "My UD Project"
# how many NOUNs? -> {"return": "count", "count": 38, "truncated": false}
client.query({
"find": ["?s"],
"where": [["span", "?s", {"layer": f"{P}/UPOS", "value": "NOUN"}]],
"return": "count",
})
# determiner immediately followed by a noun, as full entities
client.query({
"find": ["?d", "?n"],
"where": [["seq", {"layer": f"{P}/Morphemes"},
["span", {"layer": f"{P}/UPOS", "value": "DET"}, "as", "?d"],
["span", {"layer": f"{P}/UPOS", "value": "NOUN"}, "as", "?n"]]],
"return": "entities",
"limit": 20,
})
# every nsubj dependency whose dependent is sentence-initial
client.query({
"find": ["?r"],
"where": [
["relation", "?r", {"layer": f"{P}/Dependency Relations",
"value": "nsubj", "target": "?dep"}],
["covers", "?dep", "?td"],
["within", "?td", "?s"],
["token", "?s", {"layer": f"{P}/Sentences"}],
["first-in", "?td", "?s"],
],
})
# tokens linked to a vocab item
client.query({
"find": ["?t"],
"where": [["vocab", "?v", {"form": "Kemal"}],
["vocab-link", "?t", "?v"]],
})
# restrict to one project by ID
client.query({
"find": ["?s"],
"where": [["span", "?s", {"layer": "UPOS", "value": "NOUN"}]],
"scope": {"project_ids": ["5b7ce985-...-6d1186cbd822"]},
})