Re: [DISCUSS] Graph Schema Interfaces for TP

pieter Sat, 04 Apr 2026 03:24:51 -0700

Hi,

This looks good to me.


#1

Will we not need an additional type to represent the
'EdgeLabel:VertexLabel:Direction'? 

In Sqlg this is called 'EdgeRole'.

class EdgeRole {
   private EdgeType edgeType;
   private VertexType vertexType;
   private Direction direction;
}

Consider the following example

VertexType aVertexType = schema.traversal().addVType("A");
VertexType bVertexType = schema.traversal().addVType("B");
VertexType cVertexType = schema.traversal().addVType("C");
EdgeType lovesEdgeType = schema.traversal().addEType("loves");
lovesEdgeType.from("A").to("B");
lovesEdgeType.from("A").to("C");

In this case there are 3 EdgeRoles:

loves:A:OUT
loves:B:IN
loves:C:IN

Later the user needs to manipulate the 'loves:C:IN' EdgeRole, for
example to delete it or to change its multiplicities.
For this the user needs to be able to access the EdgeRole.

EdgeRole cLovesEdgeRole = lovesEdgeType.edgeRole(cVertexType);
cLovesEdgeRole.drop();
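
To make the idea concrete, here is a rough standalone sketch in plain
Java. Direction, EdgeRole and EdgeTypeSketch below are made-up
stand-ins for illustration only; they are not the Sqlg classes and not
part of the proposed TinkerPop API. It only shows how the three roles
fall out of the two from/to pairs and how one role can be dropped
without touching the others.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a direction enum.
enum Direction { OUT, IN }

// One 'EdgeLabel:VertexLabel:Direction' triple.
final class EdgeRole {
    final String edgeLabel;
    final String vertexLabel;
    final Direction direction;

    EdgeRole(String edgeLabel, String vertexLabel, Direction direction) {
        this.edgeLabel = edgeLabel;
        this.vertexLabel = vertexLabel;
        this.direction = direction;
    }

    String key() {
        return edgeLabel + ":" + vertexLabel + ":" + direction;
    }
}

// Hypothetical edge type that tracks its roles in insertion order.
final class EdgeTypeSketch {
    private final String label;
    private final Map<String, EdgeRole> roles = new LinkedHashMap<>();

    EdgeTypeSketch(String label) {
        this.label = label;
    }

    // Registers one from/to pair; roles that already exist are kept as-is.
    EdgeTypeSketch connect(String fromLabel, String toLabel) {
        put(new EdgeRole(label, fromLabel, Direction.OUT));
        put(new EdgeRole(label, toLabel, Direction.IN));
        return this;
    }

    private void put(EdgeRole role) {
        roles.putIfAbsent(role.key(), role);
    }

    List<String> roleKeys() {
        return new ArrayList<>(roles.keySet());
    }

    // Drops a single role; returns false if there was no such role.
    boolean dropRole(String vertexLabel, Direction direction) {
        return roles.remove(label + ":" + vertexLabel + ":" + direction) != null;
    }
}
```

With this, new EdgeTypeSketch("loves").connect("A", "B").connect("A", "C")
ends up with exactly the three roles listed above, and
dropRole("C", Direction.IN) removes only 'loves:C:IN'.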

#2

Regarding transactions.

Some backends will support transactions for schema and data changes.
Some will support transactions only for data changes. Some databases
like MariaDB will silently commit the current transaction when a schema
change occurs.

Probably there are variations on this theme, so it should be up to the
provider to specify the granularity of support.
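
One possible shape for this, similar in spirit to how Graph.Features
lets a provider declare what it supports, is a small capability value
that the provider reports. The enum and its method names below are my
own invention for illustration; nothing like it exists in the proposal
yet.

```java
// Hypothetical capability flag a provider could report for its schema graph.
enum SchemaTxSupport {
    SCHEMA_AND_DATA,  // schema and data changes share one transaction
    DATA_ONLY,        // only data changes are transactional
    IMPLICIT_COMMIT;  // a schema change silently commits the open transaction

    // True when a schema change can be rolled back together with data changes.
    boolean schemaChangeIsRollbackSafe() {
        return this == SCHEMA_AND_DATA;
    }

    // True when a schema change ends the caller's open transaction as a side effect.
    boolean schemaChangeEndsOpenTransaction() {
        return this == IMPLICIT_COMMIT;
    }
}
```

A client could then check schemaChangeEndsOpenTransaction() before
mixing schema and data changes in the same unit of work.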

Further, I suggest adding schema.lock()/unlock(). While the schema is
locked, no schema changes are allowed. There can also be a
transaction-scoped unlock() which unlocks the schema for the current
transaction only.

This will add a safety net for graphs with a well-defined schema
where no code should be able to make schema changes.
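
A minimal sketch of the semantics I have in mind, in plain Java. The
class name, the method names and the use of a String as transaction id
are all assumptions made for the example; a real provider would tie
this to its own transaction object.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical guard that a provider would consult before any schema change.
final class SchemaLockSketch {
    private volatile boolean locked;
    // Transactions that have unlocked the schema for themselves only.
    private final Set<String> unlockedTransactions = ConcurrentHashMap.newKeySet();

    void lock() {
        locked = true;
    }

    void unlock() {
        locked = false;
    }

    // Transaction-scoped unlock: affects only the given transaction.
    void unlockFor(String txId) {
        unlockedTransactions.add(txId);
    }

    // To be called on commit or rollback so the exemption ends with the transaction.
    void transactionFinished(String txId) {
        unlockedTransactions.remove(txId);
    }

    // Throws if the schema is locked and this transaction has no exemption.
    void checkSchemaChangeAllowed(String txId) {
        if (locked && !unlockedTransactions.contains(txId)) {
            throw new IllegalStateException("schema is locked");
        }
    }
}
```

The point of the transaction-scoped variant is that a migration can
unlock the schema for its own transaction while every other
transaction still sees it as locked.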

#3

Regarding File IO.

I am not sure why this is needed. We can simply export/import some
Gremlin to manage the schema. We already have the language that does
this, so why is a translation to JSON needed?
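
As a rough illustration of what exporting Gremlin could look like, the
sketch below renders schema definitions as script text using the
addVType/addEType/from/to steps from the proposal. The
GremlinSchemaScript class is hypothetical, and it only produces
strings; it does not talk to a graph.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that renders schema definitions as Gremlin text.
final class GremlinSchemaScript {
    private final List<String> lines = new ArrayList<>();

    GremlinSchemaScript vertexType(String label) {
        lines.add("schema.traversal().addVType(" + quote(label) + ")");
        return this;
    }

    GremlinSchemaScript edgeType(String label, String fromLabel, String toLabel) {
        lines.add("schema.traversal().addEType(" + quote(label) + ")"
                + ".from(" + quote(fromLabel) + ").to(" + quote(toLabel) + ")");
        return this;
    }

    // Wraps a label in double quotes, escaping backslashes and quotes.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // One statement per line, in the order the definitions were added.
    String render() {
        return String.join("\n", lines);
    }
}
```

Importing would then just be a matter of evaluating the script against
the target schema graph.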

Regards
Pieter

On Fri, 2026-04-03 at 02:16 +0000, Cole Greer via dev wrote:
> Hi Everyone,
> 
> The topic of Graph Schema has been discussed extensively in recent
> TinkerPop Gatherings, and the following proposal has emerged from
> these gatherings. I believe it is now ready for broad consideration
> and discussions. I’ve done my best to incorporate initial feedback
> from Josh, Pieter, Valentyn, Stephen, Kris and others into this
> proposal, however I won’t claim that it accurately represents the
> views of anyone other than myself at this time. This is a broad topic
> and I’m deliberately excluding critical topics to focus this thread
> on standardizing interfaces for gremlin users and providers to
> interact with schema (see assumptions for more details).
> 
> ## Overview
> 
> This proposal introduces graph schema interfaces for TinkerPop: a way
> to define vertex types, edge types, and property types as a meta-
> graph that is itself traversable with Gremlin. The schema describes
> the structure of a data graph; what kinds of vertices and edges
> exist, what properties they carry, and how they connect.
> 
> ## Assumptions
> 
> - Type keys are element labels: there is a 1-to-1 mapping between a
> label and a type definition. A vertex labeled "person" corresponds to
> exactly one VertexType, and an edge labeled "knows" corresponds to
> exactly one EdgeType.
> - Java classes are used as a type system: This proposal uses Java
> classes to define property type constraints. This is intended as a
> placeholder to be replaced by a proper type system to be defined via
> a later discussion.
> - This proposal makes very little consideration of if/when/where/how
> validation and enforcement of schema takes place. I believe it is
> important for us to ship something which is flexible and useful to
> providers out of the box as well as leaving space for providers to
> plugin existing implementations or build their own if they desire.
> I’ve left this out of scope for this proposal to focus first on
> interfaces which give providers the appropriate access to schema.
> 
> ## Design Points
> 
> ### 1. Schema-as-Graph
> 
> `GraphSchema extends Graph`. Providers implement a familiar
> interface, and users traverse the schema with schema.traversal().
> This avoids inventing a parallel API surface. The schema is just
> another graph.
> 
> A data graph exposes its schema via Graph.schema(), which returns the
> GraphSchema instance. Providers that don't support schema throw
> UnsupportedOperationException by default.
> 
> ### 2. All type definitions are vertices
> 
> VertexType, EdgeType, and PropertyType are all vertices in the schema
> meta-graph.
> 
> - A VertexType vertex represents a vertex label definition (e.g.
> "person", "software").
> - An EdgeType vertex represents an edge label definition (e.g.
> "knows", "created"). Even though it describes edges in the data
> graph, it is itself a vertex in the schema graph, connected to its
> endpoint VertexType vertices via from/to edges.
> - A PropertyType vertex represents a property on a type, connected to
> its parent type vertex via a "hasProperty" edge.
> 
> Property definitions are independent per type, no sharing across
> types.
> 
> Schema graph example for the classic TinkerPop modern graph:
> ```
> (person:vertexType) --hasProperty--> (name:propertyType)
> (person:vertexType) --hasProperty--> (age:propertyType)
> (software:vertexType) --hasProperty--> (name:propertyType)
> (software:vertexType) --hasProperty--> (lang:propertyType)
> (knows:edgeType) --from--> (person:vertexType)
> (knows:edgeType) --to-->   (person:vertexType)
> (knows:edgeType) --hasProperty--> (weight:propertyType)
> (created:edgeType) --from--> (person:vertexType)
> (created:edgeType) --to-->   (software:vertexType)
> (created:edgeType) --hasProperty--> (weight:propertyType)
> ```
> 
> ### 3. Constraints are properties on type vertices
> 
> Rather than a fixed constraint taxonomy, constraints are regular
> properties on type vertices, keyed by string via constraint(key,
> value). This keeps the model extensible such that providers can
> define their own constraints without changes to the core API.
> 
> Constraints can be added to VertexType, EdgeType, and PropertyType
> vertices directly. The most common constraints such as property types
> and required properties would apply to PropertyTypes, while edge
> multiplicity constraints (e.g. one-to-many, one-to-one) are naturally
> expressed as constraints on the EdgeType itself rather than on any
> property.
> 
> While constraint keys are arbitrary strings and providers are free to
> implement any constraints they like, TinkerPop should standardize a
> set of core constraint keys representing the most common constraints.
> Examples include "type", "required", "unique", "minValue",
> "maxValue", etc. Providers that support equivalent constraints are
> encouraged to follow these conventional names for interoperability.
> 
> Non-core constraints (custom to a provider) are encouraged to follow
> a namespaced key convention to avoid collisions, e.g.
> "tinkergraph:notNull". Core constraint keys are unnamespaced.
> 
> ### 4. Schema traversal steps in core Gremlin
> 
> New steps for schema manipulation live directly in
> GraphTraversal/GraphTraversalSource, not in a separate DSL:
> 
> - addVType(label) — creates a VertexType vertex
> - addEType(label) — creates an EdgeType vertex
> - propertyType(name) — creates a PropertyType vertex and connects it
> via hasProperty
> - constraint(key, value) — adds a constraint property to the current
> type vertex
> 
> Example: defining a vertex type with properties:
> ```
> schema.traversal().addVType("person")
>     .propertyType("name").constraint("type", String.class)
>         .constraint("required", true).constraint("unique", true)
>     .propertyType("age").constraint("type", Integer.class)
> ```
> 
> Example: defining an edge type with endpoint types and a property:
> ```
> schema.traversal().addEType("knows")
>     .from("person").to("person")
>     .propertyType("weight").constraint("type", Double.class)
> ```
> 
> This mirrors the addE().from().to() pattern from the data-graph. Here
> from() and to() take vertex type labels (strings) and create from/to
> edges in the schema graph connecting the EdgeType to the referenced
> VertexType vertices.
> 
> ### 5. Convenience methods for direct access
> 
> The schema-as-graph model is the source of truth, but traversing it
> for simple lookups isn’t always convenient. Direct methods provide
> compact access:
> 
> GraphSchema methods:
> - vertexTypes() → Collection<VertexType>
> - vertexType(String label) → Optional<VertexType>
> - edgeTypes() → Collection<EdgeType>
> - edgeType(String label) → Optional<EdgeType>
> - addVertexType(String label) → VertexType
> - addEdgeType(String label) → EdgeType
> - store(OutputStream):  serialize the schema to a compact JSON
> representation
> - load(InputStream): deserialize and merge a schema from JSON into
> this schema graph
> 
> EdgeType methods:
> - fromVertexTypes() → Collection<VertexType>
> - toVertexTypes() → Collection<VertexType>
> 
> Example:
> ```
> GraphSchema schema = graph.schema();
> 
> // Look up a vertex type
> VertexType person = schema.vertexType("person").orElseThrow();
> 
> // Inspect its properties
> for (PropertyType pd : person.propertyTypes()) {
>     System.out.println(pd.name() + " : " + pd.constraint("type"));
> }
> 
> // Look up an edge type and its connectivity
> EdgeType knows = schema.edgeType("knows").orElseThrow();
> Collection<VertexType> fromTypes = knows.fromVertexTypes();
> Collection<VertexType> toTypes = knows.toVertexTypes();
> ```
> 
> ### 6. Cross-graph jumps
> 
> Two steps bridge the data graph and schema graph:
> 
> - type(): from a data traversal, jump to the element's type
> definition in the schema graph.
> - instances(): from a schema traversal, jump to all matching elements
> in the data graph.
> 
> These compose for round-trip traversals:
> ```
> // Get the type definition for "person" vertices
> g.V().hasLabel("person").type()
> 
> // Get all instances of a schema type
> schema.traversal().vertexType("person").instances()
> 
> // Round-trip: find marko's type, then get all instances of that type
> g.V().has("person", "name", "marko").type().instances()
> ```
> 
> ### 7. Schema restriction strategy
> 
> There are some steps we will want to restrict in both the data graph
> and the schema-graph. addVType() wouldn’t make sense in the data-
> graph, nor would addV() be sensible in the schema-graph. A
> TraversalStrategy can restrict schema traversals to a safe subset of
> Gremlin steps (allowlist-based). This prevents accidentally running
> data element insertions, OLAP computations, complex control flow, or
> side-effect steps against the schema graph. The strategy should be
> auto-registered when traversing a GraphSchema instance.
> 
> The exact allowlist should be a topic for later discussion.
> 
> ### 8. Instance counts on type vertices
> 
> VertexType.instanceCount() and EdgeType.instanceCount() return the
> count of data graph elements matching each type. This is a method
> rather than a property on the type vertex, keeping the schema graph
> definitional (not statistical) and giving providers full
> implementation flexibility.
> 
> Approximate counts are likely acceptable and preferable for
> performance in most cases. However, TinkerPop should not stand in the
> way of providers that prefer exact counts, and should ensure that
> appropriate hooks are in place in reference implementations so that
> providers can maintain exact counts if they so desire.
> 
> Transactional implications need additional consideration. Maintaining
> accurate counts across concurrent writes, rollbacks, and transaction
> isolation levels adds significant complexity. This interacts with the
> broader schema transactions question (see transactions below) and
> should be addressed alongside it.
> 
> ### 9. GLV Support
> 
> Each GLV (Python, JavaScript, .NET, Go) needs:
> 
> - Schema data classes: Parallel classes to the 4 core Java
> interfaces, following the same pattern as existing Vertex and Edge
> classes. These are data containers representing schema objects
> returned from the server:
>   - GraphSchema: holds collections of VertexTypes and EdgeTypes
>   - VertexType: label, full constraints map, and collection of
> PropertyTypes
>   - EdgeType: label, full constraints map, from/to VertexType
> references (same pattern as Edge.outV/Edge.inV), and collection of
> PropertyTypes
>   - PropertyType: name and full constraints map (including data type
> as a constraint)
> - All new gremlin steps are supported from each GLV
> 
> ## Future Questions
> 
> ### Schema validation
> 
> Providers will need lots of flexibility regarding validation modes.
> Some providers may choose to have write-time validation for all
> inserts, others may choose to validate an entire graph against a schema
> as a batch job, while others may choose to validate on-commit. For
> our purposes, we need to provide a viable reference implementation,
> as well as ensuring sufficient extension points exist for providers
> to fulfill their needs.
> 
> ### Dynamic schema updates from data writes
> 
> It would be useful to auto-update the schema graph when data writes
> introduce new labels or properties (e.g. addV("newLabel")
> automatically creates a VertexType). Keeping the schema exactly in-
> sync with such operations may introduce too much overhead for many
> purposes. We should provide appropriate hooks for providers to
> implement such behaviour if desired, or to help providers aggregate
> changes and perform incremental batch updates to the schema.
> 
> ### Transactions
> 
> The schema graph will need to be transactional if the data
> 
> ### File IO
> 
> It is often useful to persist and load schemas to/from files. This
> capability should be built into the GraphSchema class via simple
> store() and load() methods, using a custom compact JSON
> representation of the schema. The specifics of this format are
> deferred to later discussion.
> 
> GraphSchema exposes file IO directly:
> - store(OutputStream): serialize the schema to a compact JSON
> representation
> - load(InputStream): deserialize a schema from JSON and merge it into
> the current schema graph
> 
> Schema file IO should be implemented across all GLVs.
> 
> ## Reference Implementation
> 
> TinkerGraph serves as the reference implementation:
> 
> - TinkerGraphSchema extends TinkerGraph implements GraphSchema
> - TinkerVertexType extends TinkerVertex implements VertexType
> - TinkerPropertyType extends TinkerVertex implements PropertyType
> - TinkerEdgeType extends TinkerVertex implements EdgeType
> - Recursion guard prevents schema-of-schema (TinkerGraphSchema
> overrides initSchema())
> 
> 
> Please let me know any thoughts you may have on the approach. I
> intend to move this into a proposal PR soon, unless there are any
> major disagreements over the design.
> 
> Thanks,
> Cole
