Hi Everyone,

Gremlin Server uses Groovy init scripts to bind traversal sources,
load data, and run lifecycle hooks at startup. This means gremlin-groovy
is enabled in every default config, which exposes us to the security
concerns that come with executing Groovy scripts. These concerns are
already captured in https://issues.apache.org/jira/browse/TINKERPOP-3107.
This proposal would make Groovy opt-in by introducing three Java-native
replacements.

Groovy should not be fully removed; it should continue to work when
explicitly configured. The Groovy-based hook/binding path would be
deprecated, with a logged warning.


1. Auto-create TraversalSources from Graphs

For each graph in the "graphs" config, the server would automatically
create a TraversalSource if one is not explicitly defined. A graph named
"graph" would produce "g"; others would produce "g_<name>".

A minimal working config would become just:

  graphs: {
    graph: conf/tinkergraph-empty.properties}

With multiple graphs, each would get its own TraversalSource:

  graphs: {
    graph: conf/tinkergraph-empty.properties,
    modern: conf/tinkergraph-empty.properties}

This would produce "g" (from "graph") and "g_modern" (from
"modern").
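
To make the naming rule concrete, here is a rough sketch in plain Java.
It is only an illustration of the rule described above; the class and
method names (TraversalSourceNames, defaultNameFor, autoCreate) are
hypothetical and not part of any existing TinkerPop API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating the proposed default naming rule.
final class TraversalSourceNames {
    private TraversalSourceNames() {}

    // A graph named "graph" maps to "g"; any other graph maps to "g_<name>".
    static String defaultNameFor(String graphName) {
        return "graph".equals(graphName) ? "g" : "g_" + graphName;
    }

    // Returns traversal source name -> backing graph name, in config order.
    static Map<String, String> autoCreate(List<String> graphNames) {
        Map<String, String> sources = new LinkedHashMap<>();
        for (String graphName : graphNames) {
            sources.put(defaultNameFor(graphName), graphName);
        }
        return sources;
    }
}
```

For the two-graph config above, autoCreate(List.of("graph", "modern"))
would map "g" to "graph" and "g_modern" to "modern".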


2. Declarative traversalSources config in YAML

For cases needing strategy configuration or custom traversal source naming:

  graphs: {
    graph: conf/tinkergraph-empty.properties}
  traversalSources: {
    g: {graph: graph},
    gReadOnly: {graph: graph, query: "g.withStrategies(ReadOnlyStrategy)"},
    gCustom: {graph: graph, query: "g.with(...)", language: "gremlin-groovy"}}

The "query" field would be a Gremlin expression evaluated with a base
traversal source bound as "g". The "language" field would select the
script engine; if omitted, it would default to the sole configured
engine when exactly one is configured, or to gremlin-lang otherwise.
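
One possible reading of the "language" default, sketched in plain Java.
This assumes "sole configured engine" means exactly one script engine is
configured, with gremlin-lang as the fallback in every other case. The
class name ScriptLanguageDefaults is hypothetical.

```java
import java.util.Set;

// Hypothetical helper: picks the script engine for a traversalSources entry.
final class ScriptLanguageDefaults {
    static final String FALLBACK = "gremlin-lang";

    private ScriptLanguageDefaults() {}

    static String resolve(String requested, Set<String> configuredEngines) {
        // An explicit "language" value always wins.
        if (requested != null && !requested.isEmpty()) {
            return requested;
        }
        // Assumption: with exactly one configured engine, use that engine.
        if (configuredEngines.size() == 1) {
            return configuredEngines.iterator().next();
        }
        // Otherwise fall back to gremlin-lang.
        return FALLBACK;
    }
}
```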

Explicit traversal source definitions would suppress auto-creation for that 
graph.
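
A sketch of how suppression might interact with auto-creation, again in
plain Java with hypothetical names (TraversalSourcePlanner, plan). It
assumes that any explicit definition referencing a graph suppresses the
automatic source for that graph, and that an explicit name wins if it
happens to collide with an auto-generated one.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical planner: merges explicit definitions with auto-created sources.
final class TraversalSourcePlanner {
    private TraversalSourcePlanner() {}

    // explicitSources maps traversal source name -> graph name (from YAML).
    // Returns the full set of traversal source name -> graph name.
    static Map<String, String> plan(List<String> graphNames,
                                    Map<String, String> explicitSources) {
        Map<String, String> planned = new LinkedHashMap<>(explicitSources);
        // Graphs that already have at least one explicit traversal source.
        Set<String> covered = new HashSet<>(explicitSources.values());
        for (String graphName : graphNames) {
            if (covered.contains(graphName)) {
                continue; // explicit definition suppresses auto-creation
            }
            String autoName = "graph".equals(graphName) ? "g" : "g_" + graphName;
            // Assumption: never overwrite an explicitly named source.
            planned.putIfAbsent(autoName, graphName);
        }
        return planned;
    }
}
```

Under these assumptions, graphs "graph" and "modern" with only
"gReadOnly" defined explicitly on "graph" would yield "gReadOnly" and
"g_modern", and no automatic "g".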


3. Java-based LifeCycleHooks in YAML

The existing LifeCycleHook interface already covers the core needs:
onStartUp and onShutDown callbacks. The only gap is that Groovy hooks
accessed graphs and traversal sources through global script engine
bindings, which are not available in a pure Java context. Adding
GraphManager to LifeCycleHook.Context would close that gap.

The LifeCycleHook interface could also gain an init method for
receiving optional configuration from the YAML:

  default void init(Map<String, Object> config) {}

Instantiation would use a no-arg constructor + init(config), following
the same pattern as Authenticator.setup().
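
A minimal sketch of that instantiation pattern, using a stand-in
interface so it runs without TinkerPop on the classpath. ConfigurableHook,
HookLoader, and RecordingHook are hypothetical names for illustration
only; the real change would add init to LifeCycleHook itself.

```java
import java.util.Map;

// Stand-in for LifeCycleHook with the proposed init method (hypothetical).
interface ConfigurableHook {
    default void init(Map<String, Object> config) {}
}

// Hypothetical loader: no-arg constructor, then init(config).
final class HookLoader {
    private HookLoader() {}

    static ConfigurableHook load(String className, Map<String, Object> config) {
        try {
            ConfigurableHook hook = Class.forName(className)
                    .asSubclass(ConfigurableHook.class)
                    .getDeclaredConstructor()
                    .newInstance();
            // The "config" block is optional in YAML, so pass an empty map if absent.
            hook.init(config == null ? Map.of() : config);
            return hook;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not create hook " + className, e);
        }
    }
}

// Example hook that simply records the config it was given.
final class RecordingHook implements ConfigurableHook {
    Map<String, Object> received;

    @Override
    public void init(Map<String, Object> config) {
        this.received = config;
    }
}
```

Keeping init as a default method means existing hook implementations
would not need to change.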

Hooks would be configured directly in YAML, replacing Groovy closures:

  lifecycleHooks:
    - className: org.apache.tinkerpop.gremlin.tinkergraph.lifecycle.TinkerFactoryDataLoader
      config: {graph: graph, dataset: modern}

A built-in TinkerFactoryDataLoader hook (in tinkergraph-gremlin) would
replace the generate-modern.groovy / generate-classic.groovy scripts.


Example: Updated Configs

gremlin-server.yaml (currently 20+ lines with Groovy) would become:

  host: localhost
  port: 8182
  graphs: {
    graph: conf/tinkergraph-empty.properties}

gremlin-server-modern.yaml would become:

  host: localhost
  port: 8182
  graphs: {
    graph: conf/tinkergraph-empty.properties}
  lifecycleHooks:
    - className: org.apache.tinkerpop.gremlin.tinkergraph.lifecycle.TinkerFactoryDataLoader
      config: {graph: graph, dataset: modern}

gremlin-server-modern-readonly.yaml would become:

  host: localhost
  port: 8182
  graphs: {
    graph: conf/tinkergraph-empty.properties}
  traversalSources: {
    g: {graph: graph, query: "g.withStrategies(ReadOnlyStrategy)"}}
  lifecycleHooks:
    - className: org.apache.tinkerpop.gremlin.tinkergraph.lifecycle.TinkerFactoryDataLoader
      config: {graph: graph, dataset: modern}

Please let me know if you have any thoughts on the proposed changes.

Thanks,
Cole
