Hi Everyone,
Gremlin Server uses Groovy init scripts to bind traversal sources,
load data, and run lifecycle hooks at startup. As a result, gremlin-groovy
is enabled in every default config, which exposes the server to the security
concerns that come with Groovy scripts. These concerns are already
captured in https://issues.apache.org/jira/browse/TINKERPOP-3107.
This proposal would make Groovy opt-in by introducing three Java-native
replacements for what the init scripts do today.
Groovy should not be fully removed; it should continue to work when
explicitly configured. The Groovy-based hook/binding path would be
deprecated, with a logged warning.
1. Auto-create TraversalSources from Graphs
For each graph in the "graphs" config, the server would automatically
create a TraversalSource if one is not explicitly defined. A graph named
"graph" would produce "g"; others would produce "g_<name>".
A minimal working config would become just:
graphs: {
  graph: conf/tinkergraph-empty.properties}
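To make the naming rule described above concrete, here is a minimal Java sketch. The class and method names are hypothetical and only illustrate the rule; they are not part of the proposal or of the existing TinkerPop API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TraversalSourceNames {

    // Naming rule from the proposal: "graph" -> "g", anything else -> "g_<name>".
    static String defaultNameFor(String graphName) {
        return "graph".equals(graphName) ? "g" : "g_" + graphName;
    }

    // Maps each default traversal source name to the graph it would wrap,
    // preserving the order in which the graphs were listed.
    static Map<String, String> defaultNamesFor(List<String> graphNames) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String graphName : graphNames) {
            result.put(defaultNameFor(graphName), graphName);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {g=graph, g_modern=modern}
        System.out.println(defaultNamesFor(List.of("graph", "modern")));
    }
}
```

How a name collision should be handled (for example, a graph literally named "g_modern" next to one named "modern") is not covered by this sketch and would need a decision.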
With multiple graphs, each would get its own TraversalSource:
graphs: {
  graph: conf/tinkergraph-empty.properties,
  modern: conf/tinkergraph-empty.properties}
This would produce "g" (from "graph") and "g_modern" (from
"modern").
2. Declarative traversalSources config in YAML
For cases needing strategy configuration or custom traversal source naming:
graphs: {
  graph: conf/tinkergraph-empty.properties}
traversalSources: {
  g: {graph: graph},
  gReadOnly: {graph: graph, query: "g.withStrategies(ReadOnlyStrategy)"},
  gCustom: {graph: graph, query: "g.with(...)", language: "gremlin-groovy"}}
The "query" field would be a Gremlin expression evaluated with a base
traversal source bound as "g". The "language" field would select the
script engine; if omitted, it would default to the sole configured engine
when exactly one is configured, and to gremlin-lang otherwise.
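One possible reading of the "language" default, sketched in Java. This assumes the fallback to gremlin-lang applies whenever zero or several engines are configured; the class and method names are hypothetical.

```java
import java.util.Set;

class ScriptLanguageDefaults {

    // Fallback named in the proposal.
    static final String FALLBACK_LANGUAGE = "gremlin-lang";

    // Picks the script engine for a traversalSources entry.
    // declared: value of the "language" field, or null if the field is absent.
    // configuredEngines: names of the script engines enabled on the server.
    static String resolveLanguage(String declared, Set<String> configuredEngines) {
        if (declared != null && !declared.isEmpty()) {
            return declared; // an explicit choice always wins
        }
        if (configuredEngines.size() == 1) {
            return configuredEngines.iterator().next(); // the sole configured engine
        }
        return FALLBACK_LANGUAGE; // assumption: none or several engines configured
    }

    public static void main(String[] args) {
        System.out.println(resolveLanguage(null, Set.of("gremlin-groovy")));
        System.out.println(resolveLanguage(null, Set.of("gremlin-groovy", "gremlin-lang")));
    }
}
```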
An explicit traversal source definition would suppress auto-creation for the
graph it references.
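The suppression rule could be sketched roughly as follows: collect the graphs referenced by explicit definitions, then auto-create only for the rest. The class and method names are hypothetical, and the explicit definitions are reduced to a simple "source name -> graph name" map for illustration.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class AutoCreatePlanner {

    // Returns the graphs that still need an auto-created traversal source.
    // graphNames: keys of the "graphs" config.
    // explicitSourceToGraph: each explicit traversal source name mapped to
    // the graph named in its "graph" field.
    static Set<String> graphsNeedingAutoSource(Collection<String> graphNames,
                                               Map<String, String> explicitSourceToGraph) {
        Set<String> covered = new HashSet<>(explicitSourceToGraph.values());
        Set<String> remaining = new LinkedHashSet<>();
        for (String graphName : graphNames) {
            if (!covered.contains(graphName)) {
                remaining.add(graphName);
            }
        }
        return remaining;
    }

    public static void main(String[] args) {
        // "graph" has an explicit source, so only "modern" is left.
        // prints [modern]
        System.out.println(graphsNeedingAutoSource(
                List.of("graph", "modern"), Map.of("gReadOnly", "graph")));
    }
}
```

Note that under this reading, defining only gReadOnly for "graph" means no plain "g" is created for it, so users who want both would list both explicitly.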
3. Java-based LifeCycleHooks in YAML
The existing LifeCycleHook interface already covers the core needs:
onStartUp and onShutDown callbacks. The only gap is that Groovy hooks
accessed graphs and traversal sources through global script engine
bindings, which are not available in a pure Java context. Adding
GraphManager to LifeCycleHook.Context would close that gap.
The LifeCycleHook interface could also gain an init method for
receiving optional configuration from the YAML:
default void init(Map<String, Object> config) {}
Instantiation would use a no-arg constructor followed by a call to
init(config), following the same pattern as Authenticator.setup().
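A rough sketch of that instantiation path, using standard Java reflection. ConfigurableHook is a stand-in for LifeCycleHook with the proposed init method, and RecordingHook is a made-up implementation used only to show the flow; neither exists in TinkerPop.

```java
import java.util.Map;

class HookLoaderSketch {

    // Stand-in for LifeCycleHook with the proposed init method.
    interface ConfigurableHook {
        default void init(Map<String, Object> config) {}
    }

    // Made-up hook that just remembers the config it was given.
    static class RecordingHook implements ConfigurableHook {
        Map<String, Object> received = Map.of();

        @Override
        public void init(Map<String, Object> config) {
            received = config;
        }
    }

    // Loads the class by name, calls its no-arg constructor, then init(config).
    static ConfigurableHook instantiate(String className, Map<String, Object> config) {
        try {
            Class<? extends ConfigurableHook> clazz =
                    Class.forName(className).asSubclass(ConfigurableHook.class);
            ConfigurableHook hook = clazz.getDeclaredConstructor().newInstance();
            Map<String, Object> effective = Map.of(); // a missing "config" key becomes an empty map
            if (config != null) {
                effective = config;
            }
            hook.init(effective);
            return hook;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not create hook " + className, e);
        }
    }

    public static void main(String[] args) {
        ConfigurableHook hook = instantiate(
                RecordingHook.class.getName(), Map.of("graph", "graph", "dataset", "modern"));
        System.out.println(((RecordingHook) hook).received);
    }
}
```

Because init has a default no-op body, existing hooks that take no configuration would not need to change.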
Hooks would be configured directly in YAML, replacing Groovy closures:
lifecycleHooks:
  - className: org.apache.tinkerpop.gremlin.tinkergraph.lifecycle.TinkerFactoryDataLoader
    config: {graph: graph, dataset: modern}
A built-in TinkerFactoryDataLoader hook (in tinkergraph-gremlin) would
replace the generate-modern.groovy / generate-classic.groovy scripts.
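To illustrate how such a hook might consume its YAML config and reach a graph through the context, here is a self-contained sketch. FakeGraph and FakeContext are stand-ins for the real graph and for LifeCycleHook.Context with graph access; the real hook would presumably call into TinkerFactory instead of recording a string. Restricting "dataset" to modern and classic is an assumption based on the two scripts being replaced.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DataLoaderHookSketch {

    // Stand-in for a graph instance; records which datasets were loaded into it.
    static class FakeGraph {
        final List<String> loadedDatasets = new ArrayList<>();
    }

    // Stand-in for LifeCycleHook.Context once it can look up graphs by name.
    static class FakeContext {
        private final Map<String, FakeGraph> graphs = new HashMap<>();

        void register(String name, FakeGraph graph) {
            graphs.put(name, graph);
        }

        FakeGraph getGraph(String name) {
            return graphs.get(name);
        }
    }

    // Sketch of a data-loading hook configured with {graph: ..., dataset: ...}.
    static class DataLoaderHook {
        private String graphName;
        private String dataset;

        void init(Map<String, Object> config) {
            graphName = requireString(config, "graph");
            dataset = requireString(config, "dataset");
            if (!dataset.equals("modern") && !dataset.equals("classic")) {
                throw new IllegalArgumentException("Unsupported dataset: " + dataset);
            }
        }

        void onStartUp(FakeContext ctx) {
            FakeGraph graph = ctx.getGraph(graphName);
            if (graph == null) {
                throw new IllegalStateException("No graph named " + graphName);
            }
            graph.loadedDatasets.add(dataset); // the real hook would populate the graph here
        }

        private static String requireString(Map<String, Object> config, String key) {
            Object value = config.get(key);
            if (!(value instanceof String)) {
                throw new IllegalArgumentException("Missing or non-string '" + key + "'");
            }
            return (String) value;
        }
    }

    public static void main(String[] args) {
        FakeContext ctx = new FakeContext();
        FakeGraph graph = new FakeGraph();
        ctx.register("graph", graph);

        DataLoaderHook hook = new DataLoaderHook();
        hook.init(Map.of("graph", "graph", "dataset", "modern"));
        hook.onStartUp(ctx);
        System.out.println(graph.loadedDatasets);
    }
}
```

Validating the config in init rather than in onStartUp would let a misconfigured hook fail before any graph is touched.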
Example: Updated Configs
gremlin-server.yaml (currently 20+ lines with Groovy) would become:
host: localhost
port: 8182
graphs: {
  graph: conf/tinkergraph-empty.properties}
gremlin-server-modern.yaml would become:
host: localhost
port: 8182
graphs: {
  graph: conf/tinkergraph-empty.properties}
lifecycleHooks:
  - className: org.apache.tinkerpop.gremlin.tinkergraph.lifecycle.TinkerFactoryDataLoader
    config: {graph: graph, dataset: modern}
gremlin-server-modern-readonly.yaml would become:
host: localhost
port: 8182
graphs: {
  graph: conf/tinkergraph-empty.properties}
traversalSources: {
  g: {graph: graph, query: "g.withStrategies(ReadOnlyStrategy)"}}
lifecycleHooks:
  - className: org.apache.tinkerpop.gremlin.tinkergraph.lifecycle.TinkerFactoryDataLoader
    config: {graph: graph, dataset: modern}
Please let me know if you have any thoughts on the proposed changes.
Thanks,
Cole