[ 
https://issues.apache.org/jira/browse/TINKERPOP-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17676776#comment-17676776
 ] 

Cole Greer commented on TINKERPOP-2767:
---------------------------------------

I've narrowed down the underlying issue here. This traversal triggers a stack 
overflow on the server, and the server does not handle that error well. 
Essentially, the following traversal is large enough that it blows the stack 
when the server expands it. (You might need to increase the 'times' value to 
reproduce this, depending on your server environment.)
{code:java}
g.with("evaluationTimeout", 5000).V(1).repeat(__.out()).times(3000){code}
After the initial stack overflow, the JVM seems to allocate more memory which 
allows a repeated attempt at this traversal to succeed.
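
For anyone who wants to see the failure mode in isolation, here is a minimal 
standalone sketch. It does not use any TinkerPop code: 'expand' is a 
hypothetical stand-in for recursively walking a deeply nested structure, and 
the depth values are arbitrary. It also shows that StackOverflowError is an 
Error rather than an Exception, so a handler that only catches Exception never 
sees it.

```java
public class DeepRecursionDemo {

    // Hypothetical stand-in for recursive expansion: one stack frame per level.
    static int expand(int depth) {
        return depth == 0 ? 0 : 1 + expand(depth - 1);
    }

    // Reports how the recursion ended instead of letting the error propagate.
    static String tryExpand(int depth) {
        try {
            return "ok:" + expand(depth);
        } catch (Exception e) {
            // Not reached for a stack overflow: StackOverflowError is not an Exception.
            return "exception";
        } catch (StackOverflowError e) {
            // The stack has unwound back to this frame, so it is safe to continue.
            return "stack overflow";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryExpand(10));
        // Assumed to be far deeper than any default thread stack allows.
        System.out.println(tryExpand(50_000_000));
    }
}
```

The depth at which the overflow happens depends on the thread stack size and 
the JVM, which fits with needing a different 'times' value on different server 
environments.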

It is noteworthy that if the original request is sent as a Gremlin script, the 
stack overflow exception surfaces in the server's logs, and the server sends 
that exception to the client. I've tested this with the console, the Java 
driver, and the Go driver, and all of them do a reasonable job of handling this 
exception, informing the user, and recovering (although each of them approaches 
this somewhat differently).

If the original request is sent as bytecode, however, the same stack overflow 
error is still thrown, but it is swallowed somewhere in the server and never 
surfaces. The server stops processing that one traversal and continues to 
function normally. However, the server does not send any response to the 
client, which is what causes the driver to hang indefinitely. I have seen this 
with the example script in Go that Simon provided as well as this equivalent 
example in Java:
{code:java}
GraphTraversalSource g = traversal().withRemote(DriverRemoteConnection.using("localhost", 8182, "g"));
g.addV("test").property("id", 1).next();
g.addV("test").property("id", 2).next();
g.addE("test").from(__.V(1)).to(__.V(2)).property("id", "e1");
g.addE("test").from(__.V(2)).to(__.V(1)).property("id", "e2");

var result = g.with("evaluationTimeout", 5000L).V(1).repeat(__.out()).times(3000).next();
System.out.println(result); {code}
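
Until the server sends a response on the bytecode path, a caller can only 
protect itself on the client side. The following is a rough sketch of that 
idea, not a fix for the server and not part of any driver: 'callWithTimeout' 
and 'neverResponds' are hypothetical names, and the hang is simulated with a 
latch that is never released instead of a real remote traversal.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ClientSideTimeout {

    // Runs a blocking call on a daemon thread and gives up after timeoutMs.
    // Returns Optional.empty() on timeout (a null result also maps to empty).
    static <T> Optional<T> callWithTimeout(Callable<T> call, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread worker = new Thread(runnable, "blocking-call");
            worker.setDaemon(true); // do not keep the JVM alive if the call never returns
            return worker;
        });
        Future<T> future = executor.submit(call);
        try {
            return Optional.ofNullable(future.get(timeoutMs, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("call failed", e.getCause());
        } finally {
            future.cancel(true);   // interrupt the worker if it is still blocked
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Simulates a request whose response never arrives.
        Callable<String> neverResponds = () -> {
            new CountDownLatch(1).await();
            return "unreachable";
        };
        System.out.println(callWithTimeout(() -> "ok", 1_000).isPresent());
        System.out.println(callWithTimeout(neverResponds, 200).isPresent());
    }
}
```

In real use, the Callable would wrap the blocking terminal step, for example 
the next() call above. Whether interrupting the worker thread actually releases 
a blocked driver call is an assumption I have not verified. Note that 
"evaluationTimeout" is applied by the server, so it cannot help when the server 
never responds at all.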

> Repeat Out Times traversal hangs indefinitely on first execution
> ----------------------------------------------------------------
>
>                 Key: TINKERPOP-2767
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2767
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: javascript
>    Affects Versions: 3.5.3
>         Environment: Windows 10
>            Reporter: Simon Zhao
>            Priority: Major
>
> Originally encountered when fixing TINKERPOP-2754
>  
> The following traversal in JS seems to cause hanging the first time you run 
> it on a newly launched gremlin-server (3.5.3) via docker
>  
> {{await g.V('1').repeat(_.out()).times(1500).next();}}
>  
> The same hanging occurs in gremlin-go. 
>  
> {code:java}
> _, err = g.With("evaluationTimeout", 1000).V("1").Repeat(gremlingo.T__.Out()).Times(int32(1500)).Next() {code}
>  
> The timeout is optional, but it shows that something is going wrong, since 
> the call never returns. Interestingly enough, if the timeout is very low, the 
> call won't hang because it reports that the timeout was exceeded. This 
> suggests that when the traversal completes within the timeout, the result is 
> just not returned for some reason on the first call.
>  
> If you were to write a script and invoke this snippet of code, it will hang. 
> If you forcefully terminate the script and rerun it, then it doesn't hang.
>  
> main.go
> {code:java}
> package main
> import (
>    gremlingo "github.com/apache/tinkerpop/gremlin-go/v3/driver"
>    "log"
> )
> func main() {
>    driver, err := gremlingo.NewDriverRemoteConnection("ws://localhost:45940/gremlin")
>    if err != nil {
>       log.Print("Err creating DRC")
>       return
>    }
>    defer driver.Close()
>    log.Println("Start")
>    g := gremlingo.Traversal_().WithRemote(driver)
>    LABEL := "test"
>    _, err = g.V().HasLabel(LABEL).Drop().Next()
>    _, err = g.AddV(LABEL).Property(gremlingo.T.Id, "1").Next()
>    _, err = g.AddV(LABEL).Property(gremlingo.T.Id, "2").Next()
>    _, err = g.AddE(LABEL).From(gremlingo.T__.V("1")).To(gremlingo.T__.V("2")).Property(gremlingo.T.Id, "e1").Next()
>    _, err = g.AddE(LABEL).From(gremlingo.T__.V("2")).To(gremlingo.T__.V("1")).Property(gremlingo.T.Id, "e2").Next()
>    if err != nil {
>       log.Println("Error during setup")
>       return
>    }
>    log.Println("Start the problematic traversal")
>    _, err = g.With("evaluationTimeout", 1000).V("1").Repeat(gremlingo.T__.Out()).Times(int32(1500)).Next()
>    if err != nil {
>       log.Println("Error with the problematic traversal, but we didn't hang")
>       return
>    }
>    log.Println("End")
> } {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
