[graph-tool] dfs/bfs seem to have overhead depending on total graph size

My-Tien Nguyen Fri, 26 Feb 2021 08:31:43 -0800

Hi all,

I’ve got a very large undirected graph (407 mil vertices, 522 mil edges,2 vertex properties) that consists of multiple connected components (ccs).

I noticed that when I call e.g. dfs_iterator or dfs_search on a sourcevertex, it takes around 1 – 2 seconds to return. The upper bound isdepending on the component’s size, but the lower bound seems to be thesame for all components.

I have created a test graph with only a subset of the ccs of the largegraph. Iterating the same cc in the test graph takes only a couple ofmilliseconds instead of ≥ 1 s. This tells me, that the dfs/bfs iteratorshave some kind of overhead depending on the complete graph size.

I wrote a DFSVisitor that collects some timings during iteration tobetter see which steps consume time. Here are the results (rounded forreadability):


   In [30]: test_dfs(graph, graph.vertex(0))

   # first time the function is entered
   visitor.start_vertex_t             2.3e-05 s
   visitor.first_discover_vertex_t       0.18 s
   visitor.first_examine_edge_t          0.18 s
   visitor.first_tree_edge_t             0.63 s
   visitor.first_finish_vertex_t         0.63 s

   # average time between last 2 calls of the function
   visitor.discover_vertex_t             0.002 s
   visitor.examine_edge_t                0.001 s
   visitor.tree_edge_t                   0.001 s
   visitor.finish_vertex_t               0.001 s

   # last time finished() is called
   visitor.finished                      1.3 s

   # number of times the functions were called
   visitor.discovered_vertices           565
   visitor.examined_edges                1978
   visitor.tree_edges                    564
   visitor.finished_vertices             565

   took 1.38 s

As you can see, start_vertex is called immediately, but then it takes avery long time until the other Visitor functions are called for thefirst time after which the calls are faster again, but still quite slow.On the test graph I think I can see the same trend with smaller numbersbecause the graph is smaller:


In [30]: test_dfs(test_graph, test_graph.vertex(0))

# first time the function is entered
visitor.start_vertex_t             2e-05 s
visitor.first_discover_vertex_t 0.0007 s
visitor.first_examine_edge_t 0.0007 s
visitor.first_tree_edge_t             0.0016s
visitor.first_finish_vertex_t 0.0015 s

# average time between last 2 calls of the function
visitor.discover_vertex_t             1.6e-05 s
visitor.examine_edge_t 3.7e-06 s
visitor.tree_edge_t 1.4e-05 s
visitor.finish_vertex_t 1.4e-05 s

# last time finished() is called
visitor.finished 0.01 s

# number of times the functions were called
visitor.discovered_vertices           565
visitor.examined_edges                1978
visitor.tree_edges                    564
visitor.finished_vertices             565

took 0.01 s

If I iterate the same cc in the large graph in python, it takes me onlya couple of milliseconds:


   def dfs(graph, vertex_idx):
        t = time.time()
        todo = {vertex_idx}
        visited = set()
        while len(todo) > 0:
            next_vertex = todo.pop()
            visited.add(next_vertex)
            todo |=
   set(graph.iter_all_neighbors(graph.vertex(next_vertex))) - visited
        print(f'dfs took {time.time() - t} s')
        return visited

   In [38]: d = dfs(graph, 0)
   dfs took 0.01653146743774414 s

Because of graph_tool’s slowness for small ccs I ended up writing aheuristic that always attempts a naive python dfs first, but aborts if 1second is exceeded and only then does the graph_tool dfs. Anyone knowwhat’s going on here?



Best

My-Tien

_______________________________________________
graph-tool mailing list
[email protected]
https://lists.skewed.de/mailman/listinfo/graph-tool

[graph-tool] dfs/bfs seem to have overhead depending on total graph size

Reply via email to