Subject: ITP: odgi -- optimized dynamic genome/graph implementation
Package: wnpp
Owner: Michael R. Crusoe <michael.cru...@gmail.com>
Severity: wishlist

* Package name    : odgi
  Version         : 0.4.1
  Upstream Author : , Erik Garrison
* URL             : https://github.com/vgteam/odgi
* License         : Expat
  Programming Lang: C
  Description     : optimized dynamic genome/graph implementation

 Representing large genomic variation graphs with minimal memory overhead
 requires a careful encoding of the graph entities. It is possible to build
 succinct, static data structures to store queryable graphs, as in xg
 (https://github.com/vgteam/xg), but dynamic data structures are trickier
 to implement.
 .
 odgi follows the dynamic GBWT (https://github.com/jltsiren/gbwt) in
 developing a byte-packed version of the graph and the paths through it.
 Each node is represented by a byte array in which variable-length integers
 encode 1) the node sequence, 2) its edges, and 3) the paths crossing the
 node.
 .
 The edges and path steps are recorded relatively, as deltas between the
 current node id and the target node id, where the node id corresponds to
 the rank in the global array of nodes. Graphs built from biological data
 sets tend to have local partial order, and when sorted the stored deltas
 will tend to be small. This allows them to be compressed with a
 variable-length integer representation, resulting in a small in-memory
 footprint at the cost of packing and unpacking.
 .
 The savings are substantial. In partially ordered regions of the graph,
 most deltas will require only a single byte. The resulting implementation
 is able to load the whole-genome 1000 Genomes Project graph in around
 20 GB of RAM.
 .
 odgi was initially developed to allow in-memory manipulation of graphs
 produced by the seqwish variation graph inducer
 (https://github.com/ekg/seqwish).

Remark: This package is maintained by the Debian Med Packaging Team at
   https://salsa.debian.org/med-team/odgi
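
For readers unfamiliar with the technique, the delta encoding outlined in
the description can be illustrated with a minimal sketch. This is not
odgi's code and does not reproduce its byte layout. The function names are
hypothetical, and the choice of LEB128-style varints with zigzag mapping
for negative deltas is an assumption made only for illustration.

```python
# Illustrative sketch only: delta + variable-length integer packing of edges.
# The varint flavour (LEB128) and the zigzag mapping are assumptions, not
# taken from odgi.

def zigzag(n: int) -> int:
    """Map a signed delta to an unsigned value so small magnitudes stay small."""
    return 2 * n if n >= 0 else -2 * n - 1


def unzigzag(z: int) -> int:
    """Inverse of zigzag()."""
    return z // 2 if z % 2 == 0 else -(z + 1) // 2


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer, 7 payload bits per byte."""
    if value < 0:
        raise ValueError("varint input must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def pack_edges(node_rank: int, target_ranks: list[int]) -> bytes:
    """Store each edge as the delta between target rank and current rank."""
    out = bytearray()
    for target in target_ranks:
        out += encode_varint(zigzag(target - node_rank))
    return bytes(out)


def unpack_edges(node_rank: int, buf: bytes) -> list[int]:
    """Recover the absolute target ranks from the packed deltas."""
    targets = []
    pos = 0
    while pos < len(buf):
        z, pos = decode_varint(buf, pos)
        targets.append(node_rank + unzigzag(z))
    return targets
```

Under these assumptions, a node at rank 1000000 with edges to ranks
999999, 1000001 and 1000003 packs into one byte per edge, whereas each
absolute rank would need three bytes as a varint. That is the effect the
description refers to when it says most deltas require only a single byte
in partially ordered regions.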