[Virtuoso-users] Large N3 files

Marc-Alexandre Nolin Thu, 01 Oct 2009 19:19:37 +0000

Hello,

Has anybody ever loaded something as large as a complete N3 version
of GenBank or RefSeq into a single Virtuoso triple store? I'm using
ttlp_mt, as mentioned in the instructions on how to load Bio2RDF data,
but I call it from a Perl script.


The current virtuoso.db files are by far the biggest I've ever built:

Refseq = 93 GB
Genbank = 82 GB

And only about 10-15% of the data has been loaded. If I continue, I
estimate that each virtuoso.db will reach about 1000 GB. Is there a way
to enable compression of the object part of a triple when it is a
literal? I have plenty of sequences in these dumps, and they take a lot
of space when not compressed. Also, is there a faster way to load N3
than ttlp_mt? The load is slowing down as the database grows.
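
As a rough check of that estimate, here is a minimal sketch that
extrapolates the final size from the current size and the fraction
loaded. It assumes the database grows linearly with the amount of data
loaded, which may not hold once indexes grow; the function name
projected_size_gb is only illustrative.

```python
def projected_size_gb(current_gb, fraction_done):
    """Linear extrapolation: final size = current size / fraction loaded."""
    return current_gb / fraction_done


# Current virtuoso.db sizes in GB, as reported above.
sizes = {"Refseq": 93, "Genbank": 82}

for name, size in sizes.items():
    low = projected_size_gb(size, 0.15)   # if 15% is already loaded
    high = projected_size_gb(size, 0.10)  # if only 10% is already loaded
    print(f"{name}: roughly {low:.0f} to {high:.0f} GB")
```

Under that assumption the upper bounds come out at about 930 GB and
820 GB, which is in line with the rough 1000 GB figure.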

Genbank N3 dump at http://quebec.bio2rdf.org/download/n3/genbank/ (71G
compressed)
RefSeq N3 dump at http://quebec.bio2rdf.org/download/n3/refseq/ (27G compressed)

Be aware that my script corrects some errors in triples that were
created by the first version of my rdfizer. So if you try to load the
dumps in their current state, some triples will be missing the closing
">", but since creating the N3 takes more than a week, I'm not redoing
it now :)
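
For anyone who wants to try the dumps as they are, the following is a
minimal sketch of that kind of repair. It is not my actual Perl script,
just an illustration in Python. It assumes one statement per line, IRIs
without spaces, a final " ." separated by whitespace, and no literals
spanning several lines; the name repair_line is hypothetical.

```python
import re

# Matches either a double-quoted literal (left untouched) or an IRI that
# starts with "<" and reaches whitespace or end of line without a ">".
_TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|<[^<>\s"]*(?=\s|$)')


def repair_line(line):
    """Append the missing ">" to unterminated IRIs outside of literals."""
    def fix(match):
        text = match.group(0)
        if text.startswith('"'):
            return text       # quoted literal: keep as is
        return text + ">"     # unterminated IRI: close it
    return _TOKEN.sub(fix, line)


if __name__ == "__main__":
    broken = "<http://example.org/s> <http://example.org/p> <http://example.org/o ."
    print(repair_line(broken))
```

Lines that are already well formed pass through unchanged, so the
function can be applied to every line of a dump before loading.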

Thanks,

Marc-Alexandre Nolin
