Hello,

Has anybody ever loaded something as large as a complete N3 version of GenBank or RefSeq into a single Virtuoso triple store? I'm using the ttlp_mt function, as mentioned in the instructions on how to load Bio2RDF data, but I call it from a Perl script.

The current virtuoso.db files are by far the biggest I have ever built:

RefSeq = 93 GB
GenBank = 82 GB

And only about 10-15% of the data has been loaded so far. If I continue, I estimate that each virtuoso.db will reach about 1000 GB.

Is there a way to enable compression of the object part of a triple when it is a literal? These dumps contain a lot of sequence data, which takes a lot of space when not compressed.

Also, is there a faster way to load N3 than ttlp_mt? The load is slowing down.

GenBank N3 dump: http://quebec.bio2rdf.org/download/n3/genbank/ (71 GB compressed)
RefSeq N3 dump: http://quebec.bio2rdf.org/download/n3/refseq/ (27 GB compressed)

Be aware that my script corrects some errors in triples that were created by the first version of my rdfizer. So if you try to load the dumps in their current state, some triples will be missing the closing ">". Since creating the N3 takes more than a week, I'm not redoing it now :)

Thanks,
Marc-Alexandre Nolin
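
P.S. A minimal sketch of the kind of repair mentioned above, for anyone who wants to load the dumps as they are. It is Python rather than the Perl script itself, and it rests on assumptions that may not match the real dumps: one triple per line, and damage limited to an IRI that opens with "<" and then runs into whitespace or the end of the line with no ">". The function name close_unterminated_iris is made up for illustration. Everything from the first double quote onward is left alone, so a "<" inside a literal is never touched.

```python
import re

# An IRI that opens with "<" but reaches whitespace or end of line
# before any closing ">". Properly closed IRIs do not match.
_UNCLOSED_IRI = re.compile(r'<([^<>\s"]+)(?=\s|$)')


def close_unterminated_iris(line: str) -> str:
    """Append the missing ">" to unterminated IRIs in one triple line.

    Assumption: only the part of the line before the first double quote
    is repaired, so literal values (such as sequences) stay unchanged.
    """
    head, quote, tail = line.partition('"')
    return _UNCLOSED_IRI.sub(r'<\1>', head) + quote + tail


if __name__ == "__main__":
    # Hypothetical example triple, not taken from the actual dumps.
    broken = '<http://example.org/s <http://example.org/p> "ACGT" .'
    print(close_unterminated_iris(broken))
```

Something like this could be run over each line of the decompressed dump before the text is handed to the loader. It would not catch every possible kind of damage, so checking a sample of the output first seems wise.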