Hello,
I am trying to load a very large file (uniprot.rdf) into Virtuoso: it is
about 45 GB uncompressed (the 3.5 GB compressed file can be found at
ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/rdf/). I
have adapted the virtuoso.ini file a bit, setting up striping (about
100 GB reserved). I have also experimented with NumberOfBuffers and
MaxCheckpointRemap, as suggested in
http://docs.openlinksw.com/virtuoso/rdfperformancetuning.html#rdfperfloading
The ISQL statement I am using is:
DB.DBA.RDF_LOAD_RDFXML(file_to_string_output('/virtuoso/data/rdf/uniprot.rdf'),
'http://www.cellcycleontology.org/ontology/rdf/uniprot',
'http://www.cellcycleontology.org/ontology/rdf/uniprot');
However, once the load starts, Virtuoso makes the whole OS unresponsive
(the system load average rises to 40), and after some time I get this
error message:
SQL> SET AUTOCOMMIT ON;
SQL>
DB.DBA.RDF_LOAD_RDFXML(file_to_string_output('/virtuoso/data/rdf/uniprot.rdf'),
'http://www.cellcycleontology.org/ontology/rdf/uniprot',
'http://www.cellcycleontology.org/ontology/rdf/uniprot');
*** Error 08S01: [Virtuoso Driver]CL065: Lost connection to server
at line 2 of Top-Level:
DB.DBA.RDF_LOAD_RDFXML(file_to_string_output('/virtuoso/data/rdf/uniprot.rdf'),
'http://www.cellcycleontology.org/ontology/rdf/uniprot',
'http://www.cellcycleontology.org/ontology/rdf/uniprot')
It seems that the function DB.DBA.RDF_LOAD_RDFXML_MT
<http://docs.openlinksw.com/virtuoso/fn_rdf_load_rdfxml_mt.html>
(see http://docs.openlinksw.com/virtuoso/functionidx.html) could help me
deal with large RDF files, perhaps by loading the file in pieces read with
file_to_string_output, as suggested in
http://docs.openlinksw.com/virtuoso/rdfperformancetuning.html#rdfperfloading
If so, what is the recommended way to split such a large file? The page
http://docs.openlinksw.com/virtuoso/fn_file_to_string_output.html
mentions that the start and end positions of the segment to read have to
be defined; how long should each segment be? Once the data is loaded, will
Virtuoso be able to cope with a database of this size? Would any further
tuning of the INI parameters still be needed or recommended?
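A minimal sketch of one way to split the file, in Python rather than inside Virtuoso. It assumes uniprot.rdf is a single rdf:RDF element whose direct children are self-contained resource descriptions. The file is streamed with xml.etree.ElementTree.iterparse, so memory is bounded roughly by one batch, and cuts are made only between top-level elements, so every chunk is well-formed RDF/XML that keeps the original root attributes (such as xml:base). The function name split_rdfxml, the chunk size and the output file naming are hypothetical. Caveat: if blank nodes are linked via rdf:nodeID across top-level elements, pieces landing in different chunks would become distinct blank nodes, because blank node labels are scoped to one document.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ET.register_namespace("rdf", RDF_NS)  # keep the familiar rdf: prefix in output


def _write_chunk(items, root_tag, root_attrib, path):
    """Wrap a batch of top-level descriptions in a fresh rdf:RDF root and save it."""
    out_root = ET.Element(root_tag, root_attrib)
    out_root.extend(items)
    ET.ElementTree(out_root).write(path, encoding="utf-8", xml_declaration=True)


def split_rdfxml(src, out_prefix, per_chunk=100_000):
    """Stream `src` and write standalone RDF/XML files, each holding at most
    `per_chunk` direct children of the root rdf:RDF element.
    Returns the list of paths written, in document order.
    (Hypothetical helper, not part of Virtuoso.)"""
    paths, batch = [], []
    root = root_tag = root_attrib = None
    depth = 0

    def flush():
        path = f"{out_prefix}{len(paths):05d}.rdf"
        _write_chunk(batch, root_tag, root_attrib, path)
        paths.append(path)
        batch.clear()

    for event, elem in ET.iterparse(src, events=("start", "end")):
        if event == "start":
            depth += 1
            if depth == 1:
                # Root element: remember its tag and attributes (e.g. xml:base).
                root, root_tag, root_attrib = elem, elem.tag, dict(elem.attrib)
            continue
        depth -= 1
        if depth == 1:
            # A complete top-level description has been parsed: detach it
            # from the root so the parsed tree does not keep growing.
            root.remove(elem)
            batch.append(elem)
            if len(batch) >= per_chunk:
                flush()
    if batch:
        flush()
    return paths
```

Each resulting file could then be loaded on its own with the same graph IRI as above, e.g. DB.DBA.RDF_LOAD_RDFXML(file_to_string_output('/virtuoso/data/rdf/uniprot-part-00000.rdf'), ...), so that no single call has to hold the whole 45 GB document as one string. Whether RDF_LOAD_RDFXML_MT is the better choice for each chunk is not settled by this sketch.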
Thanks in advance for any hints,
Erick
--
==================================================================
Erick Antezana http://www.cellcycleontology.org
PhD student
Tel:+32 (0)9 331 38 24 fax:+32 (0)9 3313809
VIB Department of Plant Systems Biology, Ghent University
Technologiepark 927, 9052 Gent, BELGIUM
er...@psb.ugent.be http://www.psb.ugent.be/~erant
==================================================================