Jörn,

On 1 Sep 2014, at 16:44, Jörn Hees <j_h...@cs.uni-kl.de> wrote:
>
> On 1 Sep 2014, at 17:15, Hugh Williams <hwilli...@openlinksw.com> wrote:
>
>> [Hugh] Did you let the load continue or was it stopped?
>
> Yup, I let it continue, but it was killed for running out of memory 2.5 days
> after my last mail... :-/
>
>
>> Development indicates your suggestion is not without merit, but
>> implementation is not as simple as it may seem because the indexes are not
>> all sequential; still, something like that could possibly be implemented.
>> It is suggested you try dropping the indexes on the RDF_QUAD table, loading
>> the Freebase datasets and then recreating the indexes after loading. This
>> requires a smaller working set, which would better fit into the 32GB of RAM
>> available. The commands for dropping the necessary indexes are:
>>
>> drop index rdf_quad_pogs;
>> drop index rdf_quad_sp;
>> drop index rdf_quad_op;
>> drop index rdf_quad_gs;
>>
>> and the respective indexes can then be recreated as detailed at:
>>
>> http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtRDFPerformanceTuning?#RDF%20Index%20Scheme
>>
>> Note that you need to recreate the column-wise indexes, as this is v7. Let
>> us know how this works for you.
>
> Cool, will try.

[Hugh] OK, let us know the outcome ...

>
>> Note you can also use the ld_meter scripts we provided for monitoring the
>> Virtuoso bulk loader activity, as detailed at:
>>
>> http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtTipsAndTricksGuideLDMeterUtility
>>
>> Also, how many "rdf_loader_run()" processes do you have running when
>> performing the load? For v7 we typically recommend running (number of
>> cores * 0.4) of them for best performance.
>
> Thanks, didn't know these. I'll probably not run multiple rdf_loaders at the
> same time as deactivating the indices, etc. (I assume that advice is meant
> for cases where you have enough RAM and aren't invalidating even more of the
> cache hierarchy by having several processes compete?)
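The drop-indexes-then-load sequence suggested above can be sketched as a small Python helper that assembles the isql script in order. This is a minimal sketch under assumptions: the data directory, file mask and graph IRI are hypothetical placeholders, the directory must be listed in DirsAllowed in virtuoso.ini for ld_dir to read it, and the recreate-index step is deliberately left as a comment, because the v7 column-wise CREATE statements should be copied from the RDF Index Scheme page rather than guessed here.

```python
# Virtuoso RDF_QUAD secondary indexes named in the mail above.
SECONDARY_INDEXES = ["rdf_quad_pogs", "rdf_quad_sp", "rdf_quad_op", "rdf_quad_gs"]


def build_load_script(data_dir: str, file_mask: str, graph_iri: str) -> str:
    """Return the drop / load / checkpoint statements as one isql script.

    data_dir, file_mask and graph_iri are caller-supplied placeholders.
    """
    for value in (data_dir, file_mask, graph_iri):
        if "'" in value:
            # Values are pasted into SQL string literals without escaping.
            raise ValueError(f"single quote not allowed: {value!r}")
    lines = [f"drop index {name};" for name in SECONDARY_INDEXES]
    # Register matching files with the bulk loader (path, mask, target graph).
    lines.append(f"ld_dir ('{data_dir}', '{file_mask}', '{graph_iri}');")
    # Load the registered files; extra loaders can run in other sessions.
    lines.append("rdf_loader_run ();")
    # Persist the loaded quads.
    lines.append("checkpoint;")
    # Recreating the column-wise v7 indexes is intentionally not spelled out:
    # take those CREATE statements from the RDF Index Scheme wiki page.
    lines.append("-- recreate the RDF_QUAD indexes here (see RDF Index Scheme page)")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical path, mask and graph IRI, for illustration only.
    print(build_load_script("/data/freebase", "*.gz", "http://example.org/freebase"))
```

Generating the script rather than executing it keeps the destructive DROP INDEX statements visible for review before they are fed to isql.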
[Hugh] You should run multiple rdf_loader_run() processes, as there are many
datasets to load and you want to achieve maximum platform utilisation during
the load (mainly optimum use of the cores for parallel loading of triples).

Regards
Hugh

>
> Cheers,
> Jörn
>
>
> ------------------------------------------------------------------------------
> Slashdot TV.
> Video for Nerds. Stuff that matters.
> http://tv.slashdot.org/
> _______________________________________________
> Virtuoso-users mailing list
> Virtuoso-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/virtuoso-users
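To act on the "number of cores * 0.4" guideline, one option is to start that many isql sessions, each calling rdf_loader_run(), wait for all of them, and then run a checkpoint. The sketch below assumes the isql client accepts the statement as an exec= argument, a local server on port 1111 and placeholder dba credentials; the binary name (for example isql-vt on some distributions) and the connection details are assumptions to adjust for your installation.

```python
from __future__ import annotations

import os
import subprocess

# Hypothetical connection defaults: adjust binary name, port and credentials.
DEFAULT_CONN = {"isql": "isql", "port": "1111", "user": "dba", "password": "dba"}


def recommended_loaders(cores: int) -> int:
    """cores * 0.4 parallel loaders (floored), but never fewer than one.

    cores * 2 // 5 is the same value computed in exact integer arithmetic.
    """
    return max(1, cores * 2 // 5)


def isql_command(sql: str, conn: dict[str, str] | None = None) -> list[str]:
    """Build an isql argv that runs one statement (assumes exec= support)."""
    c = {**DEFAULT_CONN, **(conn or {})}
    return [c["isql"], c["port"], c["user"], c["password"], f"exec={sql}"]


def loader_commands(n: int, conn: dict[str, str] | None = None) -> list[list[str]]:
    """One argv per parallel rdf_loader_run() session; nothing is executed."""
    return [isql_command("rdf_loader_run();", conn) for _ in range(n)]


def run_parallel_load(cores: int | None = None,
                      conn: dict[str, str] | None = None) -> None:
    """Start the loaders concurrently, wait for all of them, then checkpoint."""
    n = recommended_loaders(cores or os.cpu_count() or 1)
    procs = [subprocess.Popen(cmd) for cmd in loader_commands(n, conn)]
    for p in procs:
        p.wait()  # each session is expected to return once no files remain
    # Commit the loaded data to disk once every loader has finished.
    subprocess.run(isql_command("checkpoint;", conn), check=True)
```

On an 8-core machine recommended_loaders(8) gives 3 sessions. Because loader_commands() only builds argument lists, they can be inspected (or pasted into a shell script) before calling run_parallel_load().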