Thank you for your advice, Daniel. Actually, I want to delete only the statements containing the specific predicate. I don't want to delete all the triples that share a subject with those statements. As I have already said, I am not comfortable with DELETE queries. Is my query wrong? Could you suggest the correct one?
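To check that I understand the batching idea you describe, here is a minimal Python sketch of what I think is being suggested: the same DELETE as my original command, but with a LIMIT subquery so that each round touches a bounded number of triples, repeated until a count query returns zero. The function names, the `execute` and `count` callables (stand-ins for whatever client actually sends statements to Virtuoso) and the batch size of 100000 are my own assumptions, and I have not run the generated statements against the server.

```python
from typing import Callable


def build_batch_delete(graph: str, predicate: str, batch_size: int) -> str:
    """Build a DELETE that removes at most batch_size triples with the predicate."""
    return (
        "SPARQL DEFINE sql:log-enable 3\n"
        f"WITH <{graph}>\n"
        f"DELETE {{ ?s <{predicate}> ?o }}\n"
        # The subquery caps how many matches a single statement has to handle.
        f"WHERE {{ {{ SELECT ?s ?o WHERE {{ ?s <{predicate}> ?o }} LIMIT {batch_size} }} }}"
    )


def build_count(graph: str, predicate: str) -> str:
    """Build a query counting the triples that still carry the predicate."""
    return (
        f"SPARQL SELECT (COUNT(*) AS ?n) FROM <{graph}> "
        f"WHERE {{ ?s <{predicate}> ?o }}"
    )


def delete_in_batches(
    execute: Callable[[str], None],  # hypothetical: sends one statement to the server
    count: Callable[[str], int],     # hypothetical: runs a count query, returns ?n
    graph: str,
    predicate: str,
    batch_size: int = 100_000,       # assumed value, well below the vector limit
    max_rounds: int = 10_000,        # safety stop in case nothing gets deleted
) -> int:
    """Repeat the limited DELETE until no matching triples remain.

    Returns the number of DELETE rounds that were executed.
    """
    delete_stmt = build_batch_delete(graph, predicate, batch_size)
    count_stmt = build_count(graph, predicate)
    rounds = 0
    while count(count_stmt) > 0:
        if rounds >= max_rounds:
            raise RuntimeError("giving up: triples remain after max_rounds batches")
        execute(delete_stmt)
        rounds += 1
    return rounds
```

Each generated statement starts with `SPARQL`, so it should also be possible to paste one into isql-v by hand and rerun it until the count reaches zero.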
Kind regards,
Pantelis Natsiavas

2016-08-17 17:19 GMT+03:00 Davis, Daniel (NIH/NLM) [C] <daniel.da...@nih.gov>:

> So, this has nothing to do with the large vector size, but just to be sure
> the SPARQL is correct - do you wish to delete the subjects (and all their
> triples) where the subject has the predicate, or just the predicate itself?
>
> As far as avoiding the maximum vector size, I think your best approach is
> to limit the number of matches and repeat the query until there are no
> results, maybe with a count query in-between. I have had to do similar
> sorts of work-arounds to avoid the maximum # of results and maximum size of
> string issues. For instance, my first attempts to export large NTriples
> files after processing failed due to these issues. You may be able to
> adapt the code below, but I think that a repeated DELETE query limited to
> a # of triples will be best in your case.
>
> Anyway, the code:
>
> CREATE PROCEDURE meshrdf_export(in graph_uri varchar, in file_name varchar) {
>   DECLARE banner any;
>   DECLARE env, ses any;
>   DECLARE ses_len, max_ses_len any;
>
>   SET isolation = 'uncommitted';
>
>   max_ses_len := 10000000;
>
>   --
>   -- Truncate file and write a comment line indicating the graph and datetime of export.
>   --
>   --no_c_escapes-
>   banner := sprintf('# <%s> exported at %s\n', graph_uri, datestring(now()));
>   string_to_file (file_name, banner, -2);
>
>   env := vector (0, 0, 0);
>   ses := string_output ();
>
>   FOR (SELECT * FROM (SPARQL
>       define input:storage ""
>       SELECT ?s ?p ?o WHERE {
>         GRAPH `iri(?:graph_uri)` {
>           ?s ?p ?o
>         }
>       } ORDER BY ?s ?p ?o) AS sub OPTION (loop)) DO {
>     http_nt_triple (env, "s", "p", "o", ses);
>     ses_len := length (ses);
>
>     IF (ses_len > max_ses_len) {
>       string_to_file (file_name, ses, -1);
>       ses := string_output ();
>     }
>   }
>   IF (length (ses)) {
>     string_to_file (file_name, ses, -1);
>   }
> }
>
> Dan Davis, Systems/Applications Architect (Contractor),
> Office of Computer and Communications Systems,
> National Library of Medicine, NIH
>
> *From:* Pantelis Natsiavas [mailto:natsia...@gmail.com]
> *Sent:* Wednesday, August 17, 2016 4:36 AM
> *To:* virtuoso-users <virtuoso-users@lists.sourceforge.net>
> *Subject:* [Virtuoso-users] Deleting large number of triples
>
> Hi everybody.
>
> I am trying to delete a large number of triples of a very big graph. The
> graph contains *217.609.545* triples and I want to delete all the triples
> having a specific predicate (*64.884.016* triples).
>
> I am trying to do it through the isql-v command line interface, using the
> command:
>
> SPARQL DEFINE sql:log-enable 3
> WITH <graph>
> DELETE { ?s <predicate> ?o }
> WHERE{ ?s <predicate> ?o }
>
> After some time (I don't know exactly how much) I got the error
>
> *** Error 42000: [Virtuoso Driver][Virtuoso Server]FRVEC: array in for vectored over max vector length 2000000 > 1000000
> at line 1 of Top-Level:
>
> I checked the virtuoso.log and I see nothing related to the specific
> error.
>
> I changed the parameters in virtuoso.ini:
>
> MaxQueryMem = 8G ; from 2G
> VectorSize = 1000 ; not changed
> MaxVectorSize = 2000000 ; from 1000000
> AdjustVectorSize = 1 ; from 0
>
> I am not very confident about these changes in virtuoso settings, but
> checking the http://docs.openlinksw.com/virtuoso/dbadm.html these changes
> seemed the right thing to do.
>
> I restarted the VM and retried the whole process. After one hour, the
> memory consumed by Virtuoso reached around 100% and I got an error:
>
> *** Error 08S01: [Virtuoso Driver]CL065: Lost connection to server
>
> Please note that from previous similar errors, I already have the
> following virtuoso.ini settings:
>
> NumberOfBuffers = 1360000
> MaxDirtyBuffers = 1000000
> ThreadCleanupInterval = 1
> ResourcesCleanupInterval = 1
>
> My questions:
>
> 1. Is there any way to improve my query in order to facilitate its
> processing? It is the first time I am doing a DELETE query and I am not
> comfortable with it.
>
> 2. Is there any way to "split" the query so that it doesn't need to handle
> all these triples at once?
>
> 3. Alternatively, is there any configuration change that might improve
> memory handling in order to handle such big queries?
>
> Kind regards,
>
> Pantelis Natsiavas