Dear Ivan, Thank you for the very detailed answer! We will investigate whether solution #3 would work for us. 500x performance improvement sure would be nice:-)
Kind regards Olivier Dameron On 07/25/2016 06:58 AM, Ivan Mikhailov wrote: > Hello Oliver, > > The cost of "variable P" heavily depends on the sample query. > > 1. In good case the overhead is zero. It's possible when graphs (G) are > known or almost all data of the dataset are in very numerous small > graphs, so it's cheaper to read whole graphs than to try to use > additional indices that needs known P first. > > 2. If the value of P can be found before search, as if > :display :startRelation ?startRel . > is matched before > ?gene ?startRel ?start . > then it may cost zero if statistics of the desired predicate (or > predicates) are close to average statistics across all predicates of the > database. In that case the optimizer will build an execution plan using > "average" statistics but the plan will match one built using statistics > specific for a specific predicate. If the desired predicate is very > frequent or very infrequent then the optimizer can miss, resulting in > poor plan; the cost of the planning error can be big for single-process > installations and blocking for clusters. Note that rdf:type is the most > common example of predicate with far-from-average statistics. > > 3. In some cases, known G can eliminate the extra cost of unknown P. As > a common rule, if you know G or if you can easily calculate G at the > beginning of the query then the pattern > <calculate ?g here> > GRAPH ?g { <interesting part of query> } > is worth trying. Some customer reported 500x performance on single > server. > > 4. If the query is executed by an ODBC client or the like and P is a > parameter passed from outside then it may be practical to inline P into > the text of the query rather than using ?:p parameter passing notation. > At least, it's practical to have a separate variant of the query for > parameter set to rdf:type > > Best Regards, > > Ivan Mikhailov > OpenLink Software > http://virtuoso.openlinksw.com > > On Fri, 2016-07-22 at 17:09 +0200, Olivier Dameron wrote: >> Dear Virtuosans, >> I noticed that queries can be much slower when using variables as >> properties (even if the variable can only have one value) e.g.: >> >> DATA (extract): >> :display :startRelation :posStart. >> :gene0 rdf:type :Gene. >> :gene0 :posStart "53416"^^xsd:numeric. >> :gene1 rdf:type :Gene. >> :gene1 :posStart "29513"^^xsd:numeric. >> ... >> >> SIMPLE QUERY: >> ?gene :posStart ?start. >> >> SLOWER QUERY: >> :display :startRelation ?startRel . >> ?gene ?startRel ?start. >> >> I assume that the query engine first tries to match the "?gene >> ?startRel ?start" constraint, whereas begining by ":display >> :startRelation ?startRel." would define the value of ?startRel which >> would be used to find the start positions of the genes. >> I can live with the simple query, but the second one would make the >> development of our application easier. Is there anything we could do to >> improve the performance of the second query? >> >> Thank you! >> kind regards >> Olivier Dameron >> >> NB: for the record, below is the script I used for generating the >> dataset, as well as the two queries >> >> ===== generateDataSet.py >> #! /usr/bin/env python >> >> import random >> >> nbGenes = 50000 >> geneLengthMax = 100 >> chromosomeLength = 100000 >> >> with open("debugDataset.ttl", "w") as dataFile: >> dataFile.write(""" >> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>. >> @prefix : <http://www.univ-rennes1.fr/odameron/debugVirtuoso/>. >> >> :display :startRelation :posStart. >> :display :stopRelation :posStop. >> >> """) >> for i in range(nbGenes): >> geneIdent = ":gene" + str(i) >> dataFile.write("\n" + geneIdent + " rdf:type :Gene.\n") >> posStart = random.randint(1, chromosomeLength-geneLengthMax) >> posStop = posStart + random.randint(1, geneLengthMax) >> dataFile.write(geneIdent + " :posStart \"" + str(posStart) + >> "\"^^xsd:numeric.\n") >> dataFile.write(geneIdent + " :posStop \"" + str(posStop) + >> "\"^^xsd:numeric.\n") >> >> >> ===== getOverlap.sparql >> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> >> PREFIX : <http://www.univ-rennes1.fr/odameron/debugVirtuoso/> >> >> SELECT (count(*) as ?nbOverlap) >> >> WHERE { >> ?gene1 a :Gene; >> :posStart ?start1; >> :posStop ?stop1. >> >> ?gene2 a :Gene; >> :posStart ?start2; >> :posStop ?stop2. >> FILTER (?start1 < ?start2 && ?start1 < ?stop2 && ?start2 < ?stop1) >> } >> >> >> ===== getOverlapSLOW.sparql >> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> >> PREFIX : <http://www.univ-rennes1.fr/odameron/debugVirtuoso/> >> >> SELECT (count(*) as ?nbOverlap) >> >> WHERE { >> :display :startRelation ?startRel. >> :display :stopRelation ?stopRel. >> >> ?gene1 a :Gene; >> ?startRel ?start1; >> ?stopRel ?stop1. >> >> ?gene2 a :Gene; >> ?startRel ?start2; >> ?stopRel ?stop2. >> FILTER (?start1 < ?start2 && ?start1 < ?stop2 && ?start2 < ?stop1) >> } >> >> >> >> ------------------------------------------------------------------------------ >> What NetFlow Analyzer can do for you? Monitors network bandwidth and traffic >> patterns at an interface-level. Reveals which users, apps, and protocols are >> consuming the most bandwidth. Provides multi-vendor support for NetFlow, >> J-Flow, sFlow and other flows. Make informed decisions using capacity >> planning >> reports.http://sdm.link/zohodev2dev >> _______________________________________________ >> Virtuoso-users mailing list >> Virtuoso-users@lists.sourceforge.net >> https://lists.sourceforge.net/lists/listinfo/virtuoso-users > > ------------------------------------------------------------------------------ What NetFlow Analyzer can do for you? Monitors network bandwidth and traffic patterns at an interface-level. Reveals which users, apps, and protocols are consuming the most bandwidth. Provides multi-vendor support for NetFlow, J-Flow, sFlow and other flows. Make informed decisions using capacity planning reports. http://sdm.link/zohodev2dev _______________________________________________ Virtuoso-users mailing list Virtuoso-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/virtuoso-users