On 2014-08-28 14:19, Rumi wrote:
> Hi Bart,
>
> On 28-Aug-14 1:56 PM, Bart Vandewoestyne wrote:
>> On 2014-07-16 10:34, Alexey Zakhlestin wrote:
>>> On 14 Jul 2014, at 17:37, Bart Vandewoestyne
>>> <bart.vandewoest...@telenet.be> wrote:
>>>
>>>> Our problem is the following: we observed that certain of our SPARQL
>>>> queries have subqueries (that return IDs) that run rather slow (because
>>>> of a slow bif:contains full-text-search). We therefore
>>>> 'migrated'/'copied'/'flattened' some of our data to another
>>>> technology/database/store that allows for faster textual search.
>>>> Querying this database results in a list of IDs, and these IDs are then
>>>> 'injected' in a SPARQL query (we simply create a SPARQL query string
>>>> with all these IDs in the FILTER command).
>>>>
>>>> Apparently, we are limited to filtering 4094 IDs and I don't see a way
>>>> to overcome this if the subquery returns more. Could there be a way to
>>>> work-around this limitation at SPARQL level?
>>> Couple of untested thoughts:
>>>
>>> 1. try to use several IN() filters by joining them using OR
>>>
>>> FILTER(?id IN(1,2,3) or ?id IN(4,5,6))
>>>
>>> 2. use UNION to join result sets
>>>
>>> {
>>> … . FILTER(?id IN(1,2,3))
>>> }
>>> UNION
>>> {
>>> … . FILTER(?id IN(4,5,6))
>>> }
>>>
>>>
>>> 3. insert IDs in separate named graph, use them for subquery and then
>>> clear the graph
>>>
>>> INSERT DATA {<id1> a <https://example.com/dummy> . <id2> a
>>> <https://example.com/dummy> .} etc.
>> Getting back to my question from July, I have now tested solutions 1)
>> and 2). Solution 2 using UNIONs seems indeed a good way to overcome the
>> 4094 FILTER limit. Concerning solution 1), I came across the following
>> issue:
>>
>> I tried filtering using the following expression:
>>
>> FILTER( ?id IN(<id1>)
>> || ?id IN(<id2>)
>> || ... and so on ...
>> || ?id IN(<idN>) )
>>
>> or, similarly:
>>
>> FILTER( (?id = <id1>)
>> || (?id = <id2>)
>> || ... and so on ...
>> || (?id = <idN>) )
>>
>> This works with values of N up to and including 1024. From the moment I
>> try with N=1025, Virtuoso returns me the error:
>>
>> Virtuoso 37000 Error SP031: SPARQL compiler:
>> Internal error: The maximum number of elements in array too long
>
>
> The error comes from the compiler's limitations ..
> Does it make difference if you use instead this:
>
> FILTER( ?id IN(<id1>, <id2> ...<id1025>))
>
> i.e. do you get again (the same) error or this works for you?
>
> Best Regards,
> Rumi Kocis

Hello Rumi,

The FILTER statement you propose (and its limitation) was the trigger for
my initial question :-) If you write the FILTER statement the way you
propose (= the way I originally wrote it), you are limited to 4094 IDs.
One way to overcome this is method 2) with UNIONs, mentioned above. I was
now testing method 1), using boolean OR to combine several IN filters, but
there I apparently get stuck at a limit of 1024 terms...

So my questions remain:

* Can anybody confirm this 1024 limit?
* Does anybody see a way to overcome it?

Or should I stick with the UNION solution, which for now is the best
workaround for the 4094 limit that I could come up with?

Kind regards,
Bart

_______________________________________________
Virtuoso-users mailing list
Virtuoso-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/virtuoso-users
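
For illustration, method 2) from this thread (splitting the ID list over
several UNION branches, each with its own IN filter) could be generated
along the following lines. This is only a minimal sketch and has not been
run against Virtuoso. The function names, the graph pattern "?id a ?type ."
and the example.com IRIs are hypothetical; the chunk size of 4094 is simply
the limit reported above, and the IDs are assumed to be IRIs that need no
escaping.

```python
from typing import Iterator, List

# Limit reported in this thread for a single IN() list (not taken from docs).
MAX_IN_ITEMS = 4094


def chunked(ids: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def build_union_query(ids: List[str], pattern: str,
                      chunk_size: int = MAX_IN_ITEMS) -> str:
    """Build a SELECT query with one UNION branch per chunk of IDs.

    `pattern` is the graph pattern that binds ?id (hypothetical here);
    every ID is assumed to be an IRI that can be wrapped in <...> as-is.
    """
    if not ids:
        raise ValueError("need at least one ID")
    branches = []
    for chunk in chunked(ids, chunk_size):
        in_list = ", ".join(f"<{iri}>" for iri in chunk)
        branches.append(f"  {{ {pattern} FILTER(?id IN({in_list})) }}")
    body = "\n  UNION\n".join(branches)
    return f"SELECT ?id WHERE {{\n{body}\n}}"


if __name__ == "__main__":
    # Tiny demo: 10 made-up IRIs, 4 per branch, so three branches in total.
    demo_ids = [f"http://example.com/id{i}" for i in range(10)]
    print(build_union_query(demo_ids, "?id a ?type .", chunk_size=4))
```

Each branch stays below the per-filter limit, so the total number of IDs
is bounded only by whatever limit applies to the number of UNION branches
or to the overall query size, which this thread does not establish.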