On 2014-08-28 14:19, Rumi wrote:
> Hi Bart,
>
> On 28-Aug-14 1:56 PM, Bart Vandewoestyne wrote:
>> On 2014-07-16 10:34, Alexey Zakhlestin wrote:
>>> On 14 Jul 2014, at 17:37, Bart Vandewoestyne
>>> <bart.vandewoest...@telenet.be> wrote:
>>>
>>>> Our problem is the following: we observed that some of our SPARQL
>>>> queries have subqueries (that return IDs) that run rather slowly (because
>>>> of a slow bif:contains full-text search).  We therefore
>>>> 'migrated'/'copied'/'flattened' some of our data to another
>>>> technology/database/store that allows for faster textual search.
>>>> Querying this database results in a list of IDs, and these IDs are then
>>>> 'injected' in a SPARQL query (we simply create a SPARQL query string
>>>> with all these IDs in the FILTER command).
>>>>
>>>> Apparently, we are limited to filtering 4094 IDs and I don't see a way
>>>> to overcome this if the subquery returns more.  Could there be a way to
>>>> work-around this limitation at SPARQL level?
>>> Couple of untested thoughts:
>>>
>>> 1. try to use several IN() filters by joining them using OR
>>>
>>>       FILTER(?id IN(1,2,3) || ?id IN(4,5,6))
>>>
>>> 2. use UNION to join result sets
>>>
>>>       {
>>>           … . FILTER(?id IN(1,2,3))
>>>       }
>>>       UNION
>>>       {
>>>           … . FILTER(?id IN(4,5,6))
>>>       }
>>>
>>>
>>> 3. insert IDs in separate named graph, use them for subquery and then
>>> clear the graph
>>>
>>>       INSERT DATA {<id1> a <https://example.com/dummy> . <id2> a
>>> <https://example.com/dummy> .} etc.
>> Getting back to my question from July, I have now tested solutions 1)
>> and 2).  Solution 2 using UNIONs seems indeed a good way to overcome the
>> 4094 FILTER limit.  Concerning solution 1), I came across the following
>> issue:
>>
>> I tried filtering using the following expression:
>>
>>     FILTER( ?id IN(<id1>)
>>             || ?id IN(<id2>)
>>             || ... and so on ...
>>             || ?id IN(<idN>) )
>>
>> or, similarly:
>>
>>     FILTER( (?id = <id1>)
>>             || (?id = <id2>)
>>             || ... and so on ...
>>             || (?id = <idN>) )
>>
>> This works for values of N up to and including 1024.  As soon as I
>> try N=1025, Virtuoso returns the error:
>>
>> Virtuoso 37000 Error SP031: SPARQL compiler:
>>        Internal error: The maximum number of elements in array too long
>
>
> The error comes from a limitation in the compiler.
> Does it make a difference if you use this instead:
>
> FILTER( ?id IN(<id1>, <id2> ...<id1025>))
>
> i.e. do you get the same error again, or does this work for you?
>
> Best Regards,
> Rumi Kocis

Hello Rumi,

The FILTER statement you propose (and its limitation) was the trigger
for my initial question :-)  If you write the FILTER statement the way
you propose (which is how I originally wrote it), you are limited to
4094 IDs in the IN list.  One way to overcome this is method 2) with
UNIONs, mentioned above.  I am now testing method 1), using boolean OR
to combine several IN filters, but there I get stuck at a limit of
1024 OR-ed terms...

So my questions remain:

* Can anybody confirm this 1024 limit?

* Does anybody see a way to overcome it?  Or am I better off sticking
with the UNION solution, which for now is the best workaround for the
4094 limit that I could come up with?
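
For reference, the UNION workaround could be sketched roughly as follows.
This is only a minimal illustration, not our actual code: the function
names, the example graph pattern and the output variables are hypothetical,
and the 4094 chunk size is simply the limit observed in this thread.  It
splits the ID list into chunks and emits one UNION branch per chunk, each
with its own IN filter.  Whether the number of UNION branches is itself
limited has not been tested here.

```python
from typing import Iterable, List

# Per-FILTER IN-list limit as observed in this thread (an assumption,
# not a documented constant).
MAX_IN_ITEMS = 4094


def chunked(items: List[str], size: int) -> Iterable[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_union_query(ids: List[str], pattern: str,
                      chunk_size: int = MAX_IN_ITEMS) -> str:
    """Build a SELECT query with one UNION branch per chunk of IDs.

    `pattern` is a hypothetical graph pattern that binds ?id and ?label.
    The IRIs are inserted as-is, so they are assumed to be trusted and
    already valid; no escaping is done.
    """
    if not ids:
        raise ValueError("need at least one ID")
    branches = []
    for chunk in chunked(ids, chunk_size):
        in_list = ", ".join(f"<{iri}>" for iri in chunk)
        branches.append(f"  {{ {pattern} FILTER(?id IN({in_list})) }}")
    body = "\n  UNION\n".join(branches)
    return f"SELECT ?id ?label WHERE {{\n{body}\n}}"


if __name__ == "__main__":
    example_ids = [f"http://example.com/id{i}" for i in range(5)]
    example_pattern = "?id <http://www.w3.org/2000/01/rdf-schema#label> ?label ."
    # A small chunk size is used here only to make the output readable.
    print(build_union_query(example_ids, example_pattern, chunk_size=2))
```

With the default chunk size, a list of 10000 IDs would end up in three
UNION branches.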

Kind regards,
Bart

_______________________________________________
Virtuoso-users mailing list
Virtuoso-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/virtuoso-users