Hi Bart,

On 28-Aug-14 1:56 PM, Bart Vandewoestyne wrote:
On 2014-07-16 10:34, Alexey Zakhlestin wrote:
On 14 Jul 2014, at 17:37, Bart Vandewoestyne <bart.vandewoest...@telenet.be> wrote:

Our problem is the following: we observed that some of our SPARQL
queries have subqueries (returning IDs) that run rather slowly because
of a slow bif:contains full-text search.  We therefore
'migrated'/'copied'/'flattened' some of our data to another
technology/database/store that allows for faster textual search.
Querying this database results in a list of IDs, and these IDs are then
'injected' in a SPARQL query (we simply create a SPARQL query string
with all these IDs in the FILTER command).

Apparently, we are limited to filtering 4094 IDs and I don't see a way
to overcome this if the subquery returns more.  Could there be a way to
work around this limitation at the SPARQL level?

A couple of untested thoughts:

1. Try using several IN() filters joined with OR (||):

      FILTER(?id IN(1,2,3) || ?id IN(4,5,6))

2. Use UNION to join the result sets:

      {
          … . FILTER(?id IN(1,2,3))
      }
      UNION
      {
          … . FILTER(?id IN(4,5,6))
      }


3. Insert the IDs into a separate named graph, use them in the subquery
   and then clear the graph:

      INSERT DATA {<id1> a <https://example.com/dummy> .
                   <id2> a <https://example.com/dummy> .} etc.
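
Option 3 comes down to three kinds of statement: batched INSERT DATA, a query that joins against the stored IDs instead of filtering on them, and a final CLEAR GRAPH. The Python sketch below only builds those strings. The graph IRI `TEMP_GRAPH`, the batch size and the `pattern` argument are hypothetical placeholders, the marker class reuses the `https://example.com/dummy` IRI from the snippet above, and sending the statements to the endpoint is left out.

```python
from typing import List

# Hypothetical temporary graph; the marker class is the dummy IRI from the thread.
TEMP_GRAPH = "http://example.com/graphs/id-filter"
MARKER = "https://example.com/dummy"


def insert_ids(iris: List[str], graph: str = TEMP_GRAPH, batch: int = 1000) -> List[str]:
    """Return one INSERT DATA statement per batch, tagging each IRI as `a <MARKER>`."""
    statements = []
    for start in range(0, len(iris), batch):
        triples = " ".join(f"<{iri}> a <{MARKER}> ." for iri in iris[start:start + batch])
        statements.append(f"INSERT DATA {{ GRAPH <{graph}> {{ {triples} }} }}")
    return statements


def filtered_query(pattern: str, graph: str = TEMP_GRAPH) -> str:
    """SELECT that joins `pattern` (hypothetical subquery body) with the stored IDs."""
    return f"SELECT ?id WHERE {{ GRAPH <{graph}> {{ ?id a <{MARKER}> }} {pattern} }}"


def clear_graph(graph: str = TEMP_GRAPH) -> str:
    """Statement that empties the temporary graph once the query has run."""
    return f"CLEAR GRAPH <{graph}>"


if __name__ == "__main__":
    ids = [f"http://example.com/id{i}" for i in range(5)]
    for stmt in insert_ids(ids, batch=2):
        print(stmt)
    print(filtered_query("?id ?p ?o ."))
    print(clear_graph())
```

Joining against the graph keeps the query text short no matter how many IDs there are; the trade-off is an extra write and cleanup per search, and concurrent searches would each need their own graph IRI.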

Getting back to my question from July, I have now tested solutions 1)
and 2).  Solution 2, using UNIONs, indeed seems a good way to overcome
the 4094 FILTER limit.  Concerning solution 1), I came across the
following issue:

I tried filtering using the following expression:

    FILTER( ?id IN(<id1>)
            || ?id IN(<id2>)
            || ... and so on ...
            || ?id IN(<idN>) )

or, similarly:

    FILTER( (?id = <id1>)
            || (?id = <id2>)
            || ... and so on ...
            || (?id = <idN>) )

This works for values of N up to and including 1024.  As soon as I
try N=1025, Virtuoso returns the error:

Virtuoso 37000 Error SP031: SPARQL compiler:
       Internal error: The maximum number of elements in array too long
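
To reproduce the boundary, both FILTER forms above can be generated for any N. The Python sketch below only builds the FILTER text; the `http://example.com/id{i}` IRIs are hypothetical placeholders, and embedding the text in the real query and sending it to Virtuoso is left out.

```python
from typing import List


def ored_in_terms(var: str, iris: List[str]) -> str:
    """First form: one single-element IN() per IRI, joined with ||."""
    return "FILTER( " + "\n        || ".join(f"?{var} IN(<{iri}>)" for iri in iris) + " )"


def ored_equalities(var: str, iris: List[str]) -> str:
    """Second form: one (?var = <iri>) comparison per IRI, joined with ||."""
    return "FILTER( " + "\n        || ".join(f"(?{var} = <{iri}>)" for iri in iris) + " )"


def probe_ids(n: int) -> List[str]:
    """Hypothetical placeholder IRIs http://example.com/id1 .. http://example.com/idN."""
    return [f"http://example.com/id{i}" for i in range(1, n + 1)]


if __name__ == "__main__":
    # Reported in this thread: 1024 terms compile, 1025 trigger SP031.
    for n in (1024, 1025):
        text = ored_equalities("id", probe_ids(n))
        print(n, "IDs ->", text.count("||") + 1, "ORed terms")
```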


The error comes from a limitation of the SPARQL compiler.

Does it make a difference if you use this instead:

FILTER( ?id IN(<id1>, <id2>, ..., <id1025>) )

i.e., do you get the same error again, or does this work for you?

Best Regards,
Rumi Kocis



I found this error message in the source file
http://code.metager.de/source/xref/openlink/virtuoso-opensource/libsrc/Wi/sparql_sff.c
but I cannot deduce the exact reason for it.

Could it be that only 1024 expressions can be ORed together in a FILTER
statement like the one above?  If so, that would make solution 1) not
really an option for overcoming the 4094 limit.
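
Whatever the exact limit turns out to be, both options 1) and 2) can be generated from a flat list of IDs by splitting it into chunks. The Python sketch below assumes full IRIs, a hypothetical `CHUNK_SIZE` of 1000 (below both the 1024 ORed terms and the 4094 IN() elements reported in this thread) and a placeholder graph pattern. Only the UNION variant has been reported to work here; whether a single FILTER with several large IN() lists stays within the compiler's limits is untested.

```python
from typing import Iterator, List

# Hypothetical chunk size, below the 1024 ORed terms and 4094 IN() elements
# reported in this thread.
CHUNK_SIZE = 1000


def chunks(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def in_list(var: str, iris: List[str]) -> str:
    """Render `?var IN(<a>, <b>, ...)` for a non-empty list of IRIs."""
    return f"?{var} IN({', '.join(f'<{iri}>' for iri in iris)})"


def or_of_in_filters(var: str, iris: List[str], size: int = CHUNK_SIZE) -> str:
    """Option 1: a single FILTER whose chunked IN() lists are joined with ||."""
    if not iris:
        raise ValueError("need at least one IRI")
    return "FILTER(" + " || ".join(in_list(var, c) for c in chunks(iris, size)) + ")"


def union_of_in_filters(var: str, pattern: str, iris: List[str],
                        size: int = CHUNK_SIZE) -> str:
    """Option 2: one `{ pattern FILTER(...) }` block per chunk, joined with UNION."""
    if not iris:
        raise ValueError("need at least one IRI")
    blocks = [f"{{ {pattern} FILTER({in_list(var, c)}) }}" for c in chunks(iris, size)]
    return "\nUNION\n".join(blocks)


if __name__ == "__main__":
    ids = [f"http://example.com/id{i}" for i in range(1, 6)]
    print(or_of_in_filters("id", ids, size=2))
    print(union_of_in_filters("id", "?id a ?type .", ids, size=2))
```

With `size=2` and five IDs, the first call yields three ORed IN() lists and the second three UNION blocks; `?id a ?type .` stands in for the real subquery.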

Any thoughts on this are welcome!

Bart

------------------------------------------------------------------------------
Slashdot TV.
Video for Nerds.  Stuff that matters.
http://tv.slashdot.org/
_______________________________________________
Virtuoso-users mailing list
Virtuoso-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/virtuoso-users

