On 21 May 2009, at 17:30, Frederick Giasson wrote:
Hi Daniel,
I've been using the ARC PHP libraries to query my local Virtuoso
SPARQL endpoint. While this works fine for small amounts of data,
the memory usage of paging through hundreds of pages of results
is too much for my PHP process to handle.
Is there a better way to do SPARQL querying against a local
Virtuoso than using ARC?
Feel free to tell me to RTFM, but I'd appreciate any thoughts you
might have.
Thanks,
Dan
Dan,
What's the configuration of your machine? Basically, how much RAM
is in place?
BTW - Have you looked at the Virtuoso tuning guide?
Links:
1. http://docs.openlinksw.com/virtuoso/rdfperformancetuning.html
I think this is related to ARC, PHP, or the SPARQL query he sends,
and not to the performance of the Virtuoso data store.
Daniel: make sure the paging is handled by the SPARQL query itself,
using LIMIT and OFFSET. Otherwise a very large number of triples
can be returned, and PHP can run out of memory if ARC creates too
many objects.
So, if you are performing the paging in SPARQL, only a small amount
of data is loaded into PHP objects (by ARC) at a time, and it stays
usable.
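To make the LIMIT/OFFSET idea concrete, here is a minimal sketch of the paging loop. It is written in Python only to keep it client-agnostic; the same pattern applies with ARC in PHP. The names `page_query`, `iter_rows` and `run_query` are hypothetical, and `run_query` stands in for whatever call actually sends the query to the endpoint and returns one page of rows. Note that the base query should carry an ORDER BY, since paging with OFFSET is not guaranteed to be stable without one.

```python
from typing import Callable, Iterator, List


def page_query(base_query: str, limit: int, offset: int) -> str:
    """Append LIMIT/OFFSET so the endpoint, not the client, does the paging."""
    return f"{base_query.rstrip()}\nLIMIT {limit} OFFSET {offset}"


def iter_rows(
    run_query: Callable[[str], List[dict]],  # hypothetical: sends one query, returns its rows
    base_query: str,
    page_size: int = 1000,
) -> Iterator[dict]:
    """Yield rows one page at a time; only a single page is held in memory."""
    offset = 0
    while True:
        rows = run_query(page_query(base_query, page_size, offset))
        yield from rows
        # A short (or empty) page means the result set is exhausted.
        if len(rows) < page_size:
            break
        offset += page_size
```

Because `iter_rows` is a generator, each page can be dropped as soon as it has been processed, so memory use should depend on `page_size` rather than on the total number of results, provided the client library itself releases what it allocated for each page.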
Does this answer your question?
Hi Fred,
You're right that it's a PHP issue - Virtuoso is returning results
quickly.
The problem is that even though PHP has a max memory of 4GB (out of
the machine's 8GB) it still grinds to a halt and gets killed by the
kernel after a number of hours of processing results.
I'm using LIMIT and OFFSET in the SPARQL queries.
As far as I can gather from looking at memory usage with xdebug,
something that ARC uses (possibly the XML parser) is leaking memory
on every iteration, and this is adding up.
Is there a better way to query Virtuoso (with SPARQL queries) locally,
rather than having to use the HTTP/XML endpoint?
Thanks,
Dan
--
Daniel Alexander Smith
IAM Group
School of Electronics and Computer Science
University of Southampton
das...@ecs.soton.ac.uk