Kingsley,
I would like to make some final clarifications before I begin. This is a big,
time-consuming decision, so I want to choose wisely.
-Should I transform only the portion of my database (about half) that will
need provenance reification? Unfortunately, it seems you should do all or
nothing, so that there is a consistent way to construct a query.
-With a reification-style database you basically query only four predicates
(IS-A, HAS-SUBJECT, HAS-PREDICATE, HAS-OBJECT), and the predicate is the
leading key of the primary index in the default indexing scheme. Normally
Virtuoso caches many triples corresponding to a query's predicate, which is
futile with only four predicates. This alone seems likely to hurt the
performance of future queries.
-Do you strongly advise against putting the ID inside the graph name for some
reason? That is, make the graph name the unique ID. When I need to group a
portion of the 1 million named graphs, I would have to use a SPARQL FILTER to
text-search for "my_uri1_*" to get all the <my_uri1_<id>> graphs. Is this whole
concept a bad idea compared with the reification approach?
-Lastly, should I forget about trying to find a unique ID per triple in the
Virtuoso architecture?
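For the graph-name-as-ID question, here is a minimal in-memory sketch of the scheme, assuming quads are held as (s, p, o, g) tuples and using a hypothetical http://example.org/ base for the graph IRIs. The prefix test plays the role of a SPARQL FILTER(STRSTARTS(STR(?g), "...")); in this sketch it is a linear scan over every quad, which hints at why grouping a million per-triple graphs by string matching can get expensive.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    s: str
    p: str
    o: str
    g: str  # graph IRI doubling as a per-triple ID (hypothetical naming scheme)


def mint_graph(prefix: str, triple_id: int) -> str:
    """Build a per-triple graph IRI such as http://example.org/my_uri1_42."""
    return f"http://example.org/{prefix}_{triple_id}"


def graphs_in_group(quads: list[Quad], prefix: str) -> list[Quad]:
    """Emulate FILTER(STRSTARTS(STR(?g), ...)) with a linear scan over all quads."""
    # Keep the trailing "_" so that "my_uri1" does not also match "my_uri10_...".
    wanted = f"http://example.org/{prefix}_"
    return [q for q in quads if q.g.startswith(wanted)]


def id_of(quad: Quad) -> int:
    """Recover the integer ID by parsing the suffix of the graph IRI."""
    return int(quad.g.rsplit("_", 1)[1])


quads = [
    Quad("ex:Rover", "rdf:type", "ex:Dog", mint_graph("my_uri1", 1)),
    Quad("ex:Rover", "ex:owner", "ex:Kevin", mint_graph("my_uri1", 2)),
    Quad("ex:Fido", "rdf:type", "ex:Dog", mint_graph("my_uri10", 3)),
]

print([id_of(q) for q in graphs_in_group(quads, "my_uri1")])  # [1, 2]
```

Note that the ID lives only in a string, so every lookup by ID or by group has to parse or pattern-match IRIs rather than compare integers.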
Regards,
Kevin
From: Kingsley Idehen [mailto:kide...@openlinksw.com]
Sent: Wednesday, February 5, 2014 4:18 PM
To: virtuoso-users@lists.sourceforge.net
Subject: Re: [Virtuoso-users] Require Unique integer ID for each RDF Triple
On 2/5/14 4:26 PM, Kevin wrote:
Kingsley,
Thank you for personally taking the time to give such an insightful explanation.
Without reification (adding 4x triples, ouch) is there any hope of getting a
unique integer index per triple (see my initial ideas)? Currently I can put
the ID in the graph name (i.e., <my_iri_<id>>), but it really destroys the
coarse-grained intent of graph names and graph groups.
I assume your reification solution would transform ROVER IS-A DOG into:
ID1 IS-A STATEMENT
ID1 HAS-SUBJECT ROVER
ID1 HAS-PREDICATE IS-A
ID1 HAS-OBJECT DOG
Yes, that's basically the intent of RDF reification vocabulary.
While this may be my best solution, do you agree this suffers from the
following:
-Triple count, and thus memory, increases by ~4x.
Not if the DBMS engine has key compression and column-wise storage, which is
basically a feature of Virtuoso 7.x.
-Performance and caching are hurt by the more complex queries (more
predicates).
Not so, if you have column-wise storage, key compression, and vectorized
execution of queries, all of which are Virtuoso 7.x features.
-Finally, only half my triples require provenance reification, resulting in an
ugly hybrid (normal and reified).
Not really, I suggest you try this with Virtuoso 7.x :-)
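One way to see why the hybrid need not be ugly: if reification adds rdf:Statement nodes alongside the original triples instead of replacing them, ordinary queries run unchanged and only provenance queries touch the reified form. A minimal in-memory sketch, assuming hypothetical ex: names and a hypothetical ex:timestamp provenance property; the function mimics a SPARQL OPTIONAL pattern and says nothing about how Virtuoso evaluates one.

```python
from typing import Optional

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_TYPE = RDF + "type"
Triple = tuple[str, str, str]

# Plain triples stay as they are; only the Rover triple also gets a reified twin.
store: set[Triple] = {
    ("ex:Rover", RDF_TYPE, "ex:Dog"),
    ("ex:Fido", RDF_TYPE, "ex:Dog"),
    ("ex:stmt1", RDF_TYPE, RDF + "Statement"),
    ("ex:stmt1", RDF + "subject", "ex:Rover"),
    ("ex:stmt1", RDF + "predicate", RDF_TYPE),
    ("ex:stmt1", RDF + "object", "ex:Dog"),
    ("ex:stmt1", "ex:timestamp", "2014-02-05"),  # hypothetical provenance property
}


def match(s=None, p=None, o=None) -> list[Triple]:
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    ]


def dogs_with_timestamps() -> list[tuple[str, Optional[str]]]:
    """Roughly: SELECT ?x ?ts WHERE { ?x a ex:Dog OPTIONAL { ?st rdf:subject ?x ;
    rdf:predicate rdf:type ; rdf:object ex:Dog ; ex:timestamp ?ts } }"""
    rows = []
    for x, _, _ in match(p=RDF_TYPE, o="ex:Dog"):  # the base query is unchanged
        ts = None
        for st, _, _ in match(p=RDF + "subject", o=x):
            if (st, RDF + "predicate", RDF_TYPE) in store and (st, RDF + "object", "ex:Dog") in store:
                stamps = match(s=st, p="ex:timestamp")
                ts = stamps[0][2] if stamps else None
        rows.append((x, ts))
    return sorted(rows)


print(dogs_with_timestamps())  # [('ex:Fido', None), ('ex:Rover', '2014-02-05')]
```

Unreified triples simply come back with no timestamp, so one query shape covers both halves of the data.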
Kingsley
Regards,
Kevin
From: Kingsley Idehen [mailto:kide...@openlinksw.com]
Sent: Wednesday, February 5, 2014 2:38 PM
To: virtuoso-users@lists.sourceforge.net
Subject: Re: [Virtuoso-users] Require Unique integer ID for each RDF Triple
On 2/5/14 2:57 PM, Kingsley Idehen wrote:
On 2/5/14 2:17 PM, Kevin wrote:
Virtuoso Fans,
For a year I have really needed Virtuoso to provide a unique integer ID for
each RDF triple. In other words, I would like Virtuoso to store SPOGI (like
AllegroGraph) instead of just SPOG. People often question whether I really need
the index. It is essential for numerous reasons, including storing unique
information (metadata) about each triple (e.g., a timestamp) and allowing full
utilization of the Yago2 database. The Yago2
<http://www2007.org/papers/paper391.pdf> database (search "fact identifier")
and the AllegroGraph
<http://franz.com/agraph/support/documentation/current/triple-index.html>
triple store have embraced this metadata concept, as it unlocks some powerful
capabilities. If a Virtuoso trick can be found to provide such an index, I
think a complete Semantic Web solution will be born.
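To make the SPOGI idea concrete, here is a toy in-memory store that hands out a monotonically increasing integer ID the first time a quad is inserted and keys per-triple metadata (such as a timestamp) on that ID. The class and method names are hypothetical; this illustrates the data model only and is not how AllegroGraph or Virtuoso implement their indexes.

```python
from dataclasses import dataclass, field
from itertools import count

Quad = tuple[str, str, str, str]  # (subject, predicate, object, graph)


@dataclass
class SpogiStore:
    """Toy quad store assigning each distinct quad a unique integer ID (the 'I' in SPOGI)."""
    _ids: dict[Quad, int] = field(default_factory=dict, init=False)
    _quads: dict[int, Quad] = field(default_factory=dict, init=False)
    _meta: dict[int, dict[str, str]] = field(default_factory=dict, init=False)
    _next: count = field(default_factory=lambda: count(1), init=False)

    def add(self, s: str, p: str, o: str, g: str = "default") -> int:
        quad = (s, p, o, g)
        if quad in self._ids:  # re-adding an existing quad keeps its original ID
            return self._ids[quad]
        triple_id = next(self._next)
        self._ids[quad] = triple_id
        self._quads[triple_id] = quad
        return triple_id

    def annotate(self, triple_id: int, key: str, value: str) -> None:
        """Attach metadata (e.g. a timestamp) to a triple by its ID."""
        self._meta.setdefault(triple_id, {})[key] = value

    def lookup(self, triple_id: int) -> tuple[Quad, dict[str, str]]:
        """Return the quad and its metadata for a given ID."""
        return self._quads[triple_id], self._meta.get(triple_id, {})


store = SpogiStore()
rover = store.add("ex:Rover", "rdf:type", "ex:Dog")
fido = store.add("ex:Fido", "rdf:type", "ex:Dog")
store.annotate(rover, "timestamp", "2014-02-05")  # per-triple metadata keyed by ID
print(rover, fido, store.add("ex:Rover", "rdf:type", "ex:Dog"))  # 1 2 1
```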
My current approach is to give each triple its own named graph. While this
solution partially works, it hurts performance and feels like an ugly hack.
In addition, it makes it hard to use graph names in the coarse-grained manner
intended. You can group graphs into larger categories, but aren't a million
graphs in a GraphGroup impractical?
Can someone with internal Virtuoso knowledge devise a way to get a unique
integer ID per triple? Perhaps there is a way to access the row ID of the
underlying RDBMS? Could the indexing scheme be augmented in some way to yield
the index I am seeking? As a crazy last resort, could the Virtuoso Open Source
code base be altered to provide the ID?
Regards,
Kevin
Kevin,
You are asking for reification of triples stored in Virtuoso. Nothing stops you
from generating the reified triples right now, bar processing time.
All you do is forward-chain over all the triples, creating new relations that
associate each triple with an rdf:Statement [1], i.e., an rdf:subject [2],
rdf:predicate [3], and rdf:object [4] relation per triple.
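The forward-chaining pass can be sketched as a generator that mints one rdf:Statement node per existing triple and emits the four reification triples for it, using the real RDF namespace. It assumes triples are plain (s, p, o) IRI tuples, and the urn:stmt:<n> naming of statement nodes is a hypothetical choice; in Virtuoso itself the equivalent work could be done with a SPARQL INSERT or a bulk load, which is not shown here.

```python
from typing import Iterable, Iterator

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
Triple = tuple[str, str, str]


def reify(triples: Iterable[Triple], base: str = "urn:stmt:") -> Iterator[Triple]:
    """Yield rdf:type/subject/predicate/object triples for each input triple."""
    for n, (s, p, o) in enumerate(triples, start=1):
        stmt = f"{base}{n}"  # hypothetical naming scheme for statement nodes
        yield (stmt, RDF + "type", RDF + "Statement")
        yield (stmt, RDF + "subject", s)
        yield (stmt, RDF + "predicate", p)
        yield (stmt, RDF + "object", o)


base_triples = [("http://example.org/Rover", RDF + "type", "http://example.org/Dog")]
reified = list(reify(base_triples))
for t in reified:
    print(t)
```

The original triples are left in place, so each reified triple contributes four additional triples, and the statement node (here urn:stmt:1) is the per-triple handle to which metadata can be attached.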
Today, you can LOAD YAGO's reified triples into Virtuoso; we tested that a long
time ago. We even do that with Uniprot [5].
Links:
1. http://www.w3.org/TR/rdf-schema/#ch_statement -- about the RDF Statement entity type
2. http://www.w3.org/TR/rdf-schema/#ch_subject -- about the RDF subject relation
3. http://www.w3.org/TR/rdf-schema/#ch_predicate -- about the RDF predicate relation
4. http://www.w3.org/TR/rdf-schema/#ch_object -- about the RDF object relation
5. http://bit.ly/W8MYMj -- Uniprot reified statements example (using the 50 billion+ RDF statements LOD Cloud Cache)
Kingsley
Kevin,
In regard to #5, you can use: http://lod.openlinksw.com/c/F35JIZE. The
original's timeout setting is too low; this one is set to 60 seconds.
--
Regards,
Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
_______________________________________________
Virtuoso-users mailing list
Virtuoso-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/virtuoso-users