Jem Rayfield wrote:
Hi Kingsley,
I want to push content into Virtuoso via HTTP rather than a crawl
mechanism.
I don't think the content crawler is what I am after.
Our content gets created and pushed onto publish/rendering queues. I
need to perform the triple extraction asynchronously, as and when
content is published, rather than waiting for a publishing mechanism,
exposure on public-facing web servers, and a subsequent crawl. Ideally I
would like the extraction process to work on new content documents (RDF)
and add the RDF to an existing graph. (Will an HTTP POST approach always
create a new graph? Maybe I have configured something incorrectly?)
Okay, so you do that using HTTP or HTTP/WebDAV.
New graphs are only created automatically when you post to the RDF_Sink
folder. That said, there is a WebDAV-specific graph that tracks all
WebDAV resources. The WebDAV graph is purely about WebDAV metadata in an
RDF graph, whereas RDF_Sink posting and sponging result in graphs
derived from the content of the information resource.
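As a rough sketch of what such a push can look like: the Python helper below only composes the HTTP PUT (method, URL, headers, body) for dropping a document into a DAV account's rdf_sink folder; nothing is actually sent. The host, account name, folder path, and credentials are placeholders based on a default Virtuoso DAV layout, not details from this thread, so check them against your own installation.

```python
from base64 import b64encode

def build_rdf_sink_put(host, dav_user, password, filename, content):
    """Compose a WebDAV PUT that drops a document into the account's
    rdf_sink folder, where Virtuoso sponges it into the quad store.
    Returns (method, url, headers, body); nothing is sent here."""
    # Placeholder path layout: /DAV/home/<account>/rdf_sink/
    url = f"http://{host}/DAV/home/{dav_user}/rdf_sink/{filename}"
    token = b64encode(f"{dav_user}:{password}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",
        "Content-Type": "application/xhtml+xml",
    }
    return "PUT", url, headers, content.encode()

method, url, headers, body = build_rdf_sink_put(
    "localhost:8890", "demo", "demo", "article.xhtml",
    "<html xmlns='http://www.w3.org/1999/xhtml'>...</html>")
```

Any HTTP client (curl, urllib, etc.) can then issue the composed request against the running server.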
(The crawl mechanism certainly looks like a very interesting
feature, but I don't think it fits this use case. Or could I have
missed something?)
Depends on the workflow you seek. From what you state above, you are
looking at projecting what you have to the outside via Virtuoso, rather
than grabbing stuff from the outside into Virtuoso (which is one area
where crawling helps).
After pushing the XHTML2/RDFa content into Virtuoso via HTTP I am able
to use SPARQL on the quad store. The origin document is also available
via the DAV interface.
Okay.
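For illustration, a minimal SPARQL query scoped to one named graph could look like the sketch below. The graph IRI is a placeholder; with sponged content the source document's URL typically serves as the graph name. Only the query text is built here; sending it to a /sparql endpoint is left out.

```python
def sparql_for_graph(graph_iri, limit=10):
    """Build a minimal SPARQL SELECT over one named graph. The caller
    supplies the graph IRI (here a made-up example URL)."""
    return (
        "SELECT ?s ?p ?o\n"
        f"FROM <{graph_iri}>\n"
        "WHERE { ?s ?p ?o }\n"
        f"LIMIT {limit}"
    )

q = sparql_for_graph("http://example.com/content/article.xhtml")
print(q)
```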
However, I am assuming that your DAV store is built using database
tables, and thus the sponging process also consumes the XHTML into a
Table (XMLType (text index)).
This Table/XMLType could then be queried using SQL/SPARQL? I could then
maybe even expose these as a stored procedure using Virtuoso's web
service->PL mapping?
Yes.
You can also make all the records in the WebDAV table appear as
information resources via our Dynamic Extension Type (DET) feature.
To see DETs in action, look at the
/DAV/<home>/<some-dav-account>/Items folder of a Virtuoso installation;
it is basically an example of our DET functionality.
So my question is: if this assumption is correct, can I query this
table, and if so, what is this table?
Before going to the system tables, though, I encourage you to look at
the WebDAV functions already in place. Here is a function index link:
http://docs.openlinksw.com/virtuoso/fn_dav_api_user.html
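As an illustrative sketch only: the helper below composes a SQL statement calling DAV_RES_UPLOAD, one of the functions listed in that index. The permission string and the 'dav'/'administrators' ownership arguments are assumptions drawn from common examples, so verify the exact documented signature before relying on any of it.

```python
def dav_upload_sql(path, content, mime, auth_user, auth_pwd):
    """Compose a SQL statement invoking DAV_RES_UPLOAD (name taken from
    the DAV API function index linked above). Argument order, the
    permission string, and the 'dav'/'administrators' owner/group values
    are assumptions; check the documented signature before use."""
    escaped = content.replace("'", "''")  # naive SQL string escaping
    return (
        f"SELECT DAV_RES_UPLOAD('{path}', '{escaped}', '{mime}', "
        f"'110100100R', 'dav', 'administrators', "
        f"'{auth_user}', '{auth_pwd}');"
    )

stmt = dav_upload_sql("/DAV/home/demo/doc.xhtml",
                      "<html xmlns='http://www.w3.org/1999/xhtml'/>",
                      "application/xhtml+xml", "dav", "secret")
```

The statement could then be executed over ODBC/JDBC or via the isql command-line tool.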
Or do I need to create a new cartridge that persists the origin
document into a specific table, with the correct ACL, which can then be
queried?
No.
I hope this makes some sort of sense. (It's getting late on a Friday ;-))
I will read over the tutorial links you sent and something will probably
spring out at me after a weekend of sleep!
Okay :-)
Kingsley
Thanks again... any pointers/ideas appreciated
Jem
-----Original Message-----
From: Kingsley Idehen [mailto:kide...@openlinksw.com]
Sent: 13 February 2009 15:41
To: Jem Rayfield
Cc: virtuoso-users@lists.sourceforge.net
Subject: Re: [Virtuoso-users] XPATH/XSLT on sponged RDFa
Jem,
<<
I have managed to sponge XHTML/RDFa via DAV (using the steps below);
however, I want to be able to query the origin XHTML2 using XPath or
transform it using XSLT. I can only see the origin content within DAV at
the moment and cannot see the XML stored as an XMLType within a Virtuoso
Table. Ideally I would like to be able to query the triples using SPARQL
and (maybe in combination with that) query the origin XHTML using XPath.
Have you come across anything like this (examples)? This would be most
useful, as I could then transform required content from the result of a
SPARQL query and maybe even expose the content via a Virtuoso stored
procedure (Virtuoso's web service->stored-proc mapping); this would
allow some pretty funky logic and would enable me to constrain and
control access to the content.
>>
Please confirm that this is what you seek:
1. Grab XHTML content from an HTTP-accessible source into Virtuoso
(WebDAV Content Management realm)
2. Have RDF graph(s) generated from the imported resource(s)
3. Have SPARQL access to the RDF in the Quad Store
4. Have XQuery/XPath access to the XHTML via WebDAV or any other means
If the above is true, the key to this is via the Virtuoso Content
Crawler which can do the following on a scheduled basis:
1. Grab/Sponge Web content into a location of your choice within WebDAV
2. Indicate to the Crawler that it should use one or more Sponger
Cartridges during the crawl
Result:
1. WebDAV accessible XHTML to which you can apply XSLT, XQuery, XPath
Queries
2. Triples in the Quad Store (with Graph IRIs matching the content
source URLs)
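To make the first result above concrete: once the XHTML is retrievable, any XPath engine can pick fields out of it. Inside Virtuoso you would use its own XPath/XSLT machinery over the DAV-stored resource; the snippet below merely demonstrates the kind of selection, using Python's standard library on a made-up XHTML+RDFa fragment.

```python
import xml.etree.ElementTree as ET

# A tiny XHTML+RDFa fragment standing in for a sponged document.
doc = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body>
    <span property="dc:title">Friday deployment notes</span>
  </body>
</html>"""

ns = {"x": "http://www.w3.org/1999/xhtml"}
root = ET.fromstring(doc)
# XPath-style lookup for elements carrying an RDFa 'property' attribute.
titles = [e.text
          for e in root.findall(".//x:span[@property='dc:title']", ns)]
print(titles)  # -> ['Friday deployment notes']
```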
Also see http://demo.openlinksw.com/tutorial for examples of XML data
manipulation etc. You can install a local version of this via the
"tutorial vad package".
--
Regards,
Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software Web: http://www.openlinksw.com