On 10/5/15 11:58 PM, Haag, Jason wrote:
> Hi Kingsley,
>
> In response to your advice I had a few questions. I recently performed
> a clean install of VOS. I'm running Version: 07.20.3214, Build: Oct 6
> 2015 on Debian + Ubuntu. I checked the RDFa option under cartridges.

Was this the HTML Extractor Cartridge?

> I didn't see the double check for HTML (and variants) option.

If you are configuring the HTML Extractor Cartridge you would see that
option.

> Where do I configure the URI burner options?

You are configuring the Virtuoso Sponger. URIBurner is just a
public-facing instance of the Sponger offered as a transformation service.

>
> Here is a screen capture of my import settings for the crawler job:
> https://docs.google.com/document/d/1Y0Z9b5vBftbgniwmVTp10WT0gblXQKC93ivvvnggPYE/edit?usp=sharing

That shows your crawler job configured to use three Sponger cartridges.
I can also see that you are using an older version of the Sponger, which
doesn't include the HTML (and variants) Cartridge. That cartridge
replaces all the RDFa variants presented in the old interface.
>
> When I execute a SPARQL query it returns duplicate data:
> http://52.23.175.123:8890/sparql?default-graph-uri=&query=PREFIX+xapi%3A+%3Chttp%3A%2F%2Fpurl.org%2Fxapi%2Fontology%23%3E%0D%0A%0D%0ASELECT+DISTINCT+*%0D%0A%0D%0AWHERE+%7B%0D%0A%0D%0A+++%3FVerb+a+xapi%3AVerb+.%0D%0A%0D%0A%0D%0A%7D%0D%0A&should-sponge=&format=text%2Fhtml&timeout=0&debug=on

Yes, because the same data is stored under several internal document
identifiers (Named Graphs). See:
http://52.23.175.123:8890/sparql?default-graph-uri=&qtxt=PREFIX+xapi%3A+%3Chttp%3A%2F%2Fpurl.org%2Fxapi%2Fontology%23%3E%0D%0A%0D%0ASELECT+DISTINCT+%3Fg%0D%0A%0D%0AWHERE+%7B+GRAPH+%3Fg+%7B%0D%0A%0D%0A+++%3FVerb+a+xapi%3AVerb+.+%7D%0D%0A%0D%0A%0D%0A%7D%0D%0A&should-sponge=&format=text%2Fhtml&timeout=0&debug=on
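
For readability, the query behind that link can be rebuilt as shown here. This is only a sketch in Python using the standard library: the endpoint and the parameter names (`default-graph-uri`, `qtxt`, `format`) are copied from the link above, while the helper name `graph_listing_url` is made up for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint taken from the link above.
ENDPOINT = "http://52.23.175.123:8890/sparql"

# Decoded form of the query in the link: list every named graph
# that holds at least one xapi:Verb statement.
GRAPH_QUERY = """PREFIX xapi: <http://purl.org/xapi/ontology#>

SELECT DISTINCT ?g
WHERE { GRAPH ?g { ?Verb a xapi:Verb . } }
"""


def graph_listing_url(endpoint: str, query: str) -> str:
    """Hypothetical helper: URL-encode the query the way the link above does."""
    params = {"default-graph-uri": "", "qtxt": query, "format": "text/html"}
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    # Print the encoded URL so it can be pasted into a browser.
    print(graph_listing_url(ENDPOINT, GRAPH_QUERY))
```

Each row returned for `?g` is one named graph; seeing several rows there is what "the same data across several Named Graphs" looks like in practice.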


>
>
> Are these URIs with an IP address from the sponger?

Yes, they are proxy Linked Data URIs, i.e., URIs minted by the Sponger
that adhere to 5-Star Linked Data principles.
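
As an illustration of how such a proxy URL is put together, here is a rough Python sketch based only on the URIBurner link quoted further down in this thread (`/about/html/` followed by the scheme, host and path of the source document). The function name `proxy_uri` is hypothetical, and the assumption that a local instance follows the same layout is mine; the URIs in your results may use a different path segment than `html`.

```python
from urllib.parse import urlsplit


def proxy_uri(sponger_base: str, source_url: str) -> str:
    """Hypothetical helper: map a source URL to an /about/html/ proxy URL."""
    parts = urlsplit(source_url)
    # Layout assumed from the URIBurner example in this thread:
    #   <base>/about/html/<scheme>/<host><path>
    base = sponger_base.rstrip("/")
    return f"{base}/about/html/{parts.scheme}/{parts.netloc}{parts.path}"


if __name__ == "__main__":
    print(proxy_uri("http://linkeddata.uriburner.com",
                    "http://xapi.vocab.pub/datasets/adl/verbs/index.html"))
```

Swapping the base for your own host and port would give the corresponding URL on your instance, assuming the same layout applies there.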

> Did I duplicate the import data by selecting too many options?

You certainly have many named graphs being created that contain the same
data.
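
To make that concrete, here is a small Python sketch with made-up data: the same statement stored under more than one named graph. The graph names and verb identifiers are placeholders, not values from your server, and the function names are hypothetical.

```python
from collections import defaultdict

# (graph, subject, predicate, object) tuples; every value is an
# illustrative placeholder.
QUADS = [
    ("urn:example:graph:a", "urn:example:verb:1", "rdf:type", "xapi:Verb"),
    ("urn:example:graph:b", "urn:example:verb:1", "rdf:type", "xapi:Verb"),
    ("urn:example:graph:b", "urn:example:verb:2", "rdf:type", "xapi:Verb"),
]


def graphs_per_triple(quads):
    """Group quads by their triple and collect the graphs holding each one."""
    index = defaultdict(set)
    for graph, s, p, o in quads:
        index[(s, p, o)].add(graph)
    return index


def duplicated_triples(quads):
    """Return only the triples that occur in more than one named graph."""
    return {
        triple: sorted(graphs)
        for triple, graphs in graphs_per_triple(quads).items()
        if len(graphs) > 1
    }


if __name__ == "__main__":
    for triple, graphs in duplicated_triples(QUADS).items():
        print(triple, "is stored in", graphs)
```

In this toy data only the first verb is reported, because it is the only statement present in both graphs.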

Kingsley

> Thank you for the support and advice. It would be helpful if there
> were more information about these settings/ hatch options.
>
> Kind Regards,
>
> J Haag
>
> SPARQL example: http://52.23.175.123:8890/sparql
>
> PREFIX xapi: <http://purl.org/xapi/ontology#>
>
> SELECT DISTINCT *
>
> WHERE {
>
>    ?Verb a xapi:Verb .
>
>
> }
>
>
>
>
> Your advice was to do the following:
>
> [1] Uncheck "WebDAV" checkbox
>
> [2] Check "Sponger" checkbox -- otherwise "HTML (and variants)" Sponger
> Cartridge won't be invoked (this includes the ability to read RDFa)
> [3] Check "Show Sponger Extractor Cartridges" -- and then check the HTML
> Cartridge.
>
> Also double check the "HTML (and variants)" Cartridge options. You need
> to set: rdfa=yes, in options. Here is a dump of the options used by
> URIBurner:
>
> fallback-mode=no
> *rdfa=yes*
> reify_html5md=1
> reify_rdfa=0
> reify_jsonld=1
> reify_all_grddl=0
> passthrough_mode=yes
> loose=yes
> reify_html=0
> reify_html_misc=0
> reify_turtle=yes
>
>
> As for the best solution for your goal: this is it, since you can
> schedule your content crawling. Your result should ultimately match:
>
> http://linkeddata.uriburner.com/about/html/http/xapi.vocab.pub/datasets/adl/verbs/index.html
> -- Using /about sponger service.
> > Message: 1
> > Date: Fri, 2 Oct 2015 12:43:18 -0400
> > From: Kingsley Idehen <kide...@openlinksw.com>
> > Subject: Re: [Virtuoso-users] Automating RDF data imports in VIrtuoso
> > To: virtuoso-users@lists.sourceforge.net
> > Message-ID: <560eb426.4060...@openlinksw.com>
> > Content-Type: text/plain; charset="windows-1252"
> >
> > On 9/29/15 10:57 AM, Haag, Jason wrote:
> >> Following up on my original inquiry: I currently have several RDF
> >> datasets available on my server. Each data set has an RDF dump
> >> available as RDF/XML, JSON-LD, and Turtle. These dumps are generated
> >> automatically without virtuoso from an HTML page marked up using RDFa.
> >>
> >> What is the best option for automating the import of this data on a
> >> regular basis into the virtuoso DB? I would like to automatically
> >> import RDFa data ideally, but even RDF/XML or Turtle files would be
> >> fine too. I tried this with the attached settings, but the data
> >> doesn't appear in the database. What do I need to enable or change in
> >> my settings in order to automatically import RDF data? See attached
> >> screen captures. Thanks for any tips or advice!
> >
> > Do the following:
> >
> > [1] Uncheck "WebDAV" checkbox
> > [2] Check "Sponger" checkbox -- otherwise "HTML (and variants)" Sponger
> > Cartridge won't be invoked (this includes the ability to read RDFa)
> > [3] Check "Show Sponger Extractor Cartridges" -- and then check the HTML
> > Cartridge.
> >
> > Also double check the "HTML (and variants)" Cartridge options. You need
> > to set: rdfa=yes, in options. Here is a dump of the options used by
> > URIBurner:
> >
> > fallback-mode=no
> > *rdfa=yes*
> > reify_html5md=1
> > reify_rdfa=0
> > reify_jsonld=1
> > reify_all_grddl=0
> > passthrough_mode=yes
> > loose=yes
> > reify_html=0
> > reify_html_misc=0
> > reify_turtle=yes
> >
> >
> > As for the best solution for your goal: this is it, since you can
> > schedule your content crawling. Your result should ultimately match:
> >
> > http://linkeddata.uriburner.com/about/html/http/xapi.vocab.pub/datasets/adl/verbs/index.html
> > -- Using /about sponger service.
> >
> >
> > --
> > Regards,
> >
> > Kingsley Idehen
> > Founder & CEO
> > OpenLink Software
> > Company Web: http://www.openlinksw.com
> > Personal Weblog 1: http://kidehen.blogspot.com
> > Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
> > Twitter Profile: https://twitter.com/kidehen
> > Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
> > LinkedIn Profile: http://www.linkedin.com/in/kidehen
> > Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this
> >
>
>
> ------------------------------------------------------------------------------
>
>
> _______________________________________________
> Virtuoso-users mailing list
> Virtuoso-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/virtuoso-users


-- 
Regards,

Kingsley Idehen       
Founder & CEO 
OpenLink Software     
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this

