On 10/5/15 11:58 PM, Haag, Jason wrote:
> Hi Kingsley,
>
> In response to your advice I had a few questions. I recently performed
> a clean install of VOS. I'm running Version: 07.20.3214, Build: Oct 6
> 2015 on Debian + Ubuntu. I checked RDFa option under cartridges.
Was this the HTML Extractor Cartridge?

> I didn't see the double check for HTML (and variants) option.

If you are configuring the HTML Extractor Cartridge you would see that option.

> Where do I configure the URI burner options?

You are configuring the Virtuoso Sponger. URIBurner is just a public-facing instance of the Sponger offered as a transformation service.

>
> Here is a screen capture of my import settings for the crawler job:
> https://docs.google.com/document/d/1Y0Z9b5vBftbgniwmVTp10WT0gblXQKC93ivvvnggPYE/edit?usp=sharing

That shows your crawler jobs being configured to use 3 Sponger cartridges. I can also see that you are using an older version of the Sponger, which doesn't include the HTML (and variants) Cartridge. That cartridge actually replaces all the RDFa variants presented in the old interface.

>
> When I execute a SPARQL query it returns duplicate
> data:
> http://52.23.175.123:8890/sparql?default-graph-uri=&query=PREFIX+xapi%3A+%3Chttp%3A%2F%2Fpurl.org%2Fxapi%2Fontology%23%3E%0D%0A%0D%0ASELECT+DISTINCT+*%0D%0A%0D%0AWHERE+%7B%0D%0A%0D%0A+++%3FVerb+a+xapi%3AVerb+.%0D%0A%0D%0A%0D%0A%7D%0D%0A&should-sponge=&format=text%2Fhtml&timeout=0&debug=on

Yes, because you have the same data across several internal document identifiers (Named Graphs). See:

http://52.23.175.123:8890/sparql?default-graph-uri=&qtxt=PREFIX+xapi%3A+%3Chttp%3A%2F%2Fpurl.org%2Fxapi%2Fontology%23%3E%0D%0A%0D%0ASELECT+DISTINCT+%3Fg%0D%0A%0D%0AWHERE+%7B+GRAPH+%3Fg+%7B%0D%0A%0D%0A+++%3FVerb+a+xapi%3AVerb+.+%7D%0D%0A%0D%0A%0D%0A%7D%0D%0A&should-sponge=&format=text%2Fhtml&timeout=0&debug=on

>
> Are these URIs with an IP address from the sponger?

Yes, they are proxy Linked Data URIs, i.e., URIs made by the Sponger that adhere to the 5-Star Linked Data principles.

> Did I duplicate the import data by selecting too many options?

You certainly have many named graphs being created that contain the same data.

Kingsley

> Thank you for the support and advice.
> It would be helpful if there
> were more information about these settings/ hatch options.
>
> Kind Regards,
>
> J Haag
>
> SPARQL example: http://52.23.175.123:8890/sparql
>
> PREFIX xapi: <http://purl.org/xapi/ontology#>
>
> SELECT DISTINCT *
>
> WHERE {
>
> ?Verb a xapi:Verb .
>
> }
>
> Your advice was to do the following:
>
> [1] Uncheck "WebDAV" checkbox
> [2] Check "Sponger" checkbox -- otherwise "HTML (and variants)" Sponger
> Cartridge won't be invoked (this includes the ability to read RDFa)
> [3] Check "Show Sponger Extractor Cartridges" -- and then check the HTML
> Cartridge.
>
> Also double check the "HTML (and variants)" Cartridge options. You need
> to set: rdfa=yes, in options. Here is a dump of the options used by
> URIBurner:
>
> fallback-mode=no
> *rdfa=yes*
> reify_html5md=1
> reify_rdfa=0
> reify_jsonld=1
> reify_all_grddl=0
> passthrough_mode=yes
> loose=yes
> reify_html=0
> reify_html_misc=0
> reify_turtle=yes
>
> As for what's the best solution for your goal? This is the best solution
> since you can schedule your content crawling. Your result should
> ultimately match:
>
> http://linkeddata.uriburner.com/about/html/http/xapi.vocab.pub/datasets/adl/verbs/index.html
> -- Using /about sponger service.
>
> > Message: 1
> > Date: Fri, 2 Oct 2015 12:43:18 -0400
> > From: Kingsley Idehen <kide...@openlinksw.com>
> > Subject: Re: [Virtuoso-users] Automating RDF data imports in VIrtuoso
> > To: virtuoso-users@lists.sourceforge.net
> > Message-ID: <560eb426.4060...@openlinksw.com>
> > Content-Type: text/plain; charset="windows-1252"
> >
> > On 9/29/15 10:57 AM, Haag, Jason wrote:
> >> Following up on my original inquiry: I currently have several RDF
> >> datasets available on my server. Each data set has an RDF dump
> >> available as RDF/XML, JSON-LD, and Turtle.
> >> These dumps are generated
> >> automatically without Virtuoso from an HTML page marked up using RDFa.
> >>
> >> What is the best option for automating the import of this data on a
> >> regular basis into the Virtuoso DB? Ideally I would like to import
> >> the RDFa data automatically, but RDF/XML or Turtle files would be
> >> fine too. I tried this with the attached settings, but the data
> >> doesn't appear in the database. What do I need to enable or change in
> >> my settings in order to automatically import RDF data? See attached
> >> screen captures. Thanks for any tips or advice!
> >
> > Do the following:
> >
> > [1] Uncheck "WebDAV" checkbox
> > [2] Check "Sponger" checkbox -- otherwise "HTML (and variants)" Sponger
> > Cartridge won't be invoked (this includes the ability to read RDFa)
> > [3] Check "Show Sponger Extractor Cartridges" -- and then check the HTML
> > Cartridge.
> >
> > Also double check the "HTML (and variants)" Cartridge options. You need
> > to set: rdfa=yes, in options. Here is a dump of the options used by
> > URIBurner:
> >
> > fallback-mode=no
> > *rdfa=yes*
> > reify_html5md=1
> > reify_rdfa=0
> > reify_jsonld=1
> > reify_all_grddl=0
> > passthrough_mode=yes
> > loose=yes
> > reify_html=0
> > reify_html_misc=0
> > reify_turtle=yes
> >
> > As for what's the best solution for your goal? This is the best solution
> > since you can schedule your content crawling. Your result should
> > ultimately match:
> >
> > http://linkeddata.uriburner.com/about/html/http/xapi.vocab.pub/datasets/adl/verbs/index.html
> > -- Using /about sponger service.
> >
> > --
> > Regards,
> >
> > Kingsley Idehen
> > Founder & CEO
> > OpenLink Software
> > Company Web: http://www.openlinksw.com
> > Personal Weblog 1: http://kidehen.blogspot.com
> > Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
> > Twitter Profile: https://twitter.com/kidehen
> > Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
> > LinkedIn Profile: http://www.linkedin.com/in/kidehen
> > Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this
>
> _______________________________________________
> Virtuoso-users mailing list
> Virtuoso-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/virtuoso-users

--
Regards,

Kingsley Idehen
Founder & CEO
OpenLink Software
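To see how the duplicate rows are spread across named graphs, something along the following lines may help. It is only a sketch: the endpoint address and the `query` and `format` request parameters are taken from the URLs quoted in this thread, while the `COUNT`/`GROUP BY` form of the query and the helper name `build_query_url` are assumptions added for illustration, not something from the Virtuoso documentation.

```python
from urllib.parse import urlencode

# SPARQL endpoint quoted in this thread.
ENDPOINT = "http://52.23.175.123:8890/sparql"

# Assumed diagnostic query: count the distinct xapi:Verb instances held in
# each named graph, so graphs carrying the same data show up side by side.
QUERY = """PREFIX xapi: <http://purl.org/xapi/ontology#>

SELECT ?g (COUNT(DISTINCT ?Verb) AS ?verbs)
WHERE { GRAPH ?g { ?Verb a xapi:Verb . } }
GROUP BY ?g
ORDER BY DESC(?verbs)
"""


def build_query_url(endpoint: str, query: str, fmt: str = "text/html") -> str:
    """Return a GET URL for the endpoint (hypothetical helper)."""
    # urlencode percent-encodes the query text, including '#', '<' and newlines.
    return endpoint + "?" + urlencode({"query": query, "format": fmt})


if __name__ == "__main__":
    print(build_query_url(ENDPOINT, QUERY))
```

Opening the printed URL in a browser should list one row per named graph. If several graphs report the same count, that would be consistent with the same data having been loaded under several graph names.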
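As a quick sanity check on the cartridge options discussed in this thread, the `key=value` dump can be parsed and inspected before it is pasted into the cartridge's options field. This is a minimal sketch: the option names and values are the ones quoted in the thread, but the helper names `parse_options` and `rdfa_enabled` are hypothetical, and the sketch does not talk to Virtuoso at all.

```python
# Options dump for the "HTML (and variants)" cartridge, as quoted in this thread.
URIBURNER_OPTIONS = """\
fallback-mode=no
rdfa=yes
reify_html5md=1
reify_rdfa=0
reify_jsonld=1
reify_all_grddl=0
passthrough_mode=yes
loose=yes
reify_html=0
reify_html_misc=0
reify_turtle=yes
"""


def parse_options(text: str) -> dict:
    """Parse newline-separated key=value pairs into a dict (hypothetical helper)."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {line!r}")
        options[key.strip()] = value.strip()
    return options


def rdfa_enabled(options: dict) -> bool:
    """True when the dump carries rdfa=yes, the setting the thread calls for."""
    return options.get("rdfa", "").lower() == "yes"


if __name__ == "__main__":
    opts = parse_options(URIBURNER_OPTIONS)
    print("rdfa enabled:", rdfa_enabled(opts))
```

The check only confirms that the text contains `rdfa=yes`; whether the cartridge picks it up still depends on the Sponger and cartridge settings described above.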