Thank you, Kingsley. Where do I obtain a newer version of the Sponger with the HTML (and variants) Cartridge?

-------------------------------------------------------
+1.850.266.7100 (office)
+1.850.471.1300 (mobile)
jhaag75 (skype)
http://jasonhaag.com (Web)
http://twitter.com/mobilejson (Twitter)
http://linkedin.com/in/jasonhaag (LinkedIn)

On Tue, Oct 6, 2015 at 8:21 AM, <virtuoso-users-requ...@lists.sourceforge.net> wrote:
> Today's Topics:
>
>    1. Re: Virtuoso-users Digest, Vol 108, Issue 4 (Haag, Jason)
>    2. Re: Virtuoso-users Digest, Vol 108, Issue 4 (Kingsley Idehen)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 5 Oct 2015 22:58:46 -0500
> From: "Haag, Jason" <jhaa...@gmail.com>
> Subject: Re: [Virtuoso-users] Virtuoso-users Digest, Vol 108, Issue 4
> To: virtuoso-users <virtuoso-users@lists.sourceforge.net>,
>     kide...@openlinksw.com
> Message-ID:
>     <cahjqjnj39f8kfwr9q8k0jjgt7x+nrkjibmpczn_tbe+0aqb...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Kingsley,
>
> In response to your advice I had a few questions. I recently performed a
> clean install of VOS. I'm running Version: 07.20.3214, Build: Oct 6 2015 on
> Debian + Ubuntu. I checked the RDFa option under cartridges. I didn't see the
> double check for HTML (and variants) option. Where do I configure the URI
> burner options?
>
> Here is a screen capture of my import settings for the crawler job:
> https://docs.google.com/document/d/1Y0Z9b5vBftbgniwmVTp10WT0gblXQKC93ivvvnggPYE/edit?usp=sharing
>
> When I execute a SPARQL query it returns duplicate data:
> http://52.23.175.123:8890/sparql?default-graph-uri=&query=PREFIX+xapi%3A+%3Chttp%3A%2F%2Fpurl.org%2Fxapi%2Fontology%23%3E%0D%0A%0D%0ASELECT+DISTINCT+*%0D%0A%0D%0AWHERE+%7B%0D%0A%0D%0A+++%3FVerb+a+xapi%3AVerb+.%0D%0A%0D%0A%0D%0A%7D%0D%0A&should-sponge=&format=text%2Fhtml&timeout=0&debug=on
>
> Are these URIs with an IP address from the sponger? Did I duplicate the
> import data by selecting too many options? Thank you for the support and
> advice. It would be helpful if there were more information about these
> settings/ hatch options.
>
> Kind Regards,
>
> J Haag
>
> SPARQL example: http://52.23.175.123:8890/sparql
>
> PREFIX xapi: <http://purl.org/xapi/ontology#>
>
> SELECT DISTINCT *
>
> WHERE {
>
>    ?Verb a xapi:Verb .
>
> }
>
> Your advice was to do the following:
>
> [1] Uncheck "WebDAV" checkbox
> [2] Check "Sponger" checkbox -- otherwise "HTML (and variants)" Sponger
>     Cartridge won't be invoked (this includes the ability to read RDFa)
> [3] Check "Show Sponger Extractor Cartridges" -- and then check the HTML
>     Cartridge.
>
> Also double check the "HTML (and variants)" Cartridge options. You need
> to set: rdfa=yes, in options. Here is a dump of the options used by
> URIBurner:
>
> fallback-mode=no
> *rdfa=yes*
> reify_html5md=1
> reify_rdfa=0
> reify_jsonld=1
> reify_all_grddl=0
> passthrough_mode=yes
> loose=yes
> reify_html=0
> reify_html_misc=0
> reify_turtle=yes
>
> As for what's the best solution for your goal? This is the best solution
> since you can schedule your content crawling. Your result should
> ultimately match:
>
> http://linkeddata.uriburner.com/about/html/http/xapi.vocab.pub/datasets/adl/verbs/index.html
> -- Using /about sponger service.
>
>> Message: 1
>> Date: Fri, 2 Oct 2015 12:43:18 -0400
>> From: Kingsley Idehen <kide...@openlinksw.com>
>> Subject: Re: [Virtuoso-users] Automating RDF data imports in VIrtuoso
>> To: virtuoso-users@lists.sourceforge.net
>> Message-ID: <560eb426.4060...@openlinksw.com>
>> Content-Type: text/plain; charset="windows-1252"
>>
>> On 9/29/15 10:57 AM, Haag, Jason wrote:
>>> Following up on my original inquiry: I currently have several RDF
>>> datasets available on my server. Each data set has an RDF dump
>>> available as RDF/XML, JSON-LD, and Turtle. These dumps are generated
>>> automatically without virtuoso from an HTML page marked up using RDFa.
>>>
>>> What is the best option for automating the import of this data on a
>>> regular basis into the virtuoso DB? I would like to automatically
>>> import RDFa data ideally, or even rdf/xml or turtle files would be
>>> fine too. I tried this with the attached settings, but the data
>>> doesn't appear in the database. What do I need to enable or change in
>>> my settings in order to automatically import RDF data? See attached
>>> screen captures. Thanks for any tips or advice!
>>
>> Do the following:
>>
>> [1] Uncheck "WebDAV" checkbox
>> [2] Check "Sponger" checkbox -- otherwise "HTML (and variants)" Sponger
>>     Cartridge won't be invoked (this includes the ability to read RDFa)
>> [3] Check "Show Sponger Extractor Cartridges" -- and then check the HTML
>>     Cartridge.
>>
>> Also double check the "HTML (and variants)" Cartridge options. You need
>> to set: rdfa=yes, in options. Here is a dump of the options used by
>> URIBurner:
>>
>> fallback-mode=no
>> *rdfa=yes*
>> reify_html5md=1
>> reify_rdfa=0
>> reify_jsonld=1
>> reify_all_grddl=0
>> passthrough_mode=yes
>> loose=yes
>> reify_html=0
>> reify_html_misc=0
>> reify_turtle=yes
>>
>> As for what's the best solution for your goal? This is the best solution
>> since you can schedule your content crawling. Your result should
>> ultimately match:
>>
>> http://linkeddata.uriburner.com/about/html/http/xapi.vocab.pub/datasets/adl/verbs/index.html
>> -- Using /about sponger service.
>>
>> --
>> Regards,
>>
>> Kingsley Idehen
>> Founder & CEO
>> OpenLink Software
>> Company Web: http://www.openlinksw.com
>> Personal Weblog 1: http://kidehen.blogspot.com
>> Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
>> Twitter Profile: https://twitter.com/kidehen
>> Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
>> LinkedIn Profile: http://www.linkedin.com/in/kidehen
>> Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this
>
> ------------------------------
>
> Message: 2
> Date: Tue, 6 Oct 2015 09:21:38 -0400
> From: Kingsley Idehen <kide...@openlinksw.com>
> Subject: Re: [Virtuoso-users] Virtuoso-users Digest, Vol 108, Issue 4
> To: virtuoso-users@lists.sourceforge.net
> Message-ID: <5613cae2.3080...@openlinksw.com>
> Content-Type: text/plain; charset="windows-1252"
>
> On 10/5/15 11:58 PM, Haag, Jason wrote:
>> Hi Kingsley,
>>
>> In response to your advice I had a few questions. I recently performed
>> a clean install of VOS. I'm running Version: 07.20.3214, Build: Oct 6
>> 2015 on Debian + Ubuntu. I checked RDFa option under cartridges.
>
> Was this the HTML Extractor Cartridge?
>
>> I didn't see the double check for HTML (and variants) option.
>
> If you are configuring the HTML Extractor Cartridge you would see that
> option.
>
>> Where do I configure the URI burner options?
>
> You are configuring the Virtuoso Sponger. URIBurner is just a public
> facing instance of the Sponger offered as a transformation service.
>
>> Here is a screen capture of my import settings for the crawler job:
>> https://docs.google.com/document/d/1Y0Z9b5vBftbgniwmVTp10WT0gblXQKC93ivvvnggPYE/edit?usp=sharing
>
> That shows your crawler jobs being configured to use 3 sponger
> cartridges. I can also see that you are using an older version of the
> Sponger which doesn't include the HTML (and variants) Cartridge. That
> cartridge actually replaces all the RDFa variants presented in the old
> interface.
>
>> When I execute a SPARQL query it returns duplicate data:
>> http://52.23.175.123:8890/sparql?default-graph-uri=&query=PREFIX+xapi%3A+%3Chttp%3A%2F%2Fpurl.org%2Fxapi%2Fontology%23%3E%0D%0A%0D%0ASELECT+DISTINCT+*%0D%0A%0D%0AWHERE+%7B%0D%0A%0D%0A+++%3FVerb+a+xapi%3AVerb+.%0D%0A%0D%0A%0D%0A%7D%0D%0A&should-sponge=&format=text%2Fhtml&timeout=0&debug=on
>
> Yes, because you have the same data across several internal document
> identifiers (Named Graphs). See:
> http://52.23.175.123:8890/sparql?default-graph-uri=&qtxt=PREFIX+xapi%3A+%3Chttp%3A%2F%2Fpurl.org%2Fxapi%2Fontology%23%3E%0D%0A%0D%0ASELECT+DISTINCT+%3Fg%0D%0A%0D%0AWHERE+%7B+GRAPH+%3Fg+%7B%0D%0A%0D%0A+++%3FVerb+a+xapi%3AVerb+.+%7D%0D%0A%0D%0A%0D%0A%7D%0D%0A&should-sponge=&format=text%2Fhtml&timeout=0&debug=on
>
>> Are these URIs with an IP address from the sponger?
>
> Yes, they are proxy Linked Data URIs, i.e., URIs made by the sponger that
> deliver 5-Star Linked Data principles adherence.
>
>> Did I duplicate the import data by selecting too many options?
>
> You certainly have many named graphs being created that contain the same
> data.
>
> Kingsley
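
For reference, the URL-encoded link in Kingsley's reply (the one that lists named graphs) is hard to read as posted. Below is a minimal sketch for decoding it locally, assuming only the Python standard library. The helper name `decode_query` is hypothetical and is not part of Virtuoso. The script only parses the URL text and does not contact the server.

```python
from urllib.parse import urlsplit, parse_qs

# URL copied from Kingsley's reply; the SPARQL text travels in the "qtxt" parameter.
URL = (
    "http://52.23.175.123:8890/sparql?default-graph-uri=&qtxt=PREFIX+xapi%3A+"
    "%3Chttp%3A%2F%2Fpurl.org%2Fxapi%2Fontology%23%3E%0D%0A%0D%0A"
    "SELECT+DISTINCT+%3Fg%0D%0A%0D%0AWHERE+%7B+GRAPH+%3Fg+%7B%0D%0A%0D%0A"
    "+++%3FVerb+a+xapi%3AVerb+.+%7D%0D%0A%0D%0A%0D%0A%7D%0D%0A"
    "&should-sponge=&format=text%2Fhtml&timeout=0&debug=on"
)


def decode_query(url: str, param: str = "qtxt") -> str:
    """Return the SPARQL text carried in `param`, with whitespace collapsed."""
    params = parse_qs(urlsplit(url).query)  # percent-decodes and maps '+' to space
    raw = params[param][0]
    return " ".join(raw.split())  # squeeze the CR/LF runs into single spaces


query = decode_query(URL)
print(query)
```

The decoded text wraps the original pattern in `GRAPH ?g { ... }` and selects `?g`, so each result row is one named graph that holds the `xapi:Verb` triples. Per the reply above, several rows here means the same data was loaded into several graphs.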