Hi Ivan,

On Sat, Sep 12, 2009 at 10:58 AM, Ivan Mikhailov <imikhai...@openlinksw.com> wrote:
> Hello Aldo,
>
> While the developers of the old RDFa cartridge enjoy their weekend, I've tried
> our new, fast, not-yet-published RDFa loader, and got:
>
> XML parser detected an error:
> ERROR : Entity reference expected after '&' character
> at line 191 column 65 of
> 'http://shopper.cnet.com/cell-phones/lg-env-touch-verizon/4014-6454_9-33665903.html?tag=contentMain;contentBody'
> <li><a href="/4566-6454_9-0.html">See All Cell Phones &
> Accessories</a>
> ------------------------------------------------------------^
>
> What a pity :| After all the successful tests "in vitro", the first run
> "in the wild" demonstrated a lack of functionality: the loader should be more
> tolerant of small errors when needed.
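The failure above comes from a bare `&` in the page's HTML, which browsers tolerate but which is not well-formed XML. As a minimal sketch of the kind of tolerance Ivan describes (assuming Python 3 and only the standard library; `escape_bare_ampersands` is a hypothetical helper, not part of Virtuoso's loader), a pre-pass can escape every `&` that does not already start an entity or character reference before the markup reaches a strict XML parser:

```python
import re
import xml.etree.ElementTree as ET

# Matches an '&' that does NOT begin a well-formed entity or character
# reference such as &amp;, &#38; or &#x26;.
_BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)")


def escape_bare_ampersands(markup: str) -> str:
    """Hypothetical pre-pass: replace stray '&' characters with '&amp;'."""
    return _BARE_AMP.sub("&amp;", markup)


# The fragment from the error report, closed so it forms a complete element.
snippet = '<li><a href="/4566-6454_9-0.html">See All Cell Phones & Accessories</a></li>'

try:
    ET.fromstring(snippet)  # strict XML parsing rejects the bare '&'
except ET.ParseError as err:
    print("strict parse failed:", err)

elem = ET.fromstring(escape_bare_ampersands(snippet))
print(elem.find("a").text)  # -> See All Cell Phones & Accessories
```

This only covers stray ampersands. Named HTML entities such as `&nbsp;` are not predefined in XML and would still fail, so a general clean-up tool such as HTML Tidy, or a real HTML parser, remains the more robust option.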
LOL. I know the feeling. Real life! You never know what's out there ;)

Have you thought about running the input through something like HTML Tidy [1] or htmLawed [2]?
I'm not sure whether they catch that particular error, and I know that other issues arise (performance, determinism, etc.); I am just suggesting a research direction.

[1] http://www.w3.org/People/Raggett/tidy/
[2] http://www.bioinformatics.org/phplabware/internal_utilities/htmLawed/index.php

Regards,
A

>
> I'll report on the progress with both the old and the new reader.
>
> Best Regards,
>
> Ivan Mikhailov
> OpenLink Software
> http://virtuoso.openlinksw.com
>
> On Sat, 2009-09-12 at 04:48 -0400, Aldo Bucchi wrote:
>> Hi,
>>
>> I am issuing the following SPARQL query against
>> http://linkeddata.uriburner.com/sparql.
>>
>> define get:soft "soft"
>> prefix foaf: <http://xmlns.com/foaf/0.1/>
>> select distinct ?primaryTopic
>> from <http://shopper.cnet.com/cell-phones/lg-env-touch-verizon/4014-6454_9-33665903.html?tag=contentMain;contentBody>
>> where
>> {
>>   <http://shopper.cnet.com/cell-phones/lg-env-touch-verizon/4014-6454_9-33665903.html?tag=contentMain;contentBody>
>>     foaf:primaryTopic ?primaryTopic .
>>   ?primaryTopic a gr:ProductOrService .
>> }
>>
>> The query is POSTed and SPARQL XML results are requested. The response I get
>> is:
>>
>> <sparql xmlns="http://www.w3.org/2005/sparql-results#"
>>     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>     xsi:schemaLocation="http://www.w3.org/2001/sw/DataAccess/rf1/result2.xsd">
>>  <head>
>>   <variable name="primaryTopic"/>
>>  </head>
>>  <results distinct="false" ordered="true">
>>   <result>
>>    <binding name="primaryTopic"><literal>http://linkeddata.uriburner.com/about/id/http/shopper.cnet.com/cell-phones/lg-env-touch-verizon/4014-6454_9-33665903.html?tag=contentMain;contentBody</literal></binding>
>>   </result>
>>  </results>
>> </sparql>
>>
>> Looks nice. That's the result I was expecting.
>>
>> But... notice something wrong?
>> The binding contains a <literal> tag instead of <uri>.
>>
>> I tried to reproduce the bug via the web interface (choosing XML as
>> the result type in the combo box), with no luck: it returns a result
>> with the correct <uri> tag.
>>
>> Below is the complete request that yields the incorrect response:
>>
>> -------- REQUEST ---------
>>
>> POST /sparql HTTP/1.1
>> Host linkeddata.uriburner.com
>> User-Agent Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 Glue/4.3
>> Accept text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
>> Accept-Language en-us,en;q=0.5
>> Accept-Encoding gzip,deflate
>> Accept-Charset ISO-8859-1,utf-8;q=0.7,*;q=0.7
>> Keep-Alive 300
>> Content-type application/x-www-form-urlencoded
>> Accept application/sparql-results+xml
>> Content-length 636
>>
>> query=%0Adefine%20get%3Asoft%20%22soft%22%09%0Aprefix%20foaf%3A%20%3Chttp%3A%2F%2Fxmlns%2Ecom%2Ffoaf%2F0%2E1%2F%3E%09%09%0Aselect%20distinct%20%3FprimaryTopic%20%0Afrom%20%3Chttp%3A%2F%2Fshopper%2Ecnet%2Ecom%2Fcell%2Dphones%2Flg%2Denv%2Dtouch%2Dverizon%2F4014%2D6454%5F9%2D33665903%2Ehtml%3Ftag%3DcontentMain%3BcontentBody%3E%20%0Awhere%20%0A%7B%0A%20%20%3Chttp%3A%2F%2Fshopper%2Ecnet%2Ecom%2Fcell%2Dphones%2Flg%2Denv%2Dtouch%2Dverizon%2F4014%2D6454%5F9%2D33665903%2Ehtml%3Ftag%3DcontentMain%3BcontentBody%3E%20%20foaf%3AprimaryTopic%20%3FprimaryTopic%20%2E%0A%20%20%3FprimaryTopic%20a%20gr%3AProductOrService%20%2E%0A%7D%20%09%0A%09%09

-- 
Aldo Bucchi
skype:aldo.bucchi
http://www.univrz.com/
http://aldobucchi.com/
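To catch the `<literal>`-instead-of-`<uri>` problem on the client side, one option is to check the term type of every binding in the response. In the SPARQL Query Results XML Format, each `<binding>` holds exactly one term element: `<uri>`, `<literal>` or `<bnode>`. The following is a minimal sketch, assuming Python 3.9+ and only the standard library. `build_sparql_post` and `binding_kinds` are hypothetical helper names, not part of Virtuoso or any SPARQL client library. The sketch builds a POST of the same shape as the one above (form-encoded `query`, `Accept: application/sparql-results+xml`) but does not send it, and it checks the response quoted earlier in the thread:

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

SRX = "{http://www.w3.org/2005/sparql-results#}"  # SPARQL results XML namespace


def build_sparql_post(endpoint: str, query: str) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a SPARQL protocol POST."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+xml",
        },
        method="POST",
    )


def binding_kinds(results_xml: str) -> list[tuple[str, str, str]]:
    """Hypothetical helper: (variable, term kind, value) for every binding."""
    root = ET.fromstring(results_xml)
    kinds = []
    for binding in root.iter(SRX + "binding"):
        term = binding[0]  # the single <uri>, <literal> or <bnode> child
        kinds.append((binding.get("name"), term.tag.replace(SRX, ""), term.text or ""))
    return kinds


# Placeholder query; urllib.request.urlopen(req) would send the request,
# which is omitted here so the sketch runs offline.
req = build_sparql_post("http://linkeddata.uriburner.com/sparql",
                        "select * where { ?s ?p ?o } limit 1")

# The response body quoted in the thread (xsi attributes dropped for brevity).
response = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="primaryTopic"/></head>
  <results distinct="false" ordered="true">
    <result>
      <binding name="primaryTopic"><literal>http://linkeddata.uriburner.com/about/id/http/shopper.cnet.com/cell-phones/lg-env-touch-verizon/4014-6454_9-33665903.html?tag=contentMain;contentBody</literal></binding>
    </result>
  </results>
</sparql>"""

for name, kind, value in binding_kinds(response):
    if kind != "uri":
        print(f"?{name} came back as <{kind}>, expected <uri>: {value}")
```

Comparing the output of `binding_kinds` for this POSTed request with the output for the web-interface request would show whether the two paths really return different term types for the same query.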