Hi Ivan,

On Sat, Sep 12, 2009 at 10:58 AM, Ivan Mikhailov
<imikhai...@openlinksw.com> wrote:
> Hello Aldo,
>
> While developers of old RDFa cartridge enjoy their weekend, I've tried
> our new, fast, not-yet-published RDFa loader, and got
>
> XML parser detected an error:
>        ERROR  : Entity reference expected after '&' character
> at line 191 column 65 of
> 'http://shopper.cnet.com/cell-phones/lg-env-touch-verizon/4014-6454_9-33665903.html?tag=contentMain;contentBody'
>     <li><a href="/4566-6454_9-0.html">See All Cell Phones &
> Accessories</a>
> ------------------------------------------------------------^
>
> What a pity :| After all the successful tests "in vitro", the first run
> "in the wild" demonstrated a lack of functionality: the loader should be
> more tolerant of small errors, when needed.

LOL. I know the feeling. Real life! You never know what's out there ;)
Have you thought about running the input through something like HTML
Tidy[1] or htmLawed[2]?
* I am not sure whether they catch that particular error, and I know
other issues arise (performance, determinism, etc.). I am just
suggesting a research direction.

[1] http://www.w3.org/People/Raggett/tidy/
[2] http://www.bioinformatics.org/phplabware/internal_utilities/htmLawed/index.php
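
To make the idea concrete, here is a minimal Python sketch of the kind of pre-cleaning I mean. It is not Tidy or htmLawed, just a regex that escapes bare '&' characters before the markup reaches an XML parser. The names BARE_AMP and escape_bare_ampersands are made up for illustration, and the snippet is the offending line from your report with a closing </li> added so that it parses on its own. It only covers this one class of error; a real tidy pass would have to handle much more.

```python
import re
import xml.etree.ElementTree as ET

# Matches '&' that does NOT start a named, decimal, or hex entity reference.
# (Hypothetical helper; named HTML entities such as &nbsp; are left alone
# and would still need separate handling for a strict XML parser.)
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)")


def escape_bare_ampersands(markup: str) -> str:
    """Replace every bare '&' with '&amp;' and leave existing entities intact."""
    return BARE_AMP.sub("&amp;", markup)


# The line from the error report, closed with </li> so it is a full element.
snippet = '<li><a href="/4566-6454_9-0.html">See All Cell Phones & Accessories</a></li>'

fixed = escape_bare_ampersands(snippet)
link = ET.fromstring(fixed).find("a")  # parses once the '&' is escaped
print(link.text)
```

Whether something like this belongs in the loader itself or in a separate cleaning step is of course your call.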

Regards,
A

>
> I'll report the progress re. both old and new reader.
>
> Best Regards,
>
> Ivan Mikhailov
> OpenLink Software
> http://virtuoso.openlinksw.com
>
> On Sat, 2009-09-12 at 04:48 -0400, Aldo Bucchi wrote:
>> Hi,
>>
>> I am issuing the following SPARQL query against
>> http://linkeddata.uriburner.com/sparql.
>>
>> define get:soft "soft"
>> prefix foaf: <http://xmlns.com/foaf/0.1/>
>> select distinct ?primaryTopic
>> from <http://shopper.cnet.com/cell-phones/lg-env-touch-verizon/4014-6454_9-33665903.html?tag=contentMain;contentBody>
>> where
>> {
>>   <http://shopper.cnet.com/cell-phones/lg-env-touch-verizon/4014-6454_9-33665903.html?tag=contentMain;contentBody>  foaf:primaryTopic ?primaryTopic .
>>   ?primaryTopic a gr:ProductOrService .
>> }
>>
>> The query is POSTed and SPARQL XML results are requested. The response
>> I get is:
>>
>> <sparql xmlns="http://www.w3.org/2005/sparql-results#"
>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>> xsi:schemaLocation="http://www.w3.org/2001/sw/DataAccess/rf1/result2.xsd">
>>  <head>
>>   <variable name="primaryTopic"/>
>>  </head>
>>  <results distinct="false" ordered="true">
>>   <result>
>>    <binding name="primaryTopic"><literal>http://linkeddata.uriburner.com/about/id/http/shopper.cnet.com/cell-phones/lg-env-touch-verizon/4014-6454_9-33665903.html?tag=contentMain;contentBody</literal></binding>
>>   </result>
>>  </results>
>> </sparql>
>>
>> Looks nice. That's the result I was expecting.
>>
>> But... notice something wrong?
>> The binding contains a <literal> tag instead of <uri>.
>>
>> I tried to reproduce the bug via the web interface (choosing XML as
>> the result type in the combo box) with no luck: it returns a result
>> with the correct <uri> tag.
>>
>> I attach the complete request that yields the incorrect response below:
>>
>> -------- REQUEST ---------
>>
>> POST /sparql HTTP/1.1
>> Host  linkeddata.uriburner.com
>> User-Agent    Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US;
>> rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 Glue/4.3
>> Accept        text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
>> Accept-Language       en-us,en;q=0.5
>> Accept-Encoding       gzip,deflate
>> Accept-Charset        ISO-8859-1,utf-8;q=0.7,*;q=0.7
>> Keep-Alive    300
>> Content-type  application/x-www-form-urlencoded
>> Accept        application/sparql-results+xml
>> Content-length        636
>>
>> query=%0Adefine%20get%3Asoft%20%22soft%22%09%0Aprefix%20foaf%3A%20%3Chttp%3A%2F%2Fxmlns%2Ecom%2Ffoaf%2F0%2E1%2F%3E%09%09%0Aselect%20distinct%20%3FprimaryTopic%20%0Afrom%20%3Chttp%3A%2F%2Fshopper%2Ecnet%2Ecom%2Fcell%2Dphones%2Flg%2Denv%2Dtouch%2Dverizon%2F4014%2D6454%5F9%2D33665903%2Ehtml%3Ftag%3DcontentMain%3BcontentBody%3E%20%0Awhere%20%0A%7B%0A%20%20%3Chttp%3A%2F%2Fshopper%2Ecnet%2Ecom%2Fcell%2Dphones%2Flg%2Denv%2Dtouch%2Dverizon%2F4014%2D6454%5F9%2D33665903%2Ehtml%3Ftag%3DcontentMain%3BcontentBody%3E%20%20foaf%3AprimaryTopic%20%3FprimaryTopic%20%2E%0A%20%20%3FprimaryTopic%20a%20gr%3AProductOrService%20%2E%0A%7D%20%09%0A%09%09
>>
>>
>
>
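
P.S. In case it helps with reproducing the <literal> vs <uri> issue from my original message, here is a rough Python sketch that lists the term type of every binding in a SPARQL XML result and flags literals that look like URIs. The function name binding_kinds, the "starts with http://" heuristic, and the shortened sample document (with an example.org value) are my own assumptions for illustration, not the actual response.

```python
import xml.etree.ElementTree as ET

# Namespace of the SPARQL Query Results XML Format.
NS = {"sr": "http://www.w3.org/2005/sparql-results#"}


def binding_kinds(results_xml: str) -> list:
    """Return (variable, kind, value) for each binding; kind is the element
    name of the bound term, e.g. 'uri', 'literal' or 'bnode'."""
    root = ET.fromstring(results_xml)
    found = []
    for binding in root.iterfind(".//sr:result/sr:binding", NS):
        for term in binding:
            kind = term.tag.split("}", 1)[-1]  # strip the '{namespace}' prefix
            found.append((binding.get("name"), kind, term.text or ""))
    return found


# Hypothetical, shortened response with the same shape as the one I received.
sample = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
 <head><variable name="primaryTopic"/></head>
 <results>
  <result>
   <binding name="primaryTopic"><literal>http://example.org/thing</literal></binding>
  </result>
 </results>
</sparql>"""

# Crude heuristic: a literal whose value starts with 'http://' is suspicious.
suspicious = [b for b in binding_kinds(sample)
              if b[1] == "literal" and b[2].startswith("http://")]
print(suspicious)
```

A literal can legitimately hold a URL string, so a hit here is only a hint that the binding deserves a closer look.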



-- 
Aldo Bucchi
skype:aldo.bucchi
http://www.univrz.com/
http://aldobucchi.com/

