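It sounds like, when you run this from code, you are getting an error page instead of the RSS feed, and that error page is malformed HTML. That would explain the parser complaining about </head> where it expected </meta>: an unclosed <meta> tag is common in HTML but is not well-formed XML.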
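One quick way to check this outside the browser is to fetch the URL from a small script and see whether the body parses as RSS at all. The sketch below is only an illustration, not what DIH does internally: the helper names classify_body and fetch are made up here, and it assumes the indexing process connects directly, so it deliberately ignores any proxy settings from the environment.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET


def classify_body(body: bytes) -> str:
    """Return 'rss', 'other-xml', or 'not-xml' for a response body."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Typical for HTML error pages: unclosed <meta>, <br>, etc.
        return "not-xml"
    return "rss" if root.tag == "rss" else "other-xml"


def fetch(url: str, timeout: float = 10.0):
    """Fetch url with no proxy; return (status, content_type, body)."""
    # An empty ProxyHandler disables proxies picked up from the environment.
    # Assumption: this mimics a server-side process that connects directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status, resp.headers.get("Content-Type"), resp.read()
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry a body (often the HTML error page).
        return err.code, err.headers.get("Content-Type"), err.read()


# Example use (needs network access, so it is left commented out):
# status, ctype, body = fetch("http://www.alarabiya.net/.mrss/ar.xml")
# print(status, ctype, classify_body(body))
# print(body[:300])
```

If this reports "not-xml" together with an HTML content type, the first few hundred bytes of the body usually show which error page is being served.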
Do you have a proxy where you run the code? If so, your browser may be using the proxy while your DIH code does not. You could try running something like Wireshark, Fiddler, or a similar tool to inspect the request and response you are actually getting.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency

On Sat, Jun 7, 2014 at 10:52 AM, ienjreny <ismaeel.enjr...@gmail.com> wrote:
> Hello,
>
> I am using the following script to index RSS items
>
> <dataSource type="URLDataSource" encoding="UTF-8" />
> <document>
>   <entity name="slashdot"
>           pk="link"
>           url="http://www.alarabiya.net/.mrss/ar.xml"
>           processor="XPathEntityProcessor"
>           forEach="/rss/channel/item">
>
>     <field column="category_name" name="category_name"
>            xpath="/rss/channel/item/title" />
>     <field column="link" name="url" xpath="/rss/channel/item/link" />
>
>   </entity>
> </document>
>
> But I am facing the following error
>
> Caused by: com.ctc.wstx.exc.WstxParsingException: Unexpected close tag
> </head>; expected </meta>.
>
> Can any body help?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Error-when-using-URLDataSource-to-index-RSS-items-tp4140548.html
> Sent from the Solr - User mailing list archive at Nabble.com.