On Tue, Nov 4, 2008 at 1:31 AM, Lance Norskog <[EMAIL PROTECTED]> wrote:
> Thank you for the "rootEntity" tip. Does this mean that the inner loop only 
> walks the first item and breaks out of the loop? This is very good because it 
> allows me to drill down a few levels without downloading 10,000 feeds. 
> (Public API sites tend to dislike this behavior :)
>

Nope. It goes through each item in the inner loop and creates one
document for each item.
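
For reference, a sketch of that two-level config with
rootEntity="false" on the outer entity ("http stuff" is a placeholder
for the real url/processor attributes, left as in your original mail):

<entity name="outer" rootEntity="false" http stuff>
   <field column="name" xpath="/rss/channel/title" />
   <field column="url" xpath="/rss/channel/item/link"/>

   <entity name="inner" http stuff url="${outer.url}" pk="title">
       <field column="title" xpath="/rss/channel/item/title" />
   </entity>
</entity>

With rootEntity="false" on "outer", DIH should create one Solr
document per row of "inner", i.e. one per <item> in each inner feed.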

> The URL is wrong because the streaming parser is iterating past the end of 
> the element entries. It is an off-by-one bug of some sort in the DIH code.
>
> Thanks,
>
> Lance
>
> -----Original Message-----
> From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:[EMAIL PROTECTED]
> Sent: Saturday, November 01, 2008 7:44 PM
> To: solr-user@lucene.apache.org
> Subject: Re: DIH Http input bug - problem with two-level RSS walker
>
> If you wish to create one doc per inner entity, then set rootEntity="false"
> on the outer entity.
> The exception is because the url is wrong
>
> On Sat, Nov 1, 2008 at 10:30 AM, Lance Norskog <[EMAIL PROTECTED]> wrote:
>> I wrote a nested HttpDataSource RSS poller. The outer loop reads an
>> rss feed which contains N links to other rss feeds. The nested loop
>> then reads each one of those to create documents. (Yes, this is an
>> obnoxious thing to do.) Let's say the outer RSS feed gives 10 items.
>> Both feeds use the same
>> structure: /rss/channel with a <title> node and then N <item> nodes
>> inside the channel. This should create two separate XML streams with
>> two separate Xpath iterators, right?
>>
>> <entity name="outer" http stuff>
>>    <field column="name" xpath="/rss/channel/title" />
>>    <field column="url" xpath="/rss/channel/item/link"/>
>>
>>    <entity name="inner" http stuff url="${outer.url}" pk="title" >
>>        <field column="title" xpath="/rss/channel/item/title" />
>>    </entity>
>> </entity>
>>
>> This does indeed walk each url from the outer feed and then fetch the
>> inner rss feed. Bravo!
>>
>> However, I found two separate problems in xpath iteration. They may be
>> related. The first problem is that it only stores the first document
>> from each "inner" feed. Each feed has several documents with different
>> title fields but it only grabs the first.
>>
>> The other is an off-by-one bug. The outer loop iterates through the 10
>> items and then tries to pull an 11th.  It then gives this exception trace:
>>
>> INFO: Created URL to:  [inner url]
>> Oct 31, 2008 11:21:20 PM org.apache.solr.handler.dataimport.HttpDataSource getData
>> SEVERE: Exception thrown while getting data
>> java.net.MalformedURLException: no protocol: null/account.rss
>>        at java.net.URL.<init>(URL.java:567)
>>        at java.net.URL.<init>(URL.java:464)
>>        at java.net.URL.<init>(URL.java:413)
>>        at org.apache.solr.handler.dataimport.HttpDataSource.getData(HttpDataSource.java:90)
>>        at org.apache.solr.handler.dataimport.HttpDataSource.getData(HttpDataSource.java:47)
>>        at org.apache.solr.handler.dataimport.DebugLogger$2.getData(DebugLogger.java:183)
>>        at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:210)
>>        at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:180)
>>        at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:160)
>>        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285)
>>  ...
>> Oct 31, 2008 11:21:20 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
>> SEVERE: Exception while processing: album document : SolrInputDocumnt[{name=name(1.0)={Groups of stuff}}]
>> org.apache.solr.handler.dataimport.DataImportHandlerException: Exception in invoking url null Processing Document # 11
>>        at org.apache.solr.handler.dataimport.HttpDataSource.getData(HttpDataSource.java:115)
>>        at org.apache.solr.handler.dataimport.HttpDataSource.getData(HttpDataSource.java:47)
>>
>>
>
>
>
> --
> --Noble Paul
>
>



-- 
--Noble Paul
