<entity name="nytSportsFeed"
        pk="link"
        url="http://feeds1.nytimes.com/nyt/rss/Sports"
        processor="XPathEntityProcessor"
        forEach="/rss/channel | /rss/channel/item"
        dataSource="rss"
        transformer="RegexTransformer,DateFormatTransformer">
  <field column="source" xpath="/rss/channel/title" commonField="true" />
  <field column="source-link" xpath="/rss/channel/link" commonField="true" />
  <field column="title" xpath="/rss/channel/item/title" />
  <field column="id" xpath="/rss/channel/item/guid" />
  <field column="link" xpath="/rss/channel/item/link" />
  <!-- Use the RegexTransformer to strip out ads -->
  <field column="description" xpath="/rss/channel/item/description"
         regex="&lt;a.*?&lt;/a&gt;" replaceWith="" />
  <field column="category" xpath="/rss/channel/item/category" />
  <!-- 'Sun, 18 May 2008 11:23:11 +0000' -->
  <field column="pubDate" xpath="/rss/channel/item/pubDate"
         dateTimeFormat="EEE, dd MMM yyyy HH:mm:ss Z" />
  <!-- Sub-entity: fetch the page behind each item's link and index it as "body" -->
  <entity name="x"
          url="${nytSportsFeed.link}"
          processor="PlainTextEntityProcessor"
          dataSource="rss"
          transformer="HTMLStripTransformer">
    <field column="plainText" name="body" stripHTML="true" />
  </entity>
</entity>
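
For context, the entity above would sit inside a data-config.xml roughly like the sketch below. This assumes a data source named "rss" declared as a URLDataSource (available from Solr 1.4) and a "body" field defined in schema.xml. Neither declaration appears in this thread, so treat both as assumptions rather than the exact setup in use.

```xml
<dataConfig>
  <!-- Assumption: the "rss" data source referenced by both entities -->
  <dataSource name="rss" type="URLDataSource" encoding="UTF-8" />
  <document>
    <!-- The nytSportsFeed entity, with its nested "x" sub-entity, goes here -->
  </document>
</dataConfig>
```

The sub-entity runs once per row emitted by the outer entity, so each feed item triggers one extra HTTP fetch of its link.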



On Tue, Oct 20, 2009 at 6:13 PM, Grant Ingersoll <gsing...@apache.org> wrote:

> Finally getting back to this...
>
> On Sep 17, 2009, at 12:28 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>
>  2009/9/17 Noble Paul നോബിള്‍  नोब्ळ् <noble.p...@corp.aol.com>:
>>
>>> it is possible to have a sub entity which has XPathEntityProcessor
>>> which can use the link as the url
>>>
>>
>> This may not be a good solution.
>>
>> But you can use the $hasMore and $nextUrl options of
>> XPathEntityProcessor to recursively loop if there are more links
>>
>
> Is there an example of this somewhere?  The DIH Wiki refers to it, but I
> don't see an example of it.
>
> I have:
>  <entity name="nytSportsFeed"
>                                pk="link"
>                                url="http://feeds1.nytimes.com/nyt/rss/Sports"
>                                processor="XPathEntityProcessor"
>                                forEach="/rss/channel | /rss/channel/item"
>            dataSource="rss"
>        transformer="RegexTransformer,DateFormatTransformer">
>                        <field column="source" xpath="/rss/channel/title"
> commonField="true" />
>                        <field column="source-link"
> xpath="/rss/channel/link" commonField="true" />
>                        <field column="title"
> xpath="/rss/channel/item/title" />
>                        <field column="id" xpath="/rss/channel/item/guid" />
>                        <field column="link" xpath="/rss/channel/item/link"
> />
>      <!-- Use the RegexTransformer to strip out ads -->
>                        <field column="description"
> xpath="/rss/channel/item/description" regex="&lt;a.*?&lt;/a&gt;"
> replaceWith=""/>
>                        <field column="category"
> xpath="/rss/channel/item/category" />
>      <!-- 'Sun, 18 May 2008 11:23:11 +0000' -->
>      <field column="pubDate" xpath="/rss/channel/item/pubDate"
> dateTimeFormat="EEE, dd MMM yyyy HH:mm:ss Z" />
>    </entity>
>
> And I want to take the value from the link column and go get the contents
> of that link and index them into a "body" field.
>
> I'm not sure how to link in the sub-entity.
>
> Thanks,
> Grant
>
>
>
>
>>> On Thu, Sep 17, 2009 at 8:57 AM, Grant Ingersoll <gsing...@apache.org>
>>> wrote:
>>>
>>>> Many RSS feeds contain a <link> to some full article.  How can I have
>>>> the
>>>> DIH get the RSS feed and then have it go and fetch the content at the
>>>> link?
>>>>
>>>> Thanks,
>>>> Grant
>>>>
>>>>
>
>


-- 
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com
