<entity name="nytSportsFeed"
        pk="link"
        url="http://feeds1.nytimes.com/nyt/rss/Sports"
        processor="XPathEntityProcessor"
        forEach="/rss/channel | /rss/channel/item"
        dataSource="rss"
        transformer="RegexTransformer,DateFormatTransformer">
  <field column="source" xpath="/rss/channel/title" commonField="true" />
  <field column="source-link" xpath="/rss/channel/link" commonField="true" />
  <field column="title" xpath="/rss/channel/item/title" />
  <field column="id" xpath="/rss/channel/item/guid" />
  <field column="link" xpath="/rss/channel/item/link" />
  <!-- Use the RegexTransformer to strip out ads -->
  <field column="description" xpath="/rss/channel/item/description"
         regex="&lt;a.*?&lt;/a&gt;" replaceWith="" />
  <field column="category" xpath="/rss/channel/item/category" />
  <!-- 'Sun, 18 May 2008 11:23:11 +0000' -->
  <field column="pubDate" xpath="/rss/channel/item/pubDate"
         dateTimeFormat="EEE, dd MMM yyyy HH:mm:ss Z" />
  <!-- Sub-entity: fetch the page at each item's link and index its text into "body" -->
  <entity name="x"
          url="${nytSportsFeed.link}"
          processor="PlainTextEntityProcessor"
          dataSource="rss"
          transformer="HTMLStripTransformer">
    <field column="plainText" name="body" stripHTML="true" />
  </entity>
</entity>

On Tue, Oct 20, 2009 at 6:13 PM, Grant Ingersoll <gsing...@apache.org> wrote:

> Finally getting back to this...
>
> On Sep 17, 2009, at 12:28 AM, Noble Paul നോബിള് नोब्ळ् wrote:
>
>> 2009/9/17 Noble Paul നോബിള് नोब्ळ् <noble.p...@corp.aol.com>:
>>
>>> It is possible to have a sub-entity with XPathEntityProcessor
>>> that uses the link as the url.
>>
>> This may not be a good solution.
>>
>> But you can use the $hasMore and $nextUrl options of
>> XPathEntityProcessor to loop recursively if there are more links.
>
> Is there an example of this somewhere? The DIH Wiki refers to it, but I
> don't see an example of it.
>
> I have:
>
> <entity name="nytSportsFeed"
>         pk="link"
>         url="http://feeds1.nytimes.com/nyt/rss/Sports"
>         processor="XPathEntityProcessor"
>         forEach="/rss/channel | /rss/channel/item"
>         dataSource="rss"
>         transformer="RegexTransformer,DateFormatTransformer">
>   <field column="source" xpath="/rss/channel/title" commonField="true" />
>   <field column="source-link" xpath="/rss/channel/link" commonField="true" />
>   <field column="title" xpath="/rss/channel/item/title" />
>   <field column="id" xpath="/rss/channel/item/guid" />
>   <field column="link" xpath="/rss/channel/item/link" />
>   <!-- Use the RegexTransformer to strip out ads -->
>   <field column="description" xpath="/rss/channel/item/description"
>          regex="&lt;a.*?&lt;/a&gt;" replaceWith="" />
>   <field column="category" xpath="/rss/channel/item/category" />
>   <!-- 'Sun, 18 May 2008 11:23:11 +0000' -->
>   <field column="pubDate" xpath="/rss/channel/item/pubDate"
>          dateTimeFormat="EEE, dd MMM yyyy HH:mm:ss Z" />
> </entity>
>
> And I want to take the value from the link column, fetch the contents
> of that link, and index them into a "body" field.
>
> I'm not sure how to link in the sub-entity.
>
> Thanks,
> Grant
>
>>> On Thu, Sep 17, 2009 at 8:57 AM, Grant Ingersoll <gsing...@apache.org>
>>> wrote:
>>>
>>>> Many RSS feeds contain a <link> to some full article. How can I have
>>>> the DIH get the RSS feed and then have it go and fetch the content at
>>>> the link?
>>>>
>>>> Thanks,
>>>> Grant

--
-----------------------------------------------------
Noble Paul | Principal Engineer | AOL | http://aol.com