Finally getting back to this...
On Sep 17, 2009, at 12:28 AM, Noble Paul നോബിള് नोब्ळ् wrote:
2009/9/17 Noble Paul നോബിള് नोब्ळ् <noble.p...@corp.aol.com>:
It is possible to have a sub-entity with an XPathEntityProcessor
that uses the link as its url.
This may not be a good solution.
But you can use the $hasMore and $nextUrl options of
XPathEntityProcessor to loop recursively if there are more links.
Is there an example of this somewhere? The DIH Wiki refers to it, but
I don't see an example of it.
I have:
<entity name="nytSportsFeed"
        pk="link"
        url="http://feeds1.nytimes.com/nyt/rss/Sports"
        processor="XPathEntityProcessor"
        forEach="/rss/channel | /rss/channel/item"
        dataSource="rss"
        transformer="RegexTransformer,DateFormatTransformer">
    <field column="source" xpath="/rss/channel/title" commonField="true" />
    <field column="source-link" xpath="/rss/channel/link" commonField="true" />
    <field column="title" xpath="/rss/channel/item/title" />
    <field column="id" xpath="/rss/channel/item/guid" />
    <field column="link" xpath="/rss/channel/item/link" />
    <!-- Use the RegexTransformer to strip out ads -->
    <!-- '<' must be escaped as &lt; inside an XML attribute value -->
    <field column="description" xpath="/rss/channel/item/description"
           regex="&lt;a.*?&lt;/a&gt;" replaceWith="" />
    <field column="category" xpath="/rss/channel/item/category" />
    <!-- 'Sun, 18 May 2008 11:23:11 +0000' -->
    <field column="pubDate" xpath="/rss/channel/item/pubDate"
           dateTimeFormat="EEE, dd MMM yyyy HH:mm:ss Z" />
</entity>
And I want to take the value from the link column and go get the
contents of that link and index them into a "body" field.
I'm not sure how to link in the sub-entity.
Thanks,
Grant
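For illustration, here is a rough, untested sketch of how a sub-entity might be nested inside the feed entity so that each row's link is fetched and indexed into "body". It rests on several assumptions not stated in the thread: that the "rss" data source is a URLDataSource (or another source that returns a Reader), that PlainTextEntityProcessor and HTMLStripTransformer are available in the Solr build in use, and that the child entity name "article" is purely hypothetical.

```xml
<entity name="nytSportsFeed"
        pk="link"
        url="http://feeds1.nytimes.com/nyt/rss/Sports"
        processor="XPathEntityProcessor"
        forEach="/rss/channel | /rss/channel/item"
        dataSource="rss">
    <field column="link" xpath="/rss/channel/item/link" />
    <!-- remaining feed fields as in the config above -->

    <!-- Hypothetical child entity: evaluated once per parent row,
         fetching whatever URL that row's "link" column holds.
         Assumes dataSource "rss" returns a Reader (e.g. URLDataSource). -->
    <entity name="article"
            processor="PlainTextEntityProcessor"
            url="${nytSportsFeed.link}"
            dataSource="rss"
            transformer="HTMLStripTransformer">
        <!-- PlainTextEntityProcessor exposes the fetched text in the
             "plainText" column; map it to the Solr field "body" and
             strip the HTML markup from it. -->
        <field column="plainText" name="body" stripHTML="true" />
    </entity>
</entity>
```

The ${nytSportsFeed.link} placeholder is resolved from the parent row, which is what ties the child fetch to each feed item. Rows without a link value, or links that fail to download, would probably need handling, for example via the onError attribute on the child entity.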
On Thu, Sep 17, 2009 at 8:57 AM, Grant Ingersoll
<gsing...@apache.org> wrote:
Many RSS feeds contain a <link> to some full article. How can I have
the DIH get the RSS feed and then have it go and fetch the content at
the link?
Thanks,
Grant