Finally getting back to this...

On Sep 17, 2009, at 12:28 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:

2009/9/17 Noble Paul നോബിള്‍ नोब्ळ् <noble.p...@corp.aol.com>:
it is possible to have a sub entity which has XPathEntityProcessor
which can use the link as the url

This may not be a good solution.

But you can use the $hasMore and $nextUrl options of
XPathEntityProcessor to recursively loop if there are more links

Is there an example of this somewhere? The DIH Wiki refers to it, but I don't see an example of it.

I have:
<entity name="nytSportsFeed"
        pk="link"
        url="http://feeds1.nytimes.com/nyt/rss/Sports"
        processor="XPathEntityProcessor"
        forEach="/rss/channel | /rss/channel/item"
        dataSource="rss"
        transformer="RegexTransformer,DateFormatTransformer">
  <field column="source" xpath="/rss/channel/title" commonField="true" />
  <field column="source-link" xpath="/rss/channel/link" commonField="true" />
  <field column="title" xpath="/rss/channel/item/title" />
  <field column="id" xpath="/rss/channel/item/guid" />
  <field column="link" xpath="/rss/channel/item/link" />
  <!-- Use the RegexTransformer to strip out ads -->
  <field column="description" xpath="/rss/channel/item/description" regex="&lt;a.*?&lt;/a&gt;" replaceWith=""/>
  <field column="category" xpath="/rss/channel/item/category" />
  <!-- 'Sun, 18 May 2008 11:23:11 +0000' -->
  <field column="pubDate" xpath="/rss/channel/item/pubDate" dateTimeFormat="EEE, dd MMM yyyy HH:mm:ss Z" />
</entity>

And I want to take the value from the link column and go get the contents of that link and index them into a "body" field.

I'm not sure how to link in the sub-entity.
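
My best guess, untested, is a nested entity along the lines below, placed just before the closing </entity> of nytSportsFeed. It assumes PlainTextEntityProcessor is available in the build in use, that the "rss" data source can fetch arbitrary URLs, and that the schema has a "body" field. The entity name "linkBody" is made up for illustration.

```xml
<!-- Hypothetical sub-entity: runs once per row of nytSportsFeed -->
<entity name="linkBody"
        processor="PlainTextEntityProcessor"
        url="${nytSportsFeed.link}"
        dataSource="rss">
  <!-- PlainTextEntityProcessor puts the whole response in the "plainText" column;
       map it to the (assumed) "body" schema field -->
  <field column="plainText" name="body" />
</entity>
```

If that works at all, the body would be the raw HTML of the page, so it would presumably still need markup stripping before it is useful for search.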

Thanks,
Grant



On Thu, Sep 17, 2009 at 8:57 AM, Grant Ingersoll <gsing...@apache.org> wrote:
Many RSS feeds contain a <link> to some full article. How can I have the DIH get the RSS feed and then have it go and fetch the content at the link?

Thanks,
Grant


