On a side note, it would be nice if your data source could also be
the result of a script (instead of trying to hack around it with
JdbcDataSource), something similar to what ScriptTransformer does
(http://wiki.apache.org/solr/DataImportHandler#head-27fcc2794bd71f7d727104ffc6b99e194bdb6ff9).
An example would be:
<dataSource type="ScriptDataSource" name="outerloop"
script="outerloop.js" />
(The script would basically contain just a callback, getData(String
query), that results in an array set or might set values on its
children, etc.)
- Jon
On Nov 3, 2008, at 12:40 AM, Noble Paul നോബിള് नोब्ळ् wrote:
Hi Lance,
I think I understand your problem now.
You wish to create docs for both entities (as suggested by Jon
Baer), so the best solution would be to create two root entities. The
first one should be the outer entity, with a transformer written to
store all the URLs in the db. The JdbcDataSource can do
inserts/updates too (the method is the same, getData()). The second
entity can read from the db and create docs (see Jon Baer's
suggestion) using the XPathEntityProcessor as a sub-entity.
--Noble
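
[A rough sketch of what the two-root-entity layout described above
might look like in a data-config. This is only an illustration under
several assumptions, not a tested config: the data source names, the
JDBC driver and connection URL, the feed_urls table, the feed URL, and
the com.example.UrlStoringTransformer class are all hypothetical
placeholders, and the forEach values are guesses at what the elided
"http stuff" in the original config contained.]

```xml
<dataConfig>
  <!-- Hypothetical data sources: one for HTTP feeds, one for the db -->
  <dataSource name="feeds" type="HttpDataSource" />
  <dataSource name="db" type="JdbcDataSource"
              driver="org.example.Driver"
              url="jdbc:example://localhost/feeds" />
  <document>
    <!-- First root entity: reads the outer feed. The transformer named
         here is a placeholder for a custom class that would store each
         URL in the db as a side effect. -->
    <entity name="outer" dataSource="feeds"
            processor="XPathEntityProcessor"
            url="http://example.com/outer.rss"
            forEach="/rss/channel/item"
            transformer="com.example.UrlStoringTransformer">
      <field column="url" xpath="/rss/channel/item/link" />
    </entity>
    <!-- Second root entity: reads the stored URLs back from the db and
         fetches each inner feed through a sub-entity. -->
    <entity name="stored" dataSource="db"
            query="select url from feed_urls">
      <entity name="inner" dataSource="feeds"
              processor="XPathEntityProcessor"
              url="${stored.url}"
              forEach="/rss/channel/item">
        <field column="title" xpath="/rss/channel/item/title" />
      </entity>
    </entity>
  </document>
</dataConfig>
```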
On Mon, Nov 3, 2008 at 9:44 AM, Noble Paul നോബിള് नोब्ळ् <[EMAIL PROTECTED]> wrote:
Hi Lance,
Do a full import without debug and let us know if my suggestion
(rootEntity="false") worked. If it didn't, I can suggest something
else (writing a Transformer).
On Sun, Nov 2, 2008 at 8:13 AM, Noble Paul നോബിള് नोब्ळ् <[EMAIL PROTECTED]> wrote:
If you wish to create one doc per inner entity, then set
rootEntity="false" on the entity outer.
The exception occurs because the URL is wrong.
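
[For illustration, a minimal sketch of the rootEntity="false"
suggestion applied to a nested RSS config like the one quoted below.
The data source declaration, the outer feed URL, the processor
attributes, and both forEach values are assumptions, since the
original config elides them as "http stuff". The outer "name" field is
left out here for brevity.]

```xml
<dataConfig>
  <!-- Assumed data source declaration -->
  <dataSource type="HttpDataSource" />
  <document>
    <!-- rootEntity="false": rows of "outer" do not become documents
         themselves; each row of the child entity does instead. -->
    <entity name="outer" processor="XPathEntityProcessor"
            url="http://example.com/outer.rss"
            forEach="/rss/channel/item"
            rootEntity="false">
      <field column="url" xpath="/rss/channel/item/link" />
      <!-- One document per item in each inner feed -->
      <entity name="inner" processor="XPathEntityProcessor"
              url="${outer.url}"
              forEach="/rss/channel/item"
              pk="title">
        <field column="title" xpath="/rss/channel/item/title" />
      </entity>
    </entity>
  </document>
</dataConfig>
```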
On Sat, Nov 1, 2008 at 10:30 AM, Lance Norskog <[EMAIL PROTECTED]> wrote:
I wrote a nested HttpDataSource RSS poller. The outer loop reads an
RSS feed which contains N links to other RSS feeds. The nested loop
then reads each one of those to create documents. (Yes, this is an
obnoxious thing to do.) Let's say the outer RSS feed gives 10 items.
Both feeds use the same structure: /rss/channel with a <title> node
and then N <item> nodes inside the channel. This should create two
separate XML streams with two separate XPath iterators, right?
<entity name="outer" http stuff>
<field column="name" xpath="/rss/channel/title" />
<field column="url" xpath="/rss/channel/item/link"/>
<entity name="inner" http stuff url="${outer.url}" pk="title" >
<field column="title" xpath="/rss/channel/item/title" />
</entity>
</entity>
This does indeed walk each url from the outer feed and then fetch the
inner RSS feed. Bravo!
However, I found two separate problems in XPath iteration. They may be
related. The first problem is that it only stores the first document
from each "inner" feed. Each feed has several documents with different
title fields, but it only grabs the first.
The other is an off-by-one bug. The outer loop iterates through the 10
items and then tries to pull an 11th. It then gives this exception
trace:
INFO: Created URL to: [inner url]
Oct 31, 2008 11:21:20 PM org.apache.solr.handler.dataimport.HttpDataSource getData
SEVERE: Exception thrown while getting data
java.net.MalformedURLException: no protocol: null/account.rss
        at java.net.URL.<init>(URL.java:567)
        at java.net.URL.<init>(URL.java:464)
        at java.net.URL.<init>(URL.java:413)
        at org.apache.solr.handler.dataimport.HttpDataSource.getData(HttpDataSource.java:90)
        at org.apache.solr.handler.dataimport.HttpDataSource.getData(HttpDataSource.java:47)
        at org.apache.solr.handler.dataimport.DebugLogger$2.getData(DebugLogger.java:183)
        at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:210)
        at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:180)
        at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:160)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285)
        ...
Oct 31, 2008 11:21:20 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
SEVERE: Exception while processing: album document : SolrInputDocumnt[{name=name(1.0)={Groups of stuff}}]
org.apache.solr.handler.dataimport.DataImportHandlerException: Exception in invoking url null Processing Document # 11
        at org.apache.solr.handler.dataimport.HttpDataSource.getData(HttpDataSource.java:115)
        at org.apache.solr.handler.dataimport.HttpDataSource.getData(HttpDataSource.java:47)
--
--Noble Paul