Re: DIH Http input bug - problem with two-level RSS walker

2008-11-01 Thread Noble Paul നോബിള്‍ नोब्ळ्
If you wish to create one doc per inner entity, then set rootEntity="false" on the entity "outer". The exception is because the URL is wrong. On Sat, Nov 1, 2008 at 10:30 AM, Lance Norskog <[EMAIL PROTECTED]> wrote: > I wrote a nested HttpDataSource RSS poller. The outer loop reads an rss feed > which c
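To make the rootEntity distinction concrete, here is a conceptual Python sketch (not DIH code) of how rows from a nested outer/inner entity pair could turn into documents. The function build_docs and its parameters are hypothetical. fetch_inner stands in for the inner entity's HTTP request, whose URL in a real config would be built from the outer row, for example through a ${outer.link} variable (link being an assumed field name). With the outer entity as root, inner values are folded into one document per outer row. With rootEntity="false" on the outer entity, each inner row becomes its own document. For simplicity this sketch copies only the inner row's fields in that case.

```python
def build_docs(outer_rows, fetch_inner, outer_is_root=True):
    """Conceptual model of nested entities; hypothetical, not DIH's DocBuilder.

    outer_is_root=True:  one document per outer row, inner values merged in
                         as multi-valued fields.
    outer_is_root=False: one document per inner row, mirroring
                         rootEntity="false" on the outer entity.
    """
    docs = []
    for outer in outer_rows:
        # The inner fetch depends on the outer row (e.g. a per-item URL).
        inner_rows = fetch_inner(outer)
        if outer_is_root:
            doc = dict(outer)
            merged = {}
            for inner in inner_rows:
                for key, value in inner.items():
                    merged.setdefault(key, []).append(value)
            doc.update(merged)  # inner keys overwrite same-named outer keys
            docs.append(doc)
        else:
            # Each inner row becomes its own document.
            docs.extend(dict(inner) for inner in inner_rows)
    return docs
```

With two outer rows that each yield two inner rows, the first mode produces two documents and the second produces four.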

Re: DIH Http input bug - problem with two-level RSS walker

2008-11-01 Thread Jon Baer
Another idea is to create the logic you need, dump it to a temp MySQL table, and then fetch the feeds. That has worked pretty nicely for me; it removes the need for the outer feed to do the work. At first I could not figure out if this was a bug or a feature ... Something like ... proce
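The staging approach can be sketched in Python as follows. This is an illustration under assumptions: sqlite3 stands in for the temp MySQL table, and the table name feed_queue and the functions stage_feed_links and queued_urls are hypothetical. The outer RSS feed is parsed once and its item links are stored; a later step reads the URLs back so each feed can be fetched on its own.

```python
import sqlite3
import xml.etree.ElementTree as ET


def stage_feed_links(conn, rss_xml):
    """Step 1: parse the outer feed once and store its item links.

    sqlite3 is a stand-in for MySQL here; INSERT OR IGNORE is SQLite
    syntax (MySQL's closest equivalent is INSERT IGNORE).
    """
    conn.execute("CREATE TABLE IF NOT EXISTS feed_queue (url TEXT PRIMARY KEY)")
    root = ET.fromstring(rss_xml)
    links = [item.findtext("link") for item in root.iter("item")]
    conn.executemany(
        "INSERT OR IGNORE INTO feed_queue (url) VALUES (?)",
        [(url,) for url in links if url],  # skip items without a link
    )
    conn.commit()


def queued_urls(conn):
    """Step 2: read the staged URLs back for the feed-fetching pass."""
    return [row[0] for row in conn.execute("SELECT url FROM feed_queue ORDER BY url")]
```

The primary key on url also deduplicates items that appear more than once in the outer feed.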

Re: DIH http input xpath syntax

2008-11-01 Thread Noble Paul നോബിള്‍ नोब्ळ्
The parser is StAX, but the XPath implementation is custom. Certain XPath features are hard to implement in a streaming way. There is no documentation yet. You can access attributes like /root/a/b/@a. Attribute values can be checked like /root/a/[EMAIL PROTECTED]/x or /root/a/[EMAIL PROTECTED]@m='n']
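Why only a subset is practical can be illustrated with a minimal Python sketch in which xml.etree.ElementTree.iterparse stands in for the StAX parser. The function stream_xpath and the subset it accepts (absolute element paths and a trailing @attribute, no predicates, positions, or functions) are assumptions for illustration, not DIH's actual implementation. The parser sees each element once, in document order, so matching is just a comparison of the current element stack against the path; features that need lookahead or the whole node set, such as counting matches, would require extra state.

```python
import io
import xml.etree.ElementTree as ET


def stream_xpath(xml_text, path):
    """Yield values for a tiny, assumed XPath subset while streaming.

    Supported: /root/a/b (element text) and /root/a/b/@attr (attribute
    value). No predicates, axes, functions, or namespaces.
    """
    steps = path.strip("/").split("/")
    attr = None
    if steps[-1].startswith("@"):
        attr = steps.pop()[1:]
    stack = []  # tag names from the document root down to the current element
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
            # Attributes are already available on the start event.
            if attr is not None and stack == steps and attr in elem.attrib:
                yield elem.get(attr)
        else:
            # Element text is only guaranteed complete on the end event.
            if attr is None and stack == steps:
                yield elem.text or ""
            stack.pop()
            elem.clear()  # discard the finished element to keep memory flat
```

For example, on `<root><a><b a="1">x</b><b a="2">y</b></a></root>` the path /root/a/b/@a yields "1" and "2", and /root/a/b yields "x" and "y".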

Re: TermVectorComponent for tag generation?

2008-11-01 Thread Grant Ingersoll
On Nov 1, 2008, at 3:04 PM, Jon Baer wrote: On Nov 1, 2008, at 1:16 PM, Grant Ingersoll wrote: How do you propose to distinguish those words from the other ones? ** They are field values from other documents. But so are many other words from that document; what separates out [Lucene,

DIH http input xpath syntax

2008-11-01 Thread Lance Norskog
The wiki page for the DIH handler mentions that the XML is parsed with a streaming parser and that the XPath parser only handles a subset of the XPath syntax. Which streaming parser is it, and where would I find this subset documented? I tried a few things like "the first entry" and "length of

RE: DIH Http input bug - problem with two-level RSS walker

2008-11-01 Thread Norskog, Lance
The inner entity drills down and gets more detail about each item in the outer loop. It creates one document. -----Original Message----- From: Shalin Shekhar Mangar [mailto:[EMAIL PROTECTED] Sent: Friday, October 31, 2008 10:24 PM To: solr-user@lucene.apache.org Subject: Re: DIH Http input bug -

Re: TermVectorComponent for tag generation?

2008-11-01 Thread Jon Baer
On Nov 1, 2008, at 1:16 PM, Grant Ingersoll wrote: How do you propose to distinguish those words from the other ones? ** They are field values from other documents. The problem you are addressing is often called keyword extraction. In general, it's a difficult problem, but you may have d
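Jon's heuristic (a word is a tag candidate if it already appears as a field value in other documents) amounts to dictionary matching. A minimal Python sketch, assuming the known values have already been collected into a list; tag_from_known_values is a hypothetical name, and whole-word, case-insensitive matching is a design choice made here, not something specified in the thread.

```python
import re


def tag_from_known_values(text, known_values):
    """Return the known field values (e.g. collected from other documents)
    that occur in text, in the order of known_values.

    Matching is case-insensitive and on whole words, so "Solr" does not
    match inside "Solrj".
    """
    tags = []
    for value in known_values:
        pattern = r"\b" + re.escape(value) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            tags.append(value)
    return tags
```

As Grant's question suggests, this only finds candidates; it does not by itself rank them against the document's other words.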

Re: TermVectorComponent for tag generation?

2008-11-01 Thread Grant Ingersoll
How do you propose to distinguish those words from the other ones? The problem you are addressing is often called keyword extraction. In general, it's a difficult problem, but you may have domain knowledge that can help. On Oct 31, 2008, at 6:35 PM, Jon Baer wrote: Well for example in
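One common general-purpose baseline for keyword extraction is TF-IDF. It is swapped in here as an illustration and was not proposed in the thread. Each of a document's terms is ranked by how often it occurs in that document, discounted by how many documents contain it. These are the kinds of term and document frequencies a term-vector component can report. The sketch below is a minimal Python version under assumptions: documents are already tokenized, a smoothed IDF is used, and tfidf_keywords is a hypothetical name.

```python
import math
from collections import Counter


def tfidf_keywords(doc_tokens, corpus, k=3):
    """Rank one document's terms by TF-IDF against a small corpus.

    doc_tokens: tokens of the document being tagged.
    corpus: list of token lists (may include the document itself).
    Ties are broken alphabetically so the output is deterministic.
    """
    n_docs = len(corpus)
    doc_freq = Counter()
    for tokens in corpus:
        doc_freq.update(set(tokens))  # count each term once per document
    tf = Counter(doc_tokens)
    scores = {}
    for term, count in tf.items():
        # Smoothed IDF: rarer terms across the corpus score higher.
        idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1
        scores[term] = count * idf
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

Common words such as "the" fall to the bottom because nearly every document contains them; domain knowledge, as Grant notes, can then filter or boost what remains.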