Re: Need guidance on schema type

2010-05-28 Thread Lance Norskog
Both use the same HTML stripper. The DIH lets you run multiple documents in parallel in one request if that helps.
On Thu, May 27, 2010 at 9:32 AM, Blargy wrote:
> There will never be any need to search the actual HTML (tags, markup, etc) so
> as far as functionality goes it seems like the DIH

Re: Need guidance on schema type

2010-05-27 Thread Blargy
There will never be any need to search the actual HTML (tags, markup, etc) so as far as functionality goes it seems like the DIH HTMLStripTransformer is the way to go. Are there any significant performance differences between the two?

Re: Need guidance on schema type

2010-05-26 Thread Lance Norskog
If you use the stripping filter, the stored text is the original HTML. You can then highlight text inside the HTML. If you use the stripping DIH transformer, you will store the stripped text. It will be somewhat smaller. You can highlight the stripped text blobs, but you can't highlight the origina
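To make the stored-versus-indexed distinction in the message above concrete, here is a toy Python sketch. It is not Solr code: `strip_html`, `index_with_analysis_time_stripping`, and `index_with_import_time_stripping` are hypothetical names invented for illustration, and the stripper is Python's standard `html.parser`, not the one Solr uses. It only models where the stripping happens and what ends up stored in each case.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment (tags are dropped)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(html):
    """Return the text content of `html` with whitespace normalized."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()  # flush any text still buffered by the parser
    return " ".join("".join(extractor.parts).split())


def index_with_analysis_time_stripping(html):
    # Models the stripping filter: only the indexed terms are stripped,
    # so the stored value is still the original HTML.
    return {"stored": html, "indexed_terms": strip_html(html).lower().split()}


def index_with_import_time_stripping(html):
    # Models the stripping DIH transformer: the text is stripped before it
    # reaches the index, so the stored value is plain text.
    text = strip_html(html)
    return {"stored": text, "indexed_terms": text.lower().split()}


doc = "<p>Solr <b>schema</b> guidance</p>"
filtered = index_with_analysis_time_stripping(doc)
transformed = index_with_import_time_stripping(doc)

# Both approaches search the same terms; only the stored value differs.
print(filtered["stored"])
print(transformed["stored"])
```

In this model both variants produce the same searchable terms, which matches the point that both use the same stripper; the only difference is whether the stored (and therefore highlightable) value is the original HTML or the smaller stripped text.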