Both use the same HTML stripper. The DIH lets you run multiple
documents in parallel in one request if that helps.
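
For reference, a rough sketch of what the two setups might look like.
This assumes the filter in question is HTMLStripCharFilterFactory; the
field type name "text_html", the entity name "article", the column
"body", and the SQL query are all made up for illustration.

```xml
<!-- schema.xml: strip HTML at analysis time.
     Only the indexed tokens are stripped; the stored value keeps the HTML.
     The name "text_html" is hypothetical. -->
<fieldType name="text_html" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- data-config.xml: strip HTML at import time with the DIH.
     The stripped text is what reaches the index, stored value included.
     Entity name, column, and query are hypothetical. -->
<entity name="article" transformer="HTMLStripTransformer"
        query="SELECT id, body FROM articles">
  <field column="body" stripHTML="true"/>
</entity>
```

You would normally pick one or the other, not both.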
On Thu, May 27, 2010 at 9:32 AM, Blargy wrote:
>
> There will never be any need to search the actual HTML (tags, markup, etc) so
> as far as functionality goes it seems like the DIH
There will never be any need to search the actual HTML (tags, markup, etc) so
as far as functionality goes it seems like the DIH HTMLStripTransformer is
the way to go.
Are there any significant performance differences between the two?
If you use the stripping filter, the stored text is still the original
HTML, because analysis only changes what is indexed, not what is stored.
You can then highlight text inside the HTML. If you use the DIH
HTMLStripTransformer, you will store the stripped text, which will be
somewhat smaller. You can highlight the stripped text blobs, but you can't
highlight the origina