-----Original Message-----
From: Lance Norskog [mailto:goks...@gmail.com]
Sent: 09 March 2010 04:36
To: solr-user@lucene.apache.org
Subject: Re: HTML encode extracted docs

A Tika integration with the DataImportHandler is in the Solr trunk.
With this, you can copy the raw HTML into different fields and process
one copy with Tika.
If it's just straight HTML, would the HTMLStripCharFilter be good enough?
http://www.lucidimagination.com/search/document/CDRG_ch05_5.7.2
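
For the straight-HTML case, a minimal schema.xml sketch of the idea might look like the following. The field and type names (raw_html, body_text, text_html) are hypothetical, chosen only for illustration, and the tokenizer and filter choices are assumptions rather than a recommended analysis chain.

```xml
<!-- Hypothetical field type: strips HTML markup before tokenizing. -->
<fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Removes tags and decodes entities in the character stream. -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical fields: one keeps the raw HTML, one is searchable text. -->
<field name="raw_html" type="string" indexed="false" stored="true"/>
<field name="body_text" type="text_html" indexed="true" stored="false"/>

<!-- Copy the raw HTML into the stripped, searchable field at index time. -->
<copyField source="raw_html" dest="body_text"/>
```

Note that a char filter only affects the indexed tokens. The stored value of a field is always the original input, so if you need the stripped text returned in results, the stripping has to happen before the document reaches Solr.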