Re: HTML encode extracted docs - Problems with solr.HTMLStripCharFilter

2010-06-01 Thread Damian Bursztyn
> > -----Original Message-----
> > From: Lance Norskog [mailto:goks...@gmail.com]
> > Sent: 09 March 2010 04:36
> > To: solr-user@lucene.apache.org
> > Subject: Re: HTML encode extracted docs
> >
> > A Tika integration with the DataIm

Re: HTML encode extracted docs - Problems with solr.HTMLStripCharFilter

2010-03-13 Thread Lance Norskog
> -----Original Message-----
> From: Lance Norskog [mailto:goks...@gmail.com]
> Sent: 09 March 2010 04:36
> To: solr-user@lucene.apache.org
> Subject: Re: HTML encode extracted docs
>
> A Tika integration with the DataImportHandler is in the Solr trunk.
> With this

RE: HTML encode extracted docs - Problems with solr.HTMLStripCharFilter

2010-03-09 Thread Mark Roberts
-user@lucene.apache.org
Subject: Re: HTML encode extracted docs

A Tika integration with the DataImportHandler is in the Solr trunk. With this, you can copy the raw HTML into different fields and process one copy with Tika. If it's just straight HTML, would the HTMLStripCharFilter be good

Re: HTML encode extracted docs

2010-03-08 Thread Lance Norskog
A Tika integration with the DataImportHandler is in the Solr trunk. With this, you can copy the raw HTML into different fields and process one copy with Tika. If it's just straight HTML, would the HTMLStripCharFilter be good enough? http://www.lucidimagination.com/search/document/CDRG_ch05_5.7.2
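
To make the "copy the raw HTML into different fields" idea concrete, here is a rough Python sketch of keeping one untouched copy next to a tag-stripped copy. It is not Solr's HTMLStripCharFilter and not the Tika/DataImportHandler integration; it only illustrates the general approach with the standard library. The field names content_raw and content_text and the helper names are made up for the example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes and drops tags; script/style bodies are skipped."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Chunks are joined with a space and runs of whitespace collapsed.
        # Simplification: a word split by inline tags would gain a space.
        return " ".join(" ".join(self._chunks).split())


def strip_html(raw_html):
    """Return the visible text of raw_html with markup removed."""
    parser = TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return parser.text()


def build_document(doc_id, raw_html):
    """Hypothetical document: one raw copy, one stripped copy."""
    return {
        "id": doc_id,
        "content_raw": raw_html,               # stored as-is, e.g. for display
        "content_text": strip_html(raw_html),  # markup-free copy for indexing
    }


if __name__ == "__main__":
    doc = build_document("1", "<p>Hello <b>world</b> &amp; friends</p>")
    print(doc["content_text"])
```

In Solr itself the stripping would normally be configured on the field's analyzer rather than done in client code; the sketch only shows why two copies of the same input can be useful.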