Okay, thanks for the info.

On Fri, May 4, 2012 at 4:42 PM, Jack Krupansky <j...@basetechnology.com> wrote:
> Evidently there was a problem with highlighting of HTML that is supposedly
> fixed in Solr 3.6 and trunk:
>
> https://issues.apache.org/jira/browse/SOLR-42
>
> -- Jack Krupansky
>
> -----Original Message----- From: okayndc
> Sent: Friday, May 04, 2012 4:35 PM
> To: solr-user@lucene.apache.org
> Subject: Re: how to present html content in browse
>
> Is it possible to return the HTML field highlighted?
>
> On Fri, May 4, 2012 at 1:27 PM, Jack Krupansky <j...@basetechnology.com> wrote:
>
>> 1. The raw html field (call it "text_html") would be a "string" type
>> field that is "stored" but not "indexed". This is the field you direct DIH
>> to output to. This is the field you would return in your search results
>> with the HTML to be displayed.
>>
>> 2. The stripped field (call it "text_stripped") would be a "text" type
>> field (where "text" is a field type you add that uses the HTML strip char
>> filter as shown below) that is not "stored" but is "indexed". Add a
>> copyField to your schema that copies from the raw html field to the
>> stripped field (say, "text_html" to "text_stripped").
>>
>> For reference on HTML strip (HTMLStripCharFilterFactory), see:
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>>
>> Which has:
>>
>> <fieldtype name="text" class="solr.TextField">
>>   <analyzer>
>>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>>     <charFilter class="solr.MappingCharFilterFactory"
>>                 mapping="mapping-ISOLatin1Accent.txt"/>
>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.StopFilterFactory"/>
>>     <filter class="solr.PorterStemFilterFactory"/>
>>   </analyzer>
>> </fieldtype>
>>
>> Although, you might want to call that field type "text_stripped" to avoid
>> confusion with a simple text field.
>>
>> You can add HTMLStripCharFilterFactory to some other field type that you
>> might want to use, but this "charFilter" needs to be before the
>> "tokenizer". The "text" field type above is just an example.
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: okayndc
>> Sent: Friday, May 04, 2012 1:01 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: how to present html content in browse
>>
>> Hello,
>>
>> I'm having a hard time understanding this, and I had this same question.
>>
>> When using DIH, should the HTML field be stored in the raw HTML string
>> field or the stripped field?
>> Also, what source field(s) need to be copied and to what destination?
>>
>> Thanks
>>
>> On Thu, May 3, 2012 at 10:15 PM, Lance Norskog <goks...@gmail.com> wrote:
>>
>>> Make two fields, one which stores the stripped HTML and another that
>>> stores the parsed HTML. You can use <copyField> so that you do not
>>> have to submit the html page twice.
>>>
>>> You would mark the stripped field 'indexed=true stored=false' and the
>>> full text field the other way around. The full text field should be a
>>> String type.
>>>
>>> On Thu, May 3, 2012 at 1:04 PM, srini <softtec...@gmail.com> wrote:
>>> > I am indexing records from database using DIH. The content of my record
>>> > is in html format. When I use browse
>>> > I would like to show the content in html format, not in text format.
>>> > Any ideas?
>>> >
>>> > --
>>> > View this message in context:
>>> > http://lucene.472066.n3.nabble.com/how-to-present-html-content-in-browse-tp3960327.html
>>> > Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>> --
>>> Lance Norskog
>>> goks...@gmail.com
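
Pulling the advice in this thread together, the two-field setup might look roughly like the schema.xml fragment below. This is only a sketch under assumptions: the field names "text_html" and "text_stripped" are the ones suggested in the thread, the "string" type is assumed to be the solr.StrField type from the stock example schema, and the analyzer chain is trimmed to a minimum rather than copied from anyone's working config.

```xml
<!-- Assumed to go inside <types> in schema.xml.
     The charFilter must come before the tokenizer. -->
<fieldType name="text_stripped" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Assumed to go inside <fields>. -->
<!-- Raw HTML: stored so it can be returned for display, not indexed. -->
<field name="text_html" type="string" indexed="false" stored="true"/>
<!-- Stripped text: indexed so it can be searched, not stored. -->
<field name="text_stripped" type="text_stripped" indexed="true" stored="false"/>

<!-- Top-level in <schema>: fills the searchable field from the raw HTML,
     so DIH only has to populate text_html. -->
<copyField source="text_html" dest="text_stripped"/>
```

With something like this, queries would go against text_stripped while the results return text_html for rendering in the browse page.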