Okay, thanks for the info.

On Fri, May 4, 2012 at 4:42 PM, Jack Krupansky <j...@basetechnology.com> wrote:

> Evidently there was a problem with highlighting of HTML that is supposedly
> fixed in Solr 3.6 and trunk:
>
> https://issues.apache.org/jira/browse/SOLR-42
>
>
> -- Jack Krupansky
>
> -----Original Message----- From: okayndc
> Sent: Friday, May 04, 2012 4:35 PM
>
> To: solr-user@lucene.apache.org
> Subject: Re: how to present html content in browse
>
> Is it possible to return the HTML field highlighted?
>
> On Fri, May 4, 2012 at 1:27 PM, Jack Krupansky <j...@basetechnology.com> wrote:
>
>> 1. The raw HTML field (call it "text_html") would be a "string" type
>> field that is "stored" but not "indexed". This is the field you direct DIH
>> to output to. This is the field you would return in your search results
>> with the HTML to be displayed.
>>
>> 2. The stripped field (call it "text_stripped") would be a "text" type
>> field (where "text" is a field type you add that uses the HTML strip char
>> filter as shown below) that is not "stored" but is "indexed". Add a
>> copyField to your schema that copies from the raw HTML field to the
>> stripped field (say, "text_html" to "text_stripped").
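>>
>> A minimal sketch of how the two fields and the copy rule above might look
>> in schema.xml. The names "text_html" and "text_stripped" are only the
>> placeholders used above; the "string" type is assumed to be defined as
>> solr.StrField (as in the stock example schema), and "text" is assumed to
>> be a field type that includes the HTML strip char filter:
>>
>>   <!-- raw HTML: returned in results, not searched (placeholder name) -->
>>   <field name="text_html" type="string" indexed="false" stored="true"/>
>>   <!-- stripped text: searched, not returned (placeholder name) -->
>>   <field name="text_stripped" type="text" indexed="true" stored="false"/>
>>   <!-- fill the stripped field from the raw field at index time -->
>>   <copyField source="text_html" dest="text_stripped"/>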
>>
>> For reference on HTML strip (HTMLStripCharFilterFactory), see:
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>>
>>
>> Which has:
>>
>> <fieldtype name="text" class="solr.TextField">
>>   <analyzer>
>>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>>     <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.StopFilterFactory"/>
>>     <filter class="solr.PorterStemFilterFactory"/>
>>   </analyzer>
>> </fieldtype>
>>
>> Although, you might want to call that field type "text_stripped" to avoid
>> confusion with a simple text field.
>>
>> You can add HTMLStripCharFilterFactory to some other field type that you
>> might want to use, but this "charFilter" needs to be before the
>> "tokenizer". The "text" field type above is just an example.
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: okayndc
>> Sent: Friday, May 04, 2012 1:01 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: how to present html content in browse
>>
>>
>> Hello,
>>
>> I'm having a hard time understanding this, and I had this same question.
>>
>> When using DIH, should the HTML field be stored in the raw HTML string
>> field or the stripped field?
>> Also, which source field(s) need to be copied, and to what destination?
>>
>> Thanks
>>
>>
>> On Thu, May 3, 2012 at 10:15 PM, Lance Norskog <goks...@gmail.com> wrote:
>>
>>> Make two fields, one which stores the stripped HTML and another that
>>> stores the parsed HTML. You can use <copyField> so that you do not
>>> have to submit the HTML page twice.
>>>
>>> You would mark the stripped field 'indexed=true stored=false' and the
>>> full text field the other way around. The full text field should be a
>>> String type.
>>>
>>> On Thu, May 3, 2012 at 1:04 PM, srini <softtec...@gmail.com> wrote:
>>> > I am indexing records from a database using DIH. The content of my record
>>> > is in HTML format. When I use browse, I would like to show the content
>>> > in HTML format, not in text format. Any ideas?
>>> >
>>> > --
>>> > View this message in context:
>>> > http://lucene.472066.n3.nabble.com/how-to-present-html-content-in-browse-tp3960327.html
>>>
>>> > Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>>
>>> --
>>> Lance Norskog
>>> goks...@gmail.com
>>>
>>>
>>>
>>
>
