> I have a problem related to HTML tags.
> 
> Basically, some columns in the database contain HTML tags, for
> example:
> <p> Hello how are you? </p>
> I am indexing the value as it is.
> 
> I am filtering the special characters Solr supports at query
> time.
> 
> Now the problem is that when I search for "p", the result
> is
> *<p> Hello how are you? </p>*
> I don't want the search to match HTML tag content.
> 
> Please help.

You can remove HTML tags in the analysis phase with HTMLStripCharFilterFactory[1]. 
With this, searching for p won't return *<p> Hello how are you? </p>* anymore.
But when you search for hello, the returned document will still contain the <p>
tags, because the char filter only changes the indexed terms, not the stored value.
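
As a rough sketch of the analysis-phase approach, a field type in schema.xml
could look like the following. The type name text_html and the tokenizer and
filter after the char filter are only assumptions for illustration; keep
whatever your existing field type already uses.

```xml
<!-- Hypothetical field type name; only the charFilter line is the point here. -->
<fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Strips HTML markup before tokenizing, so "p" from <p> is never indexed. -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <!-- Assumed tokenizer and filter; substitute your own analysis chain. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Documents have to be re-indexed after changing the field type for this to take
effect.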

If you do not want this behavior (you want only *Hello how are you?* returned)
and you are using DIH, you can use HTMLStripTransformer[2].
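
A minimal sketch of the DIH variant, in data-config.xml. The entity name, the
query and the column names (docs, id, body) are made up for the example; the
relevant parts are transformer="HTMLStripTransformer" on the entity and
stripHTML="true" on the field.

```xml
<!-- Hypothetical entity, table and column names. -->
<entity name="doc" transformer="HTMLStripTransformer"
        query="SELECT id, body FROM docs">
  <field column="id" name="id"/>
  <!-- stripHTML="true" removes the markup before the value reaches Solr,
       so the stored value is plain text as well. -->
  <field column="body" name="body" stripHTML="true"/>
</entity>
```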

[1]http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory
[2]http://wiki.apache.org/solr/DataImportHandler#HTMLStripTransformer


      
