> I have a problem related to HTML tags.
>
> Basically, in the database some columns carry HTML tags, for example:
> <p> Hello how are you? </p>
> I am indexing the content as it is.
>
> I am filtering Solr-supported special characters at query time.
>
> Now the problem is that when I search for "p", the result is
> *<p> Hello how are you? </p>*
> I don't want to search in HTML tag content.
>
> Please help.
You can remove HTML tags in the analysis phase with HTMLStripCharFilterFactory [1]. With this, searching for p won't return *<p> Hello how are you? </p>* anymore. However, when you search for hello, the returned document will still contain the <p> tags, because analysis only affects the indexed terms, not the stored value. If you do not want this behavior (i.e. you want only *Hello how are you?* to be stored and returned) and you are using DIH (DataImportHandler), you can use HTMLStripTransformer [2], which strips the tags at import time, before the document is indexed.

[1] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory
[2] http://wiki.apache.org/solr/DataImportHandler#HTMLStripTransformer
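To make the difference between the two options concrete, here is a minimal Python sketch. It is not Solr code: `strip_html`, `tokenize`, `index_with_char_filter` and `index_with_transformer` are hypothetical stand-ins, the tokenizer is a naive lowercase word splitter rather than a real analyzer chain, and Solr's actual filters handle many more cases (entities, comments, scripts, offsets). The only point is where the stripping happens and what ends up stored versus searchable.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Keeps text nodes only; tags and their attributes are discarded."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_html(value: str) -> str:
    """Return the text content of an HTML fragment, trimmed of outer whitespace."""
    parser = TextExtractor()
    parser.feed(value)
    parser.close()
    return "".join(parser.parts).strip()


def tokenize(text: str) -> list:
    """Naive stand-in for an analyzer chain: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def index_with_char_filter(value: str) -> dict:
    """Hypothetical analysis-time stripping (the idea behind HTMLStripCharFilterFactory):
    the searchable terms are tag-free, but the stored value is the raw input."""
    return {"stored": value, "terms": tokenize(strip_html(value))}


def index_with_transformer(value: str) -> dict:
    """Hypothetical import-time stripping (the idea behind HTMLStripTransformer):
    the value is cleaned before indexing, so stored value and terms are both tag-free."""
    clean = strip_html(value)
    return {"stored": clean, "terms": tokenize(clean)}


raw = "<p> Hello how are you? </p>"

# Indexing the raw value as-is: the tag name "p" becomes a searchable term.
print(tokenize(raw))  # ['p', 'hello', 'how', 'are', 'you', 'p']

# Char filter: "p" is no longer searchable, but the stored value keeps the tags.
print(index_with_char_filter(raw))
# {'stored': '<p> Hello how are you? </p>', 'terms': ['hello', 'how', 'are', 'you']}

# Transformer: tags are gone from both the stored value and the terms.
print(index_with_transformer(raw))
# {'stored': 'Hello how are you?', 'terms': ['hello', 'how', 'are', 'you']}
```

In short, stripping during analysis changes only what you can search for, while stripping during import also changes what you get back in the results.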