Ah, I read your post too fast and ignored the title. Sorry 'bout that.

Erick
On Mon, Jan 11, 2010 at 2:55 PM, darniz <rnizamud...@edmunds.com> wrote:

>
> Well, that's the whole discussion we are having.
> I had the impression that the html tags are filtered and then the field is
> stored without tags. But it looks like the html tags are removed only so
> that the terms can be indexed, and the actual text is stored in raw format.
>
> Let's say, for example, I enter a field like
> <field name="body"><p>honda car road review</field>
> When I do analysis on the body field, the html filter removes the <p> tag
> and indexes the words honda, car, road, review. But when I fetch the body
> field to display in my document, it returns <p>honda car road review
>
> I hope I make sense.
> thanks
> darniz
>
>
> Erick Erickson wrote:
> >
> > This page: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
> > shows you many of the SOLR analyzers and filters. Would one of
> > the various *HTMLStrip* stuff work?
> >
> > HTH
> > ERick
> >
> > On Mon, Jan 11, 2010 at 2:44 PM, darniz <rnizamud...@edmunds.com> wrote:
> >
> >>
> >> Thanks, we were having the same issue.
> >> We are trying to store article content and we are storing a field like
> >> <p>This article is for blah </p>.
> >> When I look at the analysis.jsp page, it does strip out the <p> tags
> >> before indexing, but when we fetch the document it returns the field
> >> with the <p> tags.
> >> From Solr's point of view that is correct, but our issue is that these
> >> html tags are screwing up the display of our page. Is there an easy way
> >> to strip out the html tags, or do we have to take care of it manually?
> >>
> >> Thanks
> >> Rashid
> >>
> >>
> >> aseem cheema wrote:
> >> >
> >> > Alright. It turns out that escapedTags is not for what I thought it is
> >> > for.
> >> > The problem that I am having with HTMLStripCharFilterFactory is that
> >> > it strips the html while indexing the field, but not while storing the
> >> > field. That is why what I see in analysis.jsp, which is index
> >> > analysis, does not match what gets stored... because.. well HTML is
> >> > stripped only for indexing. Makes so much sense.
> >> >
> >> > Thanks to Ryan McKinley for clarifying this.
> >> > Aseem
> >> >
> >> > On Wed, Nov 11, 2009 at 9:50 AM, aseem cheema <aseemche...@gmail.com>
> >> > wrote:
> >> >> I am trying to post a document with the following content using SolrJ:
> >> >> <center>content</center>
> >> >> I need the xml/html tags to be ignored. Even though this works fine in
> >> >> analysis.jsp, this does not work with SolrJ, as the client escapes the
> >> >> < and > with &lt; and &gt; and HTMLStripCharFilterFactory does not
> >> >> strip those escaped tags. How can I achieve this? Any ideas will be
> >> >> highly appreciated.
> >> >>
> >> >> There is escapedTags in HTMLStripCharFilterFactory constructor. Is
> >> >> there a way to get that to work?
> >> >> Thanks
> >> >> --
> >> >> Aseem
> >> >>
> >> >
> >> > --
> >> > Aseem
> >> >
> >>
> >> --
> >> View this message in context:
> >> http://old.nabble.com/XmlUpdateRequestHandler-with-HTMLStripCharFilterFactory-tp26305561p27116434.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
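
Since the thread concludes that the char filter only affects the indexed terms and the stored value stays raw, one option is to strip the markup on the client before the document is sent to Solr. Below is a minimal sketch in plain Java, assuming a naive regex is good enough for simple markup like the <p> examples above. The class name StoredFieldHtmlStripper and the method stripTags are hypothetical and are not part of Solr or SolrJ.

```java
import java.util.regex.Pattern;

public class StoredFieldHtmlStripper {

    // Naive assumption: anything between '<' and the next '>' is a tag.
    // This does not handle comments, scripts, entities, or '>' inside attributes.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    // Used to collapse the gaps left behind once tags are removed.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Returns the text with tag-like spans replaced by single spaces and trimmed. */
    public static String stripTags(String raw) {
        if (raw == null) {
            return null;
        }
        // Replace each tag with a space so adjacent words are not glued together.
        String withoutTags = TAG.matcher(raw).replaceAll(" ");
        return WHITESPACE.matcher(withoutTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        // Field value taken from the example in the thread.
        String body = "<p>honda car road review";
        System.out.println(stripTags(body)); // prints: honda car road review
    }
}
```

The cleaned string would then be the value passed for the body field when the document is built, so the stored text and the indexed terms both come from tag-free input. For real-world HTML a proper parser is likely a safer choice than a regex, and the raw markup would need its own stored field if it is still wanted elsewhere.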