On Wed, Jan 13, 2010 at 7:48 AM, Lance Norskog wrote:
> You can do this stripping in the DataImportHandler. You would have to
> write your own stripping code using regular expressions.
Note that DIH has an HTMLStripTransformer which wraps Solr's HTMLStripReader.
--
Regards,
Shalin Shekhar Man
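
For anyone who wants to go the regular-expression route mentioned above and strip on the client before the document is sent, here is a minimal sketch in plain Java. It is not Solr's HTMLStripReader and not part of SolrJ; the class and method names are hypothetical, and a regex like this is assumed to be good enough only for simple, well-formed markup (it does not handle comments, scripts, or a ">" inside an attribute value).

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Solr or SolrJ.
class HtmlStripSketch {
    // Anything that looks like a tag: "<", then any non-">" characters, then ">".
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern SPACES = Pattern.compile("\\s+");

    static String strip(String raw) {
        // Undo escaped angle brackets first so escaped tags are removed too.
        String text = raw.replace("&lt;", "<").replace("&gt;", ">");
        // Replace each tag with a space so neighbouring words do not run together.
        text = TAG.matcher(text).replaceAll(" ");
        // Decode ampersands last so "&amp;lt;" stays a literal "&lt;".
        text = text.replace("&amp;", "&");
        // Collapse the whitespace left behind by removed tags.
        return SPACES.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        System.out.println(strip("<body><p>content</p></body>"));
        System.out.println(strip("&lt;b&gt;bold&lt;/b&gt; text"));
    }
}
```

The result of strip() would then be the value set on the SolrJ document field, so the stored text and the indexed text agree.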
Makes so much sense.
>>> >> >
>>> >> > Thanks to Ryan McKinley for clarifying this.
>>> >> > Aseem
>>> >> >
>>> >> > On Wed, Nov 11, 2009 at 9:50 AM, aseem cheema
>>>
>>> >> > wrote:
>>> >
t with the following content using
>> SolrJ:
>> >> >> content
>> >> >> I need the xml/html tags to be ignored. Even though this works fine
>> in
>> >> >> analysis.jsp, this does not work with SolrJ, as the client escapes
>> the
>> >> &
l tags to be ignored. Even though this works fine
> in
> >> >> analysis.jsp, this does not work with SolrJ, as the client escapes
> the
> >> >> < and > with &lt; and &gt; and HTMLStripCharFilterFactory does not
> >> >> strip those escaped tags. H
: stored without tags. But looks like the html tags are removed and terms are
: indexed purely for indexing, and the actual text is stored in raw format.
Correct. Analysis is all about "indexing"; it has nothing to do with
"stored" content.
You can write UpdateProcessors that modify the content
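
To make the indexed/stored split concrete, here is a toy model in plain Java. None of these classes are Solr classes: Field, analyze, add and preprocess are hypothetical stand-ins, and preprocess only plays the role an update processor would play, rewriting the value before it is added.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model for illustration only; these are not Solr classes.
class IndexVsStoredSketch {
    // What a field holds after an add: the stored value and the indexed terms.
    static final class Field {
        final String stored;
        final List<String> terms;

        Field(String stored, List<String> terms) {
            this.stored = stored;
            this.terms = terms;
        }
    }

    // Stand-in for an analysis chain: drop tags, lowercase, split on whitespace.
    static List<String> analyze(String value) {
        String text = value.replaceAll("<[^>]*>", " ").toLowerCase(Locale.ROOT).trim();
        List<String> terms = new ArrayList<>();
        if (!text.isEmpty()) {
            for (String term : text.split("\\s+")) {
                terms.add(term);
            }
        }
        return terms;
    }

    // Analysis feeds the index only; the stored value is exactly what was submitted.
    static Field add(String submitted) {
        return new Field(submitted, analyze(submitted));
    }

    // Stand-in for a pre-indexing step that rewrites the value before the add.
    static String preprocess(String submitted) {
        return submitted.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        Field raw = add("<p>Hello World</p>");
        System.out.println(raw.stored + " / " + raw.terms);

        Field cleaned = add(preprocess("<p>Hello World</p>"));
        System.out.println(cleaned.stored + " / " + cleaned.terms);
    }
}
```

In this model the first add keeps the tags in the stored value while the terms are already tag-free; only the second add, which rewrites the value first, stores clean text.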
th &lt; and &gt; and HTMLStripCharFilterFactory does not
>> >> strip those escaped tags. How can I achieve this? Any ideas will be
>> >> highly appreciated.
>> >>
>> >> There is escapedTags in HTMLStripCharFilterFactory constructor. Is
>>
is escapedTags in HTMLStripCharFilterFactory constructor. Is
> >> there a way to get that to work?
> >> Thanks
> >> --
> >> Aseem
> >>
> >
> >
> >
> > --
> > Aseem
> >
> >
>
> --
> View this message in context:
> http://old.nabble.com/XmlUpdateRequestHandler-with-HTMLStripCharFilterFactory-tp26305561p27116434.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
ow can I achieve this? Any ideas will be
>> highly appreciated.
>>
>> There is escapedTags in HTMLStripCharFilterFactory constructor. Is
>> there a way to get that to work?
>> Thanks
>> --
>> Aseem
>>
>
>
>
> --
> Aseem
>
>
-
Alright. It turns out that escapedTags does not do what I thought it did.
The problem that I am having with HTMLStripCharFilterFactory is that
it strips the HTML while indexing the field, but not while storing the
field. That is why what I see in analysis.jsp, which is index
analysis, does not m
I am trying to post a document with the following content using SolrJ:
content
I need the xml/html tags to be ignored. Even though this works fine in
analysis.jsp, this does not work with SolrJ, as the client escapes the
< and > with &lt; and &gt; and HTMLStripCharFilterFactory does not
strip those escap