Thanks a lot! I thought I'd looked through this page, but somehow I didn't
see this one.

I greatly appreciate it!

Ron

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Sunday, February 20, 2011 5:59 AM
To: solr-user@lucene.apache.org
Subject: Re: XML Stripping from DIH

Ron,

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory
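
For reference, a minimal sketch of how the char filter from that page might be
wired into schema.xml. The field type name "text_stripped" and the choice of
tokenizer and lowercase filter are assumptions for illustration, not something
taken from this thread; adjust them to the existing schema.

```xml
<!-- schema.xml fragment; "text_stripped" is a hypothetical field type name -->
<fieldType name="text_stripped" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Strips markup from the character stream before tokenizing -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <!-- Tokenizer and filter below are illustrative choices only -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Note that a char filter runs at analysis time, so it changes the indexed terms
only; the stored value of the field keeps its tags. If the stored text also
needs to be tag-free, DIH's HTMLStripTransformer (stripHTML="true" on the
field) may be worth a look.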


Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message ----
> From: "Olson, Ron" <rol...@lbpc.com>
> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> Sent: Fri, February 18, 2011 4:05:15 PM
> Subject: XML Stripping from DIH
>
> Hi all-
>
> I have some XML in a database that I am trying to index and store; I am
> interested in the various pieces of text, but none of the tags. I've been
> trying to figure out a way to strip all the tags out, but haven't found
> anything within Solr to do so; the XML parser seems to want XPath to get the
> various element values, when all I want is to turn the whole thing into one
> blob of text, regardless of whether it makes any "contextual" sense.
>
> Is there something in Solr to do this, or is it something I'd have to write
> myself (which I'm willing to do if necessary)?
>
> Thanks for any info,
>
> Ron
>
> DISCLAIMER: This electronic message, including any attachments, files or
> documents, is intended only for the addressee and may contain CONFIDENTIAL,
> PROPRIETARY or LEGALLY PRIVILEGED information. If you are not the intended
> recipient, you are hereby notified that any use, disclosure, copying or
> distribution of this message or any of the information included in or with it
> is unauthorized and strictly prohibited. If you have received this message in
> error, please notify the sender immediately by reply e-mail and permanently
> delete and destroy this message and its attachments, along with any copies
> thereof. This message does not create any contractual obligation on behalf of
> the sender or Law Bulletin Publishing Company.
> Thank you.
>


