Thanks a lot! I thought I'd looked on this page but didn't see this one, not sure why.
I greatly appreciate it!

Ron

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Sunday, February 20, 2011 5:59 AM
To: solr-user@lucene.apache.org
Subject: Re: XML Stripping from DIH

Ron,

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message ----
> From: "Olson, Ron" <rol...@lbpc.com>
> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> Sent: Fri, February 18, 2011 4:05:15 PM
> Subject: XML Stripping from DIH
>
> Hi all-
>
> I have some XML in a database that I am trying to index and store; I am
> interested in the various pieces of text, but none of the tags. I've been
> trying to figure out a way to strip all the tags out, but haven't found
> anything within Solr to do so; the XML parser seems to want XPath to get the
> various element values, when all I want is to turn the whole thing into one
> blob of text, regardless of whether it makes any "contextual" sense.
>
> Is there something in Solr to do this, or is it something I'd have to write
> myself (which I'm willing to do if necessary)?
>
> Thanks for any info,
>
> Ron
>
> DISCLAIMER: This electronic message, including any attachments, files or
> documents, is intended only for the addressee and may contain CONFIDENTIAL,
> PROPRIETARY or LEGALLY PRIVILEGED information. If you are not the intended
> recipient, you are hereby notified that any use, disclosure, copying or
> distribution of this message or any of the information included in or with it
> is unauthorized and strictly prohibited. If you have received this message in
> error, please notify the sender immediately by reply e-mail and permanently
> delete and destroy this message and its attachments, along with any copies
> thereof. This message does not create any contractual obligation on behalf of
> the sender or Law Bulletin Publishing Company.
> Thank you.
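
A note for readers who land on this thread: the char filter linked above strips markup while a field is analyzed, so it affects the indexed tokens, while the stored value is kept as it was submitted. If the stored text should also be free of tags, one option is to flatten the XML before it reaches Solr. Below is a minimal Python sketch of that idea. It is not something proposed in the thread; the function name strip_tags and the sample document are hypothetical, and it assumes the input is well-formed XML.

```python
import xml.etree.ElementTree as ET


def strip_tags(xml_text: str) -> str:
    """Flatten an XML document into one blob of text (hypothetical helper).

    Assumes xml_text is well-formed; ET.fromstring raises ParseError otherwise.
    """
    root = ET.fromstring(xml_text)
    # itertext() yields the text of every element in document order,
    # ignoring the tags themselves.
    pieces = root.itertext()
    # Join with spaces so words from adjacent elements do not run together,
    # then collapse any runs of whitespace.
    return " ".join(" ".join(pieces).split())


# Hypothetical sample input, only for illustration.
sample = "<doc><title>Hello</title><body>big <b>world</b></body></doc>"
flattened = strip_tags(sample)
print(flattened)
```

Joining with a space trades exact spacing for keeping words from neighboring elements apart, which is usually the safer choice when the result is only going to be searched.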