Another way to index XML data is to use the normal Solr XML updater and wrap your XML documents inside CDATA blocks.
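
Here is a minimal sketch of what that could look like, using only the Python standard library. The field names `id` and `text` and the helper names are assumptions for illustration and would need to match your schema. Whether the tag names then show up as terms depends on the analyzer configured for that field. Posting the message to the update handler is not shown.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def wrap_cdata(text):
    """Wrap text in a CDATA section so its markup is treated as character data."""
    # A CDATA section cannot contain "]]>", so split that sequence
    # across two adjacent sections.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def build_add_message(doc_id, raw_xml, id_field="id", body_field="text"):
    """Build a Solr XML <add> message carrying raw_xml as a literal field value.

    id_field and body_field are hypothetical names; use the ones in your schema.
    """
    return (
        "<add><doc>"
        f"<field name={quoteattr(id_field)}>{escape(doc_id)}</field>"
        f"<field name={quoteattr(body_field)}>{wrap_cdata(raw_xml)}</field>"
        "</doc></add>"
    )


if __name__ == "__main__":
    message = build_add_message("doc1", "<tag1> hello </tag1>")
    print(message)

    # Parsing the message back shows that the field value still contains
    # the tags as plain text, not as child elements.
    root = ET.fromstring(message)
    for field in root.iter("field"):
        print(field.get("name"), "=", repr(field.text))
```

Since the XML parser hands the CDATA content to Solr as plain text, the tags reach the field's analyzer intact instead of being consumed as markup.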

On Mon, Sep 28, 2009 at 2:12 AM, Thung, Peter C CIV SPAWARSYSCEN-PACIFIC, 56340 <peter.th...@navy.mil> wrote:
> With a basically default install of the trunk version of Solr 1.4,
> when trying to index an XML file, it appears that the XML tags
> get stripped when indexed.
>
> If the tag names and their frequencies are important to me for search
> purposes, could someone tell me what my options are
> to not have Solr strip out XML tags?
>
> For example, if I have an XML tag of
> <tag1> hello </tag1>
> I'd like to see tag1 appear twice as a term and count as 2 in some
> termFrequency vector.
>
> I was trying out the examples from this link
> http://wiki.apache.org/solr/ExtractingRequestHandler
> and sending in an XML file.
>
> Would I need to modify some existing code, or is it just a configuration
> change to not strip out XML tags in processing?
>
> -Peter

--
Lance Norskog
goks...@gmail.com