Another way to index XML data is to use the normal Solr XML updater
and wrap your XML documents inside CDATA blocks.
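
As a rough sketch of what that could look like, the Python below builds
a Solr <add> message whose field value is the raw XML wrapped in CDATA.
The field name "xml_text" and the helper names are placeholders I made
up, not anything from Solr itself. Whether the tag names then show up
as terms still depends on the analyzer configured for that field in
your schema.

```python
from xml.sax.saxutils import escape, quoteattr


def cdata(text):
    # A CDATA section cannot contain "]]>", so split any occurrence
    # across two adjacent CDATA sections.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def build_add_message(doc_id, raw_xml, id_field="id", xml_field="xml_text"):
    # "xml_text" is a hypothetical field name; use one defined in your schema.
    return (
        "<add><doc>"
        f"<field name={quoteattr(id_field)}>{escape(doc_id)}</field>"
        f"<field name={quoteattr(xml_field)}>{cdata(raw_xml)}</field>"
        "</doc></add>"
    )


message = build_add_message("doc1", "<tag1> hello </tag1>")
print(message)
```

The resulting string would then be POSTed to the update handler the
same way as any other Solr XML update message.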

On Mon, Sep 28, 2009 at 2:12 AM, Thung, Peter C CIV
SPAWARSYSCEN-PACIFIC, 56340 <peter.th...@navy.mil> wrote:
> With a basically default install of the trunk version of solr 1.4
> when trying to index an xml file, it appears that the xml tags
> seem to get stripped when indexed.
>
> If the tag names and their frequencies are important to me for search
> purposes, could someone tell me what
> my options are to keep Solr from stripping out XML tags?
> for example
>
> if I have an XML tag of
> <tag1> hello </tag1>
> I'd like to see tag1 appear twice as a term and count as 2 in some
> termFrequency vector.
>
> I was trying out the examples from this link
> http://wiki.apache.org/solr/ExtractingRequestHandler
>
> and sending in an xml file.
>
> Would I need to modify some existing code, or is it just a configuration
> change to not strip out XML tags in processing?
>
> -Peter



-- 
Lance Norskog
goks...@gmail.com
