loading XML docbook files into solr

Derek Werthmuller Sat, 26 Feb 2011 07:37:28 -0800

I've been working on this for a while and seem to have hit a wall.  The error
messages aren't complete enough to explain why importing a sample DocBook
document into Solr is not working.
I'm using the curl tool to post the XML file and receive a non-error response,
but the document count doesn't increase and a *:* query still returns no
results.
The DocBook document has an id attribute, and this is mapped to the uniqueKey
in the schema.xml file, but it seems this may still be the issue.  It's not
clear how the field names map to the XML.  Do they map only to attributes, or
do they also map to elements?  How do you differentiate?
Can field names in the schema.xml file contain XPath expressions?


Are there other important sections of the solrconfig that could be keeping
this from working?

We want to maintain much of the document structure so we have more control
over the searching.

Here is what the DocBook XML looks like (I tried setting the uniqueKey to
both id and docid, but neither worked):

<book label="issuebriefs" id="proi">
    <docid>245</docid>
    <titleabbrev>Advancing Return on Investment Analysis for Government IT: A Public Value Framework </titleabbrev>
    <chapter>
        <title>Advancing Return on Investment Analysis for Government IT: A Public Value Framework</title>
        <para>
            <mediaobject>
                <imageobject>
                    <imagedata fileref="/publications/annualreports/ar2006/images/public-value.jpg" format="jpg" contentdepth="157" contentwidth="216" align="left"/>
                </imageobject>
                <textobject>
                    <phrase>Public Value Illustration</phrase>
                </textobject>
            </mediaobject>
....
..

Here is the section of the schema.xml:
        <field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true" />
        <field name="titleabbrev" type="text" indexed="true" stored="true" />
        <field name="title" type="text" indexed="true" stored="true" />
        
        <field name="para" type="text" indexed="true" stored="true" />
        <field name="ulink" type="string" indexed="true" stored="true" />
        <field name="listitem" type="text" indexed="true" stored="true" />
        
        <field name="all_text" type="text" indexed="true" stored="false" multiValued="true" />

        <copyField source="title" dest="all_text" />
        <copyField source="para" dest="all_text" />
        <copyField source="listitem" dest="all_text" />
        <copyField source="titleabbrev" dest="all_text" />


 </fields>

 <!-- Field to use to determine and enforce document uniqueness.
      Unless this field is marked with required="false", it will be a
      required field
   -->
 <uniqueKey>id</uniqueKey>

 <!-- field for the QueryParser to use when an explicit fieldname is absent -->
 <defaultSearchField>all_text</defaultSearchField>

 <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
 <solrQueryParser defaultOperator="OR"/>

</schema>
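
A possible workaround, assuming the /update handler only accepts Solr's own
<add><doc><field name="..."> message format and does not map arbitrary XML
elements or attributes onto schema fields: flatten the DocBook file into that
format before posting it.  Below is a minimal sketch in Python.  The function
name docbook_to_solr_add and the TEXT_FIELDS list are hypothetical names chosen
for illustration; the field names themselves are taken from the schema above.
Because a book usually contains many title and para elements, those fields
would presumably also need multiValued="true" for this to load.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: DocBook element names that are copied into
# schema fields of the same name (see the <field> entries above).
TEXT_FIELDS = ("titleabbrev", "title", "para", "listitem")


def docbook_to_solr_add(docbook_xml):
    """Flatten one DocBook <book> into a Solr <add><doc> update message."""
    book = ET.fromstring(docbook_xml)

    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")

    # The uniqueKey value is taken from the id attribute on the root element.
    id_field = ET.SubElement(doc, "field", name="id")
    id_field.text = book.get("id")

    for tag in TEXT_FIELDS:
        for elem in book.iter(tag):
            # Gather all text beneath the element and collapse whitespace.
            text = " ".join("".join(elem.itertext()).split())
            if text:
                field = ET.SubElement(doc, "field", name=tag)
                field.text = text

    # Return the message as a string, ready to be posted to /update.
    return ET.tostring(add, encoding="unicode")
```

The returned string could then be saved to a file and posted with curl in
place of the raw DocBook file, followed by a <commit/> message.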

Load command results.

$ ./postfile.sh 
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">56</int></lst>
</response>
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">15</int></lst>
</response>


Thanks
        Derek
