Hi,

 

I wish to index well-formed XML documents as they are.

I have a database filled with MARCXML records. An example of these looks like 
this:

 

        <record
            ns0:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
            xmlns="http://www.loc.gov/MARC21/slim"
            xmlns:ns0="http://www.w3.org/2001/XMLSchema-instance">
            <leader>00000nam  22      a 4500</leader>
            <controlfield tag="001">000500000</controlfield>
            <controlfield tag="005">20050826220257.0</controlfield>
            <controlfield tag="008">000710s1998    xx      r     000 0 dut d</controlfield>
            <datafield ind1=" " ind2=" " tag="040">
                <subfield code="a">Univ</subfield>
            </datafield>
            <datafield ind1="1" ind2=" " tag="100">
                <subfield code="a">van Wetten, J. W.</subfield>
            </datafield>
            <datafield ind1="1" ind2="3" tag="245">
                <subfield code="a">De positie van vrouwen in de asielprocedure /</subfield>
                <subfield code="c">J.W. van Wetten, N. Dijkhof, F. Heide.</subfield>
            </datafield>
        </record>

 

The idea is to create Lucene indexes on specific MARC fields and to store the 
complete MARC record in Lucene 'as is'. The presentation layer of my 
application would then have the complete MARC record at hand, which gives me 
full flexibility in choosing which MARC fields to display. So I want to create 
the following record through XSLT and feed it to Solr:

 

<doc>
<field name="title">De positie van vrouwen in de asielprocedure</field>
<field name="author">van Wetten, J. W.</field>
...
<field name="originalRecord">
        <record
            ns0:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
            xmlns="http://www.loc.gov/MARC21/slim"
            xmlns:ns0="http://www.w3.org/2001/XMLSchema-instance">
            <leader>00000nam  22      a 4500</leader>
            <controlfield tag="001">000500000</controlfield>
            <controlfield tag="005">20050826220257.0</controlfield>
            <controlfield tag="008">000710s1998    xx      r     000 0 dut d</controlfield>
            <datafield ind1=" " ind2=" " tag="040">
                <subfield code="a">UGent</subfield>
            </datafield>
            <datafield ind1="1" ind2=" " tag="100">
                <subfield code="a">van Wetten, J. W.</subfield>
            </datafield>
            <datafield ind1="1" ind2="3" tag="245">
                <subfield code="a">De positie van vrouwen in de asielprocedure /</subfield>
                <subfield code="c">J.W. van Wetten, N. Dijkhof, F. Heide.</subfield>
            </datafield>
        </record>
</field>
</doc>

 

I have the following in my schema.xml:

 

<field name="author" type="text" indexed="true" stored="true" termVectors="true"/>
<field name="title" type="text" indexed="true" stored="true" termVectors="true"/>
<field name="originalRecord" type="text" indexed="false" stored="true"/>

 

 

Solr, of course, has a problem with the raw XML markup inside the 
'originalRecord' field.

Is there a solution to this? Has anyone done this before?
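
One possible way around this, offered only as a sketch and not as something confirmed in this message, is to escape the record so that Solr receives it as plain character data rather than as nested elements. The Python below illustrates the idea instead of XSLT. The helper names `subfield` and `build_solr_doc` are hypothetical, and stripping the trailing " /" from the title is an assumption based on the desired output shown above.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"


def subfield(record, tag, code):
    """Return the text of the first matching MARC subfield, or None."""
    path = (f"{{{MARC_NS}}}datafield[@tag='{tag}']/"
            f"{{{MARC_NS}}}subfield[@code='{code}']")
    node = record.find(path)
    return node.text if node is not None else None


def build_solr_doc(marcxml):
    """Build a <doc> element (as a string) from one MARCXML record string.

    The full record is assigned as *text*, so the serializer escapes
    '<', '>' and '&' and the field holds no nested elements.
    """
    record = ET.fromstring(marcxml)

    title = subfield(record, "245", "a")
    if title is not None:
        # Assumption: drop the trailing ISBD " /" as in the example above.
        title = title.rstrip(" /")

    fields = {
        "title": title,
        "author": subfield(record, "100", "a"),
        "originalRecord": marcxml,  # stored verbatim, escaped on output
    }

    doc = ET.Element("doc")
    for name, value in fields.items():
        if value is not None:
            field = ET.SubElement(doc, "field", name=name)
            field.text = value
    return ET.tostring(doc, encoding="unicode")
```

With this approach the presentation layer would need to parse the stored string again to get the MARC record back as XML. The existing schema.xml definition of 'originalRecord' (stored, not indexed) would stay as it is.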

 

Thanks a lot.

Benoit.

 

 

=============================

PAUWELS Benoit

Université Libre de Bruxelles - Libraries

Head of Automation

Av. F.D. Roosevelt 50, CP 180

1050 BRUSSELS

Belgium

Tel: + 32 2 650 23 91

Fax: + 32 2 650 23 91

=============================

 

 
