Hi,
I would like to index well-formed XML documents as they are.
I have a database filled with MARCXML records. An example record looks like
this:
<record
ns0:schemaLocation="http://www.loc.gov/MARC21/slim
http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
xmlns="http://www.loc.gov/MARC21/slim"
xmlns:ns0="http://www.w3.org/2001/XMLSchema-instance">
<leader>00000nam 22 a 4500</leader>
<controlfield tag="001">000500000</controlfield>
<controlfield tag="005">20050826220257.0</controlfield>
<controlfield tag="008">000710s1998 xx r 000 0 dut
d</controlfield>
<datafield ind1=" " ind2=" " tag="040">
<subfield code="a">Univ</subfield>
</datafield>
<datafield ind1="1" ind2=" " tag="100">
<subfield code="a">van Wetten, J. W.</subfield>
</datafield>
<datafield ind1="1" ind2="3" tag="245">
<subfield code="a">De positie van vrouwen in de asielprocedure
/</subfield>
<subfield code="c">J.W. van Wetten, N. Dijkhof, F.
Heide.</subfield>
</datafield>
</record>
The idea is to build Lucene indexes on specific MARC fields and to store the
complete MARC record in Lucene 'as is'. In the presentation layer of my
application I would then have the complete MARC record at hand, which gives me
full flexibility over which MARC fields to display. So I want to create the
following record through XSLT and feed it to Solr:
<doc>
<field name="title">De positie van vrouwen in de asielprocedure</field>
<field name="author">van Wetten, J. W.</field>
...
<field name="originalRecord">
<record
ns0:schemaLocation="http://www.loc.gov/MARC21/slim
http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
xmlns="http://www.loc.gov/MARC21/slim"
xmlns:ns0="http://www.w3.org/2001/XMLSchema-instance">
<leader>00000nam 22 a 4500</leader>
<controlfield tag="001">000500000</controlfield>
<controlfield tag="005">20050826220257.0</controlfield>
<controlfield tag="008">000710s1998 xx r 000 0 dut
d</controlfield>
<datafield ind1=" " ind2=" " tag="040">
<subfield code="a">UGent</subfield>
</datafield>
<datafield ind1="1" ind2=" " tag="100">
<subfield code="a">van Wetten, J. W.</subfield>
</datafield>
<datafield ind1="1" ind2="3" tag="245">
<subfield code="a">De positie van vrouwen in de asielprocedure
/</subfield>
<subfield code="c">J.W. van Wetten, N. Dijkhof, F.
Heide.</subfield>
</datafield>
</record>
</field>
</doc>
I have the following in my schema.xml:
<field name="author" type="text" indexed="true" stored="true"
termVectors="true"/>
<field name="title" type="text" indexed="true" stored="true"
termVectors="true"/>
<field name="originalRecord" type="text" indexed="false" stored="true"/>
Solr, of course, has a problem with the XML nested inside the 'originalRecord'
field. Is there a solution to this? Has anyone done this before?
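The only workaround I can think of is to escape the record so that it travels
as plain text inside the field rather than as nested elements. Below is a
minimal sketch of that idea in Python instead of XSLT. I have not tested it
against Solr, and build_solr_doc is just a hypothetical helper name of mine; it
relies on ElementTree escaping '<', '>' and '&' in text content when it
serializes.

```python
import xml.etree.ElementTree as ET


def build_solr_doc(fields, original_record_xml):
    """Build a <doc> element whose 'originalRecord' field carries the
    MARCXML record as escaped text instead of nested elements.

    fields: dict mapping field name -> string value (e.g. title, author)
    original_record_xml: the complete MARCXML record as a string
    (hypothetical helper, an assumption for illustration)
    """
    doc = ET.Element("doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = value

    raw = ET.SubElement(doc, "field", name="originalRecord")
    # Assigning the record as *text* means ElementTree escapes the markup
    # characters on serialization, so the field has no child elements.
    raw.text = original_record_xml

    return ET.tostring(doc, encoding="unicode")


if __name__ == "__main__":
    record = (
        '<record xmlns="http://www.loc.gov/MARC21/slim">'
        '<controlfield tag="001">000500000</controlfield>'
        "</record>"
    )
    print(build_solr_doc(
        {"title": "De positie van vrouwen in de asielprocedure",
         "author": "van Wetten, J. W."},
        record,
    ))
```

If this works as I hope, the stored value would come back as a string that the
presentation layer could parse again as MARCXML, but that is an assumption on
my part.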
Thanks a lot.
Benoit.
=============================
PAUWELS Benoit
Université Libre de Bruxelles - Libraries
Head of Automation
Av. F.D. Roosevelt 50, CP 180
1050 BRUSSELS
Belgium
Tel: + 32 2 650 23 91
Fax: + 32 2 650 23 91
=============================